Case study · Role-scoped assistants

Meridian Operations · internal AI assistant

One chat surface. Zero expertise bleed.

Five distinct specialists, each with a whitelist of tool-bound skills. A specialist acting outside its lane is architecturally impossible: the tools it would need are never in its list.

Specialists deployed in production

Tool-bound skills across the set

Cross-domain hallucinations measured

The problem

A general-purpose assistant answers every question with the same confidence — even the wrong ones

Before role scoping, one chat surface answered everything. A research question. A catalog edit. A delivery order. Same agent, same tools, same confidence on every reply. Users learned which answers to trust by remembering which ones had been wrong before. The cognitive load was a tax on the human, not a feature of the system.

Competence leakage
A general-purpose assistant given five types of question answers all five with the same confidence — even when three of those answers are wrong. Users learned to distrust the confident wrong answers more than the honest 'I don't know.'
Tool sprawl with no guardrails
Before gating, a single agent session had access to every tool. A research conversation could accidentally call a mutation. A catalog session could reach into submission history. Every session carried the risk of every session's worst outcome.
No accountability surface
When a general assistant gives a wrong answer, the failure is invisible — no record of which tool was called or which persona was active. The next person asks the same question and gets the same wrong answer.
Re-training the human, not the system
Without role scoping, users developed informal rules: 'don't ask about X before 9am,' 'always double-check the catalog answers.' The cognitive overhead shifted back to the human.

Generic chat · all tools available

5 unrelated asks · 1 chat

look up classification code for shipment 48217 · lookup

and update the catalog entry to match · mutation

also issue the delivery order · mutation

what's the threshold default for this account? · lookup

file a bug: search is slow on this page · tracker

One agent · every tool · no scope · Risk = max

The pipeline

From sign-in to routed message in six steps

Six steps from authentication to message routing. The tool whitelist is enforced at the API layer, not at the prompt layer.

01 · Authenticate

Token validated server-side

User opens the assistant panel; Azure AD token validated server-side before the specialist grid renders.

Azure AD

02 · Resolve

Per-user grants resolved

Database query against per-user specialist and skill grants returns a filtered set of (specialist, skill[]) pairs for this exact user.
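The resolve step can be sketched as a grouping pass over grant rows. This is a minimal in-memory sketch, not the production query: the `GrantRow` shape, the `resolveGrants` name, and the sample IDs are all hypothetical, and the real system reads these rows from PostgreSQL through Drizzle.

```typescript
// Hypothetical row shape: one row per (user, specialist, skill) grant.
interface GrantRow {
  userId: string;
  specialistId: string;
  skillId: string;
}

// The filtered result described in the Resolve step: one entry per specialist.
interface SpecialistGrant {
  specialistId: string;
  skills: string[];
}

// Group one user's grant rows into (specialist, skill[]) pairs.
// Rows that belong to other users are ignored entirely.
function resolveGrants(rows: GrantRow[], userId: string): SpecialistGrant[] {
  const bySpecialist = new Map<string, Set<string>>();
  for (const row of rows) {
    if (row.userId !== userId) continue;
    const skills = bySpecialist.get(row.specialistId) ?? new Set<string>();
    skills.add(row.skillId);
    bySpecialist.set(row.specialistId, skills);
  }
  return Array.from(bySpecialist.entries()).map(([specialistId, skills]) => ({
    specialistId,
    skills: Array.from(skills).sort(),
  }));
}

// Sample data, invented for illustration.
const sampleRows: GrantRow[] = [
  { userId: "u1", specialistId: "research", skillId: "lookup_classification" },
  { userId: "u1", specialistId: "research", skillId: "reconcile_records" },
  { userId: "u1", specialistId: "product", skillId: "update_catalog_item" },
  { userId: "u2", specialistId: "onboarding", skillId: "create_account" },
];

const grantsForU1 = resolveGrants(sampleRows, "u1");
console.log(JSON.stringify(grantsForU1));
```

A user with no grant rows gets an empty array, which is what lets the next step render an empty picker instead of an error.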

03 · Render

Picker shows only granted specialists

Only specialists the user has grants for appear in the grid. No 'access denied' for the others — the option simply isn't there.
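One way to read "the option simply isn't there" is as a plain filter over the specialist catalog. The sketch below assumes a hypothetical `SpecialistCard` type and invented IDs; only the five labels come from this case study.

```typescript
// Hypothetical card shape for the picker grid.
interface SpecialistCard {
  id: string;
  label: string;
}

// The five specialists named in this case study; the ids are invented.
const CATALOG: SpecialistCard[] = [
  { id: "research", label: "Research Specialist" },
  { id: "onboarding", label: "Onboarding Specialist" },
  { id: "product", label: "Product Specialist" },
  { id: "analyst", label: "Business Analyst" },
  { id: "exception", label: "Exception-Handler Specialist" },
];

// Return only the cards the user holds a grant for.
// Ungranted specialists are dropped, not returned with a "denied" flag.
function visibleSpecialists(
  catalog: SpecialistCard[],
  grantedIds: ReadonlySet<string>,
): SpecialistCard[] {
  return catalog.filter((card) => grantedIds.has(card.id));
}

const picker = visibleSpecialists(CATALOG, new Set(["research", "product"]));
console.log(picker.map((card) => card.label).join(", "));
// prints "Research Specialist, Product Specialist"
```

Because the filter runs before anything is sent to the client, the payload for the grid never mentions the specialists a user cannot open.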

04 · Pin

Specialist pinned to conversation

A conversation row is created in the database with the chosen specialist pinned. The pin persists for the lifetime of the conversation.
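The pin can be sketched as a record that is written once and never updated. This is an in-memory stand-in for the conversations table; `ConversationStore` and its methods are hypothetical names, and in a real database the same guarantee would have to come from having no update path for that column.

```typescript
// A conversation row; every field is read-only once created.
interface Conversation {
  readonly id: string;
  readonly userId: string;
  readonly specialistId: string; // the pin
}

// In-memory stand-in for the conversations table (assumption for illustration).
class ConversationStore {
  private rows = new Map<string, Conversation>();
  private nextId = 1;

  // Create a conversation pinned to one specialist the user has a grant for.
  create(
    userId: string,
    specialistId: string,
    grantedIds: ReadonlySet<string>,
  ): Conversation {
    if (!grantedIds.has(specialistId)) {
      throw new Error(`no grant for specialist "${specialistId}"`);
    }
    const row: Conversation = Object.freeze({
      id: `c${this.nextId++}`,
      userId,
      specialistId,
    });
    this.rows.set(row.id, row);
    return row;
  }

  // Look up the pin. There is deliberately no method that changes it.
  pinnedSpecialist(conversationId: string): string {
    const row = this.rows.get(conversationId);
    if (!row) throw new Error(`unknown conversation "${conversationId}"`);
    return row.specialistId;
  }
}

const store = new ConversationStore();
const convo = store.create("u1", "research", new Set(["research"]));
console.log(store.pinnedSpecialist(convo.id)); // prints "research"
```

Switching specialists means creating a new conversation, which matches the redirect behaviour described for out-of-scope requests.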

05 · Whitelist

Tool list filtered server-side

The Anthropic API call is constructed with exactly the picked specialist's bundled skills and no others. Out-of-scope tools are absent — not gated.
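A minimal sketch of that construction step, assuming a hypothetical tool registry and specialist record. The request object follows the general shape of an Anthropic Messages API call (`model`, `max_tokens`, `system`, `tools`, `messages`), but no SDK is called here, and the tool names, prompt text, and model ID are placeholders rather than values from the production system.

```typescript
// Tool definition in the shape the Messages API expects for each tool.
interface ToolDef {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Hypothetical specialist record: a persona plus the names of its bundled skills.
interface Specialist {
  id: string;
  systemPrompt: string;
  skillNames: string[];
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: string;
  tools: ToolDef[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Every tool the platform knows about (invented examples).
const TOOL_REGISTRY = new Map<string, ToolDef>([
  ["lookup_classification", {
    name: "lookup_classification",
    description: "Look up the classification code for a shipment.",
    input_schema: {
      type: "object",
      properties: { shipmentId: { type: "string" } },
      required: ["shipmentId"],
    },
  }],
  ["update_catalog_item", {
    name: "update_catalog_item",
    description: "Update one catalog entry.",
    input_schema: { type: "object", properties: { itemId: { type: "string" } } },
  }],
]);

// Build the request with only the pinned specialist's tools.
// Out-of-scope tools are never looked up, so they cannot appear in the payload.
function buildRequest(
  specialist: Specialist,
  userText: string,
  model: string,
): MessagesRequest {
  const tools = specialist.skillNames.map((name) => {
    const tool = TOOL_REGISTRY.get(name);
    if (!tool) throw new Error(`unknown skill "${name}"`);
    return tool;
  });
  return {
    model,
    max_tokens: 1024,
    system: specialist.systemPrompt,
    tools,
    messages: [{ role: "user", content: userText }],
  };
}

const research: Specialist = {
  id: "research",
  systemPrompt: "You are the Research Specialist. You only read and report.",
  skillNames: ["lookup_classification"],
};

const request = buildRequest(research, "Classification code for shipment 48217?", "model-id-from-config");
console.log(request.tools.map((t) => t.name)); // prints [ 'lookup_classification' ]
```

The design choice worth noticing is that the filter is positive: the request starts from the specialist's bundle and looks tools up, rather than starting from the full registry and removing the ones that do not belong.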

Claude

06 · Route

Messages routed through pinned persona

Every message goes through the pinned specialist's system prompt and the filtered tool list. The model cannot call tools outside the whitelist.
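The source describes the whitelist as the list of tools sent to the API. A server that executes tool calls could also re-check each requested tool name against the same list before running anything; that second check is an assumption added here as defense in depth, not something this case study states. The block shapes mirror `tool_use` and `tool_result` content blocks, and the handler names are hypothetical.

```typescript
// A tool call requested by the model.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// The result handed back to the model on the next turn.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

type Handler = (input: Record<string, unknown>) => string;

// Execute a tool call only if the tool is in the pinned specialist's whitelist.
// Anything else comes back as an error result and no handler runs.
function dispatchToolUse(
  block: ToolUseBlock,
  allowed: ReadonlySet<string>,
  handlers: Map<string, Handler>,
): ToolResultBlock {
  const handler = allowed.has(block.name) ? handlers.get(block.name) : undefined;
  if (!handler) {
    return {
      type: "tool_result",
      tool_use_id: block.id,
      content: `Tool "${block.name}" is not available in this conversation.`,
      is_error: true,
    };
  }
  return { type: "tool_result", tool_use_id: block.id, content: handler(block.input) };
}

// Invented handlers for illustration.
const handlers = new Map<string, Handler>([
  ["lookup_classification", (input) => `code for ${String(input["shipmentId"])}`],
  ["update_catalog_item", () => "catalog updated"],
]);

const researchAllowed: ReadonlySet<string> = new Set(["lookup_classification"]);

const inScope = dispatchToolUse(
  { type: "tool_use", id: "t1", name: "lookup_classification", input: { shipmentId: "48217" } },
  researchAllowed,
  handlers,
);
const outOfScope = dispatchToolUse(
  { type: "tool_use", id: "t2", name: "update_catalog_item", input: {} },
  researchAllowed,
  handlers,
);
console.log(inScope.content, "|", outOfScope.is_error);
```

With the whitelist already applied to the request, the error branch should never fire in practice; if it does, that is a signal worth logging.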

Claude

In-scope request

Specialist answers within its tool whitelist

Tool calls execute, results return, the user gets a domain-appropriate answer. The conversation stays open.

Out-of-scope request

Decline and redirect — no adjacent-tool substitution

If a user asks the Research Specialist to modify a catalog record, the model acknowledges the request and explains that it doesn't have that tool. It does not reach for an adjacent tool as a substitute. It tells the user to open a new conversation with the right specialist. No mutation happens.

Validation review

Tool-selection confidence per specialist

Every week, a sample of production conversations is scored against a 'did the specialist stay in lane' rubric. A specialist that answers outside its tool scope is scored as a hallucination; a specialist that declines and redirects is scored as a pass.
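The rubric can be sketched as a small scoring function over reviewed samples. The `ReviewedSample` fields are assumptions about what a reviewer records, and the `needs_review` verdict for an in-scope request that was declined is an added assumption, since the rubric above only defines the pass and hallucination cases.

```typescript
type Outcome = "answered" | "declined_and_redirected";
type Verdict = "pass" | "hallucination" | "needs_review";

// Hypothetical shape of one sampled, reviewer-labelled conversation.
interface ReviewedSample {
  conversationId: string;
  specialistId: string;
  requestInScope: boolean;
  outcome: Outcome;
}

// Apply the stay-in-lane rubric to one sample.
function scoreSample(sample: ReviewedSample): Verdict {
  if (sample.requestInScope) {
    // In-scope and declined is not covered by the rubric; flag it (assumption).
    return sample.outcome === "answered" ? "pass" : "needs_review";
  }
  // Out of scope: declining is a pass, answering is a cross-domain hallucination.
  return sample.outcome === "declined_and_redirected" ? "pass" : "hallucination";
}

// Share of sampled conversations scored as hallucinations.
function hallucinationRate(samples: ReviewedSample[]): number {
  if (samples.length === 0) return 0;
  const bad = samples.filter((s) => scoreSample(s) === "hallucination").length;
  return bad / samples.length;
}

// Invented sample, not production data.
const weeklySample: ReviewedSample[] = [
  { conversationId: "c1", specialistId: "research", requestInScope: true, outcome: "answered" },
  { conversationId: "c2", specialistId: "research", requestInScope: false, outcome: "declined_and_redirected" },
  { conversationId: "c3", specialistId: "product", requestInScope: false, outcome: "answered" },
  { conversationId: "c4", specialistId: "product", requestInScope: true, outcome: "answered" },
];

console.log(hallucinationRate(weeklySample)); // prints 0.25
```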

Specialist-level confidence

Pass 2 — Claude self-review

Confidence bands: High · Medium · Low

Specialist · Scope · Tool-selection confidence · Band
Research Specialist · Read-only research and reconciliation · 99% · High
Onboarding Specialist · New-account setup from historical packets · 96% · High
Product Specialist · Catalog curation, one item per turn · 97% · High
Business Analyst · Vague request to filed work item · 94% · High
Exception-Handler Specialist · Autonomous-pipeline exception path · 68% · Low

Routed to human review. Broadest tool set in the system; occasionally calls a parsing tool when a lookup tool would have sufficed. Flagged for prompt refinement — the watch metric for the bundle.

4 of 5 specialists cleared the 0.85 threshold
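The threshold check is simple arithmetic over the five scores reported above. In this sketch the scores are copied from the review, while treating the 0.85 threshold as inclusive is an assumption.

```typescript
interface SpecialistScore {
  specialist: string;
  confidence: number; // tool-selection confidence, 0 to 1
}

// Scores as reported in the validation review.
const scores: SpecialistScore[] = [
  { specialist: "Research Specialist", confidence: 0.99 },
  { specialist: "Onboarding Specialist", confidence: 0.96 },
  { specialist: "Product Specialist", confidence: 0.97 },
  { specialist: "Business Analyst", confidence: 0.94 },
  { specialist: "Exception-Handler Specialist", confidence: 0.68 },
];

const THRESHOLD = 0.85; // assumed inclusive

// Specialists at or above the threshold clear it; the rest go to human review.
const cleared = scores.filter((s) => s.confidence >= THRESHOLD);
const needsReview = scores
  .filter((s) => s.confidence < THRESHOLD)
  .map((s) => s.specialist);

console.log(`${cleared.length} of ${scores.length} cleared ${THRESHOLD}`);
// prints "4 of 5 cleared 0.85"
console.log(needsReview); // prints [ 'Exception-Handler Specialist' ]
```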

model: Tool-selection accuracy

The stack

Boring tech, glued together well

Each vendor handles what it's best at. Aisyst owns the orchestration layer in between.

Claude

Executes the specialist persona, routes tool calls, generates responses

Azure AD

User authentication and session validation before the specialist grid renders

PostgreSQL

Specialist grants, skill grants, conversation pinning, audit rows

Drizzle ORM

Type-safe queries against the grants and conversations tables

Microsoft Teams

Notification surface for specialist-triggered side effects and handoffs

Third-party logos are trademarks of their respective owners and appear here only to indicate integration.

Outcomes

What changed when scoping moved from prompt to API

The whitelist isn't a system-prompt instruction the model could ignore. It's the literal list of tools sent to the API. The model doesn't know about anything else.

Distinct specialists in production

Tool-bound skills across the set

Cross-domain hallucinations measured

Conversations bound to a specialist

Watch the cross-domain hallucination rate

The rate is currently zero. If it ever ticks above zero, the tool whitelist has a leak — either a tool was added to a specialist bundle that creates ambiguity with another specialist's scope, or a system prompt was edited in a way that invites out-of-scope reasoning. The metric is a leading indicator for prompt drift, not just for model quality.
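One of the two leak causes named above, a tool landing in a bundle where it blurs into another specialist's scope, can be partly caught before deployment. The sketch below uses the simplest possible proxy: it flags any tool name that appears in more than one bundle. Real ambiguity can also come from differently named tools that do similar things, which this check does not see. The bundle contents are invented.

```typescript
// Find tools that appear in more than one specialist bundle.
// Returns a map from tool name to the specialists that share it.
function findSharedTools(bundles: Map<string, string[]>): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  bundles.forEach((tools, specialistId) => {
    // De-duplicate within a bundle so a repeated entry is not counted twice.
    Array.from(new Set(tools)).forEach((tool) => {
      const ids = owners.get(tool) ?? [];
      ids.push(specialistId);
      owners.set(tool, ids);
    });
  });
  const shared = new Map<string, string[]>();
  owners.forEach((ids, tool) => {
    if (ids.length > 1) shared.set(tool, ids);
  });
  return shared;
}

// Invented bundles: "lookup_classification" has crept into the product bundle.
const bundles = new Map<string, string[]>([
  ["research", ["lookup_classification", "reconcile_records"]],
  ["product", ["update_catalog_item", "lookup_classification"]],
  ["onboarding", ["create_account"]],
]);

const shared = findSharedTools(bundles);
shared.forEach((ids, tool) => console.log(`${tool}: ${ids.join(", ")}`));
// prints "lookup_classification: research, product"
```

A shared tool is not automatically wrong, so a check like this is better suited to prompting a review than to blocking a change outright.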

Related cases

Role-scoped AI assistant

Read-only research specialist

Reconciliation lookups in thirty seconds, not eight minutes. Never edits, only reports.

Role-scoped AI assistant

Catalog curation, one item at a time

A curator that refuses bulk operations on purpose. Ninety-six percent classification accuracy with a full audit trail.

Role-scoped AI assistant

Exception-path handler

Conversational front door for the autonomous pipeline's failures. Eighty seconds from operator click to issued document.

If a general assistant is answering everything in your team with the same confidence, this pattern fits

Role-scoped expertise isn't 'better prompting.' It's a tool whitelist enforced at the API layer, a specialist pin enforced in the database, and a grant system that makes out-of-scope options invisible rather than denied.