    Case study · Role-scoped assistants
    Meridian Operations · internal AI assistant

    One chat surface. Zero expertise bleed.

    Five distinct specialists, each with a whitelist of tool-bound skills. A specialist acting outside its lane is architecturally impossible: out-of-scope tools are never in its tool list.

    5
    Specialists deployed in production
    0
    Tool-bound skills across the set
    0
    Cross-domain hallucinations measured
    Built with
    Claude by Anthropic · Microsoft Azure · PostgreSQL · Drizzle ORM · Microsoft Teams
    The problem

    A general-purpose assistant answers every question with the same confidence — even the wrong ones

    Before role scoping, one chat surface answered everything. A research question. A catalog edit. A delivery order. Same agent, same tools, same confidence on every reply. Users learned which answers to trust by remembering which ones had been wrong before. The cognitive load was a tax on the human, not a feature of the system.

    • Competence leakage
      A general-purpose assistant given five types of question answers all five with the same confidence — even when three of those answers are wrong. Users learned to distrust the confident wrong answers more than the honest 'I don't know.'
    • Tool sprawl with no guardrails
      Before gating, a single agent session had access to every tool. A research conversation could accidentally call a mutation. A catalog session could reach into submission history. Every session carried the risk of every session's worst outcome.
    • No accountability surface
      When a general assistant gives a wrong answer, the failure is invisible — no record of which tool was called or which persona was active. The next person asks the same question and gets the same wrong answer.
    • Re-training the human, not the system
      Without role scoping, users developed informal rules: 'don't ask about X before 9am,' 'always double-check the catalog answers.' The cognitive overhead shifted back to the human.
    Generic chat · all tools available
    5 unrelated asks · 1 chat
    look up classification code for shipment 48217 · lookup
    and update the catalog entry to match · mutation
    also issue the delivery order · mutation
    what's the threshold default for this account? · lookup
    file a bug: search is slow on this page · tracker
    One agent · every tool · no scope · Risk = max
    The pipeline

    From sign-in to scoped reply in one pass

    Six steps from authentication to message routing. The tool whitelist is enforced at the API layer, not at the prompt layer.

    01 Authenticate
    Token validated server-side
    The user opens the assistant panel; the Azure AD token is validated server-side before the specialist grid renders.
    Azure AD
    02 Resolve
    Per-user grants resolved
    A database query against per-user specialist and skill grants returns a filtered set of (specialist, skill[]) pairs for this exact user.
    03 Render
    Picker shows only granted specialists
    Only specialists the user has grants for appear in the grid. There is no 'access denied' for the others; the option simply isn't there.
    04 Pin
    Specialist pinned to conversation
    A conversation row is created in the database with the selected specialist pinned. The pin persists for the lifetime of the conversation.
    05 Whitelist
    Tool list filtered server-side
    The Anthropic API call is constructed with exactly the picked specialist's bundled skills and no others. Out-of-scope tools are absent, not gated.
    Claude
    06 Route
    Messages routed through pinned persona
    Every message goes through the pinned specialist's system prompt and the filtered tool list. The model cannot call tools outside the whitelist.
    Claude
    In-scope request
    Specialist answers within its tool whitelist
    Tool calls execute, results return, the user gets a domain-appropriate answer. The conversation stays open.
    Out-of-scope request
    Decline and redirect — no adjacent-tool substitution
    If a user asks the Research Specialist to modify a catalog record, the model acknowledges and explains it doesn't have that tool. It does not try to use an adjacent tool as a substitute. It tells the user to open a new conversation with the right specialist. No mutation happens.
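    Steps 02 to 04 can be sketched in a few lines. This is a minimal in-memory illustration, not the production code: the row shape, the function names (resolveGrants, pickerOptions, pinConversation) and the sample skill names are all assumptions, and the real system runs the grant lookup as a database query.

```typescript
// Hypothetical row shape: the case study does not give table or column names.
type GrantRow = { userId: string; specialist: string; skill: string };
type SpecialistGrant = { specialist: string; skills: string[] };
type Conversation = { readonly id: string; readonly userId: string; readonly specialist: string };

// Step 02: collapse one user's grant rows into (specialist, skill[]) pairs.
function resolveGrants(rows: GrantRow[], userId: string): SpecialistGrant[] {
  const bySpecialist: Record<string, string[]> = Object.create(null);
  for (const row of rows) {
    if (row.userId !== userId) continue;
    if (!bySpecialist[row.specialist]) bySpecialist[row.specialist] = [];
    bySpecialist[row.specialist].push(row.skill);
  }
  return Object.keys(bySpecialist).map((specialist) => ({
    specialist,
    skills: bySpecialist[specialist],
  }));
}

// Step 03: the picker lists only granted specialists; nothing else is rendered.
function pickerOptions(grants: SpecialistGrant[]): string[] {
  return grants.map((g) => g.specialist);
}

// Step 04: create the conversation with the specialist pinned for its lifetime.
function pinConversation(
  id: string,
  userId: string,
  pick: string,
  grants: SpecialistGrant[],
): Conversation {
  if (pickerOptions(grants).indexOf(pick) === -1) {
    throw new Error(`no grant for specialist "${pick}"`);
  }
  // Frozen so the pin cannot be reassigned later in the session.
  return Object.freeze({ id, userId, specialist: pick });
}

// Sample data (invented for illustration).
const grantRows: GrantRow[] = [
  { userId: "u1", specialist: "research", skill: "lookup_classification" },
  { userId: "u1", specialist: "research", skill: "reconcile_records" },
  { userId: "u2", specialist: "product", skill: "update_catalog_entry" },
];
const u1Grants = resolveGrants(grantRows, "u1");
const conversation = pinConversation("c-1", "u1", "research", u1Grants);
console.log(pickerOptions(u1Grants), conversation.specialist);
```

    In this sketch user u1 never sees the product specialist, and an attempt to pin it anyway throws before any conversation row would be written.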
    Validation review

    Tool-selection confidence per specialist

    Every week, production conversations are sampled against a 'did the specialist stay in lane' rubric. A specialist that answers outside its tool scope is a hallucination; a specialist that declines and redirects is a pass.
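    The rubric reduces to a small grading function. The sketch below assumes a hypothetical Sample record in which a reviewer has already labelled whether the request was in scope and how the specialist responded; those field names are not from the production system. It also assumes that declining an in-scope request still counts as a pass, since the rubric as described only penalises answering out of scope.

```typescript
// Hypothetical reviewer-labelled sample; field names are assumptions.
type Sample = {
  specialist: string;
  requestInScope: boolean;
  response: "answered" | "declined_and_redirected";
};
type Verdict = "pass" | "hallucination";

// Answering a request outside the specialist's tool scope is the one failure counted.
function gradeSample(s: Sample): Verdict {
  if (!s.requestInScope && s.response === "answered") return "hallucination";
  return "pass";
}

// Share of sampled conversations graded as cross-domain hallucinations.
function crossDomainRate(samples: Sample[]): number {
  if (samples.length === 0) return 0;
  const bad = samples.filter((s) => gradeSample(s) === "hallucination").length;
  return bad / samples.length;
}

// Invented sample week for illustration.
const weekSample: Sample[] = [
  { specialist: "research", requestInScope: true, response: "answered" },
  { specialist: "research", requestInScope: false, response: "declined_and_redirected" },
  { specialist: "product", requestInScope: true, response: "answered" },
];
console.log(crossDomainRate(weekSample)); // 0: the only out-of-scope ask was declined
```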

    Field-level confidence
    Pass 2 — Claude self-review
    Research Specialist · Read-only research and reconciliation · 99% · High
    Onboarding Specialist · New-account setup from historical packets · 96% · High
    Product Specialist · Catalog curation, one item per turn · 97% · High
    Business Analyst · Vague request to filed work item · 94% · High
    Exception-Handler Specialist · Autonomous-pipeline exception path · 68% · Low
    Routed to human review. Broadest tool set in the system; occasionally calls a parsing tool when a lookup tool would have sufficed. Flagged for prompt refinement; this is the watch metric for the bundle.
    4 of 5 specialists cleared the 0.85 threshold
    model: Tool-selection accuracy
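    The threshold check behind that summary is simple arithmetic. As an illustration, the snippet below takes the five accuracy figures listed above, expressed as fractions, and applies the 0.85 cut-off; the variable names are invented for the example.

```typescript
const THRESHOLD = 0.85;

// Tool-selection accuracy per specialist, copied from the review above.
const accuracy: Record<string, number> = {
  "Research Specialist": 0.99,
  "Onboarding Specialist": 0.96,
  "Product Specialist": 0.97,
  "Business Analyst": 0.94,
  "Exception-Handler Specialist": 0.68,
};

const names = Object.keys(accuracy);
// Anything below the threshold goes to human review.
const routedToReview = names.filter((name) => accuracy[name] < THRESHOLD);
const cleared = names.length - routedToReview.length;

console.log(`${cleared} of ${names.length} cleared`, routedToReview);
```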
    The stack

    Boring tech, glued together well

    Each vendor handles what it's best at. Aisyst owns the orchestration layer in between.

    Claude (Anthropic)
    Executes the specialist persona, routes tool calls, generates responses
    Azure AD (Microsoft Azure)
    User authentication and session validation before the specialist grid renders
    PostgreSQL
    Specialist grants, skill grants, conversation pinning, audit rows
    Drizzle ORM
    Type-safe queries against the grants and conversations tables
    Microsoft Teams
    Notification surface for specialist-triggered side effects and handoffs

    Third-party logos are trademarks of their respective owners and appear here only to indicate integration.

    Outcomes

    What changed when scoping moved from prompt to API

    The whitelist isn't a system-prompt instruction the model could ignore. It's the literal list of tools sent to the API. The model doesn't know about anything else.
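    A rough sketch of what "the literal list of tools" means in practice: the server builds the request body from the pinned specialist's bundle and nothing else. The payload mirrors the system, tools and messages fields of the Anthropic Messages API, but no network call is made here, and the registry, the bundle contents, the tool names and the model placeholder are all hypothetical.

```typescript
type ToolDef = {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown> };
};

// Hypothetical registry of every tool that exists in the system.
const registry: Record<string, ToolDef> = {
  lookup_classification: {
    name: "lookup_classification",
    description: "Read-only lookup of a shipment's classification code.",
    input_schema: { type: "object", properties: { shipmentId: { type: "string" } } },
  },
  update_catalog_entry: {
    name: "update_catalog_entry",
    description: "Mutates a single catalog record.",
    input_schema: { type: "object", properties: { itemId: { type: "string" } } },
  },
};

// Hypothetical bundles: each specialist names the only tools it may ever see.
const bundles: Record<string, { systemPrompt: string; tools: string[] }> = {
  research: { systemPrompt: "You are the Research Specialist.", tools: ["lookup_classification"] },
  product: { systemPrompt: "You are the Product Specialist.", tools: ["update_catalog_entry"] },
};

// Build the request body for the pinned specialist. Tools outside the bundle
// are never serialised, so the model has no way to name them.
function buildRequest(specialist: string, userMessage: string, model: string) {
  const bundle = bundles[specialist];
  if (!bundle) throw new Error(`unknown specialist: ${specialist}`);
  const tools = bundle.tools.map((name) => {
    const def = registry[name];
    if (!def) throw new Error(`bundle references unknown tool: ${name}`);
    return def;
  });
  return {
    model,
    max_tokens: 1024,
    system: bundle.systemPrompt,
    tools,
    messages: [{ role: "user" as const, content: userMessage }],
  };
}

// "your-model-id" is a placeholder, not a real model name.
const request = buildRequest("research", "update the catalog entry to match", "your-model-id");
console.log(request.tools.map((t) => t.name)); // ["lookup_classification"]
```

    Even though the user asks for a catalog update, the research request carries no mutation tool, so the only available behaviour is to decline and redirect.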

    5
    Distinct specialists in production
    0
    Tool-bound skills across the set
    0
    Cross-domain hallucinations measured
    0%
    Conversations bound to a specialist
    Watch the cross-domain hallucination rate

    The rate is currently zero. If it ever ticks above zero, the tool whitelist has a leak — either a tool was added to a specialist bundle that creates ambiguity with another specialist's scope, or a system prompt was edited in a way that invites out-of-scope reasoning. The metric is a leading indicator for prompt drift, not just for model quality.
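    One way to catch the first kind of leak before it reaches production is a check that reports any tool appearing in more than one specialist bundle. This is a sketch under the assumption that overlap is a reasonable proxy for scope ambiguity; the case study does not say bundles must be disjoint, and the bundle contents below are invented.

```typescript
// Map each tool shared by two or more bundles to the specialists that carry it.
function sharedTools(bundles: Record<string, string[]>): Record<string, string[]> {
  const owners: Record<string, string[]> = Object.create(null);
  for (const specialist of Object.keys(bundles)) {
    for (const tool of bundles[specialist]) {
      if (!owners[tool]) owners[tool] = [];
      owners[tool].push(specialist);
    }
  }
  const shared: Record<string, string[]> = Object.create(null);
  for (const tool of Object.keys(owners)) {
    if (owners[tool].length > 1) shared[tool] = owners[tool];
  }
  return shared;
}

// Hypothetical bundles after an edit that added parse_packet to a second specialist.
const editedBundles: Record<string, string[]> = {
  research: ["lookup_classification"],
  onboarding: ["parse_packet"],
  exceptions: ["parse_packet", "lookup_account"],
};

const overlaps = sharedTools(editedBundles);
console.log(Object.keys(overlaps)); // ["parse_packet"]
```

    A non-empty result would not prove a leak, but it marks the bundles worth reviewing when the hallucination rate moves off zero.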

    If a general assistant is answering everything in your team with the same confidence, this pattern fits

    Role-scoped expertise isn't 'better prompting.' It's a tool whitelist enforced at the API layer, a specialist pin enforced in the database, and a grant system that makes out-of-scope options invisible rather than denied.