CCA-F Cheat Sheet
17-section concept reference for the Claude Certified Architect — Foundations exam.
The one shift that fixes most pattern-identification confusion
Anthropic's framework uses two orthogonal dimensions, not one big tier list:
- Dimension A — Coordination shape (the workflow pattern): chain · routing · parallelization · hub-and-spoke (orchestrator-workers) · evaluator-optimizer.
- Dimension B — Worker sophistication: each worker can be a single LLM call or a full agent (LLM + tools + own loop + own context window).
"Multi-agent system" is not a separate pattern. It's a label for hub-and-spoke (Dimension A) implemented with full agents as the workers (Dimension B). Same shape, same orchestrator, same flow — just sophisticated workers.
The corrected hierarchy
| Workflow shape | Workers are LLM calls | Workers are full agents | Anti-pattern variant |
|---|---|---|---|
| Hub-and-spoke (orchestrator-workers) | "Hub-and-spoke workflow" — one orchestrator LLM call spawns N worker LLM calls, synthesizes. | "Multi-agent system" — one orchestrator agent spawns N worker agents, each with own tools and loop, synthesizes. | Peer-to-peer — no central orchestrator. Agents self-organize, debate, negotiate. |
| Single agent with subagents-as-tools | — | A primary agent treats sub-agents as callable tools in its own tool list. Sequential or on-demand, not parallel-fan-out. | Same peer-to-peer risk if sub-agents call each other instead of being called by the primary. |
The three concrete examples
| Example | Structure | Worker depth | Verdict |
|---|---|---|---|
| 1. Workflow hub-and-spoke | 1 orchestrator LLM call → spawns 3 worker LLM calls → orchestrator synthesizes | Each worker is one LLM inference. No tools, no loop. | Valid workflow. Predictable, low-cost. |
| 2. Multi-agent system | Same shape as #1: 1 orchestrator agent → spawns 3 worker agents → orchestrator synthesizes | Each worker is a full agent (LLM + tools + loop + own context window). | Valid. Same hub-and-spoke shape, agent-class workers. ~15× more tokens than #1. |
| 3. Peer-to-peer | 3 agents spun up, talk to each other directly, no central authority decides decomposition or synthesizes | Agents (full) | Anti-pattern. Drift, redundancy, infinite loops, no decision authority. |
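A minimal sketch of example #1 — the workflow-level hub-and-spoke, where workers are plain LLM calls — using the Anthropic Python SDK. The model id, prompts, and three-way split are illustrative assumptions, not exam content; example #2 would swap each `ask()` worker for a full agent loop.

```python
# Hub-and-spoke workflow: one orchestrator call plans, N worker calls execute,
# one synthesis call merges. Workers are single LLM inferences (no tools, no loop).
# Assumes the `anthropic` package and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

def ask(prompt: str) -> str:
    resp = client.messages.create(
        model=MODEL, max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def hub_and_spoke(task: str) -> str:
    # Orchestrator: decompose the task into worker sub-tasks (one per line).
    plan = ask(f"Split this task into 3 independent sub-tasks, one per line:\n{task}")
    sub_tasks = [line for line in plan.splitlines() if line.strip()][:3]
    # Workers: one LLM inference per sub-task (could also run concurrently).
    worker_outputs = [ask(f"Complete this sub-task:\n{t}") for t in sub_tasks]
    # Orchestrator synthesizes worker results into the final answer.
    joined = "\n\n".join(worker_outputs)
    return ask(f"Synthesize these partial results into one answer for: {task}\n\n{joined}")
```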
The single tell for peer-to-peer in exam stems
If the stem describes agents that collaborate · negotiate · debate · discuss · collectively decide — that's peer-to-peer. Always wrong. The right answer is usually "hub-and-spoke with an orchestrator" or "primary agent with subagents-as-tools."
A "moderator agent" added to peer-to-peer does NOT fix it — moderation isn't the same as orchestration with decomposition + synthesis authority.
Five questions you should be able to answer after this section
| Question | Answer |
|---|---|
| Is "multi-agent system" a 4th tier above "single agent"? | No. It's hub-and-spoke (a workflow shape) with agents as workers. Two orthogonal dimensions, not a tier. |
| If a multi-agent system has a central orchestrator agent, how is it different from hub-and-spoke? | It's not. They're the same structure. "Multi-agent" just means the workers are full agents (not LLM calls). |
| Is a multi-agent system with an orchestrator a peer-to-peer anti-pattern? | No. Peer-to-peer means no central orchestrator. If there's an orchestrator with decomposition and synthesis authority, you're hub-and-spoke (valid). |
| What's the difference between hub-and-spoke at the multi-agent level vs subagents-as-tools? | Hub-and-spoke fans out to multiple workers in parallel and synthesizes their results. Subagents-as-tools means one primary agent calls sub-agents one at a time as if they were tools — no parallel fan-out, no synthesis step. |
| How is multi-agent different from a single full agent? | A single agent has one context window, one tool set, one loop. Multi-agent splits across multiple agents — useful when you need context isolation, genuine parallelism, or specialized tool sets per role. Need at least one of those three justifications. |
| Pattern | Who decides decomposition | When decided | Cardinality (handlers per input) | When to use | Cost / Latency / Predictability | Anti-pattern flag |
|---|---|---|---|---|---|---|
| Single prompt | None — no decomposition | — | 1 LLM call, no loop | One-step tasks. Knowledge already in model. No tools needed. | Cheapest, fastest, most predictable. Start here. | Adding tools or loops "just in case." Climbing the ladder unnecessarily. |
| Prompt chaining | Designer | Design-time | All N steps, sequential | Task decomposes into a fixed, known sequence of steps. Each step needs prior step's output. | Low–medium cost. Predictable. | If steps need to branch based on input type → that's routing, not chain. |
| Routing | Classifier (LLM) | Runtime (per input) | Exactly 1 of N (mutually exclusive) | Different input types need different specialized handlers. Triage problem. | Low–medium. Classifier adds 1 hop. | If >1 handler fires → not routing (that's parallelization). |
| Parallelization (sectioning / voting / ensembling) | Designer (or deterministic code) | Design-time | All N, concurrent | Sectioning: split input into chunks. Voting: same task N×, take majority. Ensembling: different prompts/models, fuse. | Medium cost. Lower wall-clock latency (concurrent). | Hidden sequential dependency (Section 2 needs Section 1's output) — that's actually a chain. |
| Hub-and-spoke (orchestrator + workers) ⭐ | Orchestrator LLM | Runtime | Variable (orchestrator decides) | Sub-tasks NOT known in advance. Orchestrator plans dynamically per input, spawns workers, synthesizes. | Medium–high cost. Latency depends on parallel vs sequential worker calls. | Treating it as parallelization (sub-tasks aren't fixed). Or as full agent (no real planning loop). |
| Evaluator-optimizer | Designer (loop structure); Generator + Evaluator each LLM | Runtime iteration | 1 generator + 1 evaluator, looping | Clear quality criteria + iterative refinement helps. Output quality matters more than latency. | Medium–high (loop multiplies calls). Bounded iterations critical. | Evaluator paradox: same model family on both = shared blind spots. Diversify. |
| Full agent | Agent (LLM) | Runtime, dynamic | Variable, dynamic | Path is open-ended, varies per input. Needs LLM judgment + tools + loop. | High (5–20× single prompt). Hardest to test. | No stop conditions = runaway cost. Always bound: iterations, $, wall-clock, error threshold. |
| Multi-agent: orchestrator + parallel subagents | Lead orchestrator (LLM) + designer for sub-agent specs | Runtime decomposition | Variable | Independent sub-tasks needing context isolation, parallelism, or specialized tool sets. | High. ~15× tokens of single-agent. | (Valid pattern A) — but justify with at least 1 of: context isolation / parallelism / specialized tools. |
| Multi-agent: subagents-as-tools | Primary agent at runtime | Runtime, on-demand | Sequential or on-demand | Primary agent treats sub-agents as callable expertise. Specialized prompts/tools. | High but bounded by primary agent's tool-call decisions. | (Valid pattern B) — but each sub-agent must be a clean, focused tool. |
| Multi-agent: peer-to-peer | Agents among themselves | Runtime, undirected | Unbounded (free-for-all) | NEVER (anti-pattern) | Unpredictable, highest cost. | The anti-pattern. Tell-words in stem: "collaborate," "negotiate," "debate," "discuss," "collectively decide." Adding a "moderator" doesn't fix it. |
Pattern composition — identifying the PRIMARY pattern
When patterns nest, the PRIMARY pattern is the OUTERMOST coordination logic.
| Composition | PRIMARY |
|---|---|
| Routing on the outside → chains nested as routed-to handlers | Routing |
| Chain on the outside → routing nested inside one step | Chain |
| Hub-and-spoke outside → eval-optimizer nested inside each worker | Hub-and-spoke |
| Single agent → subagents-as-tools called sequentially | Single agent with tools (NOT hub-and-spoke) |
| Hierarchical hub-and-spoke → orchestrators delegating to mini-orchestrators | Hub-and-spoke |
| Question | If yes | Trade-offs |
|---|---|---|
| 1. Can a single prompt do it? | → Use a prompt | Highest predictability, lowest cost, easiest to test. |
| 2. Is the control flow fixed and known? | → Use a workflow (chain / route / parallel / orchestrator-workers / eval-opt) | Still predictable but more moving parts. Medium cost. |
| 3. Does the task need dynamic planning based on intermediate results? | → Use an agent (LLM in loop with tools) | Open-ended, hardest to test, 5–20× the cost of a single prompt. |
"Where does multi-agent fit on this ladder?"
It doesn't get its own rung. Multi-agent is a scaling choice within Tier 3 (Agent) — you use it when a single agent can't handle the task because you need context isolation, genuine parallelism, or specialized tool sets per role (need at least one of those three).
The coordination layer of a multi-agent system follows the orchestrator-workers (hub-and-spoke) workflow pattern from Tier 2. The workers are full agents from Tier 3. So a multi-agent system is a composition:
orchestrator-workers (workflow shape from Tier 2) + agents as workers (Tier 3) = multi-agent system
Without a central orchestrator, it becomes peer-to-peer — always wrong.
Note on terminology: some educators use a "4-tier" model (Single Prompt → Workflow → Single Agent → Multi-Agent). It's pedagogically clean but not Anthropic's framing. For the CCA-F exam, use the 3-tier ladder; treat multi-agent as a composition.
Stop conditions — every agent / eval-opt / retry loop needs ALL of these
- Max iterations
- Max cost / token budget
- Max wall-clock time
- Error threshold (N consecutive failures)
- Graceful failure path when any bound trips
Missing stop conditions = #1 cause of runaway-cost incidents.
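A minimal sketch of a loop that enforces all five bounds. The limit values and the `run_step` / `estimate_cost` callables are hypothetical placeholders, not prescribed numbers.

```python
import time

MAX_ITERATIONS = 10          # bound 1
MAX_COST_USD = 2.00          # bound 2
MAX_WALL_CLOCK_SEC = 300     # bound 3
MAX_CONSECUTIVE_ERRORS = 3   # bound 4

def bounded_loop(task, run_step, estimate_cost):
    """run_step(task) -> (done, result); estimate_cost() -> spend so far in USD."""
    start, consecutive_errors = time.monotonic(), 0
    for _ in range(MAX_ITERATIONS):
        # Bound 5: every tripped bound exits through a graceful failure path.
        if estimate_cost() > MAX_COST_USD:
            return {"status": "aborted", "reason": "cost budget exceeded"}
        if time.monotonic() - start > MAX_WALL_CLOCK_SEC:
            return {"status": "aborted", "reason": "wall-clock limit exceeded"}
        try:
            done, result = run_step(task)
            consecutive_errors = 0
            if done:
                return {"status": "ok", "result": result}
        except Exception as exc:
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                return {"status": "aborted", "reason": f"error threshold hit: {exc}"}
    return {"status": "aborted", "reason": "max iterations reached"}
```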
| Responsibility | Right layer | Why / mechanism |
|---|---|---|
| Idempotency | Tool layer | Dedup key per logical operation; retries become no-ops. Agent can't be trusted with retries. |
| Audit logging | Tool / runtime layer | Wraps every tool invocation. Need hard guarantees, not soft prompts. |
| Authentication | Server layer (MCP server) | Never trust client to self-authenticate. Server validates credentials on every request. |
| Schema validation | API layer (tool use) + consumer layer | Defense in depth: API enforces structure; consumer validates semantics + edge cases. |
| Retry on transient errors | Application / infrastructure layer | Bounded, deterministic, observable. Wraps tool calls; agent only sees final result. |
| PII redaction | Database / tool layer | Don't let PII reach the model in the first place. |
| Format enforcement | API layer (tool use schema) | Free reliability — schema fails fast at the model boundary. |
| HITL approval | Application / orchestrator layer | Hard gate, not a soft instruction the model can rationalize past. |
| Compliance "never" rules | Hook + tool layer (and data layer) | Soft enforcement in CLAUDE.md / system prompt is necessary but insufficient. Deterministic layers are primary. |
| Fail-closed default for safety hooks | Hook layer | If hook crashes (segfault, non-zero exit), treat as "block" — never default to allow. |
| Need | Mechanism | Notes |
|---|---|---|
| Teach Claude project conventions, vocabulary, build commands | CLAUDE.md | Soft enforcement (model-dependent). For hard guarantees, use hooks or permissions deny. |
| Save and reuse a prompt with parameters | Slash command | User-invoked. /<name> at .claude/commands/<name>.md. |
| Delegate specialized task with isolated context | Subagent | LLM-style judgment. Own context window. Tools = INTERSECTION of (declared) ∩ (parent's permissions). |
| Run automatic deterministic action at a lifecycle event | Hook | Shell command, not an LLM call. PreToolUse / PostToolUse / SessionStart / Stop / Notification / UserPromptSubmit. |
| Apply rules only when certain paths are touched | .claude/rules/<name>.md with paths: YAML frontmatter | Path-scoped rules — the native mechanism for context-bloat reduction. |
| Restrict what tools a specific skill can call | allowed-tools: frontmatter on SKILL.md | Config-layer security. Cannot be overridden by user prompt (unlike soft instructions). |
CLAUDE.md hierarchy (4 levels + local override)
| Level | Location | Scope |
|---|---|---|
| Enterprise | OS-managed | All sessions on machine (org policy, often authoritative on security) |
| Project | <repo-root>/CLAUDE.md | Sessions inside the repo (committed) |
| Subdirectory | <repo>/<subdir>/CLAUDE.md | Loaded when files in that subdir are read — NOT at session start |
| User | ~/.claude/CLAUDE.md | Personal, all your projects |
| Project-local override | <repo>/CLAUDE.local.md | Gitignored. Personal overrides for this repo only. |
Permissions precedence
- Deny ALWAYS wins. Across all scopes. Pattern specificity does NOT override deny.
- Otherwise, more-specific scope wins (project > user > defaults).
- Subagent tools = INTERSECTION of (subagent's declared tools) ∩ (parent's permissions). Subagents narrow further — never expand.
- Headless mode (`-p`): anything that would normally prompt the user is treated as denied. Pre-allow everything CI needs.
- Fail-closed default for hooks: if a hook crashes, treat as block.
CI/CD permissioning by trust level
| Scenario | Permissions posture |
|---|---|
| Untrusted external PRs | Read-only allow (Read, Grep, Glob); deny everything else; --max-turns; sandboxed container |
| Trusted internal services | Allow tools needed for workflow; deny destructive; --max-turns; least-privilege |
| Developer workstation (interactive) | Broader allow; trust dev intent + interactive prompts |
MCP server scoping
| Config file | Scope | Best for |
|---|---|---|
| `<repo>/.mcp.json` | Project / team — committed to repo | Shared team MCP servers (e.g., corporate GitLab) |
| `~/.claude.json` | User-level / personal — never committed | Experimental, personal, role-specific tools |
| `/etc/claude/mcp.json` | System-wide — all users on machine | Org-wide enforcement |
Additional Claude Code mechanisms (from official exam guide)
- `/memory` command — verify which CLAUDE.md / rules files are loaded into the session. Use to diagnose inconsistent behavior across machines (e.g., "user A's instructions aren't being applied" → check whether they're in `~/.claude/CLAUDE.md` only, not in the project repo).
- `--resume <session-name>` — named session resumption. Continue a specific prior investigation by name. Choose resume when prior context is mostly valid; choose a new session with an injected summary when prior tool results are stale (e.g., files changed since last analysis).
- Explore subagent — named primitive for isolating verbose discovery output. Use when the codebase is unfamiliar and you need to map structure before changes. Returns summaries to the main agent, preserving context budget. Different from plan mode: Explore is for unknown-codebase navigation; plan mode is for ambiguous-approach design.
- Import syntax (`@path/to/file`) in CLAUDE.md — pull in topic-specific standards files (e.g., `@.claude/rules/api-conventions.md`) so the root CLAUDE.md stays small. Each package can selectively import only the standards relevant to it.
3 capability types — memorize "who decides invocation"
| Capability | Who decides invocation | Use for |
|---|---|---|
| Tools | The model (LLM-invoked at runtime) | Active, side-effecting actions. Read or write data, take effects. |
| Resources | The host / user (attached as passive context) | Read-only data the host pins into context (file, URL, document). |
| Prompts | The user (invokes via UI) | Reusable parameterized prompt templates / fragments. |
Transports
- stdio — local subprocess (most common for personal/local servers)
- HTTP / SSE — remote / multi-user
- Streamable HTTP — newer variant of HTTP/SSE
- MCP is NOT HTTP-only. Transport-flexible. Encryption depends on the transport (not guaranteed by MCP itself).
Auth patterns
| Scenario | Auth |
|---|---|
| Local stdio server, single-user | None (the OS user IS the auth) |
| Remote server, machine-to-machine | API key / token in env |
| Remote server, user-private data | OAuth — host walks user through auth at connect |
Tool design — 5 rules for descriptions
- Clear, descriptive name (action-verb_noun pattern)
- Disambiguating description when multiple similar tools exist
- Specify when NOT to use the tool
- Document each parameter individually
- Mark required fields explicitly + use `enum`, `pattern`, `min`/`max` for constraints
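A hedged sketch of a tool definition that applies all five rules, in the Messages API tool format (`name` / `description` / `input_schema`). The tool itself (`lookup_order`) and its fields are invented for illustration.

```python
# Illustrative tool definition applying the five description rules above.
lookup_order_tool = {
    "name": "lookup_order",  # rule 1: action-verb_noun naming
    "description": (
        "Look up a single order by its order ID and return status plus line items. "
        "Use this instead of search_orders when the exact order ID is already known. "  # rule 2: disambiguation
        "Do NOT use for refunds (use process_refund) or customer profile data (use get_customer)."  # rule 3: when NOT to use
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {  # rule 4: each parameter documented individually
                "type": "string",
                "description": "Order identifier, e.g. 'ORD-12345'.",
                "pattern": "^ORD-[0-9]{5}$",  # rule 5: constraint via pattern
            },
            "include_history": {
                "type": "boolean",
                "description": "Include status-change history. Defaults to false.",
            },
        },
        "required": ["order_id"],  # rule 5: required fields marked explicitly
    },
}
```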
Tool boundaries
| Tool count | Status |
|---|---|
| ≤ 10 | Generally safe |
| 10 – 25 | Workable but selection errors increase |
| 25 – 50+ | Typically too many — refactor (consolidate, subagent split, hierarchical routing, dynamic loading) |
| 100+ | Almost always wrong |
Tool result conventions
- `isError: false` + `results: []` for a query that ran successfully and found nothing.
- `isError: true` ONLY for genuine system failure (timeout, auth, syntax error, permission denied).
- Mislabeling an empty result as `isError: true` → the agent hallucinates "the system is down."
- Structured error metadata: `{ errorCategory, isRetryable, retries_attempted, description }` — gives the agent enough signal to recover correctly.
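A sketch of a tool wrapper that honors these conventions — an empty result is still a success, and only genuine failures carry error metadata. The `backend.query_orders` call and extra field names are assumptions for illustration.

```python
def run_order_query(query: str, backend) -> dict:
    """Let the agent distinguish 'found nothing' from 'the system failed'."""
    try:
        rows = backend.query_orders(query)  # hypothetical backend call
    except TimeoutError as exc:
        return {
            "isError": True,                # genuine system failure only
            "errorCategory": "transient",
            "isRetryable": True,
            "retries_attempted": 0,
            "description": f"order backend timed out: {exc}",
        }
    except PermissionError as exc:
        return {
            "isError": True,
            "errorCategory": "permission",
            "isRetryable": False,
            "description": str(exc),
        }
    # A query that ran and found nothing is a SUCCESS with empty results —
    # never isError: true, or the agent concludes "the system is down."
    return {"isError": False, "results": rows}
```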
Tool-use loop message ordering (the strict contract)
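The contract: when the response stops with `tool_use`, the assistant message (including its `tool_use` blocks) stays in history; each tool result goes back in a NEW user-role message as a `tool_result` block referencing the originating `tool_use_id`; the full history is resent on the next call; the loop ends on `end_turn`, never on parsing natural language. A minimal sketch with the Anthropic Python SDK — the model id, tools, and `run_tool` dispatcher are placeholders, and the Section 2 stop conditions are omitted for brevity.

```python
import json
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

def agent_loop(user_prompt: str, tools: list[dict], run_tool) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        resp = client.messages.create(
            model=MODEL, max_tokens=1024, tools=tools, messages=messages)
        # The assistant turn (with its tool_use blocks) stays in history.
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # end_turn → done. Never parse phrases like "I have completed the task."
            return "".join(b.text for b in resp.content if b.type == "text")
        # Each result goes in a NEW user-role message, linked by tool_use_id.
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)  # your tool dispatch
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(output),
                })
        messages.append({"role": "user", "content": results})
        # Full history is resent on the next iteration.
```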
Agent SDK primitives (from official exam guide)
- `Task` tool — THE mechanism for spawning subagents in the Claude Agent SDK. A coordinator must have `allowedTools: ["Task"]` in its config or it cannot spawn anything. Spawning parallel subagents = emit multiple Task tool calls in one coordinator response (not across separate turns).
- `AgentDefinition` — configures each subagent type with: description, system prompt, tool restrictions, optional model override. Each subagent receives its task context directly in its prompt — subagents do NOT inherit the coordinator's conversation history.
- `tool_choice` options:
  - `"auto"` — model may return text instead of calling a tool
  - `"any"` — model must call a tool but can choose which (use when multiple extraction schemas exist and document type is unknown)
  - `{"type": "tool", "name": "..."}` — force a specific tool first (e.g., `extract_metadata` before enrichment steps)
- PostToolUse hook for data normalization — specific high-yield use case: when different MCP tools return Unix timestamps vs ISO 8601 vs numeric status codes, a PostToolUse hook normalizes them before the agent processes results. Distinct from tool-call interception hooks (which block policy-violating outgoing calls).
- MCP resources for content catalogs — expose database schemas, issue summaries, documentation hierarchies as resources (host-attached passive context) to reduce exploratory tool calls. The agent gets visibility into available data without having to query for it.
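A sketch of the three `tool_choice` modes against the Messages API. The single inline tool, document text, and model id are illustrative; a real extraction system would register the full schema set.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

extract_metadata = {  # minimal illustrative tool
    "name": "extract_metadata",
    "description": "Extract title, author, and publication date from a document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": ["string", "null"]},
            "publication_date": {"type": ["string", "null"]},
        },
        "required": ["title"],
    },
}

common = dict(model=MODEL, max_tokens=1024, tools=[extract_metadata],
              messages=[{"role": "user", "content": "Extract metadata from: <document text>"}])

# "auto": the model may answer in plain text instead of calling any tool.
resp_auto = client.messages.create(tool_choice={"type": "auto"}, **common)

# "any": the model must call SOME tool but chooses which — useful when the
# document type (and therefore the right extraction schema) is unknown.
resp_any = client.messages.create(tool_choice={"type": "any"}, **common)

# Forced: the model must call this specific tool first (e.g., metadata before enrichment).
resp_forced = client.messages.create(
    tool_choice={"type": "tool", "name": "extract_metadata"}, **common)
```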
The three properties to memorize
- 50% cost savings versus the synchronous API.
- Up to 24-hour processing window with NO guaranteed latency SLA.
- NO multi-turn tool calling within a single request — cannot execute tools mid-request and feed results back.
Decision: synchronous vs batch
| Workflow | Use | Why |
|---|---|---|
| Blocking pre-merge CI check | Synchronous API | Developer is waiting; latency-sensitive. Batch's 24h window kills DX. |
| Overnight technical debt report | Batch API | Non-blocking; latency-tolerant; 50% cheaper. |
| Weekly compliance audit on 50k documents | Batch API | Non-blocking; high volume; cost matters. |
| Nightly test generation | Batch API | Non-blocking; queue overnight; results by morning. |
| Real-time user-facing extraction | Synchronous | Batch's lack of SLA breaks UX. |
| Agent loop with tool calls | Synchronous | Batch can't execute tools mid-request. |
`custom_id` — the request-response correlation key
- Each batch request includes a `custom_id`; the response uses the same ID. This is how you correlate which output goes with which input — order is NOT guaranteed.
- Partial-failure recovery: filter the response by `result.type == "errored"`, extract the failed `custom_id`s, resubmit ONLY those (possibly with modifications like document chunking).
- SLA calculation: if the business SLA is 30 hours and batch processing can take up to 24 hours, you have roughly a 4-hour submission window for new batches once you reserve a small buffer.
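A hedged sketch of batch submission with `custom_id` correlation and errored-only resubmission, using the Message Batches API in the Python SDK; document ids, prompts, and the model id are illustrative.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

def submit_batch(documents: dict[str, str]) -> str:
    """documents maps a stable custom_id -> document text."""
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": doc_id,  # correlation key — result order is NOT guaranteed
                "params": {
                    "model": MODEL,
                    "max_tokens": 1024,
                    "messages": [{"role": "user", "content": f"Summarize:\n{text}"}],
                },
            }
            for doc_id, text in documents.items()
        ]
    )
    return batch.id

def failed_ids(batch_id: str) -> list[str]:
    """Collect custom_ids whose result errored, to resubmit ONLY those."""
    return [
        entry.custom_id
        for entry in client.messages.batches.results(batch_id)
        if entry.result.type == "errored"
    ]
```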
| Flag | Purpose | When to use |
|---|---|---|
| `-p` / `--print` | Non-interactive mode | Required in CI pipelines. Without it, Claude Code waits for interactive input and the job hangs indefinitely. Process the prompt, write output to stdout, exit. |
| `--output-format json` | Machine-parseable output | Pair with `--json-schema` for structured findings to post as inline PR comments programmatically. |
| `--json-schema <file>` | Schema-constrained output | Enforce a specific JSON shape so downstream automation can rely on field presence. |
| `--resume <session-name>` | Continue a named session | Resume a long-running investigation across work sessions. |
Distractors that look right but are fabricated
- ✗ `CLAUDE_HEADLESS=true` environment variable — does not exist.
- ✗ `--batch` flag for non-interactive mode — does not exist.
- ✗ Redirecting stdin from `/dev/null` — doesn't make Claude Code non-interactive; the mode is controlled by the flag, not stdin.

The single correct CI flag is `-p`.
CI hygiene patterns (from Scenario 5)
- Provide prior review findings in context when re-running on new commits — instruct Claude to report only new or still-unaddressed issues, avoiding duplicate PR comments.
- Provide existing test files in context so test generation avoids suggesting duplicate scenarios.
- CLAUDE.md as CI project context — document testing standards, fixture conventions, and review criteria so CI-invoked Claude has the same context as humans.
- Independent review instance for generated code — the session that generated the code retains reasoning bias; a second instance without that context catches more subtle issues.
| Symptom / Scenario | Strategy | Why |
|---|---|---|
| Tool returns are huge (e.g., 50K-token log dumps) | Compact at the source | Don't let raw blobs hit the model. Trim/summarize in the tool wrapper. |
| Long conversation, older turns no longer needed verbatim | Summarization | HISTORY is continuous; summarize older turns into a running brief. |
| Knowledge base much larger than context window | RAG | KNOWLEDGE is external, static-ish. Retrieve only what's relevant. |
| Live conversation, only recent matters | Sliding window | Drop oldest turns past a threshold. |
| Long exploratory subtask inside a larger task | Subagent isolation | Don't pollute parent context with subagent's intermediate reasoning. |
| Mid-session pivot to a different task in same window | /compact | Summarize-and-condense the current session; free window space; continue. |
| Want BOTH the current branch AND a divergent exploration | fork_session | Clone session at this point; explore both branches. |
| Multi-phase workflow where each phase's noise pollutes the next | Summarize-and-spawn | End phase: write key findings to a scratchpad; spawn fresh subagent seeded with that summary only. |
| Critical facts drift in 20+ turn conversation | Persistent "case facts" block at top of every prompt | Pin specifics (amounts, dates, IDs) to the highest-attention position. Conversation history is for narrative; the facts block is for precision. |
| Cross-session continuation | Persist via summary doc loaded at session start; or external knowledge store | API is stateless across calls — there is no automatic memory. |
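A sketch combining three rows of the table — a pinned case-facts block, a running summary of older turns, and a sliding window of recent turns. The `summarize` callable (typically a cheap LLM call) and the XML tag names are illustrative.

```python
WINDOW = 10  # keep the last N turns verbatim

def build_messages(case_facts: dict, history: list[dict], summarize) -> list[dict]:
    facts_block = "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    older, recent = history[:-WINDOW], history[-WINDOW:]
    running_brief = summarize(older) if older else "No earlier turns."
    # Case facts (amounts, dates, IDs) sit at the top — a high-attention position —
    # so precision never depends on recalling them from mid-history.
    preamble = {
        "role": "user",
        "content": (
            "<case_facts>\n" + facts_block + "\n</case_facts>\n"
            "<earlier_conversation_summary>\n" + running_brief + "\n</earlier_conversation_summary>"
        ),
    }
    return [preamble, *recent]
```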
The KNOWLEDGE vs HISTORY rule (most-tested distinction)
- KNOWLEDGE (external, static-ish, much larger than window) → RAG
- HISTORY (continuous thread, growing from the conversation itself) → Summarization
- Why RAG fails for conversations: implicit references ("that," "as I mentioned"), inconsistent retrieval across turns, decisions get lost in different wording, no coherent voice across snippets.
"Lost in the middle" — attention pattern
- LLMs weight content at start and end of context more than the middle.
- Place critical instructions in system message or near the end of the user message.
- For long aggregated inputs (many sub-agent outputs), front-load summaries; use explicit section headers to help navigation.
- Don't assume "it's in context, the model will use it."
Context-bloat sources, in order of typical magnitude
- Tool results — unbounded; trim at source.
- Conversation history — re-billed cumulatively each turn (quadratic).
- Tool definitions — constant overhead per turn.
When you see NOT / FALSE / LEAST / WEAKEST
- Identify the modifier. Underline it mentally: "I'm looking for the WRONG one / the EXCEPTION."
- For each option, ask: "Is this TRUE / CORRECT / APPROPRIATE / STRONG?" Not "is this the answer" — is the statement itself true.
- Mark each option silently: ✓ (true) or ✗ (false / wrong / exception).
- Pick the one with ✗. There should be exactly one.
If you don't end with exactly three ✓ and one ✗, you misread something — re-read. This is your highest-leverage habit on exam day.
| Distractor pattern | Why it's almost always wrong |
|---|---|
| "Switch to a bigger model" / "Use Opus" / "Upgrade to Sonnet" | Model choice rarely fixes an architectural problem. Test prompt → few-shot → tool use → validation → eval-opt FIRST. |
| "Multi-agent debate / collaborate / negotiate / discuss / collectively decide" | Peer-to-peer disguise. Anti-pattern. Right answer is usually hub-and-spoke or subagents-as-tools. |
| "Add 100 few-shot examples" | Past ~5–10, returns diminish. Real fix is structural (tool use schema, prefill, validation retry). |
| "Use a hook for [LLM-style judgment task]" | Hooks are deterministic shell. LLM judgment goes to a subagent. |
| "Add to CLAUDE.md: never do X" (for security / compliance / financial) | Soft enforcement. Necessary but never sufficient. Push to deterministic layer. |
| "Trust the model to follow this rule" (when the rule must be guaranteed) | Same as above. Probabilistic model + hard guarantee = wrong layer. |
| "Same lockdown for trusted as untrusted CI" | Untrusted needs read-only; trusted gets workflow-needed allow. Don't over-restrict trusted. |
| "More tools = more flexibility" | Vague flexibility = weakest justification. Tool count past ~25 degrades selection accuracy. |
| "Increase max iterations" (for runaway loop) | Treats symptom. The fix is bounded stop conditions + cost cap + observability. |
| "Continue retrying indefinitely" (transient failure) | Subagent-internal exponential backoff with bounded retries → propagate structured error. |
| "Increase the context window" (for hallucination / drift) | Bigger window doesn't fix lost-in-the-middle. Front-load summaries, scratchpad, summarize-and-spawn. |
| "Lower temperature to 0.0 for security analysis" | Temperature is per-agent-type. Synthesis needs higher; extraction needs lower; security analysis needs explicit criteria, not lower temp. |
| Scenario | Primary domains | Anchor concepts |
|---|---|---|
| 1. Customer Support Resolution Agent | D1 · D2 · D5 | Tools: get_customer, lookup_order, process_refund, escalate_to_human. 80%+ first-contact resolution target. Programmatic prerequisite gates · structured errors · escalation criteria (explicit + few-shot, NOT sentiment) · multi-issue decomposition · case-facts block. |
| 2. Code Generation with Claude Code | D3 · D5 | Custom slash commands · CLAUDE.md · plan mode vs direct execution · Explore subagent · --resume · fork_session · test-driven iteration · interview pattern. |
| 3. Multi-Agent Research System | D1 · D2 · D5 | Coordinator + web search + document analysis + synthesis + report agents. Hub-and-spoke · Task tool + allowedTools: ["Task"] · explicit context passing · task decomposition narrowness as a root cause · parallel Task calls · structured claim-source mappings. |
| 4. Developer Productivity with Claude | D2 · D3 · D1 | Built-in tools (Read, Write, Edit, Bash, Grep, Glob) · Grep for content · Glob for paths · Read+Write fallback when Edit fails · .claude/rules/ with paths: frontmatter · MCP server scoping. |
| 5. Claude Code for Continuous Integration | D3 · D4 | -p / --print · --output-format json + --json-schema · prior findings in context · CLAUDE.md for testing standards · independent review instance · per-file + cross-file passes for large PRs. |
| 6. Structured Data Extraction | D4 · D5 | tool_use with JSON schema · tool_choice options · nullable fields · "other" + detail pattern · validation-retry with specific feedback · Message Batches API for non-blocking · few-shot for varied formats · stratified sampling for review routing. |
Memorize
- 5 canonical workflow patterns + full agent + multi-agent variants (see Section 1 table).
- Decision ladder: prompt → workflow → agent (climb only as needed).
- 5 required stop conditions for every loop (Section 2).
- Multi-agent justification: need at least 1 of — context isolation, genuine parallelism, specialized toolsets. ~15× tokens of single-agent.
- Anti-pattern words in stems: collaborate / negotiate / debate / discuss / collectively decide → peer-to-peer.
- Pattern composition: PRIMARY = OUTERMOST coordination logic.
- Decomposition vs Orchestration: Decomposition = WHAT pieces. Orchestration = HOW they run.
- Task tool context passing: no shared memory between agents. The coordinator must inject context explicitly into Task tool prompts. No `shared_context` parameter exists.
- Subagent permissions: INTERSECTION of (declared tools) ∩ (parent's permissions). Can never expand.
- Per-agent-type temperature: extraction / classification → low (0.0–0.3). Synthesis / strategy → higher (0.6–0.9). Fragmented synthesis = often a temperature fix, not a decomposition fix.
- Failure-location diagnostic: per-task outputs good but fragmented synthesis → SYNTHESIS failure (not decomposition).
- Robust error propagation: retry locally with exponential backoff → propagate structured error (with errorCategory + isRetryable + suggestion) only after exhaustion. Coordinator never blind-retries when subagent already did.
- Silent failure anti-pattern: a subagent must NEVER return `{status: success, results: []}` on a real failure. Always return a typed error.
- Provenance: subagents must include `claim + source_url + document_name + relevant_excerpt` as structured output. Synthesis can't reconstruct provenance after the fact.
- Agentic loop control flow: continue iterating while `stop_reason == "tool_use"`, terminate when `stop_reason == "end_turn"`. NEVER parse natural-language phrases ("I have completed the task") for termination — that's a tell-tale anti-pattern.
- Tool result message structure: the assistant's `tool_use` block stays in conversation history. The tool result goes in a NEW user-role message containing a `tool_result` content block that references the `tool_use_id`. Send the FULL history on the next call.
- `Task` tool for subagent spawning — the coordinator's `allowedTools` must include `"Task"`. Spawn parallel subagents by emitting multiple Task calls in ONE coordinator response.
- Hooks for compliance enforcement: use PostToolUse hooks for data normalization (timestamp formats, status codes). Use tool-call interception hooks to block policy-violating actions (e.g., refunds > $500) and redirect to escalation. Hooks for compliance ≫ prompt-based "be careful" guidance.
- Programmatic prerequisite gates — block `process_refund` until `get_customer` has returned a verified customer ID. This is the canonical "code-level enforcement beats prompt-based ordering" example from the official sample exam (Q1); a sketch follows this list.
- Mandatory human override: an explicit user request to talk to a human is an unconditional exit from the automated flow, regardless of capability or sentiment.
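A minimal sketch of the prerequisite-gate bullet above: `process_refund` refuses to execute unless `get_customer` has already verified the customer in this session, and refunds over $500 escalate. The session-state shape and the CRM/billing clients are illustrative assumptions.

```python
class SessionState:
    def __init__(self):
        self.verified_customer_id: str | None = None

def get_customer(state: SessionState, customer_id: str, crm) -> dict:
    record = crm.fetch(customer_id)  # hypothetical CRM client
    if record and record.get("identity_verified"):
        state.verified_customer_id = customer_id
    return {"isError": False, "results": [record]}

def process_refund(state: SessionState, customer_id: str, amount: float, billing) -> dict:
    # Gate enforced in code, not in the prompt: no verified customer, no refund.
    if state.verified_customer_id != customer_id:
        return {"isError": True, "errorCategory": "business", "isRetryable": False,
                "userMessage": "Customer identity must be verified before a refund."}
    if amount > 500:
        return {"isError": True, "errorCategory": "business", "isRetryable": False,
                "userMessage": "Refunds over $500 require human approval.", "escalate": True}
    billing.refund(customer_id, amount)  # hypothetical billing client
    return {"isError": False, "results": [{"refunded": amount}]}
```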
Memorize
- Tool components: name · description · input schema (JSON Schema) · implementation.
- 5 rules for descriptions (Section 5). Most-tested: specify when NOT to use; explicit superiority statement when alternatives exist.
- Schema reliability primitives: `type` · `enum` · `required` · `pattern` · `minimum`/`maximum`.
- Read vs Write tools: reads are generally safe / auto-approvable; writes need permissions, idempotency, often HITL.
- Tool boundaries: ≤10 safe · 10–25 workable · 25–50+ refactor · 100+ wrong.
- 5 strategies to reduce surface area: consolidate · subagents with focused tool sets · hierarchical routing tools · dynamic loading per phase · description curation.
- Tool naming hygiene: cross-server collisions → namespacing (`slack__search`, `notion__search`).
- 3 MCP roles: Host (Claude Desktop / Code) · Client (component inside host) · Server (separate process).
- 3 MCP capability types + who decides (Section 5).
- Transports: stdio (local) · HTTP/SSE (remote) · Streamable HTTP. MCP is transport-flexible — encryption depends on transport.
- Tool result semantics:
  - Empty success → `isError: false` + `results: []`.
  - Genuine error → `isError: true` + structured metadata (`errorCategory`, `isRetryable`, `retries_attempted`, `description`).
  - Permission error → `errorCategory: "permission"` + `isRetryable: false` + escalation path in `userMessage`. Never `"transient"` or `"validation"`.
  - Business-rule violation → `errorCategory: "business"` with a customer-friendly `userMessage` for the agent to relay.
- Tool-use loop ordering (Section 5). The tool result MUST follow immediately in the next user-role message; `tool_use_id` linkage is what lets Claude map result → action.
- Transient failure handling: exponential backoff inside the tool wrapper; the agent sees only the final outcome. Don't propagate transient errors that the tool itself can recover from.
- Tool-result trimming: middleware should strip agent-irrelevant fields (raw DB timestamps, internal IDs, warehouse codes) before injecting tool results into context. Don't widen the context window; trim at the source (sketch after this list).
- Built-in tool selection (Claude Code): `Grep` for content search · `Glob` for filename pattern matching · `Read`/`Write` for full file ops · `Edit` for targeted edits via unique text match. When Edit fails (non-unique anchor), fall back to Read+Write.
- Tool count limits: 18+ tools per agent degrades selection reliability. Distribute across specialized subagents with 4–5 tools each. Provide scoped cross-role tools (e.g., `verify_fact`) for high-frequency needs while routing complex cases through the coordinator.
- Replace generic with constrained: swap `fetch_url` for `load_document` (validates document URLs only). Split `analyze_document` into `extract_data_points` + `summarize_content` + `verify_claim_against_source`.
- MCP resources vs tools — when to choose which: resources are for passive content catalogs the agent reads (schemas, issue summaries, doc hierarchies). Tools are for active actions the agent invokes (queries, side-effecting writes).
- Choose community MCP servers over custom for standard integrations (Jira, GitHub). Reserve custom servers for team-specific workflows.
- "Use this tool instead of X" superiority statement in the description is the highest-leverage fix when Claude ignores your tool in favor of writing custom scripts.
Memorize
- 4 mechanism types: CLAUDE.md (soft) · Slash command (user-invoked) · Subagent (LLM judgment) · Hook (deterministic shell).
- CLAUDE.md is SOFT. For hard guarantees, use hooks or permissions deny.
- CLAUDE.md hierarchy: 4 levels + `CLAUDE.local.md` (gitignored). Subdirectory CLAUDE.md loads when files in that subdir are read, not at session start.
- Merging: all applicable files are merged together. Conflict precedence: subdirectory > project > user > enterprise (enterprise often authoritative for security).
- Imports: `@path/to/file` syntax. Reorganizes but doesn't reduce context.
- Path-scoped rules: `.claude/rules/<name>.md` with `paths:` YAML frontmatter. Native context-bloat reduction.
- SKILL.md frontmatter: `allowed-tools:` = config-layer security boundary. Cannot be overridden by user prompt.
- Slash commands: `.claude/commands/<name>.md` (project) or `~/.claude/commands/<name>.md` (user). Project-level wins on name collision.
- Permission precedence: DENY ALWAYS WINS across all scopes. Specificity does not override.
- Subagent tools = INTERSECTION, never expand.
- Headless mode (`-p`): non-interactive. Anything that would prompt = denied. Pre-allow everything CI needs.
- Fail-closed hooks: crash → treat as block (sketch after this list).
- CI by trust level: untrusted = read-only allow; trusted = workflow-needed allow + deny destructive.
- Hook events: PreToolUse · PostToolUse · UserPromptSubmit · Stop · SessionStart · Notification.
- Workflow mode selection in Claude Code: Direct execution (fix known, scope clear) · Plan mode (requirements ambiguous) · Explore subagent (codebase unfamiliar). Match mode to task certainty.
- MCP scoping: `.mcp.json` (team-shared, committed) vs `~/.claude.json` (personal). Secrets via `${ENV_VAR}` expansion; never commit plaintext.
- Custom skills vs CLAUDE.md: "always-on" universal rules → CLAUDE.md. "Opt-in, occasional, specialized workflow" → `.claude/skills/<name>/SKILL.md` (invoked explicitly).
- `/memory` command — verifies which memory files are loaded. Use to diagnose "instructions not being applied" issues (likely cause: instructions are in user-scoped `~/.claude/CLAUDE.md`, not project-scoped).
- Named session resumption: `--resume <session-name>` — continue a specific investigation. Choose resume when prior context is mostly valid; start fresh with an injected summary when prior tool results are stale.
- Explore subagent — isolates verbose codebase-discovery output. Returns summaries to the main agent. Use during multi-phase exploration to preserve context budget.
- Skill frontmatter options: `context: fork` runs the skill in an isolated sub-agent context (output doesn't pollute the main session) · `allowed-tools:` restricts tool access during skill execution · `argument-hint` prompts developers for required params.
- Plan mode vs direct execution vs Explore subagent: Plan mode = ambiguous-approach design (multi-file migrations, architectural decisions). Direct execution = clear-scope single-file changes. Explore = unfamiliar codebase navigation.
- /compact: mid-session context cleanup — summarize-and-condense; retain findings, discard noise.
- fork_session: branch the session for divergent exploration.
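A sketch of a fail-closed PreToolUse hook script, illustrating the fail-closed bullet above. It assumes the documented hook convention that the pending tool call arrives as JSON on stdin and that a blocking exit code (2 for PreToolUse) stops the call — verify those protocol details against current Claude Code docs. The blocked-command policy is illustrative.

```python
#!/usr/bin/env python3
import json
import sys

BLOCKED_SUBSTRINGS = ("rm -rf", "git push --force")  # illustrative policy only

def main() -> int:
    event = json.load(sys.stdin)  # assumed: tool call details as JSON on stdin
    command = str(event.get("tool_input", {}).get("command", ""))
    if any(s in command for s in BLOCKED_SUBSTRINGS):
        print("Blocked by policy hook: destructive command.", file=sys.stderr)
        return 2  # assumed blocking exit code
    return 0      # allow

if __name__ == "__main__":
    try:
        sys.exit(main())
    except Exception as exc:
        # Fail CLOSED: a crash or malformed input blocks rather than allows.
        print(f"Hook error, blocking by default: {exc}", file=sys.stderr)
        sys.exit(2)
```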
Memorize
- Reliability hierarchy (climb in order): better prompt → few-shot → tool use / prefill → validation + bounded retry → evaluator-optimizer.
- 4 levers of a good prompt: clarity · role · structure (XML / markdown) · output specification.
- System vs user message: role + persistent behavior + format rules in system. Per-turn input in user.
- XML tags are the Anthropic-recommended structural delimiter. Claude is trained to weight tagged sections.
- Positive > negative instructions. "Keep responses under 100 words" beats "don't be verbose." Prohibitions still anchor the prohibited concept.
- Few-shot: 3–5 sweet spot. > 10 → consider fine-tuning. Principles: diverse · representative · cover edge cases. Format consistency across all examples. Recency bias: the last example has stronger format influence.
- 4 ways to get structured output (most → least reliable):
  - Tool use / function calling with JSON schema (API enforces).
  - Prefilling the assistant turn (e.g., `Assistant: {`).
  - JSON schema in instructions (relies on adherence).
  - Free-text "respond as JSON" — least reliable.
- Common output failure modes: markdown fence wrapping → prefill or tool use; preamble → prefill; trailing commentary → prefill / stop sequences; hallucinated fields → tool use schema; type errors → tool use enforces, otherwise validate.
- Two-part fix for variable-detail extraction: optional schema fields (`type: [string, null]`) + explicit prompt instruction "extract if present, return null if absent, never infer." Neither alone is sufficient.
- Validation retry — 3 principles: tell Claude WHY the previous output was rejected; bound retries (max 3 + cost cap); have a fallback path for final failure.
- Field-level validation feedback: include field name + wrong value + correct format + constraint violated. Never "data invalid, try again."
- Evaluator paradox: evaluator must see what generator misses. Same model + similar prompts = shared blind spots. Diversify with different family, different framing, or external rule-based check.
- Calibrated confidence thresholds: raw model confidence scores aren't probabilities — calibrate against labeled validation set before using as a routing threshold.
- `tool_choice` options: `"auto"` (model may skip), `"any"` (must call some tool — useful when document type is unknown), forced selection `{"type": "tool", "name": "X"}` (specific tool first, e.g., `extract_metadata` before enrichment steps).
- "Other" + detail string pattern for extensible enums — when the source has open-ended categories, define the enum with known values + a string field for "other" detail. Prevents the model from forcing every value into an existing category.
- Nullable schema fields for optional source content. When a field may be absent, mark it nullable so the model returns `null` instead of fabricating a value to satisfy a required field.
- Validation-retry feedback loop: on failure, send a follow-up with (original document) + (failed extraction) + (specific validation error). Use Pydantic for schema-level validation. Track which retries succeed (format mismatches) vs which won't (information truly absent from the source). A sketch of this loop follows the list.
- `detected_pattern` field in structured findings — tracks which code constructs trigger each finding. Use it to analyze false-positive patterns when developers dismiss results.
- Explicit categorical criteria beat "be conservative." Vague filters like "only report high-confidence" fail. Concrete rules like "flag SQL injection ONLY when an unsanitized variable from an external HTTP request reaches a query string" precisely shape behavior.
- Multi-pass review architectures: per-file local analysis pass + cross-file integration pass. Use a second independent Claude instance for review (the generator's session retains reasoning context that biases self-review).
- Self-correction validation patterns: extract `calculated_total` alongside `stated_total` to flag arithmetic discrepancies. Add `conflict_detected` booleans for inconsistent source data.
- Few-shot for behavioral override: when Claude has a default "be helpful by inferring" instinct and you need verbatim preservation, few-shot examples are the most direct override.
- Escalation calibration: for inverted-escalation behavior (escalating easy cases, attempting hard ones), the fix is explicit escalation criteria in system prompt + few-shot boundary examples — not sentiment models or capability classifiers.
- Aggregate accuracy is misleading. 97% overall can hide a 50%-error subsegment. Validate by segment; for safety-critical errors, route the segment to mandatory human review.
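A sketch of the validation-retry feedback loop from the list above: validate with Pydantic, feed field-level errors back, bound retries at 3, and fall back gracefully. The `Invoice` schema and the `extract` callable (an LLM extraction request) are illustrative.

```python
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    invoice_id: str
    total: float
    due_date: str | None = None  # nullable: return null rather than fabricate

MAX_RETRIES = 3

def extract_with_validation(document: str, extract) -> dict:
    feedback = ""
    for _ in range(1 + MAX_RETRIES):
        raw = extract(document, feedback)  # LLM extraction call returning a dict
        try:
            return {"status": "ok", "data": Invoice(**raw).model_dump()}
        except ValidationError as err:
            # Field-level feedback: field name + offending value + constraint violated —
            # never just "data invalid, try again."
            lines = []
            for e in err.errors():
                field = ".".join(map(str, e["loc"])) or "<root>"
                bad_value = raw.get(e["loc"][0]) if e["loc"] else raw
                lines.append(f"Field '{field}' got {bad_value!r}: {e['msg']}")
            feedback = "Previous extraction was rejected:\n" + "\n".join(lines)
    return {"status": "failed", "reason": "validation failed after bounded retries",
            "last_feedback": feedback}
```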
Memorize
- What's loaded every API call: system message · conversation history · all tool definitions · prior tool results · resources/attached files · current user message.
- Stateless across separate API calls. No memory between calls — memory is an architectural choice (resending history, external store, retrieval).
- Context-bloat ranking (Section 6): tool results > conversation history (quadratic) > tool definitions.
- KNOWLEDGE → RAG · HISTORY → Summarization (Section 6 alert card).
- Why RAG fails for conversations: implicit references ("that," "as I mentioned"), inconsistent retrieval across turns, decisions get lost in different wording, no coherent voice across snippets.
- Lost in the middle: place critical info at start/end; front-load summaries; explicit section headers.
- 5 long-context strategies (Section 6 table): compact at source · summarization · RAG · sliding window · subagent isolation.
- Mid-session pivot: `/compact`. Divergent exploration: `fork_session`. Multi-phase: summarize-and-spawn between phases.
- Scratchpad pattern for long agentic sessions: write key findings to a file at the end of each phase; a new phase starts by reading the scratchpad. Converts passive (buried in history) to active (fresh in context).
- Persistent case-facts block at the top of every prompt for transactional precision in long conversations.
- Confidence calibration:
  - Explicit labels: `{finding, confidence: high|medium|low, reasoning}`.
  - Route by confidence: high → auto; medium → secondary check; low → human review.
  - Multi-sample / consensus (~5× cost) — reserve for high-stakes.
  - Raw model scores are NOT probabilities. Calibrate against a labeled validation set.
- 4 handoff boundaries: turn-to-turn · agent → subagent · session-to-session · agent → human (HITL). Each needs structured handoff.
- Agent → subagent handoff: pass goal + scope + required inputs + output schema. Return structured result, not full reasoning trail.
- Agent → human handoff (HITL): structured package — what trying to do, what's been done, why escalating, recommended action. Treat human as the next sub-agent.
- Defense in depth stack: better prompt → schema (tool use) → validation + bounded retry → confidence calibration + routing → idempotent tools + HITL on high-stakes → stop conditions + budget caps → observability.
- Recovery patterns: bounded retry with backoff · circuit breakers · graceful degradation (fall back to historical averages, flag as estimated) · idempotency for writes.
- Monitoring: task completion rate · error rate by category · cost per task · latency p50/p95/p99 · tool call distribution · stop-reason distribution · HITL escalation rate.
- Stratified random sampling for measuring error rates in high-confidence extractions. Validate accuracy by document type and field segment before reducing human review — aggregate 97% can hide a 30% segment failure.
- Field-level confidence scores calibrated using a labeled validation set. Raw model self-reported scores are NOT probabilities until calibrated.
- Structured handoff package when escalating to human: customer ID, root cause analysis, refund amount, recommended action. Human agents may not have access to the conversation transcript.
- Crash recovery via state manifests — each agent exports state to a known location; the coordinator loads a manifest on resume and injects relevant state into agent prompts.
- Coverage annotations in synthesis outputs — distinguish well-supported findings from gaps caused by unavailable sources. Helps the reader know what to trust.
- The reliability principle: predictable failure beats unpredictable success. Don't chase 100% — build bounded behavior, graceful failure, observability.
- Aggregate accuracy can mask catastrophic segment failure. Slice by segment; route safety-critical segments to mandatory human review until validated.
- Position-aware input — full mitigation: "lost in the middle" reliably affects MIDDLE positions; the model processes BEGINNING and END well. Place key findings at BOTH ends (summary at top, restate at bottom). Use explicit section headers. Don't only front-load — bookend.
- Multi-concern decomposition pattern: when a customer message has multiple distinct issues, decompose into separate items, investigate each in parallel using shared context, then synthesize a unified resolution. NOT sequential one-at-a-time; NOT a single mega-prompt jamming all concerns together.
- Render-by-content-type in synthesis: financial data → tables; news/timeline → prose; technical findings → structured lists. Forcing uniform format (e.g., bullets everywhere) destroys readability and erases signal that the content type itself carried.
- Temporal data — include dates in structured output: require `publication_date` / `data_collection_date` in subagent outputs. Prevents temporal differences ("2023 had 18%; 2025 has 27%") from being mis-flagged as contradictions during synthesis. Apparent conflicts are often just a time series.
- Batch SLA math: the 24-hour batch processing window means you must submit on a tighter cadence to hit downstream SLAs. Example: to guarantee a 30-hour SLA with the batch API, submit every 4 hours (4h queue wait + 24h processing + 2h buffer ≤ 30h). Frame batch as scheduled work, not a real-time call.
- PostToolUse trim pattern: raw tool outputs are often verbose (40+ fields from an order lookup). Use a PostToolUse hook to filter to the 3-5 fields the agent actually needs BEFORE the result enters context. Prevents context bloat from compounding turn-over-turn.
- Claim-source mappings preserved through synthesis: subagents must output structured `{claim, source_url, source_doc, excerpt}` tuples. Summarization MUST preserve and merge these, never collapse to claim-only. Attribution loss is irreversible once it happens (see the sketch after this list).
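A sketch of provenance-preserving synthesis: group subagent findings by claim while keeping every `{claim, source_url, source_doc, excerpt}` tuple attached. Grouping by exact claim text is a simplification for illustration; the point is that no step collapses to claim-only.

```python
from collections import defaultdict

def merge_findings(subagent_outputs: list[list[dict]]) -> list[dict]:
    """Each subagent output is a list of {claim, source_url, source_doc, excerpt} dicts."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for findings in subagent_outputs:
        for f in findings:
            grouped[f["claim"]].append(
                {"source_url": f["source_url"],
                 "source_doc": f["source_doc"],
                 "excerpt": f["excerpt"]}
            )
    # Every synthesized claim carries its sources — attribution is never dropped.
    return [{"claim": claim, "sources": sources} for claim, sources in grouped.items()]
```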
| Habit | Why |
|---|---|
| Read each stem twice | Numbers (volume, latency, cost), trust qualifiers (trusted / untrusted), and absolute words (always / never / must) are decisive. Negative framing (NOT / LEAST / WEAKEST / EXCEPT) flips polarity — miss it once and you lose the question. |
| Run the negative-framing 4-step protocol | (1) Mark the polarity word. (2) Restate stem in your head: "Find the wrong / weakest one." (3) Label each option +/-. (4) Pick the only minus. Section 9 has this. |
| Flag length-as-correctness instinct | Length parity is enforced in this prep folder but the real exam sometimes has a long correct option AND long distractors. Decide on content, not word count. |
| Match the scenario to its anchor concepts | Section 11 maps each of the 6 official scenarios to the concepts it loves to test. The exam picks 4 of 6 at random — recognize which one you're in within the first sentence. |
| Eliminate the obvious anti-patterns first | Peer-to-peer subagents · parsing natural language for loop termination · sentiment-based escalation · single iteration cap as only stop · ignoring errors silently · uniform output formatting regardless of content type · selecting first match heuristically when multiple candidates exist. Section 10 has the full distractor flag list. |
| Pick the layer before the answer | Section 3 layer placement: prompt vs schema vs hooks vs code. If the option puts a deterministic guarantee in a prompt instruction, it's almost always wrong — hooks or code, not prompts. |
| Budget 2 minutes / question; flag and move | 120 minutes / 60 questions = 2 min/question. If a question takes > 2 min, flag it, pick your best guess, move on. Come back at the end. Never lose 3 questions arguing with 1. |
| Last 10 minutes: review flagged only | Resist the urge to re-read every question. Trust your first read for unflagged ones. Use the timer reserve for the items you flagged because you genuinely couldn't decide. |
| Watch for fabricated parameters | If an option references a flag or parameter that doesn't appear in Anthropic docs (e.g., --enable-multi-agent, tool_choice: "strict"), it's a distractor. The exam doesn't invent fake APIs but bad options sometimes do. |
| Match what you've actually built | If a scenario describes something close to a system you've shipped, trust that intuition — Anthropic's exam questions reward production experience over textbook patterns. |
The two hardest exam mindset shifts
- You're being tested on "what's the BEST answer," not "what works." Multiple options may technically work. The exam wants you to pick the one that matches the scenario's actual constraints (volume, latency, team size, regulatory) — not the most powerful or sophisticated answer.
- The exam rewards humility about model autonomy. Wherever a question pits "trust the prompt" vs "enforce in code/hooks/schema," the answer leans toward deterministic enforcement. Prompt-based guidance has a non-zero failure rate — that's the recurring theme.
Built by John for the CCA-F exam · Companion files: stem-decoder · references