CCA-F Cheat Sheet
17-section concept reference for the Claude Certified Architect — Foundations exam.
The one shift that fixes most pattern-identification confusion
Anthropic's framework uses two orthogonal dimensions, not one big tier list:
- Dimension A — Coordination shape (the workflow pattern): chain · routing · parallelization · hub-and-spoke (orchestrator-workers) · evaluator-optimizer.
- Dimension B — Worker sophistication: each worker can be a single LLM call or a full agent (LLM + tools + own loop + own context window).
"Multi-agent system" is not a separate pattern. It's a label for hub-and-spoke (Dimension A) implemented with full agents as the workers (Dimension B). Same shape, same orchestrator, same flow — just sophisticated workers.
The corrected hierarchy
| Workflow shape | Workers are LLM calls | Workers are full agents | Anti-pattern variant |
|---|---|---|---|
| Hub-and-spoke (orchestrator-workers) | "Hub-and-spoke workflow" — one orchestrator LLM call spawns N worker LLM calls, synthesizes. | "Multi-agent system" — one orchestrator agent spawns N worker agents, each with own tools and loop, synthesizes. | Peer-to-peer — no central orchestrator. Agents self-organize, debate, negotiate. |
| Single agent with subagents-as-tools | — | A primary agent treats sub-agents as callable tools in its own tool list. Sequential or on-demand, not parallel-fan-out. | Same peer-to-peer risk if sub-agents call each other instead of being called by the primary. |
The three concrete examples
| Example | Structure | Worker depth | Verdict |
|---|---|---|---|
| 1. Workflow hub-and-spoke | 1 orchestrator LLM call → spawns 3 worker LLM calls → orchestrator synthesizes | Each worker is one LLM inference. No tools, no loop. | Valid workflow. Predictable, low-cost. |
| 2. Multi-agent system | Same shape as #1: 1 orchestrator agent → spawns 3 worker agents → orchestrator synthesizes | Each worker is a full agent (LLM + tools + loop + own context window). | Valid. Same hub-and-spoke shape, agent-class workers. ~15× more tokens than #1. |
| 3. Peer-to-peer | 3 agents spun up, talk to each other directly, no central authority decides decomposition or synthesizes | Agents (full) | Anti-pattern. Drift, redundancy, infinite loops, no decision authority. |
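A minimal sketch of example #1 — the workflow-level hub-and-spoke, where workers are plain LLM calls — using the Anthropic Python SDK. The model id, prompts, and three-way split are illustrative assumptions, not exam content; example #2 would swap each `ask()` worker for a full agent loop.

```python
# Hub-and-spoke workflow: one orchestrator call plans, N worker calls execute,
# one synthesis call merges. Workers are single LLM inferences (no tools, no loop).
# Assumes the `anthropic` package and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

def ask(prompt: str) -> str:
    resp = client.messages.create(
        model=MODEL, max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def hub_and_spoke(task: str) -> str:
    # Orchestrator: decompose the task into worker sub-tasks (one per line).
    plan = ask(f"Split this task into 3 independent sub-tasks, one per line:\n{task}")
    sub_tasks = [line for line in plan.splitlines() if line.strip()][:3]
    # Workers: one LLM inference per sub-task (could also run concurrently).
    worker_outputs = [ask(f"Complete this sub-task:\n{t}") for t in sub_tasks]
    # Orchestrator synthesizes worker results into the final answer.
    joined = "\n\n".join(worker_outputs)
    return ask(f"Synthesize these partial results into one answer for: {task}\n\n{joined}")
```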
The single tell for peer-to-peer in exam stems
If the stem describes agents that collaborate · negotiate · debate · discuss · collectively decide — that's peer-to-peer. Always wrong. The right answer is usually "hub-and-spoke with an orchestrator" or "primary agent with subagents-as-tools."
A "moderator agent" added to peer-to-peer does NOT fix it — moderation isn't the same as orchestration with decomposition + synthesis authority.
Five questions you should be able to answer after this section
| Question | Answer |
|---|---|
| Is "multi-agent system" a 4th tier above "single agent"? | No. It's hub-and-spoke (a workflow shape) with agents as workers. Two orthogonal dimensions, not a tier. |
| If a multi-agent system has a central orchestrator agent, how is it different from hub-and-spoke? | It's not. They're the same structure. "Multi-agent" just means the workers are full agents (not LLM calls). |
| Is a multi-agent system with an orchestrator a peer-to-peer anti-pattern? | No. Peer-to-peer means no central orchestrator. If there's an orchestrator with decomposition and synthesis authority, you're hub-and-spoke (valid). |
| What's the difference between hub-and-spoke at the multi-agent level vs subagents-as-tools? | Hub-and-spoke fans out to multiple workers in parallel and synthesizes their results. Subagents-as-tools means one primary agent calls sub-agents one at a time as if they were tools — no parallel fan-out, no synthesis step. |
| How is multi-agent different from a single full agent? | A single agent has one context window, one tool set, one loop. Multi-agent splits across multiple agents — useful when you need context isolation, genuine parallelism, or specialized tool sets per role. Need at least one of those three justifications. |
| Pattern | Who decides decomposition | When decided | Cardinality (handlers per input) | When to use | Cost / Latency / Predictability | Anti-pattern flag |
|---|---|---|---|---|---|---|
| Single prompt | None — no decomposition | — | 1 LLM call, no loop | One-step tasks. Knowledge already in model. No tools needed. | Cheapest, fastest, most predictable. Start here. | Adding tools or loops "just in case." Climbing the ladder unnecessarily. |
| Prompt chaining | Designer | Design-time | All N steps, sequential | Task decomposes into a fixed, known sequence of steps. Each step needs prior step's output. | Low–medium cost. Predictable. | If steps need to branch based on input type → that's routing, not chain. |
| Routing | Classifier (LLM) | Runtime (per input) | Exactly 1 of N (mutually exclusive) | Different input types need different specialized handlers. Triage problem. | Low–medium. Classifier adds 1 hop. | If >1 handler fires → not routing (that's parallelization). |
| Parallelization (sectioning / voting / ensembling) | Designer (or deterministic code) | Design-time | All N, concurrent | Sectioning: split input into chunks. Voting: same task N×, take majority. Ensembling: different prompts/models, fuse. | Medium cost. Lower wall-clock latency (concurrent). | Hidden sequential dependency (Section 2 needs Section 1's output) — that's actually a chain. |
| Hub-and-spoke (orchestrator + workers) ⭐ | Orchestrator LLM | Runtime | Variable (orchestrator decides) | Sub-tasks NOT known in advance. Orchestrator plans dynamically per input, spawns workers, synthesizes. | Medium–high cost. Latency depends on parallel vs sequential worker calls. | Treating it as parallelization (sub-tasks aren't fixed). Or as full agent (no real planning loop). |
| Evaluator-optimizer | Designer (loop structure); Generator + Evaluator each LLM | Runtime iteration | 1 generator + 1 evaluator, looping | Clear quality criteria + iterative refinement helps. Output quality matters more than latency. | Medium–high (loop multiplies calls). Bounded iterations critical. | Evaluator paradox: same model family on both = shared blind spots. Diversify. |
| Full agent | Agent (LLM) | Runtime, dynamic | Variable, dynamic | Path is open-ended, varies per input. Needs LLM judgment + tools + loop. | High (5–20× single prompt). Hardest to test. | No stop conditions = runaway cost. Always bound: iterations, $, wall-clock, error threshold. |
| Multi-agent: orchestrator + parallel subagents | Lead orchestrator (LLM) + designer for sub-agent specs | Runtime decomposition | Variable | Independent sub-tasks needing context isolation, parallelism, or specialized tool sets. | High. ~15× tokens of single-agent. | (Valid pattern A) — but justify with at least 1 of: context isolation / parallelism / specialized tools. |
| Multi-agent: subagents-as-tools | Primary agent at runtime | Runtime, on-demand | Sequential or on-demand | Primary agent treats sub-agents as callable expertise. Specialized prompts/tools. | High but bounded by primary agent's tool-call decisions. | (Valid pattern B) — but each sub-agent must be a clean, focused tool. |
| Multi-agent: peer-to-peer | Agents among themselves | Runtime, undirected | Unbounded (free-for-all) | NEVER (anti-pattern) | Unpredictable, highest cost. | The anti-pattern. Tell-words in stem: "collaborate," "negotiate," "debate," "discuss," "collectively decide." Adding a "moderator" doesn't fix it. |
Pattern composition — identifying the PRIMARY pattern
When patterns nest, the PRIMARY pattern is the OUTERMOST coordination logic.
| Composition | PRIMARY |
|---|---|
| Routing on the outside → chains nested as routed-to handlers | Routing |
| Chain on the outside → routing nested inside one step | Chain |
| Hub-and-spoke outside → eval-optimizer nested inside each worker | Hub-and-spoke |
| Single agent → subagents-as-tools called sequentially | Single agent with tools (NOT hub-and-spoke) |
| Hierarchical hub-and-spoke → orchestrators delegating to mini-orchestrators | Hub-and-spoke |
| Question | If yes | Trade-offs |
|---|---|---|
| 1. Can a single prompt do it? | → Use a prompt | Highest predictability, lowest cost, easiest to test. |
| 2. Is the control flow fixed and known? | → Use a workflow (chain / route / parallel / orchestrator-workers / eval-opt) | Still predictable but more moving parts. Medium cost. |
| 3. Does the task need dynamic planning based on intermediate results? | → Use an agent (LLM in loop with tools) | Open-ended, hardest to test, 5–20× the cost of a single prompt. |
"Where does multi-agent fit on this ladder?"
It doesn't get its own rung. Multi-agent is a scaling choice within Tier 3 (Agent) — you use it when a single agent can't handle the task because you need context isolation, genuine parallelism, or specialized tool sets per role (need at least one of those three).
The coordination layer of a multi-agent system follows the orchestrator-workers (hub-and-spoke) workflow pattern from Tier 2. The workers are full agents from Tier 3. So a multi-agent system is a composition:
orchestrator-workers (workflow shape from Tier 2) + agents as workers (Tier 3) = multi-agent system
Without a central orchestrator, it becomes peer-to-peer — always wrong.
Note on terminology: some educators use a "4-tier" model (Single Prompt → Workflow → Single Agent → Multi-Agent). It's pedagogically clean but not Anthropic's framing. For the CCA-F exam, use the 3-tier ladder; treat multi-agent as a composition.
Stop conditions — every agent / eval-opt / retry loop needs ALL of these
- Max iterations
- Max cost / token budget
- Max wall-clock time
- Error threshold (N consecutive failures)
- Graceful failure path when any bound trips
Missing stop conditions = #1 cause of runaway-cost incidents.
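A minimal sketch of a loop that enforces all five bounds. The limit values and the `run_step` / `estimate_cost` callables are hypothetical placeholders, not prescribed numbers.

```python
import time

MAX_ITERATIONS = 10          # bound 1
MAX_COST_USD = 2.00          # bound 2
MAX_WALL_CLOCK_SEC = 300     # bound 3
MAX_CONSECUTIVE_ERRORS = 3   # bound 4

def bounded_loop(task, run_step, estimate_cost):
    """run_step(task) -> (done, result); estimate_cost() -> spend so far in USD."""
    start, consecutive_errors = time.monotonic(), 0
    for _ in range(MAX_ITERATIONS):
        # Bound 5: every tripped bound exits through a graceful failure path.
        if estimate_cost() > MAX_COST_USD:
            return {"status": "aborted", "reason": "cost budget exceeded"}
        if time.monotonic() - start > MAX_WALL_CLOCK_SEC:
            return {"status": "aborted", "reason": "wall-clock limit exceeded"}
        try:
            done, result = run_step(task)
            consecutive_errors = 0
            if done:
                return {"status": "ok", "result": result}
        except Exception as exc:
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                return {"status": "aborted", "reason": f"error threshold hit: {exc}"}
    return {"status": "aborted", "reason": "max iterations reached"}
```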
| Responsibility | Right layer | Why / mechanism |
|---|---|---|
| Idempotency | Tool layer | Dedup key per logical operation; retries become no-ops. Agent can't be trusted with retries. |
| Audit logging | Tool / runtime layer | Wraps every tool invocation. Need hard guarantees, not soft prompts. |
| Authentication | Server layer (MCP server) | Never trust client to self-authenticate. Server validates credentials on every request. |
| Schema validation | API layer (tool use) + consumer layer | Defense in depth: API enforces structure; consumer validates semantics + edge cases. |
| Retry on transient errors | Application / infrastructure layer | Bounded, deterministic, observable. Wraps tool calls; agent only sees final result. |
| PII redaction | Database / tool layer | Don't let PII reach the model in the first place. |
| Format enforcement | API layer (tool use schema) | Free reliability — schema fails fast at the model boundary. |
| HITL approval | Application / orchestrator layer | Hard gate, not a soft instruction the model can rationalize past. |
| Compliance "never" rules | Hook + tool layer (and data layer) | Soft enforcement in CLAUDE.md / system prompt is necessary but insufficient. Deterministic layers are primary. |
| Fail-closed default for safety hooks | Hook layer | If hook crashes (segfault, non-zero exit), treat as "block" — never default to allow. |
| Need | Mechanism | Notes |
|---|---|---|
| Teach Claude project conventions, vocabulary, build commands | CLAUDE.md | Soft enforcement (model-dependent). For hard guarantees, use hooks or permissions deny. |
| Save and reuse a prompt with parameters | Slash command | User-invoked. /<name> at .claude/commands/<name>.md. |
| Delegate specialized task with isolated context | Subagent | LLM-style judgment. Own context window. Tools = INTERSECTION of (declared) ∩ (parent's permissions). |
| Run automatic deterministic action at a lifecycle event | Hook | Shell command, not an LLM call. PreToolUse / PostToolUse / SessionStart / Stop / Notification / UserPromptSubmit. |
| Apply rules only when certain paths are touched | .claude/rules/<name>.md with paths: YAML frontmatter | Path-scoped rules — the native mechanism for context-bloat reduction. |
| Restrict what tools a specific skill can call | allowed-tools: frontmatter on SKILL.md | Config-layer security. Cannot be overridden by user prompt (unlike soft instructions). |
CLAUDE.md hierarchy (4 levels + local override)
| Level | Location | Scope |
|---|---|---|
| Enterprise | OS-managed | All sessions on machine (org policy, often authoritative on security) |
| Project | <repo-root>/CLAUDE.md | Sessions inside the repo (committed) |
| Subdirectory | <repo>/<subdir>/CLAUDE.md | Loaded when files in that subdir are read — NOT at session start |
| User | ~/.claude/CLAUDE.md | Personal, all your projects |
| Project-local override | <repo>/CLAUDE.local.md | Gitignored. Personal overrides for this repo only. |
Permissions precedence
- Deny ALWAYS wins. Across all scopes. Pattern specificity does NOT override deny.
- Otherwise, more-specific scope wins (project > user > defaults).
- Subagent tools = INTERSECTION of (subagent's declared tools) ∩ (parent's permissions). Subagents narrow further — never expand.
- Headless mode (`-p`): anything that would normally prompt the user is treated as denied. Pre-allow everything CI needs.
- Fail-closed default for hooks: if a hook crashes, treat as block.
CI/CD permissioning by trust level
| Scenario | Permissions posture |
|---|---|
| Untrusted external PRs | Read-only allow (Read, Grep, Glob); deny everything else; --max-turns; sandboxed container |
| Trusted internal services | Allow tools needed for workflow; deny destructive; --max-turns; least-privilege |
| Developer workstation (interactive) | Broader allow; trust dev intent + interactive prompts |
MCP server scoping
| Config file | Scope | Best for |
|---|---|---|
| `<repo>/.mcp.json` | Project / team — committed to repo | Shared team MCP servers (e.g., corporate GitLab) |
| `~/.claude.json` | User-level / personal — never committed | Experimental, personal, role-specific tools |
| `/etc/claude/mcp.json` | System-wide — all users on machine | Org-wide enforcement |
Additional Claude Code mechanisms (from official exam guide)
- `/memory` command — verify which CLAUDE.md / rules files are loaded into the session. Use to diagnose inconsistent behavior across machines (e.g., "user A's instructions aren't being applied" → check whether they're in `~/.claude/CLAUDE.md` only, not in the project repo).
- `--resume <session-name>` — named session resumption. Continue a specific prior investigation by name. Choose resume when prior context is mostly valid; choose a new session with an injected summary when prior tool results are stale (e.g., files changed since last analysis).
- Explore subagent — named primitive for isolating verbose discovery output. Use when the codebase is unfamiliar and you need to map structure before changes. Returns summaries to the main agent, preserving context budget. Different from plan mode: Explore is for unknown-codebase navigation; plan mode is for ambiguous-approach design.
- Import syntax (`@path/to/file`) in CLAUDE.md — pull in topic-specific standards files (e.g., `@.claude/rules/api-conventions.md`) so the root CLAUDE.md stays small. Each package can selectively import only the standards relevant to it.
3 capability types — memorize "who decides invocation"
| Capability | Who decides invocation | Use for |
|---|---|---|
| Tools | The model (LLM-invoked at runtime) | Active, side-effecting actions. Read or write data, take effects. |
| Resources | The host / user (attached as passive context) | Read-only data the host pins into context (file, URL, document). |
| Prompts | The user (invokes via UI) | Reusable parameterized prompt templates / fragments. |
Transports
- stdio — local subprocess (most common for personal/local servers)
- HTTP / SSE — remote / multi-user
- Streamable HTTP — newer variant of HTTP/SSE
- MCP is NOT HTTP-only. Transport-flexible. Encryption depends on the transport (not guaranteed by MCP itself).
Auth patterns
| Scenario | Auth |
|---|---|
| Local stdio server, single-user | None (the OS user IS the auth) |
| Remote server, machine-to-machine | API key / token in env |
| Remote server, user-private data | OAuth — host walks user through auth at connect |
Tool design — 5 rules for descriptions
- Clear, descriptive name (action-verb_noun pattern)
- Disambiguating description when multiple similar tools exist
- Specify when NOT to use the tool
- Document each parameter individually
- Mark required fields explicitly + use `enum`, `pattern`, `min`/`max` for constraints
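A hedged sketch of a tool definition that applies all five rules, in the Messages API tool format (`name` / `description` / `input_schema`). The tool itself (`lookup_order`) and its fields are invented for illustration.

```python
# Illustrative tool definition applying the five description rules above.
lookup_order_tool = {
    "name": "lookup_order",  # rule 1: action-verb_noun naming
    "description": (
        "Look up a single order by its order ID and return status plus line items. "
        "Use this instead of search_orders when the exact order ID is already known. "  # rule 2: disambiguation
        "Do NOT use for refunds (use process_refund) or customer profile data (use get_customer)."  # rule 3: when NOT to use
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {  # rule 4: each parameter documented individually
                "type": "string",
                "description": "Order identifier, e.g. 'ORD-12345'.",
                "pattern": "^ORD-[0-9]{5}$",  # rule 5: constraint via pattern
            },
            "include_history": {
                "type": "boolean",
                "description": "Include status-change history. Defaults to false.",
            },
        },
        "required": ["order_id"],  # rule 5: required fields marked explicitly
    },
}
```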
Tool boundaries
| Tool count | Status |
|---|---|
| ≤ 10 | Generally safe |
| 10 – 25 | Workable but selection errors increase |
| 25 – 50+ | Typically too many — refactor (consolidate, subagent split, hierarchical routing, dynamic loading) |
| 100+ | Almost always wrong |
Tool result conventions
- `isError: false` + `results: []` for a query that ran successfully and found nothing.
- `isError: true` ONLY for genuine system failure (timeout, auth, syntax error, permission denied).
- Mislabeling an empty result as `isError: true` → the agent hallucinates "the system is down."
- Structured error metadata: `{ errorCategory, isRetryable, retries_attempted, description }` — gives the agent enough signal to recover correctly.
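A sketch of a tool wrapper that honors these conventions — an empty result is still a success, and only genuine failures carry error metadata. The `backend.query_orders` call and extra field names are assumptions for illustration.

```python
def run_order_query(query: str, backend) -> dict:
    """Let the agent distinguish 'found nothing' from 'the system failed'."""
    try:
        rows = backend.query_orders(query)  # hypothetical backend call
    except TimeoutError as exc:
        return {
            "isError": True,                # genuine system failure only
            "errorCategory": "transient",
            "isRetryable": True,
            "retries_attempted": 0,
            "description": f"order backend timed out: {exc}",
        }
    except PermissionError as exc:
        return {
            "isError": True,
            "errorCategory": "permission",
            "isRetryable": False,
            "description": str(exc),
        }
    # A query that ran and found nothing is a SUCCESS with empty results —
    # never isError: true, or the agent concludes "the system is down."
    return {"isError": False, "results": rows}
```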
Tool-use loop message ordering (the strict contract)
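The contract: when the response stops with `tool_use`, the assistant message (including its `tool_use` blocks) stays in history; each tool result goes back in a NEW user-role message as a `tool_result` block referencing the originating `tool_use_id`; the full history is resent on the next call; the loop ends on `end_turn`, never on parsing natural language. A minimal sketch with the Anthropic Python SDK — the model id, tools, and `run_tool` dispatcher are placeholders, and the Section 2 stop conditions are omitted for brevity.

```python
import json
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

def agent_loop(user_prompt: str, tools: list[dict], run_tool) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        resp = client.messages.create(
            model=MODEL, max_tokens=1024, tools=tools, messages=messages)
        # The assistant turn (with its tool_use blocks) stays in history.
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # end_turn → done. Never parse phrases like "I have completed the task."
            return "".join(b.text for b in resp.content if b.type == "text")
        # Each result goes in a NEW user-role message, linked by tool_use_id.
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)  # your tool dispatch
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(output),
                })
        messages.append({"role": "user", "content": results})
        # Full history is resent on the next iteration.
```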
Agent SDK primitives (from official exam guide)
- `Task` tool — THE mechanism for spawning subagents in the Claude Agent SDK. A coordinator must have `allowedTools: ["Task"]` in its config or it cannot spawn anything. Spawning parallel subagents = emit multiple Task tool calls in one coordinator response (not across separate turns).
- `AgentDefinition` — configures each subagent type with: description, system prompt, tool restrictions, optional model override. Each subagent receives its task context directly in its prompt — subagents do NOT inherit the coordinator's conversation history.
- `tool_choice` options:
  - `"auto"` — model may return text instead of calling a tool
  - `"any"` — model must call a tool but can choose which (use when multiple extraction schemas exist and document type is unknown)
  - `{"type": "tool", "name": "..."}` — force a specific tool first (e.g., `extract_metadata` before enrichment steps)
- PostToolUse hook for data normalization — specific high-yield use case: when different MCP tools return Unix timestamps vs ISO 8601 vs numeric status codes, a PostToolUse hook normalizes them before the agent processes results. Distinct from tool-call interception hooks (which block policy-violating outgoing calls).
- MCP resources for content catalogs — expose database schemas, issue summaries, documentation hierarchies as resources (host-attached passive context) to reduce exploratory tool calls. The agent gets visibility into available data without having to query for it.
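A sketch of the three `tool_choice` modes against the Messages API. The single inline tool, document text, and model id are illustrative; a real extraction system would register the full schema set.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

extract_metadata = {  # minimal illustrative tool
    "name": "extract_metadata",
    "description": "Extract title, author, and publication date from a document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": ["string", "null"]},
            "publication_date": {"type": ["string", "null"]},
        },
        "required": ["title"],
    },
}

common = dict(model=MODEL, max_tokens=1024, tools=[extract_metadata],
              messages=[{"role": "user", "content": "Extract metadata from: <document text>"}])

# "auto": the model may answer in plain text instead of calling any tool.
resp_auto = client.messages.create(tool_choice={"type": "auto"}, **common)

# "any": the model must call SOME tool but chooses which — useful when the
# document type (and therefore the right extraction schema) is unknown.
resp_any = client.messages.create(tool_choice={"type": "any"}, **common)

# Forced: the model must call this specific tool first (e.g., metadata before enrichment).
resp_forced = client.messages.create(
    tool_choice={"type": "tool", "name": "extract_metadata"}, **common)
```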
The three properties to memorize
- 50% cost savings versus the synchronous API.
- Up to 24-hour processing window with NO guaranteed latency SLA.
- NO multi-turn tool calling within a single request — cannot execute tools mid-request and feed results back.
Decision: synchronous vs batch
| Workflow | Use | Why |
|---|---|---|
| Blocking pre-merge CI check | Synchronous API | Developer is waiting; latency-sensitive. Batch's 24h window kills DX. |
| Overnight technical debt report | Batch API | Non-blocking; latency-tolerant; 50% cheaper. |
| Weekly compliance audit on 50k documents | Batch API | Non-blocking; high volume; cost matters. |
| Nightly test generation | Batch API | Non-blocking; queue overnight; results by morning. |
| Real-time user-facing extraction | Synchronous | Batch's lack of SLA breaks UX. |
| Agent loop with tool calls | Synchronous | Batch can't execute tools mid-request. |
`custom_id` — the request-response correlation key
- Each batch request includes a `custom_id`; the response uses the same ID. This is how you correlate which output goes with which input — order is NOT guaranteed.
- Partial-failure recovery: filter the response by `result.type == "errored"`, extract the failed `custom_id`s, resubmit ONLY those (possibly with modifications like document chunking).
- SLA calculation: if the business SLA is 30 hours and batch processing can take up to 24 hours, you have roughly a 4-hour submission window for new batches once you reserve a small buffer.
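A hedged sketch of batch submission with `custom_id` correlation and errored-only resubmission, using the Message Batches API in the Python SDK; document ids, prompts, and the model id are illustrative.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

def submit_batch(documents: dict[str, str]) -> str:
    """documents maps a stable custom_id -> document text."""
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": doc_id,  # correlation key — result order is NOT guaranteed
                "params": {
                    "model": MODEL,
                    "max_tokens": 1024,
                    "messages": [{"role": "user", "content": f"Summarize:\n{text}"}],
                },
            }
            for doc_id, text in documents.items()
        ]
    )
    return batch.id

def failed_ids(batch_id: str) -> list[str]:
    """Collect custom_ids whose result errored, to resubmit ONLY those."""
    return [
        entry.custom_id
        for entry in client.messages.batches.results(batch_id)
        if entry.result.type == "errored"
    ]
```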
| Flag | Purpose | When to use |
|---|---|---|
| `-p` / `--print` | Non-interactive mode | Required in CI pipelines. Without it, Claude Code waits for interactive input and the job hangs indefinitely. Process the prompt, write output to stdout, exit. |
| `--output-format json` | Machine-parseable output | Pair with `--json-schema` for structured findings to post as inline PR comments programmatically. |
| `--json-schema <file>` | Schema-constrained output | Enforce a specific JSON shape so downstream automation can rely on field presence. |
| `--resume <session-name>` | Continue a named session | Resume a long-running investigation across work sessions. |
Distractors that look right but are fabricated
- ✗ `CLAUDE_HEADLESS=true` environment variable — does not exist.
- ✗ `--batch` flag for non-interactive mode — does not exist.
- ✗ Redirecting stdin from `/dev/null` — doesn't make Claude Code non-interactive; the mode is controlled by the flag, not stdin.

The single correct CI flag is `-p`.
CI hygiene patterns (from Scenario 5)
- Provide prior review findings in context when re-running on new commits — instruct Claude to report only new or still-unaddressed issues, avoiding duplicate PR comments.
- Provide existing test files in context so test generation avoids suggesting duplicate scenarios.
- CLAUDE.md as CI project context — document testing standards, fixture conventions, and review criteria so CI-invoked Claude has the same context as humans.
- Independent review instance for generated code — the session that generated the code retains reasoning bias; a second instance without that context catches more subtle issues.
| Symptom / Scenario | Strategy | Why |
|---|---|---|
| Tool returns are huge (e.g., 50K-token log dumps) | Compact at the source | Don't let raw blobs hit the model. Trim/summarize in the tool wrapper. |
| Long conversation, older turns no longer needed verbatim | Summarization | HISTORY is continuous; summarize older turns into a running brief. |
| Knowledge base much larger than context window | RAG | KNOWLEDGE is external, static-ish. Retrieve only what's relevant. |
| Live conversation, only recent matters | Sliding window | Drop oldest turns past a threshold. |
| Long exploratory subtask inside a larger task | Subagent isolation | Don't pollute parent context with subagent's intermediate reasoning. |
| Mid-session pivot to a different task in same window | /compact | Summarize-and-condense the current session; free window space; continue. |
| Want BOTH the current branch AND a divergent exploration | fork_session | Clone session at this point; explore both branches. |
| Multi-phase workflow where each phase's noise pollutes the next | Summarize-and-spawn | End phase: write key findings to a scratchpad; spawn fresh subagent seeded with that summary only. |
| Critical facts drift in 20+ turn conversation | Persistent "case facts" block at top of every prompt | Pin specifics (amounts, dates, IDs) to the highest-attention position. Conversation history is for narrative; the facts block is for precision. |
| Cross-session continuation | Persist via summary doc loaded at session start; or external knowledge store | API is stateless across calls — there is no automatic memory. |
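A sketch combining three rows of the table — a pinned case-facts block, a running summary of older turns, and a sliding window of recent turns. The `summarize` callable (typically a cheap LLM call) and the XML tag names are illustrative.

```python
WINDOW = 10  # keep the last N turns verbatim

def build_messages(case_facts: dict, history: list[dict], summarize) -> list[dict]:
    facts_block = "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    older, recent = history[:-WINDOW], history[-WINDOW:]
    running_brief = summarize(older) if older else "No earlier turns."
    # Case facts (amounts, dates, IDs) sit at the top — a high-attention position —
    # so precision never depends on recalling them from mid-history.
    preamble = {
        "role": "user",
        "content": (
            "<case_facts>\n" + facts_block + "\n</case_facts>\n"
            "<earlier_conversation_summary>\n" + running_brief + "\n</earlier_conversation_summary>"
        ),
    }
    return [preamble, *recent]
```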
The KNOWLEDGE vs HISTORY rule (most-tested distinction)
- KNOWLEDGE (external, static-ish, much larger than window) → RAG
- HISTORY (continuous thread, growing from the conversation itself) → Summarization
- Why RAG fails for conversations: implicit references ("that," "as I mentioned"), inconsistent retrieval across turns, decisions get lost in different wording, no coherent voice across snippets.
"Lost in the middle" — attention pattern
- LLMs weight content at start and end of context more than the middle.
- Place critical instructions in system message or near the end of the user message.
- For long aggregated inputs (many sub-agent outputs), front-load summaries; use explicit section headers to help navigation.
- Don't assume "it's in context, the model will use it."
Context-bloat sources, in order of typical magnitude
- Tool results — unbounded; trim at source.
- Conversation history — re-billed cumulatively each turn (quadratic).
- Tool definitions — constant overhead per turn.
When you see NOT / FALSE / LEAST / WEAKEST
- Identify the modifier. Underline it mentally: "I'm looking for the WRONG one / the EXCEPTION."
- For each option, ask: "Is this TRUE / CORRECT / APPROPRIATE / STRONG?" Not "is this the answer" — is the statement itself true.
- Mark each option silently: ✓ (true) or ✗ (false / wrong / exception).
- Pick the one with ✗. There should be exactly one.
If you don't end with exactly three ✓ and one ✗, you misread something — re-read. This is your highest-leverage habit on exam day.
| Distractor pattern | Why it's almost always wrong |
|---|---|
| "Switch to a bigger model" / "Use Opus" / "Upgrade to Sonnet" | Model choice rarely fixes an architectural problem. Test prompt → few-shot → tool use → validation → eval-opt FIRST. |
| "Multi-agent debate / collaborate / negotiate / discuss / collectively decide" | Peer-to-peer disguise. Anti-pattern. Right answer is usually hub-and-spoke or subagents-as-tools. |
| "Add 100 few-shot examples" | Past ~5–10, returns diminish. Real fix is structural (tool use schema, prefill, validation retry). |
| "Use a hook for [LLM-style judgment task]" | Hooks are deterministic shell. LLM judgment goes to a subagent. |
| "Add to CLAUDE.md: never do X" (for security / compliance / financial) | Soft enforcement. Necessary but never sufficient. Push to deterministic layer. |
| "Trust the model to follow this rule" (when the rule must be guaranteed) | Same as above. Probabilistic model + hard guarantee = wrong layer. |
| "Same lockdown for trusted as untrusted CI" | Untrusted needs read-only; trusted gets workflow-needed allow. Don't over-restrict trusted. |
| "More tools = more flexibility" | Vague flexibility = weakest justification. Tool count past ~25 degrades selection accuracy. |
| "Increase max iterations" (for runaway loop) | Treats symptom. The fix is bounded stop conditions + cost cap + observability. |
| "Continue retrying indefinitely" (transient failure) | Subagent-internal exponential backoff with bounded retries → propagate structured error. |
| "Increase the context window" (for hallucination / drift) | Bigger window doesn't fix lost-in-the-middle. Front-load summaries, scratchpad, summarize-and-spawn. |
| "Lower temperature to 0.0 for security analysis" | Temperature is per-agent-type. Synthesis needs higher; extraction needs lower; security analysis needs explicit criteria, not lower temp. |
| Scenario | Primary domains | Anchor concepts |
|---|---|---|
| 1. Customer Support Resolution Agent | D1 · D2 · D5 | Tools: get_customer, lookup_order, process_refund, escalate_to_human. 80%+ first-contact resolution target. Programmatic prerequisite gates · structured errors · escalation criteria (explicit + few-shot, NOT sentiment) · multi-issue decomposition · case-facts block. |
| 2. Code Generation with Claude Code | D3 · D5 | Custom slash commands · CLAUDE.md · plan mode vs direct execution · Explore subagent · --resume · fork_session · test-driven iteration · interview pattern. |
| 3. Multi-Agent Research System | D1 · D2 · D5 | Coordinator + web search + document analysis + synthesis + report agents. Hub-and-spoke · Task tool + allowedTools: ["Task"] · explicit context passing · task decomposition narrowness as a root cause · parallel Task calls · structured claim-source mappings. |
| 4. Developer Productivity with Claude | D2 · D3 · D1 | Built-in tools (Read, Write, Edit, Bash, Grep, Glob) · Grep for content · Glob for paths · Read+Write fallback when Edit fails · .claude/rules/ with paths: frontmatter · MCP server scoping. |
| 5. Claude Code for Continuous Integration | D3 · D4 | -p / --print · --output-format json + --json-schema · prior findings in context · CLAUDE.md for testing standards · independent review instance · per-file + cross-file passes for large PRs. |
| 6. Structured Data Extraction | D4 · D5 | tool_use with JSON schema · tool_choice options · nullable fields · "other" + detail pattern · validation-retry with specific feedback · Message Batches API for non-blocking · few-shot for varied formats · stratified sampling for review routing. |
Memorize
- 5 canonical workflow patterns + full agent + multi-agent variants (see Section 1 table).
- Decision ladder: prompt → workflow → agent (climb only as needed).
- 5 required stop conditions for every loop (Section 2).
- Multi-agent justification: need at least 1 of — context isolation, genuine parallelism, specialized toolsets. ~15× tokens of single-agent.
- Anti-pattern words in stems: collaborate / negotiate / debate / discuss / collectively decide → peer-to-peer.
- Pattern composition: PRIMARY = OUTERMOST coordination logic.
- Decomposition vs Orchestration: Decomposition = WHAT pieces. Orchestration = HOW they run.
- Task tool context passing: no shared memory between agents. The coordinator must inject context explicitly into Task tool prompts. No `shared_context` parameter exists.
- Subagent permissions: INTERSECTION of (declared tools) ∩ (parent's permissions). Can never expand.
- Per-agent-type temperature: extraction / classification → low (0.0–0.3). Synthesis / strategy → higher (0.6–0.9). Fragmented synthesis = often a temperature fix, not a decomposition fix.
- Failure-location diagnostic: per-task outputs good but fragmented synthesis → SYNTHESIS failure (not decomposition).
- Robust error propagation: retry locally with exponential backoff → propagate structured error (with errorCategory + isRetryable + suggestion) only after exhaustion. Coordinator never blind-retries when subagent already did.
- Silent failure anti-pattern: a subagent must NEVER return `{status: success, results: []}` on a real failure. Always return a typed error.
- Provenance: subagents must include `claim + source_url + document_name + relevant_excerpt` as structured output. Synthesis can't reconstruct provenance after the fact.
- Agentic loop control flow: continue iterating while `stop_reason == "tool_use"`, terminate when `stop_reason == "end_turn"`. NEVER parse natural-language phrases ("I have completed the task") for termination — that's a tell-tale anti-pattern.
- Tool result message structure: the assistant's `tool_use` block stays in conversation history. The tool result goes in a NEW user-role message containing a `tool_result` content block that references the `tool_use_id`. Send the FULL history on the next call.
- `Task` tool for subagent spawning — the coordinator's `allowedTools` must include `"Task"`. Spawn parallel subagents by emitting multiple Task calls in ONE coordinator response.
- Hooks for compliance enforcement: use PostToolUse hooks for data normalization (timestamp formats, status codes). Use tool-call interception hooks to block policy-violating actions (e.g., refunds > $500) and redirect to escalation. Hooks for compliance ≫ prompt-based "be careful" guidance.
- Programmatic prerequisite gates — block `process_refund` until `get_customer` has returned a verified customer ID. This is the canonical "code-level enforcement beats prompt-based ordering" example from the official sample exam (Q1); a sketch follows this list.
- Mandatory human override: an explicit user request to talk to a human is an unconditional exit from the automated flow, regardless of capability or sentiment.
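A minimal sketch of the prerequisite-gate bullet above: `process_refund` refuses to execute unless `get_customer` has already verified the customer in this session, and refunds over $500 escalate. The session-state shape and the CRM/billing clients are illustrative assumptions.

```python
class SessionState:
    def __init__(self):
        self.verified_customer_id: str | None = None

def get_customer(state: SessionState, customer_id: str, crm) -> dict:
    record = crm.fetch(customer_id)  # hypothetical CRM client
    if record and record.get("identity_verified"):
        state.verified_customer_id = customer_id
    return {"isError": False, "results": [record]}

def process_refund(state: SessionState, customer_id: str, amount: float, billing) -> dict:
    # Gate enforced in code, not in the prompt: no verified customer, no refund.
    if state.verified_customer_id != customer_id:
        return {"isError": True, "errorCategory": "business", "isRetryable": False,
                "userMessage": "Customer identity must be verified before a refund."}
    if amount > 500:
        return {"isError": True, "errorCategory": "business", "isRetryable": False,
                "userMessage": "Refunds over $500 require human approval.", "escalate": True}
    billing.refund(customer_id, amount)  # hypothetical billing client
    return {"isError": False, "results": [{"refunded": amount}]}
```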
Memorize
- Tool components: name · description · input schema (JSON Schema) · implementation.
- 5 rules for descriptions (Section 5). Most-tested: specify when NOT to use; explicit superiority statement when alternatives exist.
- Schema reliability primitives: `type` · `enum` · `required` · `pattern` · `minimum`/`maximum`.
- Read vs Write tools: reads are generally safe / auto-approvable; writes need permissions, idempotency, often HITL.
- Tool boundaries: ≤10 safe · 10–25 workable · 25–50+ refactor · 100+ wrong.
- 5 strategies to reduce surface area: consolidate · subagents with focused tool sets · hierarchical routing tools · dynamic loading per phase · description curation.
- Tool naming hygiene: cross-server collisions → namespacing (`slack__search`, `notion__search`).
- 3 MCP roles: Host (Claude Desktop / Code) · Client (component inside host) · Server (separate process).
- 3 MCP capability types + who decides (Section 5).
- Transports: stdio (local) · HTTP/SSE (remote) · Streamable HTTP. MCP is transport-flexible — encryption depends on transport.
- Tool result semantics:
  - Empty success → `isError: false` + `results: []`.
  - Genuine error → `isError: true` + structured metadata (`errorCategory`, `isRetryable`, `retries_attempted`, `description`).
  - Permission error → `errorCategory: "permission"` + `isRetryable: false` + escalation path in `userMessage`. Never `"transient"` or `"validation"`.
  - Business-rule violation → `errorCategory: "business"` with a customer-friendly `userMessage` for the agent to relay.
- Tool-use loop ordering (Section 5). The tool result MUST follow immediately in the next user-role message; `tool_use_id` linkage is what lets Claude map result → action.
- Transient failure handling: exponential backoff inside the tool wrapper; the agent sees only the final outcome. Don't propagate transient errors that the tool itself can recover from.
- Tool-result trimming: middleware should strip agent-irrelevant fields (raw DB timestamps, internal IDs, warehouse codes) before injecting tool results into context. Don't widen the context window; trim at the source (sketch after this list).
- Built-in tool selection (Claude Code): `Grep` for content search · `Glob` for filename pattern matching · `Read`/`Write` for full file ops · `Edit` for targeted edits via unique text match. When Edit fails (non-unique anchor), fall back to Read+Write.
- Tool count limits: 18+ tools per agent degrades selection reliability. Distribute across specialized subagents with 4–5 tools each. Provide scoped cross-role tools (e.g., `verify_fact`) for high-frequency needs while routing complex cases through the coordinator.
- Replace generic with constrained: swap `fetch_url` for `load_document` (validates document URLs only). Split `analyze_document` into `extract_data_points` + `summarize_content` + `verify_claim_against_source`.
- MCP resources vs tools — when to choose which: resources are for passive content catalogs the agent reads (schemas, issue summaries, doc hierarchies). Tools are for active actions the agent invokes (queries, side-effecting writes).
- Choose community MCP servers over custom for standard integrations (Jira, GitHub). Reserve custom servers for team-specific workflows.
- "Use this tool instead of X" superiority statement in the description is the highest-leverage fix when Claude ignores your tool in favor of writing custom scripts.
Memorize
- 4 mechanism types: CLAUDE.md (soft) · Slash command (user-invoked) · Subagent (LLM judgment) · Hook (deterministic shell).
- CLAUDE.md is SOFT. For hard guarantees, use hooks or permissions deny.
- CLAUDE.md hierarchy: 4 levels + `CLAUDE.local.md` (gitignored). Subdirectory CLAUDE.md loads when files in that subdir are read, not at session start.
- Merging: all applicable files are merged together. Conflict precedence: subdirectory > project > user > enterprise (enterprise often authoritative for security).
- Imports: `@path/to/file` syntax. Reorganizes but doesn't reduce context.
- Path-scoped rules: `.claude/rules/<name>.md` with `paths:` YAML frontmatter. Native context-bloat reduction.
- SKILL.md frontmatter: `allowed-tools:` = config-layer security boundary. Cannot be overridden by user prompt.
- Slash commands: `.claude/commands/<name>.md` (project) or `~/.claude/commands/<name>.md` (user). Project-level wins on name collision.
- Permission precedence: DENY ALWAYS WINS across all scopes. Specificity does not override.
- Subagent tools = INTERSECTION, never expand.
- Headless mode (`-p`): non-interactive. Anything that would prompt = denied. Pre-allow everything CI needs.
- Fail-closed hooks: crash → treat as block (sketch after this list).
- CI by trust level: untrusted = read-only allow; trusted = workflow-needed allow + deny destructive.
- Hook events: PreToolUse · PostToolUse · UserPromptSubmit · Stop · SessionStart · Notification.
- Workflow mode selection in Claude Code: Direct execution (fix known, scope clear) · Plan mode (requirements ambiguous) · Explore subagent (codebase unfamiliar). Match mode to task certainty.
- MCP scoping: `.mcp.json` (team-shared, committed) vs `~/.claude.json` (personal). Secrets via `${ENV_VAR}` expansion; never commit plaintext.
- Custom skills vs CLAUDE.md: "always-on" universal rules → CLAUDE.md. "Opt-in, occasional, specialized workflow" → `.claude/skills/<name>/SKILL.md` (invoked explicitly).
- `/memory` command — verifies which memory files are loaded. Use to diagnose "instructions not being applied" issues (likely cause: instructions are in user-scoped `~/.claude/CLAUDE.md`, not project-scoped).
- Named session resumption: `--resume <session-name>` — continue a specific investigation. Choose resume when prior context is mostly valid; start fresh with an injected summary when prior tool results are stale.
- Explore subagent — isolates verbose codebase-discovery output. Returns summaries to the main agent. Use during multi-phase exploration to preserve context budget.
- Skill frontmatter options: `context: fork` runs the skill in an isolated sub-agent context (output doesn't pollute the main session) · `allowed-tools:` restricts tool access during skill execution · `argument-hint` prompts developers for required params.
- Plan mode vs direct execution vs Explore subagent: Plan mode = ambiguous-approach design (multi-file migrations, architectural decisions). Direct execution = clear-scope single-file changes. Explore = unfamiliar codebase navigation.
- /compact: mid-session context cleanup — summarize-and-condense; retain findings, discard noise.
- fork_session: branch the session for divergent exploration.
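A sketch of a fail-closed PreToolUse hook script, illustrating the fail-closed bullet above. It assumes the documented hook convention that the pending tool call arrives as JSON on stdin and that a blocking exit code (2 for PreToolUse) stops the call — verify those protocol details against current Claude Code docs. The blocked-command policy is illustrative.

```python
#!/usr/bin/env python3
import json
import sys

BLOCKED_SUBSTRINGS = ("rm -rf", "git push --force")  # illustrative policy only

def main() -> int:
    event = json.load(sys.stdin)  # assumed: tool call details as JSON on stdin
    command = str(event.get("tool_input", {}).get("command", ""))
    if any(s in command for s in BLOCKED_SUBSTRINGS):
        print("Blocked by policy hook: destructive command.", file=sys.stderr)
        return 2  # assumed blocking exit code
    return 0      # allow

if __name__ == "__main__":
    try:
        sys.exit(main())
    except Exception as exc:
        # Fail CLOSED: a crash or malformed input blocks rather than allows.
        print(f"Hook error, blocking by default: {exc}", file=sys.stderr)
        sys.exit(2)
```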
Memorize
- Reliability hierarchy (climb in order): better prompt → few-shot → tool use / prefill → validation + bounded retry → evaluator-optimizer.
- 4 levers of a good prompt: clarity · role · structure (XML / markdown) · output specification.
- System vs user message: role + persistent behavior + format rules in system. Per-turn input in user.
- XML tags are the Anthropic-recommended structural delimiter. Claude is trained to weight tagged sections.
- Positive > negative instructions. "Keep responses under 100 words" beats "don't be verbose." Prohibitions still anchor the prohibited concept.
- Few-shot: 3–5 sweet spot. > 10 → consider fine-tuning. Principles: diverse · representative · cover edge cases. Format consistency across all examples. Recency bias: the last example has stronger format influence.
- 4 ways to get structured output (most → least reliable):
  - Tool use / function calling with JSON schema (API enforces).
  - Prefilling the assistant turn (e.g., `Assistant: {`).
  - JSON schema in instructions (relies on adherence).
  - Free-text "respond as JSON" — least reliable.
- Common output failure modes: markdown fence wrapping → prefill or tool use; preamble → prefill; trailing commentary → prefill / stop sequences; hallucinated fields → tool use schema; type errors → tool use enforces, otherwise validate.
- Two-part fix for variable-detail extraction: optional schema fields (`type: [string, null]`) + explicit prompt instruction "extract if present, return null if absent, never infer." Neither alone is sufficient.
- Validation retry — 3 principles: tell Claude WHY the previous output was rejected; bound retries (max 3 + cost cap); have a fallback path for final failure.
- Field-level validation feedback: include field name + wrong value + correct format + constraint violated. Never "data invalid, try again."
- Evaluator paradox: evaluator must see what generator misses. Same model + similar prompts = shared blind spots. Diversify with different family, different framing, or external rule-based check.
- Calibrated confidence thresholds: raw model confidence scores aren't probabilities — calibrate against labeled validation set before using as a routing threshold.
- `tool_choice` options: `"auto"` (model may skip), `"any"` (must call some tool — useful when document type is unknown), forced selection `{"type": "tool", "name": "X"}` (specific tool first, e.g., `extract_metadata` before enrichment steps).
- "Other" + detail string pattern for extensible enums — when the source has open-ended categories, define the enum with known values + a string field for "other" detail. Prevents the model from forcing every value into an existing category.
- Nullable schema fields for optional source content. When a field may be absent, mark it nullable so the model returns `null` instead of fabricating a value to satisfy a required field.
- Validation-retry feedback loop: on failure, send a follow-up with (original document) + (failed extraction) + (specific validation error). Use Pydantic for schema-level validation. Track which retries succeed (format mismatches) vs which won't (information truly absent from the source). A sketch of this loop follows the list.
- `detected_pattern` field in structured findings — tracks which code constructs trigger each finding. Use it to analyze false-positive patterns when developers dismiss results.
- Explicit categorical criteria beat "be conservative." Vague filters like "only report high-confidence" fail. Concrete rules like "flag SQL injection ONLY when an unsanitized variable from an external HTTP request reaches a query string" precisely shape behavior.
- Multi-pass review architectures: per-file local analysis pass + cross-file integration pass. Use a second independent Claude instance for review (the generator's session retains reasoning context that biases self-review).
- Self-correction validation patterns: extract `calculated_total` alongside `stated_total` to flag arithmetic discrepancies. Add `conflict_detected` booleans for inconsistent source data.
- Few-shot for behavioral override: when Claude has a default "be helpful by inferring" instinct and you need verbatim preservation, few-shot examples are the most direct override.
- Escalation calibration: for inverted-escalation behavior (escalating easy cases, attempting hard ones), the fix is explicit escalation criteria in system prompt + few-shot boundary examples — not sentiment models or capability classifiers.
- Aggregate accuracy is misleading. 97% overall can hide a 50%-error subsegment. Validate by segment; for safety-critical errors, route the segment to mandatory human review.
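A sketch of the validation-retry feedback loop from the list above: validate with Pydantic, feed field-level errors back, bound retries at 3, and fall back gracefully. The `Invoice` schema and the `extract` callable (an LLM extraction request) are illustrative.

```python
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    invoice_id: str
    total: float
    due_date: str | None = None  # nullable: return null rather than fabricate

MAX_RETRIES = 3

def extract_with_validation(document: str, extract) -> dict:
    feedback = ""
    for _ in range(1 + MAX_RETRIES):
        raw = extract(document, feedback)  # LLM extraction call returning a dict
        try:
            return {"status": "ok", "data": Invoice(**raw).model_dump()}
        except ValidationError as err:
            # Field-level feedback: field name + offending value + constraint violated —
            # never just "data invalid, try again."
            lines = []
            for e in err.errors():
                field = ".".join(map(str, e["loc"])) or "<root>"
                bad_value = raw.get(e["loc"][0]) if e["loc"] else raw
                lines.append(f"Field '{field}' got {bad_value!r}: {e['msg']}")
            feedback = "Previous extraction was rejected:\n" + "\n".join(lines)
    return {"status": "failed", "reason": "validation failed after bounded retries",
            "last_feedback": feedback}
```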
Memorize
- What's loaded every API call: system message · conversation history · all tool definitions · prior tool results · resources/attached files · current user message.
- Stateless across separate API calls. No memory between calls — memory is an architectural choice (resending history, external store, retrieval).
- Context-bloat ranking (Section 6): tool results > conversation history (quadratic) > tool definitions.
- KNOWLEDGE → RAG · HISTORY → Summarization (Section 6 alert card).
- Why RAG fails for conversations: implicit references ("that," "as I mentioned"), inconsistent retrieval across turns, decisions get lost in different wording, no coherent voice across snippets.
- Lost in the middle: place critical info at start/end; front-load summaries; explicit section headers.
- 5 long-context strategies (Section 6 table): compact at source · summarization · RAG · sliding window · subagent isolation.
- Mid-session pivot: `/compact`. Divergent exploration: `fork_session`. Multi-phase: summarize-and-spawn between phases.
- Scratchpad pattern for long agentic sessions: write key findings to a file at the end of each phase; a new phase starts by reading the scratchpad. Converts passive (buried in history) to active (fresh in context).
- Persistent case-facts block at the top of every prompt for transactional precision in long conversations.
- Confidence calibration:
  - Explicit labels: `{finding, confidence: high|medium|low, reasoning}`.
  - Route by confidence: high → auto; medium → secondary check; low → human review.
  - Multi-sample / consensus (~5× cost) — reserve for high-stakes.
  - Raw model scores are NOT probabilities. Calibrate against a labeled validation set.
- 4 handoff boundaries: turn-to-turn · agent → subagent · session-to-session · agent → human (HITL). Each needs structured handoff.
- Agent → subagent handoff: pass goal + scope + required inputs + output schema. Return structured result, not full reasoning trail.
- Agent → human handoff (HITL): structured package — what trying to do, what's been done, why escalating, recommended action. Treat human as the next sub-agent.
- Defense in depth stack: better prompt → schema (tool use) → validation + bounded retry → confidence calibration + routing → idempotent tools + HITL on high-stakes → stop conditions + budget caps → observability.
- Recovery patterns: bounded retry with backoff · circuit breakers · graceful degradation (fall back to historical averages, flag as estimated) · idempotency for writes.
- Monitoring: task completion rate · error rate by category · cost per task · latency p50/p95/p99 · tool call distribution · stop-reason distribution · HITL escalation rate.
- Stratified random sampling for measuring error rates in high-confidence extractions. Validate accuracy by document type and field segment before reducing human review — aggregate 97% can hide a 30% segment failure.
- Field-level confidence scores calibrated using a labeled validation set. Raw model self-reported scores are NOT probabilities until calibrated.
- Structured handoff package when escalating to human: customer ID, root cause analysis, refund amount, recommended action. Human agents may not have access to the conversation transcript.
- Crash recovery via state manifests — each agent exports state to a known location; the coordinator loads a manifest on resume and injects relevant state into agent prompts.
- Coverage annotations in synthesis outputs — distinguish well-supported findings from gaps caused by unavailable sources. Helps the reader know what to trust.
- The reliability principle: predictable failure beats unpredictable success. Don't chase 100% — build bounded behavior, graceful failure, observability.
- Aggregate accuracy can mask catastrophic segment failure. Slice by segment; route safety-critical segments to mandatory human review until validated.
- Position-aware input — full mitigation: "lost in the middle" reliably affects MIDDLE positions; the model processes BEGINNING and END well. Place key findings at BOTH ends (summary at top, restate at bottom). Use explicit section headers. Don't only front-load — bookend.
- Multi-concern decomposition pattern: when a customer message has multiple distinct issues, decompose into separate items, investigate each in parallel using shared context, then synthesize a unified resolution. NOT sequential one-at-a-time; NOT a single mega-prompt jamming all concerns together.
- Render-by-content-type in synthesis: financial data → tables; news/timeline → prose; technical findings → structured lists. Forcing uniform format (e.g., bullets everywhere) destroys readability and erases signal that the content type itself carried.
- Temporal data — include dates in structured output: require `publication_date` / `data_collection_date` in subagent outputs. Prevents temporal differences ("2023 had 18%; 2025 has 27%") from being mis-flagged as contradictions during synthesis. Apparent conflicts are often just a time series.
- Batch SLA math: the 24-hour batch processing window means you must submit on a tighter cadence to hit downstream SLAs. Example: to guarantee a 30-hour SLA with the batch API, submit every 4 hours (4h queue wait + 24h processing + 2h buffer ≤ 30h). Frame batch as scheduled work, not a real-time call.
- PostToolUse trim pattern: raw tool outputs are often verbose (40+ fields from an order lookup). Use a PostToolUse hook to filter to the 3-5 fields the agent actually needs BEFORE the result enters context. Prevents context bloat from compounding turn-over-turn.
- Claim-source mappings preserved through synthesis: subagents must output structured `{claim, source_url, source_doc, excerpt}` tuples. Summarization MUST preserve and merge these, never collapse to claim-only. Attribution loss is irreversible once it happens (see the sketch after this list).
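A sketch of provenance-preserving synthesis: group subagent findings by claim while keeping every `{claim, source_url, source_doc, excerpt}` tuple attached. Grouping by exact claim text is a simplification for illustration; the point is that no step collapses to claim-only.

```python
from collections import defaultdict

def merge_findings(subagent_outputs: list[list[dict]]) -> list[dict]:
    """Each subagent output is a list of {claim, source_url, source_doc, excerpt} dicts."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for findings in subagent_outputs:
        for f in findings:
            grouped[f["claim"]].append(
                {"source_url": f["source_url"],
                 "source_doc": f["source_doc"],
                 "excerpt": f["excerpt"]}
            )
    # Every synthesized claim carries its sources — attribution is never dropped.
    return [{"claim": claim, "sources": sources} for claim, sources in grouped.items()]
```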
| Habit | Why |
|---|---|
| Read each stem twice | Numbers (volume, latency, cost), trust qualifiers (trusted / untrusted), and absolute words (always / never / must) are decisive. Negative framing (NOT / LEAST / WEAKEST / EXCEPT) flips polarity — miss it once and you lose the question. |
| Run the negative-framing 4-step protocol | (1) Mark the polarity word. (2) Restate stem in your head: "Find the wrong / weakest one." (3) Label each option +/-. (4) Pick the only minus. Section 9 has this. |
| Flag length-as-correctness instinct | Length parity is enforced in this prep folder but the real exam sometimes has a long correct option AND long distractors. Decide on content, not word count. |
| Match the scenario to its anchor concepts | Section 11 maps each of the 6 official scenarios to the concepts it loves to test. The exam picks 4 of 6 at random — recognize which one you're in within the first sentence. |
| Eliminate the obvious anti-patterns first | Peer-to-peer subagents · parsing natural language for loop termination · sentiment-based escalation · single iteration cap as only stop · ignoring errors silently · uniform output formatting regardless of content type · selecting first match heuristically when multiple candidates exist. Section 10 has the full distractor flag list. |
| Pick the layer before the answer | Section 3 layer placement: prompt vs schema vs hooks vs code. If the option puts a deterministic guarantee in a prompt instruction, it's almost always wrong — hooks or code, not prompts. |
| Budget 2 minutes / question; flag and move | 120 minutes / 60 questions = 2 min/question. If a question takes > 2 min, flag it, pick your best guess, move on. Come back at the end. Never lose 3 questions arguing with 1. |
| Last 10 minutes: review flagged only | Resist the urge to re-read every question. Trust your first read for unflagged ones. Use the timer reserve for the items you flagged because you genuinely couldn't decide. |
| Watch for fabricated parameters | If an option references a flag or parameter that doesn't appear in Anthropic docs (e.g., --enable-multi-agent, tool_choice: "strict"), it's a distractor. The exam doesn't invent fake APIs but bad options sometimes do. |
| Match what you've actually built | If a scenario describes something close to a system you've shipped, trust that intuition — Anthropic's exam questions reward production experience over textbook patterns. |
The two hardest exam mindset shifts
- You're being tested on "what's the BEST answer," not "what works." Multiple options may technically work. The exam wants you to pick the one that matches the scenario's actual constraints (volume, latency, team size, regulatory) — not the most powerful or sophisticated answer.
- The exam rewards humility about model autonomy. Wherever a question pits "trust the prompt" vs "enforce in code/hooks/schema," the answer leans toward deterministic enforcement. Prompt-based guidance has a non-zero failure rate — that's the recurring theme.
Built by John for the CCA-F exam · Companion files: stem-decoder · references