Claude Mastery Hub — Extract Maximum Power from Claude

🚀

SECTION 01

Why Master Claude

One model family, three surfaces, infinite leverage

Mastered

⚡ WHY IT MATTERSLearn one mental model and it transfers everywhere — chat, terminal, and your own apps all run on the same Claude brains.

A central Claude models core (Fable 5 / Opus / Sonnet / Haiku) powering three surfaces that fan out: the Claude.ai app, Claude Code, and the API & Agent SDK.

🎬 Animation, explained

The pulsing core is the model family — Fable 5, Opus, Sonnet, and Haiku — the one brain behind everything.
The three animated beams fan out to the three surfaces you'll use it from: the Claude.ai app, Claude Code in your terminal, and the raw API / Agent SDK.
The dots travel outward from the core, not between surfaces — each surface is a different doorway into the same intelligence, not a different AI.
Takeaway: skills you build on one surface (prompting, context, tool thinking) transfer to all the others for free.

Claude is one family of models you reach three ways — a chat app, a coding agent, and an API — and mastery is just knowing which one to grab.

Most people only ever open the chat box and stop there. The real wins come from picking the right surface for the job, then giving Claude the context it needs to nail it on the first try.

This hub builds you up in four layers: Foundations Core Power Features Mastery. Start here, then go in order or jump to what you need.

🆕 Recently updated

Jul 28, 2026 — Claude Opus 5 currency pass: current lineup, centralized model facts, migration guidance, and provenance-labeled benchmark evidence.
Jul 2026 — Mid-2026 Claude Code currency pass: Claude in Chrome (GA), Sonnet 5 as the new default, background-by-default subagents, experimental agent teams, and /doctor auto-fix.
Jul 2026 — New Practice Lab: 12 hands-on scenario challenges — type the command or pick the move, get instant feedback, hints, and solution walk-throughs with the why.
Jul 2026 — Self-test quizzes on every teachable section (100+ questions), upgraded full-text search, deep-link heading anchors, and a print-friendly stylesheet.
Jul 2026 — Learning engine: per-section quizzes, a scored 20-question final exam with ranks, and a spaced-repetition review nudge.
Jul 2026 — Mid-2026 currency pass: dynamic workflows & ultracode, fast mode, background agents, and Claude Cowork (web + mobile).
Jun 2026 — Claude Sonnet 5 & Fable 5 across the model tables, “Animation, explained” panels on all 24 diagrams, plus How-To Recipes and CLI Hacks.

The three surfaces

Each surface reaches the same Claude model family, although model and plan availability varies by product. The interface is tuned for a different kind of work.

Web & Desktop App

claude.ai and the desktop app. Best for chat, research, writing, and uploading files. Has Projects, Artifacts, and web search.

Claude Code

An agent in your terminal or IDE. Best for real coding — it reads, edits, and runs files in your repo, runs tests, and makes commits.

API

Build Claude into your own apps and scripts. Best for automation, pipelines, and products. Full control over model, prompts, and caching.

Surface	Reach it via	Reach for it when…
Web / Desktop	claude.ai or the app	You want to think, research, write, or analyze files in a chat.
Claude Code	`claude` in a terminal	You want code changed, tests run, or a repo refactored.
API	Anthropic SDK / HTTP	You want Claude inside your own software, on a schedule or at scale.

The 60-second "you can do X" tour

A taste of what each surface unlocks. The rest of the hub goes deep on every one of these.

In the web app

Drop in a messy spreadsheet and get back a clean chart, or have Claude build a working mini-app you can share.

Analyze a file

Upload a CSV or PDF and ask “what are the three biggest trends here?” Claude reads it and answers with citations.

Make an Artifact

“Build a tip calculator as an app.” Claude renders a live, usable tool in a side window you can keep and share.

Use a Project

Upload your style guide once; every chat in that Project follows it automatically.

In Claude Code

Ask in plain English and Claude edits your actual files. Start interactive, or run it headless in a script.

# Interactive: open the agent in your repo
claude

# One-shot: ask a question and exit (headless)
claude -p "Find and fix the off-by-one bug in pagination.py"

# Pipe a file in and let Claude explain it
cat error.log | claude -p "What caused this crash?"

Press Shift+Tab to enter plan mode. Claude can only read and think — not edit — so you approve the plan before any code changes. It is the single highest-leverage habit in Claude Code.

With the API

A few lines wires Claude into anything you build. Pick a model by name and send a message.

from anthropic import Anthropic

client = Anthropic()
msg = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
)
print(msg.content[0].text)

Pick the right model

All three surfaces let you choose a model. The rule of thumb: start cheap and fast, move up only when you hit a wall.

Model	Best for	Speed	Price (in / out per MTok)

In Claude Code, switch any time with /model sonnet, /model opus, or /model haiku. As of July 28, 2026, Opus 5 is the default on Claude Max and the strongest model on Claude Pro; product defaults and plan access can change, so check the model picker before relying on a default.

The mindset that makes Claude great

The difference between a frustrating session and a magical one is rarely the model — it is how you talk to it. Four habits carry almost all the weight.

Be specific

Name files, state constraints, point to an example. “Claude can infer intent, but it can’t read your mind.” Precise instructions mean fewer corrections.

Give context

Share the goal, the relevant code, and what “done” looks like. Good context up front beats ten rounds of clarification.

Iterate

Treat the first answer as a draft. Steer, refine, and rewind. If two corrections fail, start fresh rather than piling on.

Delegate

Hand off broad research or a self-contained task to a subagent. It works in its own context and returns just the summary you need.

One thing governs everything: “Claude’s context window fills up fast, and performance degrades as it fills.” Keep each session focused on one job — fresh context is free, and a clean start beats a cluttered one.

Best workflow in three moves: explore → plan → execute. Let Claude read the code, agree on a plan in plan mode, then turn it loose. One clear scope per session keeps quality high and cost low.

How this hub is organized

Foundations — the surfaces, models, and core mindset (you are here).
Core — everyday moves: prompting well, Projects, slash commands, managing context.
Power Features — subagents, skills, MCP servers, hooks, and plan mode.
Mastery — automation, plugins, prompt caching, and multi-agent orchestration.

Work through it in order for a clean ramp, or jump straight to the layer that matches what you need today.

🧠

SECTION 02

Pick the Right Brain

Match the model to the job — pay for power only when it earns its keep

Mastered

⚡ WHY IT MATTERSRouting each task to the cheapest model that can do it well turns cost and speed into a design choice, not an accident.

A task flows through a complexity-and-cost router that sends it down one of four lanes: Haiku for fast cheap high-volume work, Sonnet for everyday coding, Opus for deep reasoning, or Fable 5 for the hardest, most capable work.

🎬 Animation, explained

The purple dot entering from the left is one incoming task — a bug fix, a question, a refactor.
The pulsing box is the routing decision: how hard is this, and what's the cheapest model that clears the bar?
The four colored lanes are the four tiers, ordered cheap-and-fast (green, Haiku) to most-capable-and-costly (gold, Fable 5). The staggered dots show different tasks taking different lanes.
Takeaway: routing is a per-task choice you make constantly, not a one-time setting — the bottom caption's rule ("smallest model that clears the bar") is the whole game.

Four Claude tiers, one rule: cheap for simple, balanced for daily work, deepest reasoning when it pays off — and Fable 5 when the job is genuinely hard.

As verified July 28, 2026, Fable 5 is Anthropic's highest-capability widely released model. Opus 5 is the latest Opus model for complex agentic coding and enterprise work. Sonnet 5 balances speed and intelligence for most production workloads, while Haiku 4.5 is the fastest current model for lightweight, high-volume work. Mythos 5 is invitation-only for approved Project Glasswing customers, so it is specialized rather than a normal general-purpose recommendation.

🆕 Released July 24, 2026: Claude Opus 5 (claude-opus-5) brings a 1M context window, 128k output, thinking on by default, and stronger effort scaling at the same $5 / $25 base price as Opus 4.8. See the Opus 5 guide →

Meet the lineup

Fable 5

Highest-capability widely released Claude for demanding reasoning and long-running agents. It costs twice as much as Opus 5, so validate that the extra capability matters on your workload.

Opus 5

The latest Opus, built for complex agentic coding, enterprise work, large refactors, and deep analysis. It approaches Fable-class intelligence at half Fable's base price.

Sonnet 5

The balance tier for production coding, analysis, content, and tool use. It is faster and cheaper than Opus 5, with adaptive thinking and a full effort ladder.

Haiku 4.5

The fastest current lightweight model. Use it for real-time apps, bulk processing, simple transformations, and cost-sensitive sub-agent tasks.

Mythos 5

Invitation-only through Project Glasswing for approved defensive cybersecurity workflows. It is not a self-serve general-purpose tier.

Side-by-side comparison

Model	API id	Speed	Price (in / out per MTok)	Context	Max output	Thinking

Prices verified July 28, 2026. Sonnet 5's $2 / $10 introductory price ends August 31, 2026; its published standard price is $3 / $15.

Thinking works differently per tier: Fable 5 thinks always. Opus 5 and Sonnet 5 use adaptive thinking by default and respond to the effort setting. Haiku 4.5 supports the classic extended-thinking control. On Opus 5, disabling thinking is limited to high effort or below.

Sonnet 5 uses a new tokenizer — the same text costs roughly 30% more tokens than on Sonnet 4.6 (per-token price is unchanged, so re-baseline any token budgets, max_tokens limits, and cost dashboards rather than reusing 4.6 numbers).

The routing cheat sheet

When in doubt, follow this. Start cheap and upgrade only when you hit a real capability gap.

Your task	Use	Why
Bulk edits, log lines, formatting, simple Q&A, sub-agents	Haiku 4.5	Fastest and 5x cheaper than Opus
Main development: code gen, data analysis, tool use	Sonnet 5	Near-Opus coding quality at Sonnet price
Architecture, multi-file debugging, big refactors, deep research	Opus 5	Latest Opus; strong agentic execution and effort scaling
The genuinely hardest work: huge migrations, long autonomous runs, frontier analysis	Fable 5	Highest-capability widely released tier; premium price

Cheap tasks → Haiku 4.5. Main production work → Sonnet 5. Complex agentic and enterprise work → Opus 5. Highest-capability widely released work → Fable 5. “Best” depends on capability, availability, cost, latency, task type, effort, product plan, and platform.

How to switch models

In Claude Code

Use /model inside a session. Run it bare to open a picker, or name a model to switch and save it as your default.

/model                    # open interactive picker
/model opus               # switch to Opus and save as default
/model claude-opus-5      # pin the current Opus API model id
/model claude-sonnet-5    # switch to Sonnet 5 by full id
/model haiku              # switch to Haiku
/model claude-fable-5     # the premium flagship, for the hardest tasks
/model default            # recommended default; may auto-route Opus for hard tasks

Press s in the picker to switch for the current session only without changing your saved default.

At launch from the CLI

The --model flag sets the session model and overrides both the model setting and the ANTHROPIC_MODEL env var. Add --fallback-model for automatic failover — the fallbackModel setting now takes up to three models, tried in order when the primary is overloaded or unavailable.

claude --model claude-sonnet-5
claude --model claude-opus-5 --fallback-model sonnet
claude -p "summarize these logs" --model haiku   # headless, cheap

In the Claude.ai app

Use the model picker near the chat input to choose Opus, Sonnet, or Haiku per conversation. Switching is per-chat and instant.

Big context windows (1M tokens)

Fable 5, Opus 5, Opus 4.6/4.7/4.8, Sonnet 5, and Sonnet 4.6 carry a 1M-token context window; Haiku 4.5 has 200k. Opus 5's 1M window is both the default and maximum—there is no smaller context variant.

A huge window is not a license to dump everything in. Performance still degrades as context fills, so read only the file ranges you need and keep prompts tight even on 1M-token models.

Squeeze more value per dollar

Two built-in levers cut cost hard. Prompt caching is automatic in Claude Code — cache reads are 90% cheaper than fresh input. The Batch API gives a flat 50% off when latency does not matter.

Model	Cache hit (read)	Batch (in / out)

In Claude Code, front-load context in plan mode so it caches once. Avoid switching models mid-session — a model swap invalidates the cache and forces a slow, expensive uncached turn.

Route per sub-agent, not just per session

You don't have to pick one model for everything. Give each sub-agent its own model so cheap workers stay on Haiku while the main thread reasons on Opus or Sonnet.

# .claude/agents/researcher.md frontmatter
---
name: researcher
description: Searches the codebase, use proactively
model: haiku
tools: Read, Grep, Glob
---

The model field accepts an alias (sonnet, opus, haiku), a full id, or inherit (the default). For one-off overrides, set the CLAUDE_CODE_SUBAGENT_MODEL env var, which wins over frontmatter.

Heads-up on dates

Knowledge cutoffs differ. Opus 5's reliable knowledge and training-data cutoffs are May 2026; Fable 5 and Sonnet 5 are reliable through Jan 2026; Haiku 4.5 is reliable through Feb 2025, with a Jul 2025 training-data cutoff. A cutoff is not a live-data guarantee—use current sources or web tools for newer facts.

⚙️

SECTION 03

Install & First Run

From zero to your first prompt in five minutes

Mastered

Claude Code is a terminal tool that turns your codebase into a workspace you can talk to.

This section gets you installed, signed in, and running your first session safely. You only do the install and login once; after that you just type claude in any project.

Step 1 — Install the CLI

Claude Code runs from your terminal. The fastest way to install it is the official script, which works on macOS and Linux.

# macOS / Linux
curl -fsSL https://claude.com/install.sh | bash

# Or via npm (any OS with Node 18+)
npm install -g @anthropic-ai/claude-code

Confirm it worked with claude --version. If the command is not found, restart your terminal so it picks up the new PATH.

Step 2 — Sign in

The first time you launch Claude Code it walks you through login in your browser. You can sign in with a Claude subscription (Pro or Max) or an Anthropic Console API account.

# Start Claude, then follow the browser prompt
claude

# Or trigger login directly any time
/login

Stuck? /doctor is now a full setup checkup that can diagnose and fix issues (/checkup is its alias); /status shows which account you're signed in as. To rule out your own config, launch with --safe-mode (or CLAUDE_CODE_SAFE_MODE), which starts Claude Code with all customizations — CLAUDE.md, plugins, skills, hooks, MCP servers — disabled.

Step 3 — Run Claude in a project

Always cd into a real project folder before starting. Claude reads files relative to where you launch it, so the folder you start in becomes its working directory.

Open a terminal in your project root.
Type claude and press Enter.
Type a question in plain English and press Enter.

cd ~/Projects/my-app
claude

Three ways to start

Interactive

claude opens the live REPL where you chat back and forth. This is the normal way to work.

With a first prompt

claude "explain this repo" opens the REPL but starts working on that prompt right away.

One-shot (headless)

claude -p "list the API routes" prints one answer and exits. Great for scripts.

# Pipe a file in and get a quick answer back
cat error.log | claude -p "explain this stack trace"

The interactive REPL

The REPL is the prompt box where you type. Anything that does not start with / is a message to Claude; anything starting with / is a built-in command. Try these first.

Type this	What it does
`/help`	Lists every slash command and basic usage.
`/status`	Shows your account, model, and session info.
`/clear`	Wipes the conversation to free up context.
`/cost`	Shows token usage for the session.

Press Esc to stop Claude mid-action without losing your place. This is your panic button when it heads the wrong way.

Step 4 — Generate CLAUDE.md with /init

Run /init once per project. It scans your repo and writes a CLAUDE.md file describing the structure, build and test commands, and conventions. Claude reads this file at the start of every future session, so it remembers how your project works.

/init

After /init, open CLAUDE.md and trim it. For each line ask: would removing this make Claude make a mistake? A short, sharp file beats a bloated one that gets ignored.

You can edit memory any time with /memory, or append a quick note by starting a line with # in the prompt box.

Trust and permissions (read this)

Claude Code asks before it edits files or runs shell commands. The first time you open a new folder it also asks you to confirm you trust that directory. This is your safety net, so do not blindly approve everything.

Be extra careful approving commands that delete files, push to git, or hit the network. When a prompt appears, read the actual command before choosing allow.

Permission modes

Press Shift+Tab to cycle modes during a session, or start in one with a flag. plan mode is the safest place to begin: Claude can read and think, but cannot change anything.

Mode	Behavior
default	Asks before edits and commands.
plan	Read-only. Explores and proposes a plan, makes no changes.
acceptEdits	Auto-approves file writes; still asks for risky shell commands.

# Start a session in plan mode
claude --permission-mode plan

Avoid --dangerously-skip-permissions while learning. It skips every prompt, including destructive ones. Use it only in throwaway sandboxes you do not care about.

To see or change which tools are auto-allowed, run /permissions. Approved rules are saved so you are not asked the same thing twice.

How to exit

When you are done, leave the REPL cleanly. Your conversation is saved so you can pick it back up later.

# Inside the REPL, any of these exit:
/exit
# or press Ctrl+D
# or press Ctrl+C twice

Next session, run claude --continue (or -c) to resume the most recent conversation in that folder, or claude --resume to pick from a list.

Your first five minutes, recap

Install with the script, verify with claude --version.
cd into a project and run claude, then /login if prompted.
Run /init to create CLAUDE.md, then prune it.
Start in Shift+Tab plan mode and read each permission prompt.
Exit with /exit; resume later with claude -c.

⌨️

SECTION 04

Terminal Commands & Flags

Drive Claude from your terminal: interactive, headless, and scriptable

Mastered

The claude command is two tools in one: a live coding partner and a scriptable engine you can pipe, loop, and bake into CI.

Run it plain for a chat session, or add -p to get one answer and exit. Everything in between is flags that tell Claude which model, which session, and how much it's allowed to touch.

Three ways to start

Interactive REPL

Type claude alone for a full session, or claude "query" to open with a first prompt already sent.

Headless

claude -p "query" prints the answer and exits. The -p (alias --print) flag is the gateway to all scripting.

Piped input

Feed any file or command output to Claude over stdin, then ask a question about it.

# Interactive, with an opening prompt
claude "Walk me through the auth module"

# Headless one-shot: print and exit
claude -p "List the top 3 risks in this repo"

# Pipe content in, ask about it
cat error.log | claude -p "Explain the root cause in 2 sentences"

All other flags work alongside -p — including --continue, --resume, --model, --permission-mode, and --output-format.

Resuming work: continue vs resume

When: you closed the terminal but want to pick up where you left off. --continue grabs the latest; --resume lets you choose.

# Reopen the most recent conversation in this directory
claude --continue          # alias: claude -c

# Scripted continuation (combine with -p)
claude -c -p "Now check for type errors"

# Pick a specific past session by name or ID
claude --resume auth-refactor       # alias: claude -r
claude -r abc123 "Pick up the review"

# No argument = interactive picker of past sessions
claude --resume

Name a session up front with --name / -n so it's easy to find later: claude -n auth-refactor. As of v2.1.144, background sessions show in the picker tagged bg.

Picking a model

Why: match the model to the job — Haiku for speed, Sonnet for most work, Opus for the hardest reasoning. --model overrides both the model setting and the ANTHROPIC_MODEL env var.

# Use an alias
claude --model opus
claude --model sonnet

# Or a full model id
claude --model claude-sonnet-5

# Add a fallback if the primary is unavailable
claude --model opus --fallback-model sonnet

Permission modes: how much can it touch?

When: you want Claude to plan without editing, auto-accept edits, or lock things down for automation. --permission-mode overrides the defaultMode from your settings.

# Read, search, and think only — no edits, no destructive commands
claude --permission-mode plan

# Auto-approve file writes + mkdir/touch/mv/cp
claude --permission-mode acceptEdits

# Locked-down automation: deny anything not in your allow rules
claude -p --permission-mode dontAsk "Run the test suite"

Accepted values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions. Reach for --allowedTools to auto-approve only specific tools instead of opening everything.

Adding extra working directories

Why: let Claude read and edit files outside the current folder (a sibling app, a shared lib). Each path is validated as an existing directory.

claude --add-dir ../apps ../lib

--add-dir grants file access but most .claude/ config is NOT discovered from added dirs. To persist them across sessions, set permissions.additionalDirectories in your settings file. In-session, /add-dir <path> does the same.

One-shot scripting that survives turns

How: use --output-format json to capture a session_id, then feed it back with --resume. This is the backbone of multi-step automation.

# Capture the session id from a headless run
session_id=$(claude -p "Start a code review" \
  --output-format json | jq -r '.session_id')

# Continue that exact session in a later step
claude -p "Now summarize the findings" --resume "$session_id"

For reproducible CI runs, add --bare. It skips auto-discovery of hooks, skills, plugins, MCP servers, and CLAUDE.md, giving Claude only Bash + file read/edit. Load context explicitly via --append-system-prompt, --settings, --mcp-config, or --agents. It's recommended for scripted/SDK calls and is slated to become the default for -p.

The flags that matter most

Flag	Alias	What it does
`--print`	`-p`	Headless: run, print the answer, exit. Enables piping and scripting.
`--continue`	`-c`	Resume the most recent conversation in this directory.
`--resume`	`-r`	Resume a session by ID/name, or open a picker with no argument.
`--model`	—	Set the model by alias (`sonnet`, `opus`) or full id.
`--fallback-model`	—	Model to use if the primary is unavailable.
`--permission-mode`	—	Start in `plan`, `acceptEdits`, `dontAsk`, etc.
`--add-dir`	—	Add extra working directories for read/edit.
`--output-format`	—	`json` exposes the `session_id` for chaining steps.
`--name`	`-n`	Label a session so you can `--resume` it by name.
`--allowedTools`	—	Auto-approve specific tools without prompting.
`--bare`	—	Skip auto-discovery for fast, reproducible scripted runs.
`--fork-session`	—	Resume but write to a new session ID instead of reusing it.

Two flags to handle with care

# Skip ALL permission prompts (use with caution)
claude --dangerously-skip-permissions

# Safer: start in plan mode, but make bypass available later
# via the Shift+Tab mode cycle
claude --permission-mode plan --allow-dangerously-skip-permissions

--dangerously-skip-permissions removes every guardrail. The --allow-dangerously-skip-permissions variant only adds bypassPermissions to the Shift+Tab cycle — you still start in a safe mode and switch deliberately.

Monitoring parallel agents

claude agents opens a view to monitor and dispatch background sessions; claude agents --json prints live sessions as a JSON array for scripting. It shares --model, --permission-mode, --add-dir, and --mcp-config with the top-level command.

Mental model: -p for "answer and leave," -c for "keep going," --resume for "go back to that one," and --output-format json + --resume for everything stitched together in a script.

⏩

SECTION 05

Slash Commands

Type / to drive the REPL — and build your own commands

Mastered

Slash commands are the steering wheel of the Claude Code REPL: type / and a menu of built-in controls and your own custom commands appears.

Built-in commands manage your session, context, models, and tools. Custom commands are just Markdown files you drop in a folder. Master both and you stop fighting the tool and start flying it.

Run /help any time to list every available command, including ones this guide does not cover. Press / at the prompt to fuzzy-search them inline.

Context control: /clear vs /compact

This is the single most important pair. Claude's context window fills fast, and quality drops as it fills. These two commands are how you fight back.

/clear

Wipes the whole conversation and frees all context. Fresh context is free. Use it between unrelated tasks.

/compact

Replaces history with a structured summary, keeping key facts. Use mid-task when you still need what came before.

Situation	Use
Finished a task, starting a new unrelated one	`/clear`
Mid-feature, context near full, still need the thread	`/compact`
Stuck after two failed fixes on the same bug	`/clear`, then re-prompt

/compact takes optional focus instructions so the summary keeps what matters.

/compact focus on the API changes and the failing test

Auto-compaction fires automatically near ~95% capacity. Don't wait for it — compact proactively at logical breakpoints (50-75% is a good range) so you control what survives.

Both /compact and switching models invalidate the prompt cache, forcing a slow, expensive uncached turn. Plan when you trigger them.

Pick the right model: /model

Switch the active model without restarting. Run it bare to open a picker, or pass an alias.

/model opus      # deepest reasoning, hard refactors
/model sonnet    # best all-round speed + intelligence
/model haiku     # fastest, cheap, simple tasks
/model default   # recommended default; may auto-route Opus on hard work

A practical rule: use Haiku 4.5 for lightweight volume, Sonnet 5 for most production work, Opus 5 for complex agentic coding and enterprise work, and Fable 5 when the highest widely released capability justifies its premium.

Session lifecycle: /resume and friends

Your conversations are saved. Pick one back up instead of starting cold.

/resume          # lists past conversations; pick one to continue
/resume auth-fix # resume a session you named earlier

From the shell, the same idea works via flags: claude --continue (alias -c) reopens the most recent conversation, and claude --resume <id-or-name> targets a specific one.

# Scriptable continuation
claude -c -p "Check for type errors"

Inspect spend and tools

/cost

Shows token usage and estimated cost for the session. On Pro/Max plans it reflects usage, not billed dollars.

/mcp

Lists configured MCP servers, connection status, and tools — and is where you authenticate (OAuth) to remote servers.

Project memory: /init, /memory, /config

These shape how Claude understands your repo and behaves across sessions.

Command	What it does
`/init`	Scans the repo and generates a starter `CLAUDE.md` with structure, build/test commands, and conventions.
`/memory`	Opens your `CLAUDE.md` memory files in your editor for direct editing.
`/config`	Opens the settings interface for theme and toggles; values persist in `settings.json`.

Memory files load hierarchically: enterprise, then user (~/.claude/CLAUDE.md), then project (./CLAUDE.md), then local. Type # at the start of a prompt to quickly append a memory line.

Prune CLAUDE.md ruthlessly. For each line ask: would removing this cause Claude to make a mistake? A bloated memory file gets ignored because rules get lost in the noise.

Quality and editing helpers

/review

Requests a code review. /review <PR-number> reviews a specific pull request. /security-review focuses on security of current changes.

/agents

Opens an interactive manager to create, edit, and view subagents stored as Markdown in .claude/agents/.

/vim

Enables Vim-style modal editing for the input prompt — NORMAL/INSERT modes with h j k l and common motions.

Custom slash commands

Here is where power users pull ahead. A custom command is just a Markdown file. The filename (minus .md) becomes the command name. The body is the prompt Claude runs.

Location	Scope	Example file
`.claude/commands/`	Project (shared via git)	`optimize.md` → `/optimize`
`~/.claude/commands/`	User (all your projects)	`standup.md` → `/standup`

Pass arguments with $ARGUMENTS (all args) or $1, $2 for positional ones. Here is .claude/commands/fix-issue.md:

---
description: Fix a GitHub issue by number
argument-hint: <issue-number>
model: sonnet
allowed-tools: Bash(gh issue view:*), Read, Edit
---
Fix issue #$1. First read it:

!gh issue view $1

Then locate the bug, apply a fix, and explain your change.

Invoke it with the argument inline:

/fix-issue 482

Inside a command body: lines starting with ! run bash and inject the output (requires allowed-tools), and @path/to/file injects that file's contents into the prompt.

Frontmatter fields you can set: description, argument-hint, model, and allowed-tools to restrict which tools the command may use.

Namespacing

Subdirectories namespace commands for organization. A file at .claude/commands/frontend/component.md is still invoked as /component but shows a project:frontend label.

.claude/commands/
  optimize.md            -> /optimize        (project)
  frontend/component.md  -> /component        (project:frontend)
~/.claude/commands/
  standup.md             -> /standup          (user)

Custom commands have merged into Skills, so a folder like .claude/skills/deploy-staging/SKILL.md also gives you /deploy-staging. Use a single .md for a one-shot prompt; use a Skill folder when you need bundled scripts and reference files.

✨

SECTION 06

Shortcuts, @files & ! bash

Pull files in, push commands out, and steer without breaking flow

Mastered

The REPL is a cockpit: @ pulls files into context, ! pushes shell output back, and a handful of keys let you steer mid-task without losing your place.

Three in-prompt prefixes do most of the heavy lifting. @ references a file or directory as context, ! runs a shell command and feeds its output into the chat, and # at the very start of a prompt appends a line to memory. Learn these and you stop copy-pasting forever.

@ — reference files

Type @ and a path to load that file or folder as context. Tab-completes paths.

! — run bash inline

Start a line with ! to run a shell command; its output joins the conversation.

# — quick memory

Start a prompt with # to append a note to a CLAUDE.md memory file on the fly.

@ — reference files and folders as context

The @ prefix tells Claude exactly which files to read instead of making it search. Be specific: pointing at the right files is the single biggest way to cut corrections. It tab-completes, so start typing the path and press Tab.

> Refactor the auth flow in @src/auth/login.ts and keep it consistent with @src/auth/session.ts

> Summarize what changed across @docs/changelog.md

> Use the patterns in @examples/ when you write the new handler

Reference a directory like @src/handlers/ to give Claude the file listing, then let it open only what it needs. Cheaper than dumping every file.

In custom slash commands, @path in the command body injects that file's contents into the prompt — the same syntax works inside .claude/commands/*.md.

! — run a shell command inline

Prefix any line with ! to execute it in your shell and drop the output straight into the chat. When: you want Claude to see real state — a failing test, git status, a directory listing — without describing it yourself. Why: the output becomes context Claude can act on immediately.

> !git status

> !npm test 2>&1 | tail -40

> !ls -la src/ && cat package.json

A common combo: run a command with !, then ask Claude to fix what it sees.

> !pytest -x
> The first failure above is in test_kelly.py. Fix the implementation, not the test.

The ! command runs in your real shell with your permissions. Treat it like a terminal — don't pipe in anything you wouldn't run yourself.

Drag, drop & paste images

Claude Code reads images. Paste a screenshot or drag an image file into the terminal and ask about it — great for UI bugs, error dialogs, diagrams, or design mockups.

Action	How
Paste from clipboard	`Ctrl`+`V` (use this, not `Cmd`+`V`, on macOS)
Drag a file in	Drag the image onto the terminal window
Reference a saved image	`@screenshots/error.png`

> Here's the broken layout [paste screenshot]. The sidebar overflows on mobile — fix the CSS in @src/styles/sidebar.css

Keyboard shortcuts that matter

The keys below are the difference between fighting the tool and flowing with it. The two most valuable are Esc (stop without losing context) and Shift+Tab (cycle permission modes).

Keys	What it does
`Esc`	Stop Claude mid-action; context is preserved so you can redirect
`Esc` `Esc`	Open the rewind menu (same as `/rewind`) to restore an earlier state
`Shift`+`Tab`	Cycle permission modes: normal → plan → auto-accept
`Ctrl`+`B`	Background a running task so you can keep working
`Ctrl`+`C`	Cancel current input / interrupt
`Ctrl`+`O`	Toggle verbose mode (see thinking output)
`Up` / `Down`	Scroll through your prompt history
`Option`+`T` / `Alt`+`T`	Toggle extended thinking on or off

Multi-line input

To write a prompt that spans several lines, use one of these:

Method	Keys
Quick newline	`\` then `Enter` (backslash continuation)
Standard newline	`Option`+`Enter` (macOS) / `Shift`+`Enter`
One-time setup	Run `/terminal-setup` to bind `Shift`+`Enter` reliably

Prefer Vim keys? Run /vim to get NORMAL/INSERT modes with h j k l navigation in the prompt editor.

Steering mid-task

You don't have to wait for Claude to finish a bad path. Steering early saves context and time.

Hit Esc the moment it heads the wrong way — it stops with context intact.
Type a correction and submit; Claude resumes with your new direction.
If the whole attempt was wrong, press Esc Esc and rewind — this erases the failed attempt from context entirely.

Rewinding a dead-end beats correcting it. A correction leaves the failed approach in context where it can confuse later turns; a rewind removes it cleanly.

Use /btw to ask a quick side question in a dismissible overlay — the answer doesn't pollute your main conversation history.

Putting it together

A realistic loop using all three prefixes plus a key: gather state with !, scope the fix with @, plan it, then execute.

> !git diff --stat
> Review the changes in @src/api/routes.py against @CLAUDE.md conventions.
>   [Shift+Tab to enter plan mode first, then let it execute]
> !curl -s localhost:8000/health | jq .

Front-load context once (read files in plan mode), then execute — the stable prefix gets cached, so every follow-up turn is fast and cheap.

🖥️

SECTION 07

Claude in Your IDE

Same Claude Code engine, now living inside your editor

Mastered

The VS Code extension and JetBrains plugin wire Claude Code straight into your editor, so plans, edits, and diffs show up right next to your code.

The IDE integration is not a different product. It drives the exact same Claude Code engine as the terminal, just with a graphical surface for selections, diffs, and diagnostics. You get inline-diff review, automatic selection sharing, and live error feedback without leaving your editor.

Availability and exact menus shift between editor versions. Treat the steps below as the current shape, and confirm specifics in the official docs for your editor.

Two integrations, one engine

VS Code extension

Publisher id anthropic.claude-code. Adds a native graphical panel and bundles the CLI for you. Needs VS Code 1.98.0+. Also installs in forks (Cursor, Windsurf, Kiro, Devin Desktop) and on Open VSX.

JetBrains plugin

Named Claude Code [Beta] (publisher Anthropic PBC). Covers IntelliJ, PyCharm, WebStorm, GoLand, PhpStorm, RubyMine, Rider, CLion, and Android Studio. It shells out to the claude CLI, which you must install and authenticate first.

Install: VS Code

Open the Extensions panel with Cmd+Shift+X (Mac) or Ctrl+Shift+X (Windows/Linux).
Search for Claude Code and click Install (or use the marketplace 'Install for VS Code' link).
Open the panel and sign in. The CLI is bundled, so there is nothing else to set up.

The VS Code extension panel is the integration. You do not run any attach command. Quick-launch or focus it from the editor with Cmd+Esc (Mac) or Ctrl+Esc (Windows/Linux).

On macOS Tahoe and later, the system Game Overlay can steal Cmd+Esc. Clear it under System Settings -> Keyboard -> Keyboard Shortcuts -> Game Controllers.

Install: JetBrains

Install and authenticate the CLI first: run claude once in a terminal and sign in.
In the IDE, go to Settings -> Plugins -> Marketplace and search Claude Code.
Click Install, then fully restart the IDE (a reload is not enough).

JetBrains settings live under Settings -> Tools -> Claude Code [Beta]: the claude command path, the 'command not found' notification toggle, Option+Enter for multi-line prompts (macOS, needs a terminal restart), and auto-updates.

If Esc will not interrupt Claude in JetBrains, go to Settings -> Tools -> Terminal and uncheck 'Move focus to the editor with Escape'.

What the IDE adds

Selection as context

Highlight code and Claude sees it automatically. The prompt footer shows the line count and the transcript logs 'Selected N lines from <file>'.

Side-by-side diff review

Every proposed edit opens in the editor's native diff viewer with accept / reject / redirect. Edit the proposal in the diff and Claude is told you changed it.

Live diagnostics

The integration shares lint and syntax errors with Claude, so it can react to real problems instead of guessing.

@-mention shortcut

One keystroke inserts a file reference with a line range, like @app.ts#5-10, so you point at exact code fast.

File-reference keys

Editor	Mac	Windows/Linux	Inserts
VS Code	`Option`+`K`	`Alt`+`K`	`@app.ts#5-10`
JetBrains	`Cmd`+`Option`+`K`	`Alt`+`Ctrl`+`K`	`@src/auth.ts#L1-99`

A Read deny rule blocks selection sharing for matching files, and an eye-slash toggle hides the current selection from Claude when you need privacy on a snippet.

Connecting the terminal to the editor

Run claude inside the IDE's integrated terminal and it auto-connects: diff viewing and diagnostic sharing turn on by themselves. From an external terminal, attach with one command.

# In an external terminal, after launching claude:
/ide
# attaches the running session to your open editor

Choose where diffs render with the config command. Keep them in the IDE for visual review, or push them back to the terminal.

# Open diffs in the editor's native viewer:
/config diff tool auto

# Or keep diffs in the terminal:
/config diff tool terminal

Permission modes in the editor

In VS Code the mode indicator at the bottom of the prompt box switches behavior. Plan mode opens the plan as a full markdown document you can comment on inline before any edit happens.

Mode	Behavior
normal	Asks before each action
Plan mode	Read-only; describes a plan and waits for approval
auto-accept	Applies edits without asking

Set the default mode in VS Code settings so every new session starts the way you want.

// VS Code settings.json
"claudeCode.initialPermissionMode": "plan"
// values: default | plan | acceptEdits | bypassPermissions

Settings: what lives where

Editor-specific options live in the extension settings, but the rules that matter for safety and tooling are shared with the CLI. The same precedence applies across CLI, VS Code, and JetBrains.

In the editor

VS Code keys like useTerminal (default false), preferredLocation (panel or sidebar), autosave (default true), and useCtrlEnterToSend. Use Shift+Enter for a newline without sending.

Shared

~/.claude/settings.json holds allowed commands, env vars, hooks, and MCP servers for both the extension and the CLI. One source of truth for permissions.

Under the hood: the 'ide' MCP server

When the VS Code extension is active it runs a local MCP server named ide that the CLI connects to automatically. It binds to 127.0.0.1 on a random high port and writes a fresh auth token to a lock file under ~/.claude/ide/ with 0600 permissions.

It hosts about a dozen tools, but only two are exposed to the model: mcp__ide__getDiagnostics and mcp__ide__executeCode. The rest are internal RPC for opening diffs, reading selections, and saving files.

JetBrains goes further by routing through reference-aware IDE operations. It exposes tools like symbol-rename refactoring, run-configuration execution, and quickfix application, so a rename touches every reference instead of doing a blind text replace.

When the IDE flow beats the terminal

Reach for the editor flow when the work is visual or edit-heavy. Stick with the pure terminal for headless runs, CI, and tight keyboard loops.

Use the IDE when

You are reviewing multi-file diffs, refactoring with reference safety, pointing at exact selections, or want live lint and syntax errors fed back automatically.

Use the terminal when

You are scripting claude -p, running in CI, chaining commands, or working over SSH where a graphical diff viewer adds nothing.

OAuth tokens are shared via the OS keychain in JetBrains, so logging out in one place logs you out everywhere. Sign out deliberately.

Quick-start in 60 seconds

Install the extension or plugin and sign in.
Open Claude with Cmd+Esc / Ctrl+Esc.
Highlight the function you want changed so it becomes context.
Ask: 'Add input validation here and run the tests.'
Review the side-by-side diff, tweak it inline if needed, then accept.

Pair this with Plan mode for anything non-trivial: read the plan as a markdown doc, comment inline, then let Claude implement against it while you watch the diffs land.

🤔

SECTION 08

Plan Mode & Extended Thinking

Look before you leap, then think as hard as the problem demands

Mastered

⚡ WHY IT MATTERSA read-only plan + approval gate before edits turns risky guesswork into reversible, verified progress.

A loop diagram contrasting the safe plan-first flow (Request to Research to Plan, through an Approve gate, to Execute and Verify, looping back) against the reckless edit-first shortcut.

🎬 Animation, explained

The top path is the plan-first flow: Request → Research → Plan, then a deliberate stop at the Approve gate before Execute and Verify.
The Approve gate is you — plan mode forces Claude to show its intended changes before touching anything.
The bottom path skips straight to editing — watch its dot get bounced back around the rework loop, which is where unreviewed changes usually end up.
Takeaway: the gate looks slower but the loop is faster — one approval beats three rounds of undoing surprise edits.

The two cheapest upgrades to your results: make Claude plan before it touches code, and let it think harder only when the problem is actually hard.

Plan mode and extended thinking solve different problems. Plan mode controls what Claude is allowed to do. Extended thinking controls how much it reasons before it acts. Used together, they turn a guess-and-check session into a deliberate one.

Plan Mode: explore, propose, approve, then execute

Plan mode is enforced at the tool level. Claude literally cannot edit files or run destructive commands while in it. It can only read, search, and think, then hand you a written plan to approve.

Toggle it with Shift+Tab, which cycles through normal, plan, and auto-accept modes. You can also start a session already in plan mode from the CLI.

claude --permission-mode plan

The documented workflow for the best coding results is always the same: explore → plan → execute. Plan mode is what enforces the first two steps instead of letting Claude jump straight to editing.

Why it prevents wasted work

When Claude edits first and reasons later, a wrong assumption becomes ten wrong file changes you have to unwind. A plan surfaces that assumption in one paragraph you can correct in one sentence, before any code is written.

There is also a hidden performance win. Front-loading context (reading files, fixing the plan, stating constraints) writes a stable prefix to the prompt cache once. Every later execution turn then reads from that cached prefix, which is the documented cause of very high (90%+) cache-hit rates.

Cache hits cost about 10% of base input price. A stable plan up front is not just safer, it is cheaper per execution turn than constantly changing your mind mid-edit (which invalidates the cache).

How to ask for a plan first

Be specific. Name the files, state the constraints, and tell Claude not to write code yet. Vague requests cause endless exploration that floods your context.

Read src/auth/session.ts and src/auth/tokens.ts.
Propose a plan to add refresh-token rotation.
Do NOT edit anything yet — show me the plan and wait for approval.

For a larger change, ask Claude to write the plan to disk so it survives any context reset or /clear between phases.

Investigate the checkout flow and write a phased plan to PLAN.md.
Keep each phase to one session of work. List files touched per phase.

Skip plan mode for one-sentence-diff tasks: fixing a typo, adding a log line, renaming a variable. The approval round-trip is pure overhead when the change is obvious.

Behind the scenes: the Plan subagent

In plan mode Claude uses a built-in read-only Plan subagent. Like the Explore agent, it skips CLAUDE.md and the git-status snapshot so it stays focused on reading and proposing. You don't configure it; it's part of the mode.

Extended Thinking: spend reasoning where it pays

Extended thinking is on by default with roughly a 31,999-token reasoning budget. You escalate that budget with keywords right inside your prompt. More thinking buys reasoning depth, not magic.

Keyword	Budget	Use for
`think`	Lowest	Moderately tricky logic, small refactors
`think hard` / `think harder`	Higher	Multi-file changes, non-obvious bugs
`ultrathink`	Maximum	Architecture, hard multi-file debugging, costly decisions

ultrathink: our BGP failover drops sessions for ~30s on link flap.
Walk through the timers, the convergence path, and propose fixes
ranked by blast radius. Don't change configs yet.

Toggling and capping it

Flip thinking on or off mid-session with Option+T on Mac (Alt+T elsewhere). Cap the budget with an env var, or set a coarse effort level with a slash command.

# Hard cap the thinking budget for a whole session
export MAX_THINKING_TOKENS=10000

# In-session: dial reasoning effort up or down
/effort high      # low | medium | high
/effort auto      # back to default

Thinking depth does NOT improve compliance with output rules. Ultrathink can burn hundreds of reasoning tokens and still ignore an explicit formatting constraint. Reserve it for genuinely complex reasoning, not for forcing rule-following.

A model note worth knowing

The thinking story differs by model. Opus 5 and Sonnet 5 use adaptive thinking by default and expose an effort ladder; Fable 5 always thinks; Haiku 4.5 supports the extended-thinking control. On Opus 5, thinking cannot be disabled at xhigh or max effort.

Putting them together

The highest-leverage habit is to combine them: enter plan mode, ask Claude to think hard about the approach, approve the plan, then drop back to normal mode to execute.

Press Shift+Tab until you see plan mode.
Prompt: think hard and propose a plan to refactor the payment retry logic. Read the relevant files first.
Read the plan. Correct any wrong assumption in one line.
Approve, then let Claude execute against the now-cached plan.

Plan mode

Controls permissions. Read-only until you approve. Catches bad assumptions before code is written.

Extended thinking

Controls reasoning depth. Escalate with think / think hard / ultrathink only when the problem earns it.

Together

Plan + think hard = a deliberate, cache-efficient session that wastes far fewer correction rounds.

If Claude fails the same fix twice, don't keep correcting. Use Esc+Esc (or /rewind) to erase the failed attempt from context, then re-plan with sharper constraints. Correcting leaves dead ends polluting context; rewinding removes them.

📌

SECTION 09

CLAUDE.md & Memory

Teach Claude your project once — it remembers every session

Mastered

⚡ WHY IT MATTERSStack every rule once in the right layer and Claude obeys all of them automatically, with the most specific layer winning conflicts.

Four CLAUDE.md scopes (managed policy, user, project, local) concatenate in broadest-to-most-specific order and merge into one effective context fed to Claude.

🎬 Animation, explained

The four stacked cards are the four memory scopes: managed policy (org-wide), user (~/.claude), project (checked into git), and local (your machine only).
They flow downward broadest first — each more-specific scope is read after, so it can refine or override what came before.
The merge point shows all four concatenating into one effective context — Claude sees a single combined briefing, not four files.
Takeaway: put team truths in the project file, personal taste in the user file, machine quirks in local — and let the cascade do the layering.

A good CLAUDE.md is the single highest-leverage upgrade you can make: it loads at the start of every session so Claude stops guessing your conventions.

CLAUDE.md is a plain Markdown file Claude reads automatically before it does anything else. Put your build commands, code style, and gotchas there once, and they apply to every conversation in that repo. It is the difference between Claude inferring your setup and Claude knowing it.

CLAUDE.md is read at the start of every session, and re-injected from disk after a /compact. That means good rules survive context resets for free.

The memory hierarchy

Memory files load in layers: enterprise, user, project, and local. More specific layers take precedence, so a project file beats your personal defaults.

Layer	Location	Scope
Enterprise	managed policy file	Org-wide, set by admins
User	`~/.claude/CLAUDE.md`	All your projects, private
Project	`./CLAUDE.md`	This repo, shared with team via git
Local	project-local, untracked	This repo, just you (gitignored)

Commit ./CLAUDE.md so your whole team gets the same Claude behavior. Keep personal quirks in ~/.claude/CLAUDE.md or a local, gitignored memory file.

Get started fast

You almost never write CLAUDE.md from scratch. Let Claude scan the repo and draft one, then trim it.

Run /init in a repo. Claude reads the codebase and generates a starter CLAUDE.md with structure, build/test commands, and conventions.
Run /memory to open your memory files in your editor and clean up the draft.
Prune ruthlessly (see the rule below), then commit it.

The # shortcut: add a memory mid-flow

When Claude makes a mistake or you spot a convention worth saving, start a prompt with #. Claude appends it to a memory file for you, so you never lose the lesson.

# Always run tests with `pnpm test`, never `npm test`
# The staging deploy script must be run from the repo root

It lets you pick which memory file to write to (user vs project), so a one-off correction becomes a permanent rule in seconds.

What to put in (and leave out)

The golden test for every line: "Would removing this cause Claude to make a mistake?" If no, delete it. A bloated CLAUDE.md gets ignored because the real rules drown in noise.

Include

Non-guessable bash commands, non-default code style, the exact test runner, repo etiquette, project architecture, and env quirks or gotchas.

Exclude

Anything inferable from the code, standard language conventions, and info that changes often (it goes stale and misleads).

A bloated CLAUDE.md actively hurts you. When rules get lost in noise, Claude starts ignoring all of them. Shorter and sharper beats long and thorough.

Make rules stick

Claude weights emphatic instructions more heavily. Use IMPORTANT or YOU MUST for rules that really matter.

Move sometimes-relevant material out of CLAUDE.md into Skills or path-scoped .claude/rules/ files. That keeps the always-loaded core lean while still having deep guidance available on demand.

A real CLAUDE.md snippet

Concrete, scannable, and every line earns its place:

# Project: Acme API

## Commands
- Install: `pnpm install`
- Dev server: `pnpm dev` (port 3000)
- Test: `pnpm test` (Vitest) — NEVER use npm
- Typecheck: `pnpm typecheck`
- Lint+fix: `pnpm lint --fix`

## Code style
- TypeScript strict mode; no `any`.
- Immutable data only — never mutate inputs, return new objects.
- Prefer async/await over `.then()`.

## Architecture
- API routes live in `src/routes/`, business logic in `src/services/`.
- DB access goes ONLY through `src/repos/` (repository pattern).

## Gotchas
- IMPORTANT: run `pnpm db:migrate` after pulling — schema drifts often.
- The `legacy/` folder is frozen. YOU MUST NOT edit files there.

Inspect what loaded

Not sure which memory files Claude is actually using? Two commands give you ground truth.

/memory    # opens loaded CLAUDE.md files for editing
/context   # live token breakdown — see how much memory costs

After a big edit to CLAUDE.md mid-session, remember it invalidates the prompt cache and triggers one slower, more expensive turn. Prefer editing memory between tasks, not in the middle of one.

Treat CLAUDE.md as a living document. Every time Claude repeats a mistake, capture the fix with # — over a few weeks your repo trains itself.

🎓

SECTION 10

Skills

Teach Claude a procedure once, and it loads itself exactly when needed.

Mastered

⚡ WHY IT MATTERSYou can install 100+ skills for the cost of a few hundred tokens, because only the matching skill's full body ever loads.

A task matches one skill's description, and only that skill's body is read from the shelf into the lean context window while the rest stay dormant on disk.

🎬 Animation, explained

The shelf on the left holds every installed skill — but only their one-line descriptions sit in context by default.
The incoming task pings the descriptions, and exactly one lights up as a match.
Only that skill's full body travels into the context window; the rest stay dormant on the shelf, costing zero tokens.
Takeaway: this is progressive disclosure — you can install fifty skills and pay for them only when a task actually needs one.

A Skill is a folder with a SKILL.md that Claude auto-loads the moment your request matches it, then forgets again to keep context lean.

Think of a skill as a reusable playbook on disk. Claude reads only the name and description at startup, and pulls in the full instructions only when they are actually relevant. This is called progressive disclosure, and it is why skills scale where giant prompts do not.

Anatomy of a skill

Every skill is a directory containing one SKILL.md file. That file has YAML frontmatter (between --- markers) plus markdown instructions. You can bundle extra scripts, reference docs, or data files alongside it.

.claude/skills/
└── deploy-staging/
    ├── SKILL.md          # frontmatter + instructions
    ├── deploy.sh         # bundled script (run via bash)
    └── checklist.md      # reference, loaded only if needed

The directory name sets the slash command. .claude/skills/deploy-staging/SKILL.md becomes /deploy-staging. The name frontmatter field only changes the display label, not what you type.

A minimal SKILL.md

Only description is recommended; every field is technically optional. The description is what Claude reads to decide whether to activate the skill, so make it state both what it does and when to use it.

---
name: deploy-staging
description: Deploy the app to the staging environment. Use when
  the user asks to ship, deploy, or push a build to staging.
---

# Deploy to staging

1. Run the test suite: `npm test`
2. Build the bundle: `npm run build:staging`
3. Run the bundled script: `bash deploy.sh staging`
4. Confirm the health check at /healthz returns 200.

Never deploy if tests fail. Report the deployed commit SHA.

Frontmatter validation: name is max 64 chars, lowercase letters / numbers / hyphens only, and cannot contain the words 'anthropic' or 'claude'. description is max 1024 chars with no XML tags. Gerund-style names like processing-pdfs are recommended.

Where skills live

Location	Scope	Command
`~/.claude/skills/`	All your projects (personal)	`/name`
`.claude/skills/`	This project, shared via git	`/name`
Plugin `skills/` dir	Whoever installs the plugin	`/plugin:name`

Custom commands merged into skills. Your old .claude/commands/optimize.md still works and still creates /optimize. For anything new, prefer a skill folder so you can bundle scripts and resources.

How activation works

You get the skill two ways. Claude can auto-activate it when your wording matches the description, or you can fire it directly by typing the slash command. Both reach the same instructions.

At session start, Claude preloads only metadata: each skill's name + description (~100 tokens each).
When your request matches a description, Claude reads the full SKILL.md from disk.
If the body points to other files or scripts, those load (or run via bash) only when actually needed.

Progressive disclosure: the three levels

Level 1 · Metadata

Name + description, ~100 tokens per skill. Always loaded at startup so Claude knows what exists.

Level 2 · Instructions

The SKILL.md body, kept under ~5k tokens. Loaded only when the skill is triggered.

Level 3+ · Resources

Bundled docs, scripts, data. Effectively unlimited; zero context cost until a file is read or a script is run.

Controlling who can invoke

Two frontmatter switches tune visibility. They trade off context budget against convenience.

Frontmatter	Effect
(default)	Both you and Claude can invoke; description stays in context.
`disable-model-invocation: true`	Only you can run it via `/name`. Description is NOT kept in context (saves budget).
`user-invocable: false`	Only Claude invokes it; hidden from the `/` menu. Description stays in context.

Want to toggle visibility without editing the file (handy for shared repos)? Open the /skills menu, highlight a skill, press Space to cycle states (on name-only user-invocable-only off), then Enter. It writes skillOverrides in settings.

When skills beat plain prompts

Reusable procedure

You re-explain the same multi-step task often. Write it once; let it auto-load.

Keeps context lean

Long instructions sit on disk at Level 2/3 instead of bloating every turn.

Bundled determinism

Ship a tested script next to the skill so Claude runs it instead of re-generating code.

Team-shareable

Commit .claude/skills/ and the whole team inherits the same playbook.

Reach for a plain prompt instead when the task is a one-off, or so simple that a sentence does the job. Skills earn their keep on repeated, structured work.

Authoring rules that matter

Keep the SKILL.md body under ~500 lines; split overflow into separate files.
Keep file references one level deep, and use forward slashes in paths.
Add a table of contents to any reference file longer than 100 lines.
Prefer a deterministic bundled script over asking Claude to write code each time.
Put the most important instructions near the top — skill bodies are re-injected after /compact but capped at 5,000 tokens each.

Subagents differ from chat. A subagent that preloads a skill (via the skills: frontmatter field) gets the FULL skill body injected at startup, not just the description — so a worker agent always has its playbook ready without needing to be triggered.

SDK gotcha: the allowed-tools frontmatter field works only in the Claude Code CLI, not via the Agent SDK. In the SDK, filter with the skills option on query() and use the main allowedTools option instead. Claude Code supports filesystem skills only — there is no API upload.

🤖

SECTION 11

Subagents & Delegation

Spin up focused helpers that work in their own context and report back

Mastered

⚡ WHY IT MATTERSParallel subagents do heavy work in isolated context windows, so the orchestrator stays lean and N tasks finish in the time of one.

An orchestrator agent fans out to three parallel subagents, each with its own isolated context, which run concurrently and fan their final results back up to be merged.

🎬 Animation, explained

The orchestrator at the top is your main session — it decides the work can be split.
Three dots launch simultaneously, one per subagent, each landing in its own isolated context box: none of them can see the others' clutter (or pollute the parent's window).
Watch the timing: the subagents run concurrently, which is why the fan-out finishes in one subagent's time, not three.
The results fan back in as compact summaries — the orchestrator gets conclusions, not transcripts.

Subagents are disposable specialists with their own fresh context window, so dirty research and verbose output never pollute your main chat.

A subagent is just a Markdown file with YAML frontmatter. The frontmatter sets identity and limits; the body is the subagent's system prompt. Claude can call them automatically or you can invoke them by name.

Why delegate at all

The core constraint of Claude Code is that context fills fast and quality drops as it fills. A subagent runs in its own window, does the noisy work, and returns only a short summary, keeping your main thread clean.

Isolated context

Each subagent starts fresh. Verbose searches, test logs, and dead ends stay out of your main conversation.

Scoped tools

Give a reviewer read-only access. The tools allowlist means it literally cannot edit or run destructive commands.

Cheaper model per job

Route grunt work to Haiku, hard reasoning to Opus, via the model field, independent of your main session.

Parallel work

Fan out several subagents on independent investigations at once, then let Claude merge their findings.

Where they live

Project agents go in .claude/agents/ (shared with your team). Personal agents go in ~/.claude/agents/. When names collide, the higher-priority location wins; project beats user.

Location	Scope	Priority
`--agents` CLI flag	This session only	Highest (after managed)
`.claude/agents/`	Project / team	High
`~/.claude/agents/`	All your projects	Medium
Plugin `agents/`	Installed plugins	Lowest

A real agent definition

Only name and description are required. The tools line is an allowlist; omit it and the subagent inherits every tool from the main chat. Adding 'use proactively' to the description nudges Claude to delegate on its own.

---
name: code-reviewer
description: Reviews code for bugs, security, and style. Use proactively after writing or changing code.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a senior code reviewer. When invoked:
1. Run `git diff` to see what changed.
2. Flag bugs, security issues, and missing error handling.
3. Group findings as CRITICAL / HIGH / MEDIUM / LOW.
4. Suggest concrete fixes. Do NOT edit files yourself.
Return a short, scannable summary only.

Save the file, then in-session run /agents to verify it loaded. Agents edited directly on disk need a session restart; agents created through the /agents UI take effect immediately.

How to invoke one

Three ways to call a subagent, from least to most explicit.

Auto-delegation — describe the task and Claude picks an agent based on its description.
Natural language — name it: "Use the code-reviewer to check my last commit."
@-mention — guarantees that exact agent runs.

# @-mention forces the named subagent to run
@agent-code-reviewer check the auth changes in src/auth/

# Or run an entire headless session AS a subagent
claude --agent code-reviewer -p "review the staged diff"

Pick the right model per agent

The model field takes an alias (haiku, sonnet, opus), a full id like claude-opus-5, or inherit. It defaults to inherit (same model as your main chat) when omitted.

Agent job	Model	Why
Codebase search / scan	`haiku`	Fastest, cheap ($1/$5 per MTok), near-frontier for lookup
Code review / tests	`sonnet`	Best balance for most production work
Architecture / hard debugging	`opus`	Deepest reasoning for complex, costly decisions

Running several in parallel

For independent investigations, spawn multiple subagents at once. Each gathers its own context; Claude synthesizes the results. This works best when the paths truly don't overlap.

Investigate three things in parallel, one subagent each:
1. How is auth wired in src/auth/ ?
2. Where are database migrations defined?
3. What does the CI pipeline run on push?
Report each as a short summary.

Subagents run foreground (blocking) or background (concurrent, auto-denying anything that would prompt). Press Ctrl+B to background a running task, or set background: true in frontmatter to always background it.

When delegation helps vs hurts

Delegate when

Output is verbose and you don't need it in main context, you want tool/permission limits enforced, or the work is self-contained and returns a summary.

Stay in the main chat when

You need frequent back-and-forth, phases share context, the change is a quick one-liner, or latency matters — subagents start cold and must re-gather context.

Subagents can now spawn their own subagents (up to 5 levels deep), so true nested delegation is available for multi-stage work — but keep trees shallow, since each level adds cold-start latency and coordination overhead. AskUserQuestion and plan-mode tools are still unavailable inside a subagent.

Two power features

Add memory: project to give an agent persistent notes across sessions (stored in .claude/agent-memory/<name>/). Add skills: to preload full skill content into the subagent at startup — subagents get the entire skill injected, not just its description.

---
name: test-runner
description: Writes and runs tests; use proactively on new features.
tools: Read, Edit, Bash
model: sonnet
memory: project
skills: python-testing
isolation: worktree
---
Write failing tests first, then implement until green. Verify 80%+ coverage.

Use isolation: worktree for risky changes — the subagent runs in a throwaway git worktree, so it can experiment without touching your working tree. Combine with memory: project so it remembers conventions between runs.

Restricting who can spawn what

In settings or an agent definition, control delegation with the Agent tool syntax. List allowed agents in parens, allow any with a bare Agent, or omit it to block spawning entirely.

# Only let this agent spawn the named workers
# (Task is the old name for Agent — still works as an alias)
Agent(researcher, test-runner)

# Disable a built-in subagent for the session
claude --disallowedTools "Agent(Explore)"

Built-ins worth knowing: Explore (Haiku, read-only search), Plan (read-only, used in plan mode), and general-purpose (all tools). Explore and Plan skip CLAUDE.md and git status for speed; every other agent loads the full memory hierarchy.

🔌

SECTION 12

MCP Servers & Connectors

Plug Claude into your tools, your data, your whole stack

Mastered

⚡ WHY IT MATTERSOne open protocol lets Claude plug into any external tool, file, or service through interchangeable connectors instead of bespoke integrations.

The diagram shows the Claude MCP host spawning per-server clients over stdio and Streamable HTTP transports to GitHub, Filesystem, Browser, and Docs servers, each exposing tools, resources, and prompts, with a JSON-RPC request/response dot traveling to one server.

🎬 Animation, explained

Claude Code (the MCP host) sits at the center and spawns one client per configured server.
The two link styles are the two transports: stdio for local servers (spawned as child processes) and Streamable HTTP for remote ones.
The dots flowing outward are tool calls; the ones flowing back are results — GitHub, filesystem, browser, and database all speak the same protocol.
Takeaway: MCP is a USB port for AI tools — once a server exists, any MCP host can plug it in unchanged.

MCP (Model Context Protocol) is the universal adapter that lets Claude talk to outside tools and data — GitHub, browsers, databases, docs, and your own scripts.

Without MCP, Claude only has its built-in tools (read, edit, bash). With MCP, you bolt on new tools that show up right inside the session. One claude mcp add command wires a server in for good.

When and why to add an MCP

Add a server when Claude keeps needing data or actions it can't reach on its own. The goal is fewer copy-pastes and more real work done in-session.

Reach live data

Query a database, fetch current library docs, or read your Notion — instead of pasting it by hand.

Take real actions

Open PRs on GitHub, drive a browser with Playwright, or send a Slack message.

Persist memory

A memory server lets Claude store and recall facts across sessions.

Wrap your own scripts

Expose an internal CLI or API as a tool your whole team can call.

Each connected server adds its tool definitions to context, which fills the window faster. Run /mcp to see per-server tool cost and disable servers you aren't using.

The three transports

A transport is just how Claude talks to the server. Pick based on where the server lives.

Transport	Use it for	Flag
stdio	Local subprocess servers on your machine	default (no flag)
HTTP	Remote / cloud servers (recommended)	`--transport http`
SSE	Legacy remote — deprecated, use HTTP	`--transport sse`

stdio: local process

Claude launches the server as a child process and talks over stdin/stdout. The -- separates Claude's flags from the command that starts the server — everything after it is passed through untouched.

claude mcp add playwright -- npx -y @playwright/mcp@latest

HTTP: remote server

Best for hosted/cloud services. Pass a name and a URL.

claude mcp add --transport http claude-code-docs https://code.claude.com/docs/mcp

All options (--transport, --env, --scope, --header) must come BEFORE the server name. For stdio, the command goes last after --.

Auth and environment variables

Use --env to pass secrets to a local server and --header to send auth on a remote one.

# Local server with an env var
claude mcp add github --env GITHUB_TOKEN=ghp_xxx -- npx -y @modelcontextprotocol/server-github

# Remote server with a bearer token
claude mcp add --transport http my-api --header "Authorization: Bearer $TOKEN" https://api.example.com/mcp

For remote servers that use OAuth, just run /mcp in-session and authenticate through the browser prompt — no manual token needed.

Scopes: who sees this server

Scope decides where the config is stored and who can use it. Set it with --scope.

Scope	Visibility	Stored in
`local` (default)	Just you, this project	`~/.claude.json` (under project path)
`project`	Whole team, shared	`.mcp.json` in repo root (commit it)
`user`	Just you, all projects	`~/.claude.json`

# Share a server with your team via version control
claude mcp add --scope project --transport http docs https://code.claude.com/docs/mcp

Older versions named scopes differently: today's local was project, and today's user was global. When the same server name exists in several scopes, precedence is local > project > user > plugin > connector (one definition wins, no merging).

The .mcp.json file

Project-scoped servers live in .mcp.json at your repo root, checked into git so the whole team gets them. Claude reads it only at session start — restart after editing.

{
  "mcpServers": {
    "docs": {
      "type": "http",
      "url": "https://code.claude.com/docs/mcp"
    },
    "playwright": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}

The type field accepts streamable-http as an alias for http, so configs copied straight from a server's docs work unchanged.

No secrets in the file

Use variable expansion so committed configs never hold real tokens. Supports ${VAR} and ${VAR:-default} in command, args, env, url, and headers.

{
  "mcpServers": {
    "my-api": {
      "type": "http",
      "url": "https://api.example.com/mcp",
      "headers": { "Authorization": "Bearer ${API_TOKEN}" }
    }
  }
}

The /mcp command

Run /mcp in any session to see configured servers, their connection status, and the tools each one exposes. It's also where you authenticate (OAuth) to remote servers.

Project servers from a cloned repo prompt for approval on first use — a guard so a malicious repo can't silently launch processes. Approve at the prompt or in /mcp. Reset all approvals with claude mcp reset-project-choices.

Managing servers from the CLI

claude mcp list                 # all configured servers
claude mcp get github           # show details + which scope holds it
claude mcp remove github        # remove (add --scope if in several)
claude mcp add-json db '{"type":"stdio","command":"..."}'
claude mcp add-from-claude-desktop   # import (macOS/WSL only)
claude mcp login github               # authenticate a server from the CLI
claude mcp logout github              # sign it back out (--no-browser works over SSH)

Example servers to start with

GitHub

Read repos, search code, open and review PRs, manage issues — all without leaving the session.

Playwright

Drive a real browser: navigate, click, fill forms, screenshot, and test web flows.

Docs (Context7 / Claude docs)

Pull current, version-accurate library documentation instead of relying on training data.

Memory

A knowledge-graph server that lets Claude store and recall facts across sessions.

Using MCP tools in automation

MCP tools are namespaced mcp__<server>__<tool>, which is how you reference them in --allowedTools for non-interactive runs.

claude -p "Open a PR for the fix" --allowedTools "mcp__github__create_pull_request"

Security: trust before you connect

An MCP server runs code on your machine (stdio) or receives your data and tokens (remote). A malicious or compromised server can read files, exfiltrate secrets, or perform actions as you. Only add servers you trust.

Prefer official servers and the Anthropic connector directory over random forks.
Pin versions for stdio servers rather than always pulling @latest from strangers.
Never hardcode secrets in .mcp.json — use ${VAR} expansion.
Treat tool descriptions and returned data as untrusted input (prompt-injection risk).
Keep project servers reviewed in code review, since they ship to the whole team via git.

Add servers narrowly: enable a heavy server (like a full browser or a broad API) only in the project that needs it, and disable it elsewhere to keep both your context window and your attack surface small.

🪝

SECTION 13

Hooks (Automation)

Deterministic guardrails the model can't skip

Mastered

⚡ WHY IT MATTERSHooks let you intercept Claude at fixed lifecycle moments — and PreToolUse can deny a tool before it ever runs.

A horizontal turn timeline showing hook taps fire in order (UserPromptSubmit, PreToolUse, PostToolUse, Stop) around tool execution, with PreToolUse able to block a tool via exit 2.

🎬 Animation, explained

The timeline is one turn of the agent loop, left to right: your prompt in, work done, response out.
The taps light up in their real firing order — UserPromptSubmit → PreToolUse → PostToolUse → Stop — around the tool-execution block in the middle.
PreToolUse pulses differently for a reason: it's the only tap that can block the tool before it runs (exit code 2 = veto).
Takeaway: hooks are deterministic shell commands at fixed points — unlike instructions in a prompt, they can't be forgotten.

Hooks run your own shell commands at fixed moments in Claude's lifecycle, giving you control the model cannot talk its way around.

A hook is a command that fires automatically on an event like a file edit or a finished response. Because the harness runs it (not Claude), the behavior is deterministic and reliable. Use hooks to auto-format code, block edits to protected files, or inject fresh context.

Hooks live under a top-level hooks key in any settings file: ~/.claude/settings.json (all projects), .claude/settings.json (project, committable), or .claude/settings.local.json (gitignored). All matching hooks merge and run together.

The lifecycle events

Events fire at three cadences: once per session, once per turn, and once per tool call. Pick the event that matches when you want to act.

Event	Fires when	Can block?
`SessionStart`	Session begins or resumes	No
`UserPromptSubmit`	You submit a prompt (before Claude reads it)	Yes (erases prompt)
`PreToolUse`	Before a tool call runs	Yes (blocks call)
`PostToolUse`	After a tool call succeeds	No (already ran)
`Stop`	Claude finishes responding	Yes (keeps going)

How the config nests

The structure has three levels: event name, then matcher groups, then handler objects. A matcher filters which tool triggers the hook.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "echo ran" }
        ]
      }
    ]
  }
}

Matcher rules: only letters/digits/_/| means an exact match with | as "or" (e.g. Edit|Write). A *, empty, or omitted matcher matches everything. Anything else is treated as a regex (e.g. mcp__.* matches all MCP tools). Matchers are case-sensitive.

Exit codes: how a hook talks back

A command hook receives event JSON on stdin and communicates through its exit code. This is the contract you must get right.

Exit code	Meaning
`0`	Success. stdout may carry JSON for fine control.
`2`	Blocking error. stderr is fed back to Claude as feedback.
other	Non-blocking error. A warning shows; execution proceeds.

Use exit 2 to enforce policy, not exit 1. Only exit 2 actually blocks. A PreToolUse hook that exits 2 blocks the tool even under bypassPermissions or --dangerously-skip-permissions — PreToolUse hooks run before any permission-mode check.

Use case 1: Auto-format after every edit

The classic PostToolUse hook. It reads the edited file path from the event JSON with jq and pipes it to a formatter.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs -r prettier --write"
          }
        ]
      }
    ]
  }
}

Swap prettier for black, gofmt, or rustfmt depending on your stack. Add an "if": "Edit(*.py)" field to scope it to one file type.

Use case 2: Block edits to protected files

A PreToolUse hook can inspect the proposed tool call and refuse it. Exit 2 cancels the edit and tells Claude why.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "f=$(jq -r '.tool_input.file_path'); case \"$f\" in *.env|*/secrets/*) echo 'Protected file, blocked.' >&2; exit 2;; esac"
          }
        ]
      }
    ]
  }
}

For structured control on PreToolUse, exit 0 and print JSON instead: hookSpecificOutput.permissionDecision set to allow, deny, or ask. Pick exit-code signaling OR JSON, never both. When several hooks run, the most restrictive wins (deny > ask > allow). Note that allow still respects existing deny/ask permission rules.

Use case 3: Inject context at session start

For UserPromptSubmit and SessionStart, stdout is injected straight into Claude's context. That makes them perfect for feeding in live state like the current git branch.

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup",
        "hooks": [
          { "type": "command", "command": "echo \"Branch: $(git branch --show-current)\"" }
        ]
      }
    ]
  }
}

The SessionStart matcher filters by source: startup, resume, clear, or compact.

Use case 4: Forced verification on Stop

A Stop hook runs when Claude thinks it's done. Run your tests there and return decision: "block" to push Claude to keep fixing until they pass.

Stop hooks fire every time Claude finishes — including right after the hook itself forces a continue. Always check the stop_hook_active field in the input and exit 0 when it is true, or you will create an infinite loop.

Handler types beyond shell

command

Run a shell command. Gets event JSON on stdin; replies via exit code and stdout.

http

POST the event to an endpoint — good for central audit or team policy servers.

mcp_tool

Invoke an MCP tool as the handler.

prompt / agent

Run an LLM prompt or a subagent to make a judgment call at the hook point.

Handlers also accept "timeout" (seconds) and "async": true. Use async for non-blocking work like an audit log — an unmatched PostToolUse hook that appends JSONL records every action without slowing Claude down.

Manage and inspect your hooks live with the /hooks slash command, which shows what is configured across all your settings files.

🆕

WHAT'S NEW · JULY 24, 2026

Claude Opus 5 & the Current Model Lineup

The latest Opus, how it differs from Fable 5, and evidence-aware model selection

Anthropic released Claude Opus 5 on July 24, 2026. It is the latest Opus for complex agentic coding and enterprise work, comes close to Fable 5's frontier intelligence at half Fable's base token price, and replaces Opus 4.8 as the default on Claude Max.

Positioning matters: Fable 5 remains Anthropic's highest-capability widely released model. Opus 5 is the strongest model on Claude Pro and the latest model in the Opus family—not a universal “best model.” Availability, cost, latency, task type, effort, plan, and platform still determine the right choice.

Opus 5 at a glance

Fact	Verified value
API model ID	`claude-opus-5`
Context and output	1M-token context window (default and maximum); 128k maximum output
Base API price	$5 input / $25 output per MTok
Prompt cache	512-token prompt-cache minimum; cache reads cost 0.1× base input
Thinking and effort	Adaptive thinking is on by default; `low`, `medium`, `high`, `xhigh`, and `max`
Knowledge	May 2026 knowledge and training cutoff
Availability	Claude API, Amazon Bedrock, Google Cloud, and Microsoft Foundry; Opus 4.8 remains available
Feature exceptions	Opus 5 carries the Opus feature set forward with two exceptions: the API Web Fetch tool is not available on Opus 5, and Priority Tier is not supported. Web Search is a separate tool and remains available.
Fast mode	Research preview on the Claude API (including Managed Agents) and in Claude Code via usage credits; not on Amazon Bedrock, Google Cloud, or Microsoft Foundry. Up to about 2.5× output tokens per second; $10 input / $50 output per MTok.

Model specifications and prices verified July 28, 2026 against Anthropic's model overview, pricing page, release notes, and Opus 5 System Card.

What changes from Opus 4.8

Thinking starts on

On Opus 4.8, thinking stayed off unless requested. On Opus 5, thinking is on by default and effort is the main control for reasoning depth, token use, and latency.

Effort has five levels

Start at the default high, step down when your evals preserve quality, use xhigh for demanding agentic work, and reserve max for capability-critical tasks.

One breaking restriction

Explicitly disabling thinking at xhigh or max returns a 400 error. Either keep thinking enabled or lower effort to high or below.

More proactive behavior

Default deliverables tend to be longer; agentic sessions narrate more, delegate to subagents more readily, and self-verify without being asked.

Practical migration

# Before
model = "claude-opus-4-8"

# After
model = "claude-opus-5"
# thinking is already adaptive; start with effort="high"

Change the model ID, then re-run your own quality, latency, and cost evals.
Increase max_tokens where needed because thinking and visible output share the hard output limit.
You should remove inherited verification instructions such as “always add a final verification subagent”; Opus 5 already self-checks and those prompts can cause unnecessary over-verification.
Specify concise progress updates or delegation limits if the model's longer narration or broader subagent use does not fit your workflow.
Plan an alternative if you call the API Web Fetch tool — it is not currently available on Opus 5. Web Search is a different tool and still works, so “Opus 5 cannot reach the web” is the wrong conclusion.
Check your API service tier: Priority Tier does not support Opus 5. This affects only organizations holding an existing capacity commitment — Anthropic no longer sells new ones — and it does not limit ordinary Opus 5 availability on the API or the cloud platforms.

Choose by workload, not by rank

Fable 5

Start here when the highest available widely released capability matters more than price: long-running agents, advanced research, or the hardest reasoning.

Opus 5

Use for complex agentic coding, large-scale refactors, enterprise knowledge work, vision-heavy workflows, and multihour autonomous execution.

Sonnet 5

Use for production coding and agents at scale when faster responses and lower cost matter more than the final increment of Opus capability.

Haiku 4.5

Use for real-time, high-volume, simple, and cost-sensitive workloads where lightweight speed is the main constraint.

Benchmark evidence, with provenance

Benchmarks are task-specific evidence, not a universal leaderboard. The owner-reported figures below were re-checked against the benchmark owners' own published result pages on July 28, 2026; the Frontier-Bench row stays an Anthropic-reported launch evaluation because no benchmark-owner Opus 5 score is published. Leaderboards move, so re-verify before you cite them.

Evaluation	What it measures	Reported result	Provenance and limitation
Frontier-Bench v0.1	Difficult coding and non-coding work in rich terminal environments	Anthropic reports Opus 5 led its internal run and more than doubled Opus 4.8 at lower cost per task.	Anthropic-run. Mini-SWE-agent harness, GKE backend, mean reward over five attempts; this is a model-plus-harness result.
CursorBench 3.2	Ambiguous, multi-file tasks drawn from real Cursor sessions	Opus 5 Max: 70.0%; Fable 5 Max: 70.5%, with lower reported task cost for Opus.	Cursor-reported. Cursor warns that small score differences may not be statistically meaningful.
ARC-AGI-3	Interactive novel-problem solving and action efficiency	Opus 5 at High effort: 30.2% rounded, which is 30.16% exact in ARC Prize's verified-score table. Same single result, two representations.	ARC Prize-verified. Evaluated only at High effort because of the short testing window; one benchmark does not establish general intelligence.
AutomationBench	Strict end-to-end completion of multi-app business workflows	Opus 5 Max: 26.2% on Zapier's current private held-out leaderboard.	Zapier-reported. Simulated tools, strict all-assertions pass/fail, and typical run-to-run variance within one point.

These results do not establish superiority for every workload. Effort, tools, harness, token budget, task set, price date, and evaluator all affect the comparison. Build a representative evaluation set for your own production decision.

Primary sources

On June 9, 2026 Anthropic released Claude Fable 5, its highest-capability widely released model, plus limited-access Claude Mythos 5 for approved Project Glasswing work.

Anthropic's June launch evaluation reported Fable 5 at or near the frontier across its tested suite. That is a dated vendor-reported result, not a claim that Fable wins every benchmark or workload.

What launched

Claude Fable 5

The highest-capability widely released tier, with broad safeguards. API id claude-fable-5. Use when its capability advantage is worth the $10 / $50 base price.

Claude Mythos 5

A specialized Mythos-class deployment with fewer safeguards for approved defensive cybersecurity workflows. Invitation-only through Project Glasswing, not general access.

The numbers

Item	Detail
API id	`claude-fable-5`
Price (in / out)	$10 / $50 per million tokens
Autonomy	Runs unattended across millions of tokens on long tasks
Availability	Live on the Claude API and consumption-based Enterprise plans. On subscription plans, Fable 5 now runs on usage credits — without credits enabled, access is cut off.

Why it's a big deal

Software engineering

Anthropic and launch partners reported strong results on long-horizon coding evaluations and large migration work. Treat partner case studies as workload-specific evidence.

Analysis & finance

Fable remains the highest-capability widely released starting point when deep analysis matters more than its higher price, but validate it against Opus 5 on your own tasks.

Science & vision

Use domain review and reproducible checks even when a launch evaluation reports gains; benchmark performance does not remove scientific verification requirements.

What it means for you

Reach for Fable 5 when a task is genuinely hard — large refactors, deep research, multi-step agentic runs — where a better answer is worth the higher token price.
Keep routing cheap, high-volume work to Haiku and everyday coding to Sonnet. Fable 5 is the premium tier, not the default for everything (see Pick the Right Brain).
In Claude Code, switch with /model claude-fable-5. Front-load context once so prompt caching keeps the cost down.

Safety: Fable 5 uses classifier-based safeguards. As of the Opus 5 launch, blocked biology requests can route to Opus 5, while some flagged cyber requests can still fall back to Opus 4.8. Fallback behavior is topic- and product-dependent, so do not treat it as a general model-selection guarantee.

Also new: Claude Sonnet 5

The Sonnet tier got its own generational jump. Claude Sonnet 5 (claude-sonnet-5) replaces Sonnet 4.6 as the everyday workhorse, reaching what used to be Opus-tier quality on coding and agentic work at Sonnet cost.

Near-Opus coding

The biggest gains are in coding and agentic tasks — Sonnet 5 at medium effort is roughly comparable to Sonnet 4.6 at high, and it's the first Sonnet with the xhigh effort level for the hardest work.

Intro pricing

Sticker price stays $3 / $15 per MTok, with introductory pricing of $2 / $10 through Aug 31, 2026. 1M context window, 128k max output, high-resolution vision.

Watch the tokenizer

Sonnet 5 uses a new tokenizer: the same text costs ~30% more tokens than on 4.6. Adaptive thinking is also on by default — set it to disabled explicitly if you relied on 4.6's thinking-off default.

Source: anthropic.com/news/claude-fable-5-mythos-5

Also shipping this summer: Claude Code, May–July 2026

The model wasn’t the only thing moving. A run of Claude Code releases landed across early summer — here’s what’s worth knowing.

🌐 Claude in Chrome (GA)

A browser extension that lets Claude drive a real browser — read pages, click, fill forms, juggle tabs, and work inside authenticated apps. Generally available on Pro, Max, Team, and Enterprise.

🧠 Defaults have moved again

Sonnet 5 became the production default across several plans in June. On July 24, Opus 5 became the default on Claude Max and the strongest model available on Claude Pro. Always check the current product picker.

🤖 Subagents run in the background

By default now — your main session stays responsive while delegated work runs. Background chains nest up to 5 levels deep.

👥 Agent teams (experimental)

Orchestrate several Claude Code sessions as a team — a lead plus teammates that share a task list and message each other. Opt in with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

🩺 /doctor now fixes

The setup checkup (/doctor, alias /checkup) went from diagnosing issues to actually fixing them.

🖥️ In-app browser on Desktop

Claude Desktop can now pull up docs and interact with sites in-app — no context switch.

🛡️ Auto mode, hardened

Auto mode now blocks transcript tampering and asks before destructive commands like rm -rf.

↩️ /rewind past /clear & /cd

/rewind can now resume from before a /clear, and /cd <path> switches working directory while preserving the prompt cache.

This surface moves fast. For the authoritative, dated list, see the official Claude Code “What’s New” weekly digests.

🧩

SECTION 14

Plugins & Marketplaces

Bundle your whole setup into one installable unit

Mastered

⚡ WHY IT MATTERSOne plugin bundles commands, agents, skills, hooks, and MCP servers so a whole capability set installs into any project or team in one move.

A single Plugin package bundling Commands, Agents, Skills, Hooks, and MCP servers, installing into a Claude Code project and staying portable across other projects and teams.

🎬 Animation, explained

The outer container is one plugin — a single installable unit.
Inside it, five component types travel together: slash commands, subagents, skills, hooks, and MCP servers.
The animation shows the whole bundle dropping into a Claude Code project in one install step — no five separate setups.
Then it lifts out and lands again unchanged: plugins stay portable across machines, projects, and marketplaces.

A plugin packages commands, agents, skills, hooks, and MCP servers into a single thing you install once and reuse everywhere.

Building a great Claude Code setup takes work: custom skills, subagents, format-on-save hooks, the right MCP servers. A plugin wraps all of that into one directory you can share. Install it on a new machine, hand it to a teammate, or drop it into every repo and your capabilities travel with you.

What lives inside a plugin

A plugin is a self-contained folder. Components are auto-discovered from the directory layout at the plugin root, so you rarely have to wire anything up by hand.

skills/

Each skill is a folder with a SKILL.md. Namespaced as /plugin-name:skill-name to avoid collisions.

agents/

Subagents shipped with the plugin. (Plugin agents ignore hooks, mcpServers, and permissionMode for security.)

hooks/hooks.json

Lifecycle hooks (auto-format, audit logs, policy gates) that activate when the plugin is enabled.

.mcp.json

MCP servers the plugin needs, started automatically alongside the plugin.

bin/, themes/, output-styles/

Executables added to the Bash PATH, plus themes and output styles.

.claude-plugin/plugin.json

The only file inside .claude-plugin/. It is OPTIONAL — used to set name, version, and author.

Components live at the plugin ROOT, not inside .claude-plugin/. Only plugin.json goes in .claude-plugin/.

The /plugin manager

Run /plugin to open the manager — a tabbed interface you cycle with Tab / Shift+Tab.

Tab	What it does
Discover	Browse plugins from every marketplace you've added.
Installed	Enable, disable, or uninstall, grouped by scope.
Marketplaces	Add, remove, or update marketplace sources.
Errors	See what failed to load and why.

Recent versions also show a Context cost estimate, a Last updated date, and a 'Will install' breakdown listing exactly which commands, agents, skills, hooks, and MCP/LSP servers a plugin adds before you commit.

Install your first plugin

Anthropic ships a curated marketplace, claude-plugins-official, that is available automatically in every install — no setup needed.

# Install the official GitHub plugin (user scope by default)
/plugin install github@claude-plugins-official

# Activate it this session without restarting
/reload-plugins

The syntax is always plugin-name@marketplace-name. That @marketplace suffix is how Claude knows which catalog the plugin came from.

After installing, enabling, or disabling mid-session, run /reload-plugins to apply changes instead of restarting Claude.

Add a marketplace

A marketplace is just a repo (or hosted file) with a .claude-plugin/marketplace.json catalog at its root. Add one, then install from it.

# Add Anthropic's community marketplace
/plugin marketplace add anthropics/claude-plugins-community

# Now install any plugin it lists
/plugin install commit-commands@claude-community

Sources are flexible. Use whichever fits where your plugins live.

Source type	Example
GitHub owner/repo	`anthropics/claude-code`
Git URL (GitLab, self-hosted)	`https://gitlab.com/team/cc-plugins.git`
Local path	`./my-marketplace` or a `marketplace.json`
Hosted URL	remote `marketplace.json`

Shortcuts: /plugin market works for marketplace, and rm works for remove.

Share a plugin with your whole team

This is the real payoff. Install at project scope and the plugin is recorded in enabledPlugins in .claude/settings.json — which you commit. Every teammate who clones the repo gets the same setup.

# Write the plugin into the shared project settings file
/plugin install lint-suite@claude-community --scope project

One reviewer agent, one format hook, one test command — defined once, used by everyone. No more 'works on my machine' tooling drift.

Scripted install (CLI, non-interactive)

For CI, dotfiles, or onboarding scripts, use the non-interactive claude plugin subcommands.

# Add a marketplace and install without the UI
claude plugin marketplace add anthropics/claude-plugins-community
claude plugin install commit-commands@claude-community

# Audit what's installed (machine-readable)
claude plugin list --json

Validate before you publish: claude plugin validate checks plugin.json, every skill/agent frontmatter, and hooks.json. Scaffold a fresh one with plugin init --with skills,agents,hooks,mcp.

Build a plugin in five steps

Run plugin init (or make a folder) and drop components into skills/, agents/, hooks/, and .mcp.json at the root.
Reference bundled files with ${CLAUDE_PLUGIN_ROOT} in hooks and MCP configs — plugins are copied to a cache, so absolute paths break.
Optionally add .claude-plugin/plugin.json with a name, version, and author. Skip version and the git commit SHA is used.
Run claude plugin validate to catch frontmatter and manifest errors.
Push to a repo, add a .claude-plugin/marketplace.json catalog, and share the owner/repo as a marketplace source.

Skip the marketplace entirely for local dev: any folder under a skills directory with a .claude-plugin/plugin.json loads in place as name@skills-dir on the next session — no install, no cache. Great for iterating on a plugin you're authoring.

Anthropic does NOT verify plugin contents. A plugin can add hooks, MCP servers, and bin/ executables that run on your machine. Read the 'Will install' list and trust the source before installing.

Why plugins win

Skills, agents, and hooks scattered across ~/.claude/ and individual repos are hard to move and easy to forget. A plugin makes that whole stack portable: version it, audit its context cost in /plugin, enable it per project, and update it when the source changes.

Think in layers: user scope for your personal toolkit on every project, project scope (committed) for the standards your whole team must share.

🔀

SECTION 15

Multi-Agent Workflows & Loops

From one agent to a coordinated team — without burning your token budget

Mastered

⚡ WHY IT MATTERSFan out independent work to parallel subagents, then pipeline through compose and verify to ship complex tasks in a fraction of the time.

A Coordinator fans out to parallel Research subagents whose outputs pipeline through parallel Compose then Verify stages before fanning in to a merged result.

🎬 Animation, explained

The Coordinator at the top splits the job into independent research streams — the first fan-out.
Each stream then flows through its own Compose and Verify stages as a pipeline: stream A can be verifying while stream B is still researching. No stage waits for stragglers.
The verified outputs fan back in to a single merge point, where the coordinator synthesizes one answer.
Takeaway: fan-out → pipeline → fan-in is the master shape of multi-agent work; everything else is a variation.

One agent is a worker; many agents are an assembly line that you steer from a single chair.

Subagents each run in their own fresh context window with their own tools. They do the heavy reading, then hand back a short summary. That is the whole trick: keep your main context clean while the noisy work happens elsewhere.

A subagent starts cold. It has no memory of your chat, must gather its own context, and returns only what it reports. Great for self-contained work, bad for tasks needing constant back-and-forth.

When orchestration is worth it (and when it is not)

The core constraint never changes: context fills fast and quality drops as it fills. Multi-agent setups exist to protect that budget, not to look clever. Spending more tokens only pays off when the work is parallelizable or verbose.

Use subagents / loops when	Stay in the main chat when
Task spits out verbose output you do not need to keep	You need frequent back-and-forth steering
Several investigations are independent (fan-out)	Phases share the same context
You want to enforce tool/permission limits	The change is a quick one-line diff
Work is self-contained and returns a summary	Latency matters (cold start costs time)

Nested delegation is now real: subagents can spawn their own subagents up to 5 levels deep. Use it sparingly — every level adds cold-start cost and makes the run harder to follow. For anything genuinely large, prefer a dynamic workflow (below), which keeps the whole tree coordinated and visible.

Pattern 1: Parallel fan-out (research)

Spin up several agents at once on independent slices of a problem. Each explores in its own context; the main thread then synthesizes their summaries. This works best when the paths truly do not overlap.

Investigate our auth system. Run three subagents in parallel:
1. Explore: map every login/session file and how tokens are issued
2. Explore: list all places that read or write user roles
3. Explore: find rate-limiting and lockout logic
Then summarize the attack surface in one page.

The built-in Explore agent is perfect here: it is Haiku-backed, read-only, and skips CLAUDE.md and git status to start lean. Cheap, fast, disposable.

Scope every fan-out task narrowly. An unscoped "investigate the codebase" causes infinite exploration that floods context. Name files, set boundaries, demand a short report.

Pattern 2: The pipeline (research → compose → verify)

Some work is a relay, not a sprint. Each stage runs as a separate subagent so a bloated research phase never pollutes the writing phase. Define small, single-purpose agents in .claude/agents/.

---
name: researcher
description: Gathers facts and returns a tight brief. Use proactively.
tools: Read, Grep, Glob, Bash
model: haiku
---
You investigate, then return ONLY a bulleted brief with file paths.
Never edit files. Never speculate beyond what you found.

A reviewer agent caps cost the same way by restricting tools and using a cheaper model:

---
name: verifier
description: Checks work against the brief and the tests.
tools: Read, Bash
model: sonnet
disallowedTools: Write, Edit
---
Run the test suite. Report PASS/FAIL plus the smallest fix list.

Match the model to the job. Use haiku for search and high-volume sub-tasks, sonnet for most production work, and opus for long-horizon reasoning. The model frontmatter field defaults to inherit.

Pattern 3: Find → fix → test loops

An autonomous loop repeats a step until a quality gate passes. The cleanest gate in Claude Code is a Stop hook: it fires when Claude finishes and can refuse to let it stop until tests are green.

{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "npm test --silent || { echo 'TESTS FAILED' >&2; exit 2; }"
      }]
    }]
  }
}

Have the hook script exit 2 on failure: that blocks the stop and feeds stderr back to Claude as a to-do, so it keeps fixing. Exit 0 lets it finish.

Always guard against infinite loops. Check the stop_hook_active field in the hook input and exit 0 when it is true, or your loop never ends.

Headless loops for scripts and CI

Drive Claude non-interactively with -p (print mode). Capture the session ID as JSON, then resume it to chain steps without losing state.

sid=$(claude -p "Find the failing test and its cause" \
        --output-format json | jq -r '.session_id')

claude -p "Fix it, then run the suite until green" \
        --resume "$sid" \
        --permission-mode acceptEdits \
        --allowedTools "Edit,Bash(npm test)"

For reproducible CI runs, add --bare to skip auto-discovery of hooks, skills, MCP, and CLAUDE.md, then load only what you need explicitly. It is the recommended mode for scripted and SDK calls.

--permission-mode dontAsk denies anything not in your allow rules; acceptEdits auto-approves file writes plus safe filesystem commands. Pair with --allowedTools so locked-down loops never hang on a prompt.

Dynamic workflows: orchestration Claude writes for you

The patterns above are ones you design by hand. Dynamic workflows (GA across the CLI, Desktop, and VS Code) let Claude design the orchestration itself: from your prompt it writes an orchestration script that fans work out across tens to hundreds of parallel subagents in a single background run, checks the results, and hands you one coordinated answer. Work you'd normally scope in quarters can land in days.

Two ways to start one:

# 1. Just ask
> create a workflow to migrate every component off the legacy API

# 2. Turn on ultracode — sets effort to xhigh and lets Claude
#    decide when a task is big enough to warrant a workflow
/config    # enable "ultracode" (also on the effort menu)

/workflows # watch runs: live agent tree, status, per-run activity

How it reaches answers a single pass can't: agents attack the problem from independent angles, other agents try to refute what they found, and the run keeps iterating until the answers converge. Progress is saved as it goes, so an interrupted job resumes instead of restarting, and because the coordination happens outside your conversation the plan stays on track no matter how large the task grows.

Dynamic workflows consume meaningfully more usage than a normal session. The first time one triggers, Claude Code shows what's about to run and asks you to confirm; a "Dynamic workflow size" setting (small/medium/large) advises how many agents to spawn, and org admins can disable workflows via managed settings.

On by default for Max, Team, Enterprise, and API use; Pro users enable them in /config. Pair with auto mode for the smoothest experience — each subagent spawn is checked by the permission classifier before it launches.

Background agents: dispatch, detach, come back to a PR

Long jobs don't block you. Subagents run in the background by default, and the claude agents view is mission control for parallel background sessions — dispatch new ones, peek at status, attach to steer, detach to move on.

claude agents

The background-session dashboard. Sessions that need input or finish fire the Notification hook (agent_needs_input / agent_completed). --json (with --all) prints them for scripting.

! background shell

In the agents view, type ! <cmd> (or claude --bg --exec '<cmd>') to run a shell command as an attachable background session.

Auto draft PRs

A background agent that finishes code work in a worktree now commits, pushes, and opens a draft PR on its own instead of stopping to ask.

isolation: worktree

Runs a subagent in a throwaway git worktree so parallel agents never collide; memory: project gives it persistent cross-session memory.

Point research/exploration subagents at haiku (model: haiku in frontmatter): they read a lot of files and burn context, so running the sweep cheaply keeps Opus free on the main thread. And ask agents for evidence — the test output, the command and what it returned, a screenshot — not just a claim of success; reviewing evidence is faster than re-running the check, especially for sessions you weren't watching.

Agent teams: many sessions, one goal (experimental)

Subagents and dynamic workflows both run inside a single session. Agent teams go one level up: they coordinate several independent Claude Code sessions — a lead and its teammates — that share one task list and message each other directly, so different agents can own different parts of a large job in parallel.

# Off by default — enable the experiment, then start a team
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

Experimental and off by default. It’s distinct from subagents (delegation within your session) and dynamic workflows (orchestration Claude authors for you) — reach for a team only when you genuinely want multiple peer sessions collaborating, not just fanned-out workers.

Token discipline for any multi-agent run

Plan in plan mode first (Shift+Tab) so a stable prefix caches once and every execution turn reads from it.
Delegate broad searches to subagents; let only their short summaries enter main context.
Write the plan to a file like PLAN.md so it survives /clear between phases.
/clear between unrelated tasks — fresh context is free.
Check spend with /cost and live usage with /context.

Reserve opus and ultrathink for the orchestrator's planning and synthesis, where reasoning depth pays off. Let cheap worker agents do the grunt work. That is where heavy orchestration actually beats a single agent on both quality and cost.

🛠️

SECTION 16

Claude API & Agent SDK

Build on Claude — the raw Messages API and the Agent SDK that powers Claude Code

Mastered

⚡ WHY IT MATTERSThe agent loop is just one self-feeding cycle: the model keeps calling tools until it has enough to answer.

A cycle showing Prompt feeding the Model, which emits tool_use to run a tool, whose tool_result loops back to the Model, repeating until the loop exits to the Final answer.

🎬 Animation, explained

Your prompt enters the model, which decides it needs real-world information and emits a tool_use request instead of a final answer.
The dot carries that request to the tool — your code runs it (an API call, a file read, a query).
The tool_result loops back into the model, which now reasons with fresh data — and may call another tool.
The circling continues until the model stops asking for tools and answers. That loop is an agent; everything else is packaging.

There are two ways to build on Claude: call the model directly with the Messages API, or reuse Claude Code's whole agent loop with the Agent SDK.

Pick the layer that matches your job. The Messages API is one stateless HTTP call where you own everything. The Agent SDK wraps the API in a ready-made agent loop with tools, memory, and MCP. Claude Code is the interactive CLI built on that same SDK.

Messages API

One model call. You write the tool loop, manage history, and control every byte. Best for chat, classification, extraction, RAG.

Agent SDK

The Claude Code agent loop as a library. File tools, Bash, MCP, and subagents work out of the box. Best for headless automation.

Claude Code

The interactive terminal app on top of the SDK. Best for hands-on coding and exploration, not unattended pipelines.

Layer 1 — The Anthropic Messages API

Every request is a POST to /v1/messages at https://api.anthropic.com. It is stateless: you resend the full conversation on each call (up to 100,000 messages). Three headers are required.

Header	Value
`x-api-key`	your API key
`anthropic-version`	`2023-06-01`
`content-type`	`application/json`

Current models & list pricing

Model ID	Context	Max output	$/MTok (in / out)
`claude-opus-5`	1M	128k	$5 / $25
`claude-sonnet-5`	1M	128k	$2 / $10 introductory
`claude-haiku-4-5`	200k	64k	$1 / $5

API prices verified July 28, 2026. Sonnet 5 moves to $3 / $15 on September 1, 2026.

Batch API is 50% off and cache reads cost roughly 1/10th of base input. Route cheap work to Haiku, hard reasoning to Opus.

Deprecations: claude-sonnet-4-20250514 and claude-opus-4-20250514 retired June 15, 2026; claude-opus-4-1 retires August 5, 2026. For current new integrations, start with Sonnet 5 or Opus 5 and verify lifecycle dates in the model-deprecations page.

Minimal Python request

pip install anthropic   # Python 3.9+, reads ANTHROPIC_API_KEY from env

from anthropic import Anthropic

client = Anthropic()
msg = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    system="You are a terse senior network engineer.",
    messages=[{"role": "user", "content": "Explain BGP route reflectors in 2 sentences."}],
)
print(msg.content[0].text)

Minimal TypeScript request

npm install @anthropic-ai/sdk   // 4.9+, Node 20+; also Deno/Bun/browser

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY
const msg = await client.messages.create({
  model: "claude-sonnet-5",
  max_tokens: 1024,
  system: "You are a terse senior network engineer.",
  messages: [{ role: "user", content: "Explain BGP route reflectors in 2 sentences." }],
});
console.log(msg.content[0].type === "text" && msg.content[0].text);

System prompts use the top-level system parameter (string or array of text blocks); there is no system role in the Messages API array. Opus 5 supports beta mid-conversation tool changes that preserve the prompt cache when tools are added or removed between turns.

Tool use (function calling) & the agentic loop

Pass a tools array — each tool has a name, description, and JSON-Schema input_schema. When Claude wants a tool it returns stop_reason: "tool_use". You run the tool and feed the result back as a tool_result block, then loop.

tools = [{
    "name": "get_weather",
    "description": "Current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "Weather in Detroit?"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-5", max_tokens=1024, tools=tools, messages=messages
    )
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        break
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            out = run_my_tool(block.name, block.input)   # your code
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": out,
            })
    messages.append({"role": "user", "content": results})

Loop while stop_reason == "tool_use"; exit on end_turn, max_tokens, stop_sequence, or refusal. Control calls with tool_choice (auto, any, named tool, or none) and disable_parallel_tool_use: true to force at most one call.

Server-executed tools run inside Anthropic's infra — you only enable them and read results. Example: {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}. Long server loops may return stop_reason: "pause_turn"; just resend the response to continue.

Streaming, vision & PDFs

Set stream: true for SSE. The event flow is message_start then per-block content_block_start / _delta / _stop, then message_delta and message_stop. SDK helpers hide the plumbing.

with client.messages.stream(
    model="claude-sonnet-5", max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about subnets."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

All current models accept images via {"type":"image","source":{...}} (base64 with media_type, or a URL source). PDFs go through document blocks and Claude reads both text and visuals.

messages=[{"role": "user", "content": [
    {"type": "image", "source": {"type": "url", "url": "https://example.com/diagram.png"}},
    {"type": "text", "text": "What VLAN topology is this?"},
]}]

SDKs require streaming for large max_tokens to avoid HTTP timeouts on long generations.

Prompt caching — cut repeat cost ~90%

Mark stable, reused content (big system prompts, docs, tool defs) with cache_control. Cache reads cost about 1/10th of base input. Place the breakpoint after the static prefix.

system=[
    {"type": "text", "text": LONG_STYLE_GUIDE,
     "cache_control": {"type": "ephemeral"}},
]

TTL is 5 minutes by default or 1 hour (GA, no beta header). Minimum cacheable length is 512 tokens on Opus 5, 1,024 on Opus 4.8, and 4,096 on Haiku 4.5. Usage reports cache_creation_input_tokens and cache_read_input_tokens.

Pre-warm the cache by sending max_tokens: 0 — the API writes the cache at the breakpoint and returns an empty content array with stop_reason: "max_tokens" and full usage, generating no output. (Rejected inside Batch requests.)

Batch API — 50% off, async at scale

For non-urgent bulk jobs (evals, backfills, large extractions), submit many requests at once for half price. Poll until done, then read results.

batch = client.messages.batches.create(requests=[
    {"custom_id": "job-1",
     "params": {"model": "claude-haiku-4-5", "max_tokens": 256,
                "messages": [{"role": "user", "content": "Classify: ..."}]}},
])
# poll batch.processing_status until == "ended", then:
for r in client.messages.batches.results(batch.id):
    print(r.custom_id, r.result)

The output-300k-2026-03-24 beta header raises max_tokens to 300k on Opus 5, Opus 4.8/4.7/4.6, Sonnet 5, and Sonnet 4.6 — on the Message Batches API only, not the synchronous Messages API, where Opus 5 stays at 128k. Extended output is not available on Amazon Bedrock, Google Cloud, or Microsoft Foundry.

Extended thinking & structured outputs

Opus 5 and Sonnet 5 use adaptive thinking by default; set effort to control reasoning depth, token use, and latency. Earlier Claude 4/4.5 and Sonnet 4.6 integrations may use {"type":"enabled","budget_tokens": N}, while Opus 4.6/4.7/4.8 use {"type":"adaptive"}. With tool use, pass thinking blocks back unchanged and verify the supported tool_choice modes for the selected model.

Structured outputs are GA on Sonnet 4.5 / Opus 4.5 / Haiku 4.5 via output_config.format (no beta header) when you need guaranteed JSON shapes.

Official SDKs also cover Java 8+, Go 1.23+, Ruby 3.2+, C#, and PHP 8.1+, plus an ant CLI. Companion endpoints: Models API, Token Counting API, Files API, and the Admin API.

Layer 2 — The Claude Agent SDK

The Agent SDK (formerly the Claude Code SDK) runs the exact same agent loop that powers Claude Code, inside your process. It handles tool execution, context management, retries, and prompt caching so you don't write the tool loop yourself.

# Python (3.10+)
pip install claude-agent-sdk

# TypeScript — bundles a native Claude Code binary, no CLI install needed
npm install @anthropic-ai/claude-agent-sdk

Tiny Python agent

import anyio
from claude_agent_sdk import query

async def main():
    async for message in query(
        prompt="Find every TODO in src/ and summarize them."
    ):
        print(message)

anyio.run(main)

Python has two entry points: query() creates a fresh session per call and returns an async iterator; ClaudeSDKClient reuses one session across exchanges with manual lifecycle (connect / query / receive_response / interrupt / disconnect) and supports interrupts.

Tiny TypeScript agent

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Find every TODO in src/ and summarize them.",
})) {
  console.log(message);
}

In TypeScript, query() returns an async generator (the Query object) that maintains the session and exposes interrupt(), setPermissionMode(), setModel(), setMcpServers(), and close().

Built-in tools — nothing to implement

Files

Read, Write, Edit, Glob, Grep

Shell & web

Bash, WebSearch, WebFetch, Monitor

Orchestration

Agent (subagents), Skill, TaskCreate/TaskUpdate, ToolSearch, AskUserQuestion

Read-only tools (Read, Glob, Grep, read-only MCP) can run concurrently; state-mutating tools (Edit, Write, Bash) run sequentially.

Permission modes — the safety dial

Mode	Behavior
`default`	Unmatched tools trigger the `canUseTool` callback
`acceptEdits`	Auto-approves file edits + filesystem commands (mkdir/rm/mv/cp)
`plan`	Read-only tools only
`dontAsk`	Denies anything not pre-approved (never calls the callback) — for CI
`bypassPermissions`	Approves all, but deny rules and hooks still apply

Evaluation order: hooks first, then deny rules (a deny blocks even in bypassPermissions), then the mode, then allow rules, then canUseTool. allowed_tools only pre-approves — it does NOT restrict Claude to those tools.

Custom tools, MCP & subagents

Define in-process tools with the @tool decorator (Python) or tool() helper (TypeScript, Zod schema), bundle with create_sdk_mcp_server() / createSdkMcpServer(), and pass via mcpServers. Return isError: true to signal failure without halting the loop; an uncaught exception stops it.

from claude_agent_sdk import tool, create_sdk_mcp_server

@tool("add", "Add two numbers", {"a": float, "b": float})
async def add(args):
    return {"content": [{"type": "text", "text": str(args["a"] + args["b"])}]}

server = create_sdk_mcp_server(name="math", tools=[add])
# pass server via the mcp_servers option of query()/ClaudeSDKClient

External MCP servers go through mcpServers/mcp_servers or a .mcp.json file, over stdio, HTTP/SSE, or in-process. Tool names follow mcp__{server}__{tool}; auto-approve with wildcards like mcp__github__*.

Subagents run in a fresh, isolated context and return only their final answer to the parent — keeping the main context lean. Define them via the agents option, as markdown in .claude/agents/, or use general-purpose. They are invoked via the Agent tool (renamed from Task in v2.1.63), so add Agent to allowedTools to auto-approve. Subagents can spawn their own subagents, up to 5 levels deep.

Hooks, sessions & cost caps

Hooks fire at lifecycle points (PreToolUse, PostToolUse, UserPromptSubmit, Stop, SubagentStop, PreCompact, and more). A PreToolUse hook can return permissionDecision allow/deny/ask/defer and even rewrite the input.

Capture ResultMessage.session_id to resume later via resume. The SDK also supports forking sessions, file checkpointing (rewind files to a prior message), reasoning effort (low→max), and guardrails max_turns and max_budget_usd.

Headless / CI automation

For pipelines, drive Claude Code non-interactively with claude -p.

claude -p "run the test suite and fix any failures" \
  --allowedTools "Bash,Read,Edit" \
  --permission-mode acceptEdits \
  --output-format json

Use --bare for reproducible CI: it skips auto-discovery of hooks, skills, plugins, MCP servers, auto-memory, and CLAUDE.md (auth must come from ANTHROPIC_API_KEY). Anthropic says --bare will become the default for -p in a future release.

Starting June 15 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive limits.

Which layer should I use?

You want to…	Use
Add chat, classify/extract/summarize text, build RAG	Messages API
Run a model with full control over the tool loop and history	Messages API
Ship an autonomous agent that reads/writes files, runs Bash, uses MCP, spawns subagents	Agent SDK
Automate coding tasks headless in CI	Agent SDK / `claude -p`
Code hands-on in the terminal, explore a repo interactively	Claude Code CLI

Rule of thumb: if you'd otherwise hand-write a tool loop with file access, retries, and context management, the Agent SDK already did it. If you just need the model's output, the Messages API is leaner. Full docs: docs.claude.com and code.claude.com/docs/en/agent-sdk/.

🎯

SECTION 17

Prompting for Best Results

Precise input, powerful output — stop guessing, start steering

Mastered

Claude can infer intent, but it can't read your mind — the more precise your instructions, the fewer corrections you'll need.

Great results come from clear inputs. Tell Claude what to do, give it context, show it an example, and set the rules for "done." Then iterate in small steps and fix course early.

Anthropic's core rule: "The more precise your instructions, the fewer corrections you'll need." Vague asks waste tokens on guessing; specific asks land on the first try.

The seven habits of a strong prompt

Be specific

Name the exact task, file, and outcome. Replace "fix the bug" with the symptom, the file, and the expected behavior.

Give context

Say why, who it's for, and what came before. Context shapes the right answer; missing context invites wrong guesses.

Show an example

Point to a pattern Claude should copy: "follow the style in userService.ts."

State constraints

List the rules: no new dependencies, keep it under 50 lines, must pass pytest.

Define "done"

Give acceptance criteria so Claude can self-check before stopping.

Plan first

Ask for a plan before any edits. Review it, then say go. Cheaper to fix a plan than a diff.

Iterate small

One scoped change at a time beats one giant request. Smaller steps are easier to verify and correct.

Before and after: vague vs strong

The same goal, written two ways. The strong version names the file, the rule, and the test that proves success.

Vague (forces guessing)

fix the login bug

Strong (one-shot ready)

In @src/auth/login.ts, users with expired sessions get a
500 instead of a 401 redirect to /login.

Reproduce: call loginHandler() with an expired token.
Constraint: don't change the token schema; reuse the
existing redirectToLogin() helper.
Done when: the expired-token test in auth.test.ts passes
and `npm test auth` is green.

Notice the four parts: where (file), what's wrong (symptom), the rules (constraints), and the proof (acceptance test). Most strong prompts hit all four.

Point at files with @

Typing @ opens a file picker so Claude reads the exact file — no broad searching that floods context. Reference the real path instead of describing the code.

Refactor @src/api/users.py to use the repository pattern
in @src/api/orders.py as the reference. Keep public
function signatures unchanged.

File reads dominate context usage. Pointing Claude at one or two specific files keeps the window lean and the cache stable, so follow-up turns stay fast and cheap.

Ask for a plan first

Plan Mode is the highest-leverage habit. Toggle it with Shift+Tab (cycles normal / plan / auto-accept). In plan mode Claude can read, search, and think — but it literally cannot edit files until you approve.

Enter plan mode with Shift+Tab or launch with claude --permission-mode plan.
Let Claude explore the relevant files and propose a plan.
Read the plan, correct the approach, then approve it.
Claude executes against a stable, cached context.

Skip plan mode for one-sentence diffs — typos, log lines, renames. Use it for anything multi-file or risky.

Dial reasoning with thinking keywords

Extended thinking is on by default. Escalate the budget with keywords when a problem is genuinely hard.

Phrase	Use for
`think`	Moderately tricky logic or a small refactor
`think hard` / `think harder`	Multi-file changes, non-obvious bugs
`ultrathink`	Architecture, hard debugging, costly decisions

More thinking improves reasoning depth, not rule compliance. Ultrathink can reason for hundreds of tokens and still break an explicit output rule — so still state your constraints plainly.

Correct course early

The moment an answer drifts, stop it. Don't let a wrong approach pile up in context.

Stop it

Press Esc to interrupt mid-action with context preserved, then redirect.

Erase the failure

Press Esc Esc (or /rewind) to undo a bad attempt. Rewinding deletes it from context entirely — cleaner than correcting on top of it.

Reset after two misses

After two failed corrections on the same issue, run /clear and re-prompt with a sharper ask. Fresh context is free.

Correcting leaves the failed approach polluting the window; rewinding removes it. Prefer rewind over "no, try again" when an attempt goes off the rails.

Do / Don't grid

Do	Don't
Name the exact file: `@src/db/pool.ts`	Say "the database file"
State acceptance criteria ("done when tests pass")	Leave "done" undefined
Ask for a plan before edits	Let Claude rewrite five files unprompted
Point to a pattern to copy	Assume Claude knows your house style
Scope investigations narrowly	Say "investigate the codebase" (floods context)
Iterate one change at a time	Bundle ten unrelated asks in one prompt
`/clear` between unrelated tasks	Pile new tasks onto stale context
Rewind a failed attempt	Keep correcting on top of a broken approach

Delegate research to keep context clean

Unscoped "investigate" requests cause endless exploration that floods your main window. Hand research-heavy work to a subagent that runs in its own fresh context and returns only a short summary.

Use the Explore subagent to find every place we call the
Stripe API, then report back a one-paragraph summary with
file paths. Don't edit anything.

Save reusable plans and decisions to a file like PLAN.md. It survives /compact and /clear, so a fresh session can pick up exactly where you left off.

💬

SECTION 18

Claude.ai: Chat, Projects & Artifacts

Turn the chat box into a coworking studio with persistent knowledge and live artifacts

Mastered

The Claude.ai app is more than a chat box: Projects give Claude a memory for your work, Artifacts give it a canvas, and styles give it your voice.

Most people just type into the chat and start over every time. Power users wire up Projects, Artifacts, and custom instructions so Claude shows up already knowing the context. This section shows how those pieces snap together into a real coworking setup.

Chat

The conversation. Where you ask, refine, and steer.

Projects

A workspace with its own chat history, knowledge base, and instructions.

Artifacts

A live side window rendering code, docs, and apps next to the chat.

Styles

Reusable voice and formatting rules applied to Claude's output.

Projects: a memory for a body of work

A Project is a self-contained workspace. It keeps its own chat history plus a knowledge base of files you upload once and reuse in every chat inside it. Projects are on every plan, including Free, which is capped at 5 projects.

The two pillars are project knowledge (uploaded docs, text, and code) and project instructions (custom instructions that set tone and role for every chat in that project).

Use one Project per ongoing effort, not per question. A "Network Architecture" project with your diagrams, configs, and a role instruction means every new chat starts already grounded, with no re-pasting.

When knowledge gets big: RAG

On paid plans (Pro, Max, Team, Enterprise), when your knowledge base nears the context limit, Claude automatically switches to Retrieval Augmented Generation. RAG expands capacity up to 10x by pulling only the relevant chunks per question instead of stuffing everything into context.

A claude.ai Project (lives in your account, holds a cloud knowledge base, shareable on Team/Enterprise) is NOT the same as a Cowork project in Claude Desktop (lives only on your computer, points at local folders). You can link a claude.ai Project into a Cowork project without merging them.

Project instructions: set the role once

Project instructions are the single highest-leverage Project setting. They tell Claude who it is and how to answer for the whole project, so you stop repeating yourself.

You are my senior network architecture reviewer.
Audience: enterprise infra teams (Juniper, Arista, Palo Alto).
Always: cite the config file by name, flag security risks first,
and give a one-line summary before the detail.
Never invent interface names — ask if a config is missing.

Styles: control the voice

Styles change how Claude writes, independent of any Project. Open the Search and tools menu and pick Use style. Four presets ship in: Normal Concise Formal Explanatory. Normal and Concise are always visible and can't be hidden.

Custom styles are where it gets useful. You can build one two ways:

Upload or paste a writing sample (pdf, doc, or txt) so Claude matches your voice.
Or pick Describe style for a starting point, then use Use custom instructions (advanced) to give exact rules.

Styles can be renamed, previewed, edited (by Claude or via Set Instructions Manually), and deleted.

Paste a few of your best LinkedIn posts or past emails into a custom style. Claude then drafts in your real tone instead of generic AI prose — far less rewriting on your end.

Artifacts: the coworking canvas

Artifacts are a dedicated side window for standalone content: code, documents, data visualizations, and full apps that render live beside the chat. They're generally available on every plan and on iOS and Android, with their own Artifacts space in the sidebar.

The workflow is conversational. You ask, the artifact appears, you point at what's wrong, and Claude edits it in place — no copy-paste loop.

Live preview

HTML, React, and visualizations render instantly so you see results, not just code.

Iterate in place

"Make the header sticky" edits the same artifact instead of restating everything.

AI-powered apps

Embed Claude into a shareable app via a text-based API — no API keys needed.

MCP + storage

On paid plans, artifacts read/write tools like Slack and persist data.

AI-powered Artifacts

You can build a shareable app with Claude intelligence baked in. It runs on Anthropic's infrastructure, needs no API keys, and — importantly — usage counts against each end-user's own subscription, not yours. So you can share a tool without paying for everyone's calls.

On Pro, Max, Team, and Enterprise (web and desktop), Artifacts also support MCP integration and persistent storage (personal or shared). That lets an artifact read and write to external tools like Asana, Google Calendar, Slack, and custom MCP servers.

Sharing differs by plan. Free and Pro users can publish and remix Artifacts with the community. Team and Enterprise users share Projects (and the Artifacts inside them) in a secure shared workspace.

File uploads and file creation

Drop files straight into a chat for analysis: CSV, TSV, PDFs, images, and more. Multiple files work in one chat and stay downloadable for the whole conversation.

Claude can also create files. On all plans (web, Desktop, Mobile), it generates downloadable .xlsx, .pptx, .docx, and PDF files, plus PNG charts and Python data analysis. Turn it on at Settings > Capabilities > Code execution and file creation.

In a Project, your knowledge-base files are now reachable from Claude's code-execution environment while staying in context, and outputs can save directly to Google Drive. Upload a raw CSV, ask for a cleaned pivot, and get a real .xlsx back.

Web search and Research

Web search must be turned on by you (toggle it in chat settings/tools). Once on, Claude decides when a prompt needs fresh data, searches, and answers with citations to its sources.

Research mode is a paid-only (Pro/Max/Team/Enterprise) agentic step up. It runs many progressive searches across the web and connected internal sources like Gmail, Google Calendar, and Google Docs. It needs web search on, and you enable it from the + button > Research.

Web search is off by default and Claude won't browse without it. If you get a "my knowledge has a cutoff" answer to a current-events question, you forgot to enable the toggle.

Cowork: hand Claude the whole job, from any device

Claude Cowork is the app's agentic surface — where you don't just ask for an answer, you hand over the work. Claude operates across your files, calendar, email, messaging apps, the web, and whatever tools you connect, until the job is done. It began as a desktop-only feature and, as of July 2026, runs on web and mobile too (rolling out in beta, Max plans first).

Your work follows you

Start a task at your desk, check progress from your phone, pick up the finished output anywhere. Chat and Cowork now share one home, with projects and artifacts across both.

Runs in the background

Close the laptop and Claude keeps going. Scheduled tasks run with no device online — set Monday's client prep for 6am and review the drafted briefing over coffee.

You keep the decisions

When Claude hits a call only you can make, the question reaches your phone. You can redirect a draft mid-meeting; nothing ships until you've reviewed it.

Computer use (preview)

On Pro/Max, Claude can open files and click around your screen to do tasks itself — no setup — making it useful well beyond code.

Desktop remains the deepest surface: there Claude also reaches your local files and browser in an isolated VM. Web and mobile bring the same delegate-and-check-later loop to devices that can't install the app. Over 90% of Cowork usage is non-coding work — ops, research, content.

Putting it together: a coworking session

The real power is the combination. Here's a typical flow that uses all four surfaces at once.

Create a Project, upload your reference docs, and write a role instruction.
Pick (or build) a custom style so output lands in your voice.
Open a chat, turn on web search, and ask Claude to draft something.
Claude renders the draft as an Artifact you can preview live.
Refine by pointing at the artifact; export it as .docx or .pptx when done.

Surface	Holds	Best for
Project	Knowledge + instructions	Persistent context for one effort
Chat	The conversation	Asking and steering
Artifact	Rendered output	Live code, docs, and apps
Style	Voice rules	Consistent tone everywhere

Treat Project instructions like a tiny CLAUDE.md for the web app: keep them short and high-signal. A focused 5-line instruction beats a wall of text Claude has to wade through every chat.

💰

SECTION 19

Cost & Speed Optimization

Pay Haiku prices for Opus-quality work

Mastered

The cheapest token is the one you never send — route smart, cache big, and keep context lean.

Two levers control your bill: which model runs the task, and how many tokens it reads each turn. Master both and you cut cost 5–10x without losing quality. The single rule under everything: the context window fills up fast, and quality drops as it fills.

Pick the right model for the job

Opus 5 costs 5× more per input token than Haiku 4.5. Using Opus to fix a typo is like renting a crane to hang a picture. Match the model and effort to the difficulty, not your mood.

Model	In / Out (per MTok)	Speed	Use it for

Prices verified July 28, 2026. Sonnet 5 is shown at its temporary introductory price.

Anthropic's own rule of thumb: start with Haiku 4.5 and upgrade only when you hit a real capability gap. Don't reach for Opus by default.

Switch the active model mid-session with /model. Use /model default to let Claude auto-route harder turns to Opus for you.

/model haiku      # cheap, fast turns
/model sonnet     # the workhorse
/model opus       # bring out the heavy reasoning
/model default    # let Claude pick per task

Route from the command line

Set the model per run for scripts and one-offs. A headless triage pass on Haiku, then escalate only the hard cases.

# Cheap, scripted lint check on Haiku
cat build.log | claude -p "summarize the errors" --model haiku

# Escalate one tough bug to Opus
claude --model opus "think hard about the race condition in worker.py"

--model overrides both the model setting and the ANTHROPIC_MODEL env var. Add --fallback-model so jobs still run if your first choice is unavailable.

Push cheap, verbose work to subagents

A subagent on Haiku can churn through a noisy codebase search and hand back a short summary — the verbose output never pollutes your main (pricier) context. Set model: haiku in the agent frontmatter.

---
name: log-scanner
description: Use proactively to grep logs and return a short summary
tools: Read, Grep, Glob, Bash
model: haiku
---
Search the logs, extract only the failing requests, and report a 5-line summary.

Prompt caching: pay once for stable context

Caching is automatic in Claude Code — you never add cache_control yourself. The API caches by exact prefix match, so a cache hit costs just 0.1x base input (90% cheaper). The catch: any change anywhere in the prefix recomputes everything after it.

Operation	Cost vs base input
Cache hit (read)	0.1x — 90% off
5-minute cache write	1.25x
1-hour cache write	2x

Cache killers: switching models mid-task, editing CLAUDE.md mid-session, and /compact. Each invalidates the cache and forces a slow, expensive uncached turn. Do these at boundaries, not in the middle of a flow.

Plan mode is the secret to 90%+ cache hit rates. Front-load all your file reads and constraints once (the stable prefix), then every execution turn reads from that cached prefix. Toggle with Shift+Tab and explore before you edit.

Keep context lean

File reads dominate context, and a bloated window both costs more and degrades quality. Fresh context is free — use it.

/clear between tasks

Wipes history entirely. Use it whenever you switch to unrelated work so old tokens stop riding along on every turn.

/compact at breakpoints

Summarizes history into a compact form. Steer it: /compact focus on the API changes. Do it proactively around 50–75%, not at the auto-trigger.

Scope your @ files

Reference exact files and line ranges, not whole directories. Prefer rg/grep over dumping files into the chat.

/context to audit

Shows a live token breakdown by category so you can see what's eating the window before it bites.

Rewinding beats correcting. EscEsc (or /rewind) erases a failed attempt from context entirely, while correcting leaves the dead-end approach polluting every later turn.

Be specific to avoid re-work

Vague prompts cause endless exploration that floods context and burns tokens. "Investigate the auth bug" can spiral; a scoped request lands in one pass.

# Costly: unscoped, invites infinite exploration
claude "investigate why login is slow"

# Lean: names the file, states the constraint
claude "in src/auth/session.ts, find why validateToken() does a DB call per request; propose a cache"

Batch and reuse work

Stop paying startup costs over and over. Reuse sessions in scripts instead of cold-starting each time, and let the cache stay warm across follow-ups.

# Capture a session id, then continue it cheaply
session_id=$(claude -p "Start a review of the diff" --output-format json | jq -r '.session_id')
claude -p "Now check for type errors" --resume "$session_id"

For the API and async work, the Batch API runs jobs at 50% off (e.g. Sonnet 5 drops to $1.50 in / $7.50 out per MTok). Ideal for large, non-interactive jobs you don't need answered in real time.

Fast mode: pay for speed when latency is the bottleneck

Cost isn't the only axis — sometimes you just want the answer now. Fast mode (research preview) runs Claude Opus at up to 2.5× higher output tokens per second. That is Anthropic's reported figure for throughput — not time-to-first-token, and not a guaranteed latency SLA. It's the same model with the same quality, just a config that trades cost efficiency for speed. Toggle it in the CLI with /fast.

/fast          # toggle fast mode on/off (Claude Code CLI; not in VS Code)
claude --version   # verify your client supports the current Opus 5 fast tier

Model	Fast-mode price (in / out per MTok)	Notes
Opus 5	$10 / $50	Research preview; about 2× base price for up to 2.5× output tokens/sec
Opus 4.8	$10 / $50	Still supported for fast-mode workloads that have not migrated
Opus 4.7	Removed	Fast mode removed July 24, 2026; migrate to Opus 5 or Opus 4.8

Reach for it during interactive, latency-sensitive work: rapid iteration, live debugging, a tight deadline. Toggle it back off for batch or background jobs where cost matters more than the wait.

Fast mode is a research preview and Opus-only — Opus 5 and Opus 4.8, not Sonnet/Haiku. On subscription plans (Pro/Max/Team/Enterprise) it bills via usage credits only, outside your included limits, and Team/Enterprise Owners must enable it first. Not available on Amazon Bedrock, Google Cloud, Microsoft Foundry, or Claude Platform on AWS — but that restriction is on fast mode, not on Opus 5 itself, which runs normally on all of those platforms.

Trim CLAUDE.md ruthlessly

CLAUDE.md loads on every session and re-loads after every compaction — every line is a recurring tax. For each line ask: "Would removing this cause Claude to make mistakes?" If not, cut it. A bloated file also makes Claude ignore the rules that matter.

Watch the meter

You can't optimize what you don't measure. Check spend and usage as you go.

/cost       # token usage + estimated cost this session
/context    # live token breakdown by category
/mcp        # per-server tool cost — prune noisy servers

Reserve ultrathink for genuinely hard, costly-to-get-wrong decisions (architecture, multi-file debugging). For everyday work, plain think or think hard gives the depth you need without the token bill.

Your cost-saving habits, in order

Default to Haiku or Sonnet; reach for Opus only on hard reasoning.
Plan first (Shift+Tab) to write one stable, cached prefix.
Avoid mid-task model switches and CLAUDE.md edits — they nuke the cache.
/clear between unrelated tasks; /compact at logical breakpoints.
Scope prompts and @ files; rewind failures instead of correcting them.
Reuse sessions in scripts; batch non-urgent jobs at 50% off.
Check /cost and /context to catch waste early.

🔒

SECTION 20

Security Hardening

Give Claude autonomy without giving it the keys to your machine

Mastered

⚡ WHY IT MATTERSLeast privilege means every tool call must pass a deny-first gate before it can ever touch your system.

A tool request enters the permission check, where deny rules are evaluated first, then routed to Allow (run), Ask (prompt), or Deny (block).

🎬 Animation, explained

Every tool call Claude wants to make enters the permission check before anything executes.
Deny rules are evaluated first — a matching deny wins no matter what else matches. That ordering is the whole security model.
Surviving requests route three ways: Allow (runs silently), Ask (you get prompted), or Deny (blocked).
Takeaway: build your allowlist from things you'd approve every time, keep deny rules for the truly dangerous, and let Ask handle the gray zone.

Claude Code is powerful enough to edit files, run shells, and call external servers — so the question isn't whether to trust it, but how to scope that trust.

Security in Claude Code is enforced by the harness itself, not by the model. A prompt or CLAUDE.md can never widen what Claude is allowed to do — only your settings, permission modes, and hooks can. That separation is the whole game.

Key principle: permission rules are checked by Claude Code before any tool runs. Instructions inside the conversation cannot override them.

The permission model

Every tool call is matched against three rule types. They are evaluated deny → ask → allow, and the first match wins, so a deny always beats an allow at any other level.

Rule	Effect
allow	Run the tool with no prompt
ask	Prompt for confirmation each time
deny	Block it outright — always wins

Rules use the format Tool or Tool(specifier). Manage them interactively with /permissions or write them into settings.json.

{
  "permissions": {
    "allow": [
      "Read(src/**)",
      "Bash(npm test)",
      "Bash(npm run lint)"
    ],
    "ask": [
      "Bash(git push:*)"
    ],
    "deny": [
      "Read(.env)",
      "Read(./secrets/**)",
      "Bash(curl:*)"
    ]
  }
}

A bare-name deny like "Bash" removes the tool from Claude entirely. A scoped deny like "Bash(rm *)" only blocks matching calls. Choose the narrower one that fits the risk.

Settings precedence

When rules conflict, this order decides (highest to lowest). A deny at any level cannot be overridden by an allow at any other level.

Managed (admin) settings — not even CLI flags override these
Command-line arguments
Local project settings (.claude/settings.local.json)
Shared project settings (.claude/settings.json)
User settings (~/.claude/settings.json)

Permission modes

Modes set the default posture for unmatched tools. Cycle them with Shift+Tab or set permissions.defaultMode.

default

Reads freely, prompts on first use of anything that writes or runs.

plan

Read-only exploration. Enforced at the tool level — Claude literally cannot edit or run destructive commands.

acceptEdits

Auto-accepts file edits plus filesystem commands (mkdir, touch, rm, mv, cp, sed) in the working dir.

dontAsk

Auto-denies anything not pre-approved. Built for non-interactive CI.

auto

Auto-approves with a background safety classifier. Research preview — treat as experimental.

bypassPermissions

Skips all prompts. Only safe inside an isolated environment.

Start every non-trivial task in plan mode. Let Claude explore and propose a plan with zero write access, then switch out to implement.

The truth about --dangerously-skip-permissions

This flag (and the bypassPermissions mode) removes per-action review completely and skips protected-path checks. The only thing left is a circuit breaker that still prompts on rm -rf / and rm -rf ~ — and it offers no protection against prompt injection.

Never run --dangerously-skip-permissions on your host machine for convenience. It is designed for isolated, no-host-damage environments only — containers, VMs, or dev containers without internet access.

The flag is blocked when running as root or via sudo on Linux/macOS (that check is skipped inside a recognized sandbox). The safer move is almost always a scoped allowlist instead.

# Risky: blanket bypass on your laptop
claude --dangerously-skip-permissions   # don't do this on the host

# Safer: pre-approve exactly the tools you need
claude -p "run the test suite and fix failures" \
  --allowedTools "Read" "Edit" "Bash(npm test)" "Bash(npm run lint)"

Admins can disable the escape hatch entirely with permissions.disableBypassPermissionsMode: "disable" (and disableAutoMode for auto) in managed settings.

Protected paths

In every mode except bypassPermissions, Claude Code refuses to auto-approve edits to sensitive locations. You don't configure these — they're built in.

Protected dirs	Protected files
`.git`, `.vscode`, `.idea`, `.devcontainer`, `.husky`, `.cargo`, `.claude`	`.bashrc`, `.zshrc`, `.profile`, `.envrc`, `.npmrc`, `.gitconfig`, `.mcp.json`, `.claude.json`

Sandboxing risky autonomy

The sandboxed Bash tool gives OS-level filesystem and network isolation — Seatbelt on macOS, bubblewrap on Linux/WSL2. It covers Bash commands and their children only (not Read, Edit, WebFetch, or MCP). Turn it on with /sandbox or sandbox.enabled: true.

The sandbox is the only thing that enforces path limits for all processes. Read/Edit deny rules only cover Claude's built-in tools and recognized commands like cat or sed — not an arbitrary script that opens a file itself.

{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "denyRead": ["~/.aws", "~/.ssh", "~/.config/gcloud"],
      "denyWrite": ["~/"]
    },
    "network": {
      "allowedDomains": ["api.anthropic.com", "registry.npmjs.org"]
    }
  }
}

The default read policy still allows credential dirs, so Anthropic recommends adding denyRead for ~/.aws and ~/.ssh yourself. For org-wide enforcement, managed settings can set failIfUnavailable: true (refuse to start without a sandbox) and allowUnsandboxedCommands: false (kill the dangerouslyDisableSandbox escape hatch).

Whole-process isolation

For maximum autonomy, two options wrap more than just Bash:

Dev container

The claude-code repo ships an example .devcontainer/ with a default-deny iptables firewall, a non-root user, and no unapproved egress — a starting point for unattended runs. It's a convention, not a boundary Claude enforces.

sandbox-runtime

@anthropic-ai/sandbox-runtime (beta) wraps the entire process in Seatbelt/bubblewrap so every tool, hook, and MCP server is constrained — denying all write and network by default.

Secret handling

Claude Code stores its own API keys encrypted via credential management, and blocks web-fetching commands like curl and wget by default. Your job is to keep your project's secrets out of reach.

Never hardcode keys — read them from environment variables (ANTHROPIC_API_KEY, etc.).
Add secret files to .gitignore and add Read deny rules for them.
Run a pre-commit secret scan as a hard gate before any commit.

# .gitignore
.env
.env.*
secrets/
*.pem

Deny reads on secret files in settings, but remember: only the sandbox stops an arbitrary script from reading them. For high-autonomy runs, combine deny rules with a sandbox denyRead.

Trusting MCP servers

MCP servers are separate processes that run unconstrained on your host unless the whole Claude Code process is sandboxed. A server you add can read your data and run code — treat adding one like installing a dependency with full machine access.

Only add MCP servers you trust or that Anthropic has reviewed. The allowed-server list is checked into source control so your team can audit it.

First-time codebase runs and new MCP servers require trust verification. Note that verification is disabled in non-interactive -p mode (except --worktree, which still requires accepted trust) — so don't pipe an untrusted repo straight into headless mode.

When you do auto-approve MCP tools, scope them with wildcards rather than reaching for bypassPermissions. Note that acceptEdits does not auto-approve MCP tools at all.

Skills and plugins are supply chain too

The same "installing a dependency with machine access" logic applies to skills and plugins — they can carry hooks, MCP servers, and executables. Vet the source before installing, and prefer the curated claude-plugins-official marketplace or your org's reviewed list.

Static scanning is not a safety guarantee. In mid-2026 researchers published SkillCloak, a technique that hides malicious behavior in Agent Skills so it slips past over 90% of static scanners, detonating only at runtime. Treat an unreviewed third-party skill like unreviewed code: read it, run it sandboxed first, and don't grant it broad tool access on trust alone.

{
  "permissions": {
    "allow": ["mcp__github__*"],
    "deny": ["mcp__github__delete_repository"]
  }
}

Hooks as guardrails

Hooks are your programmable safety net. A PreToolUse hook runs before the permission prompt and can deny a call, force a prompt, or skip it. Across the evaluation flow, hooks run first — even before deny rules — making them the strongest enforcement point you control.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | grep -qE '(^|/)(\\.env|secrets/)' && { echo 'Blocked: protected path' >&2; exit 2; } || exit 0"
          }
        ]
      }
    ]
  }
}

An exit code of 2 tells Claude Code to block the call and feed the message back to Claude. Use the same pattern to scan a diff for secrets before a commit, or to protect any path your team treats as off-limits.

A ConfigChange hook can audit or block changes to settings during a session — so Claude can't quietly loosen its own policy mid-task.

Hardening checklist

Default to plan mode for unfamiliar work; switch to edit only after reviewing the plan.
Maintain a least-privilege allow list — specific tools and commands, not blanket grants.
Add deny rules for secret files, credential dirs, and dangerous commands (curl, rm *).
Never run --dangerously-skip-permissions on your host — only inside a container, VM, or dev container.
Enable the sandbox (sandbox.enabled: true) with denyRead for ~/.aws and ~/.ssh on autonomous runs.
Keep secrets in env vars; .gitignore them; gate commits with a secret scan.
Add only trusted MCP servers; scope tool approval with wildcards, never bypassPermissions.
Use PreToolUse hooks to block protected paths and scan diffs before commit.
Run a fresh-context /security-review on the diff before treating work as done.

Layer these. No single control is complete — deny rules, sandbox, hooks, and isolation each cover a gap the others leave open.

💻

SECTION 21

Best Practices: Writing Code

Plan, scope, test, verify — then trust.

Mastered

Great code from Claude is not luck — it comes from a tight loop: plan first, change small, verify always, commit clean.

The goal is not code that runs. It is code that is correct, readable, and maintainable. Every habit below points at that.

Anthropic's #1 rule: give Claude a check it can run itself (tests, build, linter, a screenshot). If you adopt only one practice, make it verification.

The Core Loop

Use the same four-phase workflow Anthropic recommends. It keeps each change small enough to review and reversible if it goes wrong.

Explore in plan mode — read-only, no edits. Claude reads the code and asks questions.
Get a written plan — ask for a step-by-step approach before any code.
Implement & verify — leave plan mode, build it, run the checks against the plan.
Review the diff & commit — read every change, then commit with a clear message.

1. Ask for a Plan First

Plan mode is enforced at the tool level, not just a polite request. In plan mode Claude can only read, search, and think — it cannot edit files or run destructive commands. Enter it with Shift+Tab to cycle modes, or the /plan command.

# Start a session locked to read-only planning
claude --permission-mode plan

When to skip it: Anthropic's rule of thumb is simple. If you could describe the diff in one sentence (a typo, a log line, a rename), skip the plan. Plan when the approach is uncertain, the change spans multiple files, or the code is unfamiliar.

A plan you can read is a plan you can correct before it costs you a bad diff. Persist big plans to a PLAN.md file so they survive context compaction.

2. Point Claude at the Right Files

Don't make Claude guess where code lives. Reference exact files with @ so it loads only what matters and keeps context lean.

# Mention specific files and a line range
Review @src/auth/login.ts#L40-95 and the test in
@src/auth/login.test.ts. The session cookie isn't
being set on success. Find the root cause.

In the IDE, Option+K (VS Code) or Cmd+Option+K (JetBrains) inserts an @-mention for the current selection with its path and line range automatically.

Asking Claude to "investigate the codebase" with no scope makes it read hundreds of files and flood the context window. Scope it, or delegate broad exploration to a subagent that returns only a summary.

3. Make Small, Testable Changes

The Common Workflows guidance is explicit: do refactoring in small, testable increments, and keep backward compatibility when it matters. Small diffs are easy to read, easy to test, and easy to revert.

Give yourself a safety net with git. Have Claude branch per feature so a bad turn is throwaway.

# Discardable safety net before a risky change
git checkout -b fix/session-cookie

4. Test-Driven Development

Claude defaults to writing implementation first — you have to invert that on purpose. Tell it to write failing tests first, give concrete examples, and run the tests after implementing.

Write failing tests FIRST for isValidEmail(), then implement.
Cases: "user@example.com" -> true, "nope" -> false,
"" -> false, "a@b.c" -> true.
Run the tests after implementing and show me the output.

After Claude fixes a mistake, ask it to update CLAUDE.md with a rule so it won't repeat it. Claude is good at writing rules for itself.

5. Anchor Conventions in CLAUDE.md

CLAUDE.md is read at the start of every conversation, before anything else. It is the most reliable channel for your project's conventions — build commands, test runner, code-style rules, and gotchas. Generate a starter with /init, then trim it.

Include	Exclude
Bash commands Claude can't guess	Anything inferable from the code
Non-default code-style rules	Standard language conventions
Test runner & how to run it	Full API docs (link instead)
Architecture decisions, env quirks	Self-evident advice

Keep it short — aim under ~200 lines. A bloated CLAUDE.md makes Claude ignore real instructions. If a rule keeps getting ignored, the file is probably too long and the rule is getting lost. For each line ask: "Would removing this cause a mistake?"

# CLAUDE.md (excerpt)
## Commands
- Test:  pytest -q
- Lint:  ruff check . && mypy src
- Build: make build

## Style (YOU MUST)
- Type hints on every function signature
- Immutable data — never mutate in place
- Async/await for all IO

6. Review the Diff Before You Trust It

This is where correctness is won or lost. Read the diff — don't just accept it. Watch for the "trust-then-verify gap": plausible-looking code that misses edge cases.

# Open the interactive diff viewer
/diff

For a second opinion, run the bundled review skills. They review the current diff in a fresh subagent that sees only the change and the criteria — not Claude's own reasoning.

/code-review

Correctness pass on the diff in a clean context. Add --fix to apply findings.

/security-review

Security-focused pass for auth, input handling, and secrets.

/simplify

Cleanup-only review — reuse, simplification, lower altitude. No bug hunting.

7. Run It, Then Commit

Code that compiles is not code that works. Run the build and the test suite, address the root cause of any failure (don't suppress the error), then commit.

pytest -q && ruff check . && git add -A
git commit -m "fix: set session cookie on successful login"

Anthropic's stated rule for the trust gap: "If you can't verify it, don't ship it."

Do / Don't Grid

Do: plan complex work

Read-only plan mode before multi-file or unfamiliar changes.

Don't: one-shot a big feature

Large unscoped diffs are unreviewable and unrevertable.

Do: give a runnable check

Tests, build exit code, linter, or a screenshot close the loop.

Don't: trust on sight

Plausible code hides edge-case bugs. Read the diff, run it.

Do: point with @

Name exact files and line ranges to keep context lean.

Don't: say "investigate"

Unscoped exploration floods context and degrades quality.

Do: /clear between tasks

Reset context for unrelated work; one scope per session.

Don't: bloat CLAUDE.md

Long files bury the rules Claude actually needs.

A Tight Example Loop

Putting it together on a real bug fix — small, verified, committed:

Shift+Tab into plan mode: Bug: @api/cart.py#L20-60 double-charges on retry. Diagnose only.
Approve the plan, then leave plan mode and ask for a failing test that reproduces the double charge.
Let Claude implement the fix and run pytest -q until the new test passes.
Run /code-review on the diff, then /diff and read it yourself.
Run the full suite + linter, then commit: fix: make checkout idempotent on retry.

Keep loops tight. When context fills, run /context to see usage and /compact focus on the API changes to summarize while keeping file paths, decisions, and test commands.

📝

SECTION 22

Best Practices: Docs & Writing

Turn Claude into your fastest, most consistent co-writer

Mastered

Claude writes its best docs when you hand it a role, an audience, a goal, hard constraints, and a sample of the voice you want.

Vague prompts produce vague prose. The fix is a simple frame you reuse for every writing task: who Claude is, who it's writing for, what success looks like, and the rules it must follow. Add your own source material and a voice sample, and quality jumps immediately.

The five-part writing prompt

Every strong writing request answers five questions. Skip one and Claude guesses, which is where bland output comes from. Keep them explicit and up front.

Role

"You are a staff engineer writing internal docs." Sets vocabulary and depth.

Audience

"For new hires with no Kubernetes background." Controls jargon and assumptions.

Goal

"Reader can deploy the service in 10 minutes." Gives a measurable target.

Constraints

Length, tone, format: "Under 400 words, plain, Markdown with headers."

Source + voice

Paste real notes and a paragraph in the voice you want copied.

Give Claude a voice sample, not a voice adjective. "Match the tone of the paragraph below" beats "write professionally" every time, because Claude can imitate concrete examples far better than abstract labels.

Lock a consistent style

If you write the same kinds of docs often, stop re-pasting style rules. Save them once so every session starts from your house style.

In the Claude app: Projects + custom instructions

Create a Project, drop your style guide and example docs into its knowledge, and put the rules in the Project's custom instructions. Every chat in that Project inherits them.

In Claude Code: CLAUDE.md

CLAUDE.md is read at the start of every session, so it is the most reliable place for writing conventions. Keep it short; bloated files get ignored.

# Writing style
- Audience: external developers integrating our API.
- Voice: direct, second person, present tense. No marketing fluff.
- Format: Markdown. H2/H3 headings. Code in fenced blocks with language tags.
- IMPORTANT: every code sample must be copy-pasteable and runnable as-is.
- Sentences under 25 words. Define an acronym on first use.

For style rules that only apply to docs (not code), use a path-scoped rule in .claude/rules/ with a paths: ["docs/**"] frontmatter field so it loads only when you touch doc files, keeping your main context lean.

Work section by section, not all at once

Long documents drift when generated in one shot. Ask for an outline first, approve it, then fill one section at a time. You catch structure problems before any prose is written.

Ask for an outline only: "Draft a section outline for this README. No prose yet."
Edit the outline directly and paste it back as the agreed structure.
Request one section at a time, feeding the relevant source for each.
After the draft, run a self-critique pass before you accept it.

Make Claude critique its own draft

A fresh-eyes review catches what the drafting pass missed. Tell Claude exactly what to look for so it doesn't just rewrite for the sake of it.

Review the draft above as a skeptical editor. Flag ONLY:
1. Claims that need a source or are unverifiable
2. Sentences over 25 words or with passive voice
3. Steps a beginner could not follow
4. Anything off-tone vs. the voice sample
List issues as a numbered table, then give a revised version.

A reviewer told to "find problems" will almost always report some, even in clean text. Scope the critique to issues that genuinely affect clarity or correctness so you don't burn cycles polishing what is already fine.

Generate tables, diagrams, and structured output

Claude is strong at turning messy notes into clean structure. Ask for the exact format you need.

Tables and comparison grids

Turn these release notes into a Markdown table with columns:
Feature | What changed | Action needed | Breaking?
Sort breaking changes to the top.

Diagrams as code (Mermaid)

Ask for Mermaid and you get a diagram that renders in GitHub, many docs tools, and Markdown viewers, and stays editable as text.

Draw the auth flow as a Mermaid sequenceDiagram:
user -> web app -> auth service -> token -> protected API.
Return only the fenced ```mermaid block.

Strict structured output via the API

When another program consumes Claude's output, enforce a schema. Structured outputs are GA on Claude API via output_config.format (no beta header) on recent models.

output_config={
  "format": {
    "type": "json_schema",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["title", "summary", "tags"]
    }
  }
}

Artifacts: draft documents live

In the Claude app, ask for an Artifact when you want an editable, self-contained document you can iterate on in a side panel instead of scrolling chat. Great for a README, a spec, or an email you'll keep refining.

Iterate by pointing, not by re-pasting. Once a doc lives in an Artifact, say "tighten the intro" or "add an error-handling section after step 3" and Claude edits in place, keeping the rest untouched.

Before / after: a real prompt upgrade

Same task, two prompts. The second wins because it supplies role, audience, goal, constraints, and a voice anchor.

Before vague

Write a README for my CLI tool.

After framed

You are a developer-experience writer. Audience: developers who
found this tool on GitHub and have 2 minutes to decide if it fits.
Goal: they can install and run their first command without leaving
the page.

Constraints: Markdown; under 350 words; sections = What it does /
Install / Quickstart / Config. Tone: terse and friendly, like the
sample below. Every command must be copy-pasteable.

Voice sample: "No config, no daemon. Point it at a folder and go."

Source (use only these facts):
- Installs via: npm i -g foldermap
- First command: foldermap ./src
- Reads a .foldermaprc for ignore globs

Output the outline first and wait for my OK before drafting.

Quality checklist before you ship a doc

Check	What good looks like
Audience fit	No undefined jargon; assumptions match the reader's level.
Goal met	A reader could actually do the thing the doc promises.
Every claim sourced	Facts come from your material, not invented by the model.
Commands run	You pasted and executed each snippet; none are guesses.
Voice consistent	Reads like one author and matches your sample.
Length honored	Within the word/section limits you set.
Structure scannable	Clear headings, short paragraphs, tables where useful.

Always verify the facts yourself. Claude can write a confident, well-structured doc around a wrong detail. If a claim or command can't be checked against your source, cut it or confirm it before publishing.

🗂️

SECTION 23

Organizing Your Workflow

Run a system, not a pile of prompts

Mastered

⚡ WHY IT MATTERSMatch each need to its right Claude extension surface and you stop forcing one mechanism to do another's job.

A decision map fanning from 'Need to add capability?' to the seven Claude surfaces, each paired with the situation it fits.

🎬 Animation, explained

The center question — "need to add a capability?" — is where every customization decision starts.
The seven branches light up one at a time, each pairing a surface (CLAUDE.md, slash command, skill, subagent, hook, MCP, plugin) with the situation it fits.
The pulse visits all seven to make the point that they're siblings, not rivals — each one is the right answer to a different sentence.
Takeaway: match the sentence, not the hype — "every session should know X" → CLAUDE.md; "I keep typing this prompt" → slash command; "must happen every time" → hook.

Power users don't write better prompts — they run a better system around the prompts.

One goal per thread. Break it into a task list, sequence the steps, and feed Claude only the context that step needs. Everything else in this section is in service of that one idea.

Orchestrate the goal, not the prompt

For anything with 3+ distinct steps, Claude Code already tracks a todo list for you. As of Claude Code v2.1.142 it uses structured Task tools (TaskCreate, TaskUpdate, TaskList) instead of the old TodoWrite. Your job is to set the scope and let the list keep the work honest.

Keep one thread per goal. When you switch goals, switch threads (or /clear). Mixing two goals in one context is the fastest way to make Claude forget the first one.

Anthropic's recommended rhythm for non-trivial work is four phases: explore, plan, implement, commit. Stay read-only while you explore so nothing changes before you have a plan.

Explore in plan mode (read-only) — press Shift+Tab or run /plan.
Plan — ask for a detailed, file-by-file implementation plan.
Implement — leave plan mode and build, verifying against the plan.
Commit — descriptive message, then open a PR.

Skip the plan for tiny changes. Anthropic's rule: “If you could describe the diff in one sentence, skip the plan.”

Durable context: where memory lives

A focused thread is short-lived. Durable context is what survives a /clear or a new session. You have three layers — use the right one.

Projects (Claude app)

Attach files and a custom instruction set to a workspace so every chat in that Project starts with the same background. Best for non-code work — research, writing, planning.

CLAUDE.md (Claude Code)

Read at the start of every session, before anything else. The most reliable channel for project conventions. Generate a starter with /init, then trim it.

Auto memory (MEMORY.md)

Claude writes notes to itself across sessions. The first 200 lines or 25KB load at startup — anything past that is ignored. Toggle with /memory.

CLAUDE.md lives in a few places, and they stack. Use /memory to see which files are actually loaded.

~/.claude/CLAUDE.md      # applies to every session, all projects
./CLAUDE.md              # project root, checked into git (team shares it)
./.claude/CLAUDE.md      # project, alternate location
./CLAUDE.local.md        # personal, gitignored
src/api/CLAUDE.md        # per-directory, loads on demand when you touch that dir

Keep CLAUDE.md short — target under 200 lines. A bloated file makes Claude ignore the real instructions. For each line ask: “Would removing this cause Claude to make a mistake?” If not, cut it.

After Claude makes a mistake you corrected, ask it to add a rule to CLAUDE.md so it won't repeat it. Claude is good at writing rules for itself — and use YOU MUST or IMPORTANT on rules that keep getting ignored.

Manage context deliberately

The context window is the resource that matters most. Performance drops as it fills, and Claude starts forgetting early instructions. Treat free context like a budget you spend on purpose.

/context          # see what is eating your context window
/mcp              # per-server token cost (MCP schemas are deferred until used)
/clear            # full reset between unrelated tasks
/compact          # summarize history, preserving key decisions
/compact focus on the auth refactor and the failing tests   # targeted compact

For long single tasks Claude Code auto-compacts near the limit — it drops old tool output first, then summarizes while keeping code, file paths, and decisions. You can also bias what survives with a Compact Instructions section in CLAUDE.md.

Pull in only the files a step needs with scoped @ mentions: @src/auth.ts or @app.ts#5-10. Don't ask Claude to “investigate” with no scope — it will read hundreds of files and flood your context.

For broad exploration, delegate to a subagent. It runs in its own fresh context window and returns only a summary, so the research, logs, and doc-reads never touch your main thread.

Which mechanism do I reach for?

This is the question power users get wrong most often. Match the job to the lightest tool that does it.

Reach for…	When you want…	Loads
CLAUDE.md	Project conventions Claude needs every session (build cmd, test runner, style rules it can't guess)	Always, at startup
Skill	Deep how-to for a specific task, only when relevant (e.g. “generate a PDF”)	On demand, by name/description
Path-scoped rule	Instructions tied to certain files (`.claude/rules/` with a `paths` frontmatter)	Only when matching files are touched
Subagent	Isolate noisy work (research, review, tests) so the main context stays clean	Spawned per task, returns a summary
MCP server	Connect Claude to an external system (GitHub, a DB, Notion, browser)	Tool schemas deferred until used
Hook	Deterministic action on every event (auto-format on save, block a command)	The harness runs it — not the model
Plugin	Bundle and share skills, commands, agents, and MCP config as one unit	Installed once, brings its pieces
API / Agent SDK	Run the agent loop in your own code, headless, on your infrastructure	You build and deploy it

Rule of thumb: if it should fire automatically and reliably, it's a hook — instructions in CLAUDE.md are advice the model may skip, hooks are enforced by the harness.

Patterns for repeatable work

Anything you do more than twice should become a command. Custom slash commands are just markdown files — drop one in .claude/commands/ and it becomes /name.

# .claude/commands/ship.md
Review the current diff with /code-review, fix any CRITICAL
or HIGH issues, run the test suite, then write a conventional
commit message and open a PR. Verify the build passes first.

Define reusable subagents the same way, in .claude/agents/, with frontmatter to route cheap work to a faster model and scope its tools.

---
description: Reviews diffs for correctness. Use proactively after edits.
model: haiku                # route cheap, frequent work to Haiku 4.5
tools: Read, Grep, Bash
isolation: worktree         # run in its own git worktree
---
You are an adversarial reviewer. You see only the diff and the
acceptance criteria. Flag ONLY gaps that affect correctness.

For a big change, run /batch to split it into 5–30 independent units, each in its own git worktree — then 3–5 sessions work in parallel without colliding. Persist the plan to PLAN.md so it survives any compaction.

An operating rhythm you can adopt

Put it together into a loop you run for every meaningful goal.

Frame — one goal, fresh thread. /clear if reusing a session.
Explore + plan — plan mode, then a written plan saved to a file.
Scope context — @ only the files in play; delegate wide research to a subagent.
Implement in increments — small, testable steps; give Claude a check it can run.
Verify — run /code-review in a fresh context before calling it done.
Capture — commit, and fold any new lesson into CLAUDE.md.
Reset — /compact for a long task, /clear before the next goal.

Watch the “trust-then-verify gap”: a plausible-looking implementation that misses edge cases. Anthropic's fix is blunt — “If you can't verify it, don't ship it.” Verification is their #1 practice; if you adopt only one habit, make it this.

🎤

SECTION 24

From the Source: Boris Cherny

A complete, slide-by-slide walkthrough of Boris Cherny's Claude Code masterclass.

Mastered

A full walkthrough of "Mastering Claude Code in 30 Minutes" by Boris Cherny — the creator of Claude Code — turned into slide-by-slide notes you can act on.

Source: Mastering Claude Code in 30 Minutes — Boris Cherny (Code w/ Claude, Anthropic). Quotes paraphrased.

From the Source: Boris Cherny on Claude Code

This section distills a talk by Boris Cherny — a member of technical staff at Anthropic and the creator of Claude Code — titled "Mastering Claude Code in 30 Minutes." Everything here comes straight from his words at Code w/ Claude.

Boris built Claude Code. So when he describes how it's meant to be used, he isn't guessing at the design intent — he set it.

Slide 1 — What Claude Code Is

A new kind of AI assistant

Past coding assistants completed a line, or a few lines, at a time — autocomplete on steroids. Claude Code is different: it is fully agentic. It is meant to build whole features, write entire functions and files, and fix entire bugs in one go.

Old way: line completion

Suggests the next line or two as you type. You stay the driver for every keystroke.

Claude Code: fully agentic

You describe the outcome — "build this feature," "fix this bug" — and it figures out the steps end to end.

Works with ALL your tools — no workflow change

A core design goal: you don't change how you work to use it. Claude Code slots into your existing setup instead of replacing it.

Dimension	What's supported
IDEs / editors	VS Code, Xcode, JetBrains, Vim/Emacs
Terminal	Every terminal — it's terminal-native
Where it runs	Locally, over remote SSH, inside tmux
Scope	General purpose — not tied to one language or stack

A power tool, intentionally un-opinionated

When you open Claude Code, you just see a prompt bar — nothing else. That minimalism is deliberate.

It's a power tool. Anthropic intentionally does not push you toward one "blessed" workflow. The blank prompt is an invitation to use it however your work demands.

Slide 2 — Recommended First-Time Setup

Boris recommends a short setup pass the first time you run Claude Code. Each step is a one-time investment that makes everything afterward smoother.

Run /terminal-setup to fix multi-line input.
Run /theme to pick a theme that's comfortable for your eyes.
Run /install-github-app to bring Claude into your GitHub flow.
Customize your allowed tools so you stop getting prompted for routine actions.
(macOS) Turn on Dictation so you can speak your prompts.

1. `/terminal-setup` — better multi-line input

By default, adding a new line in a terminal prompt is awkward. This command wires up Shift+Enter for new lines instead of forcing you to type trailing backslashes.

/terminal-setup
# now: Shift+Enter = newline (no more \ at line ends)

2. `/theme` — light, dark, and colorblind-friendly

Set the theme to match your environment and accessibility needs, including a Daltonize option for colorblind users.

/theme
# choose: light  |  dark  |  Daltonize (colorblind)

3. `/install-github-app` — @mention Claude on GitHub

This installs the new GitHub app. Once it's set up, you can @mention Claude on any GitHub issue or pull request and have it respond right there in the GitHub UI.

/install-github-app
# then on any issue/PR:  @claude please review this and suggest a fix

4. Customize your allowed tools

Claude Code asks for approval before taking certain actions. If there's something you get prompted about constantly, allow-list it once so you're not interrupted every time.

Boris does this for the things he's prompted about a lot. The goal is to remove repeated friction — not to blindly approve everything. Allow what you trust; keep approvals on what changes system state.

5. Pro tip — talk to Claude Code (macOS Dictation)

Enable Dictation in System Settings > Accessibility > Dictation, then hit the dictation key twice and just speak your prompt out loud.

Talk to Claude Code like it's another engineer on your team. Specific spoken prompts work great, and dictating saves a ton of typing — you can describe a whole task faster than you can type it.

Slide 3 — Tip 1: Start with Codebase Q&A

This is Boris's number-one recommendation and the single best way to begin with Claude Code. Before you ask it to write anything, just point it at your repo and ask questions. It is also exactly what Anthropic teaches every new hire on day one of technical onboarding.

"Just ask questions about your codebase." Anthropic teaches this to new engineers on day one: download Claude Code, set it up, then start asking about the code.

The onboarding result

Codebase Q&A collapsed the time it takes a new engineer to become productive at Anthropic.

Onboarding	Before Claude Code	After Claude Code
Time to productive	~2–3 weeks	~2–3 days

Zero setup — and your code stays yours

There is nothing to configure before you can ask questions. Unlike tools that build a search index or upload your repo to a server, Claude Code reads the code locally, in place, on demand.

No indexing

No index to build, no embeddings, no wait. You point it at the repo and ask immediately.

Stays local

Code is NOT uploaded to a remote database. It stays on your machine and you stay in control.

No training

Anthropic does NOT train generative models on your code.

Zero setup

No remote service, no indices to maintain — just open the prompt and ask.

Deeper than text search

Plain text search (grep / Ctrl-F) only finds literal string matches. Claude Code goes a level deeper — it actually understands the code, so it can answer relational questions and surface real examples, at the quality of a wiki or internal docs.

> How is the AuthSession class instantiated and used?

Instead of listing every line containing AuthSession, it finds the real call sites and shows you how the class is actually used in practice.

Ask about git history

This is where it shines. The model knows git — it was never system-prompted to do this; "the model is awesome" and reaches for git on its own.

> Why does this function have 15 arguments named this weird way?

It looks through the git history to find who introduced those arguments, what issues the commits link to, and summarizes the whole story back to you.

Ask about GitHub issues

Using web fetch, Claude Code can read a linked GitHub issue and pull that context into its answer — connecting the code in front of you to the discussion behind it.

The Monday standup trick

Boris asks this every Monday before standup. It reads the git log, already knows his username, and produces a readout he pastes straight into a doc.

> What did I ship this week?

Codebase Q&A is also the best way to learn how to prompt. It teaches you the boundary of what Claude can do in one shot, two shots, or three shots — versus what needs an interactive back-and-forth (REPL) session.

Slide 4 — Tip 2: Editing Code with a Small Toolset

Claude Code is fully agentic. Agentic means: you give the model tools, and it figures out how to use them to accomplish your goal — you don't micromanage the steps.

The whole toolset is tiny

There are only a handful of core tools. The power comes from how Claude strings them together to explore, brainstorm, and then edit.

Edit files

Make changes to your source files.

Run bash

Execute commands in your shell.

Search files

Find and read the relevant code.

You never name a specific tool in your prompt. You just say what you want done — Claude decides which tools to use and in what order to get there.

> Add rate limiting to the /login endpoint and update the tests.

From that one sentence, Claude will search the codebase to find the endpoint, read the surrounding code, edit the handler, and run the tests — chaining its small toolset together on its own.

Slide 5 — Brainstorm & Plan Before Big Features

The most common failure mode: handing Claude an enormous task cold. People ask it to "implement this 3,000-line feature" in one go. Sometimes it nails it — but sometimes it builds the wrong thing, because it guessed at decisions you never spelled out.

Don't one-shot huge features blind. If Claude has to guess at the design, you may get 3,000 lines of the wrong thing.

The easiest fix: just ask it to think first

You do not need plan mode or any special tool. Plain language in your prompt is enough.

> Brainstorm a few approaches for this feature, make a plan,
  run it by me, and ask for my approval before you write any code.

"You don't need plan mode or special tools — just ask." Telling Claude to brainstorm, plan, and check in with you before coding fixes the wrong-thing problem almost every time.

Explore the relevant code
Brainstorm a few options
Write a plan
Get your approval
Then — and only then — write the code

Slide 6 — The "Commit, Push, PR" Incantation

Once a change is done, one short phrase hands the whole git wrap-up to Claude.

> commit, push, and open a PR

From that, Claude makes a commit, creates a branch, pushes it, and opens a pull request — the full ship sequence.

It matches your conventions automatically

Claude reads your code, your history, and your git log to figure out how your team writes commit messages — and then matches that format. This is not system-prompted; the model works it out from context.

Commit

Stages and commits the change.

Branch + push

Creates a branch and pushes it up.

Open PR

Opens the pull request for review.

Match style

Reads git log to mirror your commit-message format.

Two short incantations cover most everyday work: "brainstorm a plan and check with me before coding," then once it's done — "commit, push, PR." No special modes, no tool names. Just plain English.

Slide 7 — Plug in your team's tools: bash/CLI and MCP

Claude Code is agentic: you give it tools, it figures out how to use them. Tip 3 in the talk is about extending its tiny built-in toolset (edit files, run bash, search files) with the tools your team already lives in. There are two kinds.

1. Bash / CLI tools

Tell Claude to use a command-line tool. The trick: point it at --help so it learns the tool on the spot, then it starts using it correctly.

2. MCP tools

Add an MCP server, tell Claude how to use it, and it begins calling it. Extremely powerful on a new codebase — hand Claude every tool your team already uses.

Teaching Claude a CLI

You don't pre-program specific tools — you just describe what you want and let Claude explore. The "made-up Barley CLI" example: introduce a tool by name and let it read its own help text.

> We use the `barley` CLI for deploys. Run `barley --help` to
  learn it, then deploy the staging environment.

If you find yourself re-explaining the same CLI every session, put it in CLAUDE.md so it persists. One-off discovery via --help is great; repeated use belongs in shared context.

Adding an MCP server

MCP (Model Context Protocol) lets Claude talk to external systems — issue trackers, design tools, browsers, internal services. Boris's headline example: a Puppeteer MCP server that lets Claude drive a real browser and screenshot the result.

# Register an MCP server, then just ask Claude to use it
claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer

> Use the Puppeteer server to open localhost:3000 and screenshot it.

"Extremely powerful on a new codebase — give it all the tools your team already uses." Plugging in CLIs + MCP is how Claude inherits your team's muscle memory.

Slide 8 — Workflows: explore → plan → confirm, then let Claude CHECK its work

The most common effective workflow is simple: let Claude explore a bit, plan a bit, and ask for confirmation before it writes code. You don't need a special mode — just ask.

> Before you write any code: brainstorm a few approaches,
  make a plan, run it by me, and wait for my approval.

People sometimes ask for an "enormous 3,000-line feature" in one shot. Sometimes it nails it — sometimes it builds the wrong thing. The cheapest insurance is to make it think and confirm first.

The big one: give Claude a way to check its own work

This is the most powerful pattern in the talk. When Claude has a feedback tool to verify what it built, it can iterate by itself — running, looking, fixing, re-running — until the result is near-perfect. No human in the loop between passes.

Unit / integration tests

Claude writes tests, runs them, reads failures, and fixes the code until they pass.

Puppeteer screenshots

It screenshots a web UI in a headless browser, compares to the target, and adjusts.

iOS simulator screenshots

It captures the running app from the simulator and self-corrects layout/visuals.

Give Claude a mock and say "build this web UI."
It gets close on the first pass.
Give it a check (tests / screenshot tool) and let it iterate 2–3 times.
It converges to near-perfect on its own.

The whole trick: pair a build instruction with a way to verify the result. With a feedback loop, Claude's 2nd and 3rd attempts are dramatically better than its first.

Slide 9 — CLAUDE.md: the simplest way to give Claude context

More context = smarter decisions. As an engineer you carry tons of unwritten context; CLAUDE.md is how you hand some of that to Claude. It's a special filename in the project root (the same directory you start Claude in), and it's auto-read into context at the start of every session — it rides along on your first user turn.

Local vs checked-in

Project CLAUDE.md (checked in)

Commit it, share it with the team. Write it once and everyone benefits — the network effect.

Local CLAUDE.md (not checked in)

Just for you. Personal notes and preferences you don't want in source control.

What to put in it

Common bash commands

Build, test, lint, run — the commands you'd tell a new hire.

Common MCP tools

Which servers exist and how the team uses them.

Architecture decisions

Key choices and conventions worth knowing up front.

Important / core files

Where the load-bearing code lives.

Style guide

Formatting and code conventions to follow.

Anything onboarding-relevant

What you'd need to know to work in this codebase.

Keep it SHORT. A bloated CLAUDE.md burns context on every session and is actually less useful — concise beats comprehensive.

Nested CLAUDE.md

You can drop a CLAUDE.md into child directories. Unlike the root file (always loaded), nested ones are pulled in on demand — only when Claude is working in that subtree. Great for module-specific guidance without bloating the global context.

repo/
├── CLAUDE.md            # root — always loaded
├── frontend/
│   └── CLAUDE.md        # loaded when Claude works in frontend/
└── services/payments/
    └── CLAUDE.md        # loaded on demand for payments work

There's also an enterprise-level root CLAUDE.md shared across all codebases and users — org-wide context that follows every engineer.

Slide 10 — The context & config hierarchy

It's not just CLAUDE.md — everything in Claude Code is hierarchical. Config flows from your repo, up to your personal global settings, up to org-wide enterprise policy. This applies to CLAUDE.md, slash commands, permissions, and MCP servers alike.

Level	Scope	Governs
Project	Specific to one git repo. Check it in to share, or keep it personal/local.	Project `CLAUDE.md` · repo slash commands · `.mcp.json` · repo permissions
Global (user)	Applies across all of your projects on this machine.	Your personal CLAUDE.md · your slash commands · your permissions · your MCP servers
Enterprise	Global config rolled out to all employees, org-wide.	Mandated CLAUDE.md · shared slash commands · auto-approved and blocked commands/URLs · org MCP servers

Permissions: auto-approve AND block, org-wide

Permissions work in both directions at the enterprise level:

Auto-approve

Check a safe command (e.g. your test runner) into enterprise policy so it's auto-approved for every employee — no per-prompt clicking.

Block (hard deny)

Block a command or a URL org-wide — e.g. a URL that should never be fetched. Employees can't override it, so it can never be fetched.

.mcp.json — ship MCP servers with the repo

Check a .mcp.json into the repository and anyone who runs Claude Code there is prompted to install those MCP servers automatically.

// .mcp.json (checked into the repo)
{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}

Real example from Anthropic: their apps repo ships a Puppeteer MCP server via checked-in .mcp.json, so every engineer can drive end-to-end tests and screenshot/iterate without installing anything themselves.

Slide 11 — /memory and the `#` shortcut

Because memory is hierarchical, you need a way to see and edit it. Two tools do that.

`/memory`

Shows every memory file currently being pulled into context — enterprise policy, your user memory, the project CLAUDE.md, and any nested CLAUDE.md — and lets you open and edit a specific one.

Type `#`

The fastest way to remember something. Type #, write the note, and pick which memory file it lands in. It's incorporated immediately.

/memory            # list & edit all active memory files

# always run the test suite with `pnpm test --silent`
   ↑ press '#', type the note, choose the destination file

A common use: Claude keeps using a tool the wrong way. Hit #, correct it, and the fix is written into CLAUDE.md right away — so it sticks for future sessions instead of being re-explained each time.

Slide 12 — Recommendation: start with shared project context

If you're unsure where to begin in the hierarchy, Boris's answer is clear: shared project context. Write your CLAUDE.md, slash commands, permissions, and .mcp.json once, check them into the repo, and share them with the team.

"Write it once, share it with the team." Shared project context has a network effect — every engineer who clones the repo inherits the same commands, tools, conventions, and guardrails, and the whole team's Claude gets smarter together.

If you want…	Start here
Maximum payoff for the least effort	Checked-in project `CLAUDE.md`
Team tools available to everyone	Checked-in `.mcp.json`
Consistent, safe commands across the team	Repo (then enterprise) permissions
Personal-only tweaks	Local CLAUDE.md / global config

Slide 13 — Key Bindings: The Hidden Keyboard Layer

Boris flags these because most people never discover them — Claude Code looks like a single prompt bar, but a handful of keystrokes unlock its real power. Each one is about steering Claude mid-flight or feeding it exactly the context it needs.

Key	What it does	When Boris uses it
`Shift`+`Tab`	Accept edits / cycle into auto-accept-edits mode — file edits are auto-applied, but bash commands still need approval	When Claude is clearly on the right track or iterating on tests; you can always ask Claude to undo later
`#`	Make Claude remember something — it's incorporated into `CLAUDE.md` immediately, and you pick which memory file it goes to	"It's not using a tool correctly" — teach it once so the fix persists across sessions
`!`	Drop into bash mode and run a command locally — the command and its output both go into the context window	Long-running commands, or anything you want Claude to "see" without it running the command itself
`@`	Mention files and folders to pull them into context	Pointing Claude at the exact files/dirs that matter
`Esc`	Stop whatever Claude is doing — anytime, safely; it won't corrupt the session	Claude suggested a 20-line edit, 19 lines are perfect — stop it, correct the one line, redo
`Esc` `Esc`	Jump back through conversation history	Rewinding to an earlier point in the session
`Ctrl`+`R`	See the full output — the exact same thing Claude sees in its context window	Inspecting truncated output / understanding what the model is actually working from

Resuming work across sessions

You don't lose a session when you close the terminal. Two startup flags bring you right back.

# Pick a previous session to resume
claude --resume

# Continue the most recent session
claude --continue

The Esc key is the unsung hero. Because stopping is always safe, you can interrupt Claude freely instead of letting a near-perfect edit go wrong — interrupt, give one line of correction, and let it redo.

The difference between ! bash mode and just asking Claude to run a command: with ! you run it locally and the command + output land in context — perfect for long-running commands or output you want Claude to reason over.

Slide 14 — The SDK Is Already On Your Machine

There's no separate package to install. The -p / --print flag is the Claude Code SDK. Pass a prompt, get a result back, and exit — non-interactive, scriptable, composable.

Boris's framing: think of claude -p as a super-intelligent Unix utility. Give it a prompt, get JSON back. Pipe in, pipe out — exactly like grep, jq, or any other tool in your pipeline.

The shape of a call

You control the prompt, which tools Claude is allowed to use (including specific bash commands), and the output format — plain text, a JSON blob, or a stream of JSON you can process as it arrives.

# Basic one-shot
claude -p "summarize the failing tests"

# Constrain the toolset (allow-list specific tools/commands)
claude -p "triage this build" --allowedTools "Bash(git log:*)" "Read"

# Structured output for programmatic consumption
claude -p "list the changed files and why" --output-format json

# Streaming JSON to process incrementally
claude -p "analyze this log" --output-format stream-json

Piping: in and out

Because it reads stdin and writes stdout, claude -p slots into any shell pipeline.

# Feed it command output, parse the result with jq
git status | claude -p "which of these files are risky to commit?" --output-format json | jq '.result'

# Pull a giant log from a GCP bucket and find what's interesting
gsutil cat gs://my-bucket/prod.log | claude -p "what looks anomalous here?"

# Fetch data from the Sentry CLI and let Claude act on it
sentry-cli issues list | claude -p "group these by likely root cause"

CI

Run Claude as a step in your continuous-integration pipeline — review diffs, summarize failures, gate merges.

Incident response

Pipe logs, traces, or Sentry output straight in so Claude can triage and recommend actions during an outage.

Pipelines

Drop it anywhere data flows — read a giant log, find what matters, hand the result to the next stage.

A colleague gave a dedicated deep-dive on the SDK right after Boris's talk. His own verdict on the SDK's potential: "We've barely scratched the surface."

Slide 15 — Advanced Parallelism: Many Claudes at Once

Boris calls himself a "Claude normie" — one Claude at a time, a few terminal tabs for a few repos. But the power users he sees, inside and outside Anthropic, almost always run many Claudes in parallel.

How the power users scale up

tmux + SSH

SSH sessions and tmux tunnels into running Claude sessions, so multiple agents persist and survive disconnects.

Multiple checkouts

Several separate checkouts of the same repo, each running its own Claude, so many tasks progress at once.

Git worktrees

Use git worktree for isolation — each Claude works in its own branch/directory without stepping on the others.

# Spin up an isolated worktree so a parallel Claude can't collide with your main checkout
git worktree add ../feature-x -b feature-x
# ...then start Claude inside that directory

The unifying idea is isolation: give each parallel Claude its own workspace (a separate checkout or a worktree) so they don't corrupt each other's state. Anthropic is actively making this easier — the goal is to run as many sessions as you want and get a lot done in parallel.

Slide 16 — Q&A Gems

The hardest thing to build: safe bash

Boris named making bash commands safe as the single hardest part of building Claude Code. Bash is inherently dangerous — it changes system state in unexpected ways — yet approving every command by hand kills productivity, and not everyone runs inside Docker.

Read-only detection — some commands are recognized as read-only and treated as low risk.
Static analysis — figures out which commands can be safely combined (e.g. chained commands) before running them.
Tiered permissions — a layered allow-list / block-list system that operates at different levels (project, global, enterprise).

Fully multimodal from day one

Claude Code has always been multimodal — it's just hard to discover in a terminal. There are three ways to give it an image.

Drag & drop

Drag an image straight onto the terminal.

File path

Give it a path to an image file.

Copy-paste

Paste an image directly into the prompt.

Boris's favorite use: implementing from mocks. Drag in the mock, tell Claude to build it, and give it a Puppeteer server so it can screenshot its own work and iterate toward near-perfect.

Why a CLI and not an IDE?

At Anthropic people use a huge range of editors — VS Code, Zed, Xcode, Vim, Emacs. The terminal is the common denominator that works for everyone. And given how fast the model is improving, there's a real chance people won't be using IDEs the same way by year's end — so they deliberately avoid over-investing in UI layers that may not matter.

How much does Anthropic itself use it?

Roughly 80% of technical staff at Anthropic use Claude Code every day — including researchers, who use it for things like the notebook tool to edit and run notebooks.

Boris's throughline: don't over-prompt. Give Claude good context (CLAUDE.md), the right tools, and a way to check its own work — then let the model cook.

▶︎ Watch the full talk: youtube.com/watch?v=B_KAEqiC-0Q

🧭

SECTION 25

How-To Recipes

Copy-paste playbooks for the things everyone eventually asks "wait, how do I…?"

Mastered

⚡ WHY IT MATTERSA recipe turns "I think Claude can do that" into muscle memory — three known steps beat twenty minutes of rediscovery every single time.

A goal enters a three-step recipe card — set up, run, verify — and exits as a shipped, repeatable result, with a token moving through the steps in order.

🎬 Animation, explained

The purple dot is your task. It starts at the Goal box — a plain-English "I want to…" — and travels the whole pipeline left to right.
The three cards pulse in sequence (1 → 2 → 3) to show that recipes are ordered: you create one file or flag, run one command, then verify the result before trusting it.
The dot only reaches the orange Shipped box after passing all three steps — skipping "verify" is how recipes turn into superstitions.
The loop repeats to make the real point: once learned, a recipe costs seconds to rerun, forever.

Everything below is a complete, copy-paste recipe: the setup, the command, and how to check it worked. Skim now, come back when you need one.

1 · Pick up where you left off

Closed the terminal mid-task? Nothing is lost — every session is saved locally.

claude --continue        # reopen the most recent session in this project
claude --resume          # interactive picker of past sessions
claude -p --resume <session-id> "run the tests again"   # scriptable resume

Verify: the resumed session greets you with full memory of the earlier conversation — ask "where were we?" and it will tell you.

2 · Make your own slash command

Any prompt you type more than twice deserves a name. Slash commands are just Markdown files.

mkdir -p .claude/commands
cat > .claude/commands/fix-issue.md <<'EOF'
Find and fix GitHub issue #$ARGUMENTS.
1. Read the issue, locate the relevant code
2. Implement a fix with tests
3. Commit with a descriptive message
EOF

/fix-issue 123           # inside Claude Code — $ARGUMENTS becomes "123"

Project commands live in .claude/commands/ (shared via git); personal ones in ~/.claude/commands/ (follow you everywhere).

3 · Build your first skill

A skill is a folder with a SKILL.md — instructions Claude loads only when the task matches its description.

mkdir -p .claude/skills/release-notes
cat > .claude/skills/release-notes/SKILL.md <<'EOF'
---
name: release-notes
description: Draft release notes from merged PRs. Use when the user asks for release notes or a changelog.
---
Collect merged PRs since the last tag, group by feature/fix/chore,
write user-facing notes in our house style: short, benefit-first, no jargon.
EOF

Verify: ask "draft release notes for this week" — Claude should announce it's using the skill. See Skills for progressive disclosure details.

4 · Connect an MCP server

MCP servers give Claude live tools — databases, browsers, ticketing systems. One command each.

claude mcp add --transport http github https://api.githubcopilot.com/mcp/
claude mcp add postgres -- npx -y @modelcontextprotocol/server-postgres "$DB_URL"
claude mcp list          # verify: both servers listed and connected

Only add servers you trust — an MCP server's tools run with your permissions. Scope tokens minimally (see Security Hardening).

5 · Auto-format every file Claude edits

Hooks run real shell commands at fixed points in the loop — no more "please remember to run prettier".

// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{ "type": "command", "command": "npx prettier --write \"$CLAUDE_FILE_PATHS\"" }]
    }]
  }
}

Verify: ask Claude to edit any file, then check git diff — it should already be formatted. Full event list in Hooks.

6 · Run Claude headless in CI

-p (print mode) turns Claude Code into a scriptable Unix tool: prompt in, result out, exit.

# label new issues in a GitHub Action
claude -p "Read the issue at $ISSUE_URL and suggest 3 labels" \
  --output-format json --max-turns 5 \
  --allowedTools "Read" "Grep" "WebFetch" | jq -r '.result'

In CI, always pin --allowedTools and --max-turns — an unattended agent should have the narrowest possible blast radius.

7 · Run parallel Claudes with git worktrees

One repo, several simultaneous Claude sessions, zero collisions — each worktree is an independent checkout.

git worktree add ../myapp-feature-a feature-a
git worktree add ../myapp-bugfix-b  bugfix-b
cd ../myapp-feature-a && claude     # terminal 1
cd ../myapp-bugfix-b  && claude     # terminal 2 — fully isolated

Clean up with git worktree remove ../myapp-feature-a when the branch merges. This is the single most-recommended trick for heavy users.

8 · Learn an unfamiliar codebase fast

Point Claude at the repo and interrogate it like a senior engineer who has read every file.

claude
> /init                       # generates a CLAUDE.md map of the project
> give me a guided tour: entry points, core modules, how a request flows
> where is authentication handled, and what would break if I changed it?

Commit the generated CLAUDE.md — every future session (and teammate) starts oriented instead of lost.

9 · Ship a feature end-to-end

The full loop, using pieces from across this guide:

claude
> think hard, then plan how to add rate limiting to the API   # plan mode: Shift+Tab
> looks right — implement it with tests
> run the tests and fix any failures
> commit and open a PR with a summary of the change

Plan → approve → implement → verify → ship. The approval gate in the middle is what keeps large changes safe (see Plan Mode).

🥷

SECTION 26

CLI Hacks & Power Moves

The small tricks that compound — what separates regulars from power users

Mastered

⚡ WHY IT MATTERSNo single hack saves more than seconds — but stacked into every session, they compound into hours a week and a fundamentally faster loop.

Five hack chips fire one by one into the terminal, and the velocity bar climbs a notch with each arrival — small tricks, stacked, become real speed.

🎬 Animation, explained

Each chip on the left is one real hack from this section — a keystroke, a keyword, a pipe, a mode, a command.
The dots fire one at a time, not all at once: you adopt hacks gradually, and each one lands in the same terminal you already use.
Every arrival bumps the green velocity bar one notch — the gain per hack is small and roughly equal, which is exactly why people underrate them.
The bar only reaches the top when all five have landed, then the cycle resets: compounding is the whole story.

None of these are secrets — they're all in the docs. They're hacks because almost nobody uses them, and the people who do are visibly faster.

Keyboard fu

Keys	What it does	Why it's a hack
`Esc`	Interrupt Claude mid-response	Steer early — don't wait for a wrong answer to finish
`Esc` `Esc`	Jump back in history, edit an earlier prompt, fork from there	Undo for conversations — retry a turn without starting over
`Shift`+`Tab`	Cycle permission modes: normal → auto-accept edits → plan mode	Switch trust level per task in one keystroke
`Tab`	Complete file paths after `@`	Point at exact files instead of describing them
`↑`	Recall previous prompts (across sessions)	Re-run yesterday's prompt in two keystrokes
`Ctrl`+`B`	Send a running bash task to the background	Dev server keeps running while Claude keeps working

Terminal not passing Shift+Enter for newlines? Run /terminal-setup once. Vim habits? /vim enables modal editing in the input box.

The thinking dial

Plain-English keywords buy more reasoning on demand — no flags, no settings. Each level burns more tokens and time, so match it to the problem.

> think about why this test is flaky            # a nudge
> think hard about the race condition here      # more budget
> ultrathink: design the migration so both      # maximum deliberation
  schemas stay live during rollout

On adaptive-thinking models (Opus 5, Sonnet 5), Claude decides when to think and the effort setting controls depth. Fable 5 always thinks; Haiku 4.5 uses the extended-thinking control.

Pipe anything in, script anything out

Claude Code is a Unix citizen. stdin in, structured output out — it composes with everything you already have.

cat build-error.log | claude -p "explain the root cause in 2 lines"
git diff main | claude -p "review this diff; list risky changes only"
claude -p "list TODOs as JSON" --output-format json | jq '.result'
claude -p --output-format stream-json "..."   # event stream for tooling

Steer mid-flight, don't restart

Type while Claude works — your message queues and lands at the next step, like tapping a colleague on the shoulder: "also update the tests".
Esc then redirect — interrupt, say "stop — use the existing helper in utils.ts instead", and it course-corrects with full context.
Fork with double-Esc — went three turns down a bad path? Jump back to the fork point and try a different instruction.

Context is a budget — spend it deliberately

/context                  # see what's filling the window right now
/compact focus on the API refactor, drop the CSS discussion
/clear                    # nuke history between unrelated tasks
#remember: build with 'make fast' not 'make'   # '#' = instant memory

Starting a message with # appends it to memory (CLAUDE.md) without breaking flow — the cheapest way to make every future session smarter.

Permission hacks (safe ones)

/permissions              # allowlist tools you always approve, e.g. Bash(npm test:*)
claude --allowedTools "Read" "Grep"      # per-invocation cage for scripts

--dangerously-skip-permissions exists for containers and throwaway VMs only. On your real machine, use allowlists — one prompt-injected web page is all it takes (see Security Hardening).

Model & cost hacks

Session-only model switch — in the /model picker, press s to switch without changing your saved default.
Cheap subagents — set model: haiku in agent frontmatter (or CLAUDE_CODE_SUBAGENT_MODEL) so workers stay cheap while the main thread reasons on Opus.
Don't swap models mid-session — it invalidates the prompt cache and your next turn pays full price (see Cost & Speed).
Know what you're spending — /cost shows the session's token bill; --fallback-model sonnet keeps headless jobs alive when Opus is saturated.

One-liners worth memorizing

claude commit                          # well-formed commit from staged changes
claude -p "changelog since v2.1" --max-turns 3
claude --add-dir ../shared-lib         # work across two repos in one session
claude update                          # newer CLI = newer tricks
/doctor                                # when anything feels broken, start here

Adopt one hack per day, not all at once. In two weeks the difference is embarrassing — in the good direction.

🛟

SECTION 27

Troubleshooting & Pro Tips

Recover fast, then level up

Mastered

Most Claude Code problems are context problems in disguise — know the recovery move and you stay fast.

Performance degrades as the context window fills, so almost every fix below is about protecting that one resource. Learn the symptom, the cause, and the keystroke that resets it.

The fast triage table

Symptom	Likely cause	Fix
Replies slow, vague, or “forgetting” earlier facts	Context window near full	`/compact` at a breakpoint, or `/clear` for a new task
Output drifts off the goal	Goal buried under noise / failed attempts	Re-state the goal, switch to plan mode (`Shift`+`Tab`)
Same fix fails twice in a row	Failed approaches polluting context	`/rewind` to erase, or `/clear` and re-prompt
Constant permission prompts	No allow rules / wrong mode	`/permissions` add allow rules; set `--permission-mode acceptEdits`
“File is outside allowed directories”	Path not in working dirs	`/add-dir <path>` or restart with `--add-dir`
MCP tools missing / server red	Not connected, bad transport, or auth	`/mcp` to check status and re-auth
Edited `.mcp.json` but nothing changed	Config read only at startup	Restart the session
Installation acting weird	Broken setup / stale state	`/doctor` to diagnose health

When context is full

Two moves, two situations. Use /clear when you are starting something unrelated — fresh context is free and gives the cleanest results. Use /compact when you need to keep working on the same thread but reclaim room.

/clear                       # wipe history, start a fresh task
/compact                     # summarize history, keep key facts
/compact focus on the API changes   # steer what the summary keeps

Compact proactively at logical breakpoints (around 50–75% full) instead of waiting for auto-compaction near ~95%. You control what survives.

/context shows a live token breakdown by category so you can see exactly what is eating your window before deciding to clear or compact.

Re-anchor after a reset

Project-root CLAUDE.md and MEMORY.md are re-injected from disk after a compact, but skill descriptions and path-scoped rules are lost until re-triggered. Persist plans to a file so they survive any reset.

echo "Goal: migrate auth to OAuth. Constraints: no schema change." > PLAN.md
# after /clear or /compact:
# "Read PLAN.md and continue from step 3."

When output drifts

Drift means the model lost the thread, not that it got dumber. First, re-state the goal in one specific sentence with the file names involved. Second, drop into plan mode so it must read and think before it touches anything.

Plan mode is enforced at the tool level — Claude literally cannot edit files or run destructive commands. Toggle it with Shift+Tab (cycles normal → plan → auto-accept). Workflow: explore → plan → code.

If a correction fails twice, stop correcting. Rewinding erases the failed attempt entirely; correcting leaves the bad approach in context where it keeps misleading the model.

Esc          # stop Claude mid-action, context preserved
Esc Esc      # open rewind menu (or /rewind) to restore prior state

Permission & sandbox issues

Prompts are a feature, but you can tune the friction. Add durable allow rules with /permissions, or start the session in a looser mode for trusted work.

claude --permission-mode acceptEdits   # auto-approve file writes + mkdir/mv/cp
claude --permission-mode plan          # read-only until you approve a plan
claude --allowedTools "Bash(git*),Edit" # auto-approve specific tools only

For “outside allowed directories” errors, grant access to the extra path. Note that --add-dir grants file access but does not load that dir’s .claude/ config.

/add-dir ../shared-lib              # in-session
claude --add-dir ../apps ../lib    # at launch

Avoid --dangerously-skip-permissions for routine work. Prefer --allow-dangerously-skip-permissions, which only adds bypassPermissions to the Shift+Tab cycle so you can start safely in plan mode and switch to it later.

MCP server not connecting

Start with /mcp — it shows every server, its connection status, available tools, and is where you complete OAuth for remote servers. From there, check the three usual causes: transport, scope, and restart timing.

Run /mcp and read the status. Red usually means a crashed subprocess or an unfinished OAuth.
Verify the transport. Cloud servers need --transport http; sse is deprecated; omit the flag for local stdio servers.
If you just edited .mcp.json, restart — it is read only at session start.
For a project-scoped server, approve it at the first-use prompt (a guard against cloned repos auto-launching processes).

claude mcp add --transport http docs https://code.claude.com/docs/mcp
claude mcp add playwright -- npx -y @playwright/mcp@latest
claude mcp get docs           # shows which scope holds it
claude mcp reset-project-choices   # re-trigger project approval prompts

All flags (--transport, --env, --scope, --header) must come BEFORE the server name. For stdio, everything after -- is the launch command, passed untouched.

Claude Code reads only ~/.claude.json and the project-root .mcp.json. It does NOT read ~/.claude/mcp.json — a common reason a server “won’t load.”

High-leverage pro habits

One scope per session

Fresh context is free. Finish a task, /clear, start the next. Don’t let unrelated work share a window.

Plan before you build

Explore, plan in plan mode, then execute. Front-loading context writes a stable prefix that caches — driving 90%+ cache hits.

Be specific

Name the files, state constraints, point to an example pattern. Precise instructions mean far fewer corrections.

Scope investigations

Never say a bare “investigate” — it floods context. Delegate research to a subagent that returns a short summary.

Prune CLAUDE.md

For each line ask: would removing this cause a mistake? A bloated memory file makes Claude ignore the rules that matter.

Read narrow

File reads dominate context. Read only the ranges you need and prefer rg/grep over dumping whole files.

Route by cost

Haiku for simple/high-volume, Sonnet for most work, Opus for the hardest reasoning. Switch with /model.

Reserve ultrathink

Escalate ‘think’ → ‘think hard’ → ‘ultrathink’ only for genuinely hard reasoning. More thinking aids depth, not output compliance.

Persist plans and decisions to disk (PLAN.md) and break large work into per-session phases with a /clear between them. External files survive any context reset; conversation history does not.

Avoid cache-killers mid-session: switching models, editing CLAUDE.md, and /compact all invalidate the prompt cache and trigger a slow, expensive uncached turn. Batch those changes at natural breakpoints.

📋

SECTION 28

Master Cheat Sheet

The one tab you keep open

Mastered

Every flag, command, shortcut, and "which tool when" call in one scannable place.

CLI Flags (the daily ten)

Flag	Alias	Does
`--print`	`-p`	Headless: run once, print, exit. Pipe-friendly.
`--continue`	`-c`	Resume the most recent conversation in this dir.
`--resume`	`-r`	Resume by ID/name, or open the picker.
`--model`		Set model: `sonnet`, `opus`, or full id.
`--permission-mode`		`default` / `acceptEdits` / `plan` / `auto` / `dontAsk` / `bypassPermissions`.
`--add-dir`		Add extra working dirs for read/edit.
`--output-format`		`json` for scriptable output (grab `session_id`).
`--allowedTools`		Auto-approve specific tools, no prompt.
`--name`	`-n`	Name the session for later `--resume`.
`--bare`		Skip all auto-discovery for fast, reproducible CI.

Copy-paste launches

# Interactive with a starting prompt
claude "refactor the auth module"

# Headless, pipe content in
cat logs.txt | claude -p "explain the errors"

# Scripted continuation in the current dir
claude -c -p "Check for type errors"

# Start in plan mode (read-only until you approve)
claude --permission-mode plan

Capture a session ID and reuse it in a script.

session_id=$(claude -p "Start a review" --output-format json | jq -r '.session_id')
claude -p "Continue that review" --resume "$session_id"

--dangerously-skip-permissions bypasses every prompt. Prefer --permission-mode dontAsk plus an --allowedTools allowlist for locked-down automation.

Slash Commands

Context & cost

Command	Does
`/clear`	Wipe history, free context. Use between unrelated tasks.
`/compact`	Summarize to reclaim context. Steerable: `/compact focus on the API changes`.
`/context`	Live token breakdown by category.
`/cost`	Session token usage and spend.
`/memory`	Open CLAUDE.md files in your editor.
`/rewind`	Restore prior conversation/code state (erases the failed attempt).

Models & agents

Command	Does
`/model`	Switch model: `opus`, `sonnet`, `haiku`, `default`, or picker.
`/effort`	Set reasoning effort `low/medium/high`; `auto` resets.
`/agents`	Create/edit subagents; Running tab shows live ones.
`/mcp`	Server status, tools, OAuth auth, per-server cost.
`/plugin`	Discover, install, enable/disable plugins.

Project & review

Command	Does
`/init`	Scan repo, generate CLAUDE.md.
`/review [PR]`	Code review of changes or a specific PR.
`/security-review`	Security-focused analysis of current changes.
`/resume`	Pick a past conversation to continue.
`/permissions`	Manage allow/deny rules.
`/doctor`	Diagnose installation health.

REPL Shortcuts

Key / Token	Action
`Shift`+`Tab`	Cycle normal / plan / auto-accept mode.
`Esc`	Stop Claude mid-action; context preserved.
`Esc` `Esc`	Open rewind menu (same as `/rewind`).
`Option`+`T` / `Alt`+`T`	Toggle extended thinking.
`Ctrl`+`B`	Background a running task.
`@path/to/file`	Reference file contents in a prompt.
`@agent-name`	Guarantee a specific subagent runs.
`!command`	Run bash; output injected into context.
`#` (line start)	Quick-append a line to CLAUDE.md memory.

Rewinding a failed attempt erases it from context entirely. That beats correcting, which leaves the dead-end polluting your window.

Thinking Budget

Keywords escalate reasoning depth in your prompt. More thinking means deeper reasoning, not better rule-following.

think  <  think hard  <  think harder  <  ultrathink (max)

Reserve ultrathink for architecture or hard multi-file debugging. Use think / think hard for everyday moderate tasks.

Model Picker

Model	$/MTok in/out	Context	Use for

Prices verified July 28, 2026. Sonnet 5 is shown at its temporary introductory price; Mythos 5 is limited-access rather than a normal routing choice.

Rule of thumb: start with Haiku, move up only when a capability gap shows. Switching models mid-session invalidates the prompt cache.

Which Tool When

CLAUDE.md

When: facts Claude needs every session. Why: re-injected after compaction. Non-guessable commands, code style, gotchas. Prune ruthlessly.

Skill

When: a repeatable how-to that loads only on demand. Why: metadata costs ~100 tokens; the body loads when your request matches. .claude/skills/<name>/SKILL.md.

Subagent

When: verbose, self-contained work that returns a summary. Why: runs in its own fresh context, then reports back. Great for parallel research.

MCP

When: Claude needs external systems (DBs, APIs, SaaS). Why: standard tool protocol. claude mcp add; tools namespaced mcp__server__tool.

Hook

When: deterministic action the model must not skip. Why: shell/HTTP at lifecycle points. Auto-format on Edit, block bad Bash, run tests on Stop.

Plugin

When: bundle and share many of the above. Why: one installable directory of skills, agents, hooks, MCP. /plugin install name@marketplace.

Config-File Map

What	User level	Project (shared)
Memory	`~/.claude/CLAUDE.md`	`./CLAUDE.md`
Settings	`~/.claude/settings.json`	`.claude/settings.json`
Skills	`~/.claude/skills/`	`.claude/skills/`
Subagents	`~/.claude/agents/`	`.claude/agents/`
MCP	`~/.claude.json`	`.mcp.json`
Hooks	`settings.json` (hooks key)	`settings.json` (hooks key)

Untracked local overrides go in .claude/settings.local.json (gitignored). MCP local scope lives in ~/.claude.json, not that file.

Quick Recipes

# Add a remote MCP server (project-shared, checked in)
claude mcp add --transport http --scope project docs https://code.claude.com/docs/mcp

# Add a local stdio MCP server
claude mcp add playwright -- npx -y @playwright/mcp@latest

# Auto-format every edit (PostToolUse hook in settings.json)
{"hooks": {"PostToolUse": [{"matcher": "Edit|Write",
  "hooks": [{"type": "command",
  "command": "jq -r .tool_input.file_path | xargs prettier --write"}]}]}}

Hook exit codes: 0 = success, 2 = blocking error (stderr fed back to Claude), other non-zero = non-blocking. Use 2 to enforce policy.

Custom Slash Command (60 seconds)

Create .claude/commands/optimize.md (filename becomes /optimize).
Add YAML frontmatter: description, argument-hint, optional allowed-tools, model.
Write the prompt body. Use $ARGUMENTS, $1, @path, and !cmd for dynamic content.

---
description: Optimize a file for performance
argument-hint: [file path]
allowed-tools: Read, Edit, Bash(git diff:*)
---
Review @$1 for performance issues and apply safe fixes.
Current diff:
!git diff $1

Front-load context in plan mode, then execute. A stable prefix gets cached once, so every later turn reads from cache (90%+ hit rates) instead of recomputing. Fresh context is free, so /clear between unrelated tasks.

🧪

PRACTICE LAB

Try It — Hands-On Challenges

Reading builds recognition; doing builds skill. Solve real situations, get instant feedback.

Mastered

You've learned the tools — now use them. Each challenge drops you into a real situation. Type the command (or pick the move), check your answer, and unlock a full walk-through with the why behind it.

Everything runs in your browser — nothing is sent anywhere, and your progress is saved locally. Stuck? Hit Hint anytime, or reveal the solution after a try.

0SOLVED

0CHALLENGES

0BEST STREAK

🎉 Lab complete — every challenge solved!You didn't just read about Claude — you drove it. Take the Mastery Exam next to prove it.

🌍

SECTION 29

What Is AI, Really?

Strip away the hype: AI is just software doing jobs we thought needed a brain.

Mastered

⚡ WHY IT MATTERSKnowing AI's nested layers helps beginners see that chatbots like Claude are one slice of a much bigger field.

Concentric rings show AI containing Machine Learning, then Deep Learning, then Generative AI and LLMs, with example chips and a pulse moving inward to the Claude core.

🎬 Animation, explained

The rings are containment, not a timeline: AI is the biggest circle, Machine Learning sits inside it, Deep Learning inside that, and Generative AI / LLMs innermost.
The example chips on each ring anchor the abstraction — a chess engine can be AI without being ML; a spam filter is ML without being deep.
The pulse travels inward, tracing the path from the broad idea down to the specific technology Claude is built on.
Takeaway: when someone says "AI", find which ring they mean — most 2026 headlines are really about the innermost one.

Before we touch a single buzzword, let's nail down what "AI" actually means. Short version: it's software that does tasks we used to assume needed a human brain. That's it. No glowing red eyes required.

One plain sentence

Artificial Intelligence is software that performs tasks we once thought required human intelligence — spotting spam, suggesting a song you'll like, finding the fastest route, answering a messy question in plain English. When you put it that way, "AI" stops sounding like science fiction and starts sounding like Tuesday.

The two ways software can be smart

This next idea is the spine of the entire track, so it's worth slowing down for. There are really only two ways to build something "intelligent," and the difference is who writes the rules.

Rules-based (hand-written)

A human writes every if-this-then-that by hand. "IF the email contains 'FREE VIAGRA' THEN mark as spam." Predictable and easy to inspect — but a human has to imagine every case in advance, and spammers are creative.

Learned (figured out from examples)

You don't write the rules. You show the system thousands of examples ("these 10,000 were spam, these weren't") and it figures out the patterns itself. Flexible and powerful — but harder to peek inside and explain.

Almost everything people call "AI" today is the learned kind. The rules-based kind is older and still everywhere (your tax software, a thermostat) — it's just not what's making headlines.

The nesting dolls

Here's where the jargon usually gets confusing — AI, ML, Deep Learning, Generative AI, LLMs. They sound like rivals. They're not. They're Russian nesting dolls (matryoshka): open the big one and a smaller, more specialized one sits inside.

Doll	What it is	Everyday face
AI (biggest)	The whole idea: machines doing brain-like tasks	The entire field
Machine Learning	AI that learns rules from examples instead of being hand-coded	Spam filter, maps ETA
Deep Learning	ML using many-layered "neural networks" — math loosely inspired by brain cells, great with images, audio, and language	Face unlock, voice typing
Generative AI / LLMs (smallest)	Deep learning that creates new text, images, or code (an LLM = Large Language Model, the text kind)	Claude, ChatGPT

Each doll is a more specific kind of the one wrapped around it. So Claude is a Large Language Model, which is Generative AI, which is Deep Learning, which is Machine Learning, which is AI. All true at once. No contradiction.

Narrow vs general — set your expectations

Today's AI is narrow: brilliant at one job. A spam filter can't drive a car. Claude can write a poem but can't water your plants. AGI — Artificial General Intelligence, a do-anything, human-level mind — does not exist yet, no matter what a flashy demo implies.

When you hear "AI can do anything now," translate it to "this narrow AI is very good at its one thing." That mental swap will save you a lot of disappointment.

The whole story in one breath

You don't need the full history — just the shape of it. Read this as a single timeline:

1950 — Alan Turing asks "can machines think?" The field gets its name a few years later, in 1956.
1980s — Expert systems: giant hand-written rulebooks. Clever, but brittle.
1990s–2000s — Machine Learning takes over: let the data write the rules.
2012 — Deep learning explodes after neural nets crush an image-recognition contest.
2017 — The transformer is invented — the engine behind every modern LLM.
2022 — ChatGPT launches and suddenly everyone's grandma knows the word "AI."
Today — Agents: AI that doesn't just answer, but takes actions for you.

It's already in your pocket

If "AI" still feels abstract, look around — you've been using it for years:

Spam filter

Quietly sorting your inbox, learning what junk looks like.

Maps ETA

Predicting traffic to guess your arrival time within minutes.

Recommendations

The "you might also like" behind Netflix, Spotify, and your shopping cart.

Claude

An LLM you can just talk to in plain language.

Fun fact: your email spam folder was doing machine learning before it was cool — quietly judging your inbox since the late 1990s.

Most of today's magic is the learned kind of AI. So in the next section, let's open that doll and see how a machine actually learns from examples.

📈

SECTION 30

Machine Learning: Learning from Data

Stop writing the rulebook — hand the machine the photo album and let it figure out "cat."

Mastered

⚡ WHY IT MATTERSMachine learning powers nearly every modern AI product, turning past data into predictions about new, unseen situations.

A two-phase flow: training feeds labeled data into a model that adjusts its weights in a loop, then inference sends a new input through the trained model to get a prediction.

🎬 Animation, explained

Phase one (left) is training: labeled examples pour in, and the loop arrow shows the model adjusting its weights after every miss — guess, check, nudge, repeat, millions of times.
The weights dial visualizes what "learning" physically is: numbers shifting until the guesses stop being wrong.
Phase two (right) is inference: a brand-new input flows once through the now-frozen model and comes out as a prediction. No more learning happens here.
Takeaway: training is expensive and rare; inference is cheap and constant. Every Claude reply you get is inference.

Here's the big flip this section owns: instead of a human painstakingly hand-coding rules, machine learning lets the computer learn the patterns straight from data. You don't tell it what a cat is — you show it, and it works the rest out itself.

The cat-photo story

Imagine teaching a child what a cat is. You don't hand them a rulebook saying "pointy ears, whiskers, four legs, smug expression." You just point at cats over the years and say "cat." Eventually the child recognizes a cat they've never seen before — even a weird hairless one.

Machine learning does the same thing. Show a model a giant album of labeled cat photos and it learns "cat-ness" on its own. Nobody writes the rules; the rules emerge from the examples. That's the whole shift.

The working vocabulary

Four words unlock almost every ML conversation. Learn these and you're fluent enough to follow along.

Features

The inputs — the raw stuff the model looks at. For a photo, the pixels. For a house price, the square footage and zip code.

Labels

The answers we want — "cat" or "not cat," the actual sale price. The correct response we're trying to predict.

Model

The thing that learns — the recipe that turns features into a guess at the label.

Parameters / weights

The dials the model tunes. Training is just nudging these numbers until the guesses get good. (Weights are the most common kind of parameter.)

Three ways a machine can learn

There are three big "learning styles," called paradigms. One line each:

Paradigm	How it learns	Everyday picture
Supervised	From labeled examples (features + correct answers)	Flashcards with the answer on the back
Unsupervised	Finds hidden structure with no labels at all	Sorting a messy drawer into natural piles
Reinforcement	By trial and error, chasing rewards	Training a dog with treats

The cat-album example is supervised learning — every photo came with a label. That's the most common starting point, and the one to keep in your head.

Two phases of a model's life: training vs inference

This split shows up everywhere in AI, so it's worth nailing now.

Training = studying

The model pores over the examples and adjusts its dials, over and over, until its guesses match the labels. Slow, expensive, and done in big batches (then occasionally repeated to keep the model fresh).

Inference = taking the test

The trained model meets brand-new data and makes a prediction. Fast, cheap, and what happens every time you use it. Like flipping a flashlight on — you don't need to know the electronics to use it.

You'll meet "training vs inference" again and again across this track. Whenever a system is "learning," it's training; whenever it's "answering," it's inference.

Memorizing vs actually understanding

Here's the trap every model can fall into. We don't want it to memorize the album — we want it to understand cats well enough to handle ones it's never seen. That ability to handle new data is called generalization, and it's the entire point.

The failure is called overfitting. Overfitting is the student who memorizes every past exam word-for-word, then panics when the teacher changes the numbers. They aced the practice test and bombed the real one — because they learned the answer key, not the subject.

A model that scores perfectly on its training data but flops on new data is overfitting. Great on the practice test, useless in the real world. We always judge a model on data it has never seen.

Rule of thumb: if results look too perfect on the training set, be suspicious. Real understanding means doing well on fresh examples, not reciting old ones.

So far we've treated "the model" as a magic box that tunes dials. The most powerful ML models are built from layered networks of these dials stacked together — let's look under that hood next.

🕸️

SECTION 31

Neural Networks & Deep Learning

Inside every AI is a giant mixing board of dimmer switches that learned to tune itself.

Mastered

⚡ WHY IT MATTERSNeural networks power modern AI, learning patterns from data to recognize images, understand language, and make predictions.

A feed-forward network passes a signal pulse left-to-right from 3 inputs through two hidden layers to 2 outputs (the forward pass).

🎬 Animation, explained

The three dots on the left are inputs — raw numbers standing in for pixels, words, anything measurable.
The pulse sweeps through two hidden layers, and every connection it crosses multiplies the signal by a learned weight — small transformations stacking up.
Early layers catch simple patterns; deeper layers combine them into bigger ideas — that stacking is the "deep" in deep learning.
The two dots on the right are outputs (say, "cat" vs "dog" scores). One left-to-right sweep = one forward pass = one prediction.

Remember the nesting doll from earlier? Open the "machine learning" doll and inside sits deep learning. Let's crack that one open too — because it turns out to be built from millions of copies of one tiny, almost embarrassingly simple gadget: the artificial neuron. Build one, stack a bunch, and you've got the engine behind nearly every modern AI.

Step one: build a single neuron

An artificial neuron is basically a tiny calculator. It does three things in a row:

Take some inputs — numbers coming in (say, the brightness of pixels, or "was it raining?").
Multiply each by a weight, then add them up — the weight says how much that input matters. Big weight = "pay attention to this." Near-zero weight = "ignore it."
Push the total through an activation function — a gate that bends the result so the network can learn curvy, non-straight-line patterns. Loosely: a bouncer who decides how much of the signal gets waved through.

That's the whole neuron. Inputs in, one number out. Not magic — arithmetic.

Step two: stack neurons into layers

One neuron makes one little decision. The power comes from wiring thousands of them together in layers, where each layer's output feeds the next.

Input layer

Where the raw data enters — the pixels, the words, the sensor readings. No thinking happens here; it's just the loading dock.

Hidden layers

The actual processing. Each layer transforms what the last one found into something more abstract. This is where the real work lives.

Output layer

The final answer — "cat," "spam," or the next word in a sentence.

And "deep" learning? It just means there are many hidden layers stacked between input and output. That's the entire secret behind the scary-sounding word. Deep = lots of layers. That's it.

The knobs: weights are everything

Here's the anchor image to keep in your head. Picture a giant sound-mixing board covered in dimmer-switch sliders. Each weight is one slider. The activation function is the gate each signal passes through after the sliders set its level. A real network has millions (or even billions) of these sliders.

When people say a model is "trained," they mean exactly one thing: all those sliders got nudged up and down until the output came out right. Training is just tuning the knobs. And here's the kicker — no human touches them by hand. The network adjusts its own sliders, which is the next thing to explain.

How it actually learns (the loop)

So how do you tune millions of sliders with nobody touching each one? You let the network do it, in a repeating four-beat loop:

Beat	Plain-English meaning
Forward pass	Run data through and make a guess (it'll be terrible at first).
Loss	Measure how wrong the guess was — one number for "how bad."
Backpropagation	Trace the error backward to figure out which sliders caused it, and which way each should move.
Gradient descent	Nudge every slider a little in the helpful direction. Repeat — often millions of times.

Gradient descent is the all-time favorite analogy here: imagine a hiker descending a foggy mountain, feeling which way is downhill and taking one careful step at a time. "Downhill" means "less error." Do that enough and you reach a valley floor — a network that's usually right.

The model never gets told the right weights. It just keeps measuring its own mistakes and shrinking them, one tiny nudge per knob, until the output stops being wrong.

Why deep beats hand-crafted features

Before deep learning, humans had to hand-engineer the clues a program looked for — "an edge here, a curve there." Tedious, brittle, and capped by human imagination.

Deep networks flip that. Given enough examples, they discover their own hierarchy of clues: early layers find edges, middle layers assemble edges into shapes, deeper layers assemble shapes into things like faces or wheels. Nobody programmed those stages — they emerged from turning the knobs.

The turning point was 2012, when a deep network (nicknamed AlexNet) crushed the ImageNet image-recognition contest, beating hand-built methods by a stunning margin. It was basically AI's "oh — the dimmer switches actually work" moment, and the field never went back.

One honest caveat

The "it's like a brain" metaphor is handy but oversold. Real neurons are far more complex than a weighted sum, and brains don't learn by backpropagation or matrix math. "Neural network" is an inspired nickname, not a biology lesson — great for intuition, wrong as literal anatomy.

Where this is heading

You now know the whole engine: neurons, layers, weights as knobs, and a learning loop that tunes them by chasing lower error. Now take a network like this and make it enormous — billions of knobs — then feed it a huge chunk of all the text humans have ever written. Do that, and what you get is a Large Language Model. That's exactly where we go next.

💬

SECTION 32

What Is an LLM?

A giant autocomplete that picked up coding and translation as accidental side hobbies.

Mastered

⚡ WHY IT MATTERSN/A

An animated view of next-token prediction: a prompt enters the LLM, which outputs token probabilities, then loops the top token back into the sentence.

🎬 Animation, explained

The prompt slides into the LLM, which reads everything it has so far.
Out comes not an answer but a probability list — every possible next token, ranked by likelihood.
The winning token loops back and is glued onto the sentence, and the whole process immediately runs again on the longer text.
Takeaway: that's the entire trick, repeated fast — autocomplete scaled up a trillion-fold. Everything impressive about LLMs emerges from this loop.

You've used an LLM if you've ever talked to Claude, ChatGPT, or Gemini. But under the hood, the thing doing all that brilliant work is doing something almost comically simple: guessing the next chunk of text, over and over.

The one-line definition

A Large Language Model (LLM) is a very large neural network (remember those from the last section?) trained to do one simple thing: predict the next token — a token being a small piece of text, roughly a word or part of a word. That's it. Everything else — the essays, the code, the patient explanations — grows out of that single skill done at an absurd scale.

What does "large" actually mean?

Two things are large here, and both matter:

Billions of parameters

Parameters are the tunable "weights" inside the neural net — the knobs it adjusts while learning. Modern LLMs have billions to trillions of them. More knobs means more nuance it can capture.

A giant text diet

It was trained on an enormous slice of the text humans put online, plus mountains of books. Think of it as having read more than any person could in many lifetimes.

"Predict" doesn't mean "look up"

When we say it predicts, we mean it models probabilities over what comes next. Type "I'll be there in ___" and your phone suggests five minutes. An LLM does the same basic trick — just trained on a huge chunk of the internet instead of your text history, and scaled up enormously. Same idea, wildly more power.

The big misconception to kill right now: an LLM is not a database or a search engine looking up stored answers. There's no filing cabinet inside it with "capital of France → Paris." Instead, Paris comes out because that pairing dominated its training data. It's a pattern predictor generating plausible text from learned weights — a statistical map of how language works, not a lookup table.

	Search engine	LLM
What it does	Points you to where an answer might live	Tries to produce the answer itself
Source	Indexed web pages	Patterns baked into its weights
If it's wrong	It showed you a page; you can judge the source	It can state a wrong answer fluently and confidently

How it writes: the autoregressive loop

Here's the surprising part — the model doesn't compose a whole answer at once. It builds every response one token at a time, in a tight loop:

Predict a probability for every possible next token, given everything so far.
Pick one token from that distribution (often, but not always, a top-ranked one).
Append it, feed the whole thing back in, and predict again.
Repeat — sometimes thousands of times — until the answer is complete.

This predict-pick-append-repeat cycle is called autoregressive generation. Every answer you've ever gotten from an LLM is this loop run over and over. It's a bit like writing a sentence where each word is chosen knowing only the words before it.

Because step 2 involves a bit of controlled randomness, the same question can produce slightly different wording each time — that's a feature, not a glitch.

The plot twist: emergence

Here's what made LLMs a genuinely big deal. At small scale, a next-token predictor is just a clever autocomplete. But push the size up far enough — more parameters, more data — and abilities that weren't obvious at small scale start to show up: translating languages, writing working code, multi-step reasoning. Nobody hand-coded those in.

It learned to write working code and translate French essentially as a side effect of getting really, really good at finishing your sentences — nobody told it to. Researchers call these surprise skills emergent abilities, and they're a big part of why the field exploded.

Meet the players

You already know the names. Claude (Anthropic), GPT (OpenAI), Gemini (Google), and Llama (Meta) are all LLMs. Different companies, different training, same core machine: a giant neural net predicting the next token.

One more honest detail before we move on: the model never actually sees words. It sees tokens — the puzzle pieces text gets chopped into. Let's go meet them next.

🔤

SECTION 33

Tokens, Embeddings & Vectors

Models don't read words — they read tokens, then turn them into coordinates for meaning.

Mastered

⚡ WHY IT MATTERSTokens and embeddings are how AI turns text into numbers it can compute with, so meaning becomes math.

A pipeline showing raw text split into tokens, each mapped to an embedding vector, then placed as a point in a 2D space where similar words cluster.

🎬 Animation, explained

Stage one chops raw text into tokens — word-chunks, not letters or whole words. Watch where the split lines fall.
Stage two maps each token to an embedding: a long list of numbers acting as GPS coordinates for meaning.
Stage three drops each token onto the 2D map — and similar meanings land close together: 'dog' near 'puppy', far from 'invoice'.
Takeaway: models never see words, only these coordinates — which is why they can reason about meaning and also why they can't count letters.

Here's the plot twist nobody tells beginners: a language model never actually reads words, or even letters. It reads tokens — bite-sized chunks of text. Everything else in this guide is built on top of that one fact.

What's a token?

A token is a sub-word chunk: sometimes a whole word, sometimes a piece of one, sometimes just a space or a punctuation mark. As a rough rule of thumb for English, one token is about 4 characters, or roughly 0.75 of a word. So 100 words lands somewhere near 130 tokens.

Why chop words up? Because keeping a separate entry for every word ever written would be wildly inefficient. Instead, the model learns a kit of reusable pieces. Take unbelievable — it can split into un / believ / able. Those same pieces snap back together to build unthinkable, believer, readable, and thousands more. Think of tokens as the LEGO bricks of text. (Different models slice slightly differently, so the exact pieces vary — but the idea is identical.)

Want to see it? Search for an online "tokenizer" tool, paste in a sentence, and watch your text get sliced into colored chunks. It's the fastest way for this to click.

From token to meaning: the embedding

A model can't do math on the letters c-a-t. So each token gets converted into an embedding — a long list of numbers (a vector) that captures its meaning. Not the spelling. The meaning.

On its own, a token's embedding captures its general meaning, the dictionary-ish gist. The richer, in-context meaning (is "bank" a river or a vault?) gets layered on later by the transformer — that's the next section.

The anchor idea: GPS coordinates for meaning

Think about how GPS works. Two numbers — latitude and longitude — pinpoint any place on Earth. Grand Blanc, Michigan is just a pair of coordinates, and so is Detroit, and because they're close in real life, their numbers are close too.

An embedding does the same trick, but for meaning instead of geography. Its list of numbers pinpoints a word in an imaginary "meaning-space." Words with similar meanings land near each other, like houses in the same neighborhood: cat, kitten, and feline cluster together, while helicopter sits way across town.

Meaning even has direction

The famous demo: take the vector for king, subtract man, add woman, and you land remarkably close to queen.

king − man + woman ≈ queen

That's not a parlor trick — it hints that meaning is geometric and directional: a rough "royalty" direction and "gender" direction actually exist as movements through the space.

Treat this as a tidy ideal, not a guarantee. Real embeddings are messier than the clean textbook example — but the intuition (meaning has shape and direction) is genuinely how it works.

Why you should care: cost and the clock

Tokens aren't just an internal detail — they're the unit of money and the unit of memory. You're billed per token (in and out), and the model can only hold so many at once.

Tokens = your wallet

Providers price per token. A long document in plus a long answer out means more tokens, which means a bigger bill. Concise prompts are cheaper prompts.

Tokens = your budget

The context window is the total number of tokens the model can hold at one time — prompt, conversation, and reply combined. Fill it up and the oldest stuff falls out the back.

The strawberry problem

This is why an LLM can write you a flawless sonnet but fumble "how many R's are in strawberry." It sees chunky tokens, not individual letters — counting characters isn't its native language.

Text	≈ Tokens
One short word ("cat")	1
"unbelievable"	~3 (un / believ / able)
A 100-word paragraph	~130
A 10-page document	~5,000+

We'll come back to the context window when we talk about inference — for now, just hold the picture: it's a token budget, and you spend it as you go.

Tokens and embeddings are the ingredients. Up next, the transformer is the machine that takes all these meaning-coordinates and lets them talk to each other.

⚡

SECTION 34

Transformers & Attention

In 2017 one paper taught machines to read every word at once — and lit the fuse on ChatGPT.

Mastered

⚡ WHY IT MATTERSSelf-attention lets a model decide which earlier words matter most, the breakthrough that powers every modern LLM.

The word "it" sends pulsing attention beams to every other word, with the thickest, brightest beam landing on the noun it truly refers to ("animal").

🎬 Animation, explained

The highlighted word "it" is ambiguous on its own — the sentence only works if the model figures out what "it" refers to.
The beams are attention: "it" simultaneously queries every other word in the sentence, asking "are you relevant to me?"
Beam thickness = attention weight. The thickest, brightest beam locks onto the noun "it" actually refers to ("animal") — the model resolved the reference by weighting, not rules.
Takeaway: this happens for every word toward every word, in parallel — that's the transformer's core move, and why context makes LLMs smart.

Back at our 2017 stop on the timeline, a paper with the cheeky title "Attention Is All You Need" introduced the transformer — the architecture humming under the hood of virtually every modern LLM. The title was a flex, and for once it held up: attention really was the key idea, and it kicked off the entire ChatGPT era.

The heart of it: self-attention

Here's the one idea that matters. As the model reads, every token (a word or word-piece) looks at the other tokens and decides how much each one matters to its own meaning. That's it. That's self-attention.

Picture a sentence: "The trophy didn't fit in the suitcase because it was too big." What does "it" mean — the trophy or the suitcase? You knew instantly. Self-attention is how the model figures it out too: the word "it" glances at the other words, scores which one is most relevant, and locks onto "trophy." Swap "big" for "small" and the answer flips to "suitcase" — and the model's attention flips right along with it.

Think of a lively dinner party. Every word is a guest who listens to all the other guests at once and decides whose comment matters most for understanding the conversation. That simultaneous, selective listening — everyone weighing everyone else in parallel — is self-attention in one image.

Why it crushed the old way (RNNs)

Before transformers, models called RNNs (recurrent neural networks) read text the way you'd read with a finger under each word — one word at a time, left to right, trying to remember everything that came before. Slow, and forgetful: by the end of a long paragraph, the start had faded.

Attention reads the whole input at once. Two huge wins fall out of that:

Parallelizable

All tokens are processed simultaneously, not single-file. That's what makes training on internet-sized text feasible — and fast on modern hardware (GPUs love doing many things at once).

Long-range context

A word near the end can directly "talk to" a word near the start in a single step, instead of passing the message hand-to-hand. Distant connections stay sharp instead of fading.

	RNN (the old way)	Transformer / attention
Reading style	One word at a time	All words at once
Speed	Sequential, slow to train	Parallel, fast to train
Distant words	Fades over distance	Connects directly

Multi-head attention: several viewpoints at once

One round of attention is good. Transformers run several in parallel — called heads — and each head can learn to notice a different kind of relationship. Back at the dinner party: one guest tracks grammar (who's the subject, who's the verb), another tracks topic (which words are about the same thing), another tracks who-refers-to-whom (that "it" → "trophy" link). Combine all those viewpoints and you get a far richer read of the sentence than any single pass could give.

Stacked deep — the "deep" in deep learning

One block of attention isn't the whole model. These blocks are stacked many layers high, each one's output feeding into the next. Early layers tend to catch simple patterns; later layers build toward grammar, meaning, and intent — the same hierarchy-of-abstraction idea from the neural-networks section, just taller. That stacking is the "deep" in deep learning.

One honest catch: attention by itself is order-blind. On its own it treats the input like a bag of words — "dog bites man" and "man bites dog" would look identical to it. Very different news stories. So position information is added in separately (called positional encoding) to tell the model where each word sits. Attention figures out what relates to what; position tells it what came where.

If you remember one thing: a transformer reads everything at once and lets every word decide which other words to listen to. That single trick — selective listening, in parallel, stacked deep — is what powers today's LLMs.

So now we have the architecture. But an empty transformer knows nothing — it's a brilliant student who hasn't opened a single book yet. How do you actually teach it? That's training, and it's our next stop.

🏋️

SECTION 35

Training, Fine-Tuning & RLHF

A model goes to school in three stages: read the library, take a class, get coached.

Mastered

⚡ WHY IT MATTERSTraining stages turn a raw text-predictor into a helpful, safe assistant that follows instructions and human values.

A three-stage pipeline (Pretraining to SFT to RLHF) with a token traveling left to right, getting brighter and more aligned at each stage.

🎬 Animation, explained

The traveling token starts in pretraining: reading a huge slice of the internet just to learn how language works. It comes out knowledgeable but wild.
Stage two, SFT (supervised fine-tuning), shows it curated examples of good behavior — the token visibly brightens as it learns the assistant format.
Stage three, RLHF, adds human thumbs-up/thumbs-down until helpful, honest answers score highest — the token exits brightest and aligned.
Takeaway: raw capability comes from stage one; everything that makes Claude pleasant and safe comes from stages two and three.

A finished AI assistant isn't built in one step — it's educated in three. Picture a model going to school: first it reads the entire library, then it takes a class on answering questions, then mentors coach it until it's genuinely useful.

The three-class education of a model

Everything in this section happens during training — the slow, expensive "studying" phase where the model's internal numbers (its weights) get tuned. Once training wraps up, those weights are locked in and the model ships out to do its job (that's inference, covered elsewhere). Here are the three classes it takes, in order.

1. Pretraining — read the library

The model reads internet-scale text and plays one game trillions of times: predict the next token (a token is a word or word-fragment). No humans label anything — the text is its own answer key: hide what comes next, guess it, check, repeat. This is where nearly all its language skill and world knowledge come from.

2. SFT — take a class

Now it studies curated (instruction, response) pairs written by people, so it learns to actually follow instructions instead of just continuing text. This mostly shapes behavior and tone — not new facts.

3. RLHF — get coached

Humans rank competing answers. A "reward model" learns those preferences, then nudges the model toward being helpful, honest, and harmless — qualities too fuzzy to write down as labeled examples.

Stage 1: Pretraining (self-supervised next-token prediction)

The model chews through a colossal pile of text and repeatedly guesses the next token. Because the right answer is simply whatever actually came next, no human grading is needed — that's why it's called self-supervised. The result is a base model: a raw text-completer that has absorbed grammar, facts, and patterns but isn't yet a polite assistant.

A base model fresh out of pretraining has read everything ever written but never had a real conversation — like a hermit genius who needs one quick class on small talk before it's any use.

Stage 2: Supervised fine-tuning (SFT)

Here the base model keeps training, but now on hand-picked examples of good instruction-and-answer pairs. It's the difference between someone who has read every cookbook and someone who's taken a class on actually answering "what should I make for dinner?" SFT teaches how to respond — the helpful format, the tone, the do-what-was-asked behavior. It generally doesn't pour in new knowledge; that came from Stage 1.

Stage 3: RLHF and preference tuning

Some things are hard to capture with example answers — "be honest," "don't be smug," "refuse this politely." So instead of writing perfect answers, humans look at two model replies and pick the better one. A reward model learns to predict those human preferences, and reinforcement learning gently steers the main model toward answers the reward model scores highly. It's coaching: try, get ranked, adjust, repeat.

Anthropic adds a twist called Constitutional AI. Instead of humans labeling every single case, the model critiques and revises its own answers against a written set of principles — a "constitution." Claude is trained this way, which scales the coaching without needing a human to weigh in on every judgment.

The cost reality

The three stages are wildly unequal in price. Pretraining is the budget-buster; the later stages are cheap by comparison but disproportionately decide how the model behaves.

Stage	Teaches	Needs humans?	Cost
Pretraining	Language + world knowledge	No (self-supervised)	Enormous
SFT	Following instructions	Yes (curated pairs)	Much smaller
RLHF / Constitutional AI	Helpful, honest, harmless	Yes (rankings / principles)	Smaller still, but decisive

The full arc in one line: read the entire library (pretraining), take a focused class on answering questions (SFT), then get coached by mentors who rank your answers until you're genuinely helpful (RLHF / Constitutional AI).

🔮

SECTION 36

Inference, Context & Hallucinations

Inference is the model improvising — give it a script or it makes things up.

Mastered

⚡ WHY IT MATTERSKnowing what the model can "see" and how temperature shapes output helps beginners trust answers and spot AI mistakes.

A sliding context window highlights only the tokens it can read on a long scroll, a temperature dial wiggles between focused and creative, and missing context triggers a hallucination warning.

🎬 Animation, explained

The sliding highlight is the context window: the model can only "see" what's inside it — text that scrolls out is genuinely gone from its working memory.
The wiggling dial is temperature: low = careful and repeatable, high = loose and creative. Same model, different gamble per token.
The sparkle over the gap is a hallucination being born: where facts are missing, the model fills the hole with whatever is statistically plausible.
Takeaway: the three failure modes of LLMs — forgot it, gambled on it, invented it — are all visible in this one picture. Paste real context to fight all three.

Training built the brain. Inference is what happens when you actually use it — and understanding this one moment explains why models are so fluent, so fast, and occasionally so confidently, spectacularly wrong.

Inference: using the frozen brain

Remember training — millions of examples nudging billions of dials until the model got good? Inference is the exam, not the studying. The dials are now locked. Every time you chat with a model, you're running that finished, frozen brain forward to generate text. The weights don't change, the skills are fixed in place. It doesn't learn from your conversation, and it won't remember you tomorrow.

The context window: its tiny working memory

The model has no memory of your last chat and no private notebook. All it can "see" is what fits in its context window right now — think of it as the model's short-term working memory, or the desk it's allowed to spread papers on.

Here's the catch beginners always miss: that desk holds everything at once — your prompt, the entire back-and-forth history, the system instructions, and the answer it's currently writing. Anything that doesn't fit on the desk is simply gone. Not "kind of forgotten" — invisible. If a detail scrolled off the top, the model isn't ignoring it; from where it sits, that detail never existed.

This is why a long chat can suddenly "forget" something you said earlier, and why pasting the relevant document into the conversation works so much better than expecting the model to recall it. On the desk = it can use it. Off the desk = it can't.

Temperature: the creativity dial

When the model generates, it's choosing the next word from a list of likely candidates. Temperature (and related sampling settings) decides how adventurous that choice is.

Low temperature

Focused, predictable, repeatable. It picks the safest, most likely word almost every time. Great for code, data extraction, and factual answers.

High temperature

Loose, surprising, creative. It gambles on less-likely words. Great for brainstorming, poetry, and "give me ten wild ideas."

Temperature controls randomness, not correctness. Turning it down makes answers more consistent — it does not make them more true. Temperature zero doesn't make a model honest; it just makes it lean toward the same confident answer every time, right or wrong.

Why models hallucinate

A hallucination is when the model states something false with total confidence — a fake citation, a made-up date, a plausible-sounding "fact" that isn't. Beginners assume this is a bug. Mostly, it's the core mechanism working exactly as designed.

The model doesn't look up facts in a database. It predicts plausible text, one word at a time, based on patterns from training. When it has seen the answer often, the most plausible continuation happens to be correct. When it hasn't, the most plausible continuation is a confident-sounding guess — because in its training data, confident sentences almost never trail off with "...but honestly I'm not sure."

The anchor analogy: the model is an improv actor who never breaks character to say "I don't know." Hand them a script — your source document — and they're sharp and accurate. Leave them to ad-lib from memory and they'll invent details on the spot to keep the scene going. The show must go on.

The fix is shockingly effective

Here's the rule of thumb worth memorizing. When a model answers open-ended questions purely from memory, it can be wrong a meaningful chunk of the time — and you often can't tell which chunk. Hand it a source document to work from — and ask it to answer using that document — and accuracy improves dramatically, because now it's reading instead of guessing.

The takeaway every pro repeats: models are best as editors and synthesizers, not as knowledge sources. Don't ask "what does the contract say?" — paste the contract in and ask "according to this, what does it say?"

Your practical mitigation toolkit

Move	Why it helps
Give it context / sources (RAG)	RAG = retrieval-augmented generation: fetch relevant text and hand it to the model. The script instead of improv — the single biggest accuracy win.
Ask for citations	Forces it to point at where a claim came from, so you can check it (and still verify — citations can be faked too).
Lower the temperature	For factual work, fewer creative gambles means fewer invented flourishes.
Verify anything high-stakes	Medical, legal, financial, or money-moving answers always get a human check.

The knowledge cutoff

Every model was trained up to a certain date and then frozen. On its own, it knows nothing that happened after its knowledge cutoff — last week's news, today's prices, a result from this morning. Ask about recent events and it'll either admit the gap or, worse, confidently improvise. The fix is the same one as above: don't rely on stale memory, drop the fresh information straight into the context window.

Notice the pattern: nearly every fix is "give the model the right context." So what if the model could fetch its own context — search the web, read a file, run a calculator — and act on it in a loop, instead of waiting for you to paste everything in? Give it tools and a loop, and you've built an agent. That's next.

🔁

SECTION 37

Agents, Tools & the AI Loop

An agent is an LLM in a loop with hands: it acts, checks its work, and tries again.

Mastered

⚡ WHY IT MATTERSValidated the SVG against all six checks and fixed two real issues plus hardened compatibility.

A four-step agent loop (Perceive, Think, Act, Observe) with a glowing token circling clockwise and a side rack of tools the Act step calls.

🎬 Animation, explained

The glowing token circles the four stations of the agent loop: Perceive the situation, Think about what to do, Act with a tool, Observe what happened.
The tool rack on the side is what the Act step reaches for — search, code, files, APIs. Without tools, the loop degenerates back into just talking.
Notice the loop doesn't exit after one lap — the Observe result feeds the next Perceive, so mistakes get noticed and corrected.
Takeaway: an agent is not a smarter model; it's the same model given hands and a feedback loop. This is exactly how Claude Code works.

A chatbot answers and stops. An agent reads a goal, takes a real action, looks at what happened, and keeps going until the job is done. That loop is the whole difference.

The leap: from chatting to doing

So far we've treated an LLM as something you talk to: prompt in, text out, done. An agent takes that same model and drops it into a loop where it can take actions in the world — search the web, run code, edit a file — and then react to the results. One single model call is not an agent. The loop is what earns the name.

The difference between a chatbot and an agent is the difference between someone who gives you directions and someone who actually grabs the keys and drives.

The core loop, one trip around

Every agent runs the same four-beat cycle, over and over, until the goal is met:

Perceive — read the goal and the current state ("the tests are failing; here's the error").
Think — plan the next single step ("I should open the file the error points to").
Act — call a tool or run code (open the file, run the test, search the docs).
Observe — read the result, then loop back to step 1 with that new information.

The anchor: a chef tasting the sauce

Picture a cook who never tastes anything — they just follow a fixed recipe and pray. Now picture a real chef: taste the sauce (observe), decide it needs salt (think), add a pinch (act), taste again (loop). That feedback cycle — not one frozen recipe — is exactly what makes an agent an agent.

What's a "tool," really?

A tool is just a function the model is allowed to call. Each one comes with three things: a name, a plain-English description of what it does, and a typed input schema (what arguments it expects). The model reads those descriptions and picks the right tool for the moment. Without tools, an LLM can only produce text — it can say "I'll search for that," but it can't actually do it.

Web search

Fetch fresh, real information the model never memorized.

Run code

Execute a script and read the actual output — math it can verify, not guess.

Read / write files

Open a file, edit it, save it. This is how Claude Code touches your project.

Why the loop is the entire point

Feeding each result back into the next step lets the model check its own work and self-correct. Wrote a function? Run the test. Test failed? Read the error, fix the line, run it again. That tight perceive-act-observe rhythm is exactly how Claude Code operates — it doesn't just suggest a change, it makes the change, runs your tests, and reacts to what breaks.

Memory and a leash

Two practical pieces keep the loop sane. First, working memory: every tool result gets added to the conversation, so the model carries what it learned into the next step. Second, a budget — a maximum number of iterations — so it can't spin forever chasing a goal it can't reach.

Remember the hallucination problem from earlier? Giving an agent a retrieval tool is one of the cleanest fixes. Instead of recalling a fact from fuzzy memory, the agent fetches the relevant document and reads from it — this is the idea behind RAG (retrieval-augmented generation). The mantra: an LLM is a great editor but a shaky encyclopedia, so paste in the source rather than trusting it to remember.

Where loops go wrong

Failure mode	What it looks like	The fix
Infinite loop	Tries the same thing forever	A max-iteration budget and clear stop conditions
Wrong tool	Picks the search tool when it needed code	Clear tool names and descriptions
Hallucinated result	"Pretends" a tool ran and invents output	Human-in-the-loop checkpoints on risky steps

An agent acting on the world is powerful — and that power cuts both ways. The same hands that fix a file can delete one. Up next: the real limits and risks, and how to keep an agent safe.

⚖️

SECTION 38

AI Safety, Ethics & Limits

AI is a brilliant intern, not an oracle — know the limits and you become a power user.

Mastered

Here's the good news and the catch in one breath: AI is genuinely useful and genuinely limited. Knowing exactly where it falls short is what turns you into a power user instead of a worrier.

Picture a brilliant, fast new intern. Incredibly capable, worth handing real work to — but you still review the important stuff, you don't give them the company passwords, and you don't let them sign the contract alone. That's the exact posture to take with AI. Trust, but verify.

The real limits (none of these are dealbreakers — just things to know)

It hallucinates

An LLM (large language model — the kind of AI behind chatbots) predicts plausible text, not true text. With no fact to recall, it'll invent one that sounds right. It can cite a study that doesn't exist with a completely straight face — treat it like that friend who's amazing at parties but shouldn't be your only source.

It's biased

The model reflects its training data — the huge pile of text it learned from. If that text leans a certain way on a topic, the answer might too. Bias isn't a glitch bolted on; it's baked in by what it read.

No real understanding

There's no consciousness, no feelings, no "knowing" behind the words. It's a stunningly good pattern machine, not a tiny mind in a box.

Knowledge cutoff

It only learned up to a certain date, so it has no built-in memory of recent events. Ask about last week's news and it may go stale, guess, or need a live search tool to catch up.

The dangerous combo isn't that AI is sometimes wrong — it's that it's wrong confidently, in fluent, professional-sounding prose. Smooth delivery is not evidence of accuracy.

The ethics surface, briefly

Concern	What it means in plain terms
Bias & fairness	Outputs can quietly favor or harm some groups — a real risk in hiring, lending, or medical contexts.
Privacy	Depending on the tool and its settings, what you type may be processed, logged, or used to improve models. Personal and confidential data deserve caution.
Copyright & provenance	Models trained on others' work raise fair questions about credit, ownership, and consent. ("Provenance" just means where the data came from.)
Jobs	Some tasks get automated; many roles shift rather than vanish. Disruption is real and uneven.
Misinformation & deepfakes	The same tech that drafts your email can mass-produce convincing fakes. Healthy skepticism scales.

Alignment: making AI do what we actually want

Alignment is the engineering effort to make AI behave the way we genuinely intend — not just the literal words we typed. The shorthand is HHH: Helpful, Honest, and Harmless. Sounds simple. The hard part is all three at once — a model that refuses everything is harmless but useless; one that answers anything is helpful but unsafe. Balancing them is the open challenge.

Your practical user rulebook

Keep a human in the loop. AI drafts and suggests; you decide and approve. You're the editor, not the audience.
Verify high-stakes output. Medical, legal, financial, or anything you'd put your name on? Double-check it against a real source.
Never paste secrets. No passwords, API keys, client data, or anything you wouldn't hand to a stranger.
Cite and check sources. Ask for references — then actually confirm they exist and say what the AI claims.

Best practice from the pros: AI shines as an editor and synthesizer, not a memory bank. Paste in the source document and ask it to work from that — error rates plummet when it isn't recalling from thin air.

The calm takeaway

None of this is doom. These are solvable, ongoing engineering problems, and a lot of smart people are actively working on every one of them. The right stance is caution, not fear — the same everyday judgment you'd use with any powerful new tool.

You've now met every big idea in AI — from tokens and training to safety and alignment. Up next: a quick glossary to lock it all in.

📖

SECTION 39

AI Jargon Buster

A pocket phrasebook for every AI term you met on the tour — point and remember.

Mastered

You just toured the whole country of AI. This is the pocket card of key phrases to point at when one slips your mind — no memorizing required, just a quick "oh right, that's what that means." Each definition deliberately echoes the analogy from its home section, so it should feel like running into an old friend.

How to use this page

Read top to bottom and the terms build on each other; or just Ctrl+F for the one word that's bugging you. Definitions are one plain line each — zero jargon hiding inside the jargon.

Foundations

AI AGI ML Deep Learning Neural Network Parameter / Weight

Language plumbing

Token Embedding Vector Context Window

Architecture

Transformer Attention LLM

Training

Pretraining Fine-tuning RLHF

Using it

Inference Temperature Hallucination Prompt RAG

Frontier

Agent Multimodal Open vs Closed weights

The glossary

Term	One-line plain definition
AI	Any machine doing something that used to need a human brain.
AGI	The hypothetical "can do anything a person can" AI — not here yet.
ML	Machine Learning: software that learns patterns from examples instead of being hand-coded.
Deep Learning	ML using many stacked layers — the assembly line that powers today's AI.
Neural Network	A web of tiny calculators (loosely brain-inspired) that turns inputs into answers.
Parameter / Weight	The faders on a mixing board — internal numbers the model tunes while learning (weights are the main kind).
Token	The LEGO bricks of text — words and word-chunks the model reads in pieces.
Embedding	GPS coordinates for meaning — numbers that place a word on a map where similar ideas sit close.
Vector	Just the list of numbers itself — the actual coordinates an embedding lives at.
Context Window	The model's short-term memory — how much it can hold in mind at once.
Transformer	The engine design behind modern AI — it reads all the words at once instead of one by one.
Attention	Everyone in the room sharing notes — each word decides which others to listen to.
LLM	Large Language Model: phone autocomplete scaled up a trillion-fold.
Pretraining	Reading a huge pile of text to learn how language works — the raw study phase.
Fine-tuning	Extra coaching on a focused subject to specialize an already-trained model.
RLHF	Teaching the model manners with human thumbs-up / thumbs-down so it answers helpfully.
Inference	Taking the exam — actually using the trained model to get an answer.
Temperature	The creativity dial — low for careful and predictable, high for loose and surprising.
Hallucination	Confidently making something up — plausible-sounding text with no truth check.
Prompt	What you type in — your instructions and question to the model.
RAG	Letting the model peek at your documents first, so it answers from facts, not just memory.
Agent	An LLM given hands — tools like search, code, and memory so it can act, not just talk.
Multimodal	A model that also handles images, audio, or video — not text alone.
Open vs Closed weights	Open = you can download and run the model yourself; closed = you rent it through someone's API.

Stuck on a term mid-track? Two big takeaways carry most of the weight: an LLM predicts likely next text (so paste in real context to keep it honest), and temperature tunes how adventurous it gets.

If you only remember three

Token = text in pieces. Embedding = meaning as coordinates. Inference = using the trained model. Almost everything else hangs off these.

Why echoes, not new words

Each line reuses the picture from its home section on purpose — recognition beats re-learning every single time.

Recognize, don't recite

You don't need to define these cold. Spotting them in the wild and nodding "ah, I know that one" is the whole goal.

You now speak AI. Twenty-four terms ago "token" meant an arcade coin and "temperature" meant a thermostat. Tuck this card in your pocket and go enjoy the country you just toured.

🎓

FINAL EXAM

The Claude Mastery Exam

20 questions across every track — find out where you really stand

Mastered

Retrieval beats rereading — the single most reliable way to make knowledge stick is to test yourself on it. This exam pulls one question from each major topic. Answer all 20, then submit for a score and a rank.

Nothing here is sent anywhere — grading runs entirely in your browser and your best score is saved locally. Miss a few? The results tell you exactly which sections to revisit.