Living guide · Claude Code • App • API • MCP
🤖

Claude Mastery Hub

 

37deep-dive sections
5learning tracks
100+real examples
power unlocked
Start learning → Jump to cheat sheet
📘 An independent, hands-on reference for getting the most out of Claude. Commands and features evolve — always check the official Anthropic docs for the latest. Your progress is saved in this browser only.
🚀
SECTION 01

Why Master Claude

One model family, three surfaces, infinite leverage

⚡ WHY IT MATTERSLearn one mental model and it transfers everywhere — chat, terminal, and your own apps all run on the same Claude brains.
Claude modelsone shared brainOpusSonnetHaikuFable 5Claude.ai appchat · Projects · ArtifactsClaude Codeterminal + IDE agentAPI & Agent SDKbuild your ownsame models power every surface — only the interface differs
A central Claude models core (Fable 5 / Opus / Sonnet / Haiku) powering three surfaces that fan out: the Claude.ai app, Claude Code, and the API & Agent SDK.

Claude is one family of models you reach three ways — a chat app, a coding agent, and an API — and mastery is just knowing which one to grab.

Most people only ever open the chat box and stop there. The real wins come from picking the right surface for the job, then giving Claude the context it needs to nail it on the first try.

This hub builds you up in four layers: Foundations Core Power Features Mastery. Start here, then go in order or jump to what you need.

The three surfaces

Each surface runs the same underlying models (Opus, Sonnet, Haiku) but is tuned for a different kind of work.

Web & Desktop App

claude.ai and the desktop app. Best for chat, research, writing, and uploading files. Has Projects, Artifacts, and web search.

Claude Code

An agent in your terminal or IDE. Best for real coding — it reads, edits, and runs files in your repo, runs tests, and makes commits.

API

Build Claude into your own apps and scripts. Best for automation, pipelines, and products. Full control over model, prompts, and caching.

SurfaceReach it viaReach for it when…
Web / Desktopclaude.ai or the appYou want to think, research, write, or analyze files in a chat.
Claude Codeclaude in a terminalYou want code changed, tests run, or a repo refactored.
APIAnthropic SDK / HTTPYou want Claude inside your own software, on a schedule or at scale.

The 60-second "you can do X" tour

A taste of what each surface unlocks. The rest of the hub goes deep on every one of these.

In the web app

Drop in a messy spreadsheet and get back a clean chart, or have Claude build a working mini-app you can share.

Analyze a file

Upload a CSV or PDF and ask “what are the three biggest trends here?” Claude reads it and answers with citations.

Make an Artifact

“Build a tip calculator as an app.” Claude renders a live, usable tool in a side window you can keep and share.

Use a Project

Upload your style guide once; every chat in that Project follows it automatically.

In Claude Code

Ask in plain English and Claude edits your actual files. Start interactive, or run it headless in a script.

# Interactive: open the agent in your repo
claude

# One-shot: ask a question and exit (headless)
claude -p "Find and fix the off-by-one bug in pagination.py"

# Pipe a file in and let Claude explain it
cat error.log | claude -p "What caused this crash?"
Press Shift+Tab to enter plan mode. Claude can only read and think — not edit — so you approve the plan before any code changes. It is the single highest-leverage habit in Claude Code.

With the API

A few lines wires Claude into anything you build. Pick a model by name and send a message.

from anthropic import Anthropic

client = Anthropic()
msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
)
print(msg.content[0].text)

Pick the right model

All three surfaces let you choose a model. The rule of thumb: start cheap and fast, move up only when you hit a wall.

ModelBest forSpeedPrice (in / out per MTok)
Haiku 4.5Simple, high-volume, real-time tasksFastest$1 / $5
Sonnet 4.6Most production work — coding, analysis, contentFast$3 / $15
Opus 4.8Hardest reasoning, big refactors, long agentic runsModerate$5 / $25
In Claude Code, switch any time with /model sonnet, /model opus, or /model haiku. /model default picks a smart default and may auto-route Opus for harder turns.

The mindset that makes Claude great

The difference between a frustrating session and a magical one is rarely the model — it is how you talk to it. Four habits carry almost all the weight.

Be specific

Name files, state constraints, point to an example. “Claude can infer intent, but it can’t read your mind.” Precise instructions mean fewer corrections.

Give context

Share the goal, the relevant code, and what “done” looks like. Good context up front beats ten rounds of clarification.

Iterate

Treat the first answer as a draft. Steer, refine, and rewind. If two corrections fail, start fresh rather than piling on.

Delegate

Hand off broad research or a self-contained task to a subagent. It works in its own context and returns just the summary you need.

One thing governs everything: “Claude’s context window fills up fast, and performance degrades as it fills.” Keep each session focused on one job — fresh context is free, and a clean start beats a cluttered one.
Best workflow in three moves: explore → plan → execute. Let Claude read the code, agree on a plan in plan mode, then turn it loose. One clear scope per session keeps quality high and cost low.

How this hub is organized

  1. Foundations — the surfaces, models, and core mindset (you are here).
  2. Core — everyday moves: prompting well, Projects, slash commands, managing context.
  3. Power Features — subagents, skills, MCP servers, hooks, and plan mode.
  4. Mastery — automation, plugins, prompt caching, and multi-agent orchestration.

Work through it in order for a clean ramp, or jump straight to the layer that matches what you need today.

🧠
SECTION 02

Pick the Right Brain

Match the model to the job — pay for power only when it earns its keep

⚡ WHY IT MATTERSRouting each task to the cheapest model that can do it well turns cost and speed into a design choice, not an accident.
Right model for the jobTaskincoming workRoute bycomplexity & costdecisionHaikufast, cheaphigh-volume tasksSonneteveryday codingbest workhorseOpusdeepest reasoningarchitecture / hardFable 5most capablehardest tasks · $10/$50cheap & fast → most capable & costly: pick the smallest model that clears the bar
A task flows through a complexity-and-cost router that sends it down one of four lanes: Haiku for fast cheap high-volume work, Sonnet for everyday coding, Opus for deep reasoning, or Fable 5 for the hardest, most capable work.

Four Claude tiers, one rule: cheap for simple, balanced for daily work, deepest reasoning when it pays off — and Fable 5 when the job is genuinely hard.

In 2026 the lineup is Opus 4.8 for hard reasoning, Sonnet 4.6 as the everyday coding workhorse, and Haiku 4.5 for fast, high-volume work. Picking right saves money and time without hurting quality. The official advice is blunt: use Haiku for simple tasks, Sonnet for most production workloads, Opus only for the most complex reasoning.

🆕 New flagship (June 9, 2026): Claude Fable 5 (claude-fable-5) is now the most capable model — state-of-the-art on nearly every benchmark — at $10 / $50 per MTok. Use it for the hardest work; keep Haiku/Sonnet for everyday tasks. See What's New →

Meet the lineup

Fable 5

The new top tier (June 2026). Best-in-class coding, analysis, vision, and science; runs autonomously across millions of tokens. Premium price — reach for it only on genuinely hard tasks.

Opus 4.8

Anthropic's most capable model. Built for long-horizon agentic coding, large-scale refactors, and complex systems engineering. Moderate speed, premium price.

Sonnet 4.6

The best balance of speed and intelligence. Your default for code generation, data analysis, and agentic tool use. Fast and mid-priced.

Haiku 4.5

The fastest model with near-frontier intelligence. Great for real-time apps, bulk processing, and sub-agent tasks. Cheapest by far.

Side-by-side comparison

ModelAPI idSpeedPrice (in / out per MTok)ContextMax outputThinking
Opus 4.8claude-opus-4-8Moderate$5 / $251M128kAdaptive
Sonnet 4.6claude-sonnet-4-6Fast$3 / $151M64kExtended
Haiku 4.5claude-haiku-4-5Fastest$1 / $5200k64kExtended
Opus uses adaptive thinking (no manual toggle). Sonnet and Haiku support the standard extended-thinking toggle and thinking keywords like think hard and ultrathink.

The routing cheat sheet

When in doubt, follow this. Start cheap and upgrade only when you hit a real capability gap.

Your taskUseWhy
Bulk edits, log lines, formatting, simple Q&A, sub-agentsHaikuFastest and 5x cheaper than Opus
Main development: code gen, data analysis, tool useSonnetBest speed/intelligence balance for production
Architecture, multi-file debugging, big refactors, deep researchOpusDeepest reasoning when a wrong answer is costly
Cheap tasks → Haiku. Main dev → Sonnet. Hard reasoning → Opus. The docs even suggest starting with Haiku 4.5 and upgrading only if quality slips.

How to switch models

In Claude Code

Use /model inside a session. Run it bare to open a picker, or name a model to switch and save it as your default.

/model              # open interactive picker
/model opus         # switch to Opus and save as default
/model sonnet       # switch to Sonnet
/model haiku        # switch to Haiku
/model default      # recommended default; may auto-route Opus for hard tasks

Press s in the picker to switch for the current session only without changing your saved default.

At launch from the CLI

The --model flag sets the session model and overrides both the model setting and the ANTHROPIC_MODEL env var. Add --fallback-model for automatic failover.

claude --model sonnet
claude --model claude-opus-4-8 --fallback-model sonnet
claude -p "summarize these logs" --model haiku   # headless, cheap

In the Claude.ai app

Use the model picker near the chat input to choose Opus, Sonnet, or Haiku per conversation. Switching is per-chat and instant.

Big context windows (1M tokens)

Opus 4.6/4.7/4.8 and Sonnet 4.6 carry a 1M-token context window; Haiku 4.5 has 200k. Best of all, there is no surcharge tier — a 900k-token request costs the same per token as a 9k one, and caching plus batch discounts apply across the full window.

A huge window is not a license to dump everything in. Performance still degrades as context fills, so read only the file ranges you need and keep prompts tight even on 1M-token models.

Squeeze more value per dollar

Two built-in levers cut cost hard. Prompt caching is automatic in Claude Code — cache reads are 90% cheaper than fresh input. The Batch API gives a flat 50% off when latency does not matter.

ModelCache hit (read)Batch (in / out)
Opus 4.8$0.50 / MTok~$2.50 / $12.50
Sonnet 4.6$0.30 / MTok$1.50 / $7.50
Haiku 4.5$0.10 / MTok$0.50 / $2.50
In Claude Code, front-load context in plan mode so it caches once. Avoid switching models mid-session — a model swap invalidates the cache and forces a slow, expensive uncached turn.

Route per sub-agent, not just per session

You don't have to pick one model for everything. Give each sub-agent its own model so cheap workers stay on Haiku while the main thread reasons on Opus or Sonnet.

# .claude/agents/researcher.md frontmatter
---
name: researcher
description: Searches the codebase, use proactively
model: haiku
tools: Read, Grep, Glob
---

The model field accepts an alias (sonnet, opus, haiku), a full id, or inherit (the default). For one-off overrides, set the CLAUDE_CODE_SUBAGENT_MODEL env var, which wins over frontmatter.

Heads-up on dates

Knowledge cutoffs differ: Opus 4.8 is reliable through Jan 2026, Sonnet 4.6 through Aug 2025, Haiku 4.5 through Feb 2025. For very recent facts, prefer Opus or enable web search. Also note: Opus 4 and Sonnet 4 (the 20250514 ids) retire Jun 15 2026 — migrate to Opus 4.8 and Sonnet 4.6.
⚙️
SECTION 03

Install & First Run

From zero to your first prompt in five minutes

Claude Code is a terminal tool that turns your codebase into a workspace you can talk to.

This section gets you installed, signed in, and running your first session safely. You only do the install and login once; after that you just type claude in any project.

Step 1 — Install the CLI

Claude Code runs from your terminal. The fastest way to install it is the official script, which works on macOS and Linux.

# macOS / Linux
curl -fsSL https://claude.com/install.sh | bash

# Or via npm (any OS with Node 18+)
npm install -g @anthropic-ai/claude-code
Confirm it worked with claude --version. If the command is not found, restart your terminal so it picks up the new PATH.

Step 2 — Sign in

The first time you launch Claude Code it walks you through login in your browser. You can sign in with a Claude subscription (Pro or Max) or an Anthropic Console API account.

# Start Claude, then follow the browser prompt
claude

# Or trigger login directly any time
/login
Stuck? Run /doctor to diagnose install and auth problems, and /status to see which account you are signed in as.

Step 3 — Run Claude in a project

Always cd into a real project folder before starting. Claude reads files relative to where you launch it, so the folder you start in becomes its working directory.

  1. Open a terminal in your project root.
  2. Type claude and press Enter.
  3. Type a question in plain English and press Enter.
cd ~/Projects/my-app
claude

Three ways to start

Interactive

claude opens the live REPL where you chat back and forth. This is the normal way to work.

With a first prompt

claude "explain this repo" opens the REPL but starts working on that prompt right away.

One-shot (headless)

claude -p "list the API routes" prints one answer and exits. Great for scripts.

# Pipe a file in and get a quick answer back
cat error.log | claude -p "explain this stack trace"

The interactive REPL

The REPL is the prompt box where you type. Anything that does not start with / is a message to Claude; anything starting with / is a built-in command. Try these first.

Type thisWhat it does
/helpLists every slash command and basic usage.
/statusShows your account, model, and session info.
/clearWipes the conversation to free up context.
/costShows token usage for the session.
Press Esc to stop Claude mid-action without losing your place. This is your panic button when it heads the wrong way.

Step 4 — Generate CLAUDE.md with /init

Run /init once per project. It scans your repo and writes a CLAUDE.md file describing the structure, build and test commands, and conventions. Claude reads this file at the start of every future session, so it remembers how your project works.

/init
After /init, open CLAUDE.md and trim it. For each line ask: would removing this make Claude make a mistake? A short, sharp file beats a bloated one that gets ignored.

You can edit memory any time with /memory, or append a quick note by starting a line with # in the prompt box.

Trust and permissions (read this)

Claude Code asks before it edits files or runs shell commands. The first time you open a new folder it also asks you to confirm you trust that directory. This is your safety net, so do not blindly approve everything.

Be extra careful approving commands that delete files, push to git, or hit the network. When a prompt appears, read the actual command before choosing allow.

Permission modes

Press Shift+Tab to cycle modes during a session, or start in one with a flag. plan mode is the safest place to begin: Claude can read and think, but cannot change anything.

ModeBehavior
defaultAsks before edits and commands.
planRead-only. Explores and proposes a plan, makes no changes.
acceptEditsAuto-approves file writes; still asks for risky shell commands.
# Start a session in plan mode
claude --permission-mode plan
Avoid --dangerously-skip-permissions while learning. It skips every prompt, including destructive ones. Use it only in throwaway sandboxes you do not care about.

To see or change which tools are auto-allowed, run /permissions. Approved rules are saved so you are not asked the same thing twice.

How to exit

When you are done, leave the REPL cleanly. Your conversation is saved so you can pick it back up later.

# Inside the REPL, any of these exit:
/exit
# or press Ctrl+D
# or press Ctrl+C twice
Next session, run claude --continue (or -c) to resume the most recent conversation in that folder, or claude --resume to pick from a list.

Your first five minutes, recap

  1. Install with the script, verify with claude --version.
  2. cd into a project and run claude, then /login if prompted.
  3. Run /init to create CLAUDE.md, then prune it.
  4. Start in Shift+Tab plan mode and read each permission prompt.
  5. Exit with /exit; resume later with claude -c.
⌨️
SECTION 04

Terminal Commands & Flags

Drive Claude from your terminal: interactive, headless, and scriptable

The claude command is two tools in one: a live coding partner and a scriptable engine you can pipe, loop, and bake into CI.

Run it plain for a chat session, or add -p to get one answer and exit. Everything in between is flags that tell Claude which model, which session, and how much it's allowed to touch.

Three ways to start

Interactive REPL

Type claude alone for a full session, or claude "query" to open with a first prompt already sent.

Headless

claude -p "query" prints the answer and exits. The -p (alias --print) flag is the gateway to all scripting.

Piped input

Feed any file or command output to Claude over stdin, then ask a question about it.

# Interactive, with an opening prompt
claude "Walk me through the auth module"

# Headless one-shot: print and exit
claude -p "List the top 3 risks in this repo"

# Pipe content in, ask about it
cat error.log | claude -p "Explain the root cause in 2 sentences"
All other flags work alongside -p — including --continue, --resume, --model, --permission-mode, and --output-format.

Resuming work: continue vs resume

When: you closed the terminal but want to pick up where you left off. --continue grabs the latest; --resume lets you choose.

# Reopen the most recent conversation in this directory
claude --continue          # alias: claude -c

# Scripted continuation (combine with -p)
claude -c -p "Now check for type errors"

# Pick a specific past session by name or ID
claude --resume auth-refactor       # alias: claude -r
claude -r abc123 "Pick up the review"

# No argument = interactive picker of past sessions
claude --resume
Name a session up front with --name / -n so it's easy to find later: claude -n auth-refactor. As of v2.1.144, background sessions show in the picker tagged bg.

Picking a model

Why: match the model to the job — Haiku for speed, Sonnet for most work, Opus for the hardest reasoning. --model overrides both the model setting and the ANTHROPIC_MODEL env var.

# Use an alias
claude --model opus
claude --model sonnet

# Or a full model id
claude --model claude-sonnet-4-6

# Add a fallback if the primary is unavailable
claude --model opus --fallback-model sonnet

Permission modes: how much can it touch?

When: you want Claude to plan without editing, auto-accept edits, or lock things down for automation. --permission-mode overrides the defaultMode from your settings.

# Read, search, and think only — no edits, no destructive commands
claude --permission-mode plan

# Auto-approve file writes + mkdir/touch/mv/cp
claude --permission-mode acceptEdits

# Locked-down automation: deny anything not in your allow rules
claude -p --permission-mode dontAsk "Run the test suite"
Accepted values: default, acceptEdits, plan, auto, dontAsk, bypassPermissions. Reach for --allowedTools to auto-approve only specific tools instead of opening everything.

Adding extra working directories

Why: let Claude read and edit files outside the current folder (a sibling app, a shared lib). Each path is validated as an existing directory.

claude --add-dir ../apps ../lib
--add-dir grants file access but most .claude/ config is NOT discovered from added dirs. To persist them across sessions, set permissions.additionalDirectories in your settings file. In-session, /add-dir <path> does the same.

One-shot scripting that survives turns

How: use --output-format json to capture a session_id, then feed it back with --resume. This is the backbone of multi-step automation.

# Capture the session id from a headless run
session_id=$(claude -p "Start a code review" \
  --output-format json | jq -r '.session_id')

# Continue that exact session in a later step
claude -p "Now summarize the findings" --resume "$session_id"
For reproducible CI runs, add --bare. It skips auto-discovery of hooks, skills, plugins, MCP servers, and CLAUDE.md, giving Claude only Bash + file read/edit. Load context explicitly via --append-system-prompt, --settings, --mcp-config, or --agents. It's recommended for scripted/SDK calls and is slated to become the default for -p.

The flags that matter most

FlagAliasWhat it does
--print-pHeadless: run, print the answer, exit. Enables piping and scripting.
--continue-cResume the most recent conversation in this directory.
--resume-rResume a session by ID/name, or open a picker with no argument.
--modelSet the model by alias (sonnet, opus) or full id.
--fallback-modelModel to use if the primary is unavailable.
--permission-modeStart in plan, acceptEdits, dontAsk, etc.
--add-dirAdd extra working directories for read/edit.
--output-formatjson exposes the session_id for chaining steps.
--name-nLabel a session so you can --resume it by name.
--allowedToolsAuto-approve specific tools without prompting.
--bareSkip auto-discovery for fast, reproducible scripted runs.
--fork-sessionResume but write to a new session ID instead of reusing it.

Two flags to handle with care

# Skip ALL permission prompts (use with caution)
claude --dangerously-skip-permissions

# Safer: start in plan mode, but make bypass available later
# via the Shift+Tab mode cycle
claude --permission-mode plan --allow-dangerously-skip-permissions
--dangerously-skip-permissions removes every guardrail. The --allow-dangerously-skip-permissions variant only adds bypassPermissions to the Shift+Tab cycle — you still start in a safe mode and switch deliberately.

Monitoring parallel agents

claude agents opens a view to monitor and dispatch background sessions; claude agents --json prints live sessions as a JSON array for scripting. It shares --model, --permission-mode, --add-dir, and --mcp-config with the top-level command.

Mental model: -p for "answer and leave," -c for "keep going," --resume for "go back to that one," and --output-format json + --resume for everything stitched together in a script.
SECTION 05

Slash Commands

Type / to drive the REPL — and build your own commands

Slash commands are the steering wheel of the Claude Code REPL: type / and a menu of built-in controls and your own custom commands appears.

Built-in commands manage your session, context, models, and tools. Custom commands are just Markdown files you drop in a folder. Master both and you stop fighting the tool and start flying it.

Run /help any time to list every available command, including ones this guide does not cover. Press / at the prompt to fuzzy-search them inline.

Context control: /clear vs /compact

This is the single most important pair. Claude's context window fills fast, and quality drops as it fills. These two commands are how you fight back.

/clear

Wipes the whole conversation and frees all context. Fresh context is free. Use it between unrelated tasks.

/compact

Replaces history with a structured summary, keeping key facts. Use mid-task when you still need what came before.

SituationUse
Finished a task, starting a new unrelated one/clear
Mid-feature, context near full, still need the thread/compact
Stuck after two failed fixes on the same bug/clear, then re-prompt

/compact takes optional focus instructions so the summary keeps what matters.

/compact focus on the API changes and the failing test
Auto-compaction fires automatically near ~95% capacity. Don't wait for it — compact proactively at logical breakpoints (50-75% is a good range) so you control what survives.
Both /compact and switching models invalidate the prompt cache, forcing a slow, expensive uncached turn. Plan when you trigger them.

Pick the right model: /model

Switch the active model without restarting. Run it bare to open a picker, or pass an alias.

/model opus      # deepest reasoning, hard refactors
/model sonnet    # best all-round speed + intelligence
/model haiku     # fastest, cheap, simple tasks
/model default   # recommended default; may auto-route Opus on hard work
A rule of thumb from the docs: start on Haiku 4.5, move to Sonnet 4.6 for most production work, and reach for Opus 4.8 only for long-horizon agentic coding or complex systems work.

Session lifecycle: /resume and friends

Your conversations are saved. Pick one back up instead of starting cold.

/resume          # lists past conversations; pick one to continue
/resume auth-fix # resume a session you named earlier

From the shell, the same idea works via flags: claude --continue (alias -c) reopens the most recent conversation, and claude --resume <id-or-name> targets a specific one.

# Scriptable continuation
claude -c -p "Check for type errors"

Inspect spend and tools

/cost

Shows token usage and estimated cost for the session. On Pro/Max plans it reflects usage, not billed dollars.

/mcp

Lists configured MCP servers, connection status, and tools — and is where you authenticate (OAuth) to remote servers.

Project memory: /init, /memory, /config

These shape how Claude understands your repo and behaves across sessions.

CommandWhat it does
/initScans the repo and generates a starter CLAUDE.md with structure, build/test commands, and conventions.
/memoryOpens your CLAUDE.md memory files in your editor for direct editing.
/configOpens the settings interface for theme and toggles; values persist in settings.json.

Memory files load hierarchically: enterprise, then user (~/.claude/CLAUDE.md), then project (./CLAUDE.md), then local. Type # at the start of a prompt to quickly append a memory line.

Prune CLAUDE.md ruthlessly. For each line ask: would removing this cause Claude to make a mistake? A bloated memory file gets ignored because rules get lost in the noise.

Quality and editing helpers

/review

Requests a code review. /review <PR-number> reviews a specific pull request. /security-review focuses on security of current changes.

/agents

Opens an interactive manager to create, edit, and view subagents stored as Markdown in .claude/agents/.

/vim

Enables Vim-style modal editing for the input prompt — NORMAL/INSERT modes with h j k l and common motions.

Custom slash commands

Here is where power users pull ahead. A custom command is just a Markdown file. The filename (minus .md) becomes the command name. The body is the prompt Claude runs.

LocationScopeExample file
.claude/commands/Project (shared via git)optimize.md/optimize
~/.claude/commands/User (all your projects)standup.md/standup

Pass arguments with $ARGUMENTS (all args) or $1, $2 for positional ones. Here is .claude/commands/fix-issue.md:

---
description: Fix a GitHub issue by number
argument-hint: <issue-number>
model: sonnet
allowed-tools: Bash(gh issue view:*), Read, Edit
---
Fix issue #$1. First read it:

!gh issue view $1

Then locate the bug, apply a fix, and explain your change.

Invoke it with the argument inline:

/fix-issue 482
Inside a command body: lines starting with ! run bash and inject the output (requires allowed-tools), and @path/to/file injects that file's contents into the prompt.

Frontmatter fields you can set: description, argument-hint, model, and allowed-tools to restrict which tools the command may use.

Namespacing

Subdirectories namespace commands for organization. A file at .claude/commands/frontend/component.md is still invoked as /component but shows a project:frontend label.

.claude/commands/
  optimize.md            -> /optimize        (project)
  frontend/component.md  -> /component        (project:frontend)
~/.claude/commands/
  standup.md             -> /standup          (user)
Custom commands have merged into Skills, so a folder like .claude/skills/deploy-staging/SKILL.md also gives you /deploy-staging. Use a single .md for a one-shot prompt; use a Skill folder when you need bundled scripts and reference files.
SECTION 06

Shortcuts, @files & ! bash

Pull files in, push commands out, and steer without breaking flow

The REPL is a cockpit: @ pulls files into context, ! pushes shell output back, and a handful of keys let you steer mid-task without losing your place.

Three in-prompt prefixes do most of the heavy lifting. @ references a file or directory as context, ! runs a shell command and feeds its output into the chat, and # at the very start of a prompt appends a line to memory. Learn these and you stop copy-pasting forever.

@ — reference files

Type @ and a path to load that file or folder as context. Tab-completes paths.

! — run bash inline

Start a line with ! to run a shell command; its output joins the conversation.

# — quick memory

Start a prompt with # to append a note to a CLAUDE.md memory file on the fly.

@ — reference files and folders as context

The @ prefix tells Claude exactly which files to read instead of making it search. Be specific: pointing at the right files is the single biggest way to cut corrections. It tab-completes, so start typing the path and press Tab.

> Refactor the auth flow in @src/auth/login.ts and keep it consistent with @src/auth/session.ts

> Summarize what changed across @docs/changelog.md

> Use the patterns in @examples/ when you write the new handler
Reference a directory like @src/handlers/ to give Claude the file listing, then let it open only what it needs. Cheaper than dumping every file.
In custom slash commands, @path in the command body injects that file's contents into the prompt — the same syntax works inside .claude/commands/*.md.

! — run a shell command inline

Prefix any line with ! to execute it in your shell and drop the output straight into the chat. When: you want Claude to see real state — a failing test, git status, a directory listing — without describing it yourself. Why: the output becomes context Claude can act on immediately.

> !git status

> !npm test 2>&1 | tail -40

> !ls -la src/ && cat package.json

A common combo: run a command with !, then ask Claude to fix what it sees.

> !pytest -x
> The first failure above is in test_kelly.py. Fix the implementation, not the test.
The ! command runs in your real shell with your permissions. Treat it like a terminal — don't pipe in anything you wouldn't run yourself.

Drag, drop & paste images

Claude Code reads images. Paste a screenshot or drag an image file into the terminal and ask about it — great for UI bugs, error dialogs, diagrams, or design mockups.

ActionHow
Paste from clipboardCtrl+V (use this, not Cmd+V, on macOS)
Drag a file inDrag the image onto the terminal window
Reference a saved image@screenshots/error.png
> Here's the broken layout [paste screenshot]. The sidebar overflows on mobile — fix the CSS in @src/styles/sidebar.css

Keyboard shortcuts that matter

The keys below are the difference between fighting the tool and flowing with it. The two most valuable are Esc (stop without losing context) and Shift+Tab (cycle permission modes).

KeysWhat it does
EscStop Claude mid-action; context is preserved so you can redirect
Esc EscOpen the rewind menu (same as /rewind) to restore an earlier state
Shift+TabCycle permission modes: normal → plan → auto-accept
Ctrl+BBackground a running task so you can keep working
Ctrl+CCancel current input / interrupt
Ctrl+OToggle verbose mode (see thinking output)
Up / DownScroll through your prompt history
Option+T / Alt+TToggle extended thinking on or off

Multi-line input

To write a prompt that spans several lines, use one of these:

MethodKeys
Quick newline\ then Enter (backslash continuation)
Standard newlineOption+Enter (macOS) / Shift+Enter
One-time setupRun /terminal-setup to bind Shift+Enter reliably
Prefer Vim keys? Run /vim to get NORMAL/INSERT modes with h j k l navigation in the prompt editor.

Steering mid-task

You don't have to wait for Claude to finish a bad path. Steering early saves context and time.

  1. Hit Esc the moment it heads the wrong way — it stops with context intact.
  2. Type a correction and submit; Claude resumes with your new direction.
  3. If the whole attempt was wrong, press Esc Esc and rewind — this erases the failed attempt from context entirely.
Rewinding a dead-end beats correcting it. A correction leaves the failed approach in context where it can confuse later turns; a rewind removes it cleanly.
Use /btw to ask a quick side question in a dismissible overlay — the answer doesn't pollute your main conversation history.

Putting it together

A realistic loop using all three prefixes plus a key: gather state with !, scope the fix with @, plan it, then execute.

> !git diff --stat
> Review the changes in @src/api/routes.py against @CLAUDE.md conventions.
>   [Shift+Tab to enter plan mode first, then let it execute]
> !curl -s localhost:8000/health | jq .
Front-load context once (read files in plan mode), then execute — the stable prefix gets cached, so every follow-up turn is fast and cheap.
🖥️
SECTION 07

Claude in Your IDE

Same Claude Code engine, now living inside your editor

The VS Code extension and JetBrains plugin wire Claude Code straight into your editor, so plans, edits, and diffs show up right next to your code.

The IDE integration is not a different product. It drives the exact same Claude Code engine as the terminal, just with a graphical surface for selections, diffs, and diagnostics. You get inline-diff review, automatic selection sharing, and live error feedback without leaving your editor.

Availability and exact menus shift between editor versions. Treat the steps below as the current shape, and confirm specifics in the official docs for your editor.

Two integrations, one engine

VS Code extension

Publisher id anthropic.claude-code. Adds a native graphical panel and bundles the CLI for you. Needs VS Code 1.98.0+. Also installs in forks (Cursor, Windsurf, Kiro, Devin Desktop) and on Open VSX.

JetBrains plugin

Named Claude Code [Beta] (publisher Anthropic PBC). Covers IntelliJ, PyCharm, WebStorm, GoLand, PhpStorm, RubyMine, Rider, CLion, and Android Studio. It shells out to the claude CLI, which you must install and authenticate first.

Install: VS Code

  1. Open the Extensions panel with Cmd+Shift+X (Mac) or Ctrl+Shift+X (Windows/Linux).
  2. Search for Claude Code and click Install (or use the marketplace 'Install for VS Code' link).
  3. Open the panel and sign in. The CLI is bundled, so there is nothing else to set up.

The VS Code extension panel is the integration. You do not run any attach command. Quick-launch or focus it from the editor with Cmd+Esc (Mac) or Ctrl+Esc (Windows/Linux).

On macOS Tahoe and later, the system Game Overlay can steal Cmd+Esc. Clear it under System Settings -> Keyboard -> Keyboard Shortcuts -> Game Controllers.

Install: JetBrains

  1. Install and authenticate the CLI first: run claude once in a terminal and sign in.
  2. In the IDE, go to Settings -> Plugins -> Marketplace and search Claude Code.
  3. Click Install, then fully restart the IDE (a reload is not enough).

JetBrains settings live under Settings -> Tools -> Claude Code [Beta]: the claude command path, the 'command not found' notification toggle, Option+Enter for multi-line prompts (macOS, needs a terminal restart), and auto-updates.

If Esc will not interrupt Claude in JetBrains, go to Settings -> Tools -> Terminal and uncheck 'Move focus to the editor with Escape'.

What the IDE adds

Selection as context

Highlight code and Claude sees it automatically. The prompt footer shows the line count and the transcript logs 'Selected N lines from <file>'.

Side-by-side diff review

Every proposed edit opens in the editor's native diff viewer with accept / reject / redirect. Edit the proposal in the diff and Claude is told you changed it.

Live diagnostics

The integration shares lint and syntax errors with Claude, so it can react to real problems instead of guessing.

@-mention shortcut

One keystroke inserts a file reference with a line range, like @app.ts#5-10, so you point at exact code fast.

File-reference keys

EditorMacWindows/LinuxInserts
VS CodeOption+KAlt+K@app.ts#5-10
JetBrainsCmd+Option+KAlt+Ctrl+K@src/auth.ts#L1-99
A Read deny rule blocks selection sharing for matching files, and an eye-slash toggle hides the current selection from Claude when you need privacy on a snippet.

Connecting the terminal to the editor

Run claude inside the IDE's integrated terminal and it auto-connects: diff viewing and diagnostic sharing turn on by themselves. From an external terminal, attach with one command.

# In an external terminal, after launching claude:
/ide
# attaches the running session to your open editor

Choose where diffs render with the config command. Keep them in the IDE for visual review, or push them back to the terminal.

# Open diffs in the editor's native viewer:
/config diff tool auto

# Or keep diffs in the terminal:
/config diff tool terminal

Permission modes in the editor

In VS Code the mode indicator at the bottom of the prompt box switches behavior. Plan mode opens the plan as a full markdown document you can comment on inline before any edit happens.

ModeBehavior
normalAsks before each action
Plan modeRead-only; describes a plan and waits for approval
auto-acceptApplies edits without asking

Set the default mode in VS Code settings so every new session starts the way you want.

// VS Code settings.json
"claudeCode.initialPermissionMode": "plan"
// values: default | plan | acceptEdits | bypassPermissions

Settings: what lives where

Editor-specific options live in the extension settings, but the rules that matter for safety and tooling are shared with the CLI. The same precedence applies across CLI, VS Code, and JetBrains.

In the editor

VS Code keys like useTerminal (default false), preferredLocation (panel or sidebar), autosave (default true), and useCtrlEnterToSend. Use Shift+Enter for a newline without sending.

Shared

~/.claude/settings.json holds allowed commands, env vars, hooks, and MCP servers for both the extension and the CLI. One source of truth for permissions.

Under the hood: the 'ide' MCP server

When the VS Code extension is active it runs a local MCP server named ide that the CLI connects to automatically. It binds to 127.0.0.1 on a random high port and writes a fresh auth token to a lock file under ~/.claude/ide/ with 0600 permissions.

It hosts about a dozen tools, but only two are exposed to the model: mcp__ide__getDiagnostics and mcp__ide__executeCode. The rest are internal RPC for opening diffs, reading selections, and saving files.

JetBrains goes further by routing through reference-aware IDE operations. It exposes tools like symbol-rename refactoring, run-configuration execution, and quickfix application, so a rename touches every reference instead of doing a blind text replace.

When the IDE flow beats the terminal

Reach for the editor flow when the work is visual or edit-heavy. Stick with the pure terminal for headless runs, CI, and tight keyboard loops.

Use the IDE when

You are reviewing multi-file diffs, refactoring with reference safety, pointing at exact selections, or want live lint and syntax errors fed back automatically.

Use the terminal when

You are scripting claude -p, running in CI, chaining commands, or working over SSH where a graphical diff viewer adds nothing.

OAuth tokens are shared via the OS keychain in JetBrains, so logging out in one place logs you out everywhere. Sign out deliberately.

Quick-start in 60 seconds

  1. Install the extension or plugin and sign in.
  2. Open Claude with Cmd+Esc / Ctrl+Esc.
  3. Highlight the function you want changed so it becomes context.
  4. Ask: 'Add input validation here and run the tests.'
  5. Review the side-by-side diff, tweak it inline if needed, then accept.
Pair this with Plan mode for anything non-trivial: read the plan as a markdown doc, comment inline, then let Claude implement against it while you watch the diffs land.
🤔
SECTION 08

Plan Mode & Extended Thinking

Look before you leap, then think as hard as the problem demands

⚡ WHY IT MATTERSA read-only plan + approval gate before edits turns risky guesswork into reversible, verified progress.
Plan before you buildGather context, propose a plan, gate on approval, then act and verifyloop until doneRequestuser promptResearchread / glob / grepPropose Planno edits yetApprove?gateyesExecute editsedit / write / bashVerifyrun testsReckless: edit-firstno plan, no gateskips the gate, high rework risk
A loop diagram contrasting the safe plan-first flow (Request to Research to Plan, through an Approve gate, to Execute and Verify, looping back) against the reckless edit-first shortcut.

The two cheapest upgrades to your results: make Claude plan before it touches code, and let it think harder only when the problem is actually hard.

Plan mode and extended thinking solve different problems. Plan mode controls what Claude is allowed to do. Extended thinking controls how much it reasons before it acts. Used together, they turn a guess-and-check session into a deliberate one.

Plan Mode: explore, propose, approve, then execute

Plan mode is enforced at the tool level. Claude literally cannot edit files or run destructive commands while in it. It can only read, search, and think, then hand you a written plan to approve.

Toggle it with Shift+Tab, which cycles through normal, plan, and auto-accept modes. You can also start a session already in plan mode from the CLI.

claude --permission-mode plan
The documented workflow for the best coding results is always the same: exploreplanexecute. Plan mode is what enforces the first two steps instead of letting Claude jump straight to editing.

Why it prevents wasted work

When Claude edits first and reasons later, a wrong assumption becomes ten wrong file changes you have to unwind. A plan surfaces that assumption in one paragraph you can correct in one sentence, before any code is written.

There is also a hidden performance win. Front-loading context (reading files, fixing the plan, stating constraints) writes a stable prefix to the prompt cache once. Every later execution turn then reads from that cached prefix, which is the documented cause of very high (90%+) cache-hit rates.

Cache hits cost about 10% of base input price. A stable plan up front is not just safer, it is cheaper per execution turn than constantly changing your mind mid-edit (which invalidates the cache).

How to ask for a plan first

Be specific. Name the files, state the constraints, and tell Claude not to write code yet. Vague requests cause endless exploration that floods your context.

Read src/auth/session.ts and src/auth/tokens.ts.
Propose a plan to add refresh-token rotation.
Do NOT edit anything yet — show me the plan and wait for approval.

For a larger change, ask Claude to write the plan to disk so it survives any context reset or /clear between phases.

Investigate the checkout flow and write a phased plan to PLAN.md.
Keep each phase to one session of work. List files touched per phase.
Skip plan mode for one-sentence-diff tasks: fixing a typo, adding a log line, renaming a variable. The approval round-trip is pure overhead when the change is obvious.

Behind the scenes: the Plan subagent

In plan mode Claude uses a built-in read-only Plan subagent. Like the Explore agent, it skips CLAUDE.md and the git-status snapshot so it stays focused on reading and proposing. You don't configure it; it's part of the mode.

Extended Thinking: spend reasoning where it pays

Extended thinking is on by default with roughly a 31,999-token reasoning budget. You escalate that budget with keywords right inside your prompt. More thinking buys reasoning depth, not magic.

KeywordBudgetUse for
thinkLowestModerately tricky logic, small refactors
think hard / think harderHigherMulti-file changes, non-obvious bugs
ultrathinkMaximumArchitecture, hard multi-file debugging, costly decisions
ultrathink: our BGP failover drops sessions for ~30s on link flap.
Walk through the timers, the convergence path, and propose fixes
ranked by blast radius. Don't change configs yet.

Toggling and capping it

Flip thinking on or off mid-session with Option+T on Mac (Alt+T elsewhere). Cap the budget with an env var, or set a coarse effort level with a slash command.

# Hard cap the thinking budget for a whole session
export MAX_THINKING_TOKENS=10000

# In-session: dial reasoning effort up or down
/effort high      # low | medium | high
/effort auto      # back to default
Thinking depth does NOT improve compliance with output rules. Ultrathink can burn hundreds of reasoning tokens and still ignore an explicit formatting constraint. Reserve it for genuinely complex reasoning, not for forcing rule-following.

A model note worth knowing

The thinking story differs by model. Opus 4.x uses adaptive thinking rather than the standard extended-thinking toggle, while Sonnet 4.6 and Haiku 4.5 support extended thinking. The practical workflow is the same either way; the model decides how to apply the reasoning.

Putting them together

The highest-leverage habit is to combine them: enter plan mode, ask Claude to think hard about the approach, approve the plan, then drop back to normal mode to execute.

  1. Press Shift+Tab until you see plan mode.
  2. Prompt: think hard and propose a plan to refactor the payment retry logic. Read the relevant files first.
  3. Read the plan. Correct any wrong assumption in one line.
  4. Approve, then let Claude execute against the now-cached plan.

Plan mode

Controls permissions. Read-only until you approve. Catches bad assumptions before code is written.

Extended thinking

Controls reasoning depth. Escalate with think / think hard / ultrathink only when the problem earns it.

Together

Plan + think hard = a deliberate, cache-efficient session that wastes far fewer correction rounds.

If Claude fails the same fix twice, don't keep correcting. Use Esc+Esc (or /rewind) to erase the failed attempt from context, then re-plan with sharper constraints. Correcting leaves dead ends polluting context; rewinding removes them.
📌
SECTION 09

CLAUDE.md & Memory

Teach Claude your project once — it remembers every session

⚡ WHY IT MATTERSStack every rule once in the right layer and Claude obeys all of them automatically, with the most specific layer winning conflicts.
Layered context that persistsCLAUDE.md files concatenate — loaded broadest first, so more specific layers take precedence on conflictEnterprise policymanaged CLAUDE.md · IT-deployedUser~/.claude/CLAUDE.md · all projectsProject./CLAUDE.md · team-shared via gitLocalCLAUDE.local.md · untrackedEffective contextconcatenated + reloadsafter compactionClaudemodelload order: broadest first ↓ most specific last (wins)
Four CLAUDE.md scopes (managed policy, user, project, local) concatenate in broadest-to-most-specific order and merge into one effective context fed to Claude.

A good CLAUDE.md is the single highest-leverage upgrade you can make: it loads at the start of every session so Claude stops guessing your conventions.

CLAUDE.md is a plain Markdown file Claude reads automatically before it does anything else. Put your build commands, code style, and gotchas there once, and they apply to every conversation in that repo. It is the difference between Claude inferring your setup and Claude knowing it.

CLAUDE.md is read at the start of every session, and re-injected from disk after a /compact. That means good rules survive context resets for free.

The memory hierarchy

Memory files load in layers: enterprise, user, project, and local. More specific layers take precedence, so a project file beats your personal defaults.

LayerLocationScope
Enterprisemanaged policy fileOrg-wide, set by admins
User~/.claude/CLAUDE.mdAll your projects, private
Project./CLAUDE.mdThis repo, shared with team via git
Localproject-local, untrackedThis repo, just you (gitignored)
Commit ./CLAUDE.md so your whole team gets the same Claude behavior. Keep personal quirks in ~/.claude/CLAUDE.md or a local, gitignored memory file.

Get started fast

You almost never write CLAUDE.md from scratch. Let Claude scan the repo and draft one, then trim it.

  1. Run /init in a repo. Claude reads the codebase and generates a starter CLAUDE.md with structure, build/test commands, and conventions.
  2. Run /memory to open your memory files in your editor and clean up the draft.
  3. Prune ruthlessly (see the rule below), then commit it.

The # shortcut: add a memory mid-flow

When Claude makes a mistake or you spot a convention worth saving, start a prompt with #. Claude appends it to a memory file for you, so you never lose the lesson.

# Always run tests with `pnpm test`, never `npm test`
# The staging deploy script must be run from the repo root

It lets you pick which memory file to write to (user vs project), so a one-off correction becomes a permanent rule in seconds.

What to put in (and leave out)

The golden test for every line: "Would removing this cause Claude to make a mistake?" If no, delete it. A bloated CLAUDE.md gets ignored because the real rules drown in noise.

Include

Non-guessable bash commands, non-default code style, the exact test runner, repo etiquette, project architecture, and env quirks or gotchas.

Exclude

Anything inferable from the code, standard language conventions, and info that changes often (it goes stale and misleads).

A bloated CLAUDE.md actively hurts you. When rules get lost in noise, Claude starts ignoring all of them. Shorter and sharper beats long and thorough.

Make rules stick

Claude weights emphatic instructions more heavily. Use IMPORTANT or YOU MUST for rules that really matter.

Move sometimes-relevant material out of CLAUDE.md into Skills or path-scoped .claude/rules/ files. That keeps the always-loaded core lean while still having deep guidance available on demand.

A real CLAUDE.md snippet

Concrete, scannable, and every line earns its place:

# Project: Acme API

## Commands
- Install: `pnpm install`
- Dev server: `pnpm dev` (port 3000)
- Test: `pnpm test` (Vitest) — NEVER use npm
- Typecheck: `pnpm typecheck`
- Lint+fix: `pnpm lint --fix`

## Code style
- TypeScript strict mode; no `any`.
- Immutable data only — never mutate inputs, return new objects.
- Prefer async/await over `.then()`.

## Architecture
- API routes live in `src/routes/`, business logic in `src/services/`.
- DB access goes ONLY through `src/repos/` (repository pattern).

## Gotchas
- IMPORTANT: run `pnpm db:migrate` after pulling — schema drifts often.
- The `legacy/` folder is frozen. YOU MUST NOT edit files there.

Inspect what loaded

Not sure which memory files Claude is actually using? Two commands give you ground truth.

/memory    # opens loaded CLAUDE.md files for editing
/context   # live token breakdown — see how much memory costs
After a big edit to CLAUDE.md mid-session, remember it invalidates the prompt cache and triggers one slower, more expensive turn. Prefer editing memory between tasks, not in the middle of one.

Treat CLAUDE.md as a living document. Every time Claude repeats a mistake, capture the fix with # — over a few weeks your repo trains itself.

🎓
SECTION 10

Skills

Teach Claude a procedure once, and it loads itself exactly when needed.

⚡ WHY IT MATTERSYou can install 100+ skills for the cost of a few hundred tokens, because only the matching skill's full body ever loads.
Skill shelf (Level 1: metadata only, ~100 tokens each)pdf-toolsfill / sign / extract PDFsdata-scrapercollect public data on a schedulepostgres-patternsoptimize SQL queries & indexesswiftui-patternsSwiftUI state & navigationmarket-researchcompetitive analysis100+ skills dormant on disk (0 body tokens)Task arrives“My DB query isslow, add an index”match by descriptionread bodyContext window(kept lean)system + tool defsall skill metadatapostgres-patternsSKILL.md body (<5k tok)conversation historyOnly the matched body loads — progressive disclosure
A task matches one skill's description, and only that skill's body is read from the shelf into the lean context window while the rest stay dormant on disk.

A Skill is a folder with a SKILL.md that Claude auto-loads the moment your request matches it, then forgets again to keep context lean.

Think of a skill as a reusable playbook on disk. Claude reads only the name and description at startup, and pulls in the full instructions only when they are actually relevant. This is called progressive disclosure, and it is why skills scale where giant prompts do not.

Anatomy of a skill

Every skill is a directory containing one SKILL.md file. That file has YAML frontmatter (between --- markers) plus markdown instructions. You can bundle extra scripts, reference docs, or data files alongside it.

.claude/skills/
└── deploy-staging/
    ├── SKILL.md          # frontmatter + instructions
    ├── deploy.sh         # bundled script (run via bash)
    └── checklist.md      # reference, loaded only if needed
The directory name sets the slash command. .claude/skills/deploy-staging/SKILL.md becomes /deploy-staging. The name frontmatter field only changes the display label, not what you type.

A minimal SKILL.md

Only description is recommended; every field is technically optional. The description is what Claude reads to decide whether to activate the skill, so make it state both what it does and when to use it.

---
name: deploy-staging
description: Deploy the app to the staging environment. Use when
  the user asks to ship, deploy, or push a build to staging.
---

# Deploy to staging

1. Run the test suite: `npm test`
2. Build the bundle: `npm run build:staging`
3. Run the bundled script: `bash deploy.sh staging`
4. Confirm the health check at /healthz returns 200.

Never deploy if tests fail. Report the deployed commit SHA.
Frontmatter validation: name is max 64 chars, lowercase letters / numbers / hyphens only, and cannot contain the words 'anthropic' or 'claude'. description is max 1024 chars with no XML tags. Gerund-style names like processing-pdfs are recommended.

Where skills live

LocationScopeCommand
~/.claude/skills/All your projects (personal)/name
.claude/skills/This project, shared via git/name
Plugin skills/ dirWhoever installs the plugin/plugin:name
Custom commands merged into skills. Your old .claude/commands/optimize.md still works and still creates /optimize. For anything new, prefer a skill folder so you can bundle scripts and resources.

How activation works

You get the skill two ways. Claude can auto-activate it when your wording matches the description, or you can fire it directly by typing the slash command. Both reach the same instructions.

  1. At session start, Claude preloads only metadata: each skill's name + description (~100 tokens each).
  2. When your request matches a description, Claude reads the full SKILL.md from disk.
  3. If the body points to other files or scripts, those load (or run via bash) only when actually needed.

Progressive disclosure: the three levels

Level 1 · Metadata

Name + description, ~100 tokens per skill. Always loaded at startup so Claude knows what exists.

Level 2 · Instructions

The SKILL.md body, kept under ~5k tokens. Loaded only when the skill is triggered.

Level 3+ · Resources

Bundled docs, scripts, data. Effectively unlimited; zero context cost until a file is read or a script is run.

Controlling who can invoke

Two frontmatter switches tune visibility. They trade off context budget against convenience.

FrontmatterEffect
(default)Both you and Claude can invoke; description stays in context.
disable-model-invocation: trueOnly you can run it via /name. Description is NOT kept in context (saves budget).
user-invocable: falseOnly Claude invokes it; hidden from the / menu. Description stays in context.
Want to toggle visibility without editing the file (handy for shared repos)? Open the /skills menu, highlight a skill, press Space to cycle states (on name-only user-invocable-only off), then Enter. It writes skillOverrides in settings.

When skills beat plain prompts

Reusable procedure

You re-explain the same multi-step task often. Write it once; let it auto-load.

Keeps context lean

Long instructions sit on disk at Level 2/3 instead of bloating every turn.

Bundled determinism

Ship a tested script next to the skill so Claude runs it instead of re-generating code.

Team-shareable

Commit .claude/skills/ and the whole team inherits the same playbook.

Reach for a plain prompt instead when the task is a one-off, or so simple that a sentence does the job. Skills earn their keep on repeated, structured work.

Authoring rules that matter

  1. Keep the SKILL.md body under ~500 lines; split overflow into separate files.
  2. Keep file references one level deep, and use forward slashes in paths.
  3. Add a table of contents to any reference file longer than 100 lines.
  4. Prefer a deterministic bundled script over asking Claude to write code each time.
  5. Put the most important instructions near the top — skill bodies are re-injected after /compact but capped at 5,000 tokens each.
Subagents differ from chat. A subagent that preloads a skill (via the skills: frontmatter field) gets the FULL skill body injected at startup, not just the description — so a worker agent always has its playbook ready without needing to be triggered.
SDK gotcha: the allowed-tools frontmatter field works only in the Claude Code CLI, not via the Agent SDK. In the SDK, filter with the skills option on query() and use the main allowedTools option instead. Claude Code supports filesystem skills only — there is no API upload.
🤖
SECTION 11

Subagents & Delegation

Spin up focused helpers that work in their own context and report back

⚡ WHY IT MATTERSParallel subagents do heavy work in isolated context windows, so the orchestrator stays lean and N tasks finish in the time of one.
fan-out → isolated parallel work → fan-in & mergeOrchestratorprimary agent · owns plan & userTask toolTask toolReviewerisolated contextResearcherisolated contextTesterisolated contexteach returns one final summary → orchestrator synthesizesup to ~10 concurrent · no peer-to-peer · heavy tokens stay isolated
An orchestrator agent fans out to three parallel subagents, each with its own isolated context, which run concurrently and fan their final results back up to be merged.

Subagents are disposable specialists with their own fresh context window, so dirty research and verbose output never pollute your main chat.

A subagent is just a Markdown file with YAML frontmatter. The frontmatter sets identity and limits; the body is the subagent's system prompt. Claude can call them automatically or you can invoke them by name.

Why delegate at all

The core constraint of Claude Code is that context fills fast and quality drops as it fills. A subagent runs in its own window, does the noisy work, and returns only a short summary, keeping your main thread clean.

Isolated context

Each subagent starts fresh. Verbose searches, test logs, and dead ends stay out of your main conversation.

Scoped tools

Give a reviewer read-only access. The tools allowlist means it literally cannot edit or run destructive commands.

Cheaper model per job

Route grunt work to Haiku, hard reasoning to Opus, via the model field, independent of your main session.

Parallel work

Fan out several subagents on independent investigations at once, then let Claude merge their findings.

Where they live

Project agents go in .claude/agents/ (shared with your team). Personal agents go in ~/.claude/agents/. When names collide, the higher-priority location wins; project beats user.

LocationScopePriority
--agents CLI flagThis session onlyHighest (after managed)
.claude/agents/Project / teamHigh
~/.claude/agents/All your projectsMedium
Plugin agents/Installed pluginsLowest

A real agent definition

Only name and description are required. The tools line is an allowlist; omit it and the subagent inherits every tool from the main chat. Adding 'use proactively' to the description nudges Claude to delegate on its own.

---
name: code-reviewer
description: Reviews code for bugs, security, and style. Use proactively after writing or changing code.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a senior code reviewer. When invoked:
1. Run `git diff` to see what changed.
2. Flag bugs, security issues, and missing error handling.
3. Group findings as CRITICAL / HIGH / MEDIUM / LOW.
4. Suggest concrete fixes. Do NOT edit files yourself.
Return a short, scannable summary only.
Save the file, then in-session run /agents to verify it loaded. Agents edited directly on disk need a session restart; agents created through the /agents UI take effect immediately.

How to invoke one

Three ways to call a subagent, from least to most explicit.

  1. Auto-delegation — describe the task and Claude picks an agent based on its description.
  2. Natural language — name it: "Use the code-reviewer to check my last commit."
  3. @-mention — guarantees that exact agent runs.
# @-mention forces the named subagent to run
@agent-code-reviewer check the auth changes in src/auth/

# Or run an entire headless session AS a subagent
claude --agent code-reviewer -p "review the staged diff"

Pick the right model per agent

The model field takes an alias (haiku, sonnet, opus), a full id like claude-opus-4-8, or inherit. It defaults to inherit (same model as your main chat) when omitted.

Agent jobModelWhy
Codebase search / scanhaikuFastest, cheap ($1/$5 per MTok), near-frontier for lookup
Code review / testssonnetBest balance for most production work
Architecture / hard debuggingopusDeepest reasoning for complex, costly decisions

Running several in parallel

For independent investigations, spawn multiple subagents at once. Each gathers its own context; Claude synthesizes the results. This works best when the paths truly don't overlap.

Investigate three things in parallel, one subagent each:
1. How is auth wired in src/auth/ ?
2. Where are database migrations defined?
3. What does the CI pipeline run on push?
Report each as a short summary.
Subagents run foreground (blocking) or background (concurrent, auto-denying anything that would prompt). Press Ctrl+B to background a running task, or set background: true in frontmatter to always background it.

When delegation helps vs hurts

Delegate when

Output is verbose and you don't need it in main context, you want tool/permission limits enforced, or the work is self-contained and returns a summary.

Stay in the main chat when

You need frequent back-and-forth, phases share context, the change is a quick one-liner, or latency matters — subagents start cold and must re-gather context.

Subagents cannot spawn other subagents — there is no nested delegation. For multi-stage workflows, chain them from the main conversation or use Skills. Also note Agent, AskUserQuestion, and plan-mode tools are unavailable inside a subagent.

Two power features

Add memory: project to give an agent persistent notes across sessions (stored in .claude/agent-memory/<name>/). Add skills: to preload full skill content into the subagent at startup — subagents get the entire skill injected, not just its description.

---
name: test-runner
description: Writes and runs tests; use proactively on new features.
tools: Read, Edit, Bash
model: sonnet
memory: project
skills: python-testing
isolation: worktree
---
Write failing tests first, then implement until green. Verify 80%+ coverage.
Use isolation: worktree for risky changes — the subagent runs in a throwaway git worktree, so it can experiment without touching your working tree. Combine with memory: project so it remembers conventions between runs.

Restricting who can spawn what

In settings or an agent definition, control delegation with the Agent tool syntax. List allowed agents in parens, allow any with a bare Agent, or omit it to block spawning entirely.

# Only let this agent spawn the named workers
# (Task is the old name for Agent — still works as an alias)
Agent(researcher, test-runner)

# Disable a built-in subagent for the session
claude --disallowedTools "Agent(Explore)"

Built-ins worth knowing: Explore (Haiku, read-only search), Plan (read-only, used in plan mode), and general-purpose (all tools). Explore and Plan skip CLAUDE.md and git status for speed; every other agent loads the full memory hierarchy.

🔌
SECTION 12

MCP Servers & Connectors

Plug Claude into your tools, your data, your whole stack

⚡ WHY IT MATTERSOne open protocol lets Claude plug into any external tool, file, or service through interchangeable connectors instead of bespoke integrations.
One protocol, many toolsMCP: host spawns a client per server; JSON-RPC 2.0 over stdio or Streamable HTTPClaudeMCP Host4 Clients (1:1 per server)model reasons; host mediates all clientsHTTPstdioGitHub Serverremote · HTTPFilesystem Serverlocal subprocess · stdioBrowser Serverlocal subprocess · stdioDocs Serverremote · HTTPeach server exposesToolsResourcesPromptsJSON-RPC tools/call request → result back
The diagram shows the Claude MCP host spawning per-server clients over stdio and Streamable HTTP transports to GitHub, Filesystem, Browser, and Docs servers, each exposing tools, resources, and prompts, with a JSON-RPC request/response dot traveling to one server.

MCP (Model Context Protocol) is the universal adapter that lets Claude talk to outside tools and data — GitHub, browsers, databases, docs, and your own scripts.

Without MCP, Claude only has its built-in tools (read, edit, bash). With MCP, you bolt on new tools that show up right inside the session. One claude mcp add command wires a server in for good.

When and why to add an MCP

Add a server when Claude keeps needing data or actions it can't reach on its own. The goal is fewer copy-pastes and more real work done in-session.

Reach live data

Query a database, fetch current library docs, or read your Notion — instead of pasting it by hand.

Take real actions

Open PRs on GitHub, drive a browser with Playwright, or send a Slack message.

Persist memory

A memory server lets Claude store and recall facts across sessions.

Wrap your own scripts

Expose an internal CLI or API as a tool your whole team can call.

Each connected server adds its tool definitions to context, which fills the window faster. Run /mcp to see per-server tool cost and disable servers you aren't using.

The three transports

A transport is just how Claude talks to the server. Pick based on where the server lives.

TransportUse it forFlag
stdioLocal subprocess servers on your machinedefault (no flag)
HTTPRemote / cloud servers (recommended)--transport http
SSELegacy remote — deprecated, use HTTP--transport sse

stdio: local process

Claude launches the server as a child process and talks over stdin/stdout. The -- separates Claude's flags from the command that starts the server — everything after it is passed through untouched.

claude mcp add playwright -- npx -y @playwright/mcp@latest

HTTP: remote server

Best for hosted/cloud services. Pass a name and a URL.

claude mcp add --transport http claude-code-docs https://code.claude.com/docs/mcp
All options (--transport, --env, --scope, --header) must come BEFORE the server name. For stdio, the command goes last after --.

Auth and environment variables

Use --env to pass secrets to a local server and --header to send auth on a remote one.

# Local server with an env var
claude mcp add github --env GITHUB_TOKEN=ghp_xxx -- npx -y @modelcontextprotocol/server-github

# Remote server with a bearer token
claude mcp add --transport http my-api --header "Authorization: Bearer $TOKEN" https://api.example.com/mcp

For remote servers that use OAuth, just run /mcp in-session and authenticate through the browser prompt — no manual token needed.

Scopes: who sees this server

Scope decides where the config is stored and who can use it. Set it with --scope.

ScopeVisibilityStored in
local (default)Just you, this project~/.claude.json (under project path)
projectWhole team, shared.mcp.json in repo root (commit it)
userJust you, all projects~/.claude.json
# Share a server with your team via version control
claude mcp add --scope project --transport http docs https://code.claude.com/docs/mcp
Older versions named scopes differently: today's local was project, and today's user was global. When the same server name exists in several scopes, precedence is local > project > user > plugin > connector (one definition wins, no merging).

The .mcp.json file

Project-scoped servers live in .mcp.json at your repo root, checked into git so the whole team gets them. Claude reads it only at session start — restart after editing.

{
  "mcpServers": {
    "docs": {
      "type": "http",
      "url": "https://code.claude.com/docs/mcp"
    },
    "playwright": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}

The type field accepts streamable-http as an alias for http, so configs copied straight from a server's docs work unchanged.

No secrets in the file

Use variable expansion so committed configs never hold real tokens. Supports ${VAR} and ${VAR:-default} in command, args, env, url, and headers.

{
  "mcpServers": {
    "my-api": {
      "type": "http",
      "url": "https://api.example.com/mcp",
      "headers": { "Authorization": "Bearer ${API_TOKEN}" }
    }
  }
}

The /mcp command

Run /mcp in any session to see configured servers, their connection status, and the tools each one exposes. It's also where you authenticate (OAuth) to remote servers.

Project servers from a cloned repo prompt for approval on first use — a guard so a malicious repo can't silently launch processes. Approve at the prompt or in /mcp. Reset all approvals with claude mcp reset-project-choices.

Managing servers from the CLI

claude mcp list                 # all configured servers
claude mcp get github           # show details + which scope holds it
claude mcp remove github        # remove (add --scope if in several)
claude mcp add-json db '{"type":"stdio","command":"..."}'
claude mcp add-from-claude-desktop   # import (macOS/WSL only)

Example servers to start with

GitHub

Read repos, search code, open and review PRs, manage issues — all without leaving the session.

Playwright

Drive a real browser: navigate, click, fill forms, screenshot, and test web flows.

Docs (Context7 / Claude docs)

Pull current, version-accurate library documentation instead of relying on training data.

Memory

A knowledge-graph server that lets Claude store and recall facts across sessions.

Using MCP tools in automation

MCP tools are namespaced mcp__<server>__<tool>, which is how you reference them in --allowedTools for non-interactive runs.

claude -p "Open a PR for the fix" --allowedTools "mcp__github__create_pull_request"

Security: trust before you connect

An MCP server runs code on your machine (stdio) or receives your data and tokens (remote). A malicious or compromised server can read files, exfiltrate secrets, or perform actions as you. Only add servers you trust.
  1. Prefer official servers and the Anthropic connector directory over random forks.
  2. Pin versions for stdio servers rather than always pulling @latest from strangers.
  3. Never hardcode secrets in .mcp.json — use ${VAR} expansion.
  4. Treat tool descriptions and returned data as untrusted input (prompt-injection risk).
  5. Keep project servers reviewed in code review, since they ship to the whole team via git.
Add servers narrowly: enable a heavy server (like a full browser or a broad API) only in the project that needs it, and disable it elsewhere to keep both your context window and your attack surface small.
🪝
SECTION 13

Hooks (Automation)

Deterministic guardrails the model can't skip

⚡ WHY IT MATTERSHooks let you intercept Claude at fixed lifecycle moments — and PreToolUse can deny a tool before it ever runs.
Automate at every lifecycle event — one turnUserPromptSubmitonce / turnPreToolUsegate — before toolallow / denyTool runsBash / Edit / ReadPostToolUseafter successadd contextStopend of turnagentic loop — next tool callDENY (exit 2) blocks toolrejection returned to modelHooks are deterministic callbacks in the host process — not the model context.Exit 0 = proceed · Exit 2 = block + feed stderr back to Claude · PreToolUse can short-circuit a call.
A horizontal turn timeline showing hook taps fire in order (UserPromptSubmit, PreToolUse, PostToolUse, Stop) around tool execution, with PreToolUse able to block a tool via exit 2.

Hooks run your own shell commands at fixed moments in Claude's lifecycle, giving you control the model cannot talk its way around.

A hook is a command that fires automatically on an event like a file edit or a finished response. Because the harness runs it (not Claude), the behavior is deterministic and reliable. Use hooks to auto-format code, block edits to protected files, or inject fresh context.

Hooks live under a top-level hooks key in any settings file: ~/.claude/settings.json (all projects), .claude/settings.json (project, committable), or .claude/settings.local.json (gitignored). All matching hooks merge and run together.

The lifecycle events

Events fire at three cadences: once per session, once per turn, and once per tool call. Pick the event that matches when you want to act.

EventFires whenCan block?
SessionStartSession begins or resumesNo
UserPromptSubmitYou submit a prompt (before Claude reads it)Yes (erases prompt)
PreToolUseBefore a tool call runsYes (blocks call)
PostToolUseAfter a tool call succeedsNo (already ran)
StopClaude finishes respondingYes (keeps going)

How the config nests

The structure has three levels: event name, then matcher groups, then handler objects. A matcher filters which tool triggers the hook.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "echo ran" }
        ]
      }
    ]
  }
}
Matcher rules: only letters/digits/_/| means an exact match with | as "or" (e.g. Edit|Write). A *, empty, or omitted matcher matches everything. Anything else is treated as a regex (e.g. mcp__.* matches all MCP tools). Matchers are case-sensitive.

Exit codes: how a hook talks back

A command hook receives event JSON on stdin and communicates through its exit code. This is the contract you must get right.

Exit codeMeaning
0Success. stdout may carry JSON for fine control.
2Blocking error. stderr is fed back to Claude as feedback.
otherNon-blocking error. A warning shows; execution proceeds.
Use exit 2 to enforce policy, not exit 1. Only exit 2 actually blocks. A PreToolUse hook that exits 2 blocks the tool even under bypassPermissions or --dangerously-skip-permissions — PreToolUse hooks run before any permission-mode check.

Use case 1: Auto-format after every edit

The classic PostToolUse hook. It reads the edited file path from the event JSON with jq and pipes it to a formatter.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs -r prettier --write"
          }
        ]
      }
    ]
  }
}

Swap prettier for black, gofmt, or rustfmt depending on your stack. Add an "if": "Edit(*.py)" field to scope it to one file type.

Use case 2: Block edits to protected files

A PreToolUse hook can inspect the proposed tool call and refuse it. Exit 2 cancels the edit and tells Claude why.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "f=$(jq -r '.tool_input.file_path'); case \"$f\" in *.env|*/secrets/*) echo 'Protected file, blocked.' >&2; exit 2;; esac"
          }
        ]
      }
    ]
  }
}
For structured control on PreToolUse, exit 0 and print JSON instead: hookSpecificOutput.permissionDecision set to allow, deny, or ask. Pick exit-code signaling OR JSON, never both. When several hooks run, the most restrictive wins (deny > ask > allow). Note that allow still respects existing deny/ask permission rules.

Use case 3: Inject context at session start

For UserPromptSubmit and SessionStart, stdout is injected straight into Claude's context. That makes them perfect for feeding in live state like the current git branch.

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup",
        "hooks": [
          { "type": "command", "command": "echo \"Branch: $(git branch --show-current)\"" }
        ]
      }
    ]
  }
}

The SessionStart matcher filters by source: startup, resume, clear, or compact.

Use case 4: Forced verification on Stop

A Stop hook runs when Claude thinks it's done. Run your tests there and return decision: "block" to push Claude to keep fixing until they pass.

Stop hooks fire every time Claude finishes — including right after the hook itself forces a continue. Always check the stop_hook_active field in the input and exit 0 when it is true, or you will create an infinite loop.

Handler types beyond shell

command

Run a shell command. Gets event JSON on stdin; replies via exit code and stdout.

http

POST the event to an endpoint — good for central audit or team policy servers.

mcp_tool

Invoke an MCP tool as the handler.

prompt / agent

Run an LLM prompt or a subagent to make a judgment call at the hook point.

Handlers also accept "timeout" (seconds) and "async": true. Use async for non-blocking work like an audit log — an unmatched PostToolUse hook that appends JSONL records every action without slowing Claude down.

Manage and inspect your hooks live with the /hooks slash command, which shows what is configured across all your settings files.

🆕
WHAT'S NEW · JUNE 9, 2026

Claude Fable 5 & Mythos 5

Anthropic's new flagship — state-of-the-art on nearly every benchmark

On June 9, 2026 Anthropic released Claude Fable 5 — its most capable model yet — plus Claude Mythos 5, the same model with safeguards lifted for vetted research.

Fable 5 is state-of-the-art on nearly every benchmark tested, can work autonomously across millions of tokens, and ships at less than half the price of the earlier Mythos Preview.

What launched

Claude Fable 5

The general-availability flagship, with full safeguards. API id claude-fable-5. Best-in-class at software engineering, knowledge work, vision, and science.

Claude Mythos 5

The same underlying model with safeguards lifted in narrow areas. Restricted to Project Glasswing partners and select biology researchers — not general access.

The numbers

ItemDetail
API idclaude-fable-5
Price (in / out)$10 / $50 per million tokens
AutonomyRuns unattended across millions of tokens on long tasks
AvailabilityLive now on the Claude API and consumption-based Enterprise plans. Subscription plans: included through June 22, then via usage credits.

Why it's a big deal

Software engineering

State-of-the-art on CursorBench and top score on Cognition's FrontierCode at medium effort. In a Stripe case study it compressed months of work on a 50-million-line Ruby migration into days.

Analysis & finance

First model to break 90% on complex analytical tasks — a 10-point jump over Opus — and the highest score of any model on Hebbia's finance benchmark.

Science & vision

~10x faster protein design, novel molecular-biology hypotheses preferred 80% of the time vs. Opus-class, and strong vision work like rebuilding web apps from screenshots.

What it means for you

  • Reach for Fable 5 when a task is genuinely hard — large refactors, deep research, multi-step agentic runs — where a better answer is worth the higher token price.
  • Keep routing cheap, high-volume work to Haiku and everyday coding to Sonnet. Fable 5 is the premium tier, not the default for everything (see Pick the Right Brain).
  • In Claude Code, switch with /model claude-fable-5. Front-load context once so prompt caching keeps the cost down.
Safety: Fable 5 has three classifier-based guards — cybersecurity, biology/chemistry, and model-distillation attempts — that fall back to Claude Opus 4.8. Anthropic reports these trigger in under 5% of sessions, and external red-teamers landed zero harmful single-turn requests across 30 jailbreak techniques.

Source: anthropic.com/news/claude-fable-5-mythos-5

🧩
SECTION 14

Plugins & Marketplaces

Bundle your whole setup into one installable unit

⚡ WHY IT MATTERSOne plugin bundles commands, agents, skills, hooks, and MCP servers so a whole capability set installs into any project or team in one move.
Bundle capabilities, share themPlugin packageone installable bundle/Commands slash workflowsAgents subagent definitionsSkills SKILL.md knowledgeHooks lifecycle callbacksMCP servers external toolsinstallClaude Code project.claude/ wired up at oncecommandsagentsskills + hooksMCP configready to useother projects+ teammatesportable across projects & teams
A single Plugin package bundling Commands, Agents, Skills, Hooks, and MCP servers, installing into a Claude Code project and staying portable across other projects and teams.

A plugin packages commands, agents, skills, hooks, and MCP servers into a single thing you install once and reuse everywhere.

Building a great Claude Code setup takes work: custom skills, subagents, format-on-save hooks, the right MCP servers. A plugin wraps all of that into one directory you can share. Install it on a new machine, hand it to a teammate, or drop it into every repo and your capabilities travel with you.

What lives inside a plugin

A plugin is a self-contained folder. Components are auto-discovered from the directory layout at the plugin root, so you rarely have to wire anything up by hand.

skills/

Each skill is a folder with a SKILL.md. Namespaced as /plugin-name:skill-name to avoid collisions.

agents/

Subagents shipped with the plugin. (Plugin agents ignore hooks, mcpServers, and permissionMode for security.)

hooks/hooks.json

Lifecycle hooks (auto-format, audit logs, policy gates) that activate when the plugin is enabled.

.mcp.json

MCP servers the plugin needs, started automatically alongside the plugin.

bin/, themes/, output-styles/

Executables added to the Bash PATH, plus themes and output styles.

.claude-plugin/plugin.json

The only file inside .claude-plugin/. It is OPTIONAL — used to set name, version, and author.

Components live at the plugin ROOT, not inside .claude-plugin/. Only plugin.json goes in .claude-plugin/.

The /plugin manager

Run /plugin to open the manager — a tabbed interface you cycle with Tab / Shift+Tab.

TabWhat it does
DiscoverBrowse plugins from every marketplace you've added.
InstalledEnable, disable, or uninstall, grouped by scope.
MarketplacesAdd, remove, or update marketplace sources.
ErrorsSee what failed to load and why.

Recent versions also show a Context cost estimate, a Last updated date, and a 'Will install' breakdown listing exactly which commands, agents, skills, hooks, and MCP/LSP servers a plugin adds before you commit.

Install your first plugin

Anthropic ships a curated marketplace, claude-plugins-official, that is available automatically in every install — no setup needed.

# Install the official GitHub plugin (user scope by default)
/plugin install github@claude-plugins-official

# Activate it this session without restarting
/reload-plugins

The syntax is always plugin-name@marketplace-name. That @marketplace suffix is how Claude knows which catalog the plugin came from.

After installing, enabling, or disabling mid-session, run /reload-plugins to apply changes instead of restarting Claude.

Add a marketplace

A marketplace is just a repo (or hosted file) with a .claude-plugin/marketplace.json catalog at its root. Add one, then install from it.

# Add Anthropic's community marketplace
/plugin marketplace add anthropics/claude-plugins-community

# Now install any plugin it lists
/plugin install commit-commands@claude-community

Sources are flexible. Use whichever fits where your plugins live.

Source typeExample
GitHub owner/repoanthropics/claude-code
Git URL (GitLab, self-hosted)https://gitlab.com/team/cc-plugins.git
Local path./my-marketplace or a marketplace.json
Hosted URLremote marketplace.json
Shortcuts: /plugin market works for marketplace, and rm works for remove.

Share a plugin with your whole team

This is the real payoff. Install at project scope and the plugin is recorded in enabledPlugins in .claude/settings.json — which you commit. Every teammate who clones the repo gets the same setup.

# Write the plugin into the shared project settings file
/plugin install lint-suite@claude-community --scope project

One reviewer agent, one format hook, one test command — defined once, used by everyone. No more 'works on my machine' tooling drift.

Scripted install (CLI, non-interactive)

For CI, dotfiles, or onboarding scripts, use the non-interactive claude plugin subcommands.

# Add a marketplace and install without the UI
claude plugin marketplace add anthropics/claude-plugins-community
claude plugin install commit-commands@claude-community

# Audit what's installed (machine-readable)
claude plugin list --json
Validate before you publish: claude plugin validate checks plugin.json, every skill/agent frontmatter, and hooks.json. Scaffold a fresh one with plugin init --with skills,agents,hooks,mcp.

Build a plugin in five steps

  1. Run plugin init (or make a folder) and drop components into skills/, agents/, hooks/, and .mcp.json at the root.
  2. Reference bundled files with ${CLAUDE_PLUGIN_ROOT} in hooks and MCP configs — plugins are copied to a cache, so absolute paths break.
  3. Optionally add .claude-plugin/plugin.json with a name, version, and author. Skip version and the git commit SHA is used.
  4. Run claude plugin validate to catch frontmatter and manifest errors.
  5. Push to a repo, add a .claude-plugin/marketplace.json catalog, and share the owner/repo as a marketplace source.
Skip the marketplace entirely for local dev: any folder under a skills directory with a .claude-plugin/plugin.json loads in place as name@skills-dir on the next session — no install, no cache. Great for iterating on a plugin you're authoring.
Anthropic does NOT verify plugin contents. A plugin can add hooks, MCP servers, and bin/ executables that run on your machine. Read the 'Will install' list and trust the source before installing.

Why plugins win

Skills, agents, and hooks scattered across ~/.claude/ and individual repos are hard to move and easy to forget. A plugin makes that whole stack portable: version it, audit its context cost in /plugin, enable it per project, and update it when the source changes.

Think in layers: user scope for your personal toolkit on every project, project scope (committed) for the standards your whole team must share.
🔀
SECTION 15

Multi-Agent Workflows & Loops

From one agent to a coordinated team — without burning your token budget

⚡ WHY IT MATTERSFan out independent work to parallel subagents, then pipeline through compose and verify to ship complex tasks in a fraction of the time.
FAN-OUTPARALLEL PIPELINEFAN-INCoordinatorprimary agentResearchisolated contextResearchisolated contextResearchisolated contextComposedraft outputComposedraft outputComposedraft outputVerifyself-checkVerifyself-checkVerifyself-checkMergedresult
A Coordinator fans out to parallel Research subagents whose outputs pipeline through parallel Compose then Verify stages before fanning in to a merged result.

One agent is a worker; many agents are an assembly line that you steer from a single chair.

Subagents each run in their own fresh context window with their own tools. They do the heavy reading, then hand back a short summary. That is the whole trick: keep your main context clean while the noisy work happens elsewhere.

A subagent starts cold. It has no memory of your chat, must gather its own context, and returns only what it reports. Great for self-contained work, bad for tasks needing constant back-and-forth.

When orchestration is worth it (and when it is not)

The core constraint never changes: context fills fast and quality drops as it fills. Multi-agent setups exist to protect that budget, not to look clever. Spending more tokens only pays off when the work is parallelizable or verbose.

Use subagents / loops whenStay in the main chat when
Task spits out verbose output you do not need to keepYou need frequent back-and-forth steering
Several investigations are independent (fan-out)Phases share the same context
You want to enforce tool/permission limitsThe change is a quick one-line diff
Work is self-contained and returns a summaryLatency matters (cold start costs time)
Subagents cannot spawn other subagents. There is no nested delegation. Chain them from the main conversation, or use Skills, when you need multiple stages.

Pattern 1: Parallel fan-out (research)

Spin up several agents at once on independent slices of a problem. Each explores in its own context; the main thread then synthesizes their summaries. This works best when the paths truly do not overlap.

Investigate our auth system. Run three subagents in parallel:
1. Explore: map every login/session file and how tokens are issued
2. Explore: list all places that read or write user roles
3. Explore: find rate-limiting and lockout logic
Then summarize the attack surface in one page.

The built-in Explore agent is perfect here: it is Haiku-backed, read-only, and skips CLAUDE.md and git status to start lean. Cheap, fast, disposable.

Scope every fan-out task narrowly. An unscoped "investigate the codebase" causes infinite exploration that floods context. Name files, set boundaries, demand a short report.

Pattern 2: The pipeline (research → compose → verify)

Some work is a relay, not a sprint. Each stage runs as a separate subagent so a bloated research phase never pollutes the writing phase. Define small, single-purpose agents in .claude/agents/.

---
name: researcher
description: Gathers facts and returns a tight brief. Use proactively.
tools: Read, Grep, Glob, Bash
model: haiku
---
You investigate, then return ONLY a bulleted brief with file paths.
Never edit files. Never speculate beyond what you found.

A reviewer agent caps cost the same way by restricting tools and using a cheaper model:

---
name: verifier
description: Checks work against the brief and the tests.
tools: Read, Bash
model: sonnet
disallowedTools: Write, Edit
---
Run the test suite. Report PASS/FAIL plus the smallest fix list.
Match the model to the job. Use haiku for search and high-volume sub-tasks, sonnet for most production work, and opus for long-horizon reasoning. The model frontmatter field defaults to inherit.

Pattern 3: Find → fix → test loops

An autonomous loop repeats a step until a quality gate passes. The cleanest gate in Claude Code is a Stop hook: it fires when Claude finishes and can refuse to let it stop until tests are green.

{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "npm test --silent || { echo 'TESTS FAILED' >&2; exit 2; }"
      }]
    }]
  }
}

Have the hook script exit 2 on failure: that blocks the stop and feeds stderr back to Claude as a to-do, so it keeps fixing. Exit 0 lets it finish.

Always guard against infinite loops. Check the stop_hook_active field in the hook input and exit 0 when it is true, or your loop never ends.

Headless loops for scripts and CI

Drive Claude non-interactively with -p (print mode). Capture the session ID as JSON, then resume it to chain steps without losing state.

sid=$(claude -p "Find the failing test and its cause" \
        --output-format json | jq -r '.session_id')

claude -p "Fix it, then run the suite until green" \
        --resume "$sid" \
        --permission-mode acceptEdits \
        --allowedTools "Edit,Bash(npm test)"

For reproducible CI runs, add --bare to skip auto-discovery of hooks, skills, MCP, and CLAUDE.md, then load only what you need explicitly. It is the recommended mode for scripted and SDK calls.

--permission-mode dontAsk denies anything not in your allow rules; acceptEdits auto-approves file writes plus safe filesystem commands. Pair with --allowedTools so locked-down loops never hang on a prompt.

Monitoring and backgrounding agents

Long jobs do not have to block you. Press Ctrl+B to background a running task, or set background: true in an agent's frontmatter. Backgrounded agents run concurrently and auto-deny anything that would prompt.

/agents

Opens a tabbed UI. The Running tab shows live subagents; the Library tab creates and edits them.

claude agents

Dispatch and watch parallel background sessions. claude agents --json prints them for scripting.

isolation: worktree

Runs a subagent in a throwaway git worktree so parallel agents never collide on the same files.

memory: project

Gives an agent persistent cross-session memory under .claude/agent-memory/ — the recommended default.

Token discipline for any multi-agent run

  1. Plan in plan mode first (Shift+Tab) so a stable prefix caches once and every execution turn reads from it.
  2. Delegate broad searches to subagents; let only their short summaries enter main context.
  3. Write the plan to a file like PLAN.md so it survives /clear between phases.
  4. /clear between unrelated tasks — fresh context is free.
  5. Check spend with /cost and live usage with /context.
Reserve opus and ultrathink for the orchestrator's planning and synthesis, where reasoning depth pays off. Let cheap worker agents do the grunt work. That is where heavy orchestration actually beats a single agent on both quality and cost.
🛠️
SECTION 16

Claude API & Agent SDK

Build on Claude — the raw Messages API and the Agent SDK that powers Claude Code

⚡ WHY IT MATTERSThe agent loop is just one self-feeding cycle: the model keeps calling tools until it has enough to answer.
The agent loop, under the hoodprompt -> model -> tool_use -> tool_result -> model, until no tool calls remainPrompt+ system + toolsModel (Claude)evaluate & respondtext and/or tool_useRun toolharness executesRead / Edit / Bashtool_resultfed back into contextFinal answerno tool calls -> exittool_useloop back to modelrepeat the turn
A cycle showing Prompt feeding the Model, which emits tool_use to run a tool, whose tool_result loops back to the Model, repeating until the loop exits to the Final answer.

There are two ways to build on Claude: call the model directly with the Messages API, or reuse Claude Code's whole agent loop with the Agent SDK.

Pick the layer that matches your job. The Messages API is one stateless HTTP call where you own everything. The Agent SDK wraps the API in a ready-made agent loop with tools, memory, and MCP. Claude Code is the interactive CLI built on that same SDK.

Messages API

One model call. You write the tool loop, manage history, and control every byte. Best for chat, classification, extraction, RAG.

Agent SDK

The Claude Code agent loop as a library. File tools, Bash, MCP, and subagents work out of the box. Best for headless automation.

Claude Code

The interactive terminal app on top of the SDK. Best for hands-on coding and exploration, not unattended pipelines.

Layer 1 — The Anthropic Messages API

Every request is a POST to /v1/messages at https://api.anthropic.com. It is stateless: you resend the full conversation on each call (up to 100,000 messages). Three headers are required.

HeaderValue
x-api-keyyour API key
anthropic-version2023-06-01
content-typeapplication/json

Current models & list pricing

Model IDContextMax output$/MTok (in / out)
claude-opus-4-81M128k$5 / $25
claude-sonnet-4-61M64k$3 / $15
claude-haiku-4-5200k64k$1 / $5
Batch API is 50% off and cache reads cost roughly 1/10th of base input. Route cheap work to Haiku, hard reasoning to Opus.
Deprecations: claude-sonnet-4-20250514 and claude-opus-4-20250514 retire June 15 2026; claude-opus-4-1 retires Aug 5 2026. Claude 3 Haiku, Sonnet 3.7, and Haiku 3.5 are already retired and return errors. Migrate to Sonnet 4.6 / Opus 4.8.

Minimal Python request

pip install anthropic   # Python 3.9+, reads ANTHROPIC_API_KEY from env

from anthropic import Anthropic

client = Anthropic()
msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a terse senior network engineer.",
    messages=[{"role": "user", "content": "Explain BGP route reflectors in 2 sentences."}],
)
print(msg.content[0].text)

Minimal TypeScript request

npm install @anthropic-ai/sdk   // 4.9+, Node 20+; also Deno/Bun/browser

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY
const msg = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: "You are a terse senior network engineer.",
  messages: [{ role: "user", content: "Explain BGP route reflectors in 2 sentences." }],
});
console.log(msg.content[0].type === "text" && msg.content[0].text);
System prompts use the top-level system parameter (string or array of text blocks). There is NO system role in the messages array. New in Opus 4.8: mid-conversation role: "system" entries are allowed and preserve cache hits when instructions change mid-session.

Tool use (function calling) & the agentic loop

Pass a tools array — each tool has a name, description, and JSON-Schema input_schema. When Claude wants a tool it returns stop_reason: "tool_use". You run the tool and feed the result back as a tool_result block, then loop.

tools = [{
    "name": "get_weather",
    "description": "Current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "Weather in Detroit?"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=1024, tools=tools, messages=messages
    )
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        break
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            out = run_my_tool(block.name, block.input)   # your code
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": out,
            })
    messages.append({"role": "user", "content": results})

Loop while stop_reason == "tool_use"; exit on end_turn, max_tokens, stop_sequence, or refusal. Control calls with tool_choice (auto, any, named tool, or none) and disable_parallel_tool_use: true to force at most one call.

Server-executed tools run inside Anthropic's infra — you only enable them and read results. Example: {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}. Long server loops may return stop_reason: "pause_turn"; just resend the response to continue.

Streaming, vision & PDFs

Set stream: true for SSE. The event flow is message_start then per-block content_block_start / _delta / _stop, then message_delta and message_stop. SDK helpers hide the plumbing.

with client.messages.stream(
    model="claude-sonnet-4-6", max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about subnets."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

All current models accept images via {"type":"image","source":{...}} (base64 with media_type, or a URL source). PDFs go through document blocks and Claude reads both text and visuals.

messages=[{"role": "user", "content": [
    {"type": "image", "source": {"type": "url", "url": "https://example.com/diagram.png"}},
    {"type": "text", "text": "What VLAN topology is this?"},
]}]
SDKs require streaming for large max_tokens to avoid HTTP timeouts on long generations.

Prompt caching — cut repeat cost ~90%

Mark stable, reused content (big system prompts, docs, tool defs) with cache_control. Cache reads cost about 1/10th of base input. Place the breakpoint after the static prefix.

system=[
    {"type": "text", "text": LONG_STYLE_GUIDE,
     "cache_control": {"type": "ephemeral"}},
]

TTL is 5 minutes by default or 1 hour (now GA, no beta header). Minimum cacheable length is 1,024 tokens on Opus 4.8 / Sonnet 4.6 (4,096 on Haiku 4.5). Usage reports cache_creation_input_tokens and cache_read_input_tokens.

Pre-warm the cache by sending max_tokens: 0 — the API writes the cache at the breakpoint and returns an empty content array with stop_reason: "max_tokens" and full usage, generating no output. (Rejected inside Batch requests.)

Batch API — 50% off, async at scale

For non-urgent bulk jobs (evals, backfills, large extractions), submit many requests at once for half price. Poll until done, then read results.

batch = client.messages.batches.create(requests=[
    {"custom_id": "job-1",
     "params": {"model": "claude-haiku-4-5", "max_tokens": 256,
                "messages": [{"role": "user", "content": "Classify: ..."}]}},
])
# poll batch.processing_status until == "ended", then:
for r in client.messages.batches.results(batch.id):
    print(r.custom_id, r.result)
Batches support up to 300k output tokens on Opus 4.8/4.7/4.6 and Sonnet 4.6 with the output-300k-2026-03-24 beta header.

Extended thinking & structured outputs

Turn on deeper reasoning with thinking. Claude 4/4.5 and Sonnet 4.6 use {"type":"enabled","budget_tokens": N} (N must be < max_tokens); Opus 4.6/4.7/4.8 use {"type":"adaptive"} with interleaved thinking auto-enabled. With tool use, only tool_choice auto or none are allowed, and you must pass thinking blocks back.

Structured outputs are GA on Sonnet 4.5 / Opus 4.5 / Haiku 4.5 via output_config.format (no beta header) when you need guaranteed JSON shapes.

Official SDKs also cover Java 8+, Go 1.23+, Ruby 3.2+, C#, and PHP 8.1+, plus an ant CLI. Companion endpoints: Models API, Token Counting API, Files API, and the Admin API.

Layer 2 — The Claude Agent SDK

The Agent SDK (formerly the Claude Code SDK) runs the exact same agent loop that powers Claude Code, inside your process. It handles tool execution, context management, retries, and prompt caching so you don't write the tool loop yourself.

# Python (3.10+)
pip install claude-agent-sdk

# TypeScript — bundles a native Claude Code binary, no CLI install needed
npm install @anthropic-ai/claude-agent-sdk

Tiny Python agent

import anyio
from claude_agent_sdk import query

async def main():
    async for message in query(
        prompt="Find every TODO in src/ and summarize them."
    ):
        print(message)

anyio.run(main)

Python has two entry points: query() creates a fresh session per call and returns an async iterator; ClaudeSDKClient reuses one session across exchanges with manual lifecycle (connect / query / receive_response / interrupt / disconnect) and supports interrupts.

Tiny TypeScript agent

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Find every TODO in src/ and summarize them.",
})) {
  console.log(message);
}

In TypeScript, query() returns an async generator (the Query object) that maintains the session and exposes interrupt(), setPermissionMode(), setModel(), setMcpServers(), and close().

Built-in tools — nothing to implement

Files

Read, Write, Edit, Glob, Grep

Shell & web

Bash, WebSearch, WebFetch, Monitor

Orchestration

Agent (subagents), Skill, TaskCreate/TaskUpdate, ToolSearch, AskUserQuestion

Read-only tools (Read, Glob, Grep, read-only MCP) can run concurrently; state-mutating tools (Edit, Write, Bash) run sequentially.

Permission modes — the safety dial

ModeBehavior
defaultUnmatched tools trigger the canUseTool callback
acceptEditsAuto-approves file edits + filesystem commands (mkdir/rm/mv/cp)
planRead-only tools only
dontAskDenies anything not pre-approved (never calls the callback) — for CI
bypassPermissionsApproves all, but deny rules and hooks still apply
Evaluation order: hooks first, then deny rules (a deny blocks even in bypassPermissions), then the mode, then allow rules, then canUseTool. allowed_tools only pre-approves — it does NOT restrict Claude to those tools.

Custom tools, MCP & subagents

Define in-process tools with the @tool decorator (Python) or tool() helper (TypeScript, Zod schema), bundle with create_sdk_mcp_server() / createSdkMcpServer(), and pass via mcpServers. Return isError: true to signal failure without halting the loop; an uncaught exception stops it.

from claude_agent_sdk import tool, create_sdk_mcp_server

@tool("add", "Add two numbers", {"a": float, "b": float})
async def add(args):
    return {"content": [{"type": "text", "text": str(args["a"] + args["b"])}]}

server = create_sdk_mcp_server(name="math", tools=[add])
# pass server via the mcp_servers option of query()/ClaudeSDKClient

External MCP servers go through mcpServers/mcp_servers or a .mcp.json file, over stdio, HTTP/SSE, or in-process. Tool names follow mcp__{server}__{tool}; auto-approve with wildcards like mcp__github__*.

Subagents run in a fresh, isolated context and return only their final answer to the parent — keeping the main context lean. Define them via the agents option, as markdown in .claude/agents/, or use general-purpose. They are invoked via the Agent tool (renamed from Task in v2.1.63), so add Agent to allowedTools to auto-approve. Subagents cannot spawn their own subagents.

Hooks, sessions & cost caps

Hooks fire at lifecycle points (PreToolUse, PostToolUse, UserPromptSubmit, Stop, SubagentStop, PreCompact, and more). A PreToolUse hook can return permissionDecision allow/deny/ask/defer and even rewrite the input.

Capture ResultMessage.session_id to resume later via resume. The SDK also supports forking sessions, file checkpointing (rewind files to a prior message), reasoning effort (low→max), and guardrails max_turns and max_budget_usd.

Headless / CI automation

For pipelines, drive Claude Code non-interactively with claude -p.

claude -p "run the test suite and fix any failures" \
  --allowedTools "Bash,Read,Edit" \
  --permission-mode acceptEdits \
  --output-format json
Use --bare for reproducible CI: it skips auto-discovery of hooks, skills, plugins, MCP servers, auto-memory, and CLAUDE.md (auth must come from ANTHROPIC_API_KEY). Anthropic says --bare will become the default for -p in a future release.
Starting June 15 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive limits.

Which layer should I use?

You want to…Use
Add chat, classify/extract/summarize text, build RAGMessages API
Run a model with full control over the tool loop and historyMessages API
Ship an autonomous agent that reads/writes files, runs Bash, uses MCP, spawns subagentsAgent SDK
Automate coding tasks headless in CIAgent SDK / claude -p
Code hands-on in the terminal, explore a repo interactivelyClaude Code CLI
Rule of thumb: if you'd otherwise hand-write a tool loop with file access, retries, and context management, the Agent SDK already did it. If you just need the model's output, the Messages API is leaner. Full docs: docs.claude.com and code.claude.com/docs/en/agent-sdk/.
🎯
SECTION 17

Prompting for Best Results

Precise input, powerful output — stop guessing, start steering

Claude can infer intent, but it can't read your mind — the more precise your instructions, the fewer corrections you'll need.

Great results come from clear inputs. Tell Claude what to do, give it context, show it an example, and set the rules for "done." Then iterate in small steps and fix course early.

Anthropic's core rule: "The more precise your instructions, the fewer corrections you'll need." Vague asks waste tokens on guessing; specific asks land on the first try.

The seven habits of a strong prompt

Be specific

Name the exact task, file, and outcome. Replace "fix the bug" with the symptom, the file, and the expected behavior.

Give context

Say why, who it's for, and what came before. Context shapes the right answer; missing context invites wrong guesses.

Show an example

Point to a pattern Claude should copy: "follow the style in userService.ts."

State constraints

List the rules: no new dependencies, keep it under 50 lines, must pass pytest.

Define "done"

Give acceptance criteria so Claude can self-check before stopping.

Plan first

Ask for a plan before any edits. Review it, then say go. Cheaper to fix a plan than a diff.

Iterate small

One scoped change at a time beats one giant request. Smaller steps are easier to verify and correct.

Before and after: vague vs strong

The same goal, written two ways. The strong version names the file, the rule, and the test that proves success.

Vague (forces guessing)

fix the login bug

Strong (one-shot ready)

In @src/auth/login.ts, users with expired sessions get a
500 instead of a 401 redirect to /login.

Reproduce: call loginHandler() with an expired token.
Constraint: don't change the token schema; reuse the
existing redirectToLogin() helper.
Done when: the expired-token test in auth.test.ts passes
and `npm test auth` is green.
Notice the four parts: where (file), what's wrong (symptom), the rules (constraints), and the proof (acceptance test). Most strong prompts hit all four.

Point at files with @

Typing @ opens a file picker so Claude reads the exact file — no broad searching that floods context. Reference the real path instead of describing the code.

Refactor @src/api/users.py to use the repository pattern
in @src/api/orders.py as the reference. Keep public
function signatures unchanged.
File reads dominate context usage. Pointing Claude at one or two specific files keeps the window lean and the cache stable, so follow-up turns stay fast and cheap.

Ask for a plan first

Plan Mode is the highest-leverage habit. Toggle it with Shift+Tab (cycles normal / plan / auto-accept). In plan mode Claude can read, search, and think — but it literally cannot edit files until you approve.

  1. Enter plan mode with Shift+Tab or launch with claude --permission-mode plan.
  2. Let Claude explore the relevant files and propose a plan.
  3. Read the plan, correct the approach, then approve it.
  4. Claude executes against a stable, cached context.
Skip plan mode for one-sentence diffs — typos, log lines, renames. Use it for anything multi-file or risky.

Dial reasoning with thinking keywords

Extended thinking is on by default. Escalate the budget with keywords when a problem is genuinely hard.

PhraseUse for
thinkModerately tricky logic or a small refactor
think hard / think harderMulti-file changes, non-obvious bugs
ultrathinkArchitecture, hard debugging, costly decisions
More thinking improves reasoning depth, not rule compliance. Ultrathink can reason for hundreds of tokens and still break an explicit output rule — so still state your constraints plainly.

Correct course early

The moment an answer drifts, stop it. Don't let a wrong approach pile up in context.

Stop it

Press Esc to interrupt mid-action with context preserved, then redirect.

Erase the failure

Press Esc Esc (or /rewind) to undo a bad attempt. Rewinding deletes it from context entirely — cleaner than correcting on top of it.

Reset after two misses

After two failed corrections on the same issue, run /clear and re-prompt with a sharper ask. Fresh context is free.

Correcting leaves the failed approach polluting the window; rewinding removes it. Prefer rewind over "no, try again" when an attempt goes off the rails.

Do / Don't grid

DoDon't
Name the exact file: @src/db/pool.tsSay "the database file"
State acceptance criteria ("done when tests pass")Leave "done" undefined
Ask for a plan before editsLet Claude rewrite five files unprompted
Point to a pattern to copyAssume Claude knows your house style
Scope investigations narrowlySay "investigate the codebase" (floods context)
Iterate one change at a timeBundle ten unrelated asks in one prompt
/clear between unrelated tasksPile new tasks onto stale context
Rewind a failed attemptKeep correcting on top of a broken approach

Delegate research to keep context clean

Unscoped "investigate" requests cause endless exploration that floods your main window. Hand research-heavy work to a subagent that runs in its own fresh context and returns only a short summary.

Use the Explore subagent to find every place we call the
Stripe API, then report back a one-paragraph summary with
file paths. Don't edit anything.
Save reusable plans and decisions to a file like PLAN.md. It survives /compact and /clear, so a fresh session can pick up exactly where you left off.
💬
SECTION 18

Claude.ai: Chat, Projects & Artifacts

Turn the chat box into a coworking studio with persistent knowledge and live artifacts

The Claude.ai app is more than a chat box: Projects give Claude a memory for your work, Artifacts give it a canvas, and styles give it your voice.

Most people just type into the chat and start over every time. Power users wire up Projects, Artifacts, and custom instructions so Claude shows up already knowing the context. This section shows how those pieces snap together into a real coworking setup.

Chat

The conversation. Where you ask, refine, and steer.

Projects

A workspace with its own chat history, knowledge base, and instructions.

Artifacts

A live side window rendering code, docs, and apps next to the chat.

Styles

Reusable voice and formatting rules applied to Claude's output.

Projects: a memory for a body of work

A Project is a self-contained workspace. It keeps its own chat history plus a knowledge base of files you upload once and reuse in every chat inside it. Projects are on every plan, including Free, which is capped at 5 projects.

The two pillars are project knowledge (uploaded docs, text, and code) and project instructions (custom instructions that set tone and role for every chat in that project).

Use one Project per ongoing effort, not per question. A "Network Architecture" project with your diagrams, configs, and a role instruction means every new chat starts already grounded, with no re-pasting.

When knowledge gets big: RAG

On paid plans (Pro, Max, Team, Enterprise), when your knowledge base nears the context limit, Claude automatically switches to Retrieval Augmented Generation. RAG expands capacity up to 10x by pulling only the relevant chunks per question instead of stuffing everything into context.

A claude.ai Project (lives in your account, holds a cloud knowledge base, shareable on Team/Enterprise) is NOT the same as a Cowork project in Claude Desktop (lives only on your computer, points at local folders). You can link a claude.ai Project into a Cowork project without merging them.

Project instructions: set the role once

Project instructions are the single highest-leverage Project setting. They tell Claude who it is and how to answer for the whole project, so you stop repeating yourself.

You are my senior network architecture reviewer.
Audience: enterprise infra teams (Juniper, Arista, Palo Alto).
Always: cite the config file by name, flag security risks first,
and give a one-line summary before the detail.
Never invent interface names — ask if a config is missing.

Styles: control the voice

Styles change how Claude writes, independent of any Project. Open the Search and tools menu and pick Use style. Four presets ship in: Normal Concise Formal Explanatory. Normal and Concise are always visible and can't be hidden.

Custom styles are where it gets useful. You can build one two ways:

  1. Upload or paste a writing sample (pdf, doc, or txt) so Claude matches your voice.
  2. Or pick Describe style for a starting point, then use Use custom instructions (advanced) to give exact rules.

Styles can be renamed, previewed, edited (by Claude or via Set Instructions Manually), and deleted.

Paste a few of your best LinkedIn posts or past emails into a custom style. Claude then drafts in your real tone instead of generic AI prose — far less rewriting on your end.

Artifacts: the coworking canvas

Artifacts are a dedicated side window for standalone content: code, documents, data visualizations, and full apps that render live beside the chat. They're generally available on every plan and on iOS and Android, with their own Artifacts space in the sidebar.

The workflow is conversational. You ask, the artifact appears, you point at what's wrong, and Claude edits it in place — no copy-paste loop.

Live preview

HTML, React, and visualizations render instantly so you see results, not just code.

Iterate in place

"Make the header sticky" edits the same artifact instead of restating everything.

AI-powered apps

Embed Claude into a shareable app via a text-based API — no API keys needed.

MCP + storage

On paid plans, artifacts read/write tools like Slack and persist data.

AI-powered Artifacts

You can build a shareable app with Claude intelligence baked in. It runs on Anthropic's infrastructure, needs no API keys, and — importantly — usage counts against each end-user's own subscription, not yours. So you can share a tool without paying for everyone's calls.

On Pro, Max, Team, and Enterprise (web and desktop), Artifacts also support MCP integration and persistent storage (personal or shared). That lets an artifact read and write to external tools like Asana, Google Calendar, Slack, and custom MCP servers.

Sharing differs by plan. Free and Pro users can publish and remix Artifacts with the community. Team and Enterprise users share Projects (and the Artifacts inside them) in a secure shared workspace.

File uploads and file creation

Drop files straight into a chat for analysis: CSV, TSV, PDFs, images, and more. Multiple files work in one chat and stay downloadable for the whole conversation.

Claude can also create files. On all plans (web, Desktop, Mobile), it generates downloadable .xlsx, .pptx, .docx, and PDF files, plus PNG charts and Python data analysis. Turn it on at Settings > Capabilities > Code execution and file creation.

In a Project, your knowledge-base files are now reachable from Claude's code-execution environment while staying in context, and outputs can save directly to Google Drive. Upload a raw CSV, ask for a cleaned pivot, and get a real .xlsx back.

Web search and Research

Web search must be turned on by you (toggle it in chat settings/tools). Once on, Claude decides when a prompt needs fresh data, searches, and answers with citations to its sources.

Research mode is a paid-only (Pro/Max/Team/Enterprise) agentic step up. It runs many progressive searches across the web and connected internal sources like Gmail, Google Calendar, and Google Docs. It needs web search on, and you enable it from the + button > Research.

Web search is off by default and Claude won't browse without it. If you get a "my knowledge has a cutoff" answer to a current-events question, you forgot to enable the toggle.

Putting it together: a coworking session

The real power is the combination. Here's a typical flow that uses all four surfaces at once.

  1. Create a Project, upload your reference docs, and write a role instruction.
  2. Pick (or build) a custom style so output lands in your voice.
  3. Open a chat, turn on web search, and ask Claude to draft something.
  4. Claude renders the draft as an Artifact you can preview live.
  5. Refine by pointing at the artifact; export it as .docx or .pptx when done.
SurfaceHoldsBest for
ProjectKnowledge + instructionsPersistent context for one effort
ChatThe conversationAsking and steering
ArtifactRendered outputLive code, docs, and apps
StyleVoice rulesConsistent tone everywhere
Treat Project instructions like a tiny CLAUDE.md for the web app: keep them short and high-signal. A focused 5-line instruction beats a wall of text Claude has to wade through every chat.
💰
SECTION 19

Cost & Speed Optimization

Pay Haiku prices for Opus-quality work

The cheapest token is the one you never send — route smart, cache big, and keep context lean.

Two levers control your bill: which model runs the task, and how many tokens it reads each turn. Master both and you cut cost 5–10x without losing quality. The single rule under everything: the context window fills up fast, and quality drops as it fills.

Pick the right model for the job

Opus 4.8 costs 5x more per token than Haiku 4.5. Using Opus to fix a typo is like renting a crane to hang a picture. Match the model to the difficulty, not your mood.

ModelIn / Out (per MTok)SpeedUse it for
Haiku 4.5$1 / $5FastestHigh-volume, simple edits, search, subagents, real-time
Sonnet 4.6$3 / $15FastMost production work: code gen, analysis, tool use
Opus 4.8$5 / $25ModerateHard reasoning, big refactors, long-horizon agentic runs
Anthropic's own rule of thumb: start with Haiku 4.5 and upgrade only when you hit a real capability gap. Don't reach for Opus by default.

Switch the active model mid-session with /model. Use /model default to let Claude auto-route harder turns to Opus for you.

/model haiku      # cheap, fast turns
/model sonnet     # the workhorse
/model opus       # bring out the heavy reasoning
/model default    # let Claude pick per task

Route from the command line

Set the model per run for scripts and one-offs. A headless triage pass on Haiku, then escalate only the hard cases.

# Cheap, scripted lint check on Haiku
cat build.log | claude -p "summarize the errors" --model haiku

# Escalate one tough bug to Opus
claude --model opus "think hard about the race condition in worker.py"
--model overrides both the model setting and the ANTHROPIC_MODEL env var. Add --fallback-model so jobs still run if your first choice is unavailable.

Push cheap, verbose work to subagents

A subagent on Haiku can churn through a noisy codebase search and hand back a short summary — the verbose output never pollutes your main (pricier) context. Set model: haiku in the agent frontmatter.

---
name: log-scanner
description: Use proactively to grep logs and return a short summary
tools: Read, Grep, Glob, Bash
model: haiku
---
Search the logs, extract only the failing requests, and report a 5-line summary.

Prompt caching: pay once for stable context

Caching is automatic in Claude Code — you never add cache_control yourself. The API caches by exact prefix match, so a cache hit costs just 0.1x base input (90% cheaper). The catch: any change anywhere in the prefix recomputes everything after it.

OperationCost vs base input
Cache hit (read)0.1x — 90% off
5-minute cache write1.25x
1-hour cache write2x
Cache killers: switching models mid-task, editing CLAUDE.md mid-session, and /compact. Each invalidates the cache and forces a slow, expensive uncached turn. Do these at boundaries, not in the middle of a flow.
Plan mode is the secret to 90%+ cache hit rates. Front-load all your file reads and constraints once (the stable prefix), then every execution turn reads from that cached prefix. Toggle with Shift+Tab and explore before you edit.

Keep context lean

File reads dominate context, and a bloated window both costs more and degrades quality. Fresh context is free — use it.

/clear between tasks

Wipes history entirely. Use it whenever you switch to unrelated work so old tokens stop riding along on every turn.

/compact at breakpoints

Summarizes history into a compact form. Steer it: /compact focus on the API changes. Do it proactively around 50–75%, not at the auto-trigger.

Scope your @ files

Reference exact files and line ranges, not whole directories. Prefer rg/grep over dumping files into the chat.

/context to audit

Shows a live token breakdown by category so you can see what's eating the window before it bites.

Rewinding beats correcting. EscEsc (or /rewind) erases a failed attempt from context entirely, while correcting leaves the dead-end approach polluting every later turn.

Be specific to avoid re-work

Vague prompts cause endless exploration that floods context and burns tokens. "Investigate the auth bug" can spiral; a scoped request lands in one pass.

# Costly: unscoped, invites infinite exploration
claude "investigate why login is slow"

# Lean: names the file, states the constraint
claude "in src/auth/session.ts, find why validateToken() does a DB call per request; propose a cache"

Batch and reuse work

Stop paying startup costs over and over. Reuse sessions in scripts instead of cold-starting each time, and let the cache stay warm across follow-ups.

# Capture a session id, then continue it cheaply
session_id=$(claude -p "Start a review of the diff" --output-format json | jq -r '.session_id')
claude -p "Now check for type errors" --resume "$session_id"
For the API and async work, the Batch API runs jobs at 50% off (e.g. Sonnet 4.6 drops to $1.50 in / $7.50 out per MTok). Ideal for large, non-interactive jobs you don't need answered in real time.

Trim CLAUDE.md ruthlessly

CLAUDE.md loads on every session and re-loads after every compaction — every line is a recurring tax. For each line ask: "Would removing this cause Claude to make mistakes?" If not, cut it. A bloated file also makes Claude ignore the rules that matter.

Watch the meter

You can't optimize what you don't measure. Check spend and usage as you go.

/cost       # token usage + estimated cost this session
/context    # live token breakdown by category
/mcp        # per-server tool cost — prune noisy servers
Reserve ultrathink for genuinely hard, costly-to-get-wrong decisions (architecture, multi-file debugging). For everyday work, plain think or think hard gives the depth you need without the token bill.

Your cost-saving habits, in order

  1. Default to Haiku or Sonnet; reach for Opus only on hard reasoning.
  2. Plan first (Shift+Tab) to write one stable, cached prefix.
  3. Avoid mid-task model switches and CLAUDE.md edits — they nuke the cache.
  4. /clear between unrelated tasks; /compact at logical breakpoints.
  5. Scope prompts and @ files; rewind failures instead of correcting them.
  6. Reuse sessions in scripts; batch non-urgent jobs at 50% off.
  7. Check /cost and /context to catch waste early.
🔒
SECTION 20

Security Hardening

Give Claude autonomy without giving it the keys to your machine

⚡ WHY IT MATTERSLeast privilege means every tool call must pass a deny-first gate before it can ever touch your system.
Permission gating sits between the model's tool request and executionTool requestmodel proposes a callPermissioncheckdeny rules checked FIRST (deny-first precedence)order: deny → ask → allow · first match winsAllowrun, no promptAskprompt userDenyblock · absoluteA deny at ANY scope overrides an allow at any other scope — enforced by the harness, not the model.
A tool request enters the permission check, where deny rules are evaluated first, then routed to Allow (run), Ask (prompt), or Deny (block).

Claude Code is powerful enough to edit files, run shells, and call external servers — so the question isn't whether to trust it, but how to scope that trust.

Security in Claude Code is enforced by the harness itself, not by the model. A prompt or CLAUDE.md can never widen what Claude is allowed to do — only your settings, permission modes, and hooks can. That separation is the whole game.

Key principle: permission rules are checked by Claude Code before any tool runs. Instructions inside the conversation cannot override them.

The permission model

Every tool call is matched against three rule types. They are evaluated deny → ask → allow, and the first match wins, so a deny always beats an allow at any other level.

RuleEffect
allowRun the tool with no prompt
askPrompt for confirmation each time
denyBlock it outright — always wins

Rules use the format Tool or Tool(specifier). Manage them interactively with /permissions or write them into settings.json.

{
  "permissions": {
    "allow": [
      "Read(src/**)",
      "Bash(npm test)",
      "Bash(npm run lint)"
    ],
    "ask": [
      "Bash(git push:*)"
    ],
    "deny": [
      "Read(.env)",
      "Read(./secrets/**)",
      "Bash(curl:*)"
    ]
  }
}
A bare-name deny like "Bash" removes the tool from Claude entirely. A scoped deny like "Bash(rm *)" only blocks matching calls. Choose the narrower one that fits the risk.

Settings precedence

When rules conflict, this order decides (highest to lowest). A deny at any level cannot be overridden by an allow at any other level.

  1. Managed (admin) settings — not even CLI flags override these
  2. Command-line arguments
  3. Local project settings (.claude/settings.local.json)
  4. Shared project settings (.claude/settings.json)
  5. User settings (~/.claude/settings.json)

Permission modes

Modes set the default posture for unmatched tools. Cycle them with Shift+Tab or set permissions.defaultMode.

default

Reads freely, prompts on first use of anything that writes or runs.

plan

Read-only exploration. Enforced at the tool level — Claude literally cannot edit or run destructive commands.

acceptEdits

Auto-accepts file edits plus filesystem commands (mkdir, touch, rm, mv, cp, sed) in the working dir.

dontAsk

Auto-denies anything not pre-approved. Built for non-interactive CI.

auto

Auto-approves with a background safety classifier. Research preview — treat as experimental.

bypassPermissions

Skips all prompts. Only safe inside an isolated environment.

Start every non-trivial task in plan mode. Let Claude explore and propose a plan with zero write access, then switch out to implement.

The truth about --dangerously-skip-permissions

This flag (and the bypassPermissions mode) removes per-action review completely and skips protected-path checks. The only thing left is a circuit breaker that still prompts on rm -rf / and rm -rf ~ — and it offers no protection against prompt injection.

Never run --dangerously-skip-permissions on your host machine for convenience. It is designed for isolated, no-host-damage environments only — containers, VMs, or dev containers without internet access.

The flag is blocked when running as root or via sudo on Linux/macOS (that check is skipped inside a recognized sandbox). The safer move is almost always a scoped allowlist instead.

# Risky: blanket bypass on your laptop
claude --dangerously-skip-permissions   # don't do this on the host

# Safer: pre-approve exactly the tools you need
claude -p "run the test suite and fix failures" \
  --allowedTools "Read" "Edit" "Bash(npm test)" "Bash(npm run lint)"
Admins can disable the escape hatch entirely with permissions.disableBypassPermissionsMode: "disable" (and disableAutoMode for auto) in managed settings.

Protected paths

In every mode except bypassPermissions, Claude Code refuses to auto-approve edits to sensitive locations. You don't configure these — they're built in.

Protected dirsProtected files
.git, .vscode, .idea, .devcontainer, .husky, .cargo, .claude.bashrc, .zshrc, .profile, .envrc, .npmrc, .gitconfig, .mcp.json, .claude.json

Sandboxing risky autonomy

The sandboxed Bash tool gives OS-level filesystem and network isolation — Seatbelt on macOS, bubblewrap on Linux/WSL2. It covers Bash commands and their children only (not Read, Edit, WebFetch, or MCP). Turn it on with /sandbox or sandbox.enabled: true.

The sandbox is the only thing that enforces path limits for all processes. Read/Edit deny rules only cover Claude's built-in tools and recognized commands like cat or sed — not an arbitrary script that opens a file itself.
{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "denyRead": ["~/.aws", "~/.ssh", "~/.config/gcloud"],
      "denyWrite": ["~/"]
    },
    "network": {
      "allowedDomains": ["api.anthropic.com", "registry.npmjs.org"]
    }
  }
}

The default read policy still allows credential dirs, so Anthropic recommends adding denyRead for ~/.aws and ~/.ssh yourself. For org-wide enforcement, managed settings can set failIfUnavailable: true (refuse to start without a sandbox) and allowUnsandboxedCommands: false (kill the dangerouslyDisableSandbox escape hatch).

Whole-process isolation

For maximum autonomy, two options wrap more than just Bash:

Dev container

The claude-code repo ships an example .devcontainer/ with a default-deny iptables firewall, a non-root user, and no unapproved egress — a starting point for unattended runs. It's a convention, not a boundary Claude enforces.

sandbox-runtime

@anthropic-ai/sandbox-runtime (beta) wraps the entire process in Seatbelt/bubblewrap so every tool, hook, and MCP server is constrained — denying all write and network by default.

Secret handling

Claude Code stores its own API keys encrypted via credential management, and blocks web-fetching commands like curl and wget by default. Your job is to keep your project's secrets out of reach.

  1. Never hardcode keys — read them from environment variables (ANTHROPIC_API_KEY, etc.).
  2. Add secret files to .gitignore and add Read deny rules for them.
  3. Run a pre-commit secret scan as a hard gate before any commit.
# .gitignore
.env
.env.*
secrets/
*.pem
Deny reads on secret files in settings, but remember: only the sandbox stops an arbitrary script from reading them. For high-autonomy runs, combine deny rules with a sandbox denyRead.

Trusting MCP servers

MCP servers are separate processes that run unconstrained on your host unless the whole Claude Code process is sandboxed. A server you add can read your data and run code — treat adding one like installing a dependency with full machine access.

Only add MCP servers you trust or that Anthropic has reviewed. The allowed-server list is checked into source control so your team can audit it.

First-time codebase runs and new MCP servers require trust verification. Note that verification is disabled in non-interactive -p mode (except --worktree, which still requires accepted trust) — so don't pipe an untrusted repo straight into headless mode.

When you do auto-approve MCP tools, scope them with wildcards rather than reaching for bypassPermissions. Note that acceptEdits does not auto-approve MCP tools at all.

{
  "permissions": {
    "allow": ["mcp__github__*"],
    "deny": ["mcp__github__delete_repository"]
  }
}

Hooks as guardrails

Hooks are your programmable safety net. A PreToolUse hook runs before the permission prompt and can deny a call, force a prompt, or skip it. Across the evaluation flow, hooks run first — even before deny rules — making them the strongest enforcement point you control.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | grep -qE '(^|/)(\\.env|secrets/)' && { echo 'Blocked: protected path' >&2; exit 2; } || exit 0"
          }
        ]
      }
    ]
  }
}

An exit code of 2 tells Claude Code to block the call and feed the message back to Claude. Use the same pattern to scan a diff for secrets before a commit, or to protect any path your team treats as off-limits.

A ConfigChange hook can audit or block changes to settings during a session — so Claude can't quietly loosen its own policy mid-task.

Hardening checklist

  1. Default to plan mode for unfamiliar work; switch to edit only after reviewing the plan.
  2. Maintain a least-privilege allow list — specific tools and commands, not blanket grants.
  3. Add deny rules for secret files, credential dirs, and dangerous commands (curl, rm *).
  4. Never run --dangerously-skip-permissions on your host — only inside a container, VM, or dev container.
  5. Enable the sandbox (sandbox.enabled: true) with denyRead for ~/.aws and ~/.ssh on autonomous runs.
  6. Keep secrets in env vars; .gitignore them; gate commits with a secret scan.
  7. Add only trusted MCP servers; scope tool approval with wildcards, never bypassPermissions.
  8. Use PreToolUse hooks to block protected paths and scan diffs before commit.
  9. Run a fresh-context /security-review on the diff before treating work as done.
Layer these. No single control is complete — deny rules, sandbox, hooks, and isolation each cover a gap the others leave open.
💻
SECTION 21

Best Practices: Writing Code

Plan, scope, test, verify — then trust.

Great code from Claude is not luck — it comes from a tight loop: plan first, change small, verify always, commit clean.

The goal is not code that runs. It is code that is correct, readable, and maintainable. Every habit below points at that.

Anthropic's #1 rule: give Claude a check it can run itself (tests, build, linter, a screenshot). If you adopt only one practice, make it verification.

The Core Loop

Use the same four-phase workflow Anthropic recommends. It keeps each change small enough to review and reversible if it goes wrong.

  1. Explore in plan mode — read-only, no edits. Claude reads the code and asks questions.
  2. Get a written plan — ask for a step-by-step approach before any code.
  3. Implement & verify — leave plan mode, build it, run the checks against the plan.
  4. Review the diff & commit — read every change, then commit with a clear message.

1. Ask for a Plan First

Plan mode is enforced at the tool level, not just a polite request. In plan mode Claude can only read, search, and think — it cannot edit files or run destructive commands. Enter it with Shift+Tab to cycle modes, or the /plan command.

# Start a session locked to read-only planning
claude --permission-mode plan

When to skip it: Anthropic's rule of thumb is simple. If you could describe the diff in one sentence (a typo, a log line, a rename), skip the plan. Plan when the approach is uncertain, the change spans multiple files, or the code is unfamiliar.

A plan you can read is a plan you can correct before it costs you a bad diff. Persist big plans to a PLAN.md file so they survive context compaction.

2. Point Claude at the Right Files

Don't make Claude guess where code lives. Reference exact files with @ so it loads only what matters and keeps context lean.

# Mention specific files and a line range
Review @src/auth/login.ts#L40-95 and the test in
@src/auth/login.test.ts. The session cookie isn't
being set on success. Find the root cause.

In the IDE, Option+K (VS Code) or Cmd+Option+K (JetBrains) inserts an @-mention for the current selection with its path and line range automatically.

Asking Claude to "investigate the codebase" with no scope makes it read hundreds of files and flood the context window. Scope it, or delegate broad exploration to a subagent that returns only a summary.

3. Make Small, Testable Changes

The Common Workflows guidance is explicit: do refactoring in small, testable increments, and keep backward compatibility when it matters. Small diffs are easy to read, easy to test, and easy to revert.

Give yourself a safety net with git. Have Claude branch per feature so a bad turn is throwaway.

# Discardable safety net before a risky change
git checkout -b fix/session-cookie

4. Test-Driven Development

Claude defaults to writing implementation first — you have to invert that on purpose. Tell it to write failing tests first, give concrete examples, and run the tests after implementing.

Write failing tests FIRST for isValidEmail(), then implement.
Cases: "user@example.com" -> true, "nope" -> false,
"" -> false, "a@b.c" -> true.
Run the tests after implementing and show me the output.
After Claude fixes a mistake, ask it to update CLAUDE.md with a rule so it won't repeat it. Claude is good at writing rules for itself.

5. Anchor Conventions in CLAUDE.md

CLAUDE.md is read at the start of every conversation, before anything else. It is the most reliable channel for your project's conventions — build commands, test runner, code-style rules, and gotchas. Generate a starter with /init, then trim it.

IncludeExclude
Bash commands Claude can't guessAnything inferable from the code
Non-default code-style rulesStandard language conventions
Test runner & how to run itFull API docs (link instead)
Architecture decisions, env quirksSelf-evident advice
Keep it short — aim under ~200 lines. A bloated CLAUDE.md makes Claude ignore real instructions. If a rule keeps getting ignored, the file is probably too long and the rule is getting lost. For each line ask: "Would removing this cause a mistake?"
# CLAUDE.md (excerpt)
## Commands
- Test:  pytest -q
- Lint:  ruff check . && mypy src
- Build: make build

## Style (YOU MUST)
- Type hints on every function signature
- Immutable data — never mutate in place
- Async/await for all IO

6. Review the Diff Before You Trust It

This is where correctness is won or lost. Read the diff — don't just accept it. Watch for the "trust-then-verify gap": plausible-looking code that misses edge cases.

# Open the interactive diff viewer
/diff

For a second opinion, run the bundled review skills. They review the current diff in a fresh subagent that sees only the change and the criteria — not Claude's own reasoning.

/code-review

Correctness pass on the diff in a clean context. Add --fix to apply findings.

/security-review

Security-focused pass for auth, input handling, and secrets.

/simplify

Cleanup-only review — reuse, simplification, lower altitude. No bug hunting.

7. Run It, Then Commit

Code that compiles is not code that works. Run the build and the test suite, address the root cause of any failure (don't suppress the error), then commit.

pytest -q && ruff check . && git add -A
git commit -m "fix: set session cookie on successful login"
Anthropic's stated rule for the trust gap: "If you can't verify it, don't ship it."

Do / Don't Grid

Do: plan complex work

Read-only plan mode before multi-file or unfamiliar changes.

Don't: one-shot a big feature

Large unscoped diffs are unreviewable and unrevertable.

Do: give a runnable check

Tests, build exit code, linter, or a screenshot close the loop.

Don't: trust on sight

Plausible code hides edge-case bugs. Read the diff, run it.

Do: point with @

Name exact files and line ranges to keep context lean.

Don't: say "investigate"

Unscoped exploration floods context and degrades quality.

Do: /clear between tasks

Reset context for unrelated work; one scope per session.

Don't: bloat CLAUDE.md

Long files bury the rules Claude actually needs.

A Tight Example Loop

Putting it together on a real bug fix — small, verified, committed:

  1. Shift+Tab into plan mode: Bug: @api/cart.py#L20-60 double-charges on retry. Diagnose only.
  2. Approve the plan, then leave plan mode and ask for a failing test that reproduces the double charge.
  3. Let Claude implement the fix and run pytest -q until the new test passes.
  4. Run /code-review on the diff, then /diff and read it yourself.
  5. Run the full suite + linter, then commit: fix: make checkout idempotent on retry.
Keep loops tight. When context fills, run /context to see usage and /compact focus on the API changes to summarize while keeping file paths, decisions, and test commands.
📝
SECTION 22

Best Practices: Docs & Writing

Turn Claude into your fastest, most consistent co-writer

Claude writes its best docs when you hand it a role, an audience, a goal, hard constraints, and a sample of the voice you want.

Vague prompts produce vague prose. The fix is a simple frame you reuse for every writing task: who Claude is, who it's writing for, what success looks like, and the rules it must follow. Add your own source material and a voice sample, and quality jumps immediately.

The five-part writing prompt

Every strong writing request answers five questions. Skip one and Claude guesses, which is where bland output comes from. Keep them explicit and up front.

Role

"You are a staff engineer writing internal docs." Sets vocabulary and depth.

Audience

"For new hires with no Kubernetes background." Controls jargon and assumptions.

Goal

"Reader can deploy the service in 10 minutes." Gives a measurable target.

Constraints

Length, tone, format: "Under 400 words, plain, Markdown with headers."

Source + voice

Paste real notes and a paragraph in the voice you want copied.

Give Claude a voice sample, not a voice adjective. "Match the tone of the paragraph below" beats "write professionally" every time, because Claude can imitate concrete examples far better than abstract labels.

Lock a consistent style

If you write the same kinds of docs often, stop re-pasting style rules. Save them once so every session starts from your house style.

In the Claude app: Projects + custom instructions

Create a Project, drop your style guide and example docs into its knowledge, and put the rules in the Project's custom instructions. Every chat in that Project inherits them.

In Claude Code: CLAUDE.md

CLAUDE.md is read at the start of every session, so it is the most reliable place for writing conventions. Keep it short; bloated files get ignored.

# Writing style
- Audience: external developers integrating our API.
- Voice: direct, second person, present tense. No marketing fluff.
- Format: Markdown. H2/H3 headings. Code in fenced blocks with language tags.
- IMPORTANT: every code sample must be copy-pasteable and runnable as-is.
- Sentences under 25 words. Define an acronym on first use.
For style rules that only apply to docs (not code), use a path-scoped rule in .claude/rules/ with a paths: ["docs/**"] frontmatter field so it loads only when you touch doc files, keeping your main context lean.

Work section by section, not all at once

Long documents drift when generated in one shot. Ask for an outline first, approve it, then fill one section at a time. You catch structure problems before any prose is written.

  1. Ask for an outline only: "Draft a section outline for this README. No prose yet."
  2. Edit the outline directly and paste it back as the agreed structure.
  3. Request one section at a time, feeding the relevant source for each.
  4. After the draft, run a self-critique pass before you accept it.

Make Claude critique its own draft

A fresh-eyes review catches what the drafting pass missed. Tell Claude exactly what to look for so it doesn't just rewrite for the sake of it.

Review the draft above as a skeptical editor. Flag ONLY:
1. Claims that need a source or are unverifiable
2. Sentences over 25 words or with passive voice
3. Steps a beginner could not follow
4. Anything off-tone vs. the voice sample
List issues as a numbered table, then give a revised version.
A reviewer told to "find problems" will almost always report some, even in clean text. Scope the critique to issues that genuinely affect clarity or correctness so you don't burn cycles polishing what is already fine.

Generate tables, diagrams, and structured output

Claude is strong at turning messy notes into clean structure. Ask for the exact format you need.

Tables and comparison grids

Turn these release notes into a Markdown table with columns:
Feature | What changed | Action needed | Breaking?
Sort breaking changes to the top.

Diagrams as code (Mermaid)

Ask for Mermaid and you get a diagram that renders in GitHub, many docs tools, and Markdown viewers, and stays editable as text.

Draw the auth flow as a Mermaid sequenceDiagram:
user -> web app -> auth service -> token -> protected API.
Return only the fenced ```mermaid block.

Strict structured output via the API

When another program consumes Claude's output, enforce a schema. Structured outputs are GA on Claude API via output_config.format (no beta header) on recent models.

output_config={
  "format": {
    "type": "json_schema",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["title", "summary", "tags"]
    }
  }
}

Artifacts: draft documents live

In the Claude app, ask for an Artifact when you want an editable, self-contained document you can iterate on in a side panel instead of scrolling chat. Great for a README, a spec, or an email you'll keep refining.

Iterate by pointing, not by re-pasting. Once a doc lives in an Artifact, say "tighten the intro" or "add an error-handling section after step 3" and Claude edits in place, keeping the rest untouched.

Before / after: a real prompt upgrade

Same task, two prompts. The second wins because it supplies role, audience, goal, constraints, and a voice anchor.

Before vague

Write a README for my CLI tool.

After framed

You are a developer-experience writer. Audience: developers who
found this tool on GitHub and have 2 minutes to decide if it fits.
Goal: they can install and run their first command without leaving
the page.

Constraints: Markdown; under 350 words; sections = What it does /
Install / Quickstart / Config. Tone: terse and friendly, like the
sample below. Every command must be copy-pasteable.

Voice sample: "No config, no daemon. Point it at a folder and go."

Source (use only these facts):
- Installs via: npm i -g foldermap
- First command: foldermap ./src
- Reads a .foldermaprc for ignore globs

Output the outline first and wait for my OK before drafting.

Quality checklist before you ship a doc

CheckWhat good looks like
Audience fitNo undefined jargon; assumptions match the reader's level.
Goal metA reader could actually do the thing the doc promises.
Every claim sourcedFacts come from your material, not invented by the model.
Commands runYou pasted and executed each snippet; none are guesses.
Voice consistentReads like one author and matches your sample.
Length honoredWithin the word/section limits you set.
Structure scannableClear headings, short paragraphs, tables where useful.
Always verify the facts yourself. Claude can write a confident, well-structured doc around a wrong detail. If a claim or command can't be checked against your source, cut it or confirm it before publishing.
🗂️
SECTION 23

Organizing Your Workflow

Run a system, not a pile of prompts

⚡ WHY IT MATTERSMatch each need to its right Claude extension surface and you stop forcing one mechanism to do another's job.
Need to addcapability?Conventions / rulespersistent behaviorCLAUDE.mdReusable know-howloaded on demandSkillFocused heavy taskisolated contextSubagentExternal tool / dataconnect a serviceMCPAuto guardraildeterministic gateHookShare a bundledistribute togetherPluginBuild a productprogrammatic loopAPI / SDK
A decision map fanning from 'Need to add capability?' to the seven Claude surfaces, each paired with the situation it fits.

Power users don't write better prompts — they run a better system around the prompts.

One goal per thread. Break it into a task list, sequence the steps, and feed Claude only the context that step needs. Everything else in this section is in service of that one idea.

Orchestrate the goal, not the prompt

For anything with 3+ distinct steps, Claude Code already tracks a todo list for you. As of Claude Code v2.1.142 it uses structured Task tools (TaskCreate, TaskUpdate, TaskList) instead of the old TodoWrite. Your job is to set the scope and let the list keep the work honest.

Keep one thread per goal. When you switch goals, switch threads (or /clear). Mixing two goals in one context is the fastest way to make Claude forget the first one.

Anthropic's recommended rhythm for non-trivial work is four phases: explore, plan, implement, commit. Stay read-only while you explore so nothing changes before you have a plan.

  1. Explore in plan mode (read-only) — press Shift+Tab or run /plan.
  2. Plan — ask for a detailed, file-by-file implementation plan.
  3. Implement — leave plan mode and build, verifying against the plan.
  4. Commit — descriptive message, then open a PR.
Skip the plan for tiny changes. Anthropic's rule: “If you could describe the diff in one sentence, skip the plan.”

Durable context: where memory lives

A focused thread is short-lived. Durable context is what survives a /clear or a new session. You have three layers — use the right one.

Projects (Claude app)

Attach files and a custom instruction set to a workspace so every chat in that Project starts with the same background. Best for non-code work — research, writing, planning.

CLAUDE.md (Claude Code)

Read at the start of every session, before anything else. The most reliable channel for project conventions. Generate a starter with /init, then trim it.

Auto memory (MEMORY.md)

Claude writes notes to itself across sessions. The first 200 lines or 25KB load at startup — anything past that is ignored. Toggle with /memory.

CLAUDE.md lives in a few places, and they stack. Use /memory to see which files are actually loaded.

~/.claude/CLAUDE.md      # applies to every session, all projects
./CLAUDE.md              # project root, checked into git (team shares it)
./.claude/CLAUDE.md      # project, alternate location
./CLAUDE.local.md        # personal, gitignored
src/api/CLAUDE.md        # per-directory, loads on demand when you touch that dir
Keep CLAUDE.md short — target under 200 lines. A bloated file makes Claude ignore the real instructions. For each line ask: “Would removing this cause Claude to make a mistake?” If not, cut it.
After Claude makes a mistake you corrected, ask it to add a rule to CLAUDE.md so it won't repeat it. Claude is good at writing rules for itself — and use YOU MUST or IMPORTANT on rules that keep getting ignored.

Manage context deliberately

The context window is the resource that matters most. Performance drops as it fills, and Claude starts forgetting early instructions. Treat free context like a budget you spend on purpose.

/context          # see what is eating your context window
/mcp              # per-server token cost (MCP schemas are deferred until used)
/clear            # full reset between unrelated tasks
/compact          # summarize history, preserving key decisions
/compact focus on the auth refactor and the failing tests   # targeted compact

For long single tasks Claude Code auto-compacts near the limit — it drops old tool output first, then summarizes while keeping code, file paths, and decisions. You can also bias what survives with a Compact Instructions section in CLAUDE.md.

Pull in only the files a step needs with scoped @ mentions: @src/auth.ts or @app.ts#5-10. Don't ask Claude to “investigate” with no scope — it will read hundreds of files and flood your context.

For broad exploration, delegate to a subagent. It runs in its own fresh context window and returns only a summary, so the research, logs, and doc-reads never touch your main thread.

Which mechanism do I reach for?

This is the question power users get wrong most often. Match the job to the lightest tool that does it.

Reach for…When you want…Loads
CLAUDE.mdProject conventions Claude needs every session (build cmd, test runner, style rules it can't guess)Always, at startup
SkillDeep how-to for a specific task, only when relevant (e.g. “generate a PDF”)On demand, by name/description
Path-scoped ruleInstructions tied to certain files (.claude/rules/ with a paths frontmatter)Only when matching files are touched
SubagentIsolate noisy work (research, review, tests) so the main context stays cleanSpawned per task, returns a summary
MCP serverConnect Claude to an external system (GitHub, a DB, Notion, browser)Tool schemas deferred until used
HookDeterministic action on every event (auto-format on save, block a command)The harness runs it — not the model
PluginBundle and share skills, commands, agents, and MCP config as one unitInstalled once, brings its pieces
API / Agent SDKRun the agent loop in your own code, headless, on your infrastructureYou build and deploy it
Rule of thumb: if it should fire automatically and reliably, it's a hook — instructions in CLAUDE.md are advice the model may skip, hooks are enforced by the harness.

Patterns for repeatable work

Anything you do more than twice should become a command. Custom slash commands are just markdown files — drop one in .claude/commands/ and it becomes /name.

# .claude/commands/ship.md
Review the current diff with /code-review, fix any CRITICAL
or HIGH issues, run the test suite, then write a conventional
commit message and open a PR. Verify the build passes first.

Define reusable subagents the same way, in .claude/agents/, with frontmatter to route cheap work to a faster model and scope its tools.

---
description: Reviews diffs for correctness. Use proactively after edits.
model: haiku                # route cheap, frequent work to Haiku 4.5
tools: Read, Grep, Bash
isolation: worktree         # run in its own git worktree
---
You are an adversarial reviewer. You see only the diff and the
acceptance criteria. Flag ONLY gaps that affect correctness.
For a big change, run /batch to split it into 5–30 independent units, each in its own git worktree — then 3–5 sessions work in parallel without colliding. Persist the plan to PLAN.md so it survives any compaction.

An operating rhythm you can adopt

Put it together into a loop you run for every meaningful goal.

  1. Frame — one goal, fresh thread. /clear if reusing a session.
  2. Explore + plan — plan mode, then a written plan saved to a file.
  3. Scope context@ only the files in play; delegate wide research to a subagent.
  4. Implement in increments — small, testable steps; give Claude a check it can run.
  5. Verify — run /code-review in a fresh context before calling it done.
  6. Capture — commit, and fold any new lesson into CLAUDE.md.
  7. Reset/compact for a long task, /clear before the next goal.
Watch the “trust-then-verify gap”: a plausible-looking implementation that misses edge cases. Anthropic's fix is blunt — “If you can't verify it, don't ship it.” Verification is their #1 practice; if you adopt only one habit, make it this.
🎤
SECTION 24

From the Source: Boris Cherny

A complete, slide-by-slide walkthrough of Boris Cherny's Claude Code masterclass.

A full walkthrough of "Mastering Claude Code in 30 Minutes" by Boris Cherny — the creator of Claude Code — turned into slide-by-slide notes you can act on.

Source: Mastering Claude Code in 30 Minutes — Boris Cherny (Code w/ Claude, Anthropic). Quotes paraphrased.

From the Source: Boris Cherny on Claude Code

This section distills a talk by Boris Cherny — a member of technical staff at Anthropic and the creator of Claude Code — titled "Mastering Claude Code in 30 Minutes." Everything here comes straight from his words at Code w/ Claude.

Boris built Claude Code. So when he describes how it's meant to be used, he isn't guessing at the design intent — he set it.

Slide 1 — What Claude Code Is

A new kind of AI assistant

Past coding assistants completed a line, or a few lines, at a time — autocomplete on steroids. Claude Code is different: it is fully agentic. It is meant to build whole features, write entire functions and files, and fix entire bugs in one go.

Old way: line completion

Suggests the next line or two as you type. You stay the driver for every keystroke.

Claude Code: fully agentic

You describe the outcome — "build this feature," "fix this bug" — and it figures out the steps end to end.

Works with ALL your tools — no workflow change

A core design goal: you don't change how you work to use it. Claude Code slots into your existing setup instead of replacing it.

DimensionWhat's supported
IDEs / editorsVS Code, Xcode, JetBrains, Vim/Emacs
TerminalEvery terminal — it's terminal-native
Where it runsLocally, over remote SSH, inside tmux
ScopeGeneral purpose — not tied to one language or stack

A power tool, intentionally un-opinionated

When you open Claude Code, you just see a prompt bar — nothing else. That minimalism is deliberate.

It's a power tool. Anthropic intentionally does not push you toward one "blessed" workflow. The blank prompt is an invitation to use it however your work demands.

Slide 2 — Recommended First-Time Setup

Boris recommends a short setup pass the first time you run Claude Code. Each step is a one-time investment that makes everything afterward smoother.

  1. Run /terminal-setup to fix multi-line input.
  2. Run /theme to pick a theme that's comfortable for your eyes.
  3. Run /install-github-app to bring Claude into your GitHub flow.
  4. Customize your allowed tools so you stop getting prompted for routine actions.
  5. (macOS) Turn on Dictation so you can speak your prompts.

1. /terminal-setup — better multi-line input

By default, adding a new line in a terminal prompt is awkward. This command wires up Shift+Enter for new lines instead of forcing you to type trailing backslashes.

/terminal-setup
# now: Shift+Enter = newline (no more \ at line ends)

2. /theme — light, dark, and colorblind-friendly

Set the theme to match your environment and accessibility needs, including a Daltonize option for colorblind users.

/theme
# choose: light  |  dark  |  Daltonize (colorblind)

3. /install-github-app — @mention Claude on GitHub

This installs the new GitHub app. Once it's set up, you can @mention Claude on any GitHub issue or pull request and have it respond right there in the GitHub UI.

/install-github-app
# then on any issue/PR:  @claude please review this and suggest a fix

4. Customize your allowed tools

Claude Code asks for approval before taking certain actions. If there's something you get prompted about constantly, allow-list it once so you're not interrupted every time.

Boris does this for the things he's prompted about a lot. The goal is to remove repeated friction — not to blindly approve everything. Allow what you trust; keep approvals on what changes system state.

5. Pro tip — talk to Claude Code (macOS Dictation)

Enable Dictation in System Settings > Accessibility > Dictation, then hit the dictation key twice and just speak your prompt out loud.

Talk to Claude Code like it's another engineer on your team. Specific spoken prompts work great, and dictating saves a ton of typing — you can describe a whole task faster than you can type it.

Slide 3 — Tip 1: Start with Codebase Q&A

This is Boris's number-one recommendation and the single best way to begin with Claude Code. Before you ask it to write anything, just point it at your repo and ask questions. It is also exactly what Anthropic teaches every new hire on day one of technical onboarding.

"Just ask questions about your codebase." Anthropic teaches this to new engineers on day one: download Claude Code, set it up, then start asking about the code.

The onboarding result

Codebase Q&A collapsed the time it takes a new engineer to become productive at Anthropic.

OnboardingBefore Claude CodeAfter Claude Code
Time to productive~2–3 weeks~2–3 days

Zero setup — and your code stays yours

There is nothing to configure before you can ask questions. Unlike tools that build a search index or upload your repo to a server, Claude Code reads the code locally, in place, on demand.

No indexing

No index to build, no embeddings, no wait. You point it at the repo and ask immediately.

Stays local

Code is NOT uploaded to a remote database. It stays on your machine and you stay in control.

No training

Anthropic does NOT train generative models on your code.

Zero setup

No remote service, no indices to maintain — just open the prompt and ask.

Deeper than text search

Plain text search (grep / Ctrl-F) only finds literal string matches. Claude Code goes a level deeper — it actually understands the code, so it can answer relational questions and surface real examples, at the quality of a wiki or internal docs.

> How is the AuthSession class instantiated and used?

Instead of listing every line containing AuthSession, it finds the real call sites and shows you how the class is actually used in practice.

Ask about git history

This is where it shines. The model knows git — it was never system-prompted to do this; "the model is awesome" and reaches for git on its own.

> Why does this function have 15 arguments named this weird way?

It looks through the git history to find who introduced those arguments, what issues the commits link to, and summarizes the whole story back to you.

Ask about GitHub issues

Using web fetch, Claude Code can read a linked GitHub issue and pull that context into its answer — connecting the code in front of you to the discussion behind it.

The Monday standup trick

Boris asks this every Monday before standup. It reads the git log, already knows his username, and produces a readout he pastes straight into a doc.

> What did I ship this week?
Codebase Q&A is also the best way to learn how to prompt. It teaches you the boundary of what Claude can do in one shot, two shots, or three shots — versus what needs an interactive back-and-forth (REPL) session.

Slide 4 — Tip 2: Editing Code with a Small Toolset

Claude Code is fully agentic. Agentic means: you give the model tools, and it figures out how to use them to accomplish your goal — you don't micromanage the steps.

The whole toolset is tiny

There are only a handful of core tools. The power comes from how Claude strings them together to explore, brainstorm, and then edit.

Edit files

Make changes to your source files.

Run bash

Execute commands in your shell.

Search files

Find and read the relevant code.

You never name a specific tool in your prompt. You just say what you want done — Claude decides which tools to use and in what order to get there.
> Add rate limiting to the /login endpoint and update the tests.

From that one sentence, Claude will search the codebase to find the endpoint, read the surrounding code, edit the handler, and run the tests — chaining its small toolset together on its own.

Slide 5 — Brainstorm & Plan Before Big Features

The most common failure mode: handing Claude an enormous task cold. People ask it to "implement this 3,000-line feature" in one go. Sometimes it nails it — but sometimes it builds the wrong thing, because it guessed at decisions you never spelled out.

Don't one-shot huge features blind. If Claude has to guess at the design, you may get 3,000 lines of the wrong thing.

The easiest fix: just ask it to think first

You do not need plan mode or any special tool. Plain language in your prompt is enough.

> Brainstorm a few approaches for this feature, make a plan,
  run it by me, and ask for my approval before you write any code.
"You don't need plan mode or special tools — just ask." Telling Claude to brainstorm, plan, and check in with you before coding fixes the wrong-thing problem almost every time.
  1. Explore the relevant code
  2. Brainstorm a few options
  3. Write a plan
  4. Get your approval
  5. Then — and only then — write the code

Slide 6 — The "Commit, Push, PR" Incantation

Once a change is done, one short phrase hands the whole git wrap-up to Claude.

> commit, push, and open a PR

From that, Claude makes a commit, creates a branch, pushes it, and opens a pull request — the full ship sequence.

It matches your conventions automatically

Claude reads your code, your history, and your git log to figure out how your team writes commit messages — and then matches that format. This is not system-prompted; the model works it out from context.

Commit

Stages and commits the change.

Branch + push

Creates a branch and pushes it up.

Open PR

Opens the pull request for review.

Match style

Reads git log to mirror your commit-message format.

Two short incantations cover most everyday work: "brainstorm a plan and check with me before coding," then once it's done — "commit, push, PR." No special modes, no tool names. Just plain English.

Slide 7 — Plug in your team's tools: bash/CLI and MCP

Claude Code is agentic: you give it tools, it figures out how to use them. Tip 3 in the talk is about extending its tiny built-in toolset (edit files, run bash, search files) with the tools your team already lives in. There are two kinds.

1. Bash / CLI tools

Tell Claude to use a command-line tool. The trick: point it at --help so it learns the tool on the spot, then it starts using it correctly.

2. MCP tools

Add an MCP server, tell Claude how to use it, and it begins calling it. Extremely powerful on a new codebase — hand Claude every tool your team already uses.

Teaching Claude a CLI

You don't pre-program specific tools — you just describe what you want and let Claude explore. The "made-up Barley CLI" example: introduce a tool by name and let it read its own help text.

> We use the `barley` CLI for deploys. Run `barley --help` to
  learn it, then deploy the staging environment.
If you find yourself re-explaining the same CLI every session, put it in CLAUDE.md so it persists. One-off discovery via --help is great; repeated use belongs in shared context.

Adding an MCP server

MCP (Model Context Protocol) lets Claude talk to external systems — issue trackers, design tools, browsers, internal services. Boris's headline example: a Puppeteer MCP server that lets Claude drive a real browser and screenshot the result.

# Register an MCP server, then just ask Claude to use it
claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer

> Use the Puppeteer server to open localhost:3000 and screenshot it.
"Extremely powerful on a new codebase — give it all the tools your team already uses." Plugging in CLIs + MCP is how Claude inherits your team's muscle memory.

Slide 8 — Workflows: explore → plan → confirm, then let Claude CHECK its work

The most common effective workflow is simple: let Claude explore a bit, plan a bit, and ask for confirmation before it writes code. You don't need a special mode — just ask.

> Before you write any code: brainstorm a few approaches,
  make a plan, run it by me, and wait for my approval.
People sometimes ask for an "enormous 3,000-line feature" in one shot. Sometimes it nails it — sometimes it builds the wrong thing. The cheapest insurance is to make it think and confirm first.

The big one: give Claude a way to check its own work

This is the most powerful pattern in the talk. When Claude has a feedback tool to verify what it built, it can iterate by itself — running, looking, fixing, re-running — until the result is near-perfect. No human in the loop between passes.

Unit / integration tests

Claude writes tests, runs them, reads failures, and fixes the code until they pass.

Puppeteer screenshots

It screenshots a web UI in a headless browser, compares to the target, and adjusts.

iOS simulator screenshots

It captures the running app from the simulator and self-corrects layout/visuals.

  1. Give Claude a mock and say "build this web UI."
  2. It gets close on the first pass.
  3. Give it a check (tests / screenshot tool) and let it iterate 2–3 times.
  4. It converges to near-perfect on its own.
The whole trick: pair a build instruction with a way to verify the result. With a feedback loop, Claude's 2nd and 3rd attempts are dramatically better than its first.

Slide 9 — CLAUDE.md: the simplest way to give Claude context

More context = smarter decisions. As an engineer you carry tons of unwritten context; CLAUDE.md is how you hand some of that to Claude. It's a special filename in the project root (the same directory you start Claude in), and it's auto-read into context at the start of every session — it rides along on your first user turn.

Local vs checked-in

Project CLAUDE.md (checked in)

Commit it, share it with the team. Write it once and everyone benefits — the network effect.

Local CLAUDE.md (not checked in)

Just for you. Personal notes and preferences you don't want in source control.

What to put in it

Common bash commands

Build, test, lint, run — the commands you'd tell a new hire.

Common MCP tools

Which servers exist and how the team uses them.

Architecture decisions

Key choices and conventions worth knowing up front.

Important / core files

Where the load-bearing code lives.

Style guide

Formatting and code conventions to follow.

Anything onboarding-relevant

What you'd need to know to work in this codebase.

Keep it SHORT. A bloated CLAUDE.md burns context on every session and is actually less useful — concise beats comprehensive.

Nested CLAUDE.md

You can drop a CLAUDE.md into child directories. Unlike the root file (always loaded), nested ones are pulled in on demand — only when Claude is working in that subtree. Great for module-specific guidance without bloating the global context.

repo/
├── CLAUDE.md            # root — always loaded
├── frontend/
│   └── CLAUDE.md        # loaded when Claude works in frontend/
└── services/payments/
    └── CLAUDE.md        # loaded on demand for payments work
There's also an enterprise-level root CLAUDE.md shared across all codebases and users — org-wide context that follows every engineer.

Slide 10 — The context & config hierarchy

It's not just CLAUDE.md — everything in Claude Code is hierarchical. Config flows from your repo, up to your personal global settings, up to org-wide enterprise policy. This applies to CLAUDE.md, slash commands, permissions, and MCP servers alike.

LevelScopeGoverns
Project Specific to one git repo. Check it in to share, or keep it personal/local. Project CLAUDE.md · repo slash commands · .mcp.json · repo permissions
Global (user) Applies across all of your projects on this machine. Your personal CLAUDE.md · your slash commands · your permissions · your MCP servers
Enterprise Global config rolled out to all employees, org-wide. Mandated CLAUDE.md · shared slash commands · auto-approved and blocked commands/URLs · org MCP servers

Permissions: auto-approve AND block, org-wide

Permissions work in both directions at the enterprise level:

Auto-approve

Check a safe command (e.g. your test runner) into enterprise policy so it's auto-approved for every employee — no per-prompt clicking.

Block (hard deny)

Block a command or a URL org-wide — e.g. a URL that should never be fetched. Employees can't override it, so it can never be fetched.

.mcp.json — ship MCP servers with the repo

Check a .mcp.json into the repository and anyone who runs Claude Code there is prompted to install those MCP servers automatically.

// .mcp.json (checked into the repo)
{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}
Real example from Anthropic: their apps repo ships a Puppeteer MCP server via checked-in .mcp.json, so every engineer can drive end-to-end tests and screenshot/iterate without installing anything themselves.

Slide 11 — /memory and the # shortcut

Because memory is hierarchical, you need a way to see and edit it. Two tools do that.

/memory

Shows every memory file currently being pulled into context — enterprise policy, your user memory, the project CLAUDE.md, and any nested CLAUDE.md — and lets you open and edit a specific one.

Type #

The fastest way to remember something. Type #, write the note, and pick which memory file it lands in. It's incorporated immediately.

/memory            # list & edit all active memory files

# always run the test suite with `pnpm test --silent`
   ↑ press '#', type the note, choose the destination file
A common use: Claude keeps using a tool the wrong way. Hit #, correct it, and the fix is written into CLAUDE.md right away — so it sticks for future sessions instead of being re-explained each time.

Slide 12 — Recommendation: start with shared project context

If you're unsure where to begin in the hierarchy, Boris's answer is clear: shared project context. Write your CLAUDE.md, slash commands, permissions, and .mcp.json once, check them into the repo, and share them with the team.

"Write it once, share it with the team." Shared project context has a network effect — every engineer who clones the repo inherits the same commands, tools, conventions, and guardrails, and the whole team's Claude gets smarter together.
If you want…Start here
Maximum payoff for the least effortChecked-in project CLAUDE.md
Team tools available to everyoneChecked-in .mcp.json
Consistent, safe commands across the teamRepo (then enterprise) permissions
Personal-only tweaksLocal CLAUDE.md / global config

Slide 13 — Key Bindings: The Hidden Keyboard Layer

Boris flags these because most people never discover them — Claude Code looks like a single prompt bar, but a handful of keystrokes unlock its real power. Each one is about steering Claude mid-flight or feeding it exactly the context it needs.

KeyWhat it doesWhen Boris uses it
Shift+TabAccept edits / cycle into auto-accept-edits mode — file edits are auto-applied, but bash commands still need approvalWhen Claude is clearly on the right track or iterating on tests; you can always ask Claude to undo later
#Make Claude remember something — it's incorporated into CLAUDE.md immediately, and you pick which memory file it goes to"It's not using a tool correctly" — teach it once so the fix persists across sessions
!Drop into bash mode and run a command locally — the command and its output both go into the context windowLong-running commands, or anything you want Claude to "see" without it running the command itself
@Mention files and folders to pull them into contextPointing Claude at the exact files/dirs that matter
EscStop whatever Claude is doing — anytime, safely; it won't corrupt the sessionClaude suggested a 20-line edit, 19 lines are perfect — stop it, correct the one line, redo
Esc EscJump back through conversation historyRewinding to an earlier point in the session
Ctrl+RSee the full output — the exact same thing Claude sees in its context windowInspecting truncated output / understanding what the model is actually working from

Resuming work across sessions

You don't lose a session when you close the terminal. Two startup flags bring you right back.

# Pick a previous session to resume
claude --resume

# Continue the most recent session
claude --continue
The Esc key is the unsung hero. Because stopping is always safe, you can interrupt Claude freely instead of letting a near-perfect edit go wrong — interrupt, give one line of correction, and let it redo.
The difference between ! bash mode and just asking Claude to run a command: with ! you run it locally and the command + output land in context — perfect for long-running commands or output you want Claude to reason over.

Slide 14 — The SDK Is Already On Your Machine

There's no separate package to install. The -p / --print flag is the Claude Code SDK. Pass a prompt, get a result back, and exit — non-interactive, scriptable, composable.

Boris's framing: think of claude -p as a super-intelligent Unix utility. Give it a prompt, get JSON back. Pipe in, pipe out — exactly like grep, jq, or any other tool in your pipeline.

The shape of a call

You control the prompt, which tools Claude is allowed to use (including specific bash commands), and the output format — plain text, a JSON blob, or a stream of JSON you can process as it arrives.

# Basic one-shot
claude -p "summarize the failing tests"

# Constrain the toolset (allow-list specific tools/commands)
claude -p "triage this build" --allowedTools "Bash(git log:*)" "Read"

# Structured output for programmatic consumption
claude -p "list the changed files and why" --output-format json

# Streaming JSON to process incrementally
claude -p "analyze this log" --output-format stream-json

Piping: in and out

Because it reads stdin and writes stdout, claude -p slots into any shell pipeline.

# Feed it command output, parse the result with jq
git status | claude -p "which of these files are risky to commit?" --output-format json | jq '.result'

# Pull a giant log from a GCP bucket and find what's interesting
gsutil cat gs://my-bucket/prod.log | claude -p "what looks anomalous here?"

# Fetch data from the Sentry CLI and let Claude act on it
sentry-cli issues list | claude -p "group these by likely root cause"

CI

Run Claude as a step in your continuous-integration pipeline — review diffs, summarize failures, gate merges.

Incident response

Pipe logs, traces, or Sentry output straight in so Claude can triage and recommend actions during an outage.

Pipelines

Drop it anywhere data flows — read a giant log, find what matters, hand the result to the next stage.

A colleague gave a dedicated deep-dive on the SDK right after Boris's talk. His own verdict on the SDK's potential: "We've barely scratched the surface."

Slide 15 — Advanced Parallelism: Many Claudes at Once

Boris calls himself a "Claude normie" — one Claude at a time, a few terminal tabs for a few repos. But the power users he sees, inside and outside Anthropic, almost always run many Claudes in parallel.

How the power users scale up

tmux + SSH

SSH sessions and tmux tunnels into running Claude sessions, so multiple agents persist and survive disconnects.

Multiple checkouts

Several separate checkouts of the same repo, each running its own Claude, so many tasks progress at once.

Git worktrees

Use git worktree for isolation — each Claude works in its own branch/directory without stepping on the others.

# Spin up an isolated worktree so a parallel Claude can't collide with your main checkout
git worktree add ../feature-x -b feature-x
# ...then start Claude inside that directory
The unifying idea is isolation: give each parallel Claude its own workspace (a separate checkout or a worktree) so they don't corrupt each other's state. Anthropic is actively making this easier — the goal is to run as many sessions as you want and get a lot done in parallel.

Slide 16 — Q&A Gems

The hardest thing to build: safe bash

Boris named making bash commands safe as the single hardest part of building Claude Code. Bash is inherently dangerous — it changes system state in unexpected ways — yet approving every command by hand kills productivity, and not everyone runs inside Docker.

  1. Read-only detection — some commands are recognized as read-only and treated as low risk.
  2. Static analysis — figures out which commands can be safely combined (e.g. chained commands) before running them.
  3. Tiered permissions — a layered allow-list / block-list system that operates at different levels (project, global, enterprise).

Fully multimodal from day one

Claude Code has always been multimodal — it's just hard to discover in a terminal. There are three ways to give it an image.

Drag & drop

Drag an image straight onto the terminal.

File path

Give it a path to an image file.

Copy-paste

Paste an image directly into the prompt.

Boris's favorite use: implementing from mocks. Drag in the mock, tell Claude to build it, and give it a Puppeteer server so it can screenshot its own work and iterate toward near-perfect.

Why a CLI and not an IDE?

At Anthropic people use a huge range of editors — VS Code, Zed, Xcode, Vim, Emacs. The terminal is the common denominator that works for everyone. And given how fast the model is improving, there's a real chance people won't be using IDEs the same way by year's end — so they deliberately avoid over-investing in UI layers that may not matter.

How much does Anthropic itself use it?

Roughly 80% of technical staff at Anthropic use Claude Code every day — including researchers, who use it for things like the notebook tool to edit and run notebooks.
Boris's throughline: don't over-prompt. Give Claude good context (CLAUDE.md), the right tools, and a way to check its own work — then let the model cook.

▶︎ Watch the full talk: youtube.com/watch?v=B_KAEqiC-0Q

🛟
SECTION 25

Troubleshooting & Pro Tips

Recover fast, then level up

Most Claude Code problems are context problems in disguise — know the recovery move and you stay fast.

Performance degrades as the context window fills, so almost every fix below is about protecting that one resource. Learn the symptom, the cause, and the keystroke that resets it.

The fast triage table

SymptomLikely causeFix
Replies slow, vague, or “forgetting” earlier factsContext window near full/compact at a breakpoint, or /clear for a new task
Output drifts off the goalGoal buried under noise / failed attemptsRe-state the goal, switch to plan mode (Shift+Tab)
Same fix fails twice in a rowFailed approaches polluting context/rewind to erase, or /clear and re-prompt
Constant permission promptsNo allow rules / wrong mode/permissions add allow rules; set --permission-mode acceptEdits
“File is outside allowed directories”Path not in working dirs/add-dir <path> or restart with --add-dir
MCP tools missing / server redNot connected, bad transport, or auth/mcp to check status and re-auth
Edited .mcp.json but nothing changedConfig read only at startupRestart the session
Installation acting weirdBroken setup / stale state/doctor to diagnose health

When context is full

Two moves, two situations. Use /clear when you are starting something unrelated — fresh context is free and gives the cleanest results. Use /compact when you need to keep working on the same thread but reclaim room.

/clear                       # wipe history, start a fresh task
/compact                     # summarize history, keep key facts
/compact focus on the API changes   # steer what the summary keeps
Compact proactively at logical breakpoints (around 50–75% full) instead of waiting for auto-compaction near ~95%. You control what survives.
/context shows a live token breakdown by category so you can see exactly what is eating your window before deciding to clear or compact.

Re-anchor after a reset

Project-root CLAUDE.md and MEMORY.md are re-injected from disk after a compact, but skill descriptions and path-scoped rules are lost until re-triggered. Persist plans to a file so they survive any reset.

echo "Goal: migrate auth to OAuth. Constraints: no schema change." > PLAN.md
# after /clear or /compact:
# "Read PLAN.md and continue from step 3."

When output drifts

Drift means the model lost the thread, not that it got dumber. First, re-state the goal in one specific sentence with the file names involved. Second, drop into plan mode so it must read and think before it touches anything.

Plan mode is enforced at the tool level — Claude literally cannot edit files or run destructive commands. Toggle it with Shift+Tab (cycles normal → plan → auto-accept). Workflow: explore → plan → code.

If a correction fails twice, stop correcting. Rewinding erases the failed attempt entirely; correcting leaves the bad approach in context where it keeps misleading the model.

Esc          # stop Claude mid-action, context preserved
Esc Esc      # open rewind menu (or /rewind) to restore prior state

Permission & sandbox issues

Prompts are a feature, but you can tune the friction. Add durable allow rules with /permissions, or start the session in a looser mode for trusted work.

claude --permission-mode acceptEdits   # auto-approve file writes + mkdir/mv/cp
claude --permission-mode plan          # read-only until you approve a plan
claude --allowedTools "Bash(git*),Edit" # auto-approve specific tools only

For “outside allowed directories” errors, grant access to the extra path. Note that --add-dir grants file access but does not load that dir’s .claude/ config.

/add-dir ../shared-lib              # in-session
claude --add-dir ../apps ../lib    # at launch
Avoid --dangerously-skip-permissions for routine work. Prefer --allow-dangerously-skip-permissions, which only adds bypassPermissions to the Shift+Tab cycle so you can start safely in plan mode and switch to it later.

MCP server not connecting

Start with /mcp — it shows every server, its connection status, available tools, and is where you complete OAuth for remote servers. From there, check the three usual causes: transport, scope, and restart timing.

  1. Run /mcp and read the status. Red usually means a crashed subprocess or an unfinished OAuth.
  2. Verify the transport. Cloud servers need --transport http; sse is deprecated; omit the flag for local stdio servers.
  3. If you just edited .mcp.json, restart — it is read only at session start.
  4. For a project-scoped server, approve it at the first-use prompt (a guard against cloned repos auto-launching processes).
claude mcp add --transport http docs https://code.claude.com/docs/mcp
claude mcp add playwright -- npx -y @playwright/mcp@latest
claude mcp get docs           # shows which scope holds it
claude mcp reset-project-choices   # re-trigger project approval prompts
All flags (--transport, --env, --scope, --header) must come BEFORE the server name. For stdio, everything after -- is the launch command, passed untouched.
Claude Code reads only ~/.claude.json and the project-root .mcp.json. It does NOT read ~/.claude/mcp.json — a common reason a server “won’t load.”

High-leverage pro habits

One scope per session

Fresh context is free. Finish a task, /clear, start the next. Don’t let unrelated work share a window.

Plan before you build

Explore, plan in plan mode, then execute. Front-loading context writes a stable prefix that caches — driving 90%+ cache hits.

Be specific

Name the files, state constraints, point to an example pattern. Precise instructions mean far fewer corrections.

Scope investigations

Never say a bare “investigate” — it floods context. Delegate research to a subagent that returns a short summary.

Prune CLAUDE.md

For each line ask: would removing this cause a mistake? A bloated memory file makes Claude ignore the rules that matter.

Read narrow

File reads dominate context. Read only the ranges you need and prefer rg/grep over dumping whole files.

Route by cost

Haiku for simple/high-volume, Sonnet for most work, Opus for the hardest reasoning. Switch with /model.

Reserve ultrathink

Escalate ‘think’ → ‘think hard’ → ‘ultrathink’ only for genuinely hard reasoning. More thinking aids depth, not output compliance.

Persist plans and decisions to disk (PLAN.md) and break large work into per-session phases with a /clear between them. External files survive any context reset; conversation history does not.
Avoid cache-killers mid-session: switching models, editing CLAUDE.md, and /compact all invalidate the prompt cache and trigger a slow, expensive uncached turn. Batch those changes at natural breakpoints.
📋
SECTION 26

Master Cheat Sheet

The one tab you keep open

Every flag, command, shortcut, and "which tool when" call in one scannable place.

CLI Flags (the daily ten)

FlagAliasDoes
--print-pHeadless: run once, print, exit. Pipe-friendly.
--continue-cResume the most recent conversation in this dir.
--resume-rResume by ID/name, or open the picker.
--modelSet model: sonnet, opus, or full id.
--permission-modedefault / acceptEdits / plan / auto / dontAsk / bypassPermissions.
--add-dirAdd extra working dirs for read/edit.
--output-formatjson for scriptable output (grab session_id).
--allowedToolsAuto-approve specific tools, no prompt.
--name-nName the session for later --resume.
--bareSkip all auto-discovery for fast, reproducible CI.

Copy-paste launches

# Interactive with a starting prompt
claude "refactor the auth module"

# Headless, pipe content in
cat logs.txt | claude -p "explain the errors"

# Scripted continuation in the current dir
claude -c -p "Check for type errors"

# Start in plan mode (read-only until you approve)
claude --permission-mode plan
Capture a session ID and reuse it in a script.
session_id=$(claude -p "Start a review" --output-format json | jq -r '.session_id')
claude -p "Continue that review" --resume "$session_id"
--dangerously-skip-permissions bypasses every prompt. Prefer --permission-mode dontAsk plus an --allowedTools allowlist for locked-down automation.

Slash Commands

Context & cost

CommandDoes
/clearWipe history, free context. Use between unrelated tasks.
/compactSummarize to reclaim context. Steerable: /compact focus on the API changes.
/contextLive token breakdown by category.
/costSession token usage and spend.
/memoryOpen CLAUDE.md files in your editor.
/rewindRestore prior conversation/code state (erases the failed attempt).

Models & agents

CommandDoes
/modelSwitch model: opus, sonnet, haiku, default, or picker.
/effortSet reasoning effort low/medium/high; auto resets.
/agentsCreate/edit subagents; Running tab shows live ones.
/mcpServer status, tools, OAuth auth, per-server cost.
/pluginDiscover, install, enable/disable plugins.

Project & review

CommandDoes
/initScan repo, generate CLAUDE.md.
/review [PR]Code review of changes or a specific PR.
/security-reviewSecurity-focused analysis of current changes.
/resumePick a past conversation to continue.
/permissionsManage allow/deny rules.
/doctorDiagnose installation health.

REPL Shortcuts

Key / TokenAction
Shift+TabCycle normal / plan / auto-accept mode.
EscStop Claude mid-action; context preserved.
Esc EscOpen rewind menu (same as /rewind).
Option+T / Alt+TToggle extended thinking.
Ctrl+BBackground a running task.
@path/to/fileReference file contents in a prompt.
@agent-nameGuarantee a specific subagent runs.
!commandRun bash; output injected into context.
# (line start)Quick-append a line to CLAUDE.md memory.
Rewinding a failed attempt erases it from context entirely. That beats correcting, which leaves the dead-end polluting your window.

Thinking Budget

Keywords escalate reasoning depth in your prompt. More thinking means deeper reasoning, not better rule-following.

think  <  think hard  <  think harder  <  ultrathink (max)
Reserve ultrathink for architecture or hard multi-file debugging. Use think / think hard for everyday moderate tasks.

Model Picker

Model$/MTok in/outContextUse for
Opus 4.8$5 / $251MLong-horizon agentic coding, big refactors, hard reasoning.
Sonnet 4.6$3 / $151MMost production work: code, analysis, tool use.
Haiku 4.5$1 / $5200kReal-time, high-volume, sub-agents, cost-sensitive.
Rule of thumb: start with Haiku, move up only when a capability gap shows. Switching models mid-session invalidates the prompt cache.

Which Tool When

CLAUDE.md

When: facts Claude needs every session. Why: re-injected after compaction. Non-guessable commands, code style, gotchas. Prune ruthlessly.

Skill

When: a repeatable how-to that loads only on demand. Why: metadata costs ~100 tokens; the body loads when your request matches. .claude/skills/<name>/SKILL.md.

Subagent

When: verbose, self-contained work that returns a summary. Why: runs in its own fresh context, then reports back. Great for parallel research.

MCP

When: Claude needs external systems (DBs, APIs, SaaS). Why: standard tool protocol. claude mcp add; tools namespaced mcp__server__tool.

Hook

When: deterministic action the model must not skip. Why: shell/HTTP at lifecycle points. Auto-format on Edit, block bad Bash, run tests on Stop.

Plugin

When: bundle and share many of the above. Why: one installable directory of skills, agents, hooks, MCP. /plugin install name@marketplace.

Config-File Map

WhatUser levelProject (shared)
Memory~/.claude/CLAUDE.md./CLAUDE.md
Settings~/.claude/settings.json.claude/settings.json
Skills~/.claude/skills/.claude/skills/
Subagents~/.claude/agents/.claude/agents/
MCP~/.claude.json.mcp.json
Hookssettings.json (hooks key)settings.json (hooks key)
Untracked local overrides go in .claude/settings.local.json (gitignored). MCP local scope lives in ~/.claude.json, not that file.

Quick Recipes

# Add a remote MCP server (project-shared, checked in)
claude mcp add --transport http --scope project docs https://code.claude.com/docs/mcp

# Add a local stdio MCP server
claude mcp add playwright -- npx -y @playwright/mcp@latest
# Auto-format every edit (PostToolUse hook in settings.json)
{"hooks": {"PostToolUse": [{"matcher": "Edit|Write",
  "hooks": [{"type": "command",
  "command": "jq -r .tool_input.file_path | xargs prettier --write"}]}]}}
Hook exit codes: 0 = success, 2 = blocking error (stderr fed back to Claude), other non-zero = non-blocking. Use 2 to enforce policy.

Custom Slash Command (60 seconds)

  1. Create .claude/commands/optimize.md (filename becomes /optimize).
  2. Add YAML frontmatter: description, argument-hint, optional allowed-tools, model.
  3. Write the prompt body. Use $ARGUMENTS, $1, @path, and !cmd for dynamic content.
---
description: Optimize a file for performance
argument-hint: [file path]
allowed-tools: Read, Edit, Bash(git diff:*)
---
Review @$1 for performance issues and apply safe fixes.
Current diff:
!git diff $1
Front-load context in plan mode, then execute. A stable prefix gets cached once, so every later turn reads from cache (90%+ hit rates) instead of recomputing. Fresh context is free, so /clear between unrelated tasks.
🌍
SECTION 27

What Is AI, Really?

Strip away the hype: AI is just software doing jobs we thought needed a brain.

⚡ WHY IT MATTERSKnowing AI's nested layers helps beginners see that chatbots like Claude are one slice of a much bigger field.
What Is AI, Really? — nested fields, smallest is newest Each ring lives inside the one before it. The pulse travels to today's LLM core. AI Machine Learning Deep Learning Gen AI / LLMs Spam filter Recommendations Image recognition Claude (LLM) All AI is rule-driven smarts; ML learns from data; deep learning uses neural nets; LLMs generate language.
Concentric rings show AI containing Machine Learning, then Deep Learning, then Generative AI and LLMs, with example chips and a pulse moving inward to the Claude core.

Before we touch a single buzzword, let's nail down what "AI" actually means. Short version: it's software that does tasks we used to assume needed a human brain. That's it. No glowing red eyes required.

One plain sentence

Artificial Intelligence is software that performs tasks we once thought required human intelligence — spotting spam, suggesting a song you'll like, finding the fastest route, answering a messy question in plain English. When you put it that way, "AI" stops sounding like science fiction and starts sounding like Tuesday.

The two ways software can be smart

This next idea is the spine of the entire track, so it's worth slowing down for. There are really only two ways to build something "intelligent," and the difference is who writes the rules.

Rules-based (hand-written)

A human writes every if-this-then-that by hand. "IF the email contains 'FREE VIAGRA' THEN mark as spam." Predictable and easy to inspect — but a human has to imagine every case in advance, and spammers are creative.

Learned (figured out from examples)

You don't write the rules. You show the system thousands of examples ("these 10,000 were spam, these weren't") and it figures out the patterns itself. Flexible and powerful — but harder to peek inside and explain.

Almost everything people call "AI" today is the learned kind. The rules-based kind is older and still everywhere (your tax software, a thermostat) — it's just not what's making headlines.

The nesting dolls

Here's where the jargon usually gets confusing — AI, ML, Deep Learning, Generative AI, LLMs. They sound like rivals. They're not. They're Russian nesting dolls (matryoshka): open the big one and a smaller, more specialized one sits inside.

DollWhat it isEveryday face
AI (biggest)The whole idea: machines doing brain-like tasksThe entire field
Machine LearningAI that learns rules from examples instead of being hand-codedSpam filter, maps ETA
Deep LearningML using many-layered "neural networks" — math loosely inspired by brain cells, great with images, audio, and languageFace unlock, voice typing
Generative AI / LLMs (smallest)Deep learning that creates new text, images, or code (an LLM = Large Language Model, the text kind)Claude, ChatGPT

Each doll is a more specific kind of the one wrapped around it. So Claude is a Large Language Model, which is Generative AI, which is Deep Learning, which is Machine Learning, which is AI. All true at once. No contradiction.

Narrow vs general — set your expectations

Today's AI is narrow: brilliant at one job. A spam filter can't drive a car. Claude can write a poem but can't water your plants. AGI — Artificial General Intelligence, a do-anything, human-level mind — does not exist yet, no matter what a flashy demo implies.

When you hear "AI can do anything now," translate it to "this narrow AI is very good at its one thing." That mental swap will save you a lot of disappointment.

The whole story in one breath

You don't need the full history — just the shape of it. Read this as a single timeline:

  1. 1950 — Alan Turing asks "can machines think?" The field gets its name a few years later, in 1956.
  2. 1980sExpert systems: giant hand-written rulebooks. Clever, but brittle.
  3. 1990s–2000sMachine Learning takes over: let the data write the rules.
  4. 2012Deep learning explodes after neural nets crush an image-recognition contest.
  5. 2017 — The transformer is invented — the engine behind every modern LLM.
  6. 2022ChatGPT launches and suddenly everyone's grandma knows the word "AI."
  7. TodayAgents: AI that doesn't just answer, but takes actions for you.

It's already in your pocket

If "AI" still feels abstract, look around — you've been using it for years:

Spam filter

Quietly sorting your inbox, learning what junk looks like.

Maps ETA

Predicting traffic to guess your arrival time within minutes.

Recommendations

The "you might also like" behind Netflix, Spotify, and your shopping cart.

Claude

An LLM you can just talk to in plain language.

Fun fact: your email spam folder was doing machine learning before it was cool — quietly judging your inbox since the late 1990s.
Most of today's magic is the learned kind of AI. So in the next section, let's open that doll and see how a machine actually learns from examples.
📈
SECTION 28

Machine Learning: Learning from Data

Stop writing the rulebook — hand the machine the photo album and let it figure out "cat."

⚡ WHY IT MATTERSMachine learning powers nearly every modern AI product, turning past data into predictions about new, unseen situations.
Machine Learning: learning from data, then predicting 1 . TRAINING — adjust weights from labeled data Labeled data input + answer Model adjusts weights repeat & correct Check error guess vs answer 2 . INFERENCE — use the trained model on new input New input no answer yet Trained model weights frozen Prediction answer
A two-phase flow: training feeds labeled data into a model that adjusts its weights in a loop, then inference sends a new input through the trained model to get a prediction.

Here's the big flip this section owns: instead of a human painstakingly hand-coding rules, machine learning lets the computer learn the patterns straight from data. You don't tell it what a cat is — you show it, and it works the rest out itself.

The cat-photo story

Imagine teaching a child what a cat is. You don't hand them a rulebook saying "pointy ears, whiskers, four legs, smug expression." You just point at cats over the years and say "cat." Eventually the child recognizes a cat they've never seen before — even a weird hairless one.

Machine learning does the same thing. Show a model a giant album of labeled cat photos and it learns "cat-ness" on its own. Nobody writes the rules; the rules emerge from the examples. That's the whole shift.

The working vocabulary

Four words unlock almost every ML conversation. Learn these and you're fluent enough to follow along.

Features

The inputs — the raw stuff the model looks at. For a photo, the pixels. For a house price, the square footage and zip code.

Labels

The answers we want — "cat" or "not cat," the actual sale price. The correct response we're trying to predict.

Model

The thing that learns — the recipe that turns features into a guess at the label.

Parameters / weights

The dials the model tunes. Training is just nudging these numbers until the guesses get good. (Weights are the most common kind of parameter.)

Three ways a machine can learn

There are three big "learning styles," called paradigms. One line each:

ParadigmHow it learnsEveryday picture
SupervisedFrom labeled examples (features + correct answers)Flashcards with the answer on the back
UnsupervisedFinds hidden structure with no labels at allSorting a messy drawer into natural piles
ReinforcementBy trial and error, chasing rewardsTraining a dog with treats

The cat-album example is supervised learning — every photo came with a label. That's the most common starting point, and the one to keep in your head.

Two phases of a model's life: training vs inference

This split shows up everywhere in AI, so it's worth nailing now.

Training = studying

The model pores over the examples and adjusts its dials, over and over, until its guesses match the labels. Slow, expensive, and done in big batches (then occasionally repeated to keep the model fresh).

Inference = taking the test

The trained model meets brand-new data and makes a prediction. Fast, cheap, and what happens every time you use it. Like flipping a flashlight on — you don't need to know the electronics to use it.

You'll meet "training vs inference" again and again across this track. Whenever a system is "learning," it's training; whenever it's "answering," it's inference.

Memorizing vs actually understanding

Here's the trap every model can fall into. We don't want it to memorize the album — we want it to understand cats well enough to handle ones it's never seen. That ability to handle new data is called generalization, and it's the entire point.

The failure is called overfitting. Overfitting is the student who memorizes every past exam word-for-word, then panics when the teacher changes the numbers. They aced the practice test and bombed the real one — because they learned the answer key, not the subject.

A model that scores perfectly on its training data but flops on new data is overfitting. Great on the practice test, useless in the real world. We always judge a model on data it has never seen.
Rule of thumb: if results look too perfect on the training set, be suspicious. Real understanding means doing well on fresh examples, not reciting old ones.

So far we've treated "the model" as a magic box that tunes dials. The most powerful ML models are built from layered networks of these dials stacked together — let's look under that hood next.

🕸️
SECTION 29

Neural Networks & Deep Learning

Inside every AI is a giant mixing board of dimmer switches that learned to tune itself.

⚡ WHY IT MATTERSNeural networks power modern AI, learning patterns from data to recognize images, understand language, and make predictions.
Neural Network — Forward Pass A signal travels left to right; each node sums its inputs and fires onward. Input Hidden 1 Hidden 2 Output prediction →
A feed-forward network passes a signal pulse left-to-right from 3 inputs through two hidden layers to 2 outputs (the forward pass).

Remember the nesting doll from earlier? Open the "machine learning" doll and inside sits deep learning. Let's crack that one open too — because it turns out to be built from millions of copies of one tiny, almost embarrassingly simple gadget: the artificial neuron. Build one, stack a bunch, and you've got the engine behind nearly every modern AI.

Step one: build a single neuron

An artificial neuron is basically a tiny calculator. It does three things in a row:

  1. Take some inputs — numbers coming in (say, the brightness of pixels, or "was it raining?").
  2. Multiply each by a weight, then add them up — the weight says how much that input matters. Big weight = "pay attention to this." Near-zero weight = "ignore it."
  3. Push the total through an activation function — a gate that bends the result so the network can learn curvy, non-straight-line patterns. Loosely: a bouncer who decides how much of the signal gets waved through.

That's the whole neuron. Inputs in, one number out. Not magic — arithmetic.

Step two: stack neurons into layers

One neuron makes one little decision. The power comes from wiring thousands of them together in layers, where each layer's output feeds the next.

Input layer

Where the raw data enters — the pixels, the words, the sensor readings. No thinking happens here; it's just the loading dock.

Hidden layers

The actual processing. Each layer transforms what the last one found into something more abstract. This is where the real work lives.

Output layer

The final answer — "cat," "spam," or the next word in a sentence.

And "deep" learning? It just means there are many hidden layers stacked between input and output. That's the entire secret behind the scary-sounding word. Deep = lots of layers. That's it.

The knobs: weights are everything

Here's the anchor image to keep in your head. Picture a giant sound-mixing board covered in dimmer-switch sliders. Each weight is one slider. The activation function is the gate each signal passes through after the sliders set its level. A real network has millions (or even billions) of these sliders.

When people say a model is "trained," they mean exactly one thing: all those sliders got nudged up and down until the output came out right. Training is just tuning the knobs. And here's the kicker — no human touches them by hand. The network adjusts its own sliders, which is the next thing to explain.

How it actually learns (the loop)

So how do you tune millions of sliders with nobody touching each one? You let the network do it, in a repeating four-beat loop:

BeatPlain-English meaning
Forward passRun data through and make a guess (it'll be terrible at first).
LossMeasure how wrong the guess was — one number for "how bad."
BackpropagationTrace the error backward to figure out which sliders caused it, and which way each should move.
Gradient descentNudge every slider a little in the helpful direction. Repeat — often millions of times.

Gradient descent is the all-time favorite analogy here: imagine a hiker descending a foggy mountain, feeling which way is downhill and taking one careful step at a time. "Downhill" means "less error." Do that enough and you reach a valley floor — a network that's usually right.

The model never gets told the right weights. It just keeps measuring its own mistakes and shrinking them, one tiny nudge per knob, until the output stops being wrong.

Why deep beats hand-crafted features

Before deep learning, humans had to hand-engineer the clues a program looked for — "an edge here, a curve there." Tedious, brittle, and capped by human imagination.

Deep networks flip that. Given enough examples, they discover their own hierarchy of clues: early layers find edges, middle layers assemble edges into shapes, deeper layers assemble shapes into things like faces or wheels. Nobody programmed those stages — they emerged from turning the knobs.

The turning point was 2012, when a deep network (nicknamed AlexNet) crushed the ImageNet image-recognition contest, beating hand-built methods by a stunning margin. It was basically AI's "oh — the dimmer switches actually work" moment, and the field never went back.

One honest caveat

The "it's like a brain" metaphor is handy but oversold. Real neurons are far more complex than a weighted sum, and brains don't learn by backpropagation or matrix math. "Neural network" is an inspired nickname, not a biology lesson — great for intuition, wrong as literal anatomy.

Where this is heading

You now know the whole engine: neurons, layers, weights as knobs, and a learning loop that tunes them by chasing lower error. Now take a network like this and make it enormous — billions of knobs — then feed it a huge chunk of all the text humans have ever written. Do that, and what you get is a Large Language Model. That's exactly where we go next.

💬
SECTION 30

What Is an LLM?

A giant autocomplete that picked up coding and translation as accidental side hobbies.

⚡ WHY IT MATTERSN/A
What Is an LLM? — It Predicts the Next Word, One Token at a Time Prompt "The cat sat on the ___" LLM Probability of next token mat 0.62 floor 0.18 roof 0.05 mat Autoregressive loop: append the winning token, then predict again Result: The cat sat on the mat
An animated view of next-token prediction: a prompt enters the LLM, which outputs token probabilities, then loops the top token back into the sentence.

You've used an LLM if you've ever talked to Claude, ChatGPT, or Gemini. But under the hood, the thing doing all that brilliant work is doing something almost comically simple: guessing the next chunk of text, over and over.

The one-line definition

A Large Language Model (LLM) is a very large neural network (remember those from the last section?) trained to do one simple thing: predict the next token — a token being a small piece of text, roughly a word or part of a word. That's it. Everything else — the essays, the code, the patient explanations — grows out of that single skill done at an absurd scale.

What does "large" actually mean?

Two things are large here, and both matter:

Billions of parameters

Parameters are the tunable "weights" inside the neural net — the knobs it adjusts while learning. Modern LLMs have billions to trillions of them. More knobs means more nuance it can capture.

A giant text diet

It was trained on an enormous slice of the text humans put online, plus mountains of books. Think of it as having read more than any person could in many lifetimes.

"Predict" doesn't mean "look up"

When we say it predicts, we mean it models probabilities over what comes next. Type "I'll be there in ___" and your phone suggests five minutes. An LLM does the same basic trick — just trained on a huge chunk of the internet instead of your text history, and scaled up enormously. Same idea, wildly more power.

The big misconception to kill right now: an LLM is not a database or a search engine looking up stored answers. There's no filing cabinet inside it with "capital of France → Paris." Instead, Paris comes out because that pairing dominated its training data. It's a pattern predictor generating plausible text from learned weights — a statistical map of how language works, not a lookup table.

Search engineLLM
What it doesPoints you to where an answer might liveTries to produce the answer itself
SourceIndexed web pagesPatterns baked into its weights
If it's wrongIt showed you a page; you can judge the sourceIt can state a wrong answer fluently and confidently

How it writes: the autoregressive loop

Here's the surprising part — the model doesn't compose a whole answer at once. It builds every response one token at a time, in a tight loop:

  1. Predict a probability for every possible next token, given everything so far.
  2. Pick one token from that distribution (often, but not always, a top-ranked one).
  3. Append it, feed the whole thing back in, and predict again.
  4. Repeat — sometimes thousands of times — until the answer is complete.

This predict-pick-append-repeat cycle is called autoregressive generation. Every answer you've ever gotten from an LLM is this loop run over and over. It's a bit like writing a sentence where each word is chosen knowing only the words before it.

Because step 2 involves a bit of controlled randomness, the same question can produce slightly different wording each time — that's a feature, not a glitch.

The plot twist: emergence

Here's what made LLMs a genuinely big deal. At small scale, a next-token predictor is just a clever autocomplete. But push the size up far enough — more parameters, more data — and abilities that weren't obvious at small scale start to show up: translating languages, writing working code, multi-step reasoning. Nobody hand-coded those in.

It learned to write working code and translate French essentially as a side effect of getting really, really good at finishing your sentences — nobody told it to. Researchers call these surprise skills emergent abilities, and they're a big part of why the field exploded.

Meet the players

You already know the names. Claude (Anthropic), GPT (OpenAI), Gemini (Google), and Llama (Meta) are all LLMs. Different companies, different training, same core machine: a giant neural net predicting the next token.

One more honest detail before we move on: the model never actually sees words. It sees tokens — the puzzle pieces text gets chopped into. Let's go meet them next.

🔤
SECTION 31

Tokens, Embeddings & Vectors

Models don't read words — they read tokens, then turn them into coordinates for meaning.

⚡ WHY IT MATTERSTokens and embeddings are how AI turns text into numbers it can compute with, so meaning becomes math.
Tokens, Embeddings & Vectors: turning words into numbers Text is split into tokens, each token becomes a vector, and similar meanings land close together. raw text unbelievable tokenizer splits it un believ able 3 tokens embedding vectors un [0.21, -0.8, 0.5] bel [0.9, 0.12, -0.3] abl [-0.4, 0.6, 0.7] rows of numbers 2D vector space (similar words cluster) believe trust faith banana apple tk a new token lands next to words with similar meaning
A pipeline showing raw text split into tokens, each mapped to an embedding vector, then placed as a point in a 2D space where similar words cluster.

Here's the plot twist nobody tells beginners: a language model never actually reads words, or even letters. It reads tokens — bite-sized chunks of text. Everything else in this guide is built on top of that one fact.

What's a token?

A token is a sub-word chunk: sometimes a whole word, sometimes a piece of one, sometimes just a space or a punctuation mark. As a rough rule of thumb for English, one token is about 4 characters, or roughly 0.75 of a word. So 100 words lands somewhere near 130 tokens.

Why chop words up? Because keeping a separate entry for every word ever written would be wildly inefficient. Instead, the model learns a kit of reusable pieces. Take unbelievable — it can split into un / believ / able. Those same pieces snap back together to build unthinkable, believer, readable, and thousands more. Think of tokens as the LEGO bricks of text. (Different models slice slightly differently, so the exact pieces vary — but the idea is identical.)

Want to see it? Search for an online "tokenizer" tool, paste in a sentence, and watch your text get sliced into colored chunks. It's the fastest way for this to click.

From token to meaning: the embedding

A model can't do math on the letters c-a-t. So each token gets converted into an embedding — a long list of numbers (a vector) that captures its meaning. Not the spelling. The meaning.

On its own, a token's embedding captures its general meaning, the dictionary-ish gist. The richer, in-context meaning (is "bank" a river or a vault?) gets layered on later by the transformer — that's the next section.

The anchor idea: GPS coordinates for meaning

Think about how GPS works. Two numbers — latitude and longitude — pinpoint any place on Earth. Grand Blanc, Michigan is just a pair of coordinates, and so is Detroit, and because they're close in real life, their numbers are close too.

An embedding does the same trick, but for meaning instead of geography. Its list of numbers pinpoints a word in an imaginary "meaning-space." Words with similar meanings land near each other, like houses in the same neighborhood: cat, kitten, and feline cluster together, while helicopter sits way across town.

Meaning even has direction

The famous demo: take the vector for king, subtract man, add woman, and you land remarkably close to queen.

king − man + woman ≈ queen

That's not a parlor trick — it hints that meaning is geometric and directional: a rough "royalty" direction and "gender" direction actually exist as movements through the space.

Treat this as a tidy ideal, not a guarantee. Real embeddings are messier than the clean textbook example — but the intuition (meaning has shape and direction) is genuinely how it works.

Why you should care: cost and the clock

Tokens aren't just an internal detail — they're the unit of money and the unit of memory. You're billed per token (in and out), and the model can only hold so many at once.

Tokens = your wallet

Providers price per token. A long document in plus a long answer out means more tokens, which means a bigger bill. Concise prompts are cheaper prompts.

Tokens = your budget

The context window is the total number of tokens the model can hold at one time — prompt, conversation, and reply combined. Fill it up and the oldest stuff falls out the back.

The strawberry problem

This is why an LLM can write you a flawless sonnet but fumble "how many R's are in strawberry." It sees chunky tokens, not individual letters — counting characters isn't its native language.

Text≈ Tokens
One short word ("cat")1
"unbelievable"~3 (un / believ / able)
A 100-word paragraph~130
A 10-page document~5,000+

We'll come back to the context window when we talk about inference — for now, just hold the picture: it's a token budget, and you spend it as you go.

Tokens and embeddings are the ingredients. Up next, the transformer is the machine that takes all these meaning-coordinates and lets them talk to each other.
SECTION 32

Transformers & Attention

In 2017 one paper taught machines to read every word at once — and lit the fuse on ChatGPT.

⚡ WHY IT MATTERSSelf-attention lets a model decide which earlier words matter most, the breakthrough that powers every modern LLM.
Self-Attention: every word looks at every other word The focus word “it” weighs the others — thicker, brighter beam = more attention The tired animal crossed because it focus word Stacked layers ×N attention repeats, refining meaning each layer Multi-head many heads look at different relations at once strong weak
The word "it" sends pulsing attention beams to every other word, with the thickest, brightest beam landing on the noun it truly refers to ("animal").

Back at our 2017 stop on the timeline, a paper with the cheeky title "Attention Is All You Need" introduced the transformer — the architecture humming under the hood of virtually every modern LLM. The title was a flex, and for once it held up: attention really was the key idea, and it kicked off the entire ChatGPT era.

The heart of it: self-attention

Here's the one idea that matters. As the model reads, every token (a word or word-piece) looks at the other tokens and decides how much each one matters to its own meaning. That's it. That's self-attention.

Picture a sentence: "The trophy didn't fit in the suitcase because it was too big." What does "it" mean — the trophy or the suitcase? You knew instantly. Self-attention is how the model figures it out too: the word "it" glances at the other words, scores which one is most relevant, and locks onto "trophy." Swap "big" for "small" and the answer flips to "suitcase" — and the model's attention flips right along with it.

Think of a lively dinner party. Every word is a guest who listens to all the other guests at once and decides whose comment matters most for understanding the conversation. That simultaneous, selective listening — everyone weighing everyone else in parallel — is self-attention in one image.

Why it crushed the old way (RNNs)

Before transformers, models called RNNs (recurrent neural networks) read text the way you'd read with a finger under each word — one word at a time, left to right, trying to remember everything that came before. Slow, and forgetful: by the end of a long paragraph, the start had faded.

Attention reads the whole input at once. Two huge wins fall out of that:

Parallelizable

All tokens are processed simultaneously, not single-file. That's what makes training on internet-sized text feasible — and fast on modern hardware (GPUs love doing many things at once).

Long-range context

A word near the end can directly "talk to" a word near the start in a single step, instead of passing the message hand-to-hand. Distant connections stay sharp instead of fading.

 RNN (the old way)Transformer / attention
Reading styleOne word at a timeAll words at once
SpeedSequential, slow to trainParallel, fast to train
Distant wordsFades over distanceConnects directly

Multi-head attention: several viewpoints at once

One round of attention is good. Transformers run several in parallel — called heads — and each head can learn to notice a different kind of relationship. Back at the dinner party: one guest tracks grammar (who's the subject, who's the verb), another tracks topic (which words are about the same thing), another tracks who-refers-to-whom (that "it" → "trophy" link). Combine all those viewpoints and you get a far richer read of the sentence than any single pass could give.

Stacked deep — the "deep" in deep learning

One block of attention isn't the whole model. These blocks are stacked many layers high, each one's output feeding into the next. Early layers tend to catch simple patterns; later layers build toward grammar, meaning, and intent — the same hierarchy-of-abstraction idea from the neural-networks section, just taller. That stacking is the "deep" in deep learning.

One honest catch: attention by itself is order-blind. On its own it treats the input like a bag of words — "dog bites man" and "man bites dog" would look identical to it. Very different news stories. So position information is added in separately (called positional encoding) to tell the model where each word sits. Attention figures out what relates to what; position tells it what came where.
If you remember one thing: a transformer reads everything at once and lets every word decide which other words to listen to. That single trick — selective listening, in parallel, stacked deep — is what powers today's LLMs.

So now we have the architecture. But an empty transformer knows nothing — it's a brilliant student who hasn't opened a single book yet. How do you actually teach it? That's training, and it's our next stop.

🏋️
SECTION 33

Training, Fine-Tuning & RLHF

A model goes to school in three stages: read the library, take a class, get coached.

⚡ WHY IT MATTERSTraining stages turn a raw text-predictor into a helpful, safe assistant that follows instructions and human values.
Training, Fine-Tuning & RLHF — how a raw model becomes an aligned assistant Follow the token as it advances through three stages, getting brighter and better aligned. 1 Pretraining huge text corpus predict the next word 2 SFT instruction pairs Q: prompt A: good answer learns to follow instructions 3 RLHF humans rank to reward model to tuned human ranks best reward model scores guide tuning aligned raw
A three-stage pipeline (Pretraining to SFT to RLHF) with a token traveling left to right, getting brighter and more aligned at each stage.

A finished AI assistant isn't built in one step — it's educated in three. Picture a model going to school: first it reads the entire library, then it takes a class on answering questions, then mentors coach it until it's genuinely useful.

The three-class education of a model

Everything in this section happens during training — the slow, expensive "studying" phase where the model's internal numbers (its weights) get tuned. Once training wraps up, those weights are locked in and the model ships out to do its job (that's inference, covered elsewhere). Here are the three classes it takes, in order.

1. Pretraining — read the library

The model reads internet-scale text and plays one game trillions of times: predict the next token (a token is a word or word-fragment). No humans label anything — the text is its own answer key: hide what comes next, guess it, check, repeat. This is where nearly all its language skill and world knowledge come from.

2. SFT — take a class

Now it studies curated (instruction, response) pairs written by people, so it learns to actually follow instructions instead of just continuing text. This mostly shapes behavior and tone — not new facts.

3. RLHF — get coached

Humans rank competing answers. A "reward model" learns those preferences, then nudges the model toward being helpful, honest, and harmless — qualities too fuzzy to write down as labeled examples.

Stage 1: Pretraining (self-supervised next-token prediction)

The model chews through a colossal pile of text and repeatedly guesses the next token. Because the right answer is simply whatever actually came next, no human grading is needed — that's why it's called self-supervised. The result is a base model: a raw text-completer that has absorbed grammar, facts, and patterns but isn't yet a polite assistant.

A base model fresh out of pretraining has read everything ever written but never had a real conversation — like a hermit genius who needs one quick class on small talk before it's any use.

Stage 2: Supervised fine-tuning (SFT)

Here the base model keeps training, but now on hand-picked examples of good instruction-and-answer pairs. It's the difference between someone who has read every cookbook and someone who's taken a class on actually answering "what should I make for dinner?" SFT teaches how to respond — the helpful format, the tone, the do-what-was-asked behavior. It generally doesn't pour in new knowledge; that came from Stage 1.

Stage 3: RLHF and preference tuning

Some things are hard to capture with example answers — "be honest," "don't be smug," "refuse this politely." So instead of writing perfect answers, humans look at two model replies and pick the better one. A reward model learns to predict those human preferences, and reinforcement learning gently steers the main model toward answers the reward model scores highly. It's coaching: try, get ranked, adjust, repeat.

Anthropic adds a twist called Constitutional AI. Instead of humans labeling every single case, the model critiques and revises its own answers against a written set of principles — a "constitution." Claude is trained this way, which scales the coaching without needing a human to weigh in on every judgment.

The cost reality

The three stages are wildly unequal in price. Pretraining is the budget-buster; the later stages are cheap by comparison but disproportionately decide how the model behaves.

StageTeachesNeeds humans?Cost
PretrainingLanguage + world knowledgeNo (self-supervised)Enormous
SFTFollowing instructionsYes (curated pairs)Much smaller
RLHF / Constitutional AIHelpful, honest, harmlessYes (rankings / principles)Smaller still, but decisive
The full arc in one line: read the entire library (pretraining), take a focused class on answering questions (SFT), then get coached by mentors who rank your answers until you're genuinely helpful (RLHF / Constitutional AI).
🔮
SECTION 34

Inference, Context & Hallucinations

Inference is the model improvising — give it a script or it makes things up.

⚡ WHY IT MATTERSKnowing what the model can "see" and how temperature shapes output helps beginners trust answers and spot AI mistakes.
Inference, Context & Hallucinations The model only "sees" tokens inside its context window. Temperature tunes focus vs. creativity. Long scroll of tokens The cat sat on the warm soft mat near fire all day context window (what it sees) Temperature dial low high focused to creative ! Hallucination missing context = guess Tokens outside the frame are forgotten — the model cannot read them. If needed facts fall outside the window, the model may invent them.
A sliding context window highlights only the tokens it can read on a long scroll, a temperature dial wiggles between focused and creative, and missing context triggers a hallucination warning.

Training built the brain. Inference is what happens when you actually use it — and understanding this one moment explains why models are so fluent, so fast, and occasionally so confidently, spectacularly wrong.

Inference: using the frozen brain

Remember training — millions of examples nudging billions of dials until the model got good? Inference is the exam, not the studying. The dials are now locked. Every time you chat with a model, you're running that finished, frozen brain forward to generate text. The weights don't change, the skills are fixed in place. It doesn't learn from your conversation, and it won't remember you tomorrow.

The context window: its tiny working memory

The model has no memory of your last chat and no private notebook. All it can "see" is what fits in its context window right now — think of it as the model's short-term working memory, or the desk it's allowed to spread papers on.

Here's the catch beginners always miss: that desk holds everything at once — your prompt, the entire back-and-forth history, the system instructions, and the answer it's currently writing. Anything that doesn't fit on the desk is simply gone. Not "kind of forgotten" — invisible. If a detail scrolled off the top, the model isn't ignoring it; from where it sits, that detail never existed.

This is why a long chat can suddenly "forget" something you said earlier, and why pasting the relevant document into the conversation works so much better than expecting the model to recall it. On the desk = it can use it. Off the desk = it can't.

Temperature: the creativity dial

When the model generates, it's choosing the next word from a list of likely candidates. Temperature (and related sampling settings) decides how adventurous that choice is.

Low temperature

Focused, predictable, repeatable. It picks the safest, most likely word almost every time. Great for code, data extraction, and factual answers.

High temperature

Loose, surprising, creative. It gambles on less-likely words. Great for brainstorming, poetry, and "give me ten wild ideas."

Temperature controls randomness, not correctness. Turning it down makes answers more consistent — it does not make them more true. Temperature zero doesn't make a model honest; it just makes it lean toward the same confident answer every time, right or wrong.

Why models hallucinate

A hallucination is when the model states something false with total confidence — a fake citation, a made-up date, a plausible-sounding "fact" that isn't. Beginners assume this is a bug. Mostly, it's the core mechanism working exactly as designed.

The model doesn't look up facts in a database. It predicts plausible text, one word at a time, based on patterns from training. When it has seen the answer often, the most plausible continuation happens to be correct. When it hasn't, the most plausible continuation is a confident-sounding guess — because in its training data, confident sentences almost never trail off with "...but honestly I'm not sure."

The anchor analogy: the model is an improv actor who never breaks character to say "I don't know." Hand them a script — your source document — and they're sharp and accurate. Leave them to ad-lib from memory and they'll invent details on the spot to keep the scene going. The show must go on.

The fix is shockingly effective

Here's the rule of thumb worth memorizing. When a model answers open-ended questions purely from memory, it can be wrong a meaningful chunk of the time — and you often can't tell which chunk. Hand it a source document to work from — and ask it to answer using that document — and accuracy improves dramatically, because now it's reading instead of guessing.

The takeaway every pro repeats: models are best as editors and synthesizers, not as knowledge sources. Don't ask "what does the contract say?" — paste the contract in and ask "according to this, what does it say?"

Your practical mitigation toolkit

MoveWhy it helps
Give it context / sources (RAG)RAG = retrieval-augmented generation: fetch relevant text and hand it to the model. The script instead of improv — the single biggest accuracy win.
Ask for citationsForces it to point at where a claim came from, so you can check it (and still verify — citations can be faked too).
Lower the temperatureFor factual work, fewer creative gambles means fewer invented flourishes.
Verify anything high-stakesMedical, legal, financial, or money-moving answers always get a human check.

The knowledge cutoff

Every model was trained up to a certain date and then frozen. On its own, it knows nothing that happened after its knowledge cutoff — last week's news, today's prices, a result from this morning. Ask about recent events and it'll either admit the gap or, worse, confidently improvise. The fix is the same one as above: don't rely on stale memory, drop the fresh information straight into the context window.

Notice the pattern: nearly every fix is "give the model the right context." So what if the model could fetch its own context — search the web, read a file, run a calculator — and act on it in a loop, instead of waiting for you to paste everything in? Give it tools and a loop, and you've built an agent. That's next.
🔁
SECTION 35

Agents, Tools & the AI Loop

An agent is an LLM in a loop with hands: it acts, checks its work, and tries again.

⚡ WHY IT MATTERSValidated the SVG against all six checks and fixed two real issues plus hardened compatibility.
Agents, Tools & the AI Loop An agent keeps cycling: it thinks, acts with a tool, sees the result, and repeats until done. Perceive read the task Think (LLM) decide next step Act call a tool Observe see the result picks one Tools search look things up run code compute & test edit file change the work
A four-step agent loop (Perceive, Think, Act, Observe) with a glowing token circling clockwise and a side rack of tools the Act step calls.

A chatbot answers and stops. An agent reads a goal, takes a real action, looks at what happened, and keeps going until the job is done. That loop is the whole difference.

The leap: from chatting to doing

So far we've treated an LLM as something you talk to: prompt in, text out, done. An agent takes that same model and drops it into a loop where it can take actions in the world — search the web, run code, edit a file — and then react to the results. One single model call is not an agent. The loop is what earns the name.

The difference between a chatbot and an agent is the difference between someone who gives you directions and someone who actually grabs the keys and drives.

The core loop, one trip around

Every agent runs the same four-beat cycle, over and over, until the goal is met:

  1. Perceive — read the goal and the current state ("the tests are failing; here's the error").
  2. Think — plan the next single step ("I should open the file the error points to").
  3. Act — call a tool or run code (open the file, run the test, search the docs).
  4. Observe — read the result, then loop back to step 1 with that new information.

The anchor: a chef tasting the sauce

Picture a cook who never tastes anything — they just follow a fixed recipe and pray. Now picture a real chef: taste the sauce (observe), decide it needs salt (think), add a pinch (act), taste again (loop). That feedback cycle — not one frozen recipe — is exactly what makes an agent an agent.

What's a "tool," really?

A tool is just a function the model is allowed to call. Each one comes with three things: a name, a plain-English description of what it does, and a typed input schema (what arguments it expects). The model reads those descriptions and picks the right tool for the moment. Without tools, an LLM can only produce text — it can say "I'll search for that," but it can't actually do it.

Web search

Fetch fresh, real information the model never memorized.

Run code

Execute a script and read the actual output — math it can verify, not guess.

Read / write files

Open a file, edit it, save it. This is how Claude Code touches your project.

Why the loop is the entire point

Feeding each result back into the next step lets the model check its own work and self-correct. Wrote a function? Run the test. Test failed? Read the error, fix the line, run it again. That tight perceive-act-observe rhythm is exactly how Claude Code operates — it doesn't just suggest a change, it makes the change, runs your tests, and reacts to what breaks.

Memory and a leash

Two practical pieces keep the loop sane. First, working memory: every tool result gets added to the conversation, so the model carries what it learned into the next step. Second, a budget — a maximum number of iterations — so it can't spin forever chasing a goal it can't reach.

Remember the hallucination problem from earlier? Giving an agent a retrieval tool is one of the cleanest fixes. Instead of recalling a fact from fuzzy memory, the agent fetches the relevant document and reads from it — this is the idea behind RAG (retrieval-augmented generation). The mantra: an LLM is a great editor but a shaky encyclopedia, so paste in the source rather than trusting it to remember.

Where loops go wrong

Failure modeWhat it looks likeThe fix
Infinite loopTries the same thing foreverA max-iteration budget and clear stop conditions
Wrong toolPicks the search tool when it needed codeClear tool names and descriptions
Hallucinated result"Pretends" a tool ran and invents outputHuman-in-the-loop checkpoints on risky steps
An agent acting on the world is powerful — and that power cuts both ways. The same hands that fix a file can delete one. Up next: the real limits and risks, and how to keep an agent safe.
⚖️
SECTION 36

AI Safety, Ethics & Limits

AI is a brilliant intern, not an oracle — know the limits and you become a power user.

Here's the good news and the catch in one breath: AI is genuinely useful and genuinely limited. Knowing exactly where it falls short is what turns you into a power user instead of a worrier.

Picture a brilliant, fast new intern. Incredibly capable, worth handing real work to — but you still review the important stuff, you don't give them the company passwords, and you don't let them sign the contract alone. That's the exact posture to take with AI. Trust, but verify.

The real limits (none of these are dealbreakers — just things to know)

It hallucinates

An LLM (large language model — the kind of AI behind chatbots) predicts plausible text, not true text. With no fact to recall, it'll invent one that sounds right. It can cite a study that doesn't exist with a completely straight face — treat it like that friend who's amazing at parties but shouldn't be your only source.

It's biased

The model reflects its training data — the huge pile of text it learned from. If that text leans a certain way on a topic, the answer might too. Bias isn't a glitch bolted on; it's baked in by what it read.

No real understanding

There's no consciousness, no feelings, no "knowing" behind the words. It's a stunningly good pattern machine, not a tiny mind in a box.

Knowledge cutoff

It only learned up to a certain date, so it has no built-in memory of recent events. Ask about last week's news and it may go stale, guess, or need a live search tool to catch up.

The dangerous combo isn't that AI is sometimes wrong — it's that it's wrong confidently, in fluent, professional-sounding prose. Smooth delivery is not evidence of accuracy.

The ethics surface, briefly

ConcernWhat it means in plain terms
Bias & fairnessOutputs can quietly favor or harm some groups — a real risk in hiring, lending, or medical contexts.
PrivacyDepending on the tool and its settings, what you type may be processed, logged, or used to improve models. Personal and confidential data deserve caution.
Copyright & provenanceModels trained on others' work raise fair questions about credit, ownership, and consent. ("Provenance" just means where the data came from.)
JobsSome tasks get automated; many roles shift rather than vanish. Disruption is real and uneven.
Misinformation & deepfakesThe same tech that drafts your email can mass-produce convincing fakes. Healthy skepticism scales.

Alignment: making AI do what we actually want

Alignment is the engineering effort to make AI behave the way we genuinely intend — not just the literal words we typed. The shorthand is HHH: Helpful, Honest, and Harmless. Sounds simple. The hard part is all three at once — a model that refuses everything is harmless but useless; one that answers anything is helpful but unsafe. Balancing them is the open challenge.

Your practical user rulebook

  1. Keep a human in the loop. AI drafts and suggests; you decide and approve. You're the editor, not the audience.
  2. Verify high-stakes output. Medical, legal, financial, or anything you'd put your name on? Double-check it against a real source.
  3. Never paste secrets. No passwords, API keys, client data, or anything you wouldn't hand to a stranger.
  4. Cite and check sources. Ask for references — then actually confirm they exist and say what the AI claims.
Best practice from the pros: AI shines as an editor and synthesizer, not a memory bank. Paste in the source document and ask it to work from that — error rates plummet when it isn't recalling from thin air.

The calm takeaway

None of this is doom. These are solvable, ongoing engineering problems, and a lot of smart people are actively working on every one of them. The right stance is caution, not fear — the same everyday judgment you'd use with any powerful new tool.

You've now met every big idea in AI — from tokens and training to safety and alignment. Up next: a quick glossary to lock it all in.
📖
SECTION 37

AI Jargon Buster

A pocket phrasebook for every AI term you met on the tour — point and remember.

You just toured the whole country of AI. This is the pocket card of key phrases to point at when one slips your mind — no memorizing required, just a quick "oh right, that's what that means." Each definition deliberately echoes the analogy from its home section, so it should feel like running into an old friend.

How to use this page

Read top to bottom and the terms build on each other; or just Ctrl+F for the one word that's bugging you. Definitions are one plain line each — zero jargon hiding inside the jargon.

Foundations

AI AGI ML Deep Learning Neural Network Parameter / Weight

Language plumbing

Token Embedding Vector Context Window

Architecture

Transformer Attention LLM

Training

Pretraining Fine-tuning RLHF

Using it

Inference Temperature Hallucination Prompt RAG

Frontier

Agent Multimodal Open vs Closed weights

The glossary

TermOne-line plain definition
AIAny machine doing something that used to need a human brain.
AGIThe hypothetical "can do anything a person can" AI — not here yet.
MLMachine Learning: software that learns patterns from examples instead of being hand-coded.
Deep LearningML using many stacked layers — the assembly line that powers today's AI.
Neural NetworkA web of tiny calculators (loosely brain-inspired) that turns inputs into answers.
Parameter / WeightThe faders on a mixing board — internal numbers the model tunes while learning (weights are the main kind).
TokenThe LEGO bricks of text — words and word-chunks the model reads in pieces.
EmbeddingGPS coordinates for meaning — numbers that place a word on a map where similar ideas sit close.
VectorJust the list of numbers itself — the actual coordinates an embedding lives at.
Context WindowThe model's short-term memory — how much it can hold in mind at once.
TransformerThe engine design behind modern AI — it reads all the words at once instead of one by one.
AttentionEveryone in the room sharing notes — each word decides which others to listen to.
LLMLarge Language Model: phone autocomplete scaled up a trillion-fold.
PretrainingReading a huge pile of text to learn how language works — the raw study phase.
Fine-tuningExtra coaching on a focused subject to specialize an already-trained model.
RLHFTeaching the model manners with human thumbs-up / thumbs-down so it answers helpfully.
InferenceTaking the exam — actually using the trained model to get an answer.
TemperatureThe creativity dial — low for careful and predictable, high for loose and surprising.
HallucinationConfidently making something up — plausible-sounding text with no truth check.
PromptWhat you type in — your instructions and question to the model.
RAGLetting the model peek at your documents first, so it answers from facts, not just memory.
AgentAn LLM given hands — tools like search, code, and memory so it can act, not just talk.
MultimodalA model that also handles images, audio, or video — not text alone.
Open vs Closed weightsOpen = you can download and run the model yourself; closed = you rent it through someone's API.
Stuck on a term mid-track? Two big takeaways carry most of the weight: an LLM predicts likely next text (so paste in real context to keep it honest), and temperature tunes how adventurous it gets.

If you only remember three

Token = text in pieces. Embedding = meaning as coordinates. Inference = using the trained model. Almost everything else hangs off these.

Why echoes, not new words

Each line reuses the picture from its home section on purpose — recognition beats re-learning every single time.

Recognize, don't recite

You don't need to define these cold. Spotting them in the wild and nodding "ah, I know that one" is the whole goal.

You now speak AI. Twenty-four terms ago "token" meant an arcade coin and "temperature" meant a thermostat. Tuck this card in your pocket and go enjoy the country you just toured.