Why Master Claude
One model family, three surfaces, infinite leverage
Claude is one family of models you reach three ways — a chat app, a coding agent, and an API — and mastery is just knowing which one to grab.
Most people only ever open the chat box and stop there. The real wins come from picking the right surface for the job, then giving Claude the context it needs to nail it on the first try.
This hub builds you up in four layers: Foundations Core Power Features Mastery. Start here, then go in order or jump to what you need.
The three surfaces
Each surface runs the same underlying models (Opus, Sonnet, Haiku) but is tuned for a different kind of work.
Web & Desktop App
claude.ai and the desktop app. Best for chat, research, writing, and uploading files. Has Projects, Artifacts, and web search.
Claude Code
An agent in your terminal or IDE. Best for real coding — it reads, edits, and runs files in your repo, runs tests, and makes commits.
API
Build Claude into your own apps and scripts. Best for automation, pipelines, and products. Full control over model, prompts, and caching.
| Surface | Reach it via | Reach for it when… |
|---|---|---|
| Web / Desktop | claude.ai or the app | You want to think, research, write, or analyze files in a chat. |
| Claude Code | claude in a terminal | You want code changed, tests run, or a repo refactored. |
| API | Anthropic SDK / HTTP | You want Claude inside your own software, on a schedule or at scale. |
The 60-second "you can do X" tour
A taste of what each surface unlocks. The rest of the hub goes deep on every one of these.
In the web app
Drop in a messy spreadsheet and get back a clean chart, or have Claude build a working mini-app you can share.
Analyze a file
Upload a CSV or PDF and ask “what are the three biggest trends here?” Claude reads it and answers with citations.
Make an Artifact
“Build a tip calculator as an app.” Claude renders a live, usable tool in a side window you can keep and share.
Use a Project
Upload your style guide once; every chat in that Project follows it automatically.
In Claude Code
Ask in plain English and Claude edits your actual files. Start interactive, or run it headless in a script.
# Interactive: open the agent in your repo
claude
# One-shot: ask a question and exit (headless)
claude -p "Find and fix the off-by-one bug in pagination.py"
# Pipe a file in and let Claude explain it
cat error.log | claude -p "What caused this crash?"
With the API
A few lines wires Claude into anything you build. Pick a model by name and send a message.
from anthropic import Anthropic
client = Anthropic()
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
)
print(msg.content[0].text)
Pick the right model
All three surfaces let you choose a model. The rule of thumb: start cheap and fast, move up only when you hit a wall.
| Model | Best for | Speed | Price (in / out per MTok) |
|---|---|---|---|
| Haiku 4.5 | Simple, high-volume, real-time tasks | Fastest | $1 / $5 |
| Sonnet 4.6 | Most production work — coding, analysis, content | Fast | $3 / $15 |
| Opus 4.8 | Hardest reasoning, big refactors, long agentic runs | Moderate | $5 / $25 |
/model sonnet, /model opus, or /model haiku. /model default picks a smart default and may auto-route Opus for harder turns.The mindset that makes Claude great
The difference between a frustrating session and a magical one is rarely the model — it is how you talk to it. Four habits carry almost all the weight.
Be specific
Name files, state constraints, point to an example. “Claude can infer intent, but it can’t read your mind.” Precise instructions mean fewer corrections.
Give context
Share the goal, the relevant code, and what “done” looks like. Good context up front beats ten rounds of clarification.
Iterate
Treat the first answer as a draft. Steer, refine, and rewind. If two corrections fail, start fresh rather than piling on.
Delegate
Hand off broad research or a self-contained task to a subagent. It works in its own context and returns just the summary you need.
How this hub is organized
- Foundations — the surfaces, models, and core mindset (you are here).
- Core — everyday moves: prompting well, Projects, slash commands, managing context.
- Power Features — subagents, skills, MCP servers, hooks, and plan mode.
- Mastery — automation, plugins, prompt caching, and multi-agent orchestration.
Work through it in order for a clean ramp, or jump straight to the layer that matches what you need today.
Pick the Right Brain
Match the model to the job — pay for power only when it earns its keep
Four Claude tiers, one rule: cheap for simple, balanced for daily work, deepest reasoning when it pays off — and Fable 5 when the job is genuinely hard.
In 2026 the lineup is Opus 4.8 for hard reasoning, Sonnet 4.6 as the everyday coding workhorse, and Haiku 4.5 for fast, high-volume work. Picking right saves money and time without hurting quality. The official advice is blunt: use Haiku for simple tasks, Sonnet for most production workloads, Opus only for the most complex reasoning.
claude-fable-5) is now the most capable model — state-of-the-art on nearly every benchmark — at $10 / $50 per MTok. Use it for the hardest work; keep Haiku/Sonnet for everyday tasks. See What's New →Meet the lineup
Fable 5
The new top tier (June 2026). Best-in-class coding, analysis, vision, and science; runs autonomously across millions of tokens. Premium price — reach for it only on genuinely hard tasks.
Opus 4.8
Anthropic's most capable model. Built for long-horizon agentic coding, large-scale refactors, and complex systems engineering. Moderate speed, premium price.
Sonnet 4.6
The best balance of speed and intelligence. Your default for code generation, data analysis, and agentic tool use. Fast and mid-priced.
Haiku 4.5
The fastest model with near-frontier intelligence. Great for real-time apps, bulk processing, and sub-agent tasks. Cheapest by far.
Side-by-side comparison
| Model | API id | Speed | Price (in / out per MTok) | Context | Max output | Thinking |
|---|---|---|---|---|---|---|
| Opus 4.8 | claude-opus-4-8 | Moderate | $5 / $25 | 1M | 128k | Adaptive |
| Sonnet 4.6 | claude-sonnet-4-6 | Fast | $3 / $15 | 1M | 64k | Extended |
| Haiku 4.5 | claude-haiku-4-5 | Fastest | $1 / $5 | 200k | 64k | Extended |
think hard and ultrathink.The routing cheat sheet
When in doubt, follow this. Start cheap and upgrade only when you hit a real capability gap.
| Your task | Use | Why |
|---|---|---|
| Bulk edits, log lines, formatting, simple Q&A, sub-agents | Haiku | Fastest and 5x cheaper than Opus |
| Main development: code gen, data analysis, tool use | Sonnet | Best speed/intelligence balance for production |
| Architecture, multi-file debugging, big refactors, deep research | Opus | Deepest reasoning when a wrong answer is costly |
How to switch models
In Claude Code
Use /model inside a session. Run it bare to open a picker, or name a model to switch and save it as your default.
/model # open interactive picker
/model opus # switch to Opus and save as default
/model sonnet # switch to Sonnet
/model haiku # switch to Haiku
/model default # recommended default; may auto-route Opus for hard tasks
Press s in the picker to switch for the current session only without changing your saved default.
At launch from the CLI
The --model flag sets the session model and overrides both the model setting and the ANTHROPIC_MODEL env var. Add --fallback-model for automatic failover.
claude --model sonnet
claude --model claude-opus-4-8 --fallback-model sonnet
claude -p "summarize these logs" --model haiku # headless, cheap
In the Claude.ai app
Use the model picker near the chat input to choose Opus, Sonnet, or Haiku per conversation. Switching is per-chat and instant.
Big context windows (1M tokens)
Opus 4.6/4.7/4.8 and Sonnet 4.6 carry a 1M-token context window; Haiku 4.5 has 200k. Best of all, there is no surcharge tier — a 900k-token request costs the same per token as a 9k one, and caching plus batch discounts apply across the full window.
Squeeze more value per dollar
Two built-in levers cut cost hard. Prompt caching is automatic in Claude Code — cache reads are 90% cheaper than fresh input. The Batch API gives a flat 50% off when latency does not matter.
| Model | Cache hit (read) | Batch (in / out) |
|---|---|---|
| Opus 4.8 | $0.50 / MTok | ~$2.50 / $12.50 |
| Sonnet 4.6 | $0.30 / MTok | $1.50 / $7.50 |
| Haiku 4.5 | $0.10 / MTok | $0.50 / $2.50 |
Route per sub-agent, not just per session
You don't have to pick one model for everything. Give each sub-agent its own model so cheap workers stay on Haiku while the main thread reasons on Opus or Sonnet.
# .claude/agents/researcher.md frontmatter
---
name: researcher
description: Searches the codebase, use proactively
model: haiku
tools: Read, Grep, Glob
---
The model field accepts an alias (sonnet, opus, haiku), a full id, or inherit (the default). For one-off overrides, set the CLAUDE_CODE_SUBAGENT_MODEL env var, which wins over frontmatter.
Heads-up on dates
20250514 ids) retire Jun 15 2026 — migrate to Opus 4.8 and Sonnet 4.6.Install & First Run
From zero to your first prompt in five minutes
Claude Code is a terminal tool that turns your codebase into a workspace you can talk to.
This section gets you installed, signed in, and running your first session safely. You only do the install and login once; after that you just type claude in any project.
Step 1 — Install the CLI
Claude Code runs from your terminal. The fastest way to install it is the official script, which works on macOS and Linux.
# macOS / Linux
curl -fsSL https://claude.com/install.sh | bash
# Or via npm (any OS with Node 18+)
npm install -g @anthropic-ai/claude-code
claude --version. If the command is not found, restart your terminal so it picks up the new PATH.Step 2 — Sign in
The first time you launch Claude Code it walks you through login in your browser. You can sign in with a Claude subscription (Pro or Max) or an Anthropic Console API account.
# Start Claude, then follow the browser prompt
claude
# Or trigger login directly any time
/login
/doctor to diagnose install and auth problems, and /status to see which account you are signed in as.Step 3 — Run Claude in a project
Always cd into a real project folder before starting. Claude reads files relative to where you launch it, so the folder you start in becomes its working directory.
- Open a terminal in your project root.
- Type
claudeand press Enter. - Type a question in plain English and press Enter.
cd ~/Projects/my-app
claude
Three ways to start
Interactive
claude opens the live REPL where you chat back and forth. This is the normal way to work.
With a first prompt
claude "explain this repo" opens the REPL but starts working on that prompt right away.
One-shot (headless)
claude -p "list the API routes" prints one answer and exits. Great for scripts.
# Pipe a file in and get a quick answer back
cat error.log | claude -p "explain this stack trace"
The interactive REPL
The REPL is the prompt box where you type. Anything that does not start with / is a message to Claude; anything starting with / is a built-in command. Try these first.
| Type this | What it does |
|---|---|
/help | Lists every slash command and basic usage. |
/status | Shows your account, model, and session info. |
/clear | Wipes the conversation to free up context. |
/cost | Shows token usage for the session. |
Step 4 — Generate CLAUDE.md with /init
Run /init once per project. It scans your repo and writes a CLAUDE.md file describing the structure, build and test commands, and conventions. Claude reads this file at the start of every future session, so it remembers how your project works.
/init
/init, open CLAUDE.md and trim it. For each line ask: would removing this make Claude make a mistake? A short, sharp file beats a bloated one that gets ignored.You can edit memory any time with /memory, or append a quick note by starting a line with # in the prompt box.
Trust and permissions (read this)
Claude Code asks before it edits files or runs shell commands. The first time you open a new folder it also asks you to confirm you trust that directory. This is your safety net, so do not blindly approve everything.
Permission modes
Press Shift+Tab to cycle modes during a session, or start in one with a flag. plan mode is the safest place to begin: Claude can read and think, but cannot change anything.
| Mode | Behavior |
|---|---|
| default | Asks before edits and commands. |
| plan | Read-only. Explores and proposes a plan, makes no changes. |
| acceptEdits | Auto-approves file writes; still asks for risky shell commands. |
# Start a session in plan mode
claude --permission-mode plan
--dangerously-skip-permissions while learning. It skips every prompt, including destructive ones. Use it only in throwaway sandboxes you do not care about.To see or change which tools are auto-allowed, run /permissions. Approved rules are saved so you are not asked the same thing twice.
How to exit
When you are done, leave the REPL cleanly. Your conversation is saved so you can pick it back up later.
# Inside the REPL, any of these exit:
/exit
# or press Ctrl+D
# or press Ctrl+C twice
claude --continue (or -c) to resume the most recent conversation in that folder, or claude --resume to pick from a list.Your first five minutes, recap
- Install with the script, verify with
claude --version. cdinto a project and runclaude, then/loginif prompted.- Run
/initto createCLAUDE.md, then prune it. - Start in Shift+Tab plan mode and read each permission prompt.
- Exit with
/exit; resume later withclaude -c.
Terminal Commands & Flags
Drive Claude from your terminal: interactive, headless, and scriptable
The claude command is two tools in one: a live coding partner and a scriptable engine you can pipe, loop, and bake into CI.
Run it plain for a chat session, or add -p to get one answer and exit. Everything in between is flags that tell Claude which model, which session, and how much it's allowed to touch.
Three ways to start
Interactive REPL
Type claude alone for a full session, or claude "query" to open with a first prompt already sent.
Headless
claude -p "query" prints the answer and exits. The -p (alias --print) flag is the gateway to all scripting.
Piped input
Feed any file or command output to Claude over stdin, then ask a question about it.
# Interactive, with an opening prompt
claude "Walk me through the auth module"
# Headless one-shot: print and exit
claude -p "List the top 3 risks in this repo"
# Pipe content in, ask about it
cat error.log | claude -p "Explain the root cause in 2 sentences"
-p — including --continue, --resume, --model, --permission-mode, and --output-format.Resuming work: continue vs resume
When: you closed the terminal but want to pick up where you left off. --continue grabs the latest; --resume lets you choose.
# Reopen the most recent conversation in this directory
claude --continue # alias: claude -c
# Scripted continuation (combine with -p)
claude -c -p "Now check for type errors"
# Pick a specific past session by name or ID
claude --resume auth-refactor # alias: claude -r
claude -r abc123 "Pick up the review"
# No argument = interactive picker of past sessions
claude --resume
--name / -n so it's easy to find later: claude -n auth-refactor. As of v2.1.144, background sessions show in the picker tagged bg.Picking a model
Why: match the model to the job — Haiku for speed, Sonnet for most work, Opus for the hardest reasoning. --model overrides both the model setting and the ANTHROPIC_MODEL env var.
# Use an alias
claude --model opus
claude --model sonnet
# Or a full model id
claude --model claude-sonnet-4-6
# Add a fallback if the primary is unavailable
claude --model opus --fallback-model sonnet
Permission modes: how much can it touch?
When: you want Claude to plan without editing, auto-accept edits, or lock things down for automation. --permission-mode overrides the defaultMode from your settings.
# Read, search, and think only — no edits, no destructive commands
claude --permission-mode plan
# Auto-approve file writes + mkdir/touch/mv/cp
claude --permission-mode acceptEdits
# Locked-down automation: deny anything not in your allow rules
claude -p --permission-mode dontAsk "Run the test suite"
default, acceptEdits, plan, auto, dontAsk, bypassPermissions. Reach for --allowedTools to auto-approve only specific tools instead of opening everything.Adding extra working directories
Why: let Claude read and edit files outside the current folder (a sibling app, a shared lib). Each path is validated as an existing directory.
claude --add-dir ../apps ../lib
--add-dir grants file access but most .claude/ config is NOT discovered from added dirs. To persist them across sessions, set permissions.additionalDirectories in your settings file. In-session, /add-dir <path> does the same.One-shot scripting that survives turns
How: use --output-format json to capture a session_id, then feed it back with --resume. This is the backbone of multi-step automation.
# Capture the session id from a headless run
session_id=$(claude -p "Start a code review" \
--output-format json | jq -r '.session_id')
# Continue that exact session in a later step
claude -p "Now summarize the findings" --resume "$session_id"
--bare. It skips auto-discovery of hooks, skills, plugins, MCP servers, and CLAUDE.md, giving Claude only Bash + file read/edit. Load context explicitly via --append-system-prompt, --settings, --mcp-config, or --agents. It's recommended for scripted/SDK calls and is slated to become the default for -p.The flags that matter most
| Flag | Alias | What it does |
|---|---|---|
--print | -p | Headless: run, print the answer, exit. Enables piping and scripting. |
--continue | -c | Resume the most recent conversation in this directory. |
--resume | -r | Resume a session by ID/name, or open a picker with no argument. |
--model | — | Set the model by alias (sonnet, opus) or full id. |
--fallback-model | — | Model to use if the primary is unavailable. |
--permission-mode | — | Start in plan, acceptEdits, dontAsk, etc. |
--add-dir | — | Add extra working directories for read/edit. |
--output-format | — | json exposes the session_id for chaining steps. |
--name | -n | Label a session so you can --resume it by name. |
--allowedTools | — | Auto-approve specific tools without prompting. |
--bare | — | Skip auto-discovery for fast, reproducible scripted runs. |
--fork-session | — | Resume but write to a new session ID instead of reusing it. |
Two flags to handle with care
# Skip ALL permission prompts (use with caution)
claude --dangerously-skip-permissions
# Safer: start in plan mode, but make bypass available later
# via the Shift+Tab mode cycle
claude --permission-mode plan --allow-dangerously-skip-permissions
--dangerously-skip-permissions removes every guardrail. The --allow-dangerously-skip-permissions variant only adds bypassPermissions to the Shift+Tab cycle — you still start in a safe mode and switch deliberately.Monitoring parallel agents
claude agents opens a view to monitor and dispatch background sessions; claude agents --json prints live sessions as a JSON array for scripting. It shares --model, --permission-mode, --add-dir, and --mcp-config with the top-level command.
-p for "answer and leave," -c for "keep going," --resume for "go back to that one," and --output-format json + --resume for everything stitched together in a script.Slash Commands
Type / to drive the REPL — and build your own commands
Slash commands are the steering wheel of the Claude Code REPL: type / and a menu of built-in controls and your own custom commands appears.
Built-in commands manage your session, context, models, and tools. Custom commands are just Markdown files you drop in a folder. Master both and you stop fighting the tool and start flying it.
/help any time to list every available command, including ones this guide does not cover. Press / at the prompt to fuzzy-search them inline.Context control: /clear vs /compact
This is the single most important pair. Claude's context window fills fast, and quality drops as it fills. These two commands are how you fight back.
/clear
Wipes the whole conversation and frees all context. Fresh context is free. Use it between unrelated tasks.
/compact
Replaces history with a structured summary, keeping key facts. Use mid-task when you still need what came before.
| Situation | Use |
|---|---|
| Finished a task, starting a new unrelated one | /clear |
| Mid-feature, context near full, still need the thread | /compact |
| Stuck after two failed fixes on the same bug | /clear, then re-prompt |
/compact takes optional focus instructions so the summary keeps what matters.
/compact focus on the API changes and the failing test
/compact and switching models invalidate the prompt cache, forcing a slow, expensive uncached turn. Plan when you trigger them.Pick the right model: /model
Switch the active model without restarting. Run it bare to open a picker, or pass an alias.
/model opus # deepest reasoning, hard refactors
/model sonnet # best all-round speed + intelligence
/model haiku # fastest, cheap, simple tasks
/model default # recommended default; may auto-route Opus on hard work
Session lifecycle: /resume and friends
Your conversations are saved. Pick one back up instead of starting cold.
/resume # lists past conversations; pick one to continue
/resume auth-fix # resume a session you named earlier
From the shell, the same idea works via flags: claude --continue (alias -c) reopens the most recent conversation, and claude --resume <id-or-name> targets a specific one.
# Scriptable continuation
claude -c -p "Check for type errors"
Inspect spend and tools
/cost
Shows token usage and estimated cost for the session. On Pro/Max plans it reflects usage, not billed dollars.
/mcp
Lists configured MCP servers, connection status, and tools — and is where you authenticate (OAuth) to remote servers.
Project memory: /init, /memory, /config
These shape how Claude understands your repo and behaves across sessions.
| Command | What it does |
|---|---|
/init | Scans the repo and generates a starter CLAUDE.md with structure, build/test commands, and conventions. |
/memory | Opens your CLAUDE.md memory files in your editor for direct editing. |
/config | Opens the settings interface for theme and toggles; values persist in settings.json. |
Memory files load hierarchically: enterprise, then user (~/.claude/CLAUDE.md), then project (./CLAUDE.md), then local. Type # at the start of a prompt to quickly append a memory line.
CLAUDE.md ruthlessly. For each line ask: would removing this cause Claude to make a mistake? A bloated memory file gets ignored because rules get lost in the noise.Quality and editing helpers
/review
Requests a code review. /review <PR-number> reviews a specific pull request. /security-review focuses on security of current changes.
/agents
Opens an interactive manager to create, edit, and view subagents stored as Markdown in .claude/agents/.
/vim
Enables Vim-style modal editing for the input prompt — NORMAL/INSERT modes with h j k l and common motions.
Custom slash commands
Here is where power users pull ahead. A custom command is just a Markdown file. The filename (minus .md) becomes the command name. The body is the prompt Claude runs.
| Location | Scope | Example file |
|---|---|---|
.claude/commands/ | Project (shared via git) | optimize.md → /optimize |
~/.claude/commands/ | User (all your projects) | standup.md → /standup |
Pass arguments with $ARGUMENTS (all args) or $1, $2 for positional ones. Here is .claude/commands/fix-issue.md:
---
description: Fix a GitHub issue by number
argument-hint: <issue-number>
model: sonnet
allowed-tools: Bash(gh issue view:*), Read, Edit
---
Fix issue #$1. First read it:
!gh issue view $1
Then locate the bug, apply a fix, and explain your change.
Invoke it with the argument inline:
/fix-issue 482
! run bash and inject the output (requires allowed-tools), and @path/to/file injects that file's contents into the prompt.Frontmatter fields you can set: description, argument-hint, model, and allowed-tools to restrict which tools the command may use.
Namespacing
Subdirectories namespace commands for organization. A file at .claude/commands/frontend/component.md is still invoked as /component but shows a project:frontend label.
.claude/commands/
optimize.md -> /optimize (project)
frontend/component.md -> /component (project:frontend)
~/.claude/commands/
standup.md -> /standup (user)
.claude/skills/deploy-staging/SKILL.md also gives you /deploy-staging. Use a single .md for a one-shot prompt; use a Skill folder when you need bundled scripts and reference files.Shortcuts, @files & ! bash
Pull files in, push commands out, and steer without breaking flow
The REPL is a cockpit: @ pulls files into context, ! pushes shell output back, and a handful of keys let you steer mid-task without losing your place.
Three in-prompt prefixes do most of the heavy lifting. @ references a file or directory as context, ! runs a shell command and feeds its output into the chat, and # at the very start of a prompt appends a line to memory. Learn these and you stop copy-pasting forever.
@ — reference files
Type @ and a path to load that file or folder as context. Tab-completes paths.
! — run bash inline
Start a line with ! to run a shell command; its output joins the conversation.
# — quick memory
Start a prompt with # to append a note to a CLAUDE.md memory file on the fly.
@ — reference files and folders as context
The @ prefix tells Claude exactly which files to read instead of making it search. Be specific: pointing at the right files is the single biggest way to cut corrections. It tab-completes, so start typing the path and press Tab.
> Refactor the auth flow in @src/auth/login.ts and keep it consistent with @src/auth/session.ts
> Summarize what changed across @docs/changelog.md
> Use the patterns in @examples/ when you write the new handler
@src/handlers/ to give Claude the file listing, then let it open only what it needs. Cheaper than dumping every file.@path in the command body injects that file's contents into the prompt — the same syntax works inside .claude/commands/*.md.! — run a shell command inline
Prefix any line with ! to execute it in your shell and drop the output straight into the chat. When: you want Claude to see real state — a failing test, git status, a directory listing — without describing it yourself. Why: the output becomes context Claude can act on immediately.
> !git status
> !npm test 2>&1 | tail -40
> !ls -la src/ && cat package.json
A common combo: run a command with !, then ask Claude to fix what it sees.
> !pytest -x
> The first failure above is in test_kelly.py. Fix the implementation, not the test.
! command runs in your real shell with your permissions. Treat it like a terminal — don't pipe in anything you wouldn't run yourself.Drag, drop & paste images
Claude Code reads images. Paste a screenshot or drag an image file into the terminal and ask about it — great for UI bugs, error dialogs, diagrams, or design mockups.
| Action | How |
|---|---|
| Paste from clipboard | Ctrl+V (use this, not Cmd+V, on macOS) |
| Drag a file in | Drag the image onto the terminal window |
| Reference a saved image | @screenshots/error.png |
> Here's the broken layout [paste screenshot]. The sidebar overflows on mobile — fix the CSS in @src/styles/sidebar.css
Keyboard shortcuts that matter
The keys below are the difference between fighting the tool and flowing with it. The two most valuable are Esc (stop without losing context) and Shift+Tab (cycle permission modes).
| Keys | What it does |
|---|---|
| Esc | Stop Claude mid-action; context is preserved so you can redirect |
| Esc Esc | Open the rewind menu (same as /rewind) to restore an earlier state |
| Shift+Tab | Cycle permission modes: normal → plan → auto-accept |
| Ctrl+B | Background a running task so you can keep working |
| Ctrl+C | Cancel current input / interrupt |
| Ctrl+O | Toggle verbose mode (see thinking output) |
| Up / Down | Scroll through your prompt history |
| Option+T / Alt+T | Toggle extended thinking on or off |
Multi-line input
To write a prompt that spans several lines, use one of these:
| Method | Keys |
|---|---|
| Quick newline | \ then Enter (backslash continuation) |
| Standard newline | Option+Enter (macOS) / Shift+Enter |
| One-time setup | Run /terminal-setup to bind Shift+Enter reliably |
/vim to get NORMAL/INSERT modes with h j k l navigation in the prompt editor.Steering mid-task
You don't have to wait for Claude to finish a bad path. Steering early saves context and time.
- Hit Esc the moment it heads the wrong way — it stops with context intact.
- Type a correction and submit; Claude resumes with your new direction.
- If the whole attempt was wrong, press Esc Esc and rewind — this erases the failed attempt from context entirely.
/btw to ask a quick side question in a dismissible overlay — the answer doesn't pollute your main conversation history.Putting it together
A realistic loop using all three prefixes plus a key: gather state with !, scope the fix with @, plan it, then execute.
> !git diff --stat
> Review the changes in @src/api/routes.py against @CLAUDE.md conventions.
> [Shift+Tab to enter plan mode first, then let it execute]
> !curl -s localhost:8000/health | jq .
Claude in Your IDE
Same Claude Code engine, now living inside your editor
The VS Code extension and JetBrains plugin wire Claude Code straight into your editor, so plans, edits, and diffs show up right next to your code.
The IDE integration is not a different product. It drives the exact same Claude Code engine as the terminal, just with a graphical surface for selections, diffs, and diagnostics. You get inline-diff review, automatic selection sharing, and live error feedback without leaving your editor.
Two integrations, one engine
VS Code extension
Publisher id anthropic.claude-code. Adds a native graphical panel and bundles the CLI for you. Needs VS Code 1.98.0+. Also installs in forks (Cursor, Windsurf, Kiro, Devin Desktop) and on Open VSX.
JetBrains plugin
Named Claude Code [Beta] (publisher Anthropic PBC). Covers IntelliJ, PyCharm, WebStorm, GoLand, PhpStorm, RubyMine, Rider, CLion, and Android Studio. It shells out to the claude CLI, which you must install and authenticate first.
Install: VS Code
- Open the Extensions panel with Cmd+Shift+X (Mac) or Ctrl+Shift+X (Windows/Linux).
- Search for Claude Code and click Install (or use the marketplace 'Install for VS Code' link).
- Open the panel and sign in. The CLI is bundled, so there is nothing else to set up.
The VS Code extension panel is the integration. You do not run any attach command. Quick-launch or focus it from the editor with Cmd+Esc (Mac) or Ctrl+Esc (Windows/Linux).
Install: JetBrains
- Install and authenticate the CLI first: run
claudeonce in a terminal and sign in. - In the IDE, go to Settings -> Plugins -> Marketplace and search Claude Code.
- Click Install, then fully restart the IDE (a reload is not enough).
JetBrains settings live under Settings -> Tools -> Claude Code [Beta]: the claude command path, the 'command not found' notification toggle, Option+Enter for multi-line prompts (macOS, needs a terminal restart), and auto-updates.
What the IDE adds
Selection as context
Highlight code and Claude sees it automatically. The prompt footer shows the line count and the transcript logs 'Selected N lines from <file>'.
Side-by-side diff review
Every proposed edit opens in the editor's native diff viewer with accept / reject / redirect. Edit the proposal in the diff and Claude is told you changed it.
Live diagnostics
The integration shares lint and syntax errors with Claude, so it can react to real problems instead of guessing.
@-mention shortcut
One keystroke inserts a file reference with a line range, like @app.ts#5-10, so you point at exact code fast.
File-reference keys
| Editor | Mac | Windows/Linux | Inserts |
|---|---|---|---|
| VS Code | Option+K | Alt+K | @app.ts#5-10 |
| JetBrains | Cmd+Option+K | Alt+Ctrl+K | @src/auth.ts#L1-99 |
Connecting the terminal to the editor
Run claude inside the IDE's integrated terminal and it auto-connects: diff viewing and diagnostic sharing turn on by themselves. From an external terminal, attach with one command.
# In an external terminal, after launching claude:
/ide
# attaches the running session to your open editor
Choose where diffs render with the config command. Keep them in the IDE for visual review, or push them back to the terminal.
# Open diffs in the editor's native viewer:
/config diff tool auto
# Or keep diffs in the terminal:
/config diff tool terminal
Permission modes in the editor
In VS Code the mode indicator at the bottom of the prompt box switches behavior. Plan mode opens the plan as a full markdown document you can comment on inline before any edit happens.
| Mode | Behavior |
|---|---|
| normal | Asks before each action |
| Plan mode | Read-only; describes a plan and waits for approval |
| auto-accept | Applies edits without asking |
Set the default mode in VS Code settings so every new session starts the way you want.
// VS Code settings.json
"claudeCode.initialPermissionMode": "plan"
// values: default | plan | acceptEdits | bypassPermissions
Settings: what lives where
Editor-specific options live in the extension settings, but the rules that matter for safety and tooling are shared with the CLI. The same precedence applies across CLI, VS Code, and JetBrains.
In the editor
VS Code keys like useTerminal (default false), preferredLocation (panel or sidebar), autosave (default true), and useCtrlEnterToSend. Use Shift+Enter for a newline without sending.
Shared
~/.claude/settings.json holds allowed commands, env vars, hooks, and MCP servers for both the extension and the CLI. One source of truth for permissions.
Under the hood: the 'ide' MCP server
When the VS Code extension is active it runs a local MCP server named ide that the CLI connects to automatically. It binds to 127.0.0.1 on a random high port and writes a fresh auth token to a lock file under ~/.claude/ide/ with 0600 permissions.
It hosts about a dozen tools, but only two are exposed to the model: mcp__ide__getDiagnostics and mcp__ide__executeCode. The rest are internal RPC for opening diffs, reading selections, and saving files.
When the IDE flow beats the terminal
Reach for the editor flow when the work is visual or edit-heavy. Stick with the pure terminal for headless runs, CI, and tight keyboard loops.
Use the IDE when
You are reviewing multi-file diffs, refactoring with reference safety, pointing at exact selections, or want live lint and syntax errors fed back automatically.
Use the terminal when
You are scripting claude -p, running in CI, chaining commands, or working over SSH where a graphical diff viewer adds nothing.
Quick-start in 60 seconds
- Install the extension or plugin and sign in.
- Open Claude with Cmd+Esc / Ctrl+Esc.
- Highlight the function you want changed so it becomes context.
- Ask: 'Add input validation here and run the tests.'
- Review the side-by-side diff, tweak it inline if needed, then accept.
Plan Mode & Extended Thinking
Look before you leap, then think as hard as the problem demands
The two cheapest upgrades to your results: make Claude plan before it touches code, and let it think harder only when the problem is actually hard.
Plan mode and extended thinking solve different problems. Plan mode controls what Claude is allowed to do. Extended thinking controls how much it reasons before it acts. Used together, they turn a guess-and-check session into a deliberate one.
Plan Mode: explore, propose, approve, then execute
Plan mode is enforced at the tool level. Claude literally cannot edit files or run destructive commands while in it. It can only read, search, and think, then hand you a written plan to approve.
Toggle it with Shift+Tab, which cycles through normal, plan, and auto-accept modes. You can also start a session already in plan mode from the CLI.
claude --permission-mode plan
Why it prevents wasted work
When Claude edits first and reasons later, a wrong assumption becomes ten wrong file changes you have to unwind. A plan surfaces that assumption in one paragraph you can correct in one sentence, before any code is written.
There is also a hidden performance win. Front-loading context (reading files, fixing the plan, stating constraints) writes a stable prefix to the prompt cache once. Every later execution turn then reads from that cached prefix, which is the documented cause of very high (90%+) cache-hit rates.
How to ask for a plan first
Be specific. Name the files, state the constraints, and tell Claude not to write code yet. Vague requests cause endless exploration that floods your context.
Read src/auth/session.ts and src/auth/tokens.ts.
Propose a plan to add refresh-token rotation.
Do NOT edit anything yet — show me the plan and wait for approval.
For a larger change, ask Claude to write the plan to disk so it survives any context reset or /clear between phases.
Investigate the checkout flow and write a phased plan to PLAN.md.
Keep each phase to one session of work. List files touched per phase.
Behind the scenes: the Plan subagent
In plan mode Claude uses a built-in read-only Plan subagent. Like the Explore agent, it skips CLAUDE.md and the git-status snapshot so it stays focused on reading and proposing. You don't configure it; it's part of the mode.
Extended Thinking: spend reasoning where it pays
Extended thinking is on by default with roughly a 31,999-token reasoning budget. You escalate that budget with keywords right inside your prompt. More thinking buys reasoning depth, not magic.
| Keyword | Budget | Use for |
|---|---|---|
think | Lowest | Moderately tricky logic, small refactors |
think hard / think harder | Higher | Multi-file changes, non-obvious bugs |
ultrathink | Maximum | Architecture, hard multi-file debugging, costly decisions |
ultrathink: our BGP failover drops sessions for ~30s on link flap.
Walk through the timers, the convergence path, and propose fixes
ranked by blast radius. Don't change configs yet.
Toggling and capping it
Flip thinking on or off mid-session with Option+T on Mac (Alt+T elsewhere). Cap the budget with an env var, or set a coarse effort level with a slash command.
# Hard cap the thinking budget for a whole session
export MAX_THINKING_TOKENS=10000
# In-session: dial reasoning effort up or down
/effort high # low | medium | high
/effort auto # back to default
A model note worth knowing
The thinking story differs by model. Opus 4.x uses adaptive thinking rather than the standard extended-thinking toggle, while Sonnet 4.6 and Haiku 4.5 support extended thinking. The practical workflow is the same either way; the model decides how to apply the reasoning.
Putting them together
The highest-leverage habit is to combine them: enter plan mode, ask Claude to think hard about the approach, approve the plan, then drop back to normal mode to execute.
- Press Shift+Tab until you see plan mode.
- Prompt:
think hard and propose a plan to refactor the payment retry logic. Read the relevant files first. - Read the plan. Correct any wrong assumption in one line.
- Approve, then let Claude execute against the now-cached plan.
Plan mode
Controls permissions. Read-only until you approve. Catches bad assumptions before code is written.
Extended thinking
Controls reasoning depth. Escalate with think / think hard / ultrathink only when the problem earns it.
Together
Plan + think hard = a deliberate, cache-efficient session that wastes far fewer correction rounds.
/rewind) to erase the failed attempt from context, then re-plan with sharper constraints. Correcting leaves dead ends polluting context; rewinding removes them.CLAUDE.md & Memory
Teach Claude your project once — it remembers every session
A good CLAUDE.md is the single highest-leverage upgrade you can make: it loads at the start of every session so Claude stops guessing your conventions.
CLAUDE.md is a plain Markdown file Claude reads automatically before it does anything else. Put your build commands, code style, and gotchas there once, and they apply to every conversation in that repo. It is the difference between Claude inferring your setup and Claude knowing it.
/compact. That means good rules survive context resets for free.The memory hierarchy
Memory files load in layers: enterprise, user, project, and local. More specific layers take precedence, so a project file beats your personal defaults.
| Layer | Location | Scope |
|---|---|---|
| Enterprise | managed policy file | Org-wide, set by admins |
| User | ~/.claude/CLAUDE.md | All your projects, private |
| Project | ./CLAUDE.md | This repo, shared with team via git |
| Local | project-local, untracked | This repo, just you (gitignored) |
./CLAUDE.md so your whole team gets the same Claude behavior. Keep personal quirks in ~/.claude/CLAUDE.md or a local, gitignored memory file.Get started fast
You almost never write CLAUDE.md from scratch. Let Claude scan the repo and draft one, then trim it.
- Run
/initin a repo. Claude reads the codebase and generates a starter CLAUDE.md with structure, build/test commands, and conventions. - Run
/memoryto open your memory files in your editor and clean up the draft. - Prune ruthlessly (see the rule below), then commit it.
The # shortcut: add a memory mid-flow
When Claude makes a mistake or you spot a convention worth saving, start a prompt with #. Claude appends it to a memory file for you, so you never lose the lesson.
# Always run tests with `pnpm test`, never `npm test`
# The staging deploy script must be run from the repo root
It lets you pick which memory file to write to (user vs project), so a one-off correction becomes a permanent rule in seconds.
What to put in (and leave out)
The golden test for every line: "Would removing this cause Claude to make a mistake?" If no, delete it. A bloated CLAUDE.md gets ignored because the real rules drown in noise.
Include
Non-guessable bash commands, non-default code style, the exact test runner, repo etiquette, project architecture, and env quirks or gotchas.
Exclude
Anything inferable from the code, standard language conventions, and info that changes often (it goes stale and misleads).
Make rules stick
Claude weights emphatic instructions more heavily. Use IMPORTANT or YOU MUST for rules that really matter.
.claude/rules/ files. That keeps the always-loaded core lean while still having deep guidance available on demand.A real CLAUDE.md snippet
Concrete, scannable, and every line earns its place:
# Project: Acme API
## Commands
- Install: `pnpm install`
- Dev server: `pnpm dev` (port 3000)
- Test: `pnpm test` (Vitest) — NEVER use npm
- Typecheck: `pnpm typecheck`
- Lint+fix: `pnpm lint --fix`
## Code style
- TypeScript strict mode; no `any`.
- Immutable data only — never mutate inputs, return new objects.
- Prefer async/await over `.then()`.
## Architecture
- API routes live in `src/routes/`, business logic in `src/services/`.
- DB access goes ONLY through `src/repos/` (repository pattern).
## Gotchas
- IMPORTANT: run `pnpm db:migrate` after pulling — schema drifts often.
- The `legacy/` folder is frozen. YOU MUST NOT edit files there.
Inspect what loaded
Not sure which memory files Claude is actually using? Two commands give you ground truth.
/memory # opens loaded CLAUDE.md files for editing
/context # live token breakdown — see how much memory costs
Treat CLAUDE.md as a living document. Every time Claude repeats a mistake, capture the fix with # — over a few weeks your repo trains itself.
Skills
Teach Claude a procedure once, and it loads itself exactly when needed.
A Skill is a folder with a SKILL.md that Claude auto-loads the moment your request matches it, then forgets again to keep context lean.
Think of a skill as a reusable playbook on disk. Claude reads only the name and description at startup, and pulls in the full instructions only when they are actually relevant. This is called progressive disclosure, and it is why skills scale where giant prompts do not.
Anatomy of a skill
Every skill is a directory containing one SKILL.md file. That file has YAML frontmatter (between --- markers) plus markdown instructions. You can bundle extra scripts, reference docs, or data files alongside it.
.claude/skills/
└── deploy-staging/
├── SKILL.md # frontmatter + instructions
├── deploy.sh # bundled script (run via bash)
└── checklist.md # reference, loaded only if needed
.claude/skills/deploy-staging/SKILL.md becomes /deploy-staging. The name frontmatter field only changes the display label, not what you type.A minimal SKILL.md
Only description is recommended; every field is technically optional. The description is what Claude reads to decide whether to activate the skill, so make it state both what it does and when to use it.
---
name: deploy-staging
description: Deploy the app to the staging environment. Use when
the user asks to ship, deploy, or push a build to staging.
---
# Deploy to staging
1. Run the test suite: `npm test`
2. Build the bundle: `npm run build:staging`
3. Run the bundled script: `bash deploy.sh staging`
4. Confirm the health check at /healthz returns 200.
Never deploy if tests fail. Report the deployed commit SHA.
name is max 64 chars, lowercase letters / numbers / hyphens only, and cannot contain the words 'anthropic' or 'claude'. description is max 1024 chars with no XML tags. Gerund-style names like processing-pdfs are recommended.Where skills live
| Location | Scope | Command |
|---|---|---|
~/.claude/skills/ | All your projects (personal) | /name |
.claude/skills/ | This project, shared via git | /name |
Plugin skills/ dir | Whoever installs the plugin | /plugin:name |
.claude/commands/optimize.md still works and still creates /optimize. For anything new, prefer a skill folder so you can bundle scripts and resources.How activation works
You get the skill two ways. Claude can auto-activate it when your wording matches the description, or you can fire it directly by typing the slash command. Both reach the same instructions.
- At session start, Claude preloads only metadata: each skill's name + description (~100 tokens each).
- When your request matches a description, Claude reads the full
SKILL.mdfrom disk. - If the body points to other files or scripts, those load (or run via bash) only when actually needed.
Progressive disclosure: the three levels
Level 1 · Metadata
Name + description, ~100 tokens per skill. Always loaded at startup so Claude knows what exists.
Level 2 · Instructions
The SKILL.md body, kept under ~5k tokens. Loaded only when the skill is triggered.
Level 3+ · Resources
Bundled docs, scripts, data. Effectively unlimited; zero context cost until a file is read or a script is run.
Controlling who can invoke
Two frontmatter switches tune visibility. They trade off context budget against convenience.
| Frontmatter | Effect |
|---|---|
| (default) | Both you and Claude can invoke; description stays in context. |
disable-model-invocation: true | Only you can run it via /name. Description is NOT kept in context (saves budget). |
user-invocable: false | Only Claude invokes it; hidden from the / menu. Description stays in context. |
/skills menu, highlight a skill, press Space to cycle states (on name-only user-invocable-only off), then Enter. It writes skillOverrides in settings.When skills beat plain prompts
Reusable procedure
You re-explain the same multi-step task often. Write it once; let it auto-load.
Keeps context lean
Long instructions sit on disk at Level 2/3 instead of bloating every turn.
Bundled determinism
Ship a tested script next to the skill so Claude runs it instead of re-generating code.
Team-shareable
Commit .claude/skills/ and the whole team inherits the same playbook.
Reach for a plain prompt instead when the task is a one-off, or so simple that a sentence does the job. Skills earn their keep on repeated, structured work.
Authoring rules that matter
- Keep the
SKILL.mdbody under ~500 lines; split overflow into separate files. - Keep file references one level deep, and use forward slashes in paths.
- Add a table of contents to any reference file longer than 100 lines.
- Prefer a deterministic bundled script over asking Claude to write code each time.
- Put the most important instructions near the top — skill bodies are re-injected after
/compactbut capped at 5,000 tokens each.
skills: frontmatter field) gets the FULL skill body injected at startup, not just the description — so a worker agent always has its playbook ready without needing to be triggered.allowed-tools frontmatter field works only in the Claude Code CLI, not via the Agent SDK. In the SDK, filter with the skills option on query() and use the main allowedTools option instead. Claude Code supports filesystem skills only — there is no API upload.Subagents & Delegation
Spin up focused helpers that work in their own context and report back
Subagents are disposable specialists with their own fresh context window, so dirty research and verbose output never pollute your main chat.
A subagent is just a Markdown file with YAML frontmatter. The frontmatter sets identity and limits; the body is the subagent's system prompt. Claude can call them automatically or you can invoke them by name.
Why delegate at all
The core constraint of Claude Code is that context fills fast and quality drops as it fills. A subagent runs in its own window, does the noisy work, and returns only a short summary, keeping your main thread clean.
Isolated context
Each subagent starts fresh. Verbose searches, test logs, and dead ends stay out of your main conversation.
Scoped tools
Give a reviewer read-only access. The tools allowlist means it literally cannot edit or run destructive commands.
Cheaper model per job
Route grunt work to Haiku, hard reasoning to Opus, via the model field, independent of your main session.
Parallel work
Fan out several subagents on independent investigations at once, then let Claude merge their findings.
Where they live
Project agents go in .claude/agents/ (shared with your team). Personal agents go in ~/.claude/agents/. When names collide, the higher-priority location wins; project beats user.
| Location | Scope | Priority |
|---|---|---|
--agents CLI flag | This session only | Highest (after managed) |
.claude/agents/ | Project / team | High |
~/.claude/agents/ | All your projects | Medium |
Plugin agents/ | Installed plugins | Lowest |
A real agent definition
Only name and description are required. The tools line is an allowlist; omit it and the subagent inherits every tool from the main chat. Adding 'use proactively' to the description nudges Claude to delegate on its own.
---
name: code-reviewer
description: Reviews code for bugs, security, and style. Use proactively after writing or changing code.
tools: Read, Grep, Glob, Bash
model: sonnet
---
You are a senior code reviewer. When invoked:
1. Run `git diff` to see what changed.
2. Flag bugs, security issues, and missing error handling.
3. Group findings as CRITICAL / HIGH / MEDIUM / LOW.
4. Suggest concrete fixes. Do NOT edit files yourself.
Return a short, scannable summary only.
/agents to verify it loaded. Agents edited directly on disk need a session restart; agents created through the /agents UI take effect immediately.How to invoke one
Three ways to call a subagent, from least to most explicit.
- Auto-delegation — describe the task and Claude picks an agent based on its
description. - Natural language — name it: "Use the code-reviewer to check my last commit."
- @-mention — guarantees that exact agent runs.
# @-mention forces the named subagent to run
@agent-code-reviewer check the auth changes in src/auth/
# Or run an entire headless session AS a subagent
claude --agent code-reviewer -p "review the staged diff"
Pick the right model per agent
The model field takes an alias (haiku, sonnet, opus), a full id like claude-opus-4-8, or inherit. It defaults to inherit (same model as your main chat) when omitted.
| Agent job | Model | Why |
|---|---|---|
| Codebase search / scan | haiku | Fastest, cheap ($1/$5 per MTok), near-frontier for lookup |
| Code review / tests | sonnet | Best balance for most production work |
| Architecture / hard debugging | opus | Deepest reasoning for complex, costly decisions |
Running several in parallel
For independent investigations, spawn multiple subagents at once. Each gathers its own context; Claude synthesizes the results. This works best when the paths truly don't overlap.
Investigate three things in parallel, one subagent each:
1. How is auth wired in src/auth/ ?
2. Where are database migrations defined?
3. What does the CI pipeline run on push?
Report each as a short summary.
background: true in frontmatter to always background it.When delegation helps vs hurts
Delegate when
Output is verbose and you don't need it in main context, you want tool/permission limits enforced, or the work is self-contained and returns a summary.
Stay in the main chat when
You need frequent back-and-forth, phases share context, the change is a quick one-liner, or latency matters — subagents start cold and must re-gather context.
Agent, AskUserQuestion, and plan-mode tools are unavailable inside a subagent.Two power features
Add memory: project to give an agent persistent notes across sessions (stored in .claude/agent-memory/<name>/). Add skills: to preload full skill content into the subagent at startup — subagents get the entire skill injected, not just its description.
---
name: test-runner
description: Writes and runs tests; use proactively on new features.
tools: Read, Edit, Bash
model: sonnet
memory: project
skills: python-testing
isolation: worktree
---
Write failing tests first, then implement until green. Verify 80%+ coverage.
isolation: worktree for risky changes — the subagent runs in a throwaway git worktree, so it can experiment without touching your working tree. Combine with memory: project so it remembers conventions between runs.Restricting who can spawn what
In settings or an agent definition, control delegation with the Agent tool syntax. List allowed agents in parens, allow any with a bare Agent, or omit it to block spawning entirely.
# Only let this agent spawn the named workers
# (Task is the old name for Agent — still works as an alias)
Agent(researcher, test-runner)
# Disable a built-in subagent for the session
claude --disallowedTools "Agent(Explore)"
Built-ins worth knowing: Explore (Haiku, read-only search), Plan (read-only, used in plan mode), and general-purpose (all tools). Explore and Plan skip CLAUDE.md and git status for speed; every other agent loads the full memory hierarchy.
MCP Servers & Connectors
Plug Claude into your tools, your data, your whole stack
MCP (Model Context Protocol) is the universal adapter that lets Claude talk to outside tools and data — GitHub, browsers, databases, docs, and your own scripts.
Without MCP, Claude only has its built-in tools (read, edit, bash). With MCP, you bolt on new tools that show up right inside the session. One claude mcp add command wires a server in for good.
When and why to add an MCP
Add a server when Claude keeps needing data or actions it can't reach on its own. The goal is fewer copy-pastes and more real work done in-session.
Reach live data
Query a database, fetch current library docs, or read your Notion — instead of pasting it by hand.
Take real actions
Open PRs on GitHub, drive a browser with Playwright, or send a Slack message.
Persist memory
A memory server lets Claude store and recall facts across sessions.
Wrap your own scripts
Expose an internal CLI or API as a tool your whole team can call.
/mcp to see per-server tool cost and disable servers you aren't using.The three transports
A transport is just how Claude talks to the server. Pick based on where the server lives.
| Transport | Use it for | Flag |
|---|---|---|
| stdio | Local subprocess servers on your machine | default (no flag) |
| HTTP | Remote / cloud servers (recommended) | --transport http |
| SSE | Legacy remote — deprecated, use HTTP | --transport sse |
stdio: local process
Claude launches the server as a child process and talks over stdin/stdout. The -- separates Claude's flags from the command that starts the server — everything after it is passed through untouched.
claude mcp add playwright -- npx -y @playwright/mcp@latest
HTTP: remote server
Best for hosted/cloud services. Pass a name and a URL.
claude mcp add --transport http claude-code-docs https://code.claude.com/docs/mcp
--transport, --env, --scope, --header) must come BEFORE the server name. For stdio, the command goes last after --.Auth and environment variables
Use --env to pass secrets to a local server and --header to send auth on a remote one.
# Local server with an env var
claude mcp add github --env GITHUB_TOKEN=ghp_xxx -- npx -y @modelcontextprotocol/server-github
# Remote server with a bearer token
claude mcp add --transport http my-api --header "Authorization: Bearer $TOKEN" https://api.example.com/mcp
For remote servers that use OAuth, just run /mcp in-session and authenticate through the browser prompt — no manual token needed.
Scopes: who sees this server
Scope decides where the config is stored and who can use it. Set it with --scope.
| Scope | Visibility | Stored in |
|---|---|---|
local (default) | Just you, this project | ~/.claude.json (under project path) |
project | Whole team, shared | .mcp.json in repo root (commit it) |
user | Just you, all projects | ~/.claude.json |
# Share a server with your team via version control
claude mcp add --scope project --transport http docs https://code.claude.com/docs/mcp
local was project, and today's user was global. When the same server name exists in several scopes, precedence is local > project > user > plugin > connector (one definition wins, no merging).The .mcp.json file
Project-scoped servers live in .mcp.json at your repo root, checked into git so the whole team gets them. Claude reads it only at session start — restart after editing.
{
"mcpServers": {
"docs": {
"type": "http",
"url": "https://code.claude.com/docs/mcp"
},
"playwright": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"]
}
}
}
The type field accepts streamable-http as an alias for http, so configs copied straight from a server's docs work unchanged.
No secrets in the file
Use variable expansion so committed configs never hold real tokens. Supports ${VAR} and ${VAR:-default} in command, args, env, url, and headers.
{
"mcpServers": {
"my-api": {
"type": "http",
"url": "https://api.example.com/mcp",
"headers": { "Authorization": "Bearer ${API_TOKEN}" }
}
}
}
The /mcp command
Run /mcp in any session to see configured servers, their connection status, and the tools each one exposes. It's also where you authenticate (OAuth) to remote servers.
/mcp. Reset all approvals with claude mcp reset-project-choices.Managing servers from the CLI
claude mcp list # all configured servers
claude mcp get github # show details + which scope holds it
claude mcp remove github # remove (add --scope if in several)
claude mcp add-json db '{"type":"stdio","command":"..."}'
claude mcp add-from-claude-desktop # import (macOS/WSL only)
Example servers to start with
GitHub
Read repos, search code, open and review PRs, manage issues — all without leaving the session.
Playwright
Drive a real browser: navigate, click, fill forms, screenshot, and test web flows.
Docs (Context7 / Claude docs)
Pull current, version-accurate library documentation instead of relying on training data.
Memory
A knowledge-graph server that lets Claude store and recall facts across sessions.
Using MCP tools in automation
MCP tools are namespaced mcp__<server>__<tool>, which is how you reference them in --allowedTools for non-interactive runs.
claude -p "Open a PR for the fix" --allowedTools "mcp__github__create_pull_request"
Security: trust before you connect
- Prefer official servers and the Anthropic connector directory over random forks.
- Pin versions for stdio servers rather than always pulling
@latestfrom strangers. - Never hardcode secrets in
.mcp.json— use${VAR}expansion. - Treat tool descriptions and returned data as untrusted input (prompt-injection risk).
- Keep project servers reviewed in code review, since they ship to the whole team via git.
Hooks (Automation)
Deterministic guardrails the model can't skip
Hooks run your own shell commands at fixed moments in Claude's lifecycle, giving you control the model cannot talk its way around.
A hook is a command that fires automatically on an event like a file edit or a finished response. Because the harness runs it (not Claude), the behavior is deterministic and reliable. Use hooks to auto-format code, block edits to protected files, or inject fresh context.
hooks key in any settings file: ~/.claude/settings.json (all projects), .claude/settings.json (project, committable), or .claude/settings.local.json (gitignored). All matching hooks merge and run together.The lifecycle events
Events fire at three cadences: once per session, once per turn, and once per tool call. Pick the event that matches when you want to act.
| Event | Fires when | Can block? |
|---|---|---|
SessionStart | Session begins or resumes | No |
UserPromptSubmit | You submit a prompt (before Claude reads it) | Yes (erases prompt) |
PreToolUse | Before a tool call runs | Yes (blocks call) |
PostToolUse | After a tool call succeeds | No (already ran) |
Stop | Claude finishes responding | Yes (keeps going) |
How the config nests
The structure has three levels: event name, then matcher groups, then handler objects. A matcher filters which tool triggers the hook.
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{ "type": "command", "command": "echo ran" }
]
}
]
}
}
_/| means an exact match with | as "or" (e.g. Edit|Write). A *, empty, or omitted matcher matches everything. Anything else is treated as a regex (e.g. mcp__.* matches all MCP tools). Matchers are case-sensitive.Exit codes: how a hook talks back
A command hook receives event JSON on stdin and communicates through its exit code. This is the contract you must get right.
| Exit code | Meaning |
|---|---|
0 | Success. stdout may carry JSON for fine control. |
2 | Blocking error. stderr is fed back to Claude as feedback. |
| other | Non-blocking error. A warning shows; execution proceeds. |
exit 2 to enforce policy, not exit 1. Only exit 2 actually blocks. A PreToolUse hook that exits 2 blocks the tool even under bypassPermissions or --dangerously-skip-permissions — PreToolUse hooks run before any permission-mode check.Use case 1: Auto-format after every edit
The classic PostToolUse hook. It reads the edited file path from the event JSON with jq and pipes it to a formatter.
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "jq -r '.tool_input.file_path' | xargs -r prettier --write"
}
]
}
]
}
}
Swap prettier for black, gofmt, or rustfmt depending on your stack. Add an "if": "Edit(*.py)" field to scope it to one file type.
Use case 2: Block edits to protected files
A PreToolUse hook can inspect the proposed tool call and refuse it. Exit 2 cancels the edit and tells Claude why.
{
"hooks": {
"PreToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "f=$(jq -r '.tool_input.file_path'); case \"$f\" in *.env|*/secrets/*) echo 'Protected file, blocked.' >&2; exit 2;; esac"
}
]
}
]
}
}
PreToolUse, exit 0 and print JSON instead: hookSpecificOutput.permissionDecision set to allow, deny, or ask. Pick exit-code signaling OR JSON, never both. When several hooks run, the most restrictive wins (deny > ask > allow). Note that allow still respects existing deny/ask permission rules.Use case 3: Inject context at session start
For UserPromptSubmit and SessionStart, stdout is injected straight into Claude's context. That makes them perfect for feeding in live state like the current git branch.
{
"hooks": {
"SessionStart": [
{
"matcher": "startup",
"hooks": [
{ "type": "command", "command": "echo \"Branch: $(git branch --show-current)\"" }
]
}
]
}
}
The SessionStart matcher filters by source: startup, resume, clear, or compact.
Use case 4: Forced verification on Stop
A Stop hook runs when Claude thinks it's done. Run your tests there and return decision: "block" to push Claude to keep fixing until they pass.
stop_hook_active field in the input and exit 0 when it is true, or you will create an infinite loop.Handler types beyond shell
command
Run a shell command. Gets event JSON on stdin; replies via exit code and stdout.
http
POST the event to an endpoint — good for central audit or team policy servers.
mcp_tool
Invoke an MCP tool as the handler.
prompt / agent
Run an LLM prompt or a subagent to make a judgment call at the hook point.
"timeout" (seconds) and "async": true. Use async for non-blocking work like an audit log — an unmatched PostToolUse hook that appends JSONL records every action without slowing Claude down.Manage and inspect your hooks live with the /hooks slash command, which shows what is configured across all your settings files.
Claude Fable 5 & Mythos 5
Anthropic's new flagship — state-of-the-art on nearly every benchmark
On June 9, 2026 Anthropic released Claude Fable 5 — its most capable model yet — plus Claude Mythos 5, the same model with safeguards lifted for vetted research.
What launched
Claude Fable 5
The general-availability flagship, with full safeguards. API id claude-fable-5. Best-in-class at software engineering, knowledge work, vision, and science.
Claude Mythos 5
The same underlying model with safeguards lifted in narrow areas. Restricted to Project Glasswing partners and select biology researchers — not general access.
The numbers
| Item | Detail |
|---|---|
| API id | claude-fable-5 |
| Price (in / out) | $10 / $50 per million tokens |
| Autonomy | Runs unattended across millions of tokens on long tasks |
| Availability | Live now on the Claude API and consumption-based Enterprise plans. Subscription plans: included through June 22, then via usage credits. |
Why it's a big deal
Software engineering
State-of-the-art on CursorBench and top score on Cognition's FrontierCode at medium effort. In a Stripe case study it compressed months of work on a 50-million-line Ruby migration into days.
Analysis & finance
First model to break 90% on complex analytical tasks — a 10-point jump over Opus — and the highest score of any model on Hebbia's finance benchmark.
Science & vision
~10x faster protein design, novel molecular-biology hypotheses preferred 80% of the time vs. Opus-class, and strong vision work like rebuilding web apps from screenshots.
What it means for you
- Reach for Fable 5 when a task is genuinely hard — large refactors, deep research, multi-step agentic runs — where a better answer is worth the higher token price.
- Keep routing cheap, high-volume work to Haiku and everyday coding to Sonnet. Fable 5 is the premium tier, not the default for everything (see Pick the Right Brain).
- In Claude Code, switch with
/model claude-fable-5. Front-load context once so prompt caching keeps the cost down.
Plugins & Marketplaces
Bundle your whole setup into one installable unit
A plugin packages commands, agents, skills, hooks, and MCP servers into a single thing you install once and reuse everywhere.
Building a great Claude Code setup takes work: custom skills, subagents, format-on-save hooks, the right MCP servers. A plugin wraps all of that into one directory you can share. Install it on a new machine, hand it to a teammate, or drop it into every repo and your capabilities travel with you.
What lives inside a plugin
A plugin is a self-contained folder. Components are auto-discovered from the directory layout at the plugin root, so you rarely have to wire anything up by hand.
skills/
Each skill is a folder with a SKILL.md. Namespaced as /plugin-name:skill-name to avoid collisions.
agents/
Subagents shipped with the plugin. (Plugin agents ignore hooks, mcpServers, and permissionMode for security.)
hooks/hooks.json
Lifecycle hooks (auto-format, audit logs, policy gates) that activate when the plugin is enabled.
.mcp.json
MCP servers the plugin needs, started automatically alongside the plugin.
bin/, themes/, output-styles/
Executables added to the Bash PATH, plus themes and output styles.
.claude-plugin/plugin.json
The only file inside .claude-plugin/. It is OPTIONAL — used to set name, version, and author.
.claude-plugin/. Only plugin.json goes in .claude-plugin/.The /plugin manager
Run /plugin to open the manager — a tabbed interface you cycle with Tab / Shift+Tab.
| Tab | What it does |
|---|---|
| Discover | Browse plugins from every marketplace you've added. |
| Installed | Enable, disable, or uninstall, grouped by scope. |
| Marketplaces | Add, remove, or update marketplace sources. |
| Errors | See what failed to load and why. |
Recent versions also show a Context cost estimate, a Last updated date, and a 'Will install' breakdown listing exactly which commands, agents, skills, hooks, and MCP/LSP servers a plugin adds before you commit.
Install your first plugin
Anthropic ships a curated marketplace, claude-plugins-official, that is available automatically in every install — no setup needed.
# Install the official GitHub plugin (user scope by default)
/plugin install github@claude-plugins-official
# Activate it this session without restarting
/reload-plugins
The syntax is always plugin-name@marketplace-name. That @marketplace suffix is how Claude knows which catalog the plugin came from.
/reload-plugins to apply changes instead of restarting Claude.Add a marketplace
A marketplace is just a repo (or hosted file) with a .claude-plugin/marketplace.json catalog at its root. Add one, then install from it.
# Add Anthropic's community marketplace
/plugin marketplace add anthropics/claude-plugins-community
# Now install any plugin it lists
/plugin install commit-commands@claude-community
Sources are flexible. Use whichever fits where your plugins live.
| Source type | Example |
|---|---|
| GitHub owner/repo | anthropics/claude-code |
| Git URL (GitLab, self-hosted) | https://gitlab.com/team/cc-plugins.git |
| Local path | ./my-marketplace or a marketplace.json |
| Hosted URL | remote marketplace.json |
/plugin market works for marketplace, and rm works for remove.Share a plugin with your whole team
This is the real payoff. Install at project scope and the plugin is recorded in enabledPlugins in .claude/settings.json — which you commit. Every teammate who clones the repo gets the same setup.
# Write the plugin into the shared project settings file
/plugin install lint-suite@claude-community --scope project
One reviewer agent, one format hook, one test command — defined once, used by everyone. No more 'works on my machine' tooling drift.
Scripted install (CLI, non-interactive)
For CI, dotfiles, or onboarding scripts, use the non-interactive claude plugin subcommands.
# Add a marketplace and install without the UI
claude plugin marketplace add anthropics/claude-plugins-community
claude plugin install commit-commands@claude-community
# Audit what's installed (machine-readable)
claude plugin list --json
claude plugin validate checks plugin.json, every skill/agent frontmatter, and hooks.json. Scaffold a fresh one with plugin init --with skills,agents,hooks,mcp.Build a plugin in five steps
- Run
plugin init(or make a folder) and drop components intoskills/,agents/,hooks/, and.mcp.jsonat the root. - Reference bundled files with
${CLAUDE_PLUGIN_ROOT}in hooks and MCP configs — plugins are copied to a cache, so absolute paths break. - Optionally add
.claude-plugin/plugin.jsonwith a name, version, and author. Skipversionand the git commit SHA is used. - Run
claude plugin validateto catch frontmatter and manifest errors. - Push to a repo, add a
.claude-plugin/marketplace.jsoncatalog, and share theowner/repoas a marketplace source.
.claude-plugin/plugin.json loads in place as name@skills-dir on the next session — no install, no cache. Great for iterating on a plugin you're authoring.Why plugins win
Skills, agents, and hooks scattered across ~/.claude/ and individual repos are hard to move and easy to forget. A plugin makes that whole stack portable: version it, audit its context cost in /plugin, enable it per project, and update it when the source changes.
Multi-Agent Workflows & Loops
From one agent to a coordinated team — without burning your token budget
One agent is a worker; many agents are an assembly line that you steer from a single chair.
Subagents each run in their own fresh context window with their own tools. They do the heavy reading, then hand back a short summary. That is the whole trick: keep your main context clean while the noisy work happens elsewhere.
When orchestration is worth it (and when it is not)
The core constraint never changes: context fills fast and quality drops as it fills. Multi-agent setups exist to protect that budget, not to look clever. Spending more tokens only pays off when the work is parallelizable or verbose.
| Use subagents / loops when | Stay in the main chat when |
|---|---|
| Task spits out verbose output you do not need to keep | You need frequent back-and-forth steering |
| Several investigations are independent (fan-out) | Phases share the same context |
| You want to enforce tool/permission limits | The change is a quick one-line diff |
| Work is self-contained and returns a summary | Latency matters (cold start costs time) |
Pattern 1: Parallel fan-out (research)
Spin up several agents at once on independent slices of a problem. Each explores in its own context; the main thread then synthesizes their summaries. This works best when the paths truly do not overlap.
Investigate our auth system. Run three subagents in parallel:
1. Explore: map every login/session file and how tokens are issued
2. Explore: list all places that read or write user roles
3. Explore: find rate-limiting and lockout logic
Then summarize the attack surface in one page.
The built-in Explore agent is perfect here: it is Haiku-backed, read-only, and skips CLAUDE.md and git status to start lean. Cheap, fast, disposable.
Pattern 2: The pipeline (research → compose → verify)
Some work is a relay, not a sprint. Each stage runs as a separate subagent so a bloated research phase never pollutes the writing phase. Define small, single-purpose agents in .claude/agents/.
---
name: researcher
description: Gathers facts and returns a tight brief. Use proactively.
tools: Read, Grep, Glob, Bash
model: haiku
---
You investigate, then return ONLY a bulleted brief with file paths.
Never edit files. Never speculate beyond what you found.
A reviewer agent caps cost the same way by restricting tools and using a cheaper model:
---
name: verifier
description: Checks work against the brief and the tests.
tools: Read, Bash
model: sonnet
disallowedTools: Write, Edit
---
Run the test suite. Report PASS/FAIL plus the smallest fix list.
model frontmatter field defaults to inherit.Pattern 3: Find → fix → test loops
An autonomous loop repeats a step until a quality gate passes. The cleanest gate in Claude Code is a Stop hook: it fires when Claude finishes and can refuse to let it stop until tests are green.
{
"hooks": {
"Stop": [{
"hooks": [{
"type": "command",
"command": "npm test --silent || { echo 'TESTS FAILED' >&2; exit 2; }"
}]
}]
}
}
Have the hook script exit 2 on failure: that blocks the stop and feeds stderr back to Claude as a to-do, so it keeps fixing. Exit 0 lets it finish.
stop_hook_active field in the hook input and exit 0 when it is true, or your loop never ends.Headless loops for scripts and CI
Drive Claude non-interactively with -p (print mode). Capture the session ID as JSON, then resume it to chain steps without losing state.
sid=$(claude -p "Find the failing test and its cause" \
--output-format json | jq -r '.session_id')
claude -p "Fix it, then run the suite until green" \
--resume "$sid" \
--permission-mode acceptEdits \
--allowedTools "Edit,Bash(npm test)"
For reproducible CI runs, add --bare to skip auto-discovery of hooks, skills, MCP, and CLAUDE.md, then load only what you need explicitly. It is the recommended mode for scripted and SDK calls.
--permission-mode dontAsk denies anything not in your allow rules; acceptEdits auto-approves file writes plus safe filesystem commands. Pair with --allowedTools so locked-down loops never hang on a prompt.Monitoring and backgrounding agents
Long jobs do not have to block you. Press Ctrl+B to background a running task, or set background: true in an agent's frontmatter. Backgrounded agents run concurrently and auto-deny anything that would prompt.
/agents
Opens a tabbed UI. The Running tab shows live subagents; the Library tab creates and edits them.
claude agents
Dispatch and watch parallel background sessions. claude agents --json prints them for scripting.
isolation: worktree
Runs a subagent in a throwaway git worktree so parallel agents never collide on the same files.
memory: project
Gives an agent persistent cross-session memory under .claude/agent-memory/ — the recommended default.
Token discipline for any multi-agent run
- Plan in plan mode first (Shift+Tab) so a stable prefix caches once and every execution turn reads from it.
- Delegate broad searches to subagents; let only their short summaries enter main context.
- Write the plan to a file like
PLAN.mdso it survives/clearbetween phases. /clearbetween unrelated tasks — fresh context is free.- Check spend with
/costand live usage with/context.
opus and ultrathink for the orchestrator's planning and synthesis, where reasoning depth pays off. Let cheap worker agents do the grunt work. That is where heavy orchestration actually beats a single agent on both quality and cost.Claude API & Agent SDK
Build on Claude — the raw Messages API and the Agent SDK that powers Claude Code
There are two ways to build on Claude: call the model directly with the Messages API, or reuse Claude Code's whole agent loop with the Agent SDK.
Pick the layer that matches your job. The Messages API is one stateless HTTP call where you own everything. The Agent SDK wraps the API in a ready-made agent loop with tools, memory, and MCP. Claude Code is the interactive CLI built on that same SDK.
Messages API
One model call. You write the tool loop, manage history, and control every byte. Best for chat, classification, extraction, RAG.
Agent SDK
The Claude Code agent loop as a library. File tools, Bash, MCP, and subagents work out of the box. Best for headless automation.
Claude Code
The interactive terminal app on top of the SDK. Best for hands-on coding and exploration, not unattended pipelines.
Layer 1 — The Anthropic Messages API
Every request is a POST to /v1/messages at https://api.anthropic.com. It is stateless: you resend the full conversation on each call (up to 100,000 messages). Three headers are required.
| Header | Value |
|---|---|
x-api-key | your API key |
anthropic-version | 2023-06-01 |
content-type | application/json |
Current models & list pricing
| Model ID | Context | Max output | $/MTok (in / out) |
|---|---|---|---|
claude-opus-4-8 | 1M | 128k | $5 / $25 |
claude-sonnet-4-6 | 1M | 64k | $3 / $15 |
claude-haiku-4-5 | 200k | 64k | $1 / $5 |
claude-sonnet-4-20250514 and claude-opus-4-20250514 retire June 15 2026; claude-opus-4-1 retires Aug 5 2026. Claude 3 Haiku, Sonnet 3.7, and Haiku 3.5 are already retired and return errors. Migrate to Sonnet 4.6 / Opus 4.8.Minimal Python request
pip install anthropic # Python 3.9+, reads ANTHROPIC_API_KEY from env
from anthropic import Anthropic
client = Anthropic()
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="You are a terse senior network engineer.",
messages=[{"role": "user", "content": "Explain BGP route reflectors in 2 sentences."}],
)
print(msg.content[0].text)
Minimal TypeScript request
npm install @anthropic-ai/sdk // 4.9+, Node 20+; also Deno/Bun/browser
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY
const msg = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
system: "You are a terse senior network engineer.",
messages: [{ role: "user", content: "Explain BGP route reflectors in 2 sentences." }],
});
console.log(msg.content[0].type === "text" && msg.content[0].text);
system parameter (string or array of text blocks). There is NO system role in the messages array. New in Opus 4.8: mid-conversation role: "system" entries are allowed and preserve cache hits when instructions change mid-session.Tool use (function calling) & the agentic loop
Pass a tools array — each tool has a name, description, and JSON-Schema input_schema. When Claude wants a tool it returns stop_reason: "tool_use". You run the tool and feed the result back as a tool_result block, then loop.
tools = [{
"name": "get_weather",
"description": "Current weather for a city",
"input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
}]
messages = [{"role": "user", "content": "Weather in Detroit?"}]
while True:
resp = client.messages.create(
model="claude-sonnet-4-6", max_tokens=1024, tools=tools, messages=messages
)
messages.append({"role": "assistant", "content": resp.content})
if resp.stop_reason != "tool_use":
break
results = []
for block in resp.content:
if block.type == "tool_use":
out = run_my_tool(block.name, block.input) # your code
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": out,
})
messages.append({"role": "user", "content": results})
Loop while stop_reason == "tool_use"; exit on end_turn, max_tokens, stop_sequence, or refusal. Control calls with tool_choice (auto, any, named tool, or none) and disable_parallel_tool_use: true to force at most one call.
{"type": "web_search_20250305", "name": "web_search", "max_uses": 5}. Long server loops may return stop_reason: "pause_turn"; just resend the response to continue.Streaming, vision & PDFs
Set stream: true for SSE. The event flow is message_start then per-block content_block_start / _delta / _stop, then message_delta and message_stop. SDK helpers hide the plumbing.
with client.messages.stream(
model="claude-sonnet-4-6", max_tokens=1024,
messages=[{"role": "user", "content": "Write a haiku about subnets."}],
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
final = stream.get_final_message()
All current models accept images via {"type":"image","source":{...}} (base64 with media_type, or a URL source). PDFs go through document blocks and Claude reads both text and visuals.
messages=[{"role": "user", "content": [
{"type": "image", "source": {"type": "url", "url": "https://example.com/diagram.png"}},
{"type": "text", "text": "What VLAN topology is this?"},
]}]
max_tokens to avoid HTTP timeouts on long generations.Prompt caching — cut repeat cost ~90%
Mark stable, reused content (big system prompts, docs, tool defs) with cache_control. Cache reads cost about 1/10th of base input. Place the breakpoint after the static prefix.
system=[
{"type": "text", "text": LONG_STYLE_GUIDE,
"cache_control": {"type": "ephemeral"}},
]
TTL is 5 minutes by default or 1 hour (now GA, no beta header). Minimum cacheable length is 1,024 tokens on Opus 4.8 / Sonnet 4.6 (4,096 on Haiku 4.5). Usage reports cache_creation_input_tokens and cache_read_input_tokens.
max_tokens: 0 — the API writes the cache at the breakpoint and returns an empty content array with stop_reason: "max_tokens" and full usage, generating no output. (Rejected inside Batch requests.)Batch API — 50% off, async at scale
For non-urgent bulk jobs (evals, backfills, large extractions), submit many requests at once for half price. Poll until done, then read results.
batch = client.messages.batches.create(requests=[
{"custom_id": "job-1",
"params": {"model": "claude-haiku-4-5", "max_tokens": 256,
"messages": [{"role": "user", "content": "Classify: ..."}]}},
])
# poll batch.processing_status until == "ended", then:
for r in client.messages.batches.results(batch.id):
print(r.custom_id, r.result)
output-300k-2026-03-24 beta header.Extended thinking & structured outputs
Turn on deeper reasoning with thinking. Claude 4/4.5 and Sonnet 4.6 use {"type":"enabled","budget_tokens": N} (N must be < max_tokens); Opus 4.6/4.7/4.8 use {"type":"adaptive"} with interleaved thinking auto-enabled. With tool use, only tool_choice auto or none are allowed, and you must pass thinking blocks back.
Structured outputs are GA on Sonnet 4.5 / Opus 4.5 / Haiku 4.5 via output_config.format (no beta header) when you need guaranteed JSON shapes.
Official SDKs also cover Java 8+, Go 1.23+, Ruby 3.2+, C#, and PHP 8.1+, plus an ant CLI. Companion endpoints: Models API, Token Counting API, Files API, and the Admin API.
Layer 2 — The Claude Agent SDK
The Agent SDK (formerly the Claude Code SDK) runs the exact same agent loop that powers Claude Code, inside your process. It handles tool execution, context management, retries, and prompt caching so you don't write the tool loop yourself.
# Python (3.10+)
pip install claude-agent-sdk
# TypeScript — bundles a native Claude Code binary, no CLI install needed
npm install @anthropic-ai/claude-agent-sdk
Tiny Python agent
import anyio
from claude_agent_sdk import query
async def main():
async for message in query(
prompt="Find every TODO in src/ and summarize them."
):
print(message)
anyio.run(main)
Python has two entry points: query() creates a fresh session per call and returns an async iterator; ClaudeSDKClient reuses one session across exchanges with manual lifecycle (connect / query / receive_response / interrupt / disconnect) and supports interrupts.
Tiny TypeScript agent
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find every TODO in src/ and summarize them.",
})) {
console.log(message);
}
In TypeScript, query() returns an async generator (the Query object) that maintains the session and exposes interrupt(), setPermissionMode(), setModel(), setMcpServers(), and close().
Built-in tools — nothing to implement
Files
Read, Write, Edit, Glob, Grep
Shell & web
Bash, WebSearch, WebFetch, Monitor
Orchestration
Agent (subagents), Skill, TaskCreate/TaskUpdate, ToolSearch, AskUserQuestion
Read-only tools (Read, Glob, Grep, read-only MCP) can run concurrently; state-mutating tools (Edit, Write, Bash) run sequentially.
Permission modes — the safety dial
| Mode | Behavior |
|---|---|
default | Unmatched tools trigger the canUseTool callback |
acceptEdits | Auto-approves file edits + filesystem commands (mkdir/rm/mv/cp) |
plan | Read-only tools only |
dontAsk | Denies anything not pre-approved (never calls the callback) — for CI |
bypassPermissions | Approves all, but deny rules and hooks still apply |
bypassPermissions), then the mode, then allow rules, then canUseTool. allowed_tools only pre-approves — it does NOT restrict Claude to those tools.Custom tools, MCP & subagents
Define in-process tools with the @tool decorator (Python) or tool() helper (TypeScript, Zod schema), bundle with create_sdk_mcp_server() / createSdkMcpServer(), and pass via mcpServers. Return isError: true to signal failure without halting the loop; an uncaught exception stops it.
from claude_agent_sdk import tool, create_sdk_mcp_server
@tool("add", "Add two numbers", {"a": float, "b": float})
async def add(args):
return {"content": [{"type": "text", "text": str(args["a"] + args["b"])}]}
server = create_sdk_mcp_server(name="math", tools=[add])
# pass server via the mcp_servers option of query()/ClaudeSDKClient
External MCP servers go through mcpServers/mcp_servers or a .mcp.json file, over stdio, HTTP/SSE, or in-process. Tool names follow mcp__{server}__{tool}; auto-approve with wildcards like mcp__github__*.
agents option, as markdown in .claude/agents/, or use general-purpose. They are invoked via the Agent tool (renamed from Task in v2.1.63), so add Agent to allowedTools to auto-approve. Subagents cannot spawn their own subagents.Hooks, sessions & cost caps
Hooks fire at lifecycle points (PreToolUse, PostToolUse, UserPromptSubmit, Stop, SubagentStop, PreCompact, and more). A PreToolUse hook can return permissionDecision allow/deny/ask/defer and even rewrite the input.
Capture ResultMessage.session_id to resume later via resume. The SDK also supports forking sessions, file checkpointing (rewind files to a prior message), reasoning effort (low→max), and guardrails max_turns and max_budget_usd.
Headless / CI automation
For pipelines, drive Claude Code non-interactively with claude -p.
claude -p "run the test suite and fix any failures" \
--allowedTools "Bash,Read,Edit" \
--permission-mode acceptEdits \
--output-format json
--bare for reproducible CI: it skips auto-discovery of hooks, skills, plugins, MCP servers, auto-memory, and CLAUDE.md (auth must come from ANTHROPIC_API_KEY). Anthropic says --bare will become the default for -p in a future release.claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive limits.Which layer should I use?
| You want to… | Use |
|---|---|
| Add chat, classify/extract/summarize text, build RAG | Messages API |
| Run a model with full control over the tool loop and history | Messages API |
| Ship an autonomous agent that reads/writes files, runs Bash, uses MCP, spawns subagents | Agent SDK |
| Automate coding tasks headless in CI | Agent SDK / claude -p |
| Code hands-on in the terminal, explore a repo interactively | Claude Code CLI |
docs.claude.com and code.claude.com/docs/en/agent-sdk/.Prompting for Best Results
Precise input, powerful output — stop guessing, start steering
Claude can infer intent, but it can't read your mind — the more precise your instructions, the fewer corrections you'll need.
Great results come from clear inputs. Tell Claude what to do, give it context, show it an example, and set the rules for "done." Then iterate in small steps and fix course early.
The seven habits of a strong prompt
Be specific
Name the exact task, file, and outcome. Replace "fix the bug" with the symptom, the file, and the expected behavior.
Give context
Say why, who it's for, and what came before. Context shapes the right answer; missing context invites wrong guesses.
Show an example
Point to a pattern Claude should copy: "follow the style in userService.ts."
State constraints
List the rules: no new dependencies, keep it under 50 lines, must pass pytest.
Define "done"
Give acceptance criteria so Claude can self-check before stopping.
Plan first
Ask for a plan before any edits. Review it, then say go. Cheaper to fix a plan than a diff.
Iterate small
One scoped change at a time beats one giant request. Smaller steps are easier to verify and correct.
Before and after: vague vs strong
The same goal, written two ways. The strong version names the file, the rule, and the test that proves success.
Vague (forces guessing)
fix the login bug
Strong (one-shot ready)
In @src/auth/login.ts, users with expired sessions get a
500 instead of a 401 redirect to /login.
Reproduce: call loginHandler() with an expired token.
Constraint: don't change the token schema; reuse the
existing redirectToLogin() helper.
Done when: the expired-token test in auth.test.ts passes
and `npm test auth` is green.
Point at files with @
Typing @ opens a file picker so Claude reads the exact file — no broad searching that floods context. Reference the real path instead of describing the code.
Refactor @src/api/users.py to use the repository pattern
in @src/api/orders.py as the reference. Keep public
function signatures unchanged.
Ask for a plan first
Plan Mode is the highest-leverage habit. Toggle it with Shift+Tab (cycles normal / plan / auto-accept). In plan mode Claude can read, search, and think — but it literally cannot edit files until you approve.
- Enter plan mode with Shift+Tab or launch with
claude --permission-mode plan. - Let Claude explore the relevant files and propose a plan.
- Read the plan, correct the approach, then approve it.
- Claude executes against a stable, cached context.
Dial reasoning with thinking keywords
Extended thinking is on by default. Escalate the budget with keywords when a problem is genuinely hard.
| Phrase | Use for |
|---|---|
think | Moderately tricky logic or a small refactor |
think hard / think harder | Multi-file changes, non-obvious bugs |
ultrathink | Architecture, hard debugging, costly decisions |
Correct course early
The moment an answer drifts, stop it. Don't let a wrong approach pile up in context.
Stop it
Press Esc to interrupt mid-action with context preserved, then redirect.
Erase the failure
Press Esc Esc (or /rewind) to undo a bad attempt. Rewinding deletes it from context entirely — cleaner than correcting on top of it.
Reset after two misses
After two failed corrections on the same issue, run /clear and re-prompt with a sharper ask. Fresh context is free.
Do / Don't grid
| Do | Don't |
|---|---|
Name the exact file: @src/db/pool.ts | Say "the database file" |
| State acceptance criteria ("done when tests pass") | Leave "done" undefined |
| Ask for a plan before edits | Let Claude rewrite five files unprompted |
| Point to a pattern to copy | Assume Claude knows your house style |
| Scope investigations narrowly | Say "investigate the codebase" (floods context) |
| Iterate one change at a time | Bundle ten unrelated asks in one prompt |
/clear between unrelated tasks | Pile new tasks onto stale context |
| Rewind a failed attempt | Keep correcting on top of a broken approach |
Delegate research to keep context clean
Unscoped "investigate" requests cause endless exploration that floods your main window. Hand research-heavy work to a subagent that runs in its own fresh context and returns only a short summary.
Use the Explore subagent to find every place we call the
Stripe API, then report back a one-paragraph summary with
file paths. Don't edit anything.
PLAN.md. It survives /compact and /clear, so a fresh session can pick up exactly where you left off.Claude.ai: Chat, Projects & Artifacts
Turn the chat box into a coworking studio with persistent knowledge and live artifacts
The Claude.ai app is more than a chat box: Projects give Claude a memory for your work, Artifacts give it a canvas, and styles give it your voice.
Most people just type into the chat and start over every time. Power users wire up Projects, Artifacts, and custom instructions so Claude shows up already knowing the context. This section shows how those pieces snap together into a real coworking setup.
Chat
The conversation. Where you ask, refine, and steer.
Projects
A workspace with its own chat history, knowledge base, and instructions.
Artifacts
A live side window rendering code, docs, and apps next to the chat.
Styles
Reusable voice and formatting rules applied to Claude's output.
Projects: a memory for a body of work
A Project is a self-contained workspace. It keeps its own chat history plus a knowledge base of files you upload once and reuse in every chat inside it. Projects are on every plan, including Free, which is capped at 5 projects.
The two pillars are project knowledge (uploaded docs, text, and code) and project instructions (custom instructions that set tone and role for every chat in that project).
When knowledge gets big: RAG
On paid plans (Pro, Max, Team, Enterprise), when your knowledge base nears the context limit, Claude automatically switches to Retrieval Augmented Generation. RAG expands capacity up to 10x by pulling only the relevant chunks per question instead of stuffing everything into context.
Project instructions: set the role once
Project instructions are the single highest-leverage Project setting. They tell Claude who it is and how to answer for the whole project, so you stop repeating yourself.
You are my senior network architecture reviewer.
Audience: enterprise infra teams (Juniper, Arista, Palo Alto).
Always: cite the config file by name, flag security risks first,
and give a one-line summary before the detail.
Never invent interface names — ask if a config is missing.
Styles: control the voice
Styles change how Claude writes, independent of any Project. Open the Search and tools menu and pick Use style. Four presets ship in: Normal Concise Formal Explanatory. Normal and Concise are always visible and can't be hidden.
Custom styles are where it gets useful. You can build one two ways:
- Upload or paste a writing sample (pdf, doc, or txt) so Claude matches your voice.
- Or pick Describe style for a starting point, then use Use custom instructions (advanced) to give exact rules.
Styles can be renamed, previewed, edited (by Claude or via Set Instructions Manually), and deleted.
Artifacts: the coworking canvas
Artifacts are a dedicated side window for standalone content: code, documents, data visualizations, and full apps that render live beside the chat. They're generally available on every plan and on iOS and Android, with their own Artifacts space in the sidebar.
The workflow is conversational. You ask, the artifact appears, you point at what's wrong, and Claude edits it in place — no copy-paste loop.
Live preview
HTML, React, and visualizations render instantly so you see results, not just code.
Iterate in place
"Make the header sticky" edits the same artifact instead of restating everything.
AI-powered apps
Embed Claude into a shareable app via a text-based API — no API keys needed.
MCP + storage
On paid plans, artifacts read/write tools like Slack and persist data.
AI-powered Artifacts
You can build a shareable app with Claude intelligence baked in. It runs on Anthropic's infrastructure, needs no API keys, and — importantly — usage counts against each end-user's own subscription, not yours. So you can share a tool without paying for everyone's calls.
On Pro, Max, Team, and Enterprise (web and desktop), Artifacts also support MCP integration and persistent storage (personal or shared). That lets an artifact read and write to external tools like Asana, Google Calendar, Slack, and custom MCP servers.
File uploads and file creation
Drop files straight into a chat for analysis: CSV, TSV, PDFs, images, and more. Multiple files work in one chat and stay downloadable for the whole conversation.
Claude can also create files. On all plans (web, Desktop, Mobile), it generates downloadable .xlsx, .pptx, .docx, and PDF files, plus PNG charts and Python data analysis. Turn it on at Settings > Capabilities > Code execution and file creation.
.xlsx back.Web search and Research
Web search must be turned on by you (toggle it in chat settings/tools). Once on, Claude decides when a prompt needs fresh data, searches, and answers with citations to its sources.
Research mode is a paid-only (Pro/Max/Team/Enterprise) agentic step up. It runs many progressive searches across the web and connected internal sources like Gmail, Google Calendar, and Google Docs. It needs web search on, and you enable it from the + button > Research.
Putting it together: a coworking session
The real power is the combination. Here's a typical flow that uses all four surfaces at once.
- Create a Project, upload your reference docs, and write a role instruction.
- Pick (or build) a custom style so output lands in your voice.
- Open a chat, turn on web search, and ask Claude to draft something.
- Claude renders the draft as an Artifact you can preview live.
- Refine by pointing at the artifact; export it as
.docxor.pptxwhen done.
| Surface | Holds | Best for |
|---|---|---|
| Project | Knowledge + instructions | Persistent context for one effort |
| Chat | The conversation | Asking and steering |
| Artifact | Rendered output | Live code, docs, and apps |
| Style | Voice rules | Consistent tone everywhere |
Cost & Speed Optimization
Pay Haiku prices for Opus-quality work
The cheapest token is the one you never send — route smart, cache big, and keep context lean.
Two levers control your bill: which model runs the task, and how many tokens it reads each turn. Master both and you cut cost 5–10x without losing quality. The single rule under everything: the context window fills up fast, and quality drops as it fills.
Pick the right model for the job
Opus 4.8 costs 5x more per token than Haiku 4.5. Using Opus to fix a typo is like renting a crane to hang a picture. Match the model to the difficulty, not your mood.
| Model | In / Out (per MTok) | Speed | Use it for |
|---|---|---|---|
| Haiku 4.5 | $1 / $5 | Fastest | High-volume, simple edits, search, subagents, real-time |
| Sonnet 4.6 | $3 / $15 | Fast | Most production work: code gen, analysis, tool use |
| Opus 4.8 | $5 / $25 | Moderate | Hard reasoning, big refactors, long-horizon agentic runs |
Switch the active model mid-session with /model. Use /model default to let Claude auto-route harder turns to Opus for you.
/model haiku # cheap, fast turns
/model sonnet # the workhorse
/model opus # bring out the heavy reasoning
/model default # let Claude pick per task
Route from the command line
Set the model per run for scripts and one-offs. A headless triage pass on Haiku, then escalate only the hard cases.
# Cheap, scripted lint check on Haiku
cat build.log | claude -p "summarize the errors" --model haiku
# Escalate one tough bug to Opus
claude --model opus "think hard about the race condition in worker.py"
--model overrides both the model setting and the ANTHROPIC_MODEL env var. Add --fallback-model so jobs still run if your first choice is unavailable.Push cheap, verbose work to subagents
A subagent on Haiku can churn through a noisy codebase search and hand back a short summary — the verbose output never pollutes your main (pricier) context. Set model: haiku in the agent frontmatter.
---
name: log-scanner
description: Use proactively to grep logs and return a short summary
tools: Read, Grep, Glob, Bash
model: haiku
---
Search the logs, extract only the failing requests, and report a 5-line summary.
Prompt caching: pay once for stable context
Caching is automatic in Claude Code — you never add cache_control yourself. The API caches by exact prefix match, so a cache hit costs just 0.1x base input (90% cheaper). The catch: any change anywhere in the prefix recomputes everything after it.
| Operation | Cost vs base input |
|---|---|
| Cache hit (read) | 0.1x — 90% off |
| 5-minute cache write | 1.25x |
| 1-hour cache write | 2x |
/compact. Each invalidates the cache and forces a slow, expensive uncached turn. Do these at boundaries, not in the middle of a flow.Keep context lean
File reads dominate context, and a bloated window both costs more and degrades quality. Fresh context is free — use it.
/clear between tasks
Wipes history entirely. Use it whenever you switch to unrelated work so old tokens stop riding along on every turn.
/compact at breakpoints
Summarizes history into a compact form. Steer it: /compact focus on the API changes. Do it proactively around 50–75%, not at the auto-trigger.
Scope your @ files
Reference exact files and line ranges, not whole directories. Prefer rg/grep over dumping files into the chat.
/context to audit
Shows a live token breakdown by category so you can see what's eating the window before it bites.
/rewind) erases a failed attempt from context entirely, while correcting leaves the dead-end approach polluting every later turn.Be specific to avoid re-work
Vague prompts cause endless exploration that floods context and burns tokens. "Investigate the auth bug" can spiral; a scoped request lands in one pass.
# Costly: unscoped, invites infinite exploration
claude "investigate why login is slow"
# Lean: names the file, states the constraint
claude "in src/auth/session.ts, find why validateToken() does a DB call per request; propose a cache"
Batch and reuse work
Stop paying startup costs over and over. Reuse sessions in scripts instead of cold-starting each time, and let the cache stay warm across follow-ups.
# Capture a session id, then continue it cheaply
session_id=$(claude -p "Start a review of the diff" --output-format json | jq -r '.session_id')
claude -p "Now check for type errors" --resume "$session_id"
Trim CLAUDE.md ruthlessly
CLAUDE.md loads on every session and re-loads after every compaction — every line is a recurring tax. For each line ask: "Would removing this cause Claude to make mistakes?" If not, cut it. A bloated file also makes Claude ignore the rules that matter.
Watch the meter
You can't optimize what you don't measure. Check spend and usage as you go.
/cost # token usage + estimated cost this session
/context # live token breakdown by category
/mcp # per-server tool cost — prune noisy servers
ultrathink for genuinely hard, costly-to-get-wrong decisions (architecture, multi-file debugging). For everyday work, plain think or think hard gives the depth you need without the token bill.Your cost-saving habits, in order
- Default to Haiku or Sonnet; reach for Opus only on hard reasoning.
- Plan first (Shift+Tab) to write one stable, cached prefix.
- Avoid mid-task model switches and CLAUDE.md edits — they nuke the cache.
/clearbetween unrelated tasks;/compactat logical breakpoints.- Scope prompts and @ files; rewind failures instead of correcting them.
- Reuse sessions in scripts; batch non-urgent jobs at 50% off.
- Check
/costand/contextto catch waste early.
Security Hardening
Give Claude autonomy without giving it the keys to your machine
Claude Code is powerful enough to edit files, run shells, and call external servers — so the question isn't whether to trust it, but how to scope that trust.
Security in Claude Code is enforced by the harness itself, not by the model. A prompt or CLAUDE.md can never widen what Claude is allowed to do — only your settings, permission modes, and hooks can. That separation is the whole game.
The permission model
Every tool call is matched against three rule types. They are evaluated deny → ask → allow, and the first match wins, so a deny always beats an allow at any other level.
| Rule | Effect |
|---|---|
| allow | Run the tool with no prompt |
| ask | Prompt for confirmation each time |
| deny | Block it outright — always wins |
Rules use the format Tool or Tool(specifier). Manage them interactively with /permissions or write them into settings.json.
{
"permissions": {
"allow": [
"Read(src/**)",
"Bash(npm test)",
"Bash(npm run lint)"
],
"ask": [
"Bash(git push:*)"
],
"deny": [
"Read(.env)",
"Read(./secrets/**)",
"Bash(curl:*)"
]
}
}
"Bash" removes the tool from Claude entirely. A scoped deny like "Bash(rm *)" only blocks matching calls. Choose the narrower one that fits the risk.Settings precedence
When rules conflict, this order decides (highest to lowest). A deny at any level cannot be overridden by an allow at any other level.
- Managed (admin) settings — not even CLI flags override these
- Command-line arguments
- Local project settings (
.claude/settings.local.json) - Shared project settings (
.claude/settings.json) - User settings (
~/.claude/settings.json)
Permission modes
Modes set the default posture for unmatched tools. Cycle them with Shift+Tab or set permissions.defaultMode.
default
Reads freely, prompts on first use of anything that writes or runs.
plan
Read-only exploration. Enforced at the tool level — Claude literally cannot edit or run destructive commands.
acceptEdits
Auto-accepts file edits plus filesystem commands (mkdir, touch, rm, mv, cp, sed) in the working dir.
dontAsk
Auto-denies anything not pre-approved. Built for non-interactive CI.
auto
Auto-approves with a background safety classifier. Research preview — treat as experimental.
bypassPermissions
Skips all prompts. Only safe inside an isolated environment.
plan mode. Let Claude explore and propose a plan with zero write access, then switch out to implement.The truth about --dangerously-skip-permissions
This flag (and the bypassPermissions mode) removes per-action review completely and skips protected-path checks. The only thing left is a circuit breaker that still prompts on rm -rf / and rm -rf ~ — and it offers no protection against prompt injection.
--dangerously-skip-permissions on your host machine for convenience. It is designed for isolated, no-host-damage environments only — containers, VMs, or dev containers without internet access.The flag is blocked when running as root or via sudo on Linux/macOS (that check is skipped inside a recognized sandbox). The safer move is almost always a scoped allowlist instead.
# Risky: blanket bypass on your laptop
claude --dangerously-skip-permissions # don't do this on the host
# Safer: pre-approve exactly the tools you need
claude -p "run the test suite and fix failures" \
--allowedTools "Read" "Edit" "Bash(npm test)" "Bash(npm run lint)"
permissions.disableBypassPermissionsMode: "disable" (and disableAutoMode for auto) in managed settings.Protected paths
In every mode except bypassPermissions, Claude Code refuses to auto-approve edits to sensitive locations. You don't configure these — they're built in.
| Protected dirs | Protected files |
|---|---|
.git, .vscode, .idea, .devcontainer, .husky, .cargo, .claude | .bashrc, .zshrc, .profile, .envrc, .npmrc, .gitconfig, .mcp.json, .claude.json |
Sandboxing risky autonomy
The sandboxed Bash tool gives OS-level filesystem and network isolation — Seatbelt on macOS, bubblewrap on Linux/WSL2. It covers Bash commands and their children only (not Read, Edit, WebFetch, or MCP). Turn it on with /sandbox or sandbox.enabled: true.
{
"sandbox": {
"enabled": true,
"filesystem": {
"denyRead": ["~/.aws", "~/.ssh", "~/.config/gcloud"],
"denyWrite": ["~/"]
},
"network": {
"allowedDomains": ["api.anthropic.com", "registry.npmjs.org"]
}
}
}
The default read policy still allows credential dirs, so Anthropic recommends adding denyRead for ~/.aws and ~/.ssh yourself. For org-wide enforcement, managed settings can set failIfUnavailable: true (refuse to start without a sandbox) and allowUnsandboxedCommands: false (kill the dangerouslyDisableSandbox escape hatch).
Whole-process isolation
For maximum autonomy, two options wrap more than just Bash:
Dev container
The claude-code repo ships an example .devcontainer/ with a default-deny iptables firewall, a non-root user, and no unapproved egress — a starting point for unattended runs. It's a convention, not a boundary Claude enforces.
sandbox-runtime
@anthropic-ai/sandbox-runtime (beta) wraps the entire process in Seatbelt/bubblewrap so every tool, hook, and MCP server is constrained — denying all write and network by default.
Secret handling
Claude Code stores its own API keys encrypted via credential management, and blocks web-fetching commands like curl and wget by default. Your job is to keep your project's secrets out of reach.
- Never hardcode keys — read them from environment variables (
ANTHROPIC_API_KEY, etc.). - Add secret files to
.gitignoreand addReaddeny rules for them. - Run a pre-commit secret scan as a hard gate before any commit.
# .gitignore
.env
.env.*
secrets/
*.pem
denyRead.Trusting MCP servers
MCP servers are separate processes that run unconstrained on your host unless the whole Claude Code process is sandboxed. A server you add can read your data and run code — treat adding one like installing a dependency with full machine access.
First-time codebase runs and new MCP servers require trust verification. Note that verification is disabled in non-interactive -p mode (except --worktree, which still requires accepted trust) — so don't pipe an untrusted repo straight into headless mode.
When you do auto-approve MCP tools, scope them with wildcards rather than reaching for bypassPermissions. Note that acceptEdits does not auto-approve MCP tools at all.
{
"permissions": {
"allow": ["mcp__github__*"],
"deny": ["mcp__github__delete_repository"]
}
}
Hooks as guardrails
Hooks are your programmable safety net. A PreToolUse hook runs before the permission prompt and can deny a call, force a prompt, or skip it. Across the evaluation flow, hooks run first — even before deny rules — making them the strongest enforcement point you control.
{
"hooks": {
"PreToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "jq -r '.tool_input.file_path' | grep -qE '(^|/)(\\.env|secrets/)' && { echo 'Blocked: protected path' >&2; exit 2; } || exit 0"
}
]
}
]
}
}
An exit code of 2 tells Claude Code to block the call and feed the message back to Claude. Use the same pattern to scan a diff for secrets before a commit, or to protect any path your team treats as off-limits.
ConfigChange hook can audit or block changes to settings during a session — so Claude can't quietly loosen its own policy mid-task.Hardening checklist
- Default to
planmode for unfamiliar work; switch to edit only after reviewing the plan. - Maintain a least-privilege
allowlist — specific tools and commands, not blanket grants. - Add
denyrules for secret files, credential dirs, and dangerous commands (curl,rm *). - Never run
--dangerously-skip-permissionson your host — only inside a container, VM, or dev container. - Enable the sandbox (
sandbox.enabled: true) withdenyReadfor~/.awsand~/.sshon autonomous runs. - Keep secrets in env vars;
.gitignorethem; gate commits with a secret scan. - Add only trusted MCP servers; scope tool approval with wildcards, never
bypassPermissions. - Use
PreToolUsehooks to block protected paths and scan diffs before commit. - Run a fresh-context
/security-reviewon the diff before treating work as done.
Best Practices: Writing Code
Plan, scope, test, verify — then trust.
Great code from Claude is not luck — it comes from a tight loop: plan first, change small, verify always, commit clean.
The goal is not code that runs. It is code that is correct, readable, and maintainable. Every habit below points at that.
The Core Loop
Use the same four-phase workflow Anthropic recommends. It keeps each change small enough to review and reversible if it goes wrong.
- Explore in plan mode — read-only, no edits. Claude reads the code and asks questions.
- Get a written plan — ask for a step-by-step approach before any code.
- Implement & verify — leave plan mode, build it, run the checks against the plan.
- Review the diff & commit — read every change, then commit with a clear message.
1. Ask for a Plan First
Plan mode is enforced at the tool level, not just a polite request. In plan mode Claude can only read, search, and think — it cannot edit files or run destructive commands. Enter it with Shift+Tab to cycle modes, or the /plan command.
# Start a session locked to read-only planning
claude --permission-mode plan
When to skip it: Anthropic's rule of thumb is simple. If you could describe the diff in one sentence (a typo, a log line, a rename), skip the plan. Plan when the approach is uncertain, the change spans multiple files, or the code is unfamiliar.
PLAN.md file so they survive context compaction.2. Point Claude at the Right Files
Don't make Claude guess where code lives. Reference exact files with @ so it loads only what matters and keeps context lean.
# Mention specific files and a line range
Review @src/auth/login.ts#L40-95 and the test in
@src/auth/login.test.ts. The session cookie isn't
being set on success. Find the root cause.
In the IDE, Option+K (VS Code) or Cmd+Option+K (JetBrains) inserts an @-mention for the current selection with its path and line range automatically.
3. Make Small, Testable Changes
The Common Workflows guidance is explicit: do refactoring in small, testable increments, and keep backward compatibility when it matters. Small diffs are easy to read, easy to test, and easy to revert.
Give yourself a safety net with git. Have Claude branch per feature so a bad turn is throwaway.
# Discardable safety net before a risky change
git checkout -b fix/session-cookie
4. Test-Driven Development
Claude defaults to writing implementation first — you have to invert that on purpose. Tell it to write failing tests first, give concrete examples, and run the tests after implementing.
Write failing tests FIRST for isValidEmail(), then implement.
Cases: "user@example.com" -> true, "nope" -> false,
"" -> false, "a@b.c" -> true.
Run the tests after implementing and show me the output.
CLAUDE.md with a rule so it won't repeat it. Claude is good at writing rules for itself.5. Anchor Conventions in CLAUDE.md
CLAUDE.md is read at the start of every conversation, before anything else. It is the most reliable channel for your project's conventions — build commands, test runner, code-style rules, and gotchas. Generate a starter with /init, then trim it.
| Include | Exclude |
|---|---|
| Bash commands Claude can't guess | Anything inferable from the code |
| Non-default code-style rules | Standard language conventions |
| Test runner & how to run it | Full API docs (link instead) |
| Architecture decisions, env quirks | Self-evident advice |
# CLAUDE.md (excerpt)
## Commands
- Test: pytest -q
- Lint: ruff check . && mypy src
- Build: make build
## Style (YOU MUST)
- Type hints on every function signature
- Immutable data — never mutate in place
- Async/await for all IO
6. Review the Diff Before You Trust It
This is where correctness is won or lost. Read the diff — don't just accept it. Watch for the "trust-then-verify gap": plausible-looking code that misses edge cases.
# Open the interactive diff viewer
/diff
For a second opinion, run the bundled review skills. They review the current diff in a fresh subagent that sees only the change and the criteria — not Claude's own reasoning.
/code-review
Correctness pass on the diff in a clean context. Add --fix to apply findings.
/security-review
Security-focused pass for auth, input handling, and secrets.
/simplify
Cleanup-only review — reuse, simplification, lower altitude. No bug hunting.
7. Run It, Then Commit
Code that compiles is not code that works. Run the build and the test suite, address the root cause of any failure (don't suppress the error), then commit.
pytest -q && ruff check . && git add -A
git commit -m "fix: set session cookie on successful login"
Do / Don't Grid
Do: plan complex work
Read-only plan mode before multi-file or unfamiliar changes.
Don't: one-shot a big feature
Large unscoped diffs are unreviewable and unrevertable.
Do: give a runnable check
Tests, build exit code, linter, or a screenshot close the loop.
Don't: trust on sight
Plausible code hides edge-case bugs. Read the diff, run it.
Do: point with @
Name exact files and line ranges to keep context lean.
Don't: say "investigate"
Unscoped exploration floods context and degrades quality.
Do: /clear between tasks
Reset context for unrelated work; one scope per session.
Don't: bloat CLAUDE.md
Long files bury the rules Claude actually needs.
A Tight Example Loop
Putting it together on a real bug fix — small, verified, committed:
- Shift+Tab into plan mode:
Bug: @api/cart.py#L20-60 double-charges on retry. Diagnose only. - Approve the plan, then leave plan mode and ask for a failing test that reproduces the double charge.
- Let Claude implement the fix and run
pytest -quntil the new test passes. - Run
/code-reviewon the diff, then/diffand read it yourself. - Run the full suite + linter, then commit:
fix: make checkout idempotent on retry.
/context to see usage and /compact focus on the API changes to summarize while keeping file paths, decisions, and test commands.Best Practices: Docs & Writing
Turn Claude into your fastest, most consistent co-writer
Claude writes its best docs when you hand it a role, an audience, a goal, hard constraints, and a sample of the voice you want.
Vague prompts produce vague prose. The fix is a simple frame you reuse for every writing task: who Claude is, who it's writing for, what success looks like, and the rules it must follow. Add your own source material and a voice sample, and quality jumps immediately.
The five-part writing prompt
Every strong writing request answers five questions. Skip one and Claude guesses, which is where bland output comes from. Keep them explicit and up front.
Role
"You are a staff engineer writing internal docs." Sets vocabulary and depth.
Audience
"For new hires with no Kubernetes background." Controls jargon and assumptions.
Goal
"Reader can deploy the service in 10 minutes." Gives a measurable target.
Constraints
Length, tone, format: "Under 400 words, plain, Markdown with headers."
Source + voice
Paste real notes and a paragraph in the voice you want copied.
Lock a consistent style
If you write the same kinds of docs often, stop re-pasting style rules. Save them once so every session starts from your house style.
In the Claude app: Projects + custom instructions
Create a Project, drop your style guide and example docs into its knowledge, and put the rules in the Project's custom instructions. Every chat in that Project inherits them.
In Claude Code: CLAUDE.md
CLAUDE.md is read at the start of every session, so it is the most reliable place for writing conventions. Keep it short; bloated files get ignored.
# Writing style
- Audience: external developers integrating our API.
- Voice: direct, second person, present tense. No marketing fluff.
- Format: Markdown. H2/H3 headings. Code in fenced blocks with language tags.
- IMPORTANT: every code sample must be copy-pasteable and runnable as-is.
- Sentences under 25 words. Define an acronym on first use.
.claude/rules/ with a paths: ["docs/**"] frontmatter field so it loads only when you touch doc files, keeping your main context lean.Work section by section, not all at once
Long documents drift when generated in one shot. Ask for an outline first, approve it, then fill one section at a time. You catch structure problems before any prose is written.
- Ask for an outline only: "Draft a section outline for this README. No prose yet."
- Edit the outline directly and paste it back as the agreed structure.
- Request one section at a time, feeding the relevant source for each.
- After the draft, run a self-critique pass before you accept it.
Make Claude critique its own draft
A fresh-eyes review catches what the drafting pass missed. Tell Claude exactly what to look for so it doesn't just rewrite for the sake of it.
Review the draft above as a skeptical editor. Flag ONLY:
1. Claims that need a source or are unverifiable
2. Sentences over 25 words or with passive voice
3. Steps a beginner could not follow
4. Anything off-tone vs. the voice sample
List issues as a numbered table, then give a revised version.
Generate tables, diagrams, and structured output
Claude is strong at turning messy notes into clean structure. Ask for the exact format you need.
Tables and comparison grids
Turn these release notes into a Markdown table with columns:
Feature | What changed | Action needed | Breaking?
Sort breaking changes to the top.
Diagrams as code (Mermaid)
Ask for Mermaid and you get a diagram that renders in GitHub, many docs tools, and Markdown viewers, and stays editable as text.
Draw the auth flow as a Mermaid sequenceDiagram:
user -> web app -> auth service -> token -> protected API.
Return only the fenced ```mermaid block.
Strict structured output via the API
When another program consumes Claude's output, enforce a schema. Structured outputs are GA on Claude API via output_config.format (no beta header) on recent models.
output_config={
"format": {
"type": "json_schema",
"schema": {
"type": "object",
"properties": {
"title": {"type": "string"},
"summary": {"type": "string"},
"tags": {"type": "array", "items": {"type": "string"}}
},
"required": ["title", "summary", "tags"]
}
}
}
Artifacts: draft documents live
In the Claude app, ask for an Artifact when you want an editable, self-contained document you can iterate on in a side panel instead of scrolling chat. Great for a README, a spec, or an email you'll keep refining.
Before / after: a real prompt upgrade
Same task, two prompts. The second wins because it supplies role, audience, goal, constraints, and a voice anchor.
Before vague
Write a README for my CLI tool.
After framed
You are a developer-experience writer. Audience: developers who
found this tool on GitHub and have 2 minutes to decide if it fits.
Goal: they can install and run their first command without leaving
the page.
Constraints: Markdown; under 350 words; sections = What it does /
Install / Quickstart / Config. Tone: terse and friendly, like the
sample below. Every command must be copy-pasteable.
Voice sample: "No config, no daemon. Point it at a folder and go."
Source (use only these facts):
- Installs via: npm i -g foldermap
- First command: foldermap ./src
- Reads a .foldermaprc for ignore globs
Output the outline first and wait for my OK before drafting.
Quality checklist before you ship a doc
| Check | What good looks like |
|---|---|
| Audience fit | No undefined jargon; assumptions match the reader's level. |
| Goal met | A reader could actually do the thing the doc promises. |
| Every claim sourced | Facts come from your material, not invented by the model. |
| Commands run | You pasted and executed each snippet; none are guesses. |
| Voice consistent | Reads like one author and matches your sample. |
| Length honored | Within the word/section limits you set. |
| Structure scannable | Clear headings, short paragraphs, tables where useful. |
Organizing Your Workflow
Run a system, not a pile of prompts
Power users don't write better prompts — they run a better system around the prompts.
One goal per thread. Break it into a task list, sequence the steps, and feed Claude only the context that step needs. Everything else in this section is in service of that one idea.
Orchestrate the goal, not the prompt
For anything with 3+ distinct steps, Claude Code already tracks a todo list for you. As of Claude Code v2.1.142 it uses structured Task tools (TaskCreate, TaskUpdate, TaskList) instead of the old TodoWrite. Your job is to set the scope and let the list keep the work honest.
/clear). Mixing two goals in one context is the fastest way to make Claude forget the first one.Anthropic's recommended rhythm for non-trivial work is four phases: explore, plan, implement, commit. Stay read-only while you explore so nothing changes before you have a plan.
- Explore in plan mode (read-only) — press Shift+Tab or run
/plan. - Plan — ask for a detailed, file-by-file implementation plan.
- Implement — leave plan mode and build, verifying against the plan.
- Commit — descriptive message, then open a PR.
Durable context: where memory lives
A focused thread is short-lived. Durable context is what survives a /clear or a new session. You have three layers — use the right one.
Projects (Claude app)
Attach files and a custom instruction set to a workspace so every chat in that Project starts with the same background. Best for non-code work — research, writing, planning.
CLAUDE.md (Claude Code)
Read at the start of every session, before anything else. The most reliable channel for project conventions. Generate a starter with /init, then trim it.
Auto memory (MEMORY.md)
Claude writes notes to itself across sessions. The first 200 lines or 25KB load at startup — anything past that is ignored. Toggle with /memory.
CLAUDE.md lives in a few places, and they stack. Use /memory to see which files are actually loaded.
~/.claude/CLAUDE.md # applies to every session, all projects
./CLAUDE.md # project root, checked into git (team shares it)
./.claude/CLAUDE.md # project, alternate location
./CLAUDE.local.md # personal, gitignored
src/api/CLAUDE.md # per-directory, loads on demand when you touch that dir
YOU MUST or IMPORTANT on rules that keep getting ignored.Manage context deliberately
The context window is the resource that matters most. Performance drops as it fills, and Claude starts forgetting early instructions. Treat free context like a budget you spend on purpose.
/context # see what is eating your context window
/mcp # per-server token cost (MCP schemas are deferred until used)
/clear # full reset between unrelated tasks
/compact # summarize history, preserving key decisions
/compact focus on the auth refactor and the failing tests # targeted compact
For long single tasks Claude Code auto-compacts near the limit — it drops old tool output first, then summarizes while keeping code, file paths, and decisions. You can also bias what survives with a Compact Instructions section in CLAUDE.md.
@src/auth.ts or @app.ts#5-10. Don't ask Claude to “investigate” with no scope — it will read hundreds of files and flood your context.For broad exploration, delegate to a subagent. It runs in its own fresh context window and returns only a summary, so the research, logs, and doc-reads never touch your main thread.
Which mechanism do I reach for?
This is the question power users get wrong most often. Match the job to the lightest tool that does it.
| Reach for… | When you want… | Loads |
|---|---|---|
| CLAUDE.md | Project conventions Claude needs every session (build cmd, test runner, style rules it can't guess) | Always, at startup |
| Skill | Deep how-to for a specific task, only when relevant (e.g. “generate a PDF”) | On demand, by name/description |
| Path-scoped rule | Instructions tied to certain files (.claude/rules/ with a paths frontmatter) | Only when matching files are touched |
| Subagent | Isolate noisy work (research, review, tests) so the main context stays clean | Spawned per task, returns a summary |
| MCP server | Connect Claude to an external system (GitHub, a DB, Notion, browser) | Tool schemas deferred until used |
| Hook | Deterministic action on every event (auto-format on save, block a command) | The harness runs it — not the model |
| Plugin | Bundle and share skills, commands, agents, and MCP config as one unit | Installed once, brings its pieces |
| API / Agent SDK | Run the agent loop in your own code, headless, on your infrastructure | You build and deploy it |
Patterns for repeatable work
Anything you do more than twice should become a command. Custom slash commands are just markdown files — drop one in .claude/commands/ and it becomes /name.
# .claude/commands/ship.md
Review the current diff with /code-review, fix any CRITICAL
or HIGH issues, run the test suite, then write a conventional
commit message and open a PR. Verify the build passes first.
Define reusable subagents the same way, in .claude/agents/, with frontmatter to route cheap work to a faster model and scope its tools.
---
description: Reviews diffs for correctness. Use proactively after edits.
model: haiku # route cheap, frequent work to Haiku 4.5
tools: Read, Grep, Bash
isolation: worktree # run in its own git worktree
---
You are an adversarial reviewer. You see only the diff and the
acceptance criteria. Flag ONLY gaps that affect correctness.
/batch to split it into 5–30 independent units, each in its own git worktree — then 3–5 sessions work in parallel without colliding. Persist the plan to PLAN.md so it survives any compaction.An operating rhythm you can adopt
Put it together into a loop you run for every meaningful goal.
- Frame — one goal, fresh thread.
/clearif reusing a session. - Explore + plan — plan mode, then a written plan saved to a file.
- Scope context —
@only the files in play; delegate wide research to a subagent. - Implement in increments — small, testable steps; give Claude a check it can run.
- Verify — run
/code-reviewin a fresh context before calling it done. - Capture — commit, and fold any new lesson into CLAUDE.md.
- Reset —
/compactfor a long task,/clearbefore the next goal.
From the Source: Boris Cherny
A complete, slide-by-slide walkthrough of Boris Cherny's Claude Code masterclass.
A full walkthrough of "Mastering Claude Code in 30 Minutes" by Boris Cherny — the creator of Claude Code — turned into slide-by-slide notes you can act on.
Source: Mastering Claude Code in 30 Minutes — Boris Cherny (Code w/ Claude, Anthropic). Quotes paraphrased.
From the Source: Boris Cherny on Claude Code
This section distills a talk by Boris Cherny — a member of technical staff at Anthropic and the creator of Claude Code — titled "Mastering Claude Code in 30 Minutes." Everything here comes straight from his words at Code w/ Claude.
Slide 1 — What Claude Code Is
A new kind of AI assistant
Past coding assistants completed a line, or a few lines, at a time — autocomplete on steroids. Claude Code is different: it is fully agentic. It is meant to build whole features, write entire functions and files, and fix entire bugs in one go.
Old way: line completion
Suggests the next line or two as you type. You stay the driver for every keystroke.
Claude Code: fully agentic
You describe the outcome — "build this feature," "fix this bug" — and it figures out the steps end to end.
Works with ALL your tools — no workflow change
A core design goal: you don't change how you work to use it. Claude Code slots into your existing setup instead of replacing it.
| Dimension | What's supported |
|---|---|
| IDEs / editors | VS Code, Xcode, JetBrains, Vim/Emacs |
| Terminal | Every terminal — it's terminal-native |
| Where it runs | Locally, over remote SSH, inside tmux |
| Scope | General purpose — not tied to one language or stack |
A power tool, intentionally un-opinionated
When you open Claude Code, you just see a prompt bar — nothing else. That minimalism is deliberate.
Slide 2 — Recommended First-Time Setup
Boris recommends a short setup pass the first time you run Claude Code. Each step is a one-time investment that makes everything afterward smoother.
- Run
/terminal-setupto fix multi-line input. - Run
/themeto pick a theme that's comfortable for your eyes. - Run
/install-github-appto bring Claude into your GitHub flow. - Customize your allowed tools so you stop getting prompted for routine actions.
- (macOS) Turn on Dictation so you can speak your prompts.
1. /terminal-setup — better multi-line input
By default, adding a new line in a terminal prompt is awkward. This command wires up Shift+Enter for new lines instead of forcing you to type trailing backslashes.
/terminal-setup
# now: Shift+Enter = newline (no more \ at line ends)
2. /theme — light, dark, and colorblind-friendly
Set the theme to match your environment and accessibility needs, including a Daltonize option for colorblind users.
/theme
# choose: light | dark | Daltonize (colorblind)
3. /install-github-app — @mention Claude on GitHub
This installs the new GitHub app. Once it's set up, you can @mention Claude on any GitHub issue or pull request and have it respond right there in the GitHub UI.
/install-github-app
# then on any issue/PR: @claude please review this and suggest a fix
4. Customize your allowed tools
Claude Code asks for approval before taking certain actions. If there's something you get prompted about constantly, allow-list it once so you're not interrupted every time.
5. Pro tip — talk to Claude Code (macOS Dictation)
Enable Dictation in System Settings > Accessibility > Dictation, then hit the dictation key twice and just speak your prompt out loud.
Slide 3 — Tip 1: Start with Codebase Q&A
This is Boris's number-one recommendation and the single best way to begin with Claude Code. Before you ask it to write anything, just point it at your repo and ask questions. It is also exactly what Anthropic teaches every new hire on day one of technical onboarding.
The onboarding result
Codebase Q&A collapsed the time it takes a new engineer to become productive at Anthropic.
| Onboarding | Before Claude Code | After Claude Code |
|---|---|---|
| Time to productive | ~2–3 weeks | ~2–3 days |
Zero setup — and your code stays yours
There is nothing to configure before you can ask questions. Unlike tools that build a search index or upload your repo to a server, Claude Code reads the code locally, in place, on demand.
No indexing
No index to build, no embeddings, no wait. You point it at the repo and ask immediately.
Stays local
Code is NOT uploaded to a remote database. It stays on your machine and you stay in control.
No training
Anthropic does NOT train generative models on your code.
Zero setup
No remote service, no indices to maintain — just open the prompt and ask.
Deeper than text search
Plain text search (grep / Ctrl-F) only finds literal string matches. Claude Code goes a level deeper — it actually understands the code, so it can answer relational questions and surface real examples, at the quality of a wiki or internal docs.
> How is the AuthSession class instantiated and used?
Instead of listing every line containing AuthSession, it finds the real call sites and shows you how the class is actually used in practice.
Ask about git history
This is where it shines. The model knows git — it was never system-prompted to do this; "the model is awesome" and reaches for git on its own.
> Why does this function have 15 arguments named this weird way?
It looks through the git history to find who introduced those arguments, what issues the commits link to, and summarizes the whole story back to you.
Ask about GitHub issues
Using web fetch, Claude Code can read a linked GitHub issue and pull that context into its answer — connecting the code in front of you to the discussion behind it.
The Monday standup trick
Boris asks this every Monday before standup. It reads the git log, already knows his username, and produces a readout he pastes straight into a doc.
> What did I ship this week?
Slide 4 — Tip 2: Editing Code with a Small Toolset
Claude Code is fully agentic. Agentic means: you give the model tools, and it figures out how to use them to accomplish your goal — you don't micromanage the steps.
The whole toolset is tiny
There are only a handful of core tools. The power comes from how Claude strings them together to explore, brainstorm, and then edit.
Edit files
Make changes to your source files.
Run bash
Execute commands in your shell.
Search files
Find and read the relevant code.
> Add rate limiting to the /login endpoint and update the tests.
From that one sentence, Claude will search the codebase to find the endpoint, read the surrounding code, edit the handler, and run the tests — chaining its small toolset together on its own.
Slide 5 — Brainstorm & Plan Before Big Features
The most common failure mode: handing Claude an enormous task cold. People ask it to "implement this 3,000-line feature" in one go. Sometimes it nails it — but sometimes it builds the wrong thing, because it guessed at decisions you never spelled out.
The easiest fix: just ask it to think first
You do not need plan mode or any special tool. Plain language in your prompt is enough.
> Brainstorm a few approaches for this feature, make a plan,
run it by me, and ask for my approval before you write any code.
- Explore the relevant code
- Brainstorm a few options
- Write a plan
- Get your approval
- Then — and only then — write the code
Slide 6 — The "Commit, Push, PR" Incantation
Once a change is done, one short phrase hands the whole git wrap-up to Claude.
> commit, push, and open a PR
From that, Claude makes a commit, creates a branch, pushes it, and opens a pull request — the full ship sequence.
It matches your conventions automatically
Claude reads your code, your history, and your git log to figure out how your team writes commit messages — and then matches that format. This is not system-prompted; the model works it out from context.
Commit
Stages and commits the change.
Branch + push
Creates a branch and pushes it up.
Open PR
Opens the pull request for review.
Match style
Reads git log to mirror your commit-message format.
Slide 7 — Plug in your team's tools: bash/CLI and MCP
Claude Code is agentic: you give it tools, it figures out how to use them. Tip 3 in the talk is about extending its tiny built-in toolset (edit files, run bash, search files) with the tools your team already lives in. There are two kinds.
1. Bash / CLI tools
Tell Claude to use a command-line tool. The trick: point it at --help so it learns the tool on the spot, then it starts using it correctly.
2. MCP tools
Add an MCP server, tell Claude how to use it, and it begins calling it. Extremely powerful on a new codebase — hand Claude every tool your team already uses.
Teaching Claude a CLI
You don't pre-program specific tools — you just describe what you want and let Claude explore. The "made-up Barley CLI" example: introduce a tool by name and let it read its own help text.
> We use the `barley` CLI for deploys. Run `barley --help` to
learn it, then deploy the staging environment.
CLAUDE.md so it persists. One-off discovery via --help is great; repeated use belongs in shared context.Adding an MCP server
MCP (Model Context Protocol) lets Claude talk to external systems — issue trackers, design tools, browsers, internal services. Boris's headline example: a Puppeteer MCP server that lets Claude drive a real browser and screenshot the result.
# Register an MCP server, then just ask Claude to use it
claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer
> Use the Puppeteer server to open localhost:3000 and screenshot it.
Slide 8 — Workflows: explore → plan → confirm, then let Claude CHECK its work
The most common effective workflow is simple: let Claude explore a bit, plan a bit, and ask for confirmation before it writes code. You don't need a special mode — just ask.
> Before you write any code: brainstorm a few approaches,
make a plan, run it by me, and wait for my approval.
The big one: give Claude a way to check its own work
This is the most powerful pattern in the talk. When Claude has a feedback tool to verify what it built, it can iterate by itself — running, looking, fixing, re-running — until the result is near-perfect. No human in the loop between passes.
Unit / integration tests
Claude writes tests, runs them, reads failures, and fixes the code until they pass.
Puppeteer screenshots
It screenshots a web UI in a headless browser, compares to the target, and adjusts.
iOS simulator screenshots
It captures the running app from the simulator and self-corrects layout/visuals.
- Give Claude a mock and say "build this web UI."
- It gets close on the first pass.
- Give it a check (tests / screenshot tool) and let it iterate 2–3 times.
- It converges to near-perfect on its own.
Slide 9 — CLAUDE.md: the simplest way to give Claude context
More context = smarter decisions. As an engineer you carry tons of unwritten context; CLAUDE.md is how you hand some of that to Claude. It's a special filename in the project root (the same directory you start Claude in), and it's auto-read into context at the start of every session — it rides along on your first user turn.
Local vs checked-in
Project CLAUDE.md (checked in)
Commit it, share it with the team. Write it once and everyone benefits — the network effect.
Local CLAUDE.md (not checked in)
Just for you. Personal notes and preferences you don't want in source control.
What to put in it
Common bash commands
Build, test, lint, run — the commands you'd tell a new hire.
Common MCP tools
Which servers exist and how the team uses them.
Architecture decisions
Key choices and conventions worth knowing up front.
Important / core files
Where the load-bearing code lives.
Style guide
Formatting and code conventions to follow.
Anything onboarding-relevant
What you'd need to know to work in this codebase.
Nested CLAUDE.md
You can drop a CLAUDE.md into child directories. Unlike the root file (always loaded), nested ones are pulled in on demand — only when Claude is working in that subtree. Great for module-specific guidance without bloating the global context.
repo/
├── CLAUDE.md # root — always loaded
├── frontend/
│ └── CLAUDE.md # loaded when Claude works in frontend/
└── services/payments/
└── CLAUDE.md # loaded on demand for payments work
Slide 10 — The context & config hierarchy
It's not just CLAUDE.md — everything in Claude Code is hierarchical. Config flows from your repo, up to your personal global settings, up to org-wide enterprise policy. This applies to CLAUDE.md, slash commands, permissions, and MCP servers alike.
| Level | Scope | Governs |
|---|---|---|
| Project | Specific to one git repo. Check it in to share, or keep it personal/local. | Project CLAUDE.md · repo slash commands · .mcp.json · repo permissions |
| Global (user) | Applies across all of your projects on this machine. | Your personal CLAUDE.md · your slash commands · your permissions · your MCP servers |
| Enterprise | Global config rolled out to all employees, org-wide. | Mandated CLAUDE.md · shared slash commands · auto-approved and blocked commands/URLs · org MCP servers |
Permissions: auto-approve AND block, org-wide
Permissions work in both directions at the enterprise level:
Auto-approve
Check a safe command (e.g. your test runner) into enterprise policy so it's auto-approved for every employee — no per-prompt clicking.
Block (hard deny)
Block a command or a URL org-wide — e.g. a URL that should never be fetched. Employees can't override it, so it can never be fetched.
.mcp.json — ship MCP servers with the repo
Check a .mcp.json into the repository and anyone who runs Claude Code there is prompted to install those MCP servers automatically.
// .mcp.json (checked into the repo)
{
"mcpServers": {
"puppeteer": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-puppeteer"]
}
}
}
.mcp.json, so every engineer can drive end-to-end tests and screenshot/iterate without installing anything themselves.Slide 11 — /memory and the # shortcut
Because memory is hierarchical, you need a way to see and edit it. Two tools do that.
/memory
Shows every memory file currently being pulled into context — enterprise policy, your user memory, the project CLAUDE.md, and any nested CLAUDE.md — and lets you open and edit a specific one.
Type #
The fastest way to remember something. Type #, write the note, and pick which memory file it lands in. It's incorporated immediately.
/memory # list & edit all active memory files
# always run the test suite with `pnpm test --silent`
↑ press '#', type the note, choose the destination file
Slide 12 — Recommendation: start with shared project context
If you're unsure where to begin in the hierarchy, Boris's answer is clear: shared project context. Write your CLAUDE.md, slash commands, permissions, and .mcp.json once, check them into the repo, and share them with the team.
| If you want… | Start here |
|---|---|
| Maximum payoff for the least effort | Checked-in project CLAUDE.md |
| Team tools available to everyone | Checked-in .mcp.json |
| Consistent, safe commands across the team | Repo (then enterprise) permissions |
| Personal-only tweaks | Local CLAUDE.md / global config |
Slide 13 — Key Bindings: The Hidden Keyboard Layer
Boris flags these because most people never discover them — Claude Code looks like a single prompt bar, but a handful of keystrokes unlock its real power. Each one is about steering Claude mid-flight or feeding it exactly the context it needs.
| Key | What it does | When Boris uses it |
|---|---|---|
| Shift+Tab | Accept edits / cycle into auto-accept-edits mode — file edits are auto-applied, but bash commands still need approval | When Claude is clearly on the right track or iterating on tests; you can always ask Claude to undo later |
| # | Make Claude remember something — it's incorporated into CLAUDE.md immediately, and you pick which memory file it goes to | "It's not using a tool correctly" — teach it once so the fix persists across sessions |
| ! | Drop into bash mode and run a command locally — the command and its output both go into the context window | Long-running commands, or anything you want Claude to "see" without it running the command itself |
| @ | Mention files and folders to pull them into context | Pointing Claude at the exact files/dirs that matter |
| Esc | Stop whatever Claude is doing — anytime, safely; it won't corrupt the session | Claude suggested a 20-line edit, 19 lines are perfect — stop it, correct the one line, redo |
| Esc Esc | Jump back through conversation history | Rewinding to an earlier point in the session |
| Ctrl+R | See the full output — the exact same thing Claude sees in its context window | Inspecting truncated output / understanding what the model is actually working from |
Resuming work across sessions
You don't lose a session when you close the terminal. Two startup flags bring you right back.
# Pick a previous session to resume
claude --resume
# Continue the most recent session
claude --continue
Slide 14 — The SDK Is Already On Your Machine
There's no separate package to install. The -p / --print flag is the Claude Code SDK. Pass a prompt, get a result back, and exit — non-interactive, scriptable, composable.
claude -p as a super-intelligent Unix utility. Give it a prompt, get JSON back. Pipe in, pipe out — exactly like grep, jq, or any other tool in your pipeline.The shape of a call
You control the prompt, which tools Claude is allowed to use (including specific bash commands), and the output format — plain text, a JSON blob, or a stream of JSON you can process as it arrives.
# Basic one-shot
claude -p "summarize the failing tests"
# Constrain the toolset (allow-list specific tools/commands)
claude -p "triage this build" --allowedTools "Bash(git log:*)" "Read"
# Structured output for programmatic consumption
claude -p "list the changed files and why" --output-format json
# Streaming JSON to process incrementally
claude -p "analyze this log" --output-format stream-json
Piping: in and out
Because it reads stdin and writes stdout, claude -p slots into any shell pipeline.
# Feed it command output, parse the result with jq
git status | claude -p "which of these files are risky to commit?" --output-format json | jq '.result'
# Pull a giant log from a GCP bucket and find what's interesting
gsutil cat gs://my-bucket/prod.log | claude -p "what looks anomalous here?"
# Fetch data from the Sentry CLI and let Claude act on it
sentry-cli issues list | claude -p "group these by likely root cause"
CI
Run Claude as a step in your continuous-integration pipeline — review diffs, summarize failures, gate merges.
Incident response
Pipe logs, traces, or Sentry output straight in so Claude can triage and recommend actions during an outage.
Pipelines
Drop it anywhere data flows — read a giant log, find what matters, hand the result to the next stage.
Slide 15 — Advanced Parallelism: Many Claudes at Once
Boris calls himself a "Claude normie" — one Claude at a time, a few terminal tabs for a few repos. But the power users he sees, inside and outside Anthropic, almost always run many Claudes in parallel.
How the power users scale up
tmux + SSH
SSH sessions and tmux tunnels into running Claude sessions, so multiple agents persist and survive disconnects.
Multiple checkouts
Several separate checkouts of the same repo, each running its own Claude, so many tasks progress at once.
Git worktrees
Use git worktree for isolation — each Claude works in its own branch/directory without stepping on the others.
# Spin up an isolated worktree so a parallel Claude can't collide with your main checkout
git worktree add ../feature-x -b feature-x
# ...then start Claude inside that directory
Slide 16 — Q&A Gems
The hardest thing to build: safe bash
Boris named making bash commands safe as the single hardest part of building Claude Code. Bash is inherently dangerous — it changes system state in unexpected ways — yet approving every command by hand kills productivity, and not everyone runs inside Docker.
- Read-only detection — some commands are recognized as read-only and treated as low risk.
- Static analysis — figures out which commands can be safely combined (e.g. chained commands) before running them.
- Tiered permissions — a layered allow-list / block-list system that operates at different levels (project, global, enterprise).
Fully multimodal from day one
Claude Code has always been multimodal — it's just hard to discover in a terminal. There are three ways to give it an image.
Drag & drop
Drag an image straight onto the terminal.
File path
Give it a path to an image file.
Copy-paste
Paste an image directly into the prompt.
Why a CLI and not an IDE?
At Anthropic people use a huge range of editors — VS Code, Zed, Xcode, Vim, Emacs. The terminal is the common denominator that works for everyone. And given how fast the model is improving, there's a real chance people won't be using IDEs the same way by year's end — so they deliberately avoid over-investing in UI layers that may not matter.
How much does Anthropic itself use it?
▶︎ Watch the full talk: youtube.com/watch?v=B_KAEqiC-0Q
Troubleshooting & Pro Tips
Recover fast, then level up
Most Claude Code problems are context problems in disguise — know the recovery move and you stay fast.
Performance degrades as the context window fills, so almost every fix below is about protecting that one resource. Learn the symptom, the cause, and the keystroke that resets it.
The fast triage table
| Symptom | Likely cause | Fix |
|---|---|---|
| Replies slow, vague, or “forgetting” earlier facts | Context window near full | /compact at a breakpoint, or /clear for a new task |
| Output drifts off the goal | Goal buried under noise / failed attempts | Re-state the goal, switch to plan mode (Shift+Tab) |
| Same fix fails twice in a row | Failed approaches polluting context | /rewind to erase, or /clear and re-prompt |
| Constant permission prompts | No allow rules / wrong mode | /permissions add allow rules; set --permission-mode acceptEdits |
| “File is outside allowed directories” | Path not in working dirs | /add-dir <path> or restart with --add-dir |
| MCP tools missing / server red | Not connected, bad transport, or auth | /mcp to check status and re-auth |
Edited .mcp.json but nothing changed | Config read only at startup | Restart the session |
| Installation acting weird | Broken setup / stale state | /doctor to diagnose health |
When context is full
Two moves, two situations. Use /clear when you are starting something unrelated — fresh context is free and gives the cleanest results. Use /compact when you need to keep working on the same thread but reclaim room.
/clear # wipe history, start a fresh task
/compact # summarize history, keep key facts
/compact focus on the API changes # steer what the summary keeps
/context shows a live token breakdown by category so you can see exactly what is eating your window before deciding to clear or compact.Re-anchor after a reset
Project-root CLAUDE.md and MEMORY.md are re-injected from disk after a compact, but skill descriptions and path-scoped rules are lost until re-triggered. Persist plans to a file so they survive any reset.
echo "Goal: migrate auth to OAuth. Constraints: no schema change." > PLAN.md
# after /clear or /compact:
# "Read PLAN.md and continue from step 3."
When output drifts
Drift means the model lost the thread, not that it got dumber. First, re-state the goal in one specific sentence with the file names involved. Second, drop into plan mode so it must read and think before it touches anything.
If a correction fails twice, stop correcting. Rewinding erases the failed attempt entirely; correcting leaves the bad approach in context where it keeps misleading the model.
Esc # stop Claude mid-action, context preserved
Esc Esc # open rewind menu (or /rewind) to restore prior state
Permission & sandbox issues
Prompts are a feature, but you can tune the friction. Add durable allow rules with /permissions, or start the session in a looser mode for trusted work.
claude --permission-mode acceptEdits # auto-approve file writes + mkdir/mv/cp
claude --permission-mode plan # read-only until you approve a plan
claude --allowedTools "Bash(git*),Edit" # auto-approve specific tools only
For “outside allowed directories” errors, grant access to the extra path. Note that --add-dir grants file access but does not load that dir’s .claude/ config.
/add-dir ../shared-lib # in-session
claude --add-dir ../apps ../lib # at launch
--dangerously-skip-permissions for routine work. Prefer --allow-dangerously-skip-permissions, which only adds bypassPermissions to the Shift+Tab cycle so you can start safely in plan mode and switch to it later.MCP server not connecting
Start with /mcp — it shows every server, its connection status, available tools, and is where you complete OAuth for remote servers. From there, check the three usual causes: transport, scope, and restart timing.
- Run
/mcpand read the status. Red usually means a crashed subprocess or an unfinished OAuth. - Verify the transport. Cloud servers need
--transport http;sseis deprecated; omit the flag for local stdio servers. - If you just edited
.mcp.json, restart — it is read only at session start. - For a project-scoped server, approve it at the first-use prompt (a guard against cloned repos auto-launching processes).
claude mcp add --transport http docs https://code.claude.com/docs/mcp
claude mcp add playwright -- npx -y @playwright/mcp@latest
claude mcp get docs # shows which scope holds it
claude mcp reset-project-choices # re-trigger project approval prompts
--transport, --env, --scope, --header) must come BEFORE the server name. For stdio, everything after -- is the launch command, passed untouched.~/.claude.json and the project-root .mcp.json. It does NOT read ~/.claude/mcp.json — a common reason a server “won’t load.”High-leverage pro habits
One scope per session
Fresh context is free. Finish a task, /clear, start the next. Don’t let unrelated work share a window.
Plan before you build
Explore, plan in plan mode, then execute. Front-loading context writes a stable prefix that caches — driving 90%+ cache hits.
Be specific
Name the files, state constraints, point to an example pattern. Precise instructions mean far fewer corrections.
Scope investigations
Never say a bare “investigate” — it floods context. Delegate research to a subagent that returns a short summary.
Prune CLAUDE.md
For each line ask: would removing this cause a mistake? A bloated memory file makes Claude ignore the rules that matter.
Read narrow
File reads dominate context. Read only the ranges you need and prefer rg/grep over dumping whole files.
Route by cost
Haiku for simple/high-volume, Sonnet for most work, Opus for the hardest reasoning. Switch with /model.
Reserve ultrathink
Escalate ‘think’ → ‘think hard’ → ‘ultrathink’ only for genuinely hard reasoning. More thinking aids depth, not output compliance.
PLAN.md) and break large work into per-session phases with a /clear between them. External files survive any context reset; conversation history does not.CLAUDE.md, and /compact all invalidate the prompt cache and trigger a slow, expensive uncached turn. Batch those changes at natural breakpoints.Master Cheat Sheet
The one tab you keep open
Every flag, command, shortcut, and "which tool when" call in one scannable place.
CLI Flags (the daily ten)
| Flag | Alias | Does |
|---|---|---|
--print | -p | Headless: run once, print, exit. Pipe-friendly. |
--continue | -c | Resume the most recent conversation in this dir. |
--resume | -r | Resume by ID/name, or open the picker. |
--model | Set model: sonnet, opus, or full id. | |
--permission-mode | default / acceptEdits / plan / auto / dontAsk / bypassPermissions. | |
--add-dir | Add extra working dirs for read/edit. | |
--output-format | json for scriptable output (grab session_id). | |
--allowedTools | Auto-approve specific tools, no prompt. | |
--name | -n | Name the session for later --resume. |
--bare | Skip all auto-discovery for fast, reproducible CI. |
Copy-paste launches
# Interactive with a starting prompt
claude "refactor the auth module"
# Headless, pipe content in
cat logs.txt | claude -p "explain the errors"
# Scripted continuation in the current dir
claude -c -p "Check for type errors"
# Start in plan mode (read-only until you approve)
claude --permission-mode plan
session_id=$(claude -p "Start a review" --output-format json | jq -r '.session_id')
claude -p "Continue that review" --resume "$session_id"
--dangerously-skip-permissions bypasses every prompt. Prefer --permission-mode dontAsk plus an --allowedTools allowlist for locked-down automation.Slash Commands
Context & cost
| Command | Does |
|---|---|
/clear | Wipe history, free context. Use between unrelated tasks. |
/compact | Summarize to reclaim context. Steerable: /compact focus on the API changes. |
/context | Live token breakdown by category. |
/cost | Session token usage and spend. |
/memory | Open CLAUDE.md files in your editor. |
/rewind | Restore prior conversation/code state (erases the failed attempt). |
Models & agents
| Command | Does |
|---|---|
/model | Switch model: opus, sonnet, haiku, default, or picker. |
/effort | Set reasoning effort low/medium/high; auto resets. |
/agents | Create/edit subagents; Running tab shows live ones. |
/mcp | Server status, tools, OAuth auth, per-server cost. |
/plugin | Discover, install, enable/disable plugins. |
Project & review
| Command | Does |
|---|---|
/init | Scan repo, generate CLAUDE.md. |
/review [PR] | Code review of changes or a specific PR. |
/security-review | Security-focused analysis of current changes. |
/resume | Pick a past conversation to continue. |
/permissions | Manage allow/deny rules. |
/doctor | Diagnose installation health. |
REPL Shortcuts
| Key / Token | Action |
|---|---|
| Shift+Tab | Cycle normal / plan / auto-accept mode. |
| Esc | Stop Claude mid-action; context preserved. |
| Esc Esc | Open rewind menu (same as /rewind). |
| Option+T / Alt+T | Toggle extended thinking. |
| Ctrl+B | Background a running task. |
@path/to/file | Reference file contents in a prompt. |
@agent-name | Guarantee a specific subagent runs. |
!command | Run bash; output injected into context. |
# (line start) | Quick-append a line to CLAUDE.md memory. |
Thinking Budget
Keywords escalate reasoning depth in your prompt. More thinking means deeper reasoning, not better rule-following.
think < think hard < think harder < ultrathink (max)
ultrathink for architecture or hard multi-file debugging. Use think / think hard for everyday moderate tasks.Model Picker
| Model | $/MTok in/out | Context | Use for |
|---|---|---|---|
| Opus 4.8 | $5 / $25 | 1M | Long-horizon agentic coding, big refactors, hard reasoning. |
| Sonnet 4.6 | $3 / $15 | 1M | Most production work: code, analysis, tool use. |
| Haiku 4.5 | $1 / $5 | 200k | Real-time, high-volume, sub-agents, cost-sensitive. |
Which Tool When
CLAUDE.md
When: facts Claude needs every session. Why: re-injected after compaction. Non-guessable commands, code style, gotchas. Prune ruthlessly.
Skill
When: a repeatable how-to that loads only on demand. Why: metadata costs ~100 tokens; the body loads when your request matches. .claude/skills/<name>/SKILL.md.
Subagent
When: verbose, self-contained work that returns a summary. Why: runs in its own fresh context, then reports back. Great for parallel research.
MCP
When: Claude needs external systems (DBs, APIs, SaaS). Why: standard tool protocol. claude mcp add; tools namespaced mcp__server__tool.
Hook
When: deterministic action the model must not skip. Why: shell/HTTP at lifecycle points. Auto-format on Edit, block bad Bash, run tests on Stop.
Plugin
When: bundle and share many of the above. Why: one installable directory of skills, agents, hooks, MCP. /plugin install name@marketplace.
Config-File Map
| What | User level | Project (shared) |
|---|---|---|
| Memory | ~/.claude/CLAUDE.md | ./CLAUDE.md |
| Settings | ~/.claude/settings.json | .claude/settings.json |
| Skills | ~/.claude/skills/ | .claude/skills/ |
| Subagents | ~/.claude/agents/ | .claude/agents/ |
| MCP | ~/.claude.json | .mcp.json |
| Hooks | settings.json (hooks key) | settings.json (hooks key) |
.claude/settings.local.json (gitignored). MCP local scope lives in ~/.claude.json, not that file.Quick Recipes
# Add a remote MCP server (project-shared, checked in)
claude mcp add --transport http --scope project docs https://code.claude.com/docs/mcp
# Add a local stdio MCP server
claude mcp add playwright -- npx -y @playwright/mcp@latest
# Auto-format every edit (PostToolUse hook in settings.json)
{"hooks": {"PostToolUse": [{"matcher": "Edit|Write",
"hooks": [{"type": "command",
"command": "jq -r .tool_input.file_path | xargs prettier --write"}]}]}}
0 = success, 2 = blocking error (stderr fed back to Claude), other non-zero = non-blocking. Use 2 to enforce policy.Custom Slash Command (60 seconds)
- Create
.claude/commands/optimize.md(filename becomes/optimize). - Add YAML frontmatter:
description,argument-hint, optionalallowed-tools,model. - Write the prompt body. Use
$ARGUMENTS,$1,@path, and!cmdfor dynamic content.
---
description: Optimize a file for performance
argument-hint: [file path]
allowed-tools: Read, Edit, Bash(git diff:*)
---
Review @$1 for performance issues and apply safe fixes.
Current diff:
!git diff $1
/clear between unrelated tasks.What Is AI, Really?
Strip away the hype: AI is just software doing jobs we thought needed a brain.
Before we touch a single buzzword, let's nail down what "AI" actually means. Short version: it's software that does tasks we used to assume needed a human brain. That's it. No glowing red eyes required.
One plain sentence
Artificial Intelligence is software that performs tasks we once thought required human intelligence — spotting spam, suggesting a song you'll like, finding the fastest route, answering a messy question in plain English. When you put it that way, "AI" stops sounding like science fiction and starts sounding like Tuesday.
The two ways software can be smart
This next idea is the spine of the entire track, so it's worth slowing down for. There are really only two ways to build something "intelligent," and the difference is who writes the rules.
Rules-based (hand-written)
A human writes every if-this-then-that by hand. "IF the email contains 'FREE VIAGRA' THEN mark as spam." Predictable and easy to inspect — but a human has to imagine every case in advance, and spammers are creative.
Learned (figured out from examples)
You don't write the rules. You show the system thousands of examples ("these 10,000 were spam, these weren't") and it figures out the patterns itself. Flexible and powerful — but harder to peek inside and explain.
The nesting dolls
Here's where the jargon usually gets confusing — AI, ML, Deep Learning, Generative AI, LLMs. They sound like rivals. They're not. They're Russian nesting dolls (matryoshka): open the big one and a smaller, more specialized one sits inside.
| Doll | What it is | Everyday face |
|---|---|---|
| AI (biggest) | The whole idea: machines doing brain-like tasks | The entire field |
| Machine Learning | AI that learns rules from examples instead of being hand-coded | Spam filter, maps ETA |
| Deep Learning | ML using many-layered "neural networks" — math loosely inspired by brain cells, great with images, audio, and language | Face unlock, voice typing |
| Generative AI / LLMs (smallest) | Deep learning that creates new text, images, or code (an LLM = Large Language Model, the text kind) | Claude, ChatGPT |
Each doll is a more specific kind of the one wrapped around it. So Claude is a Large Language Model, which is Generative AI, which is Deep Learning, which is Machine Learning, which is AI. All true at once. No contradiction.
Narrow vs general — set your expectations
Today's AI is narrow: brilliant at one job. A spam filter can't drive a car. Claude can write a poem but can't water your plants. AGI — Artificial General Intelligence, a do-anything, human-level mind — does not exist yet, no matter what a flashy demo implies.
The whole story in one breath
You don't need the full history — just the shape of it. Read this as a single timeline:
- 1950 — Alan Turing asks "can machines think?" The field gets its name a few years later, in 1956.
- 1980s — Expert systems: giant hand-written rulebooks. Clever, but brittle.
- 1990s–2000s — Machine Learning takes over: let the data write the rules.
- 2012 — Deep learning explodes after neural nets crush an image-recognition contest.
- 2017 — The transformer is invented — the engine behind every modern LLM.
- 2022 — ChatGPT launches and suddenly everyone's grandma knows the word "AI."
- Today — Agents: AI that doesn't just answer, but takes actions for you.
It's already in your pocket
If "AI" still feels abstract, look around — you've been using it for years:
Spam filter
Quietly sorting your inbox, learning what junk looks like.
Maps ETA
Predicting traffic to guess your arrival time within minutes.
Recommendations
The "you might also like" behind Netflix, Spotify, and your shopping cart.
Claude
An LLM you can just talk to in plain language.
Machine Learning: Learning from Data
Stop writing the rulebook — hand the machine the photo album and let it figure out "cat."
Here's the big flip this section owns: instead of a human painstakingly hand-coding rules, machine learning lets the computer learn the patterns straight from data. You don't tell it what a cat is — you show it, and it works the rest out itself.
The cat-photo story
Imagine teaching a child what a cat is. You don't hand them a rulebook saying "pointy ears, whiskers, four legs, smug expression." You just point at cats over the years and say "cat." Eventually the child recognizes a cat they've never seen before — even a weird hairless one.
Machine learning does the same thing. Show a model a giant album of labeled cat photos and it learns "cat-ness" on its own. Nobody writes the rules; the rules emerge from the examples. That's the whole shift.
The working vocabulary
Four words unlock almost every ML conversation. Learn these and you're fluent enough to follow along.
Features
The inputs — the raw stuff the model looks at. For a photo, the pixels. For a house price, the square footage and zip code.
Labels
The answers we want — "cat" or "not cat," the actual sale price. The correct response we're trying to predict.
Model
The thing that learns — the recipe that turns features into a guess at the label.
Parameters / weights
The dials the model tunes. Training is just nudging these numbers until the guesses get good. (Weights are the most common kind of parameter.)
Three ways a machine can learn
There are three big "learning styles," called paradigms. One line each:
| Paradigm | How it learns | Everyday picture |
|---|---|---|
| Supervised | From labeled examples (features + correct answers) | Flashcards with the answer on the back |
| Unsupervised | Finds hidden structure with no labels at all | Sorting a messy drawer into natural piles |
| Reinforcement | By trial and error, chasing rewards | Training a dog with treats |
The cat-album example is supervised learning — every photo came with a label. That's the most common starting point, and the one to keep in your head.
Two phases of a model's life: training vs inference
This split shows up everywhere in AI, so it's worth nailing now.
Training = studying
The model pores over the examples and adjusts its dials, over and over, until its guesses match the labels. Slow, expensive, and done in big batches (then occasionally repeated to keep the model fresh).
Inference = taking the test
The trained model meets brand-new data and makes a prediction. Fast, cheap, and what happens every time you use it. Like flipping a flashlight on — you don't need to know the electronics to use it.
Memorizing vs actually understanding
Here's the trap every model can fall into. We don't want it to memorize the album — we want it to understand cats well enough to handle ones it's never seen. That ability to handle new data is called generalization, and it's the entire point.
The failure is called overfitting. Overfitting is the student who memorizes every past exam word-for-word, then panics when the teacher changes the numbers. They aced the practice test and bombed the real one — because they learned the answer key, not the subject.
So far we've treated "the model" as a magic box that tunes dials. The most powerful ML models are built from layered networks of these dials stacked together — let's look under that hood next.
Neural Networks & Deep Learning
Inside every AI is a giant mixing board of dimmer switches that learned to tune itself.
Remember the nesting doll from earlier? Open the "machine learning" doll and inside sits deep learning. Let's crack that one open too — because it turns out to be built from millions of copies of one tiny, almost embarrassingly simple gadget: the artificial neuron. Build one, stack a bunch, and you've got the engine behind nearly every modern AI.
Step one: build a single neuron
An artificial neuron is basically a tiny calculator. It does three things in a row:
- Take some inputs — numbers coming in (say, the brightness of pixels, or "was it raining?").
- Multiply each by a weight, then add them up — the weight says how much that input matters. Big weight = "pay attention to this." Near-zero weight = "ignore it."
- Push the total through an activation function — a gate that bends the result so the network can learn curvy, non-straight-line patterns. Loosely: a bouncer who decides how much of the signal gets waved through.
That's the whole neuron. Inputs in, one number out. Not magic — arithmetic.
Step two: stack neurons into layers
One neuron makes one little decision. The power comes from wiring thousands of them together in layers, where each layer's output feeds the next.
Input layer
Where the raw data enters — the pixels, the words, the sensor readings. No thinking happens here; it's just the loading dock.
Hidden layers
The actual processing. Each layer transforms what the last one found into something more abstract. This is where the real work lives.
Output layer
The final answer — "cat," "spam," or the next word in a sentence.
And "deep" learning? It just means there are many hidden layers stacked between input and output. That's the entire secret behind the scary-sounding word. Deep = lots of layers. That's it.
The knobs: weights are everything
Here's the anchor image to keep in your head. Picture a giant sound-mixing board covered in dimmer-switch sliders. Each weight is one slider. The activation function is the gate each signal passes through after the sliders set its level. A real network has millions (or even billions) of these sliders.
When people say a model is "trained," they mean exactly one thing: all those sliders got nudged up and down until the output came out right. Training is just tuning the knobs. And here's the kicker — no human touches them by hand. The network adjusts its own sliders, which is the next thing to explain.
How it actually learns (the loop)
So how do you tune millions of sliders with nobody touching each one? You let the network do it, in a repeating four-beat loop:
| Beat | Plain-English meaning |
|---|---|
| Forward pass | Run data through and make a guess (it'll be terrible at first). |
| Loss | Measure how wrong the guess was — one number for "how bad." |
| Backpropagation | Trace the error backward to figure out which sliders caused it, and which way each should move. |
| Gradient descent | Nudge every slider a little in the helpful direction. Repeat — often millions of times. |
Gradient descent is the all-time favorite analogy here: imagine a hiker descending a foggy mountain, feeling which way is downhill and taking one careful step at a time. "Downhill" means "less error." Do that enough and you reach a valley floor — a network that's usually right.
Why deep beats hand-crafted features
Before deep learning, humans had to hand-engineer the clues a program looked for — "an edge here, a curve there." Tedious, brittle, and capped by human imagination.
Deep networks flip that. Given enough examples, they discover their own hierarchy of clues: early layers find edges, middle layers assemble edges into shapes, deeper layers assemble shapes into things like faces or wheels. Nobody programmed those stages — they emerged from turning the knobs.
One honest caveat
Where this is heading
You now know the whole engine: neurons, layers, weights as knobs, and a learning loop that tunes them by chasing lower error. Now take a network like this and make it enormous — billions of knobs — then feed it a huge chunk of all the text humans have ever written. Do that, and what you get is a Large Language Model. That's exactly where we go next.
What Is an LLM?
A giant autocomplete that picked up coding and translation as accidental side hobbies.
You've used an LLM if you've ever talked to Claude, ChatGPT, or Gemini. But under the hood, the thing doing all that brilliant work is doing something almost comically simple: guessing the next chunk of text, over and over.
The one-line definition
A Large Language Model (LLM) is a very large neural network (remember those from the last section?) trained to do one simple thing: predict the next token — a token being a small piece of text, roughly a word or part of a word. That's it. Everything else — the essays, the code, the patient explanations — grows out of that single skill done at an absurd scale.
What does "large" actually mean?
Two things are large here, and both matter:
Billions of parameters
Parameters are the tunable "weights" inside the neural net — the knobs it adjusts while learning. Modern LLMs have billions to trillions of them. More knobs means more nuance it can capture.
A giant text diet
It was trained on an enormous slice of the text humans put online, plus mountains of books. Think of it as having read more than any person could in many lifetimes.
"Predict" doesn't mean "look up"
When we say it predicts, we mean it models probabilities over what comes next. Type "I'll be there in ___" and your phone suggests five minutes. An LLM does the same basic trick — just trained on a huge chunk of the internet instead of your text history, and scaled up enormously. Same idea, wildly more power.
The big misconception to kill right now: an LLM is not a database or a search engine looking up stored answers. There's no filing cabinet inside it with "capital of France → Paris." Instead, Paris comes out because that pairing dominated its training data. It's a pattern predictor generating plausible text from learned weights — a statistical map of how language works, not a lookup table.
| Search engine | LLM | |
|---|---|---|
| What it does | Points you to where an answer might live | Tries to produce the answer itself |
| Source | Indexed web pages | Patterns baked into its weights |
| If it's wrong | It showed you a page; you can judge the source | It can state a wrong answer fluently and confidently |
How it writes: the autoregressive loop
Here's the surprising part — the model doesn't compose a whole answer at once. It builds every response one token at a time, in a tight loop:
- Predict a probability for every possible next token, given everything so far.
- Pick one token from that distribution (often, but not always, a top-ranked one).
- Append it, feed the whole thing back in, and predict again.
- Repeat — sometimes thousands of times — until the answer is complete.
This predict-pick-append-repeat cycle is called autoregressive generation. Every answer you've ever gotten from an LLM is this loop run over and over. It's a bit like writing a sentence where each word is chosen knowing only the words before it.
Because step 2 involves a bit of controlled randomness, the same question can produce slightly different wording each time — that's a feature, not a glitch.
The plot twist: emergence
Here's what made LLMs a genuinely big deal. At small scale, a next-token predictor is just a clever autocomplete. But push the size up far enough — more parameters, more data — and abilities that weren't obvious at small scale start to show up: translating languages, writing working code, multi-step reasoning. Nobody hand-coded those in.
It learned to write working code and translate French essentially as a side effect of getting really, really good at finishing your sentences — nobody told it to. Researchers call these surprise skills emergent abilities, and they're a big part of why the field exploded.
Meet the players
You already know the names. Claude (Anthropic), GPT (OpenAI), Gemini (Google), and Llama (Meta) are all LLMs. Different companies, different training, same core machine: a giant neural net predicting the next token.
One more honest detail before we move on: the model never actually sees words. It sees tokens — the puzzle pieces text gets chopped into. Let's go meet them next.
Tokens, Embeddings & Vectors
Models don't read words — they read tokens, then turn them into coordinates for meaning.
Here's the plot twist nobody tells beginners: a language model never actually reads words, or even letters. It reads tokens — bite-sized chunks of text. Everything else in this guide is built on top of that one fact.
What's a token?
A token is a sub-word chunk: sometimes a whole word, sometimes a piece of one, sometimes just a space or a punctuation mark. As a rough rule of thumb for English, one token is about 4 characters, or roughly 0.75 of a word. So 100 words lands somewhere near 130 tokens.
Why chop words up? Because keeping a separate entry for every word ever written would be wildly inefficient. Instead, the model learns a kit of reusable pieces. Take unbelievable — it can split into un / believ / able. Those same pieces snap back together to build unthinkable, believer, readable, and thousands more. Think of tokens as the LEGO bricks of text. (Different models slice slightly differently, so the exact pieces vary — but the idea is identical.)
From token to meaning: the embedding
A model can't do math on the letters c-a-t. So each token gets converted into an embedding — a long list of numbers (a vector) that captures its meaning. Not the spelling. The meaning.
On its own, a token's embedding captures its general meaning, the dictionary-ish gist. The richer, in-context meaning (is "bank" a river or a vault?) gets layered on later by the transformer — that's the next section.
The anchor idea: GPS coordinates for meaning
Think about how GPS works. Two numbers — latitude and longitude — pinpoint any place on Earth. Grand Blanc, Michigan is just a pair of coordinates, and so is Detroit, and because they're close in real life, their numbers are close too.
An embedding does the same trick, but for meaning instead of geography. Its list of numbers pinpoints a word in an imaginary "meaning-space." Words with similar meanings land near each other, like houses in the same neighborhood: cat, kitten, and feline cluster together, while helicopter sits way across town.
Meaning even has direction
The famous demo: take the vector for king, subtract man, add woman, and you land remarkably close to queen.
king − man + woman ≈ queen
That's not a parlor trick — it hints that meaning is geometric and directional: a rough "royalty" direction and "gender" direction actually exist as movements through the space.
Why you should care: cost and the clock
Tokens aren't just an internal detail — they're the unit of money and the unit of memory. You're billed per token (in and out), and the model can only hold so many at once.
Tokens = your wallet
Providers price per token. A long document in plus a long answer out means more tokens, which means a bigger bill. Concise prompts are cheaper prompts.
Tokens = your budget
The context window is the total number of tokens the model can hold at one time — prompt, conversation, and reply combined. Fill it up and the oldest stuff falls out the back.
The strawberry problem
This is why an LLM can write you a flawless sonnet but fumble "how many R's are in strawberry." It sees chunky tokens, not individual letters — counting characters isn't its native language.
| Text | ≈ Tokens |
|---|---|
| One short word ("cat") | 1 |
| "unbelievable" | ~3 (un / believ / able) |
| A 100-word paragraph | ~130 |
| A 10-page document | ~5,000+ |
We'll come back to the context window when we talk about inference — for now, just hold the picture: it's a token budget, and you spend it as you go.
Transformers & Attention
In 2017 one paper taught machines to read every word at once — and lit the fuse on ChatGPT.
Back at our 2017 stop on the timeline, a paper with the cheeky title "Attention Is All You Need" introduced the transformer — the architecture humming under the hood of virtually every modern LLM. The title was a flex, and for once it held up: attention really was the key idea, and it kicked off the entire ChatGPT era.
The heart of it: self-attention
Here's the one idea that matters. As the model reads, every token (a word or word-piece) looks at the other tokens and decides how much each one matters to its own meaning. That's it. That's self-attention.
Picture a sentence: "The trophy didn't fit in the suitcase because it was too big." What does "it" mean — the trophy or the suitcase? You knew instantly. Self-attention is how the model figures it out too: the word "it" glances at the other words, scores which one is most relevant, and locks onto "trophy." Swap "big" for "small" and the answer flips to "suitcase" — and the model's attention flips right along with it.
Why it crushed the old way (RNNs)
Before transformers, models called RNNs (recurrent neural networks) read text the way you'd read with a finger under each word — one word at a time, left to right, trying to remember everything that came before. Slow, and forgetful: by the end of a long paragraph, the start had faded.
Attention reads the whole input at once. Two huge wins fall out of that:
Parallelizable
All tokens are processed simultaneously, not single-file. That's what makes training on internet-sized text feasible — and fast on modern hardware (GPUs love doing many things at once).
Long-range context
A word near the end can directly "talk to" a word near the start in a single step, instead of passing the message hand-to-hand. Distant connections stay sharp instead of fading.
| RNN (the old way) | Transformer / attention | |
|---|---|---|
| Reading style | One word at a time | All words at once |
| Speed | Sequential, slow to train | Parallel, fast to train |
| Distant words | Fades over distance | Connects directly |
Multi-head attention: several viewpoints at once
One round of attention is good. Transformers run several in parallel — called heads — and each head can learn to notice a different kind of relationship. Back at the dinner party: one guest tracks grammar (who's the subject, who's the verb), another tracks topic (which words are about the same thing), another tracks who-refers-to-whom (that "it" → "trophy" link). Combine all those viewpoints and you get a far richer read of the sentence than any single pass could give.
Stacked deep — the "deep" in deep learning
One block of attention isn't the whole model. These blocks are stacked many layers high, each one's output feeding into the next. Early layers tend to catch simple patterns; later layers build toward grammar, meaning, and intent — the same hierarchy-of-abstraction idea from the neural-networks section, just taller. That stacking is the "deep" in deep learning.
So now we have the architecture. But an empty transformer knows nothing — it's a brilliant student who hasn't opened a single book yet. How do you actually teach it? That's training, and it's our next stop.
Training, Fine-Tuning & RLHF
A model goes to school in three stages: read the library, take a class, get coached.
A finished AI assistant isn't built in one step — it's educated in three. Picture a model going to school: first it reads the entire library, then it takes a class on answering questions, then mentors coach it until it's genuinely useful.
The three-class education of a model
Everything in this section happens during training — the slow, expensive "studying" phase where the model's internal numbers (its weights) get tuned. Once training wraps up, those weights are locked in and the model ships out to do its job (that's inference, covered elsewhere). Here are the three classes it takes, in order.
1. Pretraining — read the library
The model reads internet-scale text and plays one game trillions of times: predict the next token (a token is a word or word-fragment). No humans label anything — the text is its own answer key: hide what comes next, guess it, check, repeat. This is where nearly all its language skill and world knowledge come from.
2. SFT — take a class
Now it studies curated (instruction, response) pairs written by people, so it learns to actually follow instructions instead of just continuing text. This mostly shapes behavior and tone — not new facts.
3. RLHF — get coached
Humans rank competing answers. A "reward model" learns those preferences, then nudges the model toward being helpful, honest, and harmless — qualities too fuzzy to write down as labeled examples.
Stage 1: Pretraining (self-supervised next-token prediction)
The model chews through a colossal pile of text and repeatedly guesses the next token. Because the right answer is simply whatever actually came next, no human grading is needed — that's why it's called self-supervised. The result is a base model: a raw text-completer that has absorbed grammar, facts, and patterns but isn't yet a polite assistant.
Stage 2: Supervised fine-tuning (SFT)
Here the base model keeps training, but now on hand-picked examples of good instruction-and-answer pairs. It's the difference between someone who has read every cookbook and someone who's taken a class on actually answering "what should I make for dinner?" SFT teaches how to respond — the helpful format, the tone, the do-what-was-asked behavior. It generally doesn't pour in new knowledge; that came from Stage 1.
Stage 3: RLHF and preference tuning
Some things are hard to capture with example answers — "be honest," "don't be smug," "refuse this politely." So instead of writing perfect answers, humans look at two model replies and pick the better one. A reward model learns to predict those human preferences, and reinforcement learning gently steers the main model toward answers the reward model scores highly. It's coaching: try, get ranked, adjust, repeat.
The cost reality
The three stages are wildly unequal in price. Pretraining is the budget-buster; the later stages are cheap by comparison but disproportionately decide how the model behaves.
| Stage | Teaches | Needs humans? | Cost |
|---|---|---|---|
| Pretraining | Language + world knowledge | No (self-supervised) | Enormous |
| SFT | Following instructions | Yes (curated pairs) | Much smaller |
| RLHF / Constitutional AI | Helpful, honest, harmless | Yes (rankings / principles) | Smaller still, but decisive |
Inference, Context & Hallucinations
Inference is the model improvising — give it a script or it makes things up.
Training built the brain. Inference is what happens when you actually use it — and understanding this one moment explains why models are so fluent, so fast, and occasionally so confidently, spectacularly wrong.
Inference: using the frozen brain
Remember training — millions of examples nudging billions of dials until the model got good? Inference is the exam, not the studying. The dials are now locked. Every time you chat with a model, you're running that finished, frozen brain forward to generate text. The weights don't change, the skills are fixed in place. It doesn't learn from your conversation, and it won't remember you tomorrow.
The context window: its tiny working memory
The model has no memory of your last chat and no private notebook. All it can "see" is what fits in its context window right now — think of it as the model's short-term working memory, or the desk it's allowed to spread papers on.
Here's the catch beginners always miss: that desk holds everything at once — your prompt, the entire back-and-forth history, the system instructions, and the answer it's currently writing. Anything that doesn't fit on the desk is simply gone. Not "kind of forgotten" — invisible. If a detail scrolled off the top, the model isn't ignoring it; from where it sits, that detail never existed.
Temperature: the creativity dial
When the model generates, it's choosing the next word from a list of likely candidates. Temperature (and related sampling settings) decides how adventurous that choice is.
Low temperature
Focused, predictable, repeatable. It picks the safest, most likely word almost every time. Great for code, data extraction, and factual answers.
High temperature
Loose, surprising, creative. It gambles on less-likely words. Great for brainstorming, poetry, and "give me ten wild ideas."
Why models hallucinate
A hallucination is when the model states something false with total confidence — a fake citation, a made-up date, a plausible-sounding "fact" that isn't. Beginners assume this is a bug. Mostly, it's the core mechanism working exactly as designed.
The model doesn't look up facts in a database. It predicts plausible text, one word at a time, based on patterns from training. When it has seen the answer often, the most plausible continuation happens to be correct. When it hasn't, the most plausible continuation is a confident-sounding guess — because in its training data, confident sentences almost never trail off with "...but honestly I'm not sure."
The fix is shockingly effective
Here's the rule of thumb worth memorizing. When a model answers open-ended questions purely from memory, it can be wrong a meaningful chunk of the time — and you often can't tell which chunk. Hand it a source document to work from — and ask it to answer using that document — and accuracy improves dramatically, because now it's reading instead of guessing.
The takeaway every pro repeats: models are best as editors and synthesizers, not as knowledge sources. Don't ask "what does the contract say?" — paste the contract in and ask "according to this, what does it say?"
Your practical mitigation toolkit
| Move | Why it helps |
|---|---|
| Give it context / sources (RAG) | RAG = retrieval-augmented generation: fetch relevant text and hand it to the model. The script instead of improv — the single biggest accuracy win. |
| Ask for citations | Forces it to point at where a claim came from, so you can check it (and still verify — citations can be faked too). |
| Lower the temperature | For factual work, fewer creative gambles means fewer invented flourishes. |
| Verify anything high-stakes | Medical, legal, financial, or money-moving answers always get a human check. |
The knowledge cutoff
Every model was trained up to a certain date and then frozen. On its own, it knows nothing that happened after its knowledge cutoff — last week's news, today's prices, a result from this morning. Ask about recent events and it'll either admit the gap or, worse, confidently improvise. The fix is the same one as above: don't rely on stale memory, drop the fresh information straight into the context window.
Agents, Tools & the AI Loop
An agent is an LLM in a loop with hands: it acts, checks its work, and tries again.
A chatbot answers and stops. An agent reads a goal, takes a real action, looks at what happened, and keeps going until the job is done. That loop is the whole difference.
The leap: from chatting to doing
So far we've treated an LLM as something you talk to: prompt in, text out, done. An agent takes that same model and drops it into a loop where it can take actions in the world — search the web, run code, edit a file — and then react to the results. One single model call is not an agent. The loop is what earns the name.
The core loop, one trip around
Every agent runs the same four-beat cycle, over and over, until the goal is met:
- Perceive — read the goal and the current state ("the tests are failing; here's the error").
- Think — plan the next single step ("I should open the file the error points to").
- Act — call a tool or run code (open the file, run the test, search the docs).
- Observe — read the result, then loop back to step 1 with that new information.
The anchor: a chef tasting the sauce
Picture a cook who never tastes anything — they just follow a fixed recipe and pray. Now picture a real chef: taste the sauce (observe), decide it needs salt (think), add a pinch (act), taste again (loop). That feedback cycle — not one frozen recipe — is exactly what makes an agent an agent.
What's a "tool," really?
A tool is just a function the model is allowed to call. Each one comes with three things: a name, a plain-English description of what it does, and a typed input schema (what arguments it expects). The model reads those descriptions and picks the right tool for the moment. Without tools, an LLM can only produce text — it can say "I'll search for that," but it can't actually do it.
Web search
Fetch fresh, real information the model never memorized.
Run code
Execute a script and read the actual output — math it can verify, not guess.
Read / write files
Open a file, edit it, save it. This is how Claude Code touches your project.
Why the loop is the entire point
Feeding each result back into the next step lets the model check its own work and self-correct. Wrote a function? Run the test. Test failed? Read the error, fix the line, run it again. That tight perceive-act-observe rhythm is exactly how Claude Code operates — it doesn't just suggest a change, it makes the change, runs your tests, and reacts to what breaks.
Memory and a leash
Two practical pieces keep the loop sane. First, working memory: every tool result gets added to the conversation, so the model carries what it learned into the next step. Second, a budget — a maximum number of iterations — so it can't spin forever chasing a goal it can't reach.
Where loops go wrong
| Failure mode | What it looks like | The fix |
|---|---|---|
| Infinite loop | Tries the same thing forever | A max-iteration budget and clear stop conditions |
| Wrong tool | Picks the search tool when it needed code | Clear tool names and descriptions |
| Hallucinated result | "Pretends" a tool ran and invents output | Human-in-the-loop checkpoints on risky steps |
AI Safety, Ethics & Limits
AI is a brilliant intern, not an oracle — know the limits and you become a power user.
Here's the good news and the catch in one breath: AI is genuinely useful and genuinely limited. Knowing exactly where it falls short is what turns you into a power user instead of a worrier.
Picture a brilliant, fast new intern. Incredibly capable, worth handing real work to — but you still review the important stuff, you don't give them the company passwords, and you don't let them sign the contract alone. That's the exact posture to take with AI. Trust, but verify.
The real limits (none of these are dealbreakers — just things to know)
It hallucinates
An LLM (large language model — the kind of AI behind chatbots) predicts plausible text, not true text. With no fact to recall, it'll invent one that sounds right. It can cite a study that doesn't exist with a completely straight face — treat it like that friend who's amazing at parties but shouldn't be your only source.
It's biased
The model reflects its training data — the huge pile of text it learned from. If that text leans a certain way on a topic, the answer might too. Bias isn't a glitch bolted on; it's baked in by what it read.
No real understanding
There's no consciousness, no feelings, no "knowing" behind the words. It's a stunningly good pattern machine, not a tiny mind in a box.
Knowledge cutoff
It only learned up to a certain date, so it has no built-in memory of recent events. Ask about last week's news and it may go stale, guess, or need a live search tool to catch up.
The ethics surface, briefly
| Concern | What it means in plain terms |
|---|---|
| Bias & fairness | Outputs can quietly favor or harm some groups — a real risk in hiring, lending, or medical contexts. |
| Privacy | Depending on the tool and its settings, what you type may be processed, logged, or used to improve models. Personal and confidential data deserve caution. |
| Copyright & provenance | Models trained on others' work raise fair questions about credit, ownership, and consent. ("Provenance" just means where the data came from.) |
| Jobs | Some tasks get automated; many roles shift rather than vanish. Disruption is real and uneven. |
| Misinformation & deepfakes | The same tech that drafts your email can mass-produce convincing fakes. Healthy skepticism scales. |
Alignment: making AI do what we actually want
Alignment is the engineering effort to make AI behave the way we genuinely intend — not just the literal words we typed. The shorthand is HHH: Helpful, Honest, and Harmless. Sounds simple. The hard part is all three at once — a model that refuses everything is harmless but useless; one that answers anything is helpful but unsafe. Balancing them is the open challenge.
Your practical user rulebook
- Keep a human in the loop. AI drafts and suggests; you decide and approve. You're the editor, not the audience.
- Verify high-stakes output. Medical, legal, financial, or anything you'd put your name on? Double-check it against a real source.
- Never paste secrets. No passwords, API keys, client data, or anything you wouldn't hand to a stranger.
- Cite and check sources. Ask for references — then actually confirm they exist and say what the AI claims.
The calm takeaway
None of this is doom. These are solvable, ongoing engineering problems, and a lot of smart people are actively working on every one of them. The right stance is caution, not fear — the same everyday judgment you'd use with any powerful new tool.
AI Jargon Buster
A pocket phrasebook for every AI term you met on the tour — point and remember.
You just toured the whole country of AI. This is the pocket card of key phrases to point at when one slips your mind — no memorizing required, just a quick "oh right, that's what that means." Each definition deliberately echoes the analogy from its home section, so it should feel like running into an old friend.
How to use this page
Read top to bottom and the terms build on each other; or just Ctrl+F for the one word that's bugging you. Definitions are one plain line each — zero jargon hiding inside the jargon.
Foundations
AI AGI ML Deep Learning Neural Network Parameter / Weight
Language plumbing
Token Embedding Vector Context Window
Architecture
Transformer Attention LLM
Training
Pretraining Fine-tuning RLHF
Using it
Inference Temperature Hallucination Prompt RAG
Frontier
Agent Multimodal Open vs Closed weights
The glossary
| Term | One-line plain definition |
|---|---|
| AI | Any machine doing something that used to need a human brain. |
| AGI | The hypothetical "can do anything a person can" AI — not here yet. |
| ML | Machine Learning: software that learns patterns from examples instead of being hand-coded. |
| Deep Learning | ML using many stacked layers — the assembly line that powers today's AI. |
| Neural Network | A web of tiny calculators (loosely brain-inspired) that turns inputs into answers. |
| Parameter / Weight | The faders on a mixing board — internal numbers the model tunes while learning (weights are the main kind). |
| Token | The LEGO bricks of text — words and word-chunks the model reads in pieces. |
| Embedding | GPS coordinates for meaning — numbers that place a word on a map where similar ideas sit close. |
| Vector | Just the list of numbers itself — the actual coordinates an embedding lives at. |
| Context Window | The model's short-term memory — how much it can hold in mind at once. |
| Transformer | The engine design behind modern AI — it reads all the words at once instead of one by one. |
| Attention | Everyone in the room sharing notes — each word decides which others to listen to. |
| LLM | Large Language Model: phone autocomplete scaled up a trillion-fold. |
| Pretraining | Reading a huge pile of text to learn how language works — the raw study phase. |
| Fine-tuning | Extra coaching on a focused subject to specialize an already-trained model. |
| RLHF | Teaching the model manners with human thumbs-up / thumbs-down so it answers helpfully. |
| Inference | Taking the exam — actually using the trained model to get an answer. |
| Temperature | The creativity dial — low for careful and predictable, high for loose and surprising. |
| Hallucination | Confidently making something up — plausible-sounding text with no truth check. |
| Prompt | What you type in — your instructions and question to the model. |
| RAG | Letting the model peek at your documents first, so it answers from facts, not just memory. |
| Agent | An LLM given hands — tools like search, code, and memory so it can act, not just talk. |
| Multimodal | A model that also handles images, audio, or video — not text alone. |
| Open vs Closed weights | Open = you can download and run the model yourself; closed = you rent it through someone's API. |
If you only remember three
Token = text in pieces. Embedding = meaning as coordinates. Inference = using the trained model. Almost everything else hangs off these.
Why echoes, not new words
Each line reuses the picture from its home section on purpose — recognition beats re-learning every single time.
Recognize, don't recite
You don't need to define these cold. Spotting them in the wild and nodding "ah, I know that one" is the whole goal.