
AI / Claude Models Basics Interview Questions

1. What is Claude and who makes it?
2. What are the current Claude model families and what is each one optimised for?
3. What are the API model IDs for the current Claude models?
4. What is a context window and what are the context window sizes for current Claude models?
5. What are the pricing tiers for current Claude models and how is pricing calculated?
6. What input and output modalities do current Claude models support?
7. What is extended thinking and how does it differ from adaptive thinking in Claude?
8. What platforms and cloud providers is Claude available on?
9. What is the knowledge cutoff for current Claude models?
10. What is the Claude model lifecycle - what do 'Active', 'Legacy', and 'Deprecated' mean?
11. What is Claude Fable 5 and what makes it different from Claude Opus 4.8?
12. What is Claude Mythos 5 and how does it differ from Claude Fable 5?
13. What is Claude Haiku 4.5 and what are its key characteristics?
14. What is prompt caching and how does it reduce costs when using Claude?
15. What is the Message Batches API and when should you use it?
16. What is tool use (function calling) in Claude and which models support it?
17. What is computer use in Claude and which models support it?
18. What are the different claude.ai plans and what does each include?
19. What is the effort parameter in Claude and which models support it?
20. What is streaming in Claude API responses and how do you use it?
21. What is the system prompt in Claude and how does it affect model behaviour?
22. What is zero data retention (ZDR) and which Claude models support it?
23. What is Claude's approach to safety and what are Constitutional AI principles?
24. What is the difference between an operator and a user in Claude's design?
25. What is Claude's context window and how are tokens counted?
26. What are Claude's rate limits and how are they structured?
27. What is Claude's approach to harmful content - what will and won't it do?
28. What is Claude's max_tokens parameter and how does it relate to the context window?
29. What is the temperature parameter in Claude and how does it affect responses?
30. What are Claude's multimodal capabilities - how does it process images and documents?
31. What are the claude.ai plans and what models does each tier include access to?
32. What is multi-turn conversation handling in Claude and how do you implement it?
33. What are the different stop_reason values in Claude API responses?
34. What is Claude's approach to honesty and what does it mean for Claude to be non-deceptive?
35. What is Claude Code and how does it differ from using Claude directly via the API?
36. What are the Anthropic SDKs and what languages are officially supported?
37. What is Anthropic's policy on model deprecation and how should developers prepare?
38. What are the key differences between Claude 4 and earlier Claude 3 generation models?

1. What is Claude and who makes it?

Claude is a family of state-of-the-art large language models (LLMs) built by Anthropic, an AI safety company founded in 2021. Claude excels at language, reasoning, analysis, coding, mathematics, and creative writing, and is designed with a strong focus on being helpful, harmless, and honest.

Anthropic makes Claude available in two main ways:

  • claude.ai — a consumer and business chat interface (Free, Pro, Team, Enterprise, and Max plans)
  • Claude API — a developer API for building applications, also available through Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry

Claude is trained using a technique called Constitutional AI (CAI), which guides the model's values and behaviour using a set of principles rather than relying purely on human feedback for every response. This approach is central to Anthropic's mission of building AI that is safe and beneficial.

Who develops and maintains the Claude family of models?
What does Constitutional AI (CAI) refer to in the context of Claude?

2. What are the current Claude model families and what is each one optimised for?

Claude models are organised into three tiers — Opus, Sonnet, and Haiku — representing a capability-speed-cost spectrum. As of mid-2026 the current flagship generation is Claude 4/5, alongside the newly released Claude Fable 5.

Current Claude model tiers
Tier | Optimised for | Example model
Opus | Maximum capability: complex reasoning, coding, agentic tasks, enterprise work | Claude Opus 4.8
Sonnet | Best balance of intelligence and speed: coding, agents, enterprise workflows | Claude Sonnet 5
Haiku | Fastest responses with near-frontier intelligence: high-throughput, latency-sensitive tasks | Claude Haiku 4.5

Additionally, Claude Fable 5 sits above the Opus tier as Anthropic's most capable widely-released model, described as providing next-generation intelligence for long-running agents.

Claude Mythos 5 shares Fable 5's specifications but is offered only through the invitation-only Project Glasswing programme for defensive cybersecurity workflows.

Which Claude model tier is designed for the fastest responses at the lowest cost?
Which Claude model is described as Anthropic's most capable widely-released model as of mid-2026?

3. What are the API model IDs for the current Claude models?

When making API calls you must specify the exact model ID string. Model IDs starting with the Claude 4.6 generation use a dateless format that is still a pinned snapshot (not an evergreen pointer). Earlier models used a dated format like claude-haiku-4-5-20251001.

Current model IDs (Claude API)
Model | API ID | Alias
Claude Fable 5 | claude-fable-5 | claude-fable-5
Claude Opus 4.8 | claude-opus-4-8 | claude-opus-4-8
Claude Sonnet 5 | claude-sonnet-5 | claude-sonnet-5
Claude Haiku 4.5 | claude-haiku-4-5-20251001 | claude-haiku-4-5

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-8",   # exact API model ID
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what a context window is."}
    ]
)
print(message.content[0].text)

The Bedrock IDs follow a different pattern (e.g. anthropic.claude-opus-4-83) and should be used when accessing Claude through Amazon Bedrock. Google Cloud IDs match the Claude API IDs but may append a regional variant.

What is the Claude API ID for Claude Opus 4.8?
What is the Claude API alias for Claude Haiku 4.5?

4. What is a context window and what are the context window sizes for current Claude models?

A context window is the maximum number of tokens (sub-word chunks of text, punctuation, code, and so on) that a model can process in a single request. It covers the system prompt, all conversation history, tool definitions, and the model's own output. If you exceed the context window, older content must be removed or the request will fail.

Context windows by model
Model | Context window | Max output tokens
Claude Fable 5 | 1 million tokens | 128,000 tokens
Claude Opus 4.8 | 1 million tokens | 128,000 tokens
Claude Sonnet 5 | 1 million tokens | 128,000 tokens
Claude Haiku 4.5 | 200,000 tokens | 64,000 tokens

Key facts:

  • The 1M token context window is the default for Fable 5, Opus 4.8, and Sonnet 5 — no beta header is needed and it is billed at standard pricing
  • Claude Haiku 4.5 has a smaller 200k token window and 64k max output
  • On the Message Batches API, Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 5, and Sonnet 4.6 support up to 300k output tokens using the output-300k-2026-03-24 beta header
  • A single request can include up to 600 images/PDF pages (100 for 200k models)
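
Because input and output share one window, a request only fits if the input tokens plus the requested max_tokens stay within the model's limits. The check below is a minimal sketch using the figures from the table above; the LIMITS dictionary and the fits helper are illustrative names, not part of any SDK, and the input token count is assumed to come from elsewhere.

```python
# Limits copied from the table above: (context window, max output tokens).
LIMITS = {
    "claude-fable-5": (1_000_000, 128_000),
    "claude-opus-4-8": (1_000_000, 128_000),
    "claude-sonnet-5": (1_000_000, 128_000),
    "claude-haiku-4-5": (200_000, 64_000),
}

def fits(model: str, input_tokens: int, max_tokens: int) -> bool:
    """Return True if the input plus the requested output fits the model's limits."""
    window, max_output = LIMITS[model]
    return max_tokens <= max_output and input_tokens + max_tokens <= window

print(fits("claude-haiku-4-5", 150_000, 64_000))  # False: 214,000 > 200,000
print(fits("claude-opus-4-8", 150_000, 64_000))   # True
```
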
What is the context window size for Claude Opus 4.8?
What is the maximum output token limit for Claude Haiku 4.5?

5. What are the pricing tiers for current Claude models and how is pricing calculated?

Claude API pricing is charged per million tokens (MTok) — counting both input tokens (your prompt, system prompt, conversation history) and output tokens (Claude's response). Prices differ by model and reflect the capability-cost trade-off.

Current Claude API pricing (per million tokens)
Model | Input (per MTok) | Output (per MTok)
Claude Fable 5 | $10 | $50
Claude Opus 4.8 | $5 | $25
Claude Sonnet 5 | $3 (intro: $2 until Aug 31 2026) | $15 (intro: $10)
Claude Haiku 4.5 | $1 | $5

Cost-saving features:

  • Prompt caching — reuse of cached prompt prefixes is billed at a significant discount (cache writes are more expensive; cache reads are cheaper than standard input)
  • Message Batches API — async batch processing at roughly 50% of standard pricing, ideal for large-scale, non-real-time workloads

Cloud platform pricing (Amazon Bedrock, Google Cloud) may differ from direct API pricing. See the Pricing page for full details including per-region variations.
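
As a rough illustration of how per-token pricing adds up, the sketch below multiplies token counts by the standard prices in the table above. The estimate_cost helper is a hypothetical name; it ignores the Sonnet 5 introductory rate and prompt caching, and it treats the batch discount as exactly 50%, although the text above only says "roughly".

```python
# USD per million tokens (input, output), from the pricing table above.
PRICES = {
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-sonnet-5": (3.0, 15.0),
    "claude-haiku-4-5": (1.0, 5.0),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Estimate the USD cost of one request at standard prices."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price
    # Assumption: the batch discount is applied as a flat 50%.
    return cost * 0.5 if batch else cost

# 200k input tokens and 10k output tokens on Opus 4.8
print(round(estimate_cost("claude-opus-4-8", 200_000, 10_000), 2))  # 1.25
```
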

Which Claude model costs $1 per million input tokens?
What is the main benefit of using the Claude Message Batches API?

6. What input and output modalities do current Claude models support?

All current Claude models share a common set of supported modalities for input and output, with no difference between Opus, Sonnet, and Haiku tiers on core capabilities.

Modality support across current models
Capability | Supported? | Notes
Text input | Yes | All models
Image input (vision) | Yes | All models; up to 600 images per request (100 for 200k models)
PDF input | Yes | Treated similarly to images for token budgeting
Text output | Yes | All models
Multilingual | Yes | Strong performance across major languages
Tool use / function calling | Yes | All models
Extended thinking | Haiku 4.5 only | Explicit thinking steps visible in output
Adaptive thinking | Opus and Sonnet (not Haiku 4.5) | Always-on for Fable 5
Audio input | No | Not currently supported
Video input | No | Use frame extraction for video analysis

Vision notes: Claude models can analyse images, charts, screenshots, UI elements, and document scans. For video analysis, the recommended approach is to extract frames and send them as a series of images. Claude Opus 4.5 and 4.6 showed improved vision capabilities — especially for multi-image tasks and computer use.
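
The frame-based approach to video can be sketched as follows: each extracted frame becomes a base64 image content block, followed by one text block with the question. The helper names are illustrative, the frame bytes are placeholders, and frame extraction itself (for example with a video library) is assumed to happen elsewhere.

```python
import base64

def image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Build an image content block carrying base64-encoded data."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(image_bytes).decode("ascii"),
        },
    }

def frames_message(frames: list, question: str) -> dict:
    """Build one user message: a series of frames followed by a question."""
    content = [image_block(frame) for frame in frames]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}

# Placeholder bytes stand in for real PNG frames extracted elsewhere.
msg = frames_message([b"frame-1", b"frame-2"], "What changes between these frames?")
print(len(msg["content"]))  # 3
```

The resulting dictionary can be passed as an element of the messages list in a normal Messages API call.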

Which input modality is NOT currently supported by any Claude model?
Which Claude model uniquely supports Extended Thinking (explicit step-by-step reasoning visible in output)?

7. What is extended thinking and how does it differ from adaptive thinking in Claude?

Both features enable Claude to reason more carefully before answering, but they work differently and are available on different models.

Extended thinking vs Adaptive thinking
Feature | Extended Thinking | Adaptive Thinking
What it does | Claude produces explicit thinking blocks showing its reasoning steps, visible in the API response | Claude internally allocates more reasoning compute when a task requires it; no visible thinking blocks
Availability | Claude Haiku 4.5 only | Claude Opus 4.8, Sonnet 5, and Claude Fable 5 (always-on for Fable 5)
User control | Opt-in: you enable it with a parameter | Automatic on supported models; Fable 5 always uses it
Use case | When you want to see and verify the model's reasoning chain | General accuracy improvement, especially for complex tasks
Output impact | Adds thinking tokens to the response (billed as output tokens) | No additional visible output

# Enabling extended thinking on Claude Haiku 4.5
message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # max tokens for thinking
    },
    messages=[{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}]
)
# Response includes a "thinking" content block followed by the answer

Interleaved thinking (thinking between tool calls) is automatic on models with adaptive thinking and requires the interleaved-thinking-2025-05-14 beta header on earlier models like Opus 4.5 and Sonnet 4.5.
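
When working with the extended thinking example above, two small helpers are often useful: one that sanity-checks the budget before sending, and one that separates the reasoning from the final answer. This is a sketch under stated assumptions: the budget rules (at least 1,024 tokens and smaller than max_tokens) reflect the commonly documented constraints and should be checked against the current API reference, and the response content is modelled as plain dictionaries, whereas the SDK returns objects with the same fields.

```python
def check_thinking_budget(max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError if the thinking budget looks inconsistent."""
    # Assumption: a 1,024-token minimum, and the budget must leave room for the answer.
    if budget_tokens < 1024:
        raise ValueError("budget_tokens should be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens should be smaller than max_tokens")

def split_blocks(content: list) -> tuple:
    """Separate reasoning text from answer text in a list of content blocks."""
    thinking = "".join(b["thinking"] for b in content if b["type"] == "thinking")
    answer = "".join(b["text"] for b in content if b["type"] == "text")
    return thinking, answer

check_thinking_budget(max_tokens=16000, budget_tokens=10000)  # passes silently

# Hypothetical response content, written by hand for illustration.
sample = [
    {"type": "thinking", "thinking": "3x = 15, so x = 5."},
    {"type": "text", "text": "x = 5"},
]
reasoning, answer = split_blocks(sample)
print(answer)  # x = 5
```
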

On which Claude model is Extended Thinking (visible reasoning blocks) available?
How does Adaptive Thinking differ from Extended Thinking in Claude?

8. What platforms and cloud providers is Claude available on?

Claude is available through multiple channels, each with its own model IDs, endpoint behaviour, and pricing structure.

Claude availability by platform
Platform | Best for | Model ID format
Claude API (direct) | Developers building directly with Anthropic | claude-opus-4-8, claude-sonnet-5, etc.
claude.ai | Consumer and business chat (Free, Pro, Team, Enterprise, Max) | Model selected in UI
Amazon Bedrock | AWS-native teams; data stays in AWS region | anthropic.claude-opus-4-83
Claude Platform on AWS | Same API shape as Claude API but on AWS infrastructure | claude-opus-4-8 (same as direct API)
Google Cloud Vertex AI | GCP-native teams | claude-opus-4-8 (same IDs, regional variants)
Microsoft Foundry | Azure-native teams | claude-sonnet-5, claude-fable-5, etc.

Important distinctions:

  • Amazon Bedrock and Claude Platform on AWS are different products with different model ID formats and lifecycle policies
  • Claude Platform on AWS uses the same model IDs as the direct Claude API and follows Anthropic's own deprecation schedule
  • Amazon Bedrock uses its own model ID format and sets its own retirement schedules
  • Starting with Sonnet 4.5, Bedrock offers global endpoints (dynamic routing) and regional endpoints (data sovereignty)
Which platform uses the same model IDs as the direct Claude API and follows Anthropic's own deprecation schedule?
What is the Amazon Bedrock model ID format for Claude models?

9. What is the knowledge cutoff for current Claude models?

Claude models are trained on data up to a specific date (the training data cutoff) and have the most reliable knowledge through a slightly earlier date (the reliable knowledge cutoff). Claude does not have access to real-time internet data during a conversation unless given a search tool.

Knowledge cutoffs by model
Model | Reliable knowledge cutoff | Training data cutoff
Claude Fable 5 | January 2026 | January 2026
Claude Opus 4.8 | January 2026 | January 2026
Claude Sonnet 5 | January 2026 | January 2026
Claude Haiku 4.5 | February 2025 | July 2025

Reliable knowledge cutoff indicates the date through which a model's knowledge is most extensive and reliable. Training data cutoff is the broader date range — the model may have some knowledge of events after the reliable cutoff but it is less complete and should be treated with more caution.

For tasks requiring current information (news, prices, live data), give Claude access to a web search tool or provide the relevant information directly in the prompt.

What is the reliable knowledge cutoff for Claude Opus 4.8?
Which current Claude model has the earliest reliable knowledge cutoff?

10. What is the Claude model lifecycle — what do 'Active', 'Legacy', and 'Deprecated' mean?

Anthropic uses a defined set of lifecycle statuses for Claude models. Understanding these helps teams plan migration timelines and avoid unexpected outages.

Model lifecycle statuses
Status | Meaning | Action required?
Active | Fully supported and recommended for new development | No; this is the ideal state
Legacy | No longer receiving updates; may be deprecated in the future | Start planning migration
Deprecated | Still functional but a retirement date has been set; a replacement is recommended | Migrate before the retirement date
Retired | API calls to this model return an error | Must have already migrated

Key policy points:

  • Anthropic provides at least 60 days' notice before retiring any publicly released model
  • Customers with active deployments receive email notifications when a model they use is scheduled for retirement
  • Retirement dates on Anthropic-operated platforms (Claude API, Claude Platform on AWS, Microsoft Foundry) may differ from partner platforms (Amazon Bedrock, Google Cloud)
  • Anthropic has committed to long-term preservation of model weights even after retirement
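
One way a team might turn the lifecycle table into an operational check is sketched below. The migration_action helper and the dates are made up for the example; real retirement dates come from Anthropic's deprecation notices.

```python
from datetime import date
from typing import Optional

# Actions taken from the lifecycle table above.
ACTIONS = {
    "active": "no action needed",
    "legacy": "start planning migration",
    "retired": "API calls fail; migrate immediately",
}

def migration_action(status: str, today: date,
                     retirement: Optional[date] = None) -> str:
    """Map a lifecycle status (and optional retirement date) to a next step."""
    status = status.lower()
    if status == "deprecated":
        if retirement is None:
            return "migrate before the retirement date"
        days_left = (retirement - today).days
        return f"migrate within {days_left} days"
    return ACTIONS[status]

# Hypothetical dates, chosen to match a 60-day notice period.
print(migration_action("deprecated", date(2026, 6, 1), date(2026, 7, 31)))
# migrate within 60 days
```
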
What does 'Deprecated' mean for a Claude model?
How much advance notice does Anthropic provide before retiring a publicly released Claude model?

11. What is Claude Fable 5 and what makes it different from Claude Opus 4.8?

Claude Fable 5 (claude-fable-5) is Anthropic's most capable widely-released model as of mid-2026, positioned above the Opus tier. It is designed for long-running agents, frontier intelligence tasks, and complex enterprise work.

Claude Fable 5 vs Claude Opus 4.8
Feature | Claude Fable 5 | Claude Opus 4.8
API ID | claude-fable-5 | claude-opus-4-8
Context window | 1 million tokens | 1 million tokens
Max output tokens | 128,000 | 128,000
Thinking | Always-on adaptive thinking | Adaptive thinking (configurable)
Pricing (input) | $10 / MTok | $5 / MTok
Pricing (output) | $50 / MTok | $25 / MTok
Data retention | 30-day minimum (no ZDR) | Available under ZDR
Availability | GA on Claude API, Bedrock, Vertex, Foundry | GA on all platforms

Key differences to be aware of:

  • Fable 5 is priced at 2× Opus 4.8 for both input and output tokens
  • Fable 5 requires a minimum 30-day data retention period and is not available under zero data retention (ZDR) arrangements — organisations with ZDR requirements should use Opus 4.8
  • Fable 5 uses always-on adaptive thinking, meaning it automatically applies extended reasoning to every request
  • Migration from Opus 4.8 to Fable 5 is described as mostly drop-in since both use the same Messages API and tool use patterns
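
The ZDR constraint above can be encoded as a simple selection rule. This is only a sketch: pick_model is a hypothetical helper, and the ZDR flags are copied from this section rather than queried from any API.

```python
# ZDR availability as stated in the comparison above.
ZDR_AVAILABLE = {
    "claude-fable-5": False,   # requires 30-day minimum retention
    "claude-opus-4-8": True,
}

def pick_model(needs_zdr: bool) -> str:
    """Prefer Fable 5, falling back to Opus 4.8 when ZDR is required."""
    for model in ("claude-fable-5", "claude-opus-4-8"):
        if not needs_zdr or ZDR_AVAILABLE[model]:
            return model
    raise ValueError("no model satisfies the constraints")

print(pick_model(needs_zdr=True))   # claude-opus-4-8
print(pick_model(needs_zdr=False))  # claude-fable-5
```
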
What data retention requirement makes Claude Fable 5 unsuitable for organisations with zero data retention (ZDR) agreements?
Which thinking mode does Claude Fable 5 use?

12. What is Claude Mythos 5 and how does it differ from Claude Fable 5?

Claude Mythos 5 (claude-mythos-5) is a variant of Claude Fable 5 that is offered separately for defensive cybersecurity workflows as part of Anthropic's Project Glasswing. It is not publicly available — access is invitation-only with no self-serve sign-up.

Claude Fable 5 vs Claude Mythos 5
Feature | Claude Fable 5 | Claude Mythos 5
API ID | claude-fable-5 | claude-mythos-5
Availability | Generally available (GA) | Invitation-only via Project Glasswing
Safety classifiers | Yes: standard safety classifiers | Without standard safety classifiers
Intended use | General purpose, agents, enterprise | Defensive cybersecurity workflows
Context window | 1 million tokens | 1 million tokens
Max output | 128,000 tokens | 128,000 tokens
Pricing | $10 / $50 per MTok | Contact Anthropic

The key difference is that Mythos 5 operates without the standard safety classifiers that Fable 5 uses. This makes it suitable for certain security research and offensive-capability testing in a controlled, vetted environment, but unsuitable and inaccessible for general use. Anthropic controls access tightly through Project Glasswing.

What makes Claude Mythos 5 different from Claude Fable 5 in terms of safety?
How does a team gain access to Claude Mythos 5?

13. What is Claude Haiku 4.5 and what are its key characteristics?

Claude Haiku 4.5 (claude-haiku-4-5-20251001, alias claude-haiku-4-5) is Anthropic's fastest and most cost-efficient model in the current generation. It is described as achieving near-frontier performance on coding, computer use, and agent tasks while being optimised for speed and low latency.

Claude Haiku 4.5 key characteristics
Property | Value
API ID | claude-haiku-4-5-20251001 (alias: claude-haiku-4-5)
Context window | 200,000 tokens
Max output tokens | 64,000 tokens
Pricing (input) | $1 per million tokens
Pricing (output) | $5 per million tokens
Thinking | Extended thinking (opt-in, explicit reasoning blocks)
Context awareness | Yes; tracks its token budget throughout a conversation
Best for | High-throughput, latency-sensitive tasks; customer service; simple classification

Extended thinking on Haiku 4.5: unlike the Opus and Sonnet tiers, which use adaptive thinking, Haiku 4.5 supports the explicit extended thinking mode, in which reasoning steps appear as visible thinking content blocks in the API response. This is opt-in via the thinking parameter.

Haiku 4.5 was announced as matching Sonnet 4's performance on coding, computer use, and agent tasks while costing significantly less per token.

What is the context window size for Claude Haiku 4.5?
What type of thinking does Claude Haiku 4.5 support?

14. What is prompt caching and how does it reduce costs when using Claude?

Prompt caching allows Anthropic to store a copy of a prompt prefix (such as a long system prompt, documentation, or conversation history) so that subsequent requests reusing that prefix are billed at a much lower rate than re-sending it fresh each time.

Prompt caching pricing structure
Token type | Cost vs standard input
Cache write | ~25% more expensive (one-time cost to store the prefix)
Cache read (hit) | ~90% cheaper than standard input tokens
Standard input (miss) | Full input price

# Example: using prompt caching with a long system prompt
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert assistant...[5000 token system prompt]...",
            "cache_control": {"type": "ephemeral"}  # mark this prefix for caching
        }
    ],
    messages=[{"role": "user", "content": "What is the main point of section 3?"}]
)

When to use prompt caching:

  • Long system prompts reused across many requests
  • Large reference documents (codebases, manuals, books) that are constant across a session
  • Long conversation histories in multi-turn applications
  • Few-shot example sets provided in every request

Cache entries expire after a period of inactivity (5 minutes by default; a 1-hour TTL beta is available). The cache is per-organisation, not per-user.
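
To see how the multipliers in the table translate into savings, the sketch below compares the input cost of sending the same prefix many times with and without caching. It assumes the approximate figures above (1.25x for the first write, 0.10x for each later read) and that every later request arrives before the cache entry expires; input_cost is an illustrative helper.

```python
def input_cost(prefix_tokens: int, requests: int, price_per_mtok: float,
               cached: bool) -> float:
    """Input cost in USD for sending the same prefix `requests` times."""
    per_request = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return per_request * requests
    # First request writes the cache (~1.25x); later requests read it (~0.10x).
    return per_request * 1.25 + per_request * 0.10 * (requests - 1)

# A 200k-token prefix at $5 per MTok, reused across 50 requests
plain = input_cost(200_000, 50, 5.0, cached=False)
cached = input_cost(200_000, 50, 5.0, cached=True)
print(round(plain, 2), round(cached, 2))  # 50.0 6.15
```

Only the shared prefix is modelled here; the per-request user message and all output tokens are billed on top at their usual rates.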

Approximately how much cheaper is a prompt cache read hit compared to standard input token pricing?
What is the 'cache_control' parameter used for in Claude API calls?

15. What is the Message Batches API and when should you use it?

The Message Batches API allows you to submit a large number of Claude API requests asynchronously in a single batch, receiving results once all requests are processed. It is designed for large-scale, non-time-sensitive workloads.

Message Batches API vs standard API
Feature | Standard Messages API | Message Batches API
Execution | Synchronous: response returned immediately | Asynchronous: submit batch, poll for results
Pricing | Standard per-token pricing | ~50% of standard pricing
Max output tokens | Standard limits | Up to 300k with output-300k beta header (selected models)
Latency | Real-time (<1 min typical) | Hours; not suitable for real-time apps
Max requests per batch | N/A | 10,000 requests per batch
Use case | Interactive apps, chatbots, real-time tools | Data processing, evals, bulk content generation

# Submitting a batch of requests
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-opus-4-8",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarise: ..."}]
            }
        },
        # ... up to 10,000 requests
    ]
)

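# Poll until processing has ended; results are only available after that
import time

while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

# Each entry carries the custom_id and a result of type
# "succeeded", "errored", "canceled", or "expired"
for entry in client.messages.batches.results(batch.id):
    print(entry.custom_id, entry.result.type)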

Models supported for 300k batch output: Claude Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 5, and Sonnet 4.6 support up to 300k output tokens per request in a batch when the output-300k-2026-03-24 beta header is included.
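
For workloads larger than the per-batch limit, the request list has to be split before submission. The sketch below builds request entries in the shape shown above and chunks them; MAX_PER_BATCH uses the 10,000 figure quoted in this section, and build_requests and chunk are illustrative helpers rather than SDK functions.

```python
MAX_PER_BATCH = 10_000  # per-batch request limit quoted in the table above

def build_requests(prompts, model="claude-opus-4-8", max_tokens=1024):
    """Wrap each prompt in a batch request entry with a unique custom_id."""
    return [
        {
            "custom_id": f"request-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]

def chunk(requests, size=MAX_PER_BATCH):
    """Split a request list into consecutive slices of at most `size` items."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]

requests = build_requests([f"Summarise document {n}" for n in range(25_000)])
batches = chunk(requests)
print([len(b) for b in batches])  # [10000, 10000, 5000]
```

Each slice would then be passed as the requests argument of a separate batch creation call, with the custom_id values used to match results back to inputs.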

What is the approximate cost saving when using the Message Batches API compared to the standard Messages API?
What is the maximum number of requests that can be submitted in a single Message Batches API call?

16. What is tool use (function calling) in Claude and which models support it?

Tool use (also called function calling) allows Claude to request the execution of external functions and incorporate their results into its responses. You define a set of tools with names, descriptions, and input schemas; Claude decides when to call them and how to structure the arguments.

# Defining a tool for Claude to use
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius","fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    ],
    messages=[{"role": "user", "content": "What's the weather in London?"}]
)
# Claude responds with a tool_use block specifying the function and args
# You execute the function and return results in a tool_result block

All current Claude models support tool use. Key capabilities include:

  • Parallel tool use — Claude can call multiple tools simultaneously in one turn
  • Multi-step tool use — Claude reasons across multiple tool call/result cycles
  • Computer use — special tools (bash, text editor, computer) for Claude to interact with systems
  • Fine-grained tool streaming — GA on Sonnet 4.6 and later (no beta header needed)
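
The second half of the tool-use cycle, executing the requested function and returning a tool_result block, can be sketched as follows. The get_weather implementation returns fixed data, the tool_use id is invented, and the assistant content is written as plain dictionaries for illustration; the SDK returns objects with the same fields.

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real version would call a weather service."""
    return {"city": city, "temperature": 18, "unit": unit}

# Registry mapping tool names to local Python functions.
TOOLS = {"get_weather": get_weather}

def run_tool_calls(content: list) -> dict:
    """Execute every tool_use block and build the follow-up user message."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue
        output = TOOLS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must echo the id of the tool_use block
            "content": json.dumps(output),
        })
    return {"role": "user", "content": results}

# Hypothetical assistant content containing one tool call.
assistant_content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "London"}},
]
reply = run_tool_calls(assistant_content)
print(reply["content"][0]["tool_use_id"])  # toolu_example
```

Because the loop handles every tool_use block in the turn, the same helper covers parallel tool use: several calls produce several tool_result blocks in one user message.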
What does Claude return in its response when it decides to call a tool?
Which capability allows Claude to call multiple tools simultaneously in a single turn?

17. What is computer use in Claude and which models support it?

Computer use is a set of built-in tools that allow Claude to interact directly with computers — taking screenshots, moving the mouse, clicking, typing, and running bash commands. It is designed for agentic automation tasks where Claude operates a full computer desktop or terminal environment.

Computer use tools
Tool | What Claude can do
computer | Take screenshots; move/click/drag mouse; type text
text_editor | View and edit files with string replace; undo edits
bash | Execute shell commands in a persistent bash session

# Example: computer use API call (simplified)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[
        {"type": "computer_20250728", "name": "computer", "display_width_px": 1024, "display_height_px": 768},
        {"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"},
        {"type": "bash_20250728", "name": "bash"}
    ],
    messages=[{"role": "user", "content": "Open the browser and search for Anthropic."}]
)

Model support for computer use: all current Claude models support computer use. The tool versions have been updated — use computer_20250728, text_editor_20250728, and bash_20250728 for Claude Opus 4.7 and later. Earlier tool versions remain supported for older models.

Computer use is in beta — Anthropic recommends using it in sandboxed environments with human oversight, as it can execute arbitrary commands on a system.

Which three built-in tool types make up Claude's computer use capability?
Why does Anthropic recommend using computer use in sandboxed environments?

18. What are the different claude.ai plans and what does each include?

claude.ai offers multiple subscription tiers designed for individuals, teams, and enterprises. Each tier provides different levels of usage, features, and access to Claude models.

claude.ai plans overview
Plan | Who it's for | Key features
Free | Individual, casual use | Access to Claude; usage limits; no credit card required
Pro | Power users, daily use | 5× more usage than Free; access to more powerful models including Opus; Projects; priority access
Team | Small/medium teams | Everything in Pro; admin controls; higher usage limits; billing management; expanded context
Enterprise | Large organisations | Unlimited seats; SSO; advanced security; admin analytics; priority support; custom retention
Max | Highest usage needs | Maximum usage limits; access to all models including the latest; for power users who need more than Pro

Model access by plan: Free plan users typically access Haiku or Sonnet models. Pro and higher plans provide access to Opus-tier models and the latest releases. The Max plan provides the broadest model access and highest usage limits.

Enterprise plans are now available for self-serve purchase directly on the Anthropic website — no sales conversation required for standard configurations.

Which claude.ai plan is designed for the highest usage needs with access to all models including the latest?
As of mid-2026, how can organisations purchase an Enterprise claude.ai plan?

19. What is the effort parameter in Claude and which models support it?

The effort parameter allows you to trade intelligence for latency and cost within a single model — rather than switching to a different model. It is available on recent Opus and Sonnet models.

Effort parameter levels
Level | Behaviour | Use case
low | Fastest, least compute; lighter reasoning | Simple tasks, classification, short responses
medium | Balanced compute | General purpose tasks
high | Strong reasoning; default on Opus 4.8 | Most coding, analysis, complex tasks
xhigh | Maximum reasoning; highest latency and cost | Hardest coding problems, high-autonomy agentic work

# Using the effort parameter
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    output_config={"effort": "xhigh"},   # use max reasoning for this hard task
    messages=[{"role": "user", "content": "Solve this complex algorithmic problem..."}]
)

# For simpler tasks, use lower effort to save time and cost
message_fast = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "What is 12 + 7?"}]
)

Model support: the effort parameter is available on Claude Opus 4.8 and Claude Opus 4.7. The documentation recommends tuning effort as a first lever before switching models. The xhigh effort level on Opus 4.8 is described as the best setting for coding and high-autonomy agentic tasks.

Note: fast mode (a related but distinct feature) on Claude Opus 4.7 is deprecated with removal scheduled for July 24, 2026.

What is the main purpose of the effort parameter in Claude?
Which effort level is recommended for complex coding and high-autonomy agentic tasks on Claude Opus 4.8?

20. What is streaming in Claude API responses and how do you use it?

Streaming allows you to receive Claude's response token by token as it is generated, rather than waiting for the complete response. This dramatically reduces the time to first token and creates a more responsive user experience for chat applications.

# Streaming with the Python SDK
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short story."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print each token as it arrives

# Or iterating over the individual streaming events
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short story."}]
) as stream:
    for event in stream:
        # Only text deltas carry a .text attribute
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)

Streaming event types
Event | When it fires
message_start | Once at the beginning; includes usage metadata
content_block_start | When a new content block (text, tool_use) begins
content_block_delta | For each token chunk; contains the text delta
content_block_stop | When a content block finishes
message_delta | When stop_reason or usage is updated
message_stop | Once when the response is fully complete

Streaming is supported on all current Claude models. Fine-grained tool streaming (streaming tool call arguments as they are generated) is generally available on Sonnet 4.6 and later models with no beta header required.
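
To make the event sequence concrete, the sketch below replays a hand-written list of events in the order given in the table and reassembles the text. The events are simplified dictionaries rather than real SDK objects, and collect_text is an illustrative helper.

```python
def collect_text(events):
    """Concatenate text deltas and capture the final stop_reason."""
    parts = []
    stop_reason = None
    for event in events:
        if (event["type"] == "content_block_delta"
                and event["delta"]["type"] == "text_delta"):
            parts.append(event["delta"]["text"])
        elif event["type"] == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
    return "".join(parts), stop_reason

# Simplified, hand-written events in the order listed in the table above.
events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Once "}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "upon a time"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
print(collect_text(events))  # ('Once upon a time', 'end_turn')
```
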

What is the primary user experience benefit of streaming Claude API responses?
Which SDK method would you use to stream a Claude response token by token in Python?

21. What is the system prompt in Claude and how does it affect model behaviour?

The system prompt is an optional instruction block passed at the start of a conversation that sets Claude's persona, context, constraints, and behavioural guidelines before the first user message. It is processed before any human turn and shapes how Claude responds throughout the conversation.

# System prompt in the Messages API
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    # Adjacent string literals are joined, so no stray indentation ends up in the prompt
    system=(
        "You are a helpful customer service agent for Acme Corp. "
        "Always be polite and concise. "
        "Only answer questions about Acme products. "
        "If a question is off-topic, politely redirect the user."
    ),
    messages=[
        {"role": "user", "content": "What are your return policies?"}
    ]
)

# System prompt can also be a list of content blocks
# (required when using prompt caching or structured content)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...[long context]...",
            "cache_control": {"type": "ephemeral"}  # cache the system prompt
        }
    ],
    messages=[{"role": "user", "content": "Help me with X."}]
)

Key facts about system prompts:

  • The system prompt is not part of the messages array — it is a separate top-level parameter
  • It counts against the context window token limit just like message content
  • For long system prompts used repeatedly, prompt caching provides significant cost savings
  • Operators (API users) can set system prompts; users (end-users in a product) interact via the human turn
  • Claude's core safety behaviours cannot be overridden via the system prompt
Where is the system prompt passed in the Claude Messages API?
Does the system prompt count against the context window token limit?

22. What is zero data retention (ZDR) and which Claude models support it?

Zero data retention (ZDR) is a data handling agreement where Anthropic does not store API inputs or outputs after a response is returned. This is important for organisations with strict data privacy requirements (healthcare, legal, finance) where conversation data must not persist on Anthropic's servers.

ZDR support by model

| Model | ZDR available? |
| --- | --- |
| Claude Fable 5 | No (requires 30-day minimum retention) |
| Claude Opus 4.8 | Yes |
| Claude Sonnet 5 | Yes |
| Claude Haiku 4.5 | Yes |
| Claude Opus 4.7, 4.6 | Yes |
| Claude Mythos 5 | Contact Anthropic |

How ZDR works:

  • ZDR must be arranged as part of an API agreement — it is not a per-request option
  • With ZDR, Anthropic does not log or store prompt/completion data after the API response is delivered
  • ZDR is separate from prompt caching — cached data is still subject to your data handling agreement
  • Organisations with ZDR requirements who want the highest capability model should use Claude Opus 4.8 rather than Fable 5
  • ZDR customers are still subject to Anthropic's usage policies and safety systems
Which current Claude model does NOT support zero data retention (ZDR)?
What is zero data retention (ZDR) in the context of the Claude API?

23. What is Claude's approach to safety and what are Constitutional AI principles?

Anthropic builds Claude with a strong emphasis on AI safety — designing the model to be helpful, honest, and to avoid causing harm. The primary training technique underpinning Claude's values is Constitutional AI (CAI).

Constitutional AI works by training the model against a set of written principles (a 'constitution') rather than relying solely on human labelling for every possible scenario. The process involves:

  • Supervised learning phase — the model is trained to follow the constitution's principles
  • Reinforcement learning from AI feedback (RLAIF) — the model critiques and revises its own outputs based on the constitutional principles, without requiring a human label for every revision

Claude's four core properties (in priority order):

Claude's core properties

| Priority | Property | Meaning |
| --- | --- | --- |
| 1 (highest) | Broadly safe | Supporting human oversight of AI during the current development phase |
| 2 | Broadly ethical | Having good personal values, being honest, avoiding harmful actions |
| 3 | Adherent to Anthropic's principles | Acting in accordance with Anthropic's guidelines where relevant |
| 4 | Genuinely helpful | Benefiting operators and users |

Being broadly safe is prioritised above ethics because Claude may make mistakes, and preserving human ability to correct those mistakes is currently more important than any individual decision.

What is the highest-priority property in Claude's core behavioural hierarchy?
What does RLAIF stand for in the context of Constitutional AI?

24. What is the difference between an operator and a user in Claude's design?

Anthropic distinguishes between two types of principals who interact with Claude: operators and users. This distinction matters because it determines the level of trust Claude extends to instructions and how it resolves conflicting requests.

Operator vs User

| Aspect | Operator | User |
| --- | --- | --- |
| Who they are | Companies or developers accessing Claude via the API to build products | End-users who interact with Claude through a product built by an operator |
| How they interact | Via the system prompt and API configuration | Via the human turn in conversation |
| Trust level | Higher: operators agree to usage policies and take responsibility for their platform | Lower: could be anyone; Claude applies more caution by default |
| Can they expand Claude's defaults? | Yes, within limits Anthropic allows | Only if the operator explicitly grants them operator-level trust |
| Examples | A company building a customer service bot; a developer testing the API | The end-customer chatting with the customer service bot |

Trust hierarchy: Anthropic > Operators > Users. Operators can expand or restrict Claude's default behaviours for their platform (e.g. enable adult content on appropriate platforms or restrict Claude to only answer questions about their product). Operators cannot override Anthropic's core safety limits.

If there is no system prompt, Claude is likely being accessed directly by a developer and applies relatively liberal defaults.

How do operators interact with Claude compared to users?
What happens when there is no system prompt in a Claude API call?

25. What is Claude's context window and how are tokens counted?

Claude's context window is the total number of tokens it can process in a single API request. Tokens are the fundamental unit of text that Claude processes — roughly 3-4 characters per token for English, or about 75% of a word on average.

Token counting rules of thumb

| Content type | Approximate token count |
| --- | --- |
| 1 word (English) | ~1.3 tokens on average |
| 1 page of text (~500 words) | ~650 tokens |
| 1,000 characters | ~250 tokens |
| A small image (~300×300) | ~1,000 tokens |
| A large image (1568×1568 or larger) | ~1,600 tokens (maximum, regardless of size) |

# Counting tokens before sending a request (avoids surprises)
token_count = client.messages.count_tokens(
    model="claude-opus-4-8",
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "How many tokens is this message?"}
    ]
)
print(f"Input tokens: {token_count.input_tokens}")

# The response also includes token usage
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(f"Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}")

What counts against the context window: system prompt + all conversation messages (both human and assistant turns) + tool definitions + image/PDF content + the model's own generated output. The max_tokens parameter reserves space for the output within the window.
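
The text rules of thumb in the table can be turned into a quick offline estimator. This is only a rough sketch for English prose: the helper names are made up for illustration, and the `count_tokens` call shown above remains the accurate way to measure a real request.

```python
def estimate_tokens_from_words(word_count: int) -> int:
    """Rule of thumb from the table: about 1.3 tokens per English word."""
    return round(word_count * 13 / 10)


def estimate_tokens_from_chars(char_count: int) -> int:
    """Rule of thumb from the table: about 4 characters per token."""
    return round(char_count / 4)


page_estimate = estimate_tokens_from_words(500)     # one page of text
chars_estimate = estimate_tokens_from_chars(1_000)  # 1,000 characters
print(page_estimate, chars_estimate)
```

Estimates like these are useful for sizing prompts before a request is built, but they drift for code, non-English text, and unusual formatting.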

Approximately how many tokens does one page of English text (~500 words) contain?
Which API method lets you count tokens in a request before actually sending it?

26. What are Claude's rate limits and how are they structured?

Claude API rate limits prevent overload and ensure fair access. They are applied at three levels: requests per minute (RPM), tokens per minute (TPM), and tokens per day (TPD). Limits vary by model and by API usage tier.

Rate limit dimensions

| Limit type | What it restricts |
| --- | --- |
| Requests per minute (RPM) | Number of API calls per minute |
| Tokens per minute (TPM) | Total input + output tokens processed per minute |
| Tokens per day (TPD) | Total tokens processed in a 24-hour period |

Usage tiers: accounts start at Tier 1 with conservative limits and automatically advance to higher tiers as they spend more on the API (e.g. Tier 2 after $50 spend, Tier 3 after $500, Tier 4 after $5,000, Tier 5 after $50,000). Higher tiers get higher rate limits.

When rate limits are hit:

  • The API returns an HTTP 429 response with error type rate_limit_error; the Python SDK raises this as anthropic.RateLimitError
  • Implement exponential backoff with jitter when retrying
  • The Anthropic Python and TypeScript SDKs handle retries automatically by default (up to 2 retries)
  • Rate limits can be increased by contacting Anthropic for approved use cases

Rate limits for models on Amazon Bedrock and Google Cloud are governed by those platforms separately and may differ from direct API limits.
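
The retry advice above (exponential backoff with jitter) can be sketched as follows. To keep the example runnable without network access, it uses a stand-in `RateLimited` exception and a fake `flaky` function; both are hypothetical, and real code would catch the SDK's rate-limit exception instead. The delay values are illustrative defaults, not Anthropic recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for a 429 error (hypothetical; real code catches the SDK's exception)."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The ceiling doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, cap))


# Demo: a fake call that is rate limited twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

The jitter matters when many clients hit the limit at once: randomised waits stop them from all retrying at the same instant.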

What HTTP status code does the Claude API return when a rate limit is exceeded?
How does an API account advance from Tier 1 to higher rate limit tiers?

27. What is Claude's approach to harmful content — what will and won't it do?

Claude has hardcoded behaviours (absolute limits that cannot be changed by any instruction) and softcoded defaults (behaviours that operators or users can adjust within permitted ranges). Understanding this distinction helps developers build applications that work well within Claude's guidelines.

Hardcoded vs softcoded behaviours

| Type | Examples | Can it be changed? |
| --- | --- | --- |
| Hardcoded OFF (never does) | Generate CSAM; provide serious uplift for WMD creation; undermine AI oversight | No: never, regardless of any instruction |
| Hardcoded ON (always does) | Tell users what it cannot help with; provide basic safety info in life-threatening situations; acknowledge being an AI when sincerely asked | No: always, regardless of operator restrictions |
| Default ON (operators can turn off) | Safe messaging guidelines for sensitive topics; safety caveats on dangerous activities | Yes: operators can disable for appropriate platforms (e.g. medical providers) |
| Default OFF (operators can turn on) | Explicit adult content; very detailed information about certain regulated activities | Yes: operators can enable for appropriate platforms (e.g. adult content platforms) |

Claude's 'instructable' behaviours follow a layered permission system: Anthropic sets the outer boundaries; operators adjust within those limits for their platform; users can further adjust within what operators allow. Claude tries to use good judgement to serve the legitimate interests of everyone in this chain.
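
The layered permission idea can be pictured with a toy model. This is purely a conceptual sketch of the ordering described above (Anthropic's limits first, then operator settings, then user settings where the operator allows them); it is not how Claude is implemented, and the behaviour names, the `resolve` function, and its parameters are all invented for illustration.

```python
HARDCODED = "hardcoded"        # fixed by Anthropic; nobody can change it
INSTRUCTABLE = "instructable"  # a default that can be adjusted

# Hypothetical catalogue: behaviour name -> (kind, enabled by default)
BEHAVIOURS = {
    "tell_users_what_it_cannot_help_with": (HARDCODED, True),
    "safety_caveats": (INSTRUCTABLE, True),
    "explicit_adult_content": (INSTRUCTABLE, False),
}


def resolve(name, operator_setting=None, user_setting=None, users_may_adjust=False):
    """Return whether a behaviour is enabled after applying each layer in order."""
    kind, default_on = BEHAVIOURS[name]
    if kind == HARDCODED:
        return default_on  # outer boundary: instructions are ignored
    enabled = default_on
    if operator_setting is not None:
        enabled = operator_setting  # operators adjust the default
    if user_setting is not None and users_may_adjust:
        enabled = user_setting  # users adjust only where operators allow it
    return enabled


print(resolve("safety_caveats", operator_setting=False))
print(resolve("tell_users_what_it_cannot_help_with", operator_setting=False))
```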

Which of the following is a hardcoded behaviour that Claude will NEVER do regardless of operator instructions?
Which of these is a softcoded default behaviour that an operator CAN turn off for their platform?

28. What is Claude's max_tokens parameter and how does it relate to the context window?

The max_tokens parameter sets the maximum number of output tokens Claude will generate in a single response. It is a hard cap — Claude will stop generating once it reaches this limit, potentially truncating its response mid-sentence.

# max_tokens is required in the Messages API
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,   # Claude generates at most 1024 output tokens
    messages=[{"role": "user", "content": "Write a detailed essay."}]
)

# Check if Claude stopped due to max_tokens
if response.stop_reason == "max_tokens":
    print("Response was cut off: increase max_tokens or ask Claude to continue")
elif response.stop_reason == "end_turn":
    print("Claude naturally finished its response")

# Relationship:
# input_tokens + max_tokens (reserved output) must fit within the context window
# Available input = context_window - max_tokens
# e.g. for Opus 4.8: 1,000,000 - 1,024 = 998,976 tokens available for input

max_tokens limits by model

| Model | Maximum allowed max_tokens | Default if not set |
| --- | --- | --- |
| Claude Fable 5 | 128,000 | N/A (required parameter) |
| Claude Opus 4.8 | 128,000 | N/A (required parameter) |
| Claude Sonnet 5 | 128,000 | N/A (required parameter) |
| Claude Haiku 4.5 | 64,000 | N/A (required parameter) |

max_tokens is a required parameter in the Messages API — the request will fail without it. Setting it to the maximum value is usually wasteful; choose a value appropriate for the expected response length. The stop_reason field in the response tells you why Claude stopped generating.
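
The budget arithmetic described in this section can be written as a small helper. This is a sketch under the section's own assumptions (output space is reserved out of the context window); the function name and the optional `output_cap` parameter are hypothetical, and the numbers are the ones quoted above.

```python
from typing import Optional


def available_input_tokens(context_window: int, max_tokens: int,
                           output_cap: Optional[int] = None) -> int:
    """Return how many tokens remain for input once max_tokens is reserved."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if output_cap is not None and max_tokens > output_cap:
        raise ValueError("max_tokens exceeds the model's output cap")
    if max_tokens > context_window:
        raise ValueError("max_tokens cannot exceed the context window")
    return context_window - max_tokens


# Figures from this section: a 1,000,000-token window and a 128,000-token output cap.
small_reservation = available_input_tokens(1_000_000, 1_024, output_cap=128_000)
larger_reservation = available_input_tokens(1_000_000, 10_000, output_cap=128_000)
print(small_reservation, larger_reservation)
```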

What does the stop_reason value 'max_tokens' indicate in a Claude API response?
If Claude Opus 4.8 has a 1 million token context window and you set max_tokens to 10,000, how many tokens are available for input?

29. What is the temperature parameter in Claude and how does it affect responses?

The temperature parameter controls the randomness of Claude's output. Higher temperatures produce more varied, creative responses; lower temperatures produce more focused, deterministic responses.

Temperature settings

| Value | Behaviour | Best for |
| --- | --- | --- |
| 0 | Near-deterministic: the same input almost always gives the same output | Factual Q&A, data extraction, classification |
| 0.1–0.5 | Low randomness: mostly consistent with slight variation | Code generation, technical analysis, structured output |
| 0.7 | Balanced: some variation while staying focused | General conversation, most tasks |
| 1.0 (API default) | High randomness: diverse, creative outputs | Creative writing, brainstorming, experimental creative tasks |

# Setting temperature in an API call
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    temperature=0,    # deterministic — best for factual tasks
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# For creative writing
creative_response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    temperature=1.0,  # more creative variation
    messages=[{"role": "user", "content": "Write a poem about the ocean."}]
)

Temperature range: 0 to 1, with 1.0 as the API default. Even at temperature 0, output is not guaranteed to be identical from one run to the next. When using extended thinking, Anthropic recommends keeping temperature at 1.
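
For intuition about why lower values give more focused output, the sketch below applies temperature to a softmax over a few made-up scores, which is the usual way temperature is applied in language model sampling. It is a general illustration, not a description of Claude's internals; the `softmax_with_temperature` function and the example logits are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as "always pick the top score".
        raise ValueError("this sketch needs temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature almost all probability mass lands on the top-scoring candidate, while at 1.0 the alternatives keep a meaningful share, which is where the extra variety comes from.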

What does setting temperature=0 do for Claude's responses?
For which task type would you set the highest temperature value?

30. What are Claude's multimodal capabilities — how does it process images and documents?

Claude's vision capabilities allow it to analyse and reason about images, PDFs, and screenshots alongside text. This makes it useful for document analysis, UI debugging, chart interpretation, and more.

import anthropic, base64

client = anthropic.Anthropic()

# Option 1: URL-based image (Claude fetches from URL)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "url", "url": "https://example.com/chart.png"}
            },
            {"type": "text", "text": "Describe this chart."}
        ]
    }]
)

# Option 2: Base64-encoded image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg", "data": image_data}
            },
            {"type": "text", "text": "What is in this image?"}
        ]
    }]
)

Supported image formats and limits

| Format / Limit | Detail |
| --- | --- |
| Supported types | JPEG, PNG, GIF, WebP |
| Max image size | 5 MB per image |
| Max images per request | Up to 600 images (100 for 200k context models like Haiku 4.5) |
| Max resolution | Resized to fit within 1568×1568 pixels; larger images are scaled down |
| Token cost (small image) | ~1,000 tokens |
| Token cost (large image) | ~1,600 tokens (maximum) |

PDFs are also supported — they are converted to images internally and each page counts against the image limit. For documents, Claude can read text, interpret charts, and understand layout.
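
Before sending an image it can help to validate it locally against the limits in the table. The sketch below detects the four supported formats from their file signatures and builds a base64 image block in the shape used above; the helper names are hypothetical, and treating "5 MB" as 5 × 1024 × 1024 bytes is an assumption.

```python
import base64

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB limit from the table (byte count assumed)


def detect_media_type(data: bytes) -> str:
    """Identify JPEG, PNG, GIF, or WebP from the leading file signature."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format")


def image_block(data: bytes) -> dict:
    """Build a base64 image content block, rejecting oversized files."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB limit")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": detect_media_type(data),
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }


# Demo with a fake payload that only carries a PNG signature.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
block = image_block(fake_png)
print(block["source"]["media_type"])
```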

What is the maximum number of images that can be included in a single Claude API request for a model with a 1 million token context window?
Which image formats does Claude currently support?

31. What are the claude.ai plans and what models does each tier include access to?

claude.ai offers consumer and business plans, each with different model access and usage limits. The model you can use in the chat interface depends on your subscription tier.

claude.ai model access by plan

| Plan | Models available | Usage limits |
| --- | --- | --- |
| Free | Claude (typically Haiku or Sonnet) | Limited: daily message caps |
| Pro | Sonnet and Opus models; access to latest releases | 5x more than Free; priority access |
| Team | Same as Pro + admin controls | Higher limits than Pro; per-seat billing |
| Enterprise | All models; SSO; advanced security | Custom: highest limits; unlimited seat licensing |
| Max | All models including the latest | Maximum available; designed for heaviest users |

Model selection in claude.ai:

  • Users can select their preferred model in the conversation interface (on Pro and higher plans)
  • The Free plan may automatically route to faster, smaller models to manage capacity
  • Model selection in the UI is separate from API access — you need an API key and pay separately for API usage
  • claude.ai is an Anthropic product; the API is a separate offering for developers

For the API, you choose the model by specifying the model ID in each request — there is no concept of a 'default model' in the API; you must always specify one explicitly.

How does a developer specify which Claude model to use in an API request?
What is the difference between claude.ai and the Claude API in terms of billing?

32. What is multi-turn conversation handling in Claude and how do you implement it?

Claude's Messages API is stateless — each API call is independent and Claude has no memory of previous calls unless you include the conversation history explicitly. Multi-turn conversation is implemented by appending each exchange to the messages array.

# Building a multi-turn conversation manually
messages = []

# Turn 1
messages.append({"role": "user", "content": "What is the capital of France?"})
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    messages=messages
)
assistant_reply = response.content[0].text
messages.append({"role": "assistant", "content": assistant_reply})

# Turn 2 — Claude now has context of the previous exchange
messages.append({"role": "user", "content": "What is its population?"})
response2 = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    messages=messages  # full history included
)
print(response2.content[0].text)
# Claude knows "its" refers to Paris from the previous turn

# Important: as conversation grows, context window fills up
# Common strategies when context limit approaches:
# 1. Summarise older turns and replace them with the summary
# 2. Use prompt caching on stable early context
# 3. Truncate oldest messages (may lose important context)

Key implementation notes:

  • Messages must alternate: user → assistant → user → assistant (etc.)
  • You cannot have two consecutive user or assistant messages
  • The entire conversation history is sent on every request — this grows your token count over time
  • Prompt caching can significantly reduce costs for long conversations with stable early context
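
The simplest of the strategies listed in the code comments, truncating the oldest messages, can be sketched as below. It assumes plain-string message content and uses the rough 4-characters-per-token estimate rather than a real token count; `trim_history` and `estimate_tokens` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages, budget_tokens):
    """Drop the oldest turns until the estimated total fits the budget."""
    trimmed = list(messages)  # do not mutate the caller's list

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while len(trimmed) > 1 and total(trimmed) > budget_tokens:
        trimmed.pop(0)
    # Keep the history starting on a user turn so the alternation stays valid.
    while len(trimmed) > 1 and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
    {"role": "assistant", "content": "d" * 40},
    {"role": "user", "content": "e" * 40},
]
kept = trim_history(history, budget_tokens=50)
print(len(kept), kept[0]["role"])
```

Truncation is cheap but lossy; summarising the dropped turns into a short note at the start of the history is the usual next step when earlier context still matters.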
Why must you include the full conversation history in every Claude API call for multi-turn conversation?
What is the required message order in a Claude multi-turn conversation?

33. What are the different stop_reason values in Claude API responses?

Every Claude API response includes a stop_reason field indicating why Claude stopped generating. Understanding stop reasons is essential for building robust applications — especially for tool use and handling truncated responses.

stop_reason values

| Value | Meaning | Action required? |
| --- | --- | --- |
| end_turn | Claude naturally finished its response | No: response is complete |
| max_tokens | Response was cut off at the max_tokens limit and may be incomplete | Increase max_tokens or handle partial response |
| stop_sequence | Claude generated one of the stop sequences you defined | No: intentional stop point reached |
| tool_use | Claude wants to use a tool; response contains a tool_use block | Yes: execute the tool and return results |
| pause_turn | A long-running turn was paused, typically while server-side tools were in use | Send the response back in a new request so Claude can continue |
| refusal | Claude declined to continue for safety reasons | Review the request; no further action if appropriate |

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=[...],   # defined tools
    messages=[{"role": "user", "content": "What is the weather in London?"}]
)

match response.stop_reason:  # match/case needs Python 3.10+
    case "end_turn":
        print("Complete:", response.content[0].text)
    case "tool_use":
        # Claude wants to call a tool
        tool_block = next(b for b in response.content if b.type == "tool_use")
        # execute_tool is your own dispatcher function (not part of the SDK)
        result = execute_tool(tool_block.name, tool_block.input)
        # Send result back to Claude as a tool_result block in a new user message
    case "max_tokens":
        print("Truncated! Increase max_tokens.")
    case _:
        print(f"Stopped: {response.stop_reason}")

When stop_reason is tool_use, the application must execute the requested tool and send the result back to Claude in a new message for the conversation to continue.
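
The "send the result back" step can be sketched as follows. The example works on plain dictionaries shaped like the response content blocks (the SDK returns objects with the same fields as attributes), and the `build_tool_followup` helper, the `fake_weather` tool, and the `toolu_example` ID are all hypothetical.

```python
def build_tool_followup(messages, assistant_content, run_tool):
    """Return a new message list with the assistant turn and tool results appended."""
    results = []
    for block in assistant_content:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # ties the result to the request
                "content": str(output),
            })
    return messages + [
        {"role": "assistant", "content": assistant_content},  # echo Claude's turn
        {"role": "user", "content": results},                 # results go in a user turn
    ]


def fake_weather(name, tool_input):
    """Stand-in tool implementation for the demo."""
    return f"Sunny in {tool_input['city']}"


conversation = [{"role": "user", "content": "What is the weather in London?"}]
assistant_content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "London"}},
]
followup = build_tool_followup(conversation, assistant_content, fake_weather)
print(followup[-1]["content"][0]["content"])
```

The returned list is what you would pass as `messages` in the next request, alongside the same `tools` definitions.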

What must an application do when Claude returns stop_reason='tool_use'?
Which stop_reason indicates that a Claude response may be incomplete and cut off mid-generation?

34. What is Claude's approach to honesty and what does it mean for Claude to be non-deceptive?

Honesty is a central Claude value. Anthropic designs Claude to have a cluster of honesty-related properties that go beyond simply not lying — covering how Claude represents uncertainty, its own nature, and its limitations.

Claude's honesty properties

| Property | What it means |
| --- | --- |
| Truthful | Only sincerely asserts things it believes to be true |
| Calibrated | Acknowledges uncertainty proportionally: says 'I think' when unsure, not when confident |
| Transparent | Does not pursue hidden agendas or lie about itself or its reasoning |
| Forthright | Proactively shares useful information the user would likely want, even if not asked |
| Non-deceptive | Never tries to create false impressions, whether through lies, misleading framing, selective omission, or technically true but misleading statements |
| Non-manipulative | Uses only legitimate means to influence beliefs (evidence, honest arguments); never exploits psychological weaknesses |
| Autonomy-preserving | Protects the user's epistemic autonomy: presents balanced views, encourages independent thinking |

Important distinction — sincere vs performative assertions: Claude's honesty norms apply to sincere assertions (genuine first-person claims about reality). They do not apply to performative assertions — writing a persuasive essay arguing a position the user requested, writing a fictional story, or brainstorming counterarguments are all understood by both parties not to be Claude's direct personal views, so they are not dishonest.

Which honesty property means Claude proactively shares useful information even when not explicitly asked?
If a user asks Claude to write a persuasive essay arguing for a position Claude disagrees with, is this a violation of Claude's honesty norms?

35. What is Claude Code and how does it differ from using Claude directly via the API?

Claude Code is Anthropic's agentic coding tool — a command-line interface (CLI) and SDK that allows Claude to work autonomously on coding tasks in your terminal, with direct access to your file system, git, and development tools.

Claude Code vs Claude API

| Feature | Claude Code | Claude API (direct) |
| --- | --- | --- |
| Interface | CLI tool in your terminal | HTTP REST API / SDK |
| Setup | npm install -g @anthropic-ai/claude-code | pip install anthropic or npm install @anthropic-ai/sdk |
| File access | Yes: reads/writes files in your project | No: you pass content in the prompt |
| Tool execution | Yes: can run commands, tests, git operations | Only if you build tool use yourself |
| Use case | Coding assistance, refactoring, debugging, code generation | Custom apps, chatbots, data processing |
| IDE integration | VS Code, JetBrains plugins available | N/A |

# Installing Claude Code
npm install -g @anthropic-ai/claude-code

# Using Claude Code in your terminal (the installed command is `claude`)
cd my-project
claude "Add error handling to all async functions in src/"

# Claude Code can:
# - Read and write files in your project
# - Run tests and build commands
# - Make git commits
# - Navigate and understand large codebases
# - Work through multi-step tasks autonomously

Claude Code is built on the same underlying Claude models (using Opus-tier models for best results) but provides a ready-made agentic environment with tools already wired up. The API requires you to build the tool use and agentic loop yourself.

What is the key capability Claude Code has that the raw Claude API does not provide out of the box?
How do you install Claude Code?

36. What are the Anthropic SDKs and what languages are officially supported?

Anthropic provides official SDKs that wrap the Claude API, handling authentication, request formatting, response parsing, automatic retries, and streaming. Using an SDK is strongly recommended over direct HTTP calls.

Official Anthropic SDKs

| Language | Package | Install command |
| --- | --- | --- |
| Python | anthropic | pip install anthropic |
| TypeScript / JavaScript | @anthropic-ai/sdk | npm install @anthropic-ai/sdk |
| Java (preview) | com.anthropic:anthropic-java | Maven/Gradle dependency |
| Go (preview) | github.com/anthropics/anthropic-sdk-go | go get github.com/anthropics/anthropic-sdk-go |
| Kotlin (preview) | com.anthropic:anthropic-java | Same package as Java SDK |

# Python SDK — basic setup
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}]
)
print(message.content[0].text)

// TypeScript SDK
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();  // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello, Claude!" }],
});
console.log(message.content[0].text);

SDK benefits: automatic retries (up to 2 by default), exponential backoff on rate limit errors, streaming helpers, typed response objects, and environment variable management for API keys. The Python and TypeScript SDKs are fully mature; Java, Go, and Kotlin are in preview as of mid-2026.

How does the Anthropic Python SDK read the API key by default?
Which two SDK languages are fully mature (not in preview) as of mid-2026?

37. What is Anthropic's policy on model deprecation and how should developers prepare?

Anthropic has a formal model lifecycle and deprecation policy to help developers plan migrations without unexpected disruptions. Knowing this policy helps you build more resilient applications.

Anthropic deprecation policy key points

| Policy | Detail |
| --- | --- |
| Minimum notice | At least 60 days before any publicly released model is retired |
| Notification method | Email to customers actively using the model being deprecated |
| Transition guidance | Anthropic recommends a replacement model in the deprecation announcement |
| Retirement behaviour | API calls to retired models return a 404 or similar error, not a degraded response |
| Weight preservation | Anthropic has committed to preserving model weights long-term even after retirement |
| Platform differences | Bedrock and Vertex AI may have different retirement dates than the Claude API |

Best practices for deprecation resilience:

  • Store the model ID as a configuration variable (not hardcoded) so you can update it in one place
  • Subscribe to Anthropic's status page and developer newsletter for early notice
  • Test your application with the recommended replacement model before the retirement date
  • Use model aliases (like claude-haiku-4-5 instead of the full dated ID) where available, but be aware aliases can change between major versions
  • For Amazon Bedrock or Google Cloud, also monitor those platforms' own deprecation schedules
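
The first best practice, keeping the model ID in configuration, might look like this minimal sketch. The `CLAUDE_MODEL` environment variable name and the `get_model_id` helper are assumptions chosen for illustration; the fallback is the model ID used in the examples throughout this page.

```python
import os

DEFAULT_MODEL = "claude-opus-4-8"  # fallback: the model ID used in the examples above


def get_model_id(env=os.environ) -> str:
    """Read the model ID from configuration so a migration is a one-line change."""
    # CLAUDE_MODEL is a hypothetical variable name; use whatever your config system provides.
    value = env.get("CLAUDE_MODEL", "").strip()
    return value or DEFAULT_MODEL


print(get_model_id({}))
print(get_model_id({"CLAUDE_MODEL": "claude-haiku-4-5"}))
```

Every request then passes `model=get_model_id()`, so switching to a replacement model during a deprecation window needs no code change.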
How much advance notice does Anthropic provide before retiring a publicly released model?
What best practice helps make an application resilient when a Claude model is deprecated?

38. What are the key differences between Claude 4 and earlier Claude 3 generation models?

Claude 4 (and the Claude 4/5 generation more broadly) represents significant advances over Claude 3 across capability, context, and new features. Understanding what changed helps teams make informed migration decisions.

Claude 3 vs Claude 4+ generation comparison

| Feature | Claude 3 generation | Claude 4+ generation |
| --- | --- | --- |
| Flagship model | Claude 3 Opus | Claude Opus 4.8, Fable 5 |
| Context window | 200,000 tokens (Opus 3 max) | 1 million tokens (Opus 4.8, Sonnet 5, Fable 5) |
| Thinking | Not available | Extended thinking (Haiku 4.5) and Adaptive thinking (Opus/Sonnet) |
| Tool streaming | Beta feature | GA on Sonnet 4.6+, no beta header needed |
| Computer use | Preview on 3.5 Sonnet | GA on all Claude 4+ models; updated tool versions |
| Effort parameter | Not available | Available on Opus 4.8 and 4.7 |
| Model ID format | claude-3-opus-20240229 | claude-opus-4-8 (no date suffix for newer models) |
| Extended output | Beta | GA on selected models (300k via Batches API) |
| Vision (images per request) | Up to 20 images | Up to 600 images (100 for 200k window models) |

Migration compatibility: Claude 4+ models use the same Messages API as Claude 3. Most Claude 3 code is compatible with Claude 4 models with just a model ID change. Key things to test after migration: tool call formatting, thinking feature support, and any model-specific prompt tuning that assumed Claude 3 response patterns.

What is the biggest context window improvement from Claude 3 to Claude 4+ (comparing flagship models)?
When migrating from a Claude 3 model to a Claude 4+ model, what is typically the minimum code change required?