
How to Always Enable Ultrathink in Claude Code and MAX_THINKING_TOKENS Behavior

  • This article was created manually
  • Based on Claude Code v1.0.43

I’m Oikon. I work as an engineer at a foreign IT company.

Do you use ultrathink when using Claude Code daily?

I think the word ultrathink is now known by all Claude Code users, but in this article, I’d like to dig a bit deeper.

I investigated the following points that I was personally curious about:

  • Can you always enable ultrathink?
  • Is MAX_THINKING_TOKENS completely fixed?
  • How much thinking token budget can you allocate?

I don’t think there’s anything particularly new here, but since there was a decent response on 𝕏, I thought there might be some demand and wrote this article.

Article Summary

  • ultrathink (31999 tokens) increases the thinking token budget
  • Can be overridden with environment variable MAX_THINKING_TOKENS (1024 ~ 200000)
  • From a performance perspective, values of 31999 or less are recommended for Opus 4 and 63999 or less for Sonnet 4
  • Setting it in env keeps the ultrathink-sized budget enabled whenever Thinking mode is active
settings.json (settings.local.json)
{
    "env": {
        "MAX_THINKING_TOKENS": "31999"
    }
}
(Any value from 1024 to 200000 can be set here.)

Notes:

  • Setting MAX_THINKING_TOKENS too high may keep the model from working properly and risks timeouts
  • This only changes the thinking budget used in Thinking mode; you still need to explicitly enter Thinking mode with a think keyword

By the way, I personally don’t particularly recommend always enabling ultrathink.

Extended Thinking

Claude 4 is a reasoning model, so it performs internal reasoning when executing tasks. A thinking budget is allocated for this.

To explicitly raise the thinking budget, Claude Code internally performs keyword matching.

English examples:

  • HIGHEST (31999 tokens): ultrathink, think harder
  • MIDDLE (10000 tokens): megathink, think hard
  • BASIC (4000 tokens): think

It also supports 7 languages besides English.
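
As a rough mental model of that keyword matching, here is a simplified sketch with hypothetical names; the actual Claude Code source is minified, multilingual, and more involved:

function names and patterns below are illustrative only (TypeScript):

// Simplified sketch of keyword-based thinking-budget selection.
// Levels and budgets follow the article; this is NOT the real implementation.
const THINKING_LEVELS: Array<{ budget: number; patterns: RegExp[] }> = [
  { budget: 31999, patterns: [/ultrathink/, /think harder/] }, // HIGHEST
  { budget: 10000, patterns: [/megathink/, /think hard/] },    // MIDDLE
  { budget: 4000,  patterns: [/\bthink\b/] },                  // BASIC
];

function thinkingBudgetFor(userMessage: string): number {
  const text = userMessage.toLowerCase();
  // Check the highest level first so "think harder" is not caught by the BASIC pattern.
  for (const level of THINKING_LEVELS) {
    if (level.patterns.some((p) => p.test(text))) return level.budget;
  }
  return 0; // NONE: no thinking keyword found
}

console.log(thinkingBudgetFor("ultrathink about this bug")); // 31999
console.log(thinkingBudgetFor("think about this bug"));      // 4000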

This topic has already been discussed in detail in the following article, so I’ll skip it:

Claude Code完全攻略Wiki(隠しコマンド編 - think,拡張機能,思考予算)
zenn.dev

MAX_THINKING_TOKENS Specification

Configuration Method

Claude Code’s thinking budget is determined by keyword matching, but it can also be overridden with the environment variable MAX_THINKING_TOKENS.

export MAX_THINKING_TOKENS=31999

You can confirm it is set correctly when an Overrides entry appears at Claude Code startup.

Environment variables can also be set individually in Claude Code’s configuration file settings.json.

settings.json (settings.local.json)
{
    "env": {
        "MAX_THINKING_TOKENS": "31999"
    }
}
(31999 matches the ultrathink budget, so this keeps the budget at the ultrathink level at all times.)

Environment variables are documented in Anthropic’s official documentation:

Claude Code settings - Claude Code Docs
Configure Claude Code with global and project-level settings, and environment variables.
docs.anthropic.com

Minimum and Maximum Values

MAX_THINKING_TOKENS accepts values of 1024 or more and 200,000 or less.

Values outside this range, such as MAX_THINKING_TOKENS=1023 or MAX_THINKING_TOKENS=200001, are not accepted.

I believe the 200,000 tokens limit comes from Claude 4’s context window upper limit (please let me know if I’m wrong).

Configuration Notes

Looking at the official documentation, the Claude 4 models' max output token limits are:

  • Claude Opus 4: 32000 tokens
  • Claude Sonnet 4: 64000 tokens

Therefore, setting MAX_THINKING_TOKENS too high may have disadvantages.

I received some notes about MAX_THINKING_TOKENS from Shinchaku-san (@lfji), so I’ll quote them here:

Regarding MAX_THINKING_TOKENS values, I recommend setting it to 31999 or less for Opus, and 63999 or less for Sonnet. Specifying larger values seems to cause disadvantages.

  1. First, each model has an upper limit on tokens that can be output per API call (Max output): 32000 for Opus and 64000 for Sonnet.
  2. Claude Code automatically adds 1 to the value set in MAX_THINKING_TOKENS and sends it as the API’s max_tokens parameter.
  3. Therefore, setting MAX_THINKING_TOKENS above these limits causes the first API call to error due to exceeding the token limit.
  4. Claude Code internally auto-corrects the max_tokens parameter and retries, but when this retry process executes, the streaming feature (Server-Sent Events) that returns responses in real-time stops working.
  5. As a result, it switches to processing that waits for all model responses to complete, which may hit Claude Code’s timeout value.

Since the response the model generated before the error may be wasted, it's better to set MAX_THINKING_TOKENS below these limits and avoid the API error in the first place.

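To see concretely why these limits matter, here is a minimal sketch of a Messages API request with extended thinking enabled, written with the @anthropic-ai/sdk TypeScript client. The model ID and prompt are illustrative, and this is not Claude Code's actual request code:

import Anthropic from "@anthropic-ai/sdk";

// Rough sketch of a Messages API call with extended thinking.
// The API requires max_tokens to be strictly greater than thinking.budget_tokens,
// which lines up with the report that Claude Code sends MAX_THINKING_TOKENS + 1.
async function main() {
  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

  const message = await client.messages.create({
    model: "claude-opus-4-20250514", // model ID for illustration
    max_tokens: 32000,               // Opus 4's max output; a larger thinking budget would push past it
    thinking: { type: "enabled", budget_tokens: 31999 },
    messages: [{ role: "user", content: "ultrathink: plan the refactor" }],
  });

  console.log(message.usage); // token usage for the call
}

main();

Because max_tokens must exceed budget_tokens, a MAX_THINKING_TOKENS of 32000 or more would force max_tokens past Opus 4's 32000-token output cap, which is exactly the error-and-retry situation described in the notes above.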

Additional Investigation

To confirm MAX_THINKING_TOKENS behavior myself, I actually analyzed Claude Code’s source code.

The investigation steps were:

  1. Build Claude Code’s Dockerfile in a container
  2. Scan the extracted source code
  3. Have Claude analyze the relevant parts

I used Apple's container tool for the extraction environment (because I wanted to try it). For the container setup, I referenced this article:

Claude Code を Apple Container の中で動かす
zenn.dev

Build the container and enter it:

container build -t claude-code .
container run -it claude-code zsh

Locate node_modules:

> npm root -g
/usr/local/share/npm-global/lib/node_modules

There’s @anthropic-ai/claude-code in node_modules, so go inside:

cd /usr/local/share/npm-global/lib/node_modules/@anthropic-ai/claude-code

The installed cli.js is there, so grep for the logic around MAX_THINKING_TOKENS:

grep -R -n 'MAX_THINKING_TOKENS' . 2>/dev/null

By the way, the matched code is minified and hard to read, but you can also spot the think keyword localization implementation in it. Since it's unreadable as-is, I copied it out and had Claude analyze the logic. Here is an extract of the processing flow (a simplified code sketch follows the list):

1. Input Analysis
    Convert user message content to lowercase
    Extract text content
    Match against each language's patterns
2. Pattern Matching
    Check HIGHEST level patterns first
    If not found, check MIDDLE level
    Finally check BASIC level
    If none match, NONE (0 tokens)
3. Token Allocation
    If environment variable MAX_THINKING_TOKENS is set, use it preferentially
    Determine token count based on pattern matching results
    Record token usage in telemetry
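
Step 3 can be pictured with a small sketch like the following. The function name is hypothetical, the accepted range reflects the limits observed earlier, and how Claude Code actually handles out-of-range values is an assumption here, not something confirmed from the extract:

// Sketch of step 3: the environment variable, when present, takes priority
// over whatever the keyword matching decided. Not the actual minified source.
function resolveThinkingBudget(patternBudget: number): number {
  const raw = process.env.MAX_THINKING_TOKENS;
  if (raw !== undefined) {
    const value = Number.parseInt(raw, 10);
    // Observed accepted range: 1024 to 200000 inclusive.
    // Assumption: out-of-range values fall back to the keyword result.
    if (Number.isFinite(value) && value >= 1024 && value <= 200000) {
      return value;
    }
  }
  return patternBudget; // fall back to the keyword-matching result
}

// With MAX_THINKING_TOKENS=31999 set, even a plain "think" prompt
// would resolve to the ultrathink-sized budget.
console.log(resolveThinkingBudget(4000));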

I’ve also published Claude’s full analysis:

Claude Code MAX_THINKING_TOKENS 解析
gist.github.com

Summary

In this article, I investigated Claude Code’s ultrathink again.

  • How to always enable ultrathink
  • Pinning the thinking token budget with MAX_THINKING_TOKENS
  • Notes on thinking token budget allocation

I hope this was helpful.

Follow Me on 𝕏!

I also share information on 𝕏, so I’d appreciate it if you followed me!

Oikon (@oikon48) on X
Software Engineer / Developing software with overseas teams 🌎 / RevenueCat Shipaton 2025 Winner / ✳︎ultrathink… / Writing about Claude Code / Work requests via DM
x.com

References

  • Anthropic Official Documentation “Claude Code Settings Guide”:
Claude Code settings - Claude Code Docs
Configure Claude Code with global and project-level settings, and environment variables.
docs.anthropic.com
  • Anthropic Official Documentation “Models overview”:
Models overview
Claude is a family of state-of-the-art large language models developed by Anthropic. This guide introduces our models and compares their performance.
docs.anthropic.com
  • fbbp. “Thorough Explanation of ‘ultrathink’ - Controlling Claude Code’s Thinking Tokens” Zenn:
Claude Code完全攻略Wiki(隠しコマンド編 - think,拡張機能,思考予算)
zenn.dev
  • Zenn article on running Claude Code inside Apple Container:
Claude Code を Apple Container の中で動かす
zenn.dev