How to Always Enable Ultrathink in Claude Code and MAX_THINKING_TOKENS Behavior
- This article was created manually
- Based on Claude Code v1.0.43
I’m Oikon. I work as an engineer at a foreign IT company.
Do you use ultrathink when using Claude Code daily?
I think ultrathink is a word every Claude Code user knows by now, but in this article I'd like to dig a bit deeper.
I investigated the following points that I was personally curious about:
- Can you always enable `ultrathink`?
- Is `MAX_THINKING_TOKENS` completely fixed?
- How much thinking token budget can you allocate?
I don’t think there’s anything particularly new here, but since there was a decent response on 𝕏, I thought there might be some demand and wrote this article.
Article Summary
- `ultrathink` (31999 tokens) increases the thinking token budget
- Can be overridden with the environment variable `MAX_THINKING_TOKENS` (valid range: 1024 ~ 200000)
- From a performance perspective, Opus 4 (31999 or less) and Sonnet 4 (63999 or less) are recommended
- Setting it in `env` lets you always enable an `ultrathink`-level budget during Thinking mode

```json
{
  "env": {
    "MAX_THINKING_TOKENS": "31999"
  }
}
```

Notes:
- Setting MAX_THINKING_TOKENS too high may cause the model to not work properly and risks timeout
- This is a change to the thinking budget for Thinking mode, and you still need to explicitly enter Thinking mode with `Think`
By the way, I personally don’t particularly recommend always enabling ultrathink.
Extended Thinking
Claude 4 is a reasoning model, so it performs internal reasoning when executing tasks. A thinking budget is allocated for this.
To explicitly raise the thinking budget, Claude Code internally performs keyword matching.
English examples:
- HIGHEST (31999 tokens): `ultrathink`, `think harder`
- MIDDLE (10000 tokens): `megathink`, `think hard`
- BASIC (4000 tokens): `think`
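The tier matching above can be sketched roughly like this. This is my own illustration (function and variable names are made up), not Claude Code's actual implementation:

```javascript
// Sketch (assumption): keyword tiers mapped to thinking budgets,
// checked from highest to lowest, as described in the list above.
const TIERS = [
  { budget: 31999, patterns: ["ultrathink", "think harder"] }, // HIGHEST
  { budget: 10000, patterns: ["megathink", "think hard"] },    // MIDDLE
  { budget: 4000,  patterns: ["think"] },                      // BASIC
];

function thinkingBudget(message) {
  const text = message.toLowerCase();
  for (const tier of TIERS) {
    if (tier.patterns.some((p) => text.includes(p))) return tier.budget;
  }
  return 0; // NONE: no thinking keyword matched
}

console.log(thinkingBudget("ultrathink about this bug")); // 31999
console.log(thinkingBudget("think hard about it"));       // 10000
console.log(thinkingBudget("just do it"));                // 0
```

Note that the tiers must be checked in order from HIGHEST to BASIC, since `think harder` also contains the BASIC keyword `think`.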
It also supports 7 languages besides English.
This topic has already been discussed in detail in the following article, so I’ll skip it:
MAX_THINKING_TOKENS Specification
Configuration Method
Claude Code’s thinking budget is determined by keyword matching, but it can also be overridden with the environment variable MAX_THINKING_TOKENS.
```shell
export MAX_THINKING_TOKENS=31999
```
You can confirm it’s correctly set when Overrides appears at Claude Code startup.

Environment variables can also be set individually in Claude Code’s configuration file settings.json.
```json
{
  "env": {
    "MAX_THINKING_TOKENS": "31999"
  }
}
```

Setting it to 31999 effectively enables `ultrathink` at all times. Environment variables are documented in Anthropic's official documentation:
Minimum and Maximum Values
MAX_THINKING_TOKENS accepts values from 1024 to 200,000 inclusive.
- `MAX_THINKING_TOKENS=1023`: rejected (below the minimum)
- `MAX_THINKING_TOKENS=200001`: rejected (above the maximum)
I believe the 200,000 tokens limit comes from Claude 4’s context window upper limit (please let me know if I’m wrong).
Configuration Notes
Looking at the official documentation, Claude 4’s output token limits are:
Max output:
- Claude Opus 4: 32000 tokens
- Claude Sonnet 4: 64000 tokens
Therefore, setting MAX_THINKING_TOKENS too high may have disadvantages.
I received some notes about MAX_THINKING_TOKENS from Shinchaku-san (@lfji), so I’ll quote them here:
> Regarding `MAX_THINKING_TOKENS` values, I recommend setting it to 31999 or less for Opus and 63999 or less for Sonnet. Specifying larger values seems to cause disadvantages.
> - First, each model has an upper limit on tokens that can be output per API call (Max output): 32000 for Opus and 64000 for Sonnet.
> - Claude Code automatically adds 1 to the value set in `MAX_THINKING_TOKENS` and sends it as the API's `max_tokens` parameter.
> - Therefore, setting `MAX_THINKING_TOKENS` above these limits causes the first API call to error due to exceeding the token limit.
> - Claude Code internally auto-corrects the `max_tokens` parameter and retries, but when this retry executes, the streaming feature (Server-Sent Events) that returns responses in real time stops working.
> - As a result, it switches to waiting for the full model response to complete, which may hit Claude Code's timeout.
>
> Since responses generated by the model may be wasted, it's better to set `MAX_THINKING_TOKENS` below these limits to avoid API errors from the start.
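The "+1" arithmetic behind that recommendation can be sketched like this (my own illustration of the quoted behavior, not Claude Code's actual code; the function name is made up):

```javascript
// Per the quote above, Claude Code reportedly sends
// max_tokens = MAX_THINKING_TOKENS + 1 on the API call.
const MAX_OUTPUT = { opus4: 32000, sonnet4: 64000 };

function exceedsLimit(maxThinkingTokens, model) {
  const maxTokens = maxThinkingTokens + 1; // value sent as the API's max_tokens
  return maxTokens > MAX_OUTPUT[model];
}

console.log(exceedsLimit(31999, "opus4"));   // 32000 <= 32000 -> false (safe)
console.log(exceedsLimit(32000, "opus4"));   // 32001 >  32000 -> true (first call errors)
console.log(exceedsLimit(63999, "sonnet4")); // 64000 <= 64000 -> false (safe)
```

This is why 31999 and 63999, rather than the round 32000 and 64000, are the recommended ceilings.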
Additional Investigation
To confirm MAX_THINKING_TOKENS behavior myself, I actually analyzed Claude Code’s source code.
The investigation steps were:
- Build Claude Code’s Dockerfile in a container
- Scan the extracted source code
- Have Claude analyze the relevant parts
I used Apple’s container for extraction (because I wanted to try it). For containers, I referenced this article:
Build the container and enter it:
```shell
container build -t claude-code .
container run -it claude-code zsh
```
Locate node_modules:
```shell
> npm root -g
/usr/local/share/npm-global/lib/node_modules
```
There’s @anthropic-ai/claude-code in node_modules, so go inside:
```shell
cd /usr/local/share/npm-global/lib/node_modules/@anthropic-ai/claude-code
```
The extracted cli.js exists, so extract the logic around MAX_THINKING_TOKENS:
```shell
grep -R -n 'MAX_THINKING_TOKENS' . 2>/dev/null
```
By the way, it looks like this. You can also see the Think keyword localization implementation:

Since it’s unreadable as-is, I copied and pasted it and had Claude analyze the logic. Extract of the processing flow:
1. Input Analysis
   - Convert the user message content to lowercase
   - Extract the text content
   - Match against each language's patterns
2. Pattern Matching
   - Check HIGHEST-level patterns first
   - If not found, check MIDDLE level
   - Finally check BASIC level
   - If none match, NONE (0 tokens)
3. Token Allocation
   - If the environment variable MAX_THINKING_TOKENS is set, use it preferentially
   - Otherwise, determine the token count from the pattern-matching result
   - Record token usage in telemetry
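Step 3 of the flow above, the env-var override taking precedence over keyword matching, can be sketched like this (my own illustration with made-up names, not the extracted `cli.js` logic; the range check mirrors the 1024 ~ 200000 limits observed earlier):

```javascript
// Sketch (assumption): the env var wins over the keyword-matched budget
// when it is set and within the accepted range.
function resolveBudget(matchedBudget, env = process.env) {
  const override = env.MAX_THINKING_TOKENS;
  if (override !== undefined) {
    const n = Number(override);
    // Observed valid range from the earlier experiments: 1024-200000.
    if (Number.isInteger(n) && n >= 1024 && n <= 200000) return n;
  }
  return matchedBudget; // fall back to the pattern-matching result
}

console.log(resolveBudget(4000, { MAX_THINKING_TOKENS: "31999" })); // 31999
console.log(resolveBudget(4000, {}));                               // 4000
```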
I’ve also published Claude’s full analysis:
Summary
In this article, I investigated Claude Code’s ultrathink again.
- How to always enable `ultrathink`
- Fixing the thinking token budget with `MAX_THINKING_TOKENS`
- Notes on thinking token budget allocation
I hope this was helpful.
Follow Me on 𝕏!
I also share information on 𝕏, so I’d appreciate it if you followed me!
References
- Anthropic Official Documentation, "Claude Code Settings Guide"
- Anthropic Official Documentation, "Models overview"
- fbbp. "Thorough Explanation of 'ultrathink' - Controlling Claude Code's Thinking Tokens", Zenn
- lfji (@lfji). "MAX_THINKING_TOKENS Recommended Limits and Notes", post on 𝕏: https://x.com/lfji/status/1941282304762183879
- schroneko. "Notes on Running Claude Code with Apple Container", Zenn