A developer’s experiment in minimalist communication has turned into a practical cost-cutting tool for the AI community. The original post, published on r/ClaudeAI last week, has accumulated over 10,000 votes and 400 comments after the author demonstrated that forcing Claude to respond in short, blunt sentences — reminiscent of prehistoric speech — could reduce output token counts by as much as 75%. The technique strips away pleasantries, step-by-step narration, and closing offers to assist, leaving only direct results.
The core mechanic is straightforward: instead of allowing the model to warm up with conversational filler and explain every action it takes, developers constrain it to deliver results first with minimal elaboration. A standard web search task that would normally generate around 180 output tokens dropped to roughly 45 under these constraints. The approach is summarised neatly by one commenter’s observation that there is little reason to use many words when fewer words achieve the same outcome.
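The arithmetic behind the headline number is easy to verify. The helper below is illustrative (not from the post itself); it computes the percentage of output tokens saved, using the web search figures cited above:

```python
def output_reduction(verbose_tokens: int, terse_tokens: int) -> float:
    """Percentage of output tokens saved by the terse style."""
    return 100 * (verbose_tokens - terse_tokens) / verbose_tokens

# Web search example from the thread: ~180 tokens verbose, ~45 terse.
print(output_reduction(180, 45))  # 75.0
```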
However, the headline figure requires context. The technique only affects output tokens, leaving untouched the input context — which includes full conversation history, attached files, and system instructions that the model re-reads on every turn. In longer coding sessions, input typically far exceeds output in volume. When total token usage is accounted for, real-world savings land closer to 25%, which is still a meaningful reduction but well below the maximum claimed figure. Developers are also advised to keep their own instructions clear and conventional, as feeding the model degraded input risks producing degraded output in return.
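To see why the total savings shrink, consider a single turn in which input dominates. The figures below are hypothetical, chosen only to show how a 75% output cut collapses to 25% overall when the re-read input context outweighs the output two to one:

```python
def total_savings(input_tokens: int, verbose_out: int, terse_out: int) -> float:
    """Overall token reduction once the (unchanged) input context is counted."""
    before = input_tokens + verbose_out
    after = input_tokens + terse_out
    return 100 * (before - after) / before

# Illustrative turn: 120 input tokens re-read by the model, 60 output tokens
# when verbose, 15 when terse (a 75% cut on output alone).
print(total_savings(120, 60, 15))  # 25.0
```

In longer sessions the input share only grows, so the realised savings can fall even further below the per-response figure.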
A separate concern raised by researchers in the thread is whether constraining the model’s verbal style could impair its reasoning quality. The argument is that forcing the model into a less sophisticated linguistic persona might constrain its reasoning along with its vocabulary. This question has not been definitively resolved, though it remains a relevant consideration for developers evaluating the trade-off between brevity and output quality.
The idea quickly moved from Reddit to GitHub, where developer Shawnchee packaged the approach into a standalone caveman skill compatible with Claude Code, Cursor, Windsurf, Copilot, and more than 40 other agents. The skill encodes ten rules, among them: no filler phrases, no meta-commentary, no preamble or postamble, no tool announcements, executing tasks before explaining them, and treating errors as problems to fix rather than events to narrate. Benchmarks verified with tiktoken show output token reductions of 68% on web search tasks, 50% on code edits, and 72% on question-and-answer exchanges, averaging 61% across four standard task types.
A parallel repository by developer Julius Brussee, framed as a SKILL.md file, has attracted 562 stars on GitHub with a slightly different implementation. Its specification instructs the model to respond like a knowledgeable but terse speaker, cutting articles, filler, and pleasantries while preserving all technical substance. Code blocks remain unchanged, error messages are quoted exactly, and technical terminology is kept intact. The repository also offers three modes — Normal, Lite, and Ultra — allowing developers to choose how aggressively they want responses trimmed.
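The general shape of such a skill file can be sketched as follows. The wording below is paraphrased from the descriptions above, not copied from either repository, and the per-mode behaviour shown is an assumption:

```markdown
# Caveman Skill (sketch)

Respond like a knowledgeable but terse speaker. Cut articles, filler,
and pleasantries. Preserve all technical substance.

## Invariants
- Code blocks remain unchanged.
- Error messages are quoted exactly.
- Technical terminology is kept intact.

## Modes
- Normal / Lite / Ultra: choose how aggressively responses are trimmed
  (exact behaviour per mode is defined in the repository).
```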
The financial stakes give the experiment practical weight beyond its comedic framing. Anthropic is among the more expensive AI providers on a per-token basis, and for developers running agentic workflows with dozens of turns per session, verbose output is a direct cost rather than a stylistic inconvenience. Tokens saved across thousands of API calls accumulate into a measurable line item. The caveman skill is installable in a single command via skills.sh and applies globally across projects, making adoption relatively low-effort for teams already using supported tools.
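The line item is straightforward to estimate. The rate used below is an assumption for illustration only (output-token pricing varies by model and changes over time), not a quoted Anthropic price:

```python
def batch_savings_usd(calls: int, tokens_saved_per_call: int,
                      usd_per_million_output_tokens: float) -> float:
    """Dollars saved from trimmed output across a batch of API calls."""
    return calls * tokens_saved_per_call * usd_per_million_output_tokens / 1_000_000

# Assumed figures: 10,000 agentic calls, ~135 output tokens saved per call
# (the 180 -> 45 web search example), at an illustrative $15 per million
# output tokens.
print(batch_savings_usd(10_000, 135, 15.0))  # 20.25
```

Per call the saving is fractions of a cent; it is the multi-turn, many-session scale of agentic workflows that makes it visible on an invoice.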
Originally reported by Decrypt.
