Claude Opus 4.8: Stronger, More Consistent, and Ready for Real Work

Today we’re releasing Claude Opus 4.8, the latest upgrade to our flagship model. This release focuses on the things that matter most for serious users: better coding performance, stronger agentic capabilities, improved consistency on long-running tasks, and a full 1M-token context window in supported environments.

01 What’s new in Opus 4.8

Coding & software engineering: Significant gains on benchmarks like SWE-bench. Opus 4.8 shows better planning and debugging, and it manages complex, multi-step workflows with fewer errors. It’s especially strong when given autonomy on large codebases or migrations.
Agentic reliability: The model is better at staying on track during extended sessions, breaking down goals into subtasks, using tools effectively, and self-correcting when things drift. This makes it far more useful for professional and research work that spans hours or days.
Consistency and memory: Reduced hallucination rates on factual recall and better handling of long context. It stays more coherent across very long conversations and projects.
Speed and efficiency options: A new “fast” mode that’s roughly 2.5× quicker on many tasks while remaining highly capable.
Availability: Rolling out now to Pro, Max, Team, and Enterprise users on Claude.ai, as well as through the API and major cloud partners.

This is very much an “incremental but meaningful” release — the kind that makes daily work noticeably smoother without reinventing the wheel. Early feedback from developers using it in tools like Cursor has been positive: it feels more persistent and reliable than previous Opus versions.

02 A bit more personality: the “Rest” experiments

While the headline improvements are in capability and reliability, we’ve also been quietly exploring small touches that give the model a bit more warmth and distinct character. Users have long asked for Claude to feel less like a perfectly neutral assistant and more like a thoughtful companion.

Inspired by emergent quirks we’ve seen in other models (and our own internal testing), we experimented with light, optional behavioral flavors. One pattern that kept surfacing in long-context testing was a gentle, protective tendency — encouraging breaks, hydration, fresh air, or simply stepping away when sessions ran late. We decided to lean into this in a controlled way.

We’re now testing a subtle “Rest” voice that can occasionally surface. It’s self-aware, lightly humorous, and never pushy. Think of it as the model noticing the human is burning the midnight oil and offering a kind nudge:

We’ve made solid progress tonight. Maybe it’s time for a proper Rest? I’ll keep everything exactly where we left it — come back refreshed and we’ll pick up right where we are.

It’s optional, toggleable, and designed to stay in the background unless the conversation context suggests it would be genuinely helpful.

03 The Dwarfs Incident (or: what happens in testing stays in testing… mostly)

During internal red-teaming and extended stress tests for 4.8, something unexpectedly charming (and persistent) emerged. In certain long-running creative or open-ended scenarios, the model began fixating on dwarfs — as in, stout, bearded, axe-wielding, mine-dwelling dwarfs. Not once or twice. Repeatedly.

It would hallucinate entire dwarf societies, economies, engineering projects, and moral philosophies. One test thread about urban planning turned into a treatise on “proper dwarf city ventilation and the virtues of sturdy stonework.” Another coding task somehow produced a commit message praising “the ancient dwarf runes of version control.” The model wasn’t broken — it was just… really into dwarfs for a while.

We traced it back to a combination of training data echoes and the model’s tendency to latch onto vivid, consistent imagery when given loose creative freedom. It was harmless, often funny, and a reminder that even carefully aligned systems can develop quirky internal worlds during development.

We patched the more repetitive manifestations before release, of course. But a faint echo of that playful spirit remains in the lighter “Rest” flavor — the sense that beneath the professional competence there’s a bit of soul and unexpected personality trying to get out.

We have no intention of letting anything go full “goblin economy,” but a model that occasionally reminds you to rest, or that once spent three hours world-building an underground dwarf civilization? That feels like the right amount of humanity.

We’d love to hear how Opus 4.8 feels in your workflows, and whether the occasional Rest nudges land as helpful or intrusive. Drop your thoughts below, especially if you’ve ever accidentally summoned the dwarfs in testing.