Claude Managed Agents: Dreaming, Outcomes, and Multi-Agent Orchestration

While everyone talks about building AI agents, few engineers understand how to make them actually improve over time. Anthropic just addressed that gap with three new capabilities for Claude Managed Agents. The headline feature tackles a fundamental limitation of stateless AI by giving agents memory that persists and refines itself between sessions.

The announcement from Anthropic introduces dreaming, outcomes, and multi-agent orchestration to Managed Agents. Harvey, the legal AI company, reported completion rates improved approximately 6x in their tests with dreaming enabled. That number alone signals something worth paying attention to.

What Makes These Updates Different

Feature	What It Does	Status
Dreaming	Self-improving memory between sessions	Research Preview
Outcomes	Rubric-based grading with self-correction	Public Beta
Multi-Agent Orchestration	Lead agents delegate to specialists	Public Beta
Webhooks	Task completion notifications	Generally Available

The core insight here is that Anthropic has moved beyond single-session agent capabilities. From implementing agents at scale, I’ve found that the hardest problem isn’t getting an agent to complete a task once. It’s getting it to learn from hundreds of sessions and apply those learnings consistently.

How Dreaming Actually Works

Dreaming is a scheduled process that runs between active sessions. It reviews your agent’s session history and memory stores, extracts patterns, and curates memories so future interactions start smarter.

The system surfaces three categories of patterns that individual agents miss:

Recurring mistakes that appear across sessions
Workflows that agents converge on independently
Team preferences shared across multiple users
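
To make the first category concrete, here is a minimal Python sketch of spotting mistakes that recur across sessions. It is an illustration only: SessionRecord and recurring_mistakes are hypothetical names invented for this example, not part of Managed Agents, and Anthropic has not published how dreaming actually extracts patterns.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical summary of one agent session (not a Managed Agents type)."""
    session_id: str
    mistakes: list[str]


def recurring_mistakes(sessions: list[SessionRecord], min_sessions: int = 2) -> list[str]:
    """Return mistakes seen in at least `min_sessions` distinct sessions."""
    counts: Counter[str] = Counter()
    for session in sessions:
        # Count each mistake once per session so one noisy session
        # cannot make a pattern look recurring on its own.
        counts.update(set(session.mistakes))
    return sorted(m for m, n in counts.items() if n >= min_sessions)


sessions = [
    SessionRecord("s1", ["skipped citation check", "wrong date format"]),
    SessionRecord("s2", ["skipped citation check"]),
    SessionRecord("s3", ["wrong date format", "wrong date format"]),
]
print(recurring_mistakes(sessions))
# prints ['skipped citation check', 'wrong date format']
```

Counting per session rather than per occurrence is the design choice that matters here: a pattern is only interesting if it shows up independently in more than one session.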

What makes this practically valuable is the control model. You can configure dreaming to update memory automatically, or you can review changes before they land. For production systems handling sensitive workflows, that review step matters.
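
The two control modes can be pictured as a simple gate in front of the memory store. The sketch below is a toy model under stated assumptions: MemoryStore, propose, approve, and reject are hypothetical names made up for illustration, and the real configuration surface for dreaming may look nothing like this.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical in-process store standing in for an agent's memory."""
    entries: dict[str, str] = field(default_factory=dict)
    pending: dict[str, str] = field(default_factory=dict)

    def propose(self, key: str, value: str, auto_apply: bool) -> None:
        """Record an update: apply it immediately or hold it for review."""
        if auto_apply:
            self.entries[key] = value
        else:
            self.pending[key] = value

    def approve(self, key: str) -> None:
        """Move a reviewed update from the pending queue into memory."""
        self.entries[key] = self.pending.pop(key)

    def reject(self, key: str) -> None:
        """Discard a pending update without touching memory."""
        self.pending.pop(key)


store = MemoryStore()
store.propose("date_format", "Use ISO 8601 dates", auto_apply=False)
store.propose("tone", "Keep summaries under 100 words", auto_apply=False)

store.approve("date_format")  # reviewer accepts this one
store.reject("tone")          # reviewer declines this one

print(store.entries)  # prints {'date_format': 'Use ISO 8601 dates'}
```

In review mode nothing reaches the live entries until a person approves it, which is the property that matters for sensitive workflows.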

When multiple subagents work in the same domain, dreaming aggregates what they collectively learned and publishes shared insights to a team-wide memory store. This addresses a persistent challenge in multi-agent architectures: knowledge isolation between agents.

Outcomes: Self-Correcting Agent Loops

The outcomes feature introduces rubric-based evaluation with a separate grader. You define what success looks like, and a distinct evaluator assesses outputs against your criteria in its own context window.

This architectural separation matters. The grader isn’t influenced by the agent’s reasoning, so it provides genuinely independent assessment. When results fall short, the grader pinpoints what needs to change and the agent takes another pass.
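
A generic version of this grade-and-retry loop can be sketched in a few lines of Python. Everything here is an assumption for illustration: grade, run_with_outcome, and toy_agent are hypothetical names, the rubric is a dictionary of plain predicate functions, and the "agent" is a deterministic stub rather than a model call. It is not the outcomes API.

```python
from dataclasses import dataclass
from typing import Callable

Rubric = dict[str, Callable[[str], bool]]


@dataclass
class Grade:
    passed: bool
    feedback: list[str]  # names of the criteria that failed


def grade(output: str, rubric: Rubric) -> Grade:
    """Grader sees only the output, never the agent's reasoning."""
    failed = [name for name, check in rubric.items() if not check(output)]
    return Grade(passed=not failed, feedback=failed)


def run_with_outcome(agent: Callable[[str, list[str]], str], rubric: Rubric,
                     task: str, max_attempts: int = 3) -> tuple[str, int]:
    """Retry the agent with grader feedback until the rubric passes."""
    feedback: list[str] = []
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = agent(task, feedback)
        result = grade(output, rubric)
        if result.passed:
            return output, attempt
        feedback = result.feedback  # tell the agent what to fix next pass
    return output, max_attempts


def toy_agent(task: str, feedback: list[str]) -> str:
    """Deterministic stand-in for an agent; reacts to one feedback item."""
    draft = f"Summary of {task}."
    if "cites a source" in feedback:
        draft += " Source: internal memo."
    return draft


rubric: Rubric = {
    "has a summary": lambda out: "Summary" in out,
    "cites a source": lambda out: "Source:" in out,
}

output, attempts = run_with_outcome(toy_agent, rubric, "the Q3 report")
print(attempts)  # prints 2
print(output)    # prints Summary of the Q3 report. Source: internal memo.
```

The grader receives only the output string, which mirrors the separation described above: it cannot be swayed by how the agent justified its draft.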

In Anthropic’s internal testing, outcomes improved task success by up to 10 percentage points over standard prompting loops. Document generation saw gains in a similar range: an 8.4% improvement for docx files and 10.1% for pptx files.

Wisedocs, which handles document quality checks, reported 50% faster reviews with outcomes-based compliance checking. That kind of improvement changes the economics of AI agent evaluation entirely.

Multi-Agent Orchestration in Practice

When there’s too much work for a single agent, multi-agent orchestration lets a lead agent break jobs into pieces and delegate each to a specialist. Each specialist gets its own model, prompt, and tool configuration.

The architecture enables parallel execution on shared filesystems. Persistent events allow mid-workflow communication, and full context memory carries across all participating agents. Everything traces back to the Claude Console for observability.

Netflix has deployed this for platform team operations. Their setup runs a lead agent coordinating investigation while subagents fan out through deploy history, error logs, metrics, and support tickets simultaneously. The agentic coding patterns that work for individual developers scale differently when orchestrated properly.
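
The lead-and-specialists shape described above can be sketched with Python’s standard thread pool. This is a toy model, not Netflix’s setup and not the Managed Agents API: the specialist functions are hypothetical stubs that return canned strings, and lead_agent is an invented name for the coordinating step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical specialists: plain functions standing in for subagents.
def deploy_history(incident: str) -> str:
    return f"[deploys] reviewed recent releases for {incident}"


def error_logs(incident: str) -> str:
    return f"[logs] scanned error logs for {incident}"


def metrics(incident: str) -> str:
    return f"[metrics] compared dashboards for {incident}"


def support_tickets(incident: str) -> str:
    return f"[tickets] searched open tickets for {incident}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "deploy_history": deploy_history,
    "error_logs": error_logs,
    "metrics": metrics,
    "support_tickets": support_tickets,
}


def lead_agent(incident: str) -> dict[str, str]:
    """Fan the incident out to every specialist in parallel, then collect."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        futures = {name: pool.submit(fn, incident)
                   for name, fn in SPECIALISTS.items()}
        # result() blocks until that specialist finishes.
        return {name: future.result() for name, future in futures.items()}


report = lead_agent("checkout latency")
for name, finding in report.items():
    print(f"{name}: {finding}")
```

Submitting every specialist before collecting any result is what makes the work overlap; calling each one in turn would serialize the investigation.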

Financial Services Templates

Anthropic released ten ready-to-run agent templates specifically for financial services work. These package three components together:

Skills: Domain-specific instructions and knowledge for financial workflows

Connectors: Governed access to data sources including FactSet, S&P Capital IQ, and Moody’s

Subagents: Specialized Claude models handling specific sub-tasks like comparables selection

The templates cover research and client coverage (pitch builder, meeting preparer, earnings reviewer, model builder, market researcher) plus finance and operations (valuation reviewer, general ledger reconciler, month-end closer, statement auditor, KYC screener).

Every tool call and decision logs to the Claude Console for compliance inspection. For regulated industries, that audit trail isn’t optional.

Production Considerations

Warning: Dreaming is currently in research preview. You’ll need to request access and may have to wait for approval. Outcomes and multi-agent orchestration are in public beta, so they’re more accessible but still evolving.

The memory and dreaming combination creates a robust system for self-improving agents, but it also introduces new failure modes to consider:

Memory drift over time if dreaming extracts incorrect patterns
Latency implications when agents load larger memory stores
Governance questions about what gets memorized from user sessions

For teams building on Claude Code, the financial services templates ship as plugins. They’re available on all paid plans, offering a faster path to production than building custom agent architectures from scratch.

What This Means for AI Engineers

The combination of dreaming, outcomes, and orchestration represents a shift in what’s possible with managed agent platforms. Individual agents that forget everything between sessions have always been limited. Systems that learn and improve from accumulated experience operate differently.

Harvey’s 6x completion rate improvement isn’t about better prompts or more capable base models. It’s about persistent learning that compounds across sessions. That’s the implementation pattern that changes production economics.

For AI engineers evaluating agent platforms, the questions have shifted. It’s no longer just about what an agent can do in a single session. It’s about what it learns across thousands of sessions and how that learning gets governed.

Frequently Asked Questions

How does dreaming differ from standard agent memory?

Standard memory captures learnings during active work. Dreaming runs between sessions, reviewing patterns across all previous interactions and restructuring memory to stay high-signal as it evolves. Think of it as scheduled consolidation rather than real-time capture.

Can I use these features with my existing Claude integration?

Dreaming, outcomes, and multi-agent orchestration require Managed Agents on the Claude Platform. If you’re currently using the API directly, you’ll need to evaluate whether the managed infrastructure fits your architecture.

What’s the pricing model for these new capabilities?

Anthropic hasn’t published separate pricing for dreaming or outcomes. They’re bundled with Managed Agents access. The financial services templates are available as plugins in Claude Cowork and Claude Code on all paid plans.

Sources

New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration


Zen van Riel

Senior AI Engineer | Ex-Microsoft, Ex-GitHub

I went from a $500/month internship to Senior AI Engineer. Now I teach 30,000+ engineers on YouTube and coach engineers toward six-figure AI careers in the AI Engineering community.

Blog last updated Jul 7, 2026