Claude Managed Agents Add Dreaming, Outcomes, and Multi-Agent Orchestration
Most AI agents forget everything between sessions. They make the same mistakes repeatedly, require constant hand-holding, and never develop the institutional knowledge that makes human teams effective over time. Anthropic just changed that equation.
At their Code with Claude 2026 event yesterday, Anthropic announced three features that fundamentally shift how production agents operate: dreaming for between-session learning, outcomes for autonomous goal pursuit, and multi-agent orchestration for coordinated specialist teams. These are the infrastructure upgrades that separate toy demos from enterprise-grade systems.
| Feature | Status | Key Capability |
|---|---|---|
| Dreaming | Research Preview | Self-improvement between sessions |
| Outcomes | Public Beta | Autonomous iteration toward goals |
| Multi-Agent Orchestration | Public Beta | Up to 20 parallel specialist agents |
| Memory | Public Beta | Cross-session knowledge retention |
Why Agents That Dream Matter for Production Systems
Dreaming is a scheduled background process that reviews past agent sessions, extracts patterns, and curates memory stores so your agents improve over time. Unlike in-session memory that captures what happens during a conversation, dreaming refines knowledge between sessions by pulling shared learnings across agents.
The practical impact is significant. Harvey, a legal AI company using Managed Agents for document work, reported approximately 6x higher completion rates after implementing dreaming. The gain came not from a model upgrade but from agents carrying institutional knowledge across sessions: they learned filetype workarounds, tool-specific patterns, and client preferences that previously required human intervention.
What makes dreaming particularly valuable for agentic AI development is its ability to surface patterns that individual agents cannot detect: recurring mistakes, workflows that teams converge on, and preferences shared across specialists. For long-running projects and multi-agent coordination, this collective intelligence becomes a competitive advantage.
Important constraint: Dreaming remains in research preview. You can request access, but it is not generally available. The original memory store stays unchanged, with Claude generating a separate output store that you can review before committing changes.
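The review-before-commit step can be pictured with a small local sketch. This is not the Managed Agents API: it assumes each memory store is simply a mapping of file name to text, and the names `diff_stores` and `commit_approved` are hypothetical helpers invented for illustration.

```python
import difflib


def diff_stores(original: dict[str, str], proposed: dict[str, str]) -> dict[str, str]:
    """Return a unified diff for every memory file that was added, removed, or changed."""
    diffs = {}
    for name in sorted(set(original) | set(proposed)):
        before = original.get(name, "")
        after = proposed.get(name, "")
        if before != after:
            diffs[name] = "\n".join(
                difflib.unified_diff(
                    before.splitlines(),
                    after.splitlines(),
                    fromfile=f"original/{name}",
                    tofile=f"proposed/{name}",
                    lineterm="",
                )
            )
    return diffs


def commit_approved(
    original: dict[str, str], proposed: dict[str, str], approved: list[str]
) -> dict[str, str]:
    """Build a new store with only the approved files applied; the original is not mutated."""
    merged = dict(original)
    for name in approved:
        if name in proposed:
            merged[name] = proposed[name]
        else:
            merged.pop(name, None)  # an approved deletion
    return merged
```

A reviewer would read the output of `diff_stores`, then pass only the file names they accept to `commit_approved`. Because the original store is never modified in place, rejecting a proposal costs nothing.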
Outcomes: Autonomous Iteration Without Constant Supervision
The outcomes feature addresses a persistent production challenge: getting agents to work toward defined quality standards without requiring human review of every output. You write a rubric describing what success looks like, and the agent iterates autonomously until it meets your criteria.
A separate grader evaluates the output against your criteria in its own context window, isolated from the agent’s reasoning. When something falls short, the grader pinpoints what needs to change and the agent takes another pass. In Anthropic’s internal testing, outcomes improved task success by up to 10 percentage points over standard prompting loops, with the largest gains on harder problems.
The use cases span both objective requirements (complete all specified sections, include required data points) and subjective standards (match brand voice, maintain design consistency). Wisedocs reduced their review cycles by 50% while maintaining quality standards using this approach. For document generation specifically, Anthropic reports +8.4% success improvement on Word files and +10.1% on PowerPoint.
This changes how you think about agent evaluation frameworks. Instead of building elaborate post-processing validation, you define success criteria upfront and let the system iterate.
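To make the rubric-and-grader loop concrete, here is a minimal local sketch. It does not call any Anthropic API. `generate` and `grade` are hypothetical callables you would supply, and the toy rubric (a list of required section names) is an assumption chosen so the example runs on its own. The grader sees only the draft, which loosely mirrors the isolation described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str  # what the grader says still needs to change


def iterate_until_pass(
    task: str,
    generate: Callable[[str, str], str],
    grade: Callable[[str], Verdict],
    max_attempts: int = 5,
) -> tuple[str, int]:
    """Regenerate with grader feedback until the rubric passes or attempts run out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(task, feedback)
        verdict = grade(draft)  # the grader receives the draft only, not the reasoning
        if verdict.passed:
            return draft, attempt
        feedback = verdict.feedback
    raise RuntimeError(f"rubric not met after {max_attempts} attempts: {feedback}")


# Toy rubric: the deliverable must contain every required section.
REQUIRED_SECTIONS = ("Summary", "Risks")


def toy_grade(draft: str) -> Verdict:
    missing = [s for s in REQUIRED_SECTIONS if s not in draft]
    return Verdict(passed=not missing, feedback=", ".join(missing))


def toy_generate(task: str, feedback: str) -> str:
    # Stand-in for an agent: starts with one section, adds whatever the grader flagged.
    sections = ["Summary"] + [s for s in feedback.split(", ") if s]
    return "\n".join(sections)
```

The attempt cap matters in practice: a rubric the agent cannot satisfy should fail loudly rather than loop forever.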
Multi-Agent Orchestration for Complex Workflows
When work exceeds what a single agent handles well, multi-agent orchestration lets a lead agent decompose tasks and delegate to specialists. Each specialist runs with its own model, prompt, and tools. They work in parallel on a shared filesystem and contribute to the lead agent’s overall context.
The platform supports up to 20 parallel specialist agents. Persistent event tracking allows agents to check on peer progress mid-workflow, and all agents maintain full memory of completed actions. The Claude Console provides complete execution visibility showing agent identity, sequence, and rationale for every decision.
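The fan-out pattern, with a cap on concurrent specialists, can be sketched locally with `asyncio`. This is an illustration under stated assumptions, not the platform's orchestration API: `demo_specialist` is a hypothetical stand-in for a real specialist, and `run_lead` is an invented name. Only the cap of 20 comes from the article.

```python
import asyncio
from typing import Awaitable, Callable

MAX_PARALLEL = 20  # the cap stated in the article

Worker = Callable[[str, str], Awaitable[str]]


async def demo_specialist(name: str, subtask: str) -> str:
    """Hypothetical specialist; a real one would run its own model, prompt, and tools."""
    await asyncio.sleep(0)  # yield control, standing in for real work
    return f"{name} finished {subtask}"


async def run_lead(
    subtasks: dict[str, str],
    worker: Worker = demo_specialist,
    limit: int = MAX_PARALLEL,
) -> dict[str, str]:
    """Delegate each subtask to a specialist, never running more than `limit` at once."""
    gate = asyncio.Semaphore(limit)

    async def guarded(name: str, subtask: str) -> tuple[str, str]:
        async with gate:
            return name, await worker(name, subtask)

    pairs = await asyncio.gather(*(guarded(n, s) for n, s in subtasks.items()))
    return dict(pairs)
```

Calling `asyncio.run(run_lead({"logs": "parse build logs"}))` returns a mapping from specialist name to its result, which the lead can then merge into its own context.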
Netflix’s platform team demonstrates a practical pattern: their analysis agent processes logs from hundreds of builds across different sources. With changes affecting thousands of applications, what matters is patterns that recur across many builds, not individual failures. Multi-agent orchestration lets them analyze batches in parallel and surface only recurring patterns worth acting on.
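The idea of surfacing only what recurs across builds can be sketched in a few lines. The data shape here (build ID mapped to a list of failure signatures) and the `min_builds` threshold are assumptions for illustration, not details of Netflix's system.

```python
from collections import Counter


def recurring_patterns(
    build_failures: dict[str, list[str]], min_builds: int = 3
) -> list[tuple[str, int]]:
    """Return failure signatures seen in at least `min_builds` distinct builds."""
    counts: Counter[str] = Counter()
    for signatures in build_failures.values():
        counts.update(set(signatures))  # count each signature once per build
    # Most widespread first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(sig, n) for sig, n in ranked if n >= min_builds]
```

Counting per build rather than per occurrence is the design choice that matters: one noisy build that logs the same timeout fifty times should not outrank a failure that appears once in each of fifty builds.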
Rakuten deployed enterprise agents across product, sales, marketing, and finance that plug into Slack and Teams. Employees assign tasks and receive deliverables like spreadsheets, slides, and applications. Each specialist agent was deployed within a week using this orchestration approach.
Warning: This level of coordination requires careful architecture. The agent pipeline patterns that work for single agents do not translate directly. You need clear handoff protocols, shared state management, and monitoring for coordination failures.
The Memory Foundation Connecting Everything
Memory on Managed Agents entered public beta alongside these features. Your agents capture what they learn during sessions using a filesystem-based memory layer. Because memories are stored as files, you can export them, manage them via API, and maintain full control over what agents retain.
Rakuten’s task-based agents demonstrate the practical value: they learn from every session and avoid repeating past mistakes, cutting first-pass errors by 97%. The memories remain observable and workspace-scoped for enterprise compliance requirements.
The combination creates a coherent system: memory captures knowledge in real-time, dreaming refines it between sessions, outcomes ensure quality without manual review, and orchestration coordinates specialists when work exceeds single-agent capacity.
Implementation Considerations for AI Engineers
These features represent infrastructure, not magic. Before adopting them, consider your agent scaling challenges and whether organizational readiness matches technical capability.
When to use dreaming: Long-running projects where agents perform similar tasks repeatedly. The value compounds over weeks and months, not individual sessions.
When to use outcomes: Tasks with clear quality criteria that previously required human review. The separate grader architecture prevents the agent from gaming its own evaluation.
When to use multi-agent orchestration: Complex workflows requiring diverse specializations. For simpler tasks, single-agent approaches remain more predictable and easier to debug.
The API volume on Claude’s platform increased 17x year-over-year, and the SpaceX Colossus partnership doubled rate limits for Claude Code. This suggests Anthropic is betting heavily on agentic workloads becoming the dominant pattern for their infrastructure.
For engineers evaluating these capabilities, the question is not whether agentic AI works. It clearly does at enterprises like Harvey, Rakuten, and Netflix. The question is whether your organization has the production-grade evaluation processes to deploy agents responsibly.
Frequently Asked Questions
How do I access dreaming for Claude Managed Agents?
Dreaming remains in research preview. You can request access through Anthropic’s documentation portal. The feature requires an existing Managed Agents deployment with memory enabled.
Can I use outcomes with any Claude model?
Outcomes work with Managed Agents across model tiers. The grader runs in a separate context window, so model selection affects cost and capability but not the evaluation architecture.
What happens if multi-agent orchestration fails mid-workflow?
All agents maintain full memory of completed actions, and the platform provides webhook notifications for task completion. You can design recovery flows that resume from the last successful checkpoint.
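A recovery flow that resumes from the last successful checkpoint might look like the sketch below. It is a generic local pattern, not platform code: the JSON checkpoint file, the step names, and `run_with_checkpoints` are all assumptions made for illustration, and webhook handling is left out.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[], None]]], checkpoint_file: Path
) -> list[str]:
    """Run named steps in order, skipping any already recorded as complete."""
    done: list[str] = []
    if checkpoint_file.exists():
        done = json.loads(checkpoint_file.read_text(encoding="utf-8"))
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run
        action()  # if this raises, earlier progress is already on disk
        done.append(name)
        checkpoint_file.write_text(json.dumps(done), encoding="utf-8")
    return done
```

Rerunning the same call after a failure picks up at the first step that has no checkpoint entry. This only works cleanly if each step is safe to retry, so side effects such as sending a message belong in their own step.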
Recommended Reading
- Agentic AI Practical Guide for AI Engineers
- AI Agent Development Practical Guide for Engineers
- AI Agent Pipelines Structure, Pitfalls, and Best Practices