FIELD REPORT · AI

Choosing an Agentic Framework: LangGraph vs CrewAI vs AutoGen vs Custom

A practical comparison of LangGraph, CrewAI, AutoGen, the Claude Agent SDK, the Vercel AI SDK, and custom builds, with a decision matrix and a five-question selection checklist.

PUBLISHED
April 17, 2026
READ TIME
12 MIN
AUTHOR
ONE FREQUENCY

The agentic framework market is crowded and moving fast. A framework that was the obvious choice 12 months ago might now be a maintenance liability. A framework that is well-suited for prototyping might fall over the first time you need real observability or multi-tenant deployment.

This article compares the major frameworks across the dimensions that actually matter in production, walks through how the same task looks in three different frameworks, and ends with a five-question checklist for choosing.

The contenders

LangGraph. A graph-based orchestration layer from the LangChain team. You define agents and tools as nodes and the flow as edges. State is explicit. Strong observability via LangSmith. Mature for complex agent workflows.

CrewAI. A role-based framework. You define agents with roles, goals, and tools, and tasks that crews of agents execute. Higher level than LangGraph, less explicit about state machines.

AutoGen. Microsoft's framework for multi-agent conversation. Strong primitives for agent-to-agent chat. Now part of a larger Microsoft AI agent ecosystem (Semantic Kernel, Agent Framework). Production readiness improved significantly through 2025-2026.

Anthropic Claude Agent SDK. A focused SDK for building Claude-based agents with tools, MCP servers, sessions, and streaming. Less of a framework, more of an agent runtime tightly aligned with Claude capabilities.

Vercel AI SDK. A TypeScript-first SDK with first-class streaming, React hooks, and provider-agnostic model calls. Lighter than full frameworks. Strong for web app integration.

Vanilla / custom. Direct provider SDKs (Anthropic, OpenAI, Google) plus your own orchestration. More control, more code, no framework lock-in.
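
To make "custom" concrete, here is a minimal sketch of a hand-rolled tool loop on the Anthropic Python SDK. The tool schema, the search backend, and the model name are illustrative placeholders, not a production integration.

```python
# A minimal custom agent loop on the raw Anthropic SDK. Illustrative only:
# the tool schema and search backend are placeholders.
import anthropic

client = anthropic.Anthropic()

TOOLS = [{
    "name": "web_search",
    "description": "Search the web and return result snippets.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def my_search_backend(query: str) -> str:
    raise NotImplementedError  # wire in your search provider here

def run_tool(name: str, args: dict) -> str:
    if name == "web_search":
        return my_search_backend(args["query"])
    raise ValueError(f"unknown tool: {name}")

messages = [{"role": "user", "content": "Summarize recent MCP adoption."}]
while True:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=1024,
        tools=TOOLS,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # model produced its final answer
    # Echo the assistant turn, then answer each tool call it made.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": run_tool(block.name, block.input)}
        for block in response.content if block.type == "tool_use"
    ]})
```

Everything the frameworks below abstract away — retries, state, streaming, observability — is yours to build on top of this loop.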

Comparison matrix

| Dimension | LangGraph | CrewAI | AutoGen | Claude Agent SDK | Vercel AI SDK | Custom |
| --- | --- | --- | --- | --- | --- | --- |
| Primitives | Graphs of nodes and edges | Crews, agents, tasks | Multi-agent chat | Sessions, tools, MCP | Generate, stream, tools | Whatever you build |
| State management | Explicit state schema with reducers | Implicit per-crew | Conversation history | Session-managed | Stateless or app-managed | Custom |
| Tool calling | Provider-agnostic via LangChain | Provider-agnostic | Provider-agnostic | Claude tools + MCP native | Provider-agnostic | Direct provider |
| MCP support | Via integration package | Via integration package | Via integration package | Native | Plugin / external | Direct |
| Observability | LangSmith first-class | LangSmith / external | OpenTelemetry / external | OTel / external | OTel / external | Whatever you wire |
| Production readiness | Mature | Maturing | Mature | Mature | Mature for web | Depends |
| Learning curve | Medium to steep | Low to medium | Medium | Low | Low | Highest |
| Community size | Large | Growing | Large (Microsoft-backed) | Growing | Large (web) | N/A |
| Breaking changes per major | Frequent in LangChain ecosystem | Moderate | Moderate | Low so far | Moderate | None |
| Lock-in risk | Medium-high (LangChain ecosystem) | Medium | Medium | High (Claude-specific) | Low | None |

Same task, three frameworks

Consider a simple task: a research agent that takes a query, searches the web, reads top results, and produces a structured summary with citations.

LangGraph sketch

You define a graph with nodes for: parse query, web search, fetch and rank URLs, read pages, synthesize, format output. State is a typed dict carrying query, URLs, page contents, draft, citations. Edges include conditional branches (skip read if cache hit, retry on fetch failure). The framework manages state transitions and persists intermediate state for replay.
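
In code, the shape looks roughly like this. Node bodies are compressed to placeholders, and the names and cache check are illustrative, not from a real project:

```python
# LangGraph sketch: typed state, nodes as functions returning partial
# updates, a conditional edge for the cache-hit branch.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ResearchState(TypedDict):
    query: str
    urls: list[str]
    pages: list[str]
    draft: str
    citations: list[str]

def search(state: ResearchState) -> dict:
    return {"urls": ["https://example.com"]}   # placeholder search step

def read_pages(state: ResearchState) -> dict:
    return {"pages": ["page text"]}            # placeholder fetch step

def synthesize(state: ResearchState) -> dict:
    return {"draft": "summary", "citations": state["urls"]}

def cache_hit(urls: list[str]) -> bool:
    return False  # placeholder cache check

def cache_router(state: ResearchState) -> str:
    # Conditional edge: skip reading when pages are already cached.
    return "synthesize" if cache_hit(state["urls"]) else "read_pages"

graph = StateGraph(ResearchState)
graph.add_node("search", search)
graph.add_node("read_pages", read_pages)
graph.add_node("synthesize", synthesize)
graph.add_edge(START, "search")
graph.add_conditional_edges("search", cache_router)
graph.add_edge("read_pages", "synthesize")
graph.add_edge("synthesize", END)

app = graph.compile()
result = app.invoke({"query": "state of MCP adoption"})
```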

Strengths: explicit state, retryability, you can visualize the graph and trace it node by node. Strong fit when the workflow is mostly deterministic with model calls at well-defined steps.

Weaknesses: more boilerplate for simple tasks. The graph abstraction can feel heavy when your workflow is actually "loop until done."

CrewAI sketch

You define agents: a "researcher" with web search tools, a "writer" with summarization tools. You define tasks: "research this query" and "synthesize a summary with citations." A crew runs the tasks, optionally with the researcher's output feeding the writer.
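
A sketch of the same task in CrewAI's idiom. The tool wiring is elided and the agent descriptions are illustrative:

```python
# CrewAI sketch: role-based agents, tasks, and a crew that runs them in
# sequence. Tool objects are placeholders for real integrations.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find and rank the best sources for the query",
    backstory="A thorough, citation-obsessed web researcher.",
    tools=[],  # e.g. a web search tool and a URL fetch tool
)
writer = Agent(
    role="Writer",
    goal="Produce a structured summary with citations",
    backstory="A concise technical writer.",
)

research = Task(
    description="Research this query: {query}",
    expected_output="A ranked list of sources with key findings",
    agent=researcher,
)
summarize = Task(
    description="Synthesize a summary with citations from the research",
    expected_output="A structured summary with inline citations",
    agent=writer,
    context=[research],  # the researcher's output feeds the writer
)

crew = Crew(agents=[researcher, writer], tasks=[research, summarize])
result = crew.kickoff(inputs={"query": "state of MCP adoption"})
```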

Strengths: very readable. Easy to onboard non-experts. Role-based metaphor matches how teams think about delegation.

Weaknesses: state management is less explicit. Debugging hand-off failures between agents can be harder than in LangGraph. Less control over the exact prompt and tool flow.

Claude Agent SDK sketch

You define tools (web_search, fetch_url, summarize) and let the agent loop. The SDK handles the tool-use protocol, session persistence, streaming, and MCP integration. You write less code; the agent decides the order of operations.
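
A sketch using the SDK's query interface. The option fields and built-in tool names follow the Python SDK at the time of writing; verify them against the current docs before relying on them:

```python
# Claude Agent SDK sketch: define constraints, hand the loop to the agent.
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main() -> None:
    options = ClaudeAgentOptions(
        system_prompt="Research the query and produce a cited summary.",
        allowed_tools=["WebSearch", "WebFetch"],  # built-in tool names
        max_turns=10,  # cap the autonomous loop
    )
    # The SDK runs the tool-use loop; messages stream back as they arrive.
    async for message in query(
        prompt="Summarize the state of MCP adoption, with citations.",
        options=options,
    ):
        print(message)

anyio.run(main)
```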

Strengths: minimal boilerplate. Claude's tool use is strong, so the implicit planning works well. Streaming and session APIs are first-class.

Weaknesses: provider lock-in to Claude. Less explicit about graph structure if your workflow needs deterministic stages.

The right choice depends on how deterministic vs autonomous you want the agent to be. Graph-based frameworks favor determinism. Agent-loop frameworks favor autonomy.

Build custom vs adopt a framework

The honest answer: most teams should start with a framework and migrate to custom only when they hit specific limitations.

Reasons to build custom:

  • You have a unique state model that does not fit any framework's primitives
  • You need extreme cost or latency optimization that requires bespoke batching, caching, and routing
  • You are building the framework as a product (you ship the framework to customers)
  • You have a strong existing platform that any framework would fight against
  • You need to keep the dependency footprint minimal for compliance reasons

Reasons to adopt a framework:

  • Time to first working agent is days, not weeks
  • Observability hooks are pre-built
  • Community has solved the integration problems you would otherwise hit
  • Your team is not a research lab and does not need to invent state machines

The most common mistake: adopting a heavy framework for a simple use case, then carrying its complexity forever. The second most common mistake: building custom too early because the team likes the idea of "owning the stack," and ending up with half a framework that nobody can maintain.

Framework selection in 5 questions

When you sit down to choose, answer these in order.

  1. How autonomous is the agent? If it is highly autonomous (the model picks tools freely and loops until done), the Claude Agent SDK or a minimal custom wrapper around provider SDKs is often the best fit. If the workflow has clear deterministic stages, LangGraph or a state machine you build yourself fits better.

  2. How many providers do you need? If you are committed to Claude long-term, the Claude Agent SDK gives you the deepest integration. If you need to stay provider-agnostic from day one, the Vercel AI SDK, LangGraph, or a custom build with a thin abstraction is the better fit.

  3. What is your team's existing stack? If you are already on LangChain, LangGraph is a small step. If you are on Next.js and want streaming UI, Vercel AI SDK fits. If you are deep in Microsoft tooling, AutoGen and the Microsoft Agent Framework are natural.

  4. What is your observability story? LangSmith for the LangChain ecosystem. Helicone or Langfuse for provider-agnostic setups. Datadog if you are an enterprise already on Datadog. An OpenTelemetry-based stack is the most future-proof option. Pick observability before the framework; the framework choice should not force a separate observability silo.

  5. What is your lock-in tolerance? Heavier frameworks lock you into their abstractions. Migration costs are real. If you expect the platform to live for years and the agentic landscape to keep shifting, lean toward thinner abstractions and direct provider SDKs.

Decision matrix shortcut

| Situation | Likely best fit |
| --- | --- |
| Web app with streaming chat, mixed providers | Vercel AI SDK |
| Deep Claude integration, MCP-heavy | Claude Agent SDK |
| Complex multi-stage workflow with explicit state | LangGraph |
| Role-based multi-agent prototype, fast onboarding | CrewAI |
| Multi-agent conversation, Microsoft stack | AutoGen / Semantic Kernel |
| Existing platform with strong opinions, max control | Custom |
| Compliance-heavy enterprise with strict dependency review | Custom + thin provider SDK wrapper |

Migration risk and exit strategy

Pick frameworks with the assumption you will migrate off them within 2-3 years. The landscape is too volatile to commit forever.

Keep the surface area between framework code and your business logic thin. Define your own types for agent input/output. Define your own tool interface that wraps the framework's. When you migrate, you change one adapter, not your whole codebase.
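
One way to draw that seam, sketched in plain Python. All names here are illustrative:

```python
# Your own types and tool interface; each framework gets a thin adapter.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class AgentResult:
    text: str
    citations: list[str]

class Tool(Protocol):
    name: str
    description: str
    def run(self, **kwargs: object) -> str: ...

class AgentRunner(Protocol):
    """The only surface your business logic sees."""
    def run(self, query: str, tools: list[Tool]) -> AgentResult: ...

# Business logic depends on AgentRunner, never on a framework. Migrating
# from LangGraph to the Claude Agent SDK means writing one new adapter
# that satisfies this protocol, not rewriting the callers.
def research(runner: AgentRunner, query: str, tools: list[Tool]) -> AgentResult:
    return runner.run(query, tools)
```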

This is the same discipline that helps with the broader engineering challenges of agent platforms: thin layers, replaceable components, and observability metrics that work across frameworks because they live above the framework, not inside it.

Selection checklist

  • [ ] Documented the agent's autonomy level (deterministic stages vs free-form loop)
  • [ ] Listed all model providers needed now and likely within 12 months
  • [ ] Reviewed existing team stack and observability commitments
  • [ ] Estimated lock-in cost of each candidate framework
  • [ ] Prototyped the same task in two candidates before committing
  • [ ] Verified production readiness: streaming, retries, observability hooks, deployment patterns
  • [ ] Checked the last 12 months of breaking changes in the framework's release notes
  • [ ] Identified the exit path: how would we migrate if we had to?
  • [ ] Confirmed framework license fits your distribution model
  • [ ] Wrote down the selection rationale so future engineers know why

Production readiness checks per framework

Each framework has a different definition of "production ready." Some details that distinguish prototype-grade from production-grade:

  • Streaming support: can the framework stream tokens, tool calls, and intermediate state to the client? If your UX depends on it, this is non-negotiable.
  • Retries and idempotency: how does the framework handle transient failures? Does it expose enough state to make retries safe?
  • Concurrency model: can you run many concurrent agent invocations safely within one process? In serverless? Across nodes?
  • Persistence: can you serialize the agent's state to a database between turns? Restart and resume? This matters for long-running workflows; a minimal sketch follows below.
  • Authentication and multi-tenancy: can the framework cleanly separate per-tenant credentials, rate limits, and data?
  • Deployment patterns: is there a documented path to deploying on your infrastructure (Vercel, AWS, Cloud Run, Kubernetes)? Or are you on your own?

Prototype demos rarely exercise these. Production traffic does, immediately.
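
The persistence point deserves a concrete shape. Even when a framework offers nothing, serializing session state between turns is straightforward. A minimal, framework-agnostic sketch, where the schema and the key-value store are stand-ins for your own:

```python
# Framework-agnostic session persistence: serialize state between turns
# so long-running workflows can restart and resume. The schema and store
# are stand-ins for your own.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AgentSession:
    session_id: str
    messages: list = field(default_factory=list)
    pending_tool_calls: list = field(default_factory=list)

def save(session: AgentSession, store: dict) -> None:
    store[session.session_id] = json.dumps(asdict(session))

def load(session_id: str, store: dict) -> AgentSession:
    return AgentSession(**json.loads(store[session_id]))

# Usage: persist after every turn, resume after a restart.
store: dict = {}
save(AgentSession("tenant-42", messages=[{"role": "user", "content": "hi"}]), store)
resumed = load("tenant-42", store)
```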

The cost of switching frameworks

Migrating between frameworks is rarely a clean port. Specific costs to budget for:

  • Rewriting tool definitions in the new framework's idioms
  • Reworking state management and persistence
  • Replacing observability integrations
  • Retraining the team
  • Re-running evals to confirm the new implementation matches the old
  • Carrying both implementations in parallel during transition

A realistic migration for a mature agent platform is 3-6 months of dedicated engineering. That cost is the strongest argument for thin abstractions: when the day comes that you need to switch, the migration should be measured in weeks, not quarters.

Framework risk red flags

A short list of signals that a framework may not be a safe long-term bet:

  • Breaking changes in every minor release with thin migration guides
  • Maintainer attention shifting elsewhere (new project, new company)
  • Documentation that lags many releases behind the code
  • Eval and observability stories that are bolted on, not first-class
  • A community that is mostly demos, not production case studies
  • Pricing or license terms that change in ways unfavorable to commercial users

Even mature frameworks can hit these. Re-evaluate your choice every 12 months. Treat the framework decision as a renewable commitment, not a permanent one.

Hybrid approaches

You do not have to pick one framework for everything. Many production teams run a hybrid:

  • LangGraph or a custom state machine for the orchestration layer
  • Claude Agent SDK or Vercel AI SDK for individual model interactions and streaming
  • MCP servers for tool integrations, framework-agnostic
  • Direct provider SDKs for cost-sensitive batch workloads

The hybrid works because each layer is replaceable. The framework you use for orchestration is decoupled from the framework you use for streaming and the protocol you use for tools.
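
To illustrate the framework-agnostic tool layer, here is a minimal MCP server using FastMCP from the official `mcp` Python SDK. The search backend is a placeholder:

```python
# Minimal MCP server via FastMCP (official `mcp` Python SDK). Any
# MCP-capable framework or client can call this tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("research-tools")

@mcp.tool()
def web_search(query: str) -> str:
    """Search the web and return result snippets."""
    raise NotImplementedError  # wire in your search provider here

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

Because the tool lives behind the protocol rather than inside any framework, swapping the orchestration layer does not touch it.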

Team and hiring considerations

Framework choice has a hiring dimension. If your framework has a small community, you will struggle to hire engineers familiar with it. If it has a huge community but quality is uneven, you will get inconsistent contributions. LangChain and Vercel AI SDK have the largest communities today. Claude Agent SDK and AutoGen are growing fast. CrewAI sits in the middle.

For internal team development, pick a framework with documentation good enough that a new hire can ship something useful in week one. If the framework requires weeks of ramp-up before productivity, it is a tax on every future hire.

Next steps

If your team is about to commit to a framework that will shape your agent platform for the next few years, this is the right week to slow down and prototype two candidates side by side. We help teams make framework decisions with a clear-eyed view of cost, lock-in, and migration risk. Reach out if you want a second opinion before you commit.
