Subagents: Native Delegation Support¶
Status: Production Ready (v0.x.x+)
Subagents enable true multi-agent coordination where a coordinator agent delegates tasks to specialist agents while preserving full metadata (tokens, duration, delegation chain) for accurate cost tracking and observability.
Overview¶
The Solution¶
The native subagents= parameter preserves Response metadata through executor interception:
# ✅ New approach - preserves metadata
specialist = Agent(name="specialist", model="gpt-5.4")
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents={"specialist": specialist}, # Full Response[T] preserved
)
response = await coordinator.run("Analyze this data")
# Token count includes coordinator + specialist ✅
print(f"Total tokens: {response.metadata.tokens.total_tokens}")
print(f"Subagent calls: {len(response.subagent_responses or [])}")  # field is None when nothing was delegated
How It Works¶
- LLM Perspective: Subagents appear as regular tools with a task parameter
- Executor Interception: Executor detects subagent tools and routes to SubagentRegistry
- Metadata Preservation: Full Response[T] objects are cached, not just strings
- Automatic Aggregation: Tokens, duration, and delegation chain are aggregated automatically
Key Principle: Zero LLM behavior changes - uses existing tool calling mechanism.
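The interception step above can be pictured with a small, self-contained sketch. It is not the library's executor: FakeResponse, FakeSubagent, and SketchExecutor are hypothetical stand-ins, and the token count is a simple word count. It only illustrates the idea that the LLM gets text back while the full response object is kept for later aggregation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for Response[T]: content plus metadata worth keeping."""
    content: str
    agent: str
    total_tokens: int


@dataclass
class FakeSubagent:
    """Hypothetical specialist; a real agent would call an LLM here."""
    name: str

    async def run(self, task: str) -> FakeResponse:
        return FakeResponse(
            content=f"{self.name} handled: {task}",
            agent=self.name,
            total_tokens=len(task.split()),  # word count as a fake token count
        )


@dataclass
class SketchExecutor:
    """Routes tool calls: subagent names go to the registry, the rest to plain tools."""
    subagents: dict[str, FakeSubagent]
    tools: dict[str, Callable[..., Any]]
    cached_responses: list[FakeResponse] = field(default_factory=list)

    async def call(self, tool_name: str, args: dict[str, Any]) -> str:
        if tool_name in self.subagents:
            # Interception: keep the full response object, return only text to the LLM.
            response = await self.subagents[tool_name].run(args["task"])
            self.cached_responses.append(response)
            return response.content
        return str(self.tools[tool_name](**args))


executor = SketchExecutor(
    subagents={"specialist": FakeSubagent("specialist")},
    tools={"add": lambda a, b: a + b},
)
tool_text = asyncio.run(executor.call("specialist", {"task": "Analyze this data"}))
plain_text = asyncio.run(executor.call("add", {"a": 2, "b": 3}))
```

After both calls, executor.cached_responses holds one FakeResponse for the delegated call, while the regular tool call leaves no cached response behind.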
Quick Start¶
Basic Example¶
from cogent import Agent
# Create specialist agents
data_analyst = Agent(
name="data_analyst",
model="gpt-5.4-mini",
instructions="Analyze data and provide statistical insights.",
)
market_researcher = Agent(
name="market_researcher",
model="gpt-5.4-mini",
instructions="Research market trends and competitive landscape.",
)
# Create coordinator with subagents
coordinator = Agent(
name="coordinator",
model="gpt-5.4-mini",
instructions="""Coordinate research tasks:
- Use data_analyst for numerical analysis
- Use market_researcher for market trends
Synthesize their findings.""",
# Simply pass the agents - their names become tool names
subagents=[data_analyst, market_researcher],
)
# Run task - coordinator will delegate automatically
response = await coordinator.run(
"Analyze Q4 2025 e-commerce growth: 18% YoY to $1.2T globally, "
"mobile is 65% of total. What are the key insights?"
)
print(response.content)
print(f"Total tokens: {response.metadata.tokens.total_tokens}")
Note: You can also use a dict to override tool names:
# Dict form: explicit tool names (optional)
subagents={
"data_analyst": data_analyst, # Tool name = "data_analyst"
"market_researcher": market_researcher, # Tool name = "market_researcher"
}
# List form: uses agent.name (simpler!)
subagents=[data_analyst, market_researcher] # Uses agent names automatically
Structured Output from Subagents¶
Use returns= on a subagent to declare a structured output schema. The coordinator's LLM receives clean JSON it can reason over directly, instead of a plain string.
from pydantic import BaseModel
from typing import Literal
class ReviewScore(BaseModel):
    score: int
    verdict: Literal["approved", "needs_revision"]
    feedback: str
reviewer = Agent(
name="reviewer",
model="gpt-5.4-mini",
description="Review copy for quality and compliance",
returns=ReviewScore,
instructions="Review content. Score 1-10. Be concise.",
)
editor = Agent(
name="editor",
model="gpt-5.4-mini",
subagents=[reviewer],
instructions="Have the reviewer check the copy.",
)
# Coordinator LLM sees: {"score": 8, "verdict": "approved", "feedback": "..."}
result = await editor.run("Review this tweet")
See Structured Output — Subagent Structured Output for the full details — how the pipeline works, remote A2A round-trips, and when to use returns=.
Accessing Metadata¶
# Token aggregation (coordinator + all subagents)
tokens = response.metadata.tokens
print(f"Total tokens: {tokens.total_tokens}")
print(f" Prompt: {tokens.prompt_tokens}")
print(f" Completion: {tokens.completion_tokens}")
if tokens.reasoning_tokens:
    print(f" Reasoning: {tokens.reasoning_tokens}")
# Individual subagent responses (the field is None when nothing was delegated)
for sub_resp in response.subagent_responses or []:
    print(f"{sub_resp.metadata.agent}: {sub_resp.metadata.tokens.total_tokens} tokens")
    if sub_resp.metadata.tokens.reasoning_tokens:
        print(f" └─ Reasoning: {sub_resp.metadata.tokens.reasoning_tokens}")
# Delegation chain (also None when nothing was delegated)
for delegation in response.metadata.delegation_chain or []:
    print(f"{delegation['agent']} - {delegation['tokens']} tokens - {delegation['duration']:.2f}s")
API Reference¶
Agent Constructor¶
Agent(
name: str,
model: str | BaseChatModel,
subagents: dict[str, Agent] | Sequence[Agent] | None = None,
**kwargs
)
Parameters:
- subagents: Local agents or remote A2A adapters for delegation
  - dict form: Explicit tool names {"tool_name": agent}
  - list/tuple form: Uses agent.config.name as tool name [agent1, remote_agent]
Examples:
# Local agents only
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents=[analyst_agent, researcher_agent],
)
# Mix local and remote agents
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents=[
local_analyst,
Agent(
url="http://remote-svc/a2a",
name="remote_writer",
description="Remote writing specialist",
),
],
)
# Dict form - override tool names if needed
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents={
"custom_analyst_name": analyst_agent,
"custom_researcher_name": researcher_agent,
},
)
Remote Agents¶
Agent(
url: str,
name: str,
description: str = "",
timeout: float = 60.0,
headers: dict[str, str] | None = None,
returns: type | dict | None = None,
)
Creates a remote A2A agent proxy that can be passed to subagents= alongside local agents.
returns= accepts the same schemas as a local Agent (Pydantic model, dataclass, TypedDict, JSON Schema dict). When provided, the text response from the remote agent is parsed and validated before being returned to the caller. The per-call returns= kwarg on .run() overrides the instance-level value.
A2AServer¶
A2AServer(
agent: Agent,
*,
host: str = "0.0.0.0",
port: int = 10000,
url: str | None = None, # public base URL for AgentCard; defaults to http://{host}:{port}
version: str = "1.0",
skills: list[AgentSkill] | None = None,
task_store: str | TaskStore | None = None, # persistence backend
streaming: bool = True, # emit incremental SSE token events
push_notifications: bool = False, # enable webhook push-notification callbacks
security_schemes: dict[str, SecurityScheme] | None = None, # AgentCard auth schemes
security: list[dict[str, list[str]]] | None = None, # AgentCard security requirements
)
| Member | Description |
|---|---|
| .app | FastAPI ASGI application (cached, safe to mount) |
| .agent_card() | Build and return the a2a.types.AgentCard |
| await .start() | Start in background (default); returns self when port is bound. Pass background=False to block until stopped |
| await .stop() | Stop a background server |
| .run() | asyncio.run(self.start(background=False)); for scripts |
| await A2AServer.start_many(...) | Start multiple servers concurrently; returns a ServerGroup |
| async with A2AServer(...) as srv: | Start in background, stop on exit (context-manager style) |
task_store options:
| Value | Behaviour |
|---|---|
| None (default) | InMemoryTaskStore; tasks lost on restart |
| "sqlite+aiosqlite:///tasks.db" | SQLite via aiosqlite; schema auto-created |
| "postgresql+asyncpg://user:pass@host/db" | PostgreSQL via asyncpg |
| TaskStore instance | Bring your own implementation |
Requires aiosqlite or asyncpg for database URLs.
push_notifications — webhook callbacks:
When push_notifications=True, the server enables the A2A push-notification protocol.
Clients may register a callback URL via tasks/pushNotification/set; the server will
POST task-status updates to that URL as the task progresses.
security_schemes and security — AgentCard authentication:
Declare authentication requirements in the AgentCard following OpenAPI 3.0 conventions.
Clients inspect the card to discover how to authenticate before sending tasks.
from a2a.types import APIKeySecurityScheme, SecurityScheme
server = A2AServer(
agent,
port=10002,
security_schemes={
"api-key": SecurityScheme(root=APIKeySecurityScheme(name="X-API-Key", in_="header"))
},
security=[{"api-key": []}],
)
On the client side, Agent(url=...) already supports headers= for passing auth tokens (see Remote Agents above).
serve_agent¶
async def serve_agent(
agent: Agent,
*,
host: str = "0.0.0.0",
port: int = 10000,
url: str | None = None,
version: str = "1.0",
task_store: str | TaskStore | None = None,
streaming: bool = True,
push_notifications: bool = False,
security_schemes: dict[str, SecurityScheme] | None = None,
security: list[dict[str, list[str]]] | None = None,
) -> None
Async convenience wrapper around A2AServer.start().
Agent.serve¶
agent.serve(
*,
host: str = "0.0.0.0",
port: int = 10000,
url: str | None = None,
version: str = "1.0",
) -> None
Blocking one-liner for scripts. Internally it calls A2AServer(self, ...).run().
Response Metadata¶
@dataclass
class Response[T]:
    content: T
    metadata: ResponseMetadata
    subagent_responses: list[Response] | None  # NEW: Responses from delegated subagents
    # ... other fields

@dataclass
class ResponseMetadata:
    agent: str
    model: str
    tokens: TokenUsage
    duration: float
    delegation_chain: list[dict] | None  # NEW: Chain of delegations
    # ... other fields
Delegation Chain Structure:
{
"agent": "analyst", # Subagent name
"model": "gpt-5.4-mini", # Model used
"tokens": 150, # Total tokens
"duration": 2.5, # Seconds
}
Serving an Agent over A2A¶
To expose any Agent as a remote A2A endpoint, use one of three entry points depending
on how much control you need.
One-liner (scripts and demos):
agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
agent.serve(port=10002) # blocks until Ctrl+C
Async entrypoint (integrates into an existing async main):
from cogent.agent.a2a_server import serve_agent
async def main():
    agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
    await serve_agent(agent, port=10002)  # blocks until Ctrl+C
Mount into an existing FastAPI app (low-level):
from fastapi import FastAPI
from cogent.agent.a2a_server import A2AServer
import uvicorn
agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
server = A2AServer(agent, port=10002, url="http://localhost:10002/a2a")
app = FastAPI()
@app.get("/health")
def health():
    return {"status": "ok"}
app.mount("/a2a", server.app)
uvicorn.run(app, host="0.0.0.0", port=10002)
Once running, call it from anywhere with Agent(url=...):
from cogent import Agent
remote = Agent(url="http://localhost:10002", name="analyst")
coordinator = Agent(name="coordinator", model="gpt-5.4", subagents=[remote])
Best Practices¶
1. Clear Agent Responsibilities¶
# ✅ GOOD: Specific, non-overlapping responsibilities
data_cleaner = Agent(
name="data_cleaner",
instructions="Clean and normalize messy data. Fix formatting, handle nulls.",
)
data_validator = Agent(
name="data_validator",
instructions="Validate data quality. Check for errors, inconsistencies.",
)
# ❌ BAD: Overlapping, vague responsibilities
helper1 = Agent(name="helper1", instructions="Help with data stuff")
helper2 = Agent(name="helper2", instructions="Also help with data")
2. Descriptive Naming¶
# ✅ GOOD: Names that indicate purpose
subagents={
"sql_generator": sql_agent,
"data_visualizer": viz_agent,
"report_writer": report_agent,
}
# ❌ BAD: Generic names
subagents={
"agent1": sql_agent,
"helper": viz_agent,
"assistant": report_agent,
}
3. Coordinator Instructions¶
# ✅ GOOD: Explicit delegation guidelines
coordinator = Agent(
instructions="""You coordinate ETL tasks:
- Use data_analyst to understand CSV structure and issues
- Use data_cleaner to design transformation rules
- Use sql_generator to create database schema
Synthesize their work into a complete ETL plan.""",
subagents={...},
)
# ❌ BAD: Vague instructions
coordinator = Agent(
instructions="You have some helpers. Use them if you want.",
subagents={...},
)
4. Observability¶
Attach an Observer to see delegation flow:
from cogent import Agent
from cogent.observability import Observer
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents={...},
observer=Observer(),
)
The observer emits subagent-specific events for both streaming and non-streaming execution:
| Console label | Event | When |
|---|---|---|
| [subagent-decision] | llm.tool_decision | LLM chose to delegate |
| [subagent-call] | subagent.called | Delegation started |
| [subagent-result] | subagent.result | Delegation completed |
| [subagent-context] | subagent.context | Conversation summary injected |
| [subagent-error] | subagent.error | Delegation failed |
5. Context Propagation¶
Context automatically propagates through delegation:
from cogent import RunContext
ctx = RunContext(
thread_id="session-123",
user_id="user-456",
metadata={"department": "analytics"},
)
response = await coordinator.run("Analyze data", context=ctx)
# All subagents receive the same context automatically
Advanced Patterns¶
Nested Subagents¶
Subagents can have their own subagents:
# Specialist with sub-specialists
data_analyst = Agent(
name="data_analyst",
model="gpt-5.4",
subagents={
"statistician": statistician_agent,
"visualizer": viz_agent,
},
)
# Top-level coordinator
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents={
"data_analyst": data_analyst, # Has its own subagents
"report_writer": writer_agent,
},
)
Remote Agents via A2A¶
Agent(url=...) wraps any Agent2Agent (A2A) protocol endpoint so it
participates in delegation exactly like a local agent. This requires the a2a extra.
from cogent import Agent
remote_analyst = Agent(
url="http://analyst-service/a2a",
name="analyst",
description="Remote financial analyst running on a separate service",
)
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
subagents=[
local_writer, # in-process Agent
remote_analyst, # remote A2A server
],
)
response = await coordinator.run("Analyse Q4 results and write a report")
The LLM sees both as regular tools — no special instructions needed. RunContext is not
forwarded over the wire; each remote call carries the task text and any structured data
or file references attached via Message.
Structured Messages¶
Agent.run() accepts a plain string or structured data via data= and
files= keyword arguments. Both local and remote subagents receive the full
message.
from cogent import Agent, FilePart
# Plain string — backward compatible
await agent.run("Summarise Q4 results")
# Text + structured data
await agent.run("Analyse trends", data={"region": "EMEA", "year": 2026})
# Text + file reference
await agent.run(
"Review this document",
files=[FilePart(uri="s3://bucket/report.pdf", mime_type="application/pdf")],
)
# Text + data + files together
await agent.run("Process report", data={"format": "summary"}, files=[...])
Internally these are wrapped into a Message envelope — the transport type that
flows through the executor, registry, and A2A wire. You can also construct a
Message directly for advanced multi-part payloads:
from cogent import Message, TextPart, DataPart, FilePart
await agent.run(Message(parts=[
TextPart("Analyse this document"),
DataPart({"format": "summary", "max_length": 500}),
FilePart(uri="s3://bucket/report.pdf", mime_type="application/pdf"),
]))
When delegation happens via subagent tools the LLM can pass an optional data dict
alongside the task string. The executor wraps both into a Message before
calling the subagent:
# LLM tool call (automatic, no user code needed):
# {"task": "Analyse Q4 trends", "data": {"region": "EMEA"}}
# → SubagentRegistry receives Message("Analyse Q4 trends", data={"region": "EMEA"})
For remote agents the Message parts map 1:1 to A2A protocol parts
(TextPart, DataPart, FilePart) so structured data travels over the wire
without extra serialisation.
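The wrapping step can be sketched with plain dataclasses. The Sketch* types below are local stand-ins defined for this example, not the library's Message, TextPart, or DataPart, and wrap_tool_call is a hypothetical name for what the executor does with a subagent tool call.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchTextPart:
    text: str


@dataclass
class SketchDataPart:
    data: dict[str, Any]


@dataclass
class SketchMessage:
    parts: list[Any] = field(default_factory=list)


def wrap_tool_call(args: dict[str, Any]) -> SketchMessage:
    """Turn subagent tool-call arguments into a multi-part message."""
    parts: list[Any] = [SketchTextPart(args["task"])]
    if args.get("data"):
        # The optional data dict becomes its own part next to the task text.
        parts.append(SketchDataPart(args["data"]))
    return SketchMessage(parts=parts)


message = wrap_tool_call({"task": "Analyse Q4 trends", "data": {"region": "EMEA"}})
text_only = wrap_tool_call({"task": "Summarise Q4 results"})
```

Keeping each payload kind in its own part is what makes a 1:1 mapping to wire-level parts possible without re-encoding the data into the task string.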
Input Contracts¶
The accepts parameter declares what structured input a subagent expects — symmetric
to returns which declares the output. When set, two things happen automatically:
- LLM guidance — the subagent tool's data field description includes the full JSON Schema so the coordinating LLM knows exactly what to send.
- Validation — when the agent receives a Message with data, the payload is validated against the schema before execution begins. Invalid data raises ValueError immediately instead of producing a confusing LLM error.
from pydantic import BaseModel
from cogent import Agent
class SalesReport(BaseModel):
    quarter: str
    revenue: float
    currency: str = "USD"
analyst = Agent(
name="analyst",
model="gpt-4o",
instructions="You are a financial analyst.",
accepts=SalesReport, # declares expected input
returns=AnalysisSummary, # declares expected output (AnalysisSummary is a schema defined elsewhere)
)
# Direct call — data validated against SalesReport before the agent runs
await analyst.run("Analyse trends", data={"quarter": "Q1", "revenue": 120.5})
# As a subagent — the LLM sees the SalesReport schema in the tool definition
coordinator = Agent(
name="coordinator",
model="gpt-4o",
subagents=[analyst],
)
Both type (Pydantic model, dataclass, TypedDict) and dict (raw JSON Schema)
are supported. Runtime validation uses pydantic.TypeAdapter for type-based schemas;
raw dict schemas guide the LLM via the tool description but skip runtime validation.
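The split between type-based and dict-based contracts can be illustrated without any dependencies. This sketch swaps in a standard-library dataclass check for pydantic.TypeAdapter, so it verifies field names only, not field types. SalesReportSketch and check_input are hypothetical names.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


@dataclass
class SalesReportSketch:
    quarter: str
    revenue: float
    currency: str = "USD"


def check_input(accepts: Any, data: dict[str, Any]) -> Any:
    """Validate data against a type-based contract; pass dict schemas through."""
    if isinstance(accepts, dict):
        # Raw JSON Schema: LLM guidance only, no runtime check.
        return data
    known = {f.name for f in fields(accepts)}
    required = {
        f.name
        for f in fields(accepts)
        if f.default is MISSING and f.default_factory is MISSING
    }
    missing = required - set(data)
    unknown = set(data) - known
    if missing or unknown:
        raise ValueError(
            f"invalid input: missing={sorted(missing)} unknown={sorted(unknown)}"
        )
    return accepts(**data)


report = check_input(SalesReportSketch, {"quarter": "Q1", "revenue": 120.5})
try:
    check_input(SalesReportSketch, {"quarter": "Q1"})  # revenue is missing
    rejected = False
except ValueError:
    rejected = True
```

Failing before the agent runs keeps the error close to its cause: the caller sees which fields were wrong rather than a confused model response.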
Selective Context Forwarding¶
When a coordinator delegates to a subagent, context flows automatically:
- Correlation IDs — every delegation generates a fresh run_id for the child and sets parent_run_id to the caller's run_id. This applies to both local and remote agents, enabling distributed trace stitching.
- Metadata — the metadata dict is shared by reference between parent and child. Mutations in the subagent are visible to the coordinator after the call completes.
- Remote agents — correlation metadata (run_id, parent_run_id) is sent in the A2A message metadata field so remote services can link back to the caller.
Interceptors and Delegation¶
The PRE_ACT / POST_ACT interceptor phases fire for subagent calls just like regular
tool calls. The InterceptContext.is_subagent flag lets you distinguish them:
from cogent.interceptors.base import Interceptor, InterceptResult, InterceptContext
class DelegationLogger(Interceptor):
    async def intercept(self, ctx: InterceptContext) -> InterceptResult:
        if ctx.is_subagent and ctx.phase.value == "pre_act":
            print(f"Delegating to {ctx.tool_name}: {ctx.tool_args}")
        return InterceptResult.ok()

class ContextEnricher(Interceptor):
    """Attach domain context to subagent calls via modified tool args."""
    async def intercept(self, ctx: InterceptContext) -> InterceptResult:
        if ctx.is_subagent and ctx.phase.value == "pre_act":
            args = dict(ctx.tool_args or {})
            # Inject extra data the LLM didn't provide
            args["data"] = '{"tenant_id": "acme-corp"}'
            return InterceptResult.modify_args(args)
        return InterceptResult.ok()
Interceptors can also block delegation entirely:
class DelegationGate(Interceptor):
    async def intercept(self, ctx: InterceptContext) -> InterceptResult:
        if ctx.is_subagent and ctx.tool_name == "expensive_agent":
            return InterceptResult.stop("Delegation blocked by policy")
        return InterceptResult.ok()
Agent Discovery¶
The AgentDirectory lets coordinators discover subagents at runtime by skill
tags instead of hard-coding agent lists.
Tags¶
Agents declare tags that describe their capabilities:
from cogent import Agent
analyst = Agent(
name="analyst",
model="gpt-4o",
tags=["data", "finance", "analytics"],
description="Analyses financial data and trends",
)
Tags are also embedded in A2A Agent Cards when serving agents over HTTP.
AgentDirectory¶
Register agents in a directory, then query by tag, name, or description:
from cogent import Agent, AgentDirectory
directory = AgentDirectory()
directory.register(
Agent(name="analyst", model="gpt-4o", tags=["data", "finance"]),
Agent(name="writer", model="gpt-4o", tags=["content", "writing"]),
Agent(name="data_engineer", model="gpt-4o", tags=["data", "engineering"]),
)
# Find by tag (OR within tags)
data_agents = directory.find(tags=["data"]) # → [analyst, data_engineer]
# Combine criteria (AND across, OR within tags)
directory.find(tags=["data"], name="engineer") # → [data_engineer]
# Search by description keyword
directory.find(description="financial") # → [analyst]
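The matching rules in the comments above (OR within tags, AND across criteria) can be written out as a small standalone function. Entry and find here are hypothetical stand-ins, not AgentDirectory itself. Case-insensitive substring matching for name and description is an assumption inferred from the examples.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    tags: list[str] = field(default_factory=list)
    description: str = ""


def find(entries, tags=None, name=None, description=None):
    """OR within tags, AND across criteria; name/description match as substrings."""
    results = []
    for entry in entries:
        if tags and not set(tags) & set(entry.tags):
            continue  # none of the requested tags is present
        if name and name.lower() not in entry.name.lower():
            continue
        if description and description.lower() not in entry.description.lower():
            continue
        results.append(entry)
    return results


entries = [
    Entry("analyst", ["data", "finance"], "Analyses financial data and trends"),
    Entry("writer", ["content", "writing"]),
    Entry("data_engineer", ["data", "engineering"]),
]
by_tag = [e.name for e in find(entries, tags=["data"])]
combined = [e.name for e in find(entries, tags=["data"], name="engineer")]
by_description = [e.name for e in find(entries, description="financial")]
```

With these entries the three queries select the same agents as the directory examples above: both data agents, then only data_engineer, then only analyst.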
Dynamic Team Composition¶
Pass find() results directly to subagents=:
# Coordinator assembles its team from the directory at runtime
coordinator = Agent(
name="coordinator",
model="gpt-4o",
subagents=directory.find(tags=["data"]),
)
response = await coordinator.run("Analyse Q4 revenue trends")
This enables scenarios where different coordinators discover different specialist teams based on the task requirements, without coupling to specific agent instances.
Serving a Cogent Agent via A2A¶
Any cogent agent can be exposed as an A2A HTTP endpoint so that external A2A clients
(including other Agent(url=...) instances) can call it. Requires the same a2a extra.
High-level — recommended for scripts and __main__ blocks:
agent = Agent(
name="analyst",
model="gpt-5.4",
instructions="You are a financial analyst.",
description="Financial data analyst",
)
agent.serve(port=10002) # blocks until Ctrl+C
Mid-level — for use inside an existing async entrypoint:
import asyncio
from cogent.agent.a2a_server import serve_agent
async def main():
    await serve_agent(agent, port=10002)
asyncio.run(main())
Low-level — mount into an existing FastAPI application:
from fastapi import FastAPI
from cogent.agent.a2a_server import A2AServer
import uvicorn
app = FastAPI()
server = A2AServer(agent, port=10002, url="http://myhost:10002/a2a")
app.mount("/a2a", server.app) # A2AServer.app is a FastAPI sub-application
uvicorn.run(app, port=10002)
Once served, any A2A client — another cogent system, a different framework, or a raw
HTTP client — can call the agent at http://host:port/ and discover its capabilities
at http://host:port/.well-known/agent.json.
In-process — self-contained scripts and tests:
start() launches the server as a background task and returns as soon as the port is
bound. Call stop() when you are done. async with is also supported and stops the
server automatically on block exit.
from cogent.agent.a2a_server import A2AServer
# Explicit start / stop
analyst_server = await A2AServer(agent, port=10099).start()
remote = Agent(url="http://localhost:10099", name="analyst")
response = await remote.run("What is 18% of 250?")
print(response.content)
await analyst_server.stop()
Multiple servers — flat, no nesting:
from cogent.agent.a2a_server import A2AServer
group = await A2AServer.start_many(
(analyst, 10001),
(researcher, 10002),
(writer, 10003),
)
response = await coordinator.run("...")
await group.stop_all()
Context-manager style — automatic cleanup:
from cogent.agent.a2a_server import A2AServer
async with A2AServer(agent, port=10099) as server:
    remote = Agent(url="http://localhost:10099", name="analyst")
    response = await remote.run("What is 18% of 250?")
    print(response.content)
# Server stopped automatically on exit.
Conditional Delegation¶
The LLM decides when to delegate:
coordinator = Agent(
instructions="""Analyze requests:
- For simple questions, answer directly
- For complex analysis, delegate to data_analyst
- For market research, delegate to market_researcher
Use your judgment on which tasks need specialist help.""",
subagents={
"data_analyst": analyst,
"market_researcher": researcher,
},
)
# LLM may or may not delegate based on complexity
response1 = await coordinator.run("What is 2+2?") # Answers directly
response2 = await coordinator.run("Analyze Q4 sales trends") # Delegates to analyst
Multi-Agent Streaming¶
When a coordinator runs with stream=True, subagent responses are streamed
through the coordinator so consumers see tokens in real-time — no silence during
delegation.
Each StreamChunk carries an agent field identifying which agent produced
the token, and a type field for lifecycle events:
coordinator = Agent(
name="coordinator",
model="gpt-4o",
subagents=[analyst],
)
async for chunk in coordinator.run("Analyse Q4 data", stream=True):
    if chunk.type == "subagent_start":
        print(f"\n[{chunk.agent} started]")
    elif chunk.type == "subagent_end":
        print(f"\n[{chunk.agent} finished]")
    elif chunk.type == "content":
        # chunk.agent tells you who's speaking
        print(chunk.content, end="", flush=True)
Chunk types:
| type | Meaning |
|---|---|
| "content" | Regular content token (default) |
| "subagent_start" | Subagent delegation began |
| "subagent_end" | Subagent delegation finished |
| "tool_result" | Tool execution result |
How it works:
- Coordinator streams its own tokens with agent="coordinator"
- When the LLM decides to delegate, a subagent_start chunk is yielded
- If the subagent supports streaming, its tokens flow through with agent="analyst" (or whatever the subagent name is)
- A subagent_end chunk signals completion
- The coordinator continues with its final synthesis
This lets UIs show multiple agents working in real-time, attribute tokens to sources, and render side-by-side panels for parallel agents.
Observability: The streaming path emits the same subagent.called,
subagent.result, and subagent.context events as the non-streaming executor,
so the Observer console output is consistent regardless of execution mode.
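To show how a consumer might attribute tokens to their sources, here is a self-contained sketch that replays the chunk order described above through a fake stream. SketchChunk, fake_stream, and collect_by_agent are hypothetical. A real consumer would iterate over coordinator.run(..., stream=True) instead.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SketchChunk:
    agent: str
    type: str = "content"
    content: str = ""


async def fake_stream():
    """Replays the documented chunk order for a single delegation."""
    yield SketchChunk("coordinator", content="Delegating. ")
    yield SketchChunk("analyst", type="subagent_start")
    yield SketchChunk("analyst", content="Growth is ")
    yield SketchChunk("analyst", content="18% YoY.")
    yield SketchChunk("analyst", type="subagent_end")
    yield SketchChunk("coordinator", content="Summary ready.")


async def collect_by_agent(stream) -> tuple[dict[str, str], list[str]]:
    """Group content tokens per agent and record lifecycle events in order."""
    text: dict[str, str] = defaultdict(str)
    events: list[str] = []
    async for chunk in stream:
        if chunk.type == "content":
            text[chunk.agent] += chunk.content
        else:
            events.append(f"{chunk.type}:{chunk.agent}")
    return dict(text), events


text_by_agent, events = asyncio.run(collect_by_agent(fake_stream()))
```

The per-agent buffers are what a UI would render into separate panels, and the lifecycle events mark when each panel should open and close.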
Conversation Context Summarization¶
When a coordinator delegates to a subagent, the subagent only receives the task text — not the coordinator's full conversation history. For multi-turn conversations this can lose important context.
Enable delegate_summary to automatically attach a compressed briefing from
the coordinator's conversation:
from cogent import Agent
# Default heuristic (fast, free, no LLM call)
coordinator = Agent(
name="coordinator",
model="gpt-4o",
subagents=[analyst],
delegate_summary=True,
)
The summarizer extracts recent user/assistant turns and prepends them to the
delegated task inside <delegation_context> tags.
Built-in strategies:
| Strategy | How it works | Cost |
|---|---|---|
| HeuristicSummarizer (default) | Extracts last N turns, truncates to char limit | Free |
| LLMSummarizer | Asks a model for a 1-paragraph briefing | Tokens |
from cogent import Agent, HeuristicSummarizer, LLMSummarizer
# Custom heuristic
coordinator = Agent(
name="coordinator",
model="gpt-4o",
subagents=[analyst],
delegate_summary=HeuristicSummarizer(max_turns=3, max_chars=1000),
)
# LLM-powered (higher quality, costs tokens)
from cogent.models import ChatModel
summary_model = ChatModel(model="gpt-4o-mini")
coordinator = Agent(
name="coordinator",
model="gpt-4o",
subagents=[analyst],
delegate_summary=LLMSummarizer(model=summary_model),
)
You can also implement the ContextSummarizer protocol for custom strategies:
from cogent.agent.summarizer import ContextSummarizer
from cogent.core.messages import BaseMessage
class MyCustomSummarizer:
    async def summarize(
        self, messages: list[BaseMessage], task: str
    ) -> str | None:
        # Custom logic: return a string, or None to skip
        ...
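For intuition about the default heuristic strategy (last N turns, truncated to a character limit, prepended inside delegation_context tags), here is a rough standalone approximation. It is not HeuristicSummarizer's actual code: heuristic_briefing is a hypothetical function, and the (role, text) turn format, the exact tag layout, and head-first truncation are assumptions.

```python
def heuristic_briefing(turns, task, max_turns=3, max_chars=1000):
    """Keep the last max_turns (role, text) pairs, cap the length, prepend to the task."""
    recent = turns[-max_turns:] if max_turns > 0 else []
    if not recent:
        return task  # nothing to summarize: delegate the bare task
    summary = "\n".join(f"{role}: {text}" for role, text in recent)[:max_chars]
    return f"<delegation_context>\n{summary}\n</delegation_context>\n{task}"


turns = [
    ("user", "We sell in EMEA only."),
    ("assistant", "Noted, EMEA only."),
    ("user", "Focus on Q4."),
]
briefed = heuristic_briefing(turns, "Analyse revenue trends", max_turns=2)
```

With max_turns=2 the oldest turn is dropped, so the subagent sees only the two most recent turns followed by the task text.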
Mixed Tools and Subagents¶
Subagents and regular tools work together:
from cogent import tool
@tool
def search_database(query: str) -> str:
    """Search internal database."""
    return database.search(query)
coordinator = Agent(
name="coordinator",
model="gpt-5.4",
tools=[search_database], # Regular tool
subagents={
"analyst": analyst_agent, # Subagent
},
)
# LLM can use both:
# 1. Call search_database tool → get data
# 2. Delegate to analyst → analyze data
Troubleshooting¶
Subagent not being called¶
Problem: LLM ignores subagent tools
Solutions:
- Make coordinator instructions explicit about delegation
- Use descriptive subagent names (e.g., "data_analyst" not "helper")
- Add descriptions to subagent Agent configs
specialist = Agent(
name="specialist",
model="gpt-5.4",
description="Expert in data analysis and statistics", # Helps LLM understand when to call
)
Token counts seem wrong¶
Problem: Tokens don't match expectations
Debug:
tokens = response.metadata.tokens
print(f"Coordinator total: {tokens.total_tokens}")
print(f" Prompt: {tokens.prompt_tokens}")
print(f" Completion: {tokens.completion_tokens}")
if tokens.reasoning_tokens:
    print(f" Reasoning: {tokens.reasoning_tokens}")
# subagent_responses is None when nothing was delegated
for sub in response.subagent_responses or []:
    sub_tokens = sub.metadata.tokens
    print(f"{sub.metadata.agent}: {sub_tokens.total_tokens} tokens")
    if sub_tokens.reasoning_tokens:
        print(f" └─ Reasoning: {sub_tokens.reasoning_tokens}")
print(f"Delegation chain: {response.metadata.delegation_chain}")
Note: Token counts include prompt + completion + reasoning (when available). All categories are aggregated across coordinator and all subagents.
Subagent errors¶
Problem: Subagent fails during execution
Behavior: The error is returned to the coordinator as a tool result, so the LLM can retry or handle it.
# LLM sees error message and can:
# 1. Retry with different parameters
# 2. Try different subagent
# 3. Handle error in response
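The error-as-tool-result behaviour can be sketched with plain coroutines. Everything here is hypothetical (delegate, the two fake subagents, and the payload shape). It only shows the pattern of converting an exception into a result the coordinator can react to.

```python
import asyncio


async def failing_subagent(task: str) -> str:
    raise RuntimeError("upstream timeout")


async def healthy_subagent(task: str) -> str:
    return f"done: {task}"


async def delegate(subagent, task: str) -> dict:
    """Run a subagent and convert any exception into a tool-result payload."""
    try:
        return {"ok": True, "content": await subagent(task)}
    except Exception as exc:  # the coordinator sees a message, not a crash
        return {"ok": False, "content": f"Subagent error: {exc}"}


async def main() -> list[dict]:
    first = await delegate(failing_subagent, "Analyze Q4 sales trends")
    # A coordinator could now retry or pick a different subagent.
    second = await delegate(healthy_subagent, "Analyze Q4 sales trends")
    return [first, second]


results = asyncio.run(main())
```

Because the failure arrives as ordinary tool output, the run continues and the second delegation still succeeds.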
Performance Considerations¶
Memory Usage¶
Each subagent maintains its own conversation history if conversation=True:
# ✅ GOOD: Disable conversation for stateless subagents
data_cleaner = Agent(
name="data_cleaner",
model="gpt-5.4-mini",
conversation=False, # Saves memory
)
# ❌ BAD: Unnecessary conversation history
data_cleaner = Agent(
name="data_cleaner",
model="gpt-5.4-mini",
conversation=True, # Wastes memory if not needed
)
Parallel Execution¶
Subagents execute in parallel when the LLM calls several of them in the same turn:
# LLM decides to call both in one turn
# → Both execute in parallel automatically
# → Results returned together
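One way to picture this is with asyncio.gather, which awaits several coroutines concurrently and returns their results in call order. The sketch below uses a fake subagent and a small in-flight counter to show that both calls overlap. Whether the library uses gather internally is not stated here, so treat that as an assumption.

```python
import asyncio

# Tracks how many fake subagents are running at once.
in_flight = {"now": 0, "peak": 0}


async def fake_subagent(name: str) -> str:
    in_flight["now"] += 1
    in_flight["peak"] = max(in_flight["peak"], in_flight["now"])
    await asyncio.sleep(0.05)  # stands in for an LLM round-trip
    in_flight["now"] -= 1
    return f"{name} finished"


async def run_turn(tool_calls: list[str]) -> list[str]:
    """Await every delegation requested in one LLM turn together."""
    return list(await asyncio.gather(*(fake_subagent(n) for n in tool_calls)))


results = asyncio.run(run_turn(["data_analyst", "market_researcher"]))
```

The peak counter reaches 2 because the second subagent starts while the first is still waiting, and the results come back in the order the calls were issued.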
Model Selection¶
Use appropriate models for each role:
# ✅ GOOD: Match model to task complexity
coordinator = Agent(
name="coordinator",
model="gpt-5.4", # Complex orchestration
subagents={
"summarizer": Agent(name="summarizer", model="gpt-5.4-mini"), # Simple task
"analyst": Agent(name="analyst", model="gpt-5.4"), # Complex analysis
},
)
Examples¶
See:
- examples/subagent/simple_delegation.py — Minimal delegation
- examples/a2a/delegation.py — Remote agents alongside local agents
- examples/a2a/showcase.py — Discovery, streaming, summarization, and context forwarding
- examples/a2a/input_contracts.py — Input contracts with accepts=
- examples/a2a/data_forwarding.py — Automatic data forwarding
- examples/a2a/structured_messages.py — Structured data between agents
Comparison: Single Agent vs Multi-Agent¶
| Aspect | Single Agent | Multi-Agent (Subagents) |
|---|---|---|
| Memory | Full memory architecture | Must share via RunContext |
| Complexity | Simple, one config | More setup, multiple configs |
| Specialization | Generalist approach | Focused specialists |
| Token cost | Usually lower | Higher (multiple calls) |
| Observability | One agent trace | Full delegation chain |
| Best for | Linear workflows | Complex coordination |
Rule of Thumb: Start with a single agent. Add subagents when you need:
- Clear specialist roles (SQL expert, data cleaner, etc.)
- Separation of concerns (analysis vs presentation)
- Delegated decision-making (coordinator decides who handles what)
FAQ¶
Q: Can subagents have different models?
A: Yes! Each agent can use a different model.
Q: Do subagents share conversation history?
A: No. Each agent has its own conversation if conversation=True. Use RunContext to share state.
Q: Can I mix subagents= and tools=?
A: Yes! They work together seamlessly.
Q: Are token counts accurate?
A: Yes - coordinator + all subagent tokens are aggregated automatically.
Q: Can subagents call each other?
A: Not directly. But nested subagents work (subagent has its own subagents).
Q: What if a subagent fails?
A: Error is returned to coordinator as a tool result. The LLM can handle it.
Q: How deep can I nest?
A: No hard limit, but 2-3 levels max is recommended for clarity.
Q: Does this work with all models?
A: Yes - any model that supports tool calling (OpenAI, Anthropic, Gemini, etc.)