Subagents: Native Delegation Support

Status: Production Ready (v0.x.x+)

Subagents enable true multi-agent coordination where a coordinator agent delegates tasks to specialist agents while preserving full metadata (tokens, duration, delegation chain) for accurate cost tracking and observability.

Overview

The Solution

The native subagents= parameter preserves Response metadata through executor interception:

# ✅ New approach - preserves metadata
specialist = Agent(name="specialist", model="gpt-5.4")
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={"specialist": specialist},  # Full Response[T] preserved
)

response = await coordinator.run("Analyze this data")
# Token count includes coordinator + specialist ✅
print(f"Total tokens: {response.metadata.tokens.total_tokens}")
print(f"Subagent calls: {len(response.subagent_responses or [])}")  # field is optional

How It Works

LLM Perspective: Subagents appear as regular tools with a task parameter
Executor Interception: Executor detects subagent tools and routes to SubagentRegistry
Metadata Preservation: Full Response[T] objects are cached, not just strings
Automatic Aggregation: Tokens, duration, and delegation chain are aggregated automatically

Key Principle: Zero LLM behavior changes - uses existing tool calling mechanism.
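The four steps above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not the library's executor: SketchResponse, SketchRegistry, execute_tool_call, and finalize are hypothetical names invented for illustration, and real Response objects carry far more than a token count.

```python
from dataclasses import dataclass, field


@dataclass
class SketchResponse:
    """Hypothetical stand-in for Response[T]; only the fields used here."""
    content: str
    agent: str
    total_tokens: int
    subagent_responses: list = field(default_factory=list)


class SketchRegistry:
    """Hypothetical registry: maps tool names to subagents and caches full responses."""

    def __init__(self, subagents):
        self.subagents = subagents      # tool name -> callable(task) -> SketchResponse
        self.cached = []

    def is_subagent(self, tool_name):
        return tool_name in self.subagents

    def call(self, tool_name, task):
        response = self.subagents[tool_name](task)
        self.cached.append(response)    # keep the whole object, not just its text
        return response.content         # the LLM still sees a plain tool result


def execute_tool_call(registry, tools, tool_name, args):
    """Route subagent tools to the registry; run everything else as a normal tool."""
    if registry.is_subagent(tool_name):
        return registry.call(tool_name, args["task"])
    return tools[tool_name](**args)


def finalize(coordinator_tokens, content, registry):
    """Fold the cached subagent responses into the coordinator's final response."""
    total = coordinator_tokens + sum(r.total_tokens for r in registry.cached)
    return SketchResponse(content, "coordinator", total, list(registry.cached))


registry = SketchRegistry({
    "specialist": lambda task: SketchResponse(f"done: {task}", "specialist", 120),
})
tools = {"add": lambda a, b: str(a + b)}

tool_result = execute_tool_call(registry, tools, "specialist", {"task": "analyze"})
plain_result = execute_tool_call(registry, tools, "add", {"a": 2, "b": 3})
final = finalize(coordinator_tokens=80, content="summary", registry=registry)
```

The point of the sketch is that the model-facing contract stays a string-returning tool, while the richer objects are retained on the side for aggregation.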

Quick Start

Basic Example

from cogent import Agent

# Create specialist agents
data_analyst = Agent(
    name="data_analyst",
    model="gpt-5.4-mini",
    instructions="Analyze data and provide statistical insights.",
)

market_researcher = Agent(
    name="market_researcher",
    model="gpt-5.4-mini",
    instructions="Research market trends and competitive landscape.",
)

# Create coordinator with subagents
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4-mini",
    instructions="""Coordinate research tasks:
- Use data_analyst for numerical analysis
- Use market_researcher for market trends
Synthesize their findings.""",
    # Simply pass the agents - their names become tool names
    subagents=[data_analyst, market_researcher],
)

# Run task - coordinator will delegate automatically
response = await coordinator.run(
    "Analyze Q4 2025 e-commerce growth: 18% YoY to $1.2T globally, "
    "mobile is 65% of total. What are the key insights?"
)

print(response.content)
print(f"Total tokens: {response.metadata.tokens.total_tokens}")

Note: You can also use a dict to override tool names:

# Dict form: explicit tool names (optional)
subagents={
    "data_analyst": data_analyst,      # Tool name = "data_analyst"
    "market_researcher": market_researcher,  # Tool name = "market_researcher"
}

# List form: uses agent.name (simpler!)
subagents=[data_analyst, market_researcher]  # Uses agent names automatically
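To make the relationship between the two forms concrete, here is a minimal sketch of how a list and a dict could be reduced to the same tool-name mapping. NamedAgent and normalize_subagents are hypothetical helpers, not part of the library, and rejecting duplicate names is an assumption of this sketch rather than documented behavior.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass


@dataclass
class NamedAgent:
    """Hypothetical stand-in for an agent; only `name` matters here."""
    name: str


def normalize_subagents(subagents):
    """Return a {tool_name: agent} dict from either accepted form."""
    if subagents is None:
        return {}
    if isinstance(subagents, Mapping):
        return dict(subagents)              # dict form: keys are the tool names
    if isinstance(subagents, Sequence) and not isinstance(subagents, (str, bytes)):
        names = [agent.name for agent in subagents]
        duplicates = {n for n in names if names.count(n) > 1}
        if duplicates:                      # assumption: ambiguous tool names are an error
            raise ValueError(f"duplicate subagent names: {sorted(duplicates)}")
        return dict(zip(names, subagents))  # list form: the agent name is the tool name
    raise TypeError("subagents must be a dict, a sequence of agents, or None")


analyst = NamedAgent("data_analyst")
researcher = NamedAgent("market_researcher")
from_list = normalize_subagents([analyst, researcher])
from_dict = normalize_subagents({"numbers": analyst})
```

Under these assumptions the list form is shorthand for a dict keyed by each agent's own name, and the dict form only matters when a different tool name is wanted.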

Structured Output from Subagents

Use returns= on a subagent to declare a structured output schema. The coordinator's LLM receives clean JSON it can reason over directly, instead of a plain string.

from pydantic import BaseModel
from typing import Literal

class ReviewScore(BaseModel):
    score: int
    verdict: Literal["approved", "needs_revision"]
    feedback: str

reviewer = Agent(
    name="reviewer",
    model="gpt-5.4-mini",
    description="Review copy for quality and compliance",
    returns=ReviewScore,
    instructions="Review content. Score 1-10. Be concise.",
)

editor = Agent(
    name="editor",
    model="gpt-5.4-mini",
    subagents=[reviewer],
    instructions="Have the reviewer check the copy.",
)

# Coordinator LLM sees: {"score": 8, "verdict": "approved", "feedback": "..."}
result = await editor.run("Review this tweet")

See Structured Output — Subagent Structured Output for the full details — how the pipeline works, remote A2A round-trips, and when to use returns=.

Accessing Metadata

# Token aggregation (coordinator + all subagents)
tokens = response.metadata.tokens
print(f"Total tokens: {tokens.total_tokens}")
print(f"  Prompt: {tokens.prompt_tokens}")
print(f"  Completion: {tokens.completion_tokens}")
if tokens.reasoning_tokens:
    print(f"  Reasoning: {tokens.reasoning_tokens}")

# Individual subagent responses (the field is typed as optional, so guard against None)
for sub_resp in response.subagent_responses or []:
    print(f"{sub_resp.metadata.agent}: {sub_resp.metadata.tokens.total_tokens} tokens")
    if sub_resp.metadata.tokens.reasoning_tokens:
        print(f"  └─ Reasoning: {sub_resp.metadata.tokens.reasoning_tokens}")

# Delegation chain (typed as optional, so guard against None)
for delegation in response.metadata.delegation_chain or []:
    print(f"{delegation['agent']} - {delegation['tokens']} tokens - {delegation['duration']:.2f}s")

API Reference

Agent Constructor

Agent(
    name: str,
    model: str | BaseChatModel,
    subagents: dict[str, Agent] | Sequence[Agent] | None = None,
    **kwargs
)

Parameters:

- subagents: Local agents or remote A2A adapters for delegation
  - dict form: Explicit tool names, e.g. {"tool_name": agent}
  - list/tuple form: Uses agent.config.name as the tool name, e.g. [agent1, remote_agent]

Examples:

# Local agents only
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents=[analyst_agent, researcher_agent],
)

# Mix local and remote agents
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents=[
        local_analyst,
        Agent(
            url="http://remote-svc/a2a",
            name="remote_writer",
            description="Remote writing specialist",
        ),
    ],
)

# Dict form - override tool names if needed
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={
        "custom_analyst_name": analyst_agent,
        "custom_researcher_name": researcher_agent,
    },
)

Remote Agents

Agent(
    url: str,
    name: str,
    description: str = "",
    timeout: float = 60.0,
    headers: dict[str, str] | None = None,
    returns: type | dict | None = None,
)

Creates a remote A2A agent proxy that can be passed to subagents= alongside local agents.

returns= accepts the same schemas as a local Agent (Pydantic model, dataclass, TypedDict, JSON Schema dict). When provided, the text response from the remote agent is parsed and validated before being returned to the caller. The per-call returns= kwarg on .run() overrides the instance-level value.

A2AServer

A2AServer(
    agent: Agent,
    *,
    host: str = "0.0.0.0",
    port: int = 10000,
    url: str | None = None,         # public base URL for AgentCard; defaults to http://{host}:{port}
    version: str = "1.0",
    skills: list[AgentSkill] | None = None,
    task_store: str | TaskStore | None = None,  # persistence backend
    streaming: bool = True,         # emit incremental SSE token events
    push_notifications: bool = False,  # enable webhook push-notification callbacks
    security_schemes: dict[str, SecurityScheme] | None = None,  # AgentCard auth schemes
    security: list[dict[str, list[str]]] | None = None,         # AgentCard security requirements
)

Member	Description
`.app`	FastAPI ASGI application (cached, safe to mount)
`.agent_card()`	Build and return the `a2a.types.AgentCard`
`await .start()`	Start in background (default); returns `self` when port is bound. Pass `background=False` to block until stopped
`await .stop()`	Stop a background server
`.run()`	`asyncio.run(self.start(background=False))` — for scripts
`await A2AServer.start_many(...)`	Start multiple servers concurrently; returns a `ServerGroup`
`async with A2AServer(...) as srv:`	Start in background, stop on exit — context-manager style

task_store options:

Value	Behaviour
`None` (default)	`InMemoryTaskStore` — tasks lost on restart
`"sqlite+aiosqlite:///tasks.db"`	SQLite via aiosqlite; schema auto-created
`"postgresql+asyncpg://user:pass@host/db"`	PostgreSQL via asyncpg
`TaskStore` instance	Bring your own implementation

Database URLs require the matching driver package: aiosqlite for SQLite URLs, asyncpg for PostgreSQL URLs.
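The URL values in the table encode the driver after the plus sign. As a small illustration, the helper below extracts which extra package a given task_store value would need. required_driver is a hypothetical function written for this page, and treating a URL without a driver suffix as an error is an assumption.

```python
def required_driver(task_store):
    """Return the driver package a task_store value needs, or None."""
    if task_store is None or not isinstance(task_store, str):
        return None                          # in-memory default or a custom TaskStore instance
    scheme = task_store.split("://", 1)[0]   # e.g. "sqlite+aiosqlite"
    _dialect, _, driver = scheme.partition("+")
    if not driver:                           # assumption: a driver suffix is mandatory
        raise ValueError(f"expected '<dialect>+<driver>://...', got {task_store!r}")
    return driver


sqlite_driver = required_driver("sqlite+aiosqlite:///tasks.db")
postgres_driver = required_driver("postgresql+asyncpg://user:pass@host/db")
```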

push_notifications — webhook callbacks:

When push_notifications=True, the server enables the A2A push-notification protocol. Clients may register a callback URL via tasks/pushNotification/set; the server will POST task-status updates to that URL as the task progresses.

security_schemes and security — AgentCard authentication:

Declare authentication requirements in the AgentCard following OpenAPI 3.0 conventions. Clients inspect the card to discover how to authenticate before sending tasks.

from a2a.types import APIKeySecurityScheme, SecurityScheme

server = A2AServer(
    agent,
    port=10002,
    security_schemes={
        "api-key": SecurityScheme(root=APIKeySecurityScheme(name="X-API-Key", in_="header"))
    },
    security=[{"api-key": []}],
)

The Agent(url=...) client side already supports headers= for passing auth tokens:

remote = Agent(
    url="http://svc/a2a",
    name="analyst",
    headers={"X-API-Key": "secret"},
)

serve_agent

async def serve_agent(
    agent: Agent,
    *,
    host: str = "0.0.0.0",
    port: int = 10000,
    url: str | None = None,
    version: str = "1.0",
    task_store: str | TaskStore | None = None,
    streaming: bool = True,
    push_notifications: bool = False,
    security_schemes: dict[str, SecurityScheme] | None = None,
    security: list[dict[str, list[str]]] | None = None,
) -> None

Async convenience wrapper around A2AServer.start().

Agent.serve

agent.serve(
    *,
    host: str = "0.0.0.0",
    port: int = 10000,
    url: str | None = None,
    version: str = "1.0",
) -> None

Blocking one-liner for scripts. Internally creates A2AServer(self, ...).run().

Response Metadata

@dataclass
class Response[T]:
    content: T
    metadata: ResponseMetadata
    subagent_responses: list[Response] | None  # NEW: Responses from delegated subagents
    # ... other fields

@dataclass
class ResponseMetadata:
    agent: str
    model: str
    tokens: TokenUsage
    duration: float
    delegation_chain: list[dict] | None  # NEW: Chain of delegations
    # ... other fields

Delegation Chain Structure:

{
    "agent": "analyst",           # Subagent name
    "model": "gpt-5.4-mini",       # Model used
    "tokens": 150,                # Total tokens
    "duration": 2.5,              # Seconds
}
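Because each chain entry is a plain dict with the keys shown above, per-agent cost reports need nothing beyond the standard library. The following is a minimal sketch; summarize_chain is a hypothetical helper and the sample entries are made-up values in the documented shape.

```python
from collections import defaultdict


def summarize_chain(chain):
    """Aggregate delegation-chain entries into per-agent totals."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "duration": 0.0})
    for entry in chain or []:                # the field is optional, so accept None
        row = totals[entry["agent"]]
        row["calls"] += 1
        row["tokens"] += entry["tokens"]
        row["duration"] += entry["duration"]
    return dict(totals)


# Made-up entries in the documented shape
chain = [
    {"agent": "analyst", "model": "gpt-5.4-mini", "tokens": 150, "duration": 2.5},
    {"agent": "writer", "model": "gpt-5.4-mini", "tokens": 90, "duration": 1.0},
    {"agent": "analyst", "model": "gpt-5.4-mini", "tokens": 50, "duration": 0.5},
]
summary = summarize_chain(chain)
```

In real use the input would be response.metadata.delegation_chain rather than a hand-written list.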

Serving an Agent over A2A

To expose any Agent as a remote A2A endpoint, use one of three entry points depending on how much control you need.

One-liner (scripts and demos):

agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
agent.serve(port=10002)  # blocks until Ctrl+C

Async entrypoint (integrates into an existing async main):

from cogent.agent.a2a_server import serve_agent

async def main():
    agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
    await serve_agent(agent, port=10002)  # blocks until Ctrl+C

Mount into an existing FastAPI app (low-level):

from fastapi import FastAPI
from cogent.agent.a2a_server import A2AServer
import uvicorn

agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
server = A2AServer(agent, port=10002, url="http://localhost:10002/a2a")

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

app.mount("/a2a", server.app)
uvicorn.run(app, host="0.0.0.0", port=10002)

Once running, call it from anywhere with Agent(url=...):

from cogent import Agent

remote = Agent(url="http://localhost:10002", name="analyst")
coordinator = Agent(name="coordinator", model="gpt-5.4", subagents=[remote])

Best Practices

1. Clear Agent Responsibilities

# ✅ GOOD: Specific, non-overlapping responsibilities
data_cleaner = Agent(
    name="data_cleaner",
    instructions="Clean and normalize messy data. Fix formatting, handle nulls.",
)

data_validator = Agent(
    name="data_validator",
    instructions="Validate data quality. Check for errors, inconsistencies.",
)

# ❌ BAD: Overlapping, vague responsibilities
helper1 = Agent(name="helper1", instructions="Help with data stuff")
helper2 = Agent(name="helper2", instructions="Also help with data")

2. Descriptive Naming

# ✅ GOOD: Names that indicate purpose
subagents={
    "sql_generator": sql_agent,
    "data_visualizer": viz_agent,
    "report_writer": report_agent,
}

# ❌ BAD: Generic names
subagents={
    "agent1": sql_agent,
    "helper": viz_agent,
    "assistant": report_agent,
}

3. Coordinator Instructions

# ✅ GOOD: Explicit delegation guidelines
coordinator = Agent(
    instructions="""You coordinate ETL tasks:
- Use data_analyst to understand CSV structure and issues
- Use data_cleaner to design transformation rules
- Use sql_generator to create database schema
Synthesize their work into a complete ETL plan.""",
    subagents={...},
)

# ❌ BAD: Vague instructions
coordinator = Agent(
    instructions="You have some helpers. Use them if you want.",
    subagents={...},
)

4. Observability

Attach an Observer to see delegation flow:

from cogent import Agent
from cogent.observability import Observer

coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={...},
    observer=Observer(),
)

The observer emits subagent-specific events for both streaming and non-streaming execution:

Console label	Event	When
`[subagent-decision]`	`llm.tool_decision`	LLM chose to delegate
`[subagent-call]`	`subagent.called`	Delegation started
`[subagent-result]`	`subagent.result`	Delegation completed
`[subagent-context]`	`subagent.context`	Conversation summary injected
`[subagent-error]`	`subagent.error`	Delegation failed

5. Context Propagation

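Context propagates automatically through delegation to local subagents. Remote A2A agents receive only correlation metadata, because RunContext is not forwarded over the wire: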

from cogent import RunContext

ctx = RunContext(
    thread_id="session-123",
    user_id="user-456",
    metadata={"department": "analytics"},
)

response = await coordinator.run("Analyze data", context=ctx)
# Local subagents receive this context automatically (each delegation gets its own run_id)

Advanced Patterns

Nested Subagents

Subagents can have their own subagents:

# Specialist with sub-specialists
data_analyst = Agent(
    name="data_analyst",
    model="gpt-5.4",
    subagents={
        "statistician": statistician_agent,
        "visualizer": viz_agent,
    },
)

# Top-level coordinator
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={
        "data_analyst": data_analyst,  # Has its own subagents
        "report_writer": writer_agent,
    },
)

Remote Agents via A2A

Agent(url=...) wraps any Agent2Agent (A2A) protocol endpoint so it participates in delegation exactly like a local agent. Requires the a2a extra:

uv add "cogent-ai[a2a]"

from cogent import Agent

remote_analyst = Agent(
    url="http://analyst-service/a2a",
    name="analyst",
    description="Remote financial analyst running on a separate service",
)

coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents=[
        local_writer,       # in-process Agent
        remote_analyst,     # remote A2A server
    ],
)

response = await coordinator.run("Analyse Q4 results and write a report")

The LLM sees both as regular tools — no special instructions needed. RunContext is not forwarded over the wire; each remote call carries the task text and any structured data or file references attached via Message.

Structured Messages

Agent.run() accepts a plain string or structured data via data= and files= keyword arguments. Both local and remote subagents receive the full message.

from cogent import Agent, FilePart

# Plain string — backward compatible
await agent.run("Summarise Q4 results")

# Text + structured data
await agent.run("Analyse trends", data={"region": "EMEA", "year": 2026})

# Text + file reference
await agent.run(
    "Review this document",
    files=[FilePart(uri="s3://bucket/report.pdf", mime_type="application/pdf")],
)

# Text + data + files together
await agent.run("Process report", data={"format": "summary"}, files=[...])

Internally these are wrapped into a Message envelope — the transport type that flows through the executor, registry, and A2A wire. You can also construct a Message directly for advanced multi-part payloads:

from cogent import Message, TextPart, DataPart, FilePart

await agent.run(Message(parts=[
    TextPart("Analyse this document"),
    DataPart({"format": "summary", "max_length": 500}),
    FilePart(uri="s3://bucket/report.pdf", mime_type="application/pdf"),
]))

When delegation happens via subagent tools the LLM can pass an optional data dict alongside the task string. The executor wraps both into a Message before calling the subagent:

# LLM tool call (automatic, no user code needed):
#   {"task": "Analyse Q4 trends", "data": {"region": "EMEA"}}
# → SubagentRegistry receives Message("Analyse Q4 trends", data={"region": "EMEA"})

For remote agents the Message parts map 1:1 to A2A protocol parts (TextPart, DataPart, FilePart) so structured data travels over the wire without extra serialisation.

Input Contracts

The accepts parameter declares what structured input a subagent expects — symmetric to returns which declares the output. When set, two things happen automatically:

LLM guidance — the subagent tool's data field description includes the full JSON Schema so the coordinating LLM knows exactly what to send.
Validation — when the agent receives a Message with data, the payload is validated against the schema before execution begins. Invalid data raises ValueError immediately instead of producing a confusing LLM error.

from pydantic import BaseModel
from cogent import Agent

class SalesReport(BaseModel):
    quarter: str
    revenue: float
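    currency: str = "USD"

# Illustrative output schema: AnalysisSummary is not defined in the original
# example, so this single field is a placeholder assumption.
class AnalysisSummary(BaseModel):
    summary: str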

analyst = Agent(
    name="analyst",
    model="gpt-4o",
    instructions="You are a financial analyst.",
    accepts=SalesReport,    # declares expected input
    returns=AnalysisSummary, # declares expected output
)

# Direct call — data validated against SalesReport before the agent runs
await analyst.run("Analyse trends", data={"quarter": "Q1", "revenue": 120.5})

# As a subagent — the LLM sees the SalesReport schema in the tool definition
coordinator = Agent(
    name="coordinator",
    model="gpt-4o",
    subagents=[analyst],
)

Both type (Pydantic model, dataclass, TypedDict) and dict (raw JSON Schema) are supported. Runtime validation uses pydantic.TypeAdapter for type-based schemas; raw dict schemas guide the LLM via the tool description but skip runtime validation.
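The fail-fast behavior described above can be illustrated without any third-party dependency. The text says the library uses pydantic.TypeAdapter for type-based schemas. The sketch below deliberately swaps in a standard-library dataclass check instead, so validate_input and its error messages are hypothetical and only approximate what a real validator would report.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class SalesReport:
    quarter: str
    revenue: float
    currency: str = "USD"


def validate_input(data, schema):
    """Build `schema` from `data`, raising ValueError on the first mismatch."""
    known = {f.name: f for f in fields(schema)}
    unknown = set(data) - set(known)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for name, f in known.items():
        if name not in data:
            if f.default is MISSING and f.default_factory is MISSING:
                raise ValueError(f"missing required field: {name}")
            continue                         # optional field, default applies
        value = data[name]
        # Accept ints where a float is declared (but not bools)
        int_as_float = f.type is float and isinstance(value, int) and not isinstance(value, bool)
        if not (isinstance(value, f.type) or int_as_float):
            raise ValueError(f"{name}: expected {f.type.__name__}, got {type(value).__name__}")
    return schema(**data)


report = validate_input({"quarter": "Q1", "revenue": 120.5}, SalesReport)
```

A payload such as {"quarter": "Q1"} raises ValueError before any model call would be made, which is the property the accepts parameter is meant to provide.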

Selective Context Forwarding

When a coordinator delegates to a subagent, context flows automatically:

Correlation IDs — every delegation generates a fresh run_id for the child and sets parent_run_id to the caller's run_id. This applies to both local and remote agents, enabling distributed trace stitching.
Metadata — the metadata dict is shared by reference between parent and child. Mutations in the subagent are visible to the coordinator after the call completes.
Remote agents — correlation metadata (run_id, parent_run_id) is sent in the A2A message metadata field so remote services can link back to the caller.
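The three rules above can be sketched with a small dataclass. SketchContext, child_context, and a2a_correlation_metadata are hypothetical names used only to illustrate the described behavior. The real RunContext has more fields, and the way IDs are generated here (uuid4) is an assumption.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchContext:
    """Hypothetical stand-in for the run context; field names follow the text above."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_run_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def child_context(parent):
    """Derive the context a subagent would run with."""
    return SketchContext(
        parent_run_id=parent.run_id,    # link back to the caller
        metadata=parent.metadata,       # same dict object, not a copy
    )


def a2a_correlation_metadata(ctx):
    """Only the correlation IDs travel to a remote agent in this sketch."""
    return {"run_id": ctx.run_id, "parent_run_id": ctx.parent_run_id}


parent = SketchContext(metadata={"department": "analytics"})
child = child_context(parent)
child.metadata["rows_cleaned"] = 42     # mutation made inside the subagent
wire_metadata = a2a_correlation_metadata(child)
```

Sharing the metadata dict by reference is what makes the subagent's mutation visible to the coordinator afterwards. A copy would silently drop it.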

Interceptors and Delegation

The PRE_ACT / POST_ACT interceptor phases fire for subagent calls just like regular tool calls. The InterceptContext.is_subagent flag lets you distinguish them:

from cogent.interceptors.base import Interceptor, InterceptResult, InterceptContext

class DelegationLogger(Interceptor):
    async def intercept(self, ctx: InterceptContext) -> InterceptResult:
        if ctx.is_subagent and ctx.phase.value == "pre_act":
            print(f"Delegating to {ctx.tool_name}: {ctx.tool_args}")
        return InterceptResult.ok()

class ContextEnricher(Interceptor):
    """Attach domain context to subagent calls via modified tool args."""

    async def intercept(self, ctx: InterceptContext) -> InterceptResult:
        if ctx.is_subagent and ctx.phase.value == "pre_act":
            args = dict(ctx.tool_args or {})
            # Inject extra data the LLM didn't provide
            args["data"] = '{"tenant_id": "acme-corp"}'
            return InterceptResult.modify_args(args)
        return InterceptResult.ok()

Interceptors can also block delegation entirely:

class DelegationGate(Interceptor):
    async def intercept(self, ctx: InterceptContext) -> InterceptResult:
        if ctx.is_subagent and ctx.tool_name == "expensive_agent":
            return InterceptResult.stop("Delegation blocked by policy")
        return InterceptResult.ok()
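To show how the three outcomes (ok, modified arguments, stop) could combine when several interceptors are installed, here is a self-contained model of a PRE_ACT pass. It does not use the library's classes: Outcome, enrich, gate, and run_pre_act are hypothetical, running interceptors in list order with short-circuit on stop is an assumption, and data is modelled as a dict for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """Hypothetical stand-in for an intercept result."""
    action: str                          # "ok", "modify_args", or "stop"
    args: Optional[dict] = None
    reason: str = ""


def enrich(tool_name, args, is_subagent):
    """Add tenant data to subagent calls, keeping anything already present."""
    if not is_subagent:
        return Outcome("ok")
    data = {**args.get("data", {}), "tenant_id": "acme-corp"}
    return Outcome("modify_args", args={**args, "data": data})


def gate(tool_name, args, is_subagent):
    """Block one specific subagent by name."""
    if is_subagent and tool_name == "expensive_agent":
        return Outcome("stop", reason="Delegation blocked by policy")
    return Outcome("ok")


def run_pre_act(interceptors, tool_name, tool_args, is_subagent):
    """Return (final_args, None) to proceed, or (None, reason) when blocked."""
    args = dict(tool_args)
    for intercept in interceptors:
        outcome = intercept(tool_name, args, is_subagent)
        if outcome.action == "stop":
            return None, outcome.reason  # short-circuit: later interceptors never run
        if outcome.action == "modify_args":
            args = dict(outcome.args)
    return args, None


allowed = run_pre_act([enrich, gate], "analyst", {"task": "Analyse Q4"}, True)
blocked = run_pre_act([enrich, gate], "expensive_agent", {"task": "Analyse Q4"}, True)
```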

Agent Discovery

The AgentDirectory lets coordinators discover subagents at runtime by skill tags instead of hard-coding agent lists.

Tags

Agents declare tags that describe their capabilities:

from cogent import Agent

analyst = Agent(
    name="analyst",
    model="gpt-4o",
    tags=["data", "finance", "analytics"],
    description="Analyses financial data and trends",
)

Tags are also embedded in A2A Agent Cards when serving agents over HTTP.

AgentDirectory

Register agents in a directory, then query by tag, name, or description:

from cogent import Agent, AgentDirectory

directory = AgentDirectory()

directory.register(
    Agent(name="analyst", model="gpt-4o", tags=["data", "finance"]),
    Agent(name="writer", model="gpt-4o", tags=["content", "writing"]),
    Agent(name="data_engineer", model="gpt-4o", tags=["data", "engineering"]),
)

# Find by tag (OR within tags)
data_agents = directory.find(tags=["data"])  # → [analyst, data_engineer]

# Combine criteria (AND across, OR within tags)
directory.find(tags=["data"], name="engineer")  # → [data_engineer]

# Search by description keyword
directory.find(description="financial")  # → [analyst]
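The matching rule in the comments above (OR within tags, AND across criteria) is easy to misread, so here is a plain-Python model of it. Entry and find are hypothetical stand-ins for illustration. Case-insensitive substring matching on name and description is an assumption, inferred from name="engineer" matching data_engineer in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical directory entry with the fields used in the example above."""
    name: str
    tags: list = field(default_factory=list)
    description: str = ""


def find(entries, tags=None, name=None, description=None):
    """AND across criteria, OR within tags; text criteria match substrings."""
    results = []
    for entry in entries:
        if tags is not None and not set(tags) & set(entry.tags):
            continue                     # none of the requested tags is present
        if name is not None and name.lower() not in entry.name.lower():
            continue
        if description is not None and description.lower() not in entry.description.lower():
            continue
        results.append(entry)
    return results


entries = [
    Entry("analyst", ["data", "finance"], "Analyses financial data and trends"),
    Entry("writer", ["content", "writing"], "Writes reports"),
    Entry("data_engineer", ["data", "engineering"], "Builds pipelines"),
]
by_tag = [e.name for e in find(entries, tags=["data"])]
combined = [e.name for e in find(entries, tags=["data"], name="engineer")]
```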

Dynamic Team Composition

Pass find() results directly to subagents=:

# Coordinator assembles its team from the directory at runtime
coordinator = Agent(
    name="coordinator",
    model="gpt-4o",
    subagents=directory.find(tags=["data"]),
)

response = await coordinator.run("Analyse Q4 revenue trends")

This enables scenarios where different coordinators discover different specialist teams based on the task requirements, without coupling to specific agent instances.

Serving a Cogent Agent via A2A

Any cogent agent can be exposed as an A2A HTTP endpoint so that external A2A clients (including other Agent(url=...) instances) can call it. Requires the same a2a extra.

High-level — recommended for scripts and __main__ blocks:

agent = Agent(
    name="analyst",
    model="gpt-5.4",
    instructions="You are a financial analyst.",
    description="Financial data analyst",
)

agent.serve(port=10002)   # blocks until Ctrl+C

Mid-level — for use inside an existing async entrypoint:

import asyncio

from cogent.agent.a2a_server import serve_agent

async def main():
    await serve_agent(agent, port=10002)

asyncio.run(main())

Low-level — mount into an existing FastAPI application:

from fastapi import FastAPI
from cogent.agent.a2a_server import A2AServer
import uvicorn

app = FastAPI()
server = A2AServer(agent, port=10002, url="http://myhost:10002/a2a")

app.mount("/a2a", server.app)   # A2AServer.app is a FastAPI sub-application
uvicorn.run(app, port=10002)

Once served, any A2A client — another cogent system, a different framework, or a raw HTTP client — can call the agent at http://host:port/ and discover its capabilities at http://host:port/.well-known/agent.json.

In-process — self-contained scripts and tests:

start() launches the server as a background task and returns as soon as the port is bound. Call stop() when you are done. async with is also supported and stops the server automatically on block exit.

from cogent.agent.a2a_server import A2AServer

# Explicit start / stop
analyst_server = await A2AServer(agent, port=10099).start()
remote = Agent(url="http://localhost:10099", name="analyst")
response = await remote.run("What is 18% of 250?")
print(response.content)
await analyst_server.stop()

Multiple servers — flat, no nesting:

from cogent.agent.a2a_server import A2AServer

group = await A2AServer.start_many(
    (analyst,    10001),
    (researcher, 10002),
    (writer,     10003),
)

response = await coordinator.run("...")

await group.stop_all()

Context-manager style — automatic cleanup:

from cogent.agent.a2a_server import A2AServer

async with A2AServer(agent, port=10099) as server:
    remote = Agent(url="http://localhost:10099", name="analyst")
    response = await remote.run("What is 18% of 250?")
    print(response.content)
# Server stopped automatically on exit.

Conditional Delegation

The LLM decides when to delegate:

coordinator = Agent(
    instructions="""Analyze requests:
- For simple questions, answer directly
- For complex analysis, delegate to data_analyst
- For market research, delegate to market_researcher
Use your judgment on which tasks need specialist help.""",
    subagents={
        "data_analyst": analyst,
        "market_researcher": researcher,
    },
)

# LLM may or may not delegate based on complexity
response1 = await coordinator.run("What is 2+2?")  # Answers directly
response2 = await coordinator.run("Analyze Q4 sales trends")  # Delegates to analyst

Multi-Agent Streaming

When a coordinator runs with stream=True, subagent responses are streamed through the coordinator so consumers see tokens in real-time — no silence during delegation.

Each StreamChunk carries an agent field identifying which agent produced the token, and a type field for lifecycle events:

coordinator = Agent(
    name="coordinator",
    model="gpt-4o",
    subagents=[analyst],
)

async for chunk in coordinator.run("Analyse Q4 data", stream=True):
    if chunk.type == "subagent_start":
        print(f"\n[{chunk.agent} started]")
    elif chunk.type == "subagent_end":
        print(f"\n[{chunk.agent} finished]")
    elif chunk.type == "content":
        # chunk.agent tells you who's speaking
        print(chunk.content, end="", flush=True)

Chunk types:

`type`	Meaning
`"content"`	Regular content token (default)
`"subagent_start"`	Subagent delegation began
`"subagent_end"`	Subagent delegation finished
`"tool_result"`	Tool execution result

How it works:

Coordinator streams its own tokens with agent="coordinator"
When the LLM decides to delegate, a subagent_start chunk is yielded
If the subagent supports streaming, its tokens flow through with agent="analyst" (or whatever the subagent name is)
A subagent_end chunk signals completion
The coordinator continues with its final synthesis

This lets UIs show multiple agents working in real-time, attribute tokens to sources, and render side-by-side panels for parallel agents.
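As an illustration of token attribution, the sketch below splits a chunk sequence into per-agent text plus a list of lifecycle events, which is roughly what a side-by-side UI would need. Chunk and collect_by_agent are hypothetical. Chunk mirrors only the type, agent, and content fields described above, and a list stands in for the real async stream.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for StreamChunk with the three fields used here."""
    type: str = "content"
    agent: str = ""
    content: str = ""


def collect_by_agent(chunks):
    """Return ({agent: text}, [lifecycle events in arrival order])."""
    text, events = {}, []
    for chunk in chunks:
        if chunk.type == "content":
            text[chunk.agent] = text.get(chunk.agent, "") + chunk.content
        elif chunk.type in ("subagent_start", "subagent_end"):
            events.append((chunk.type, chunk.agent))
        # "tool_result" chunks are ignored in this sketch
    return text, events


stream = [
    Chunk(agent="coordinator", content="Let me check. "),
    Chunk(type="subagent_start", agent="analyst"),
    Chunk(agent="analyst", content="Revenue rose 18%."),
    Chunk(type="subagent_end", agent="analyst"),
    Chunk(agent="coordinator", content="In short: growth."),
]
text, events = collect_by_agent(stream)
```

With the real API the same loop body would sit inside the async for shown earlier.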

Observability: The streaming path emits the same subagent.called, subagent.result, and subagent.context events as the non-streaming executor, so the Observer console output is consistent regardless of execution mode.

Conversation Context Summarization

When a coordinator delegates to a subagent, the subagent only receives the task text — not the coordinator's full conversation history. For multi-turn conversations this can lose important context.

Enable delegate_summary to automatically attach a compressed briefing from the coordinator's conversation:

from cogent import Agent

# Default heuristic (fast, free, no LLM call)
coordinator = Agent(
    name="coordinator",
    model="gpt-4o",
    subagents=[analyst],
    delegate_summary=True,
)

The summarizer extracts recent user/assistant turns and prepends them to the delegated task inside <delegation_context> tags.

Built-in strategies:

Strategy	How it works	Cost
`HeuristicSummarizer` (default)	Extracts last N turns, truncates to char limit	Free
`LLMSummarizer`	Asks a model for a 1-paragraph briefing	Tokens

from cogent import Agent, HeuristicSummarizer, LLMSummarizer

# Custom heuristic
coordinator = Agent(
    name="coordinator",
    model="gpt-4o",
    subagents=[analyst],
    delegate_summary=HeuristicSummarizer(max_turns=3, max_chars=1000),
)

# LLM-powered (higher quality, costs tokens)
from cogent.models import ChatModel
summary_model = ChatModel(model="gpt-4o-mini")
coordinator = Agent(
    name="coordinator",
    model="gpt-4o",
    subagents=[analyst],
    delegate_summary=LLMSummarizer(model=summary_model),
)
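For intuition about what the heuristic strategy does with max_turns and max_chars, here is a minimal sketch of the described behavior: keep the last N user/assistant turns, cap the length, and wrap the result in delegation_context tags ahead of the task. heuristic_briefing is a hypothetical function, and the exact line format inside the tags is an assumption, not the library's output.

```python
def heuristic_briefing(turns, task, max_turns=3, max_chars=1000):
    """Prepend a compressed view of recent turns to a delegated task.

    `turns` is a list of (role, text) pairs, oldest first.
    """
    if max_turns <= 0:
        return task
    # Keep only conversational roles, then the most recent max_turns of them
    recent = [(role, text) for role, text in turns if role in ("user", "assistant")][-max_turns:]
    if not recent:
        return task                      # nothing to summarize
    summary = "\n".join(f"{role}: {text}" for role, text in recent)[:max_chars]
    return f"<delegation_context>\n{summary}\n</delegation_context>\n\n{task}"


turns = [
    ("user", "We sell in EMEA only."),
    ("assistant", "Noted."),
    ("system", "internal note"),
    ("user", "Focus on Q4."),
    ("assistant", "Will do."),
]
briefing = heuristic_briefing(turns, "Analyse revenue", max_turns=2)
```

No model call is involved, which is why this kind of strategy is free but can cut off mid-sentence when the character cap is hit.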

You can also implement the ContextSummarizer protocol for custom strategies:

from cogent.agent.summarizer import ContextSummarizer
from cogent.core.messages import BaseMessage

class MyCustomSummarizer:
    async def summarize(
        self, messages: list[BaseMessage], task: str
    ) -> str | None:
        # Custom logic — return a string or None to skip
        ...

Mixed Tools and Subagents

Subagents and regular tools work together:

from cogent import tool

@tool
def search_database(query: str) -> str:
    """Search internal database."""
    # `database` is assumed to be an existing client object defined elsewhere
    return database.search(query)

coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    tools=[search_database],  # Regular tool
    subagents={
        "analyst": analyst_agent,  # Subagent
    },
)

# LLM can use both:
# 1. Call search_database tool → get data
# 2. Delegate to analyst → analyze data

Troubleshooting

Subagent not being called

Problem: LLM ignores subagent tools

Solutions:

- Make coordinator instructions explicit about delegation
- Use descriptive subagent names (e.g., "data_analyst" not "helper")
- Add descriptions to subagent Agent configs

specialist = Agent(
    name="specialist",
    model="gpt-5.4",
    description="Expert in data analysis and statistics",  # Helps LLM understand when to call
)

Token counts seem wrong

Problem: Tokens don't match expectations

Debug:

tokens = response.metadata.tokens
print(f"Coordinator total: {tokens.total_tokens}")
print(f"  Prompt: {tokens.prompt_tokens}")
print(f"  Completion: {tokens.completion_tokens}")
if tokens.reasoning_tokens:
    print(f"  Reasoning: {tokens.reasoning_tokens}")

for sub in response.subagent_responses or []:  # field is optional
    sub_tokens = sub.metadata.tokens
    print(f"{sub.metadata.agent}: {sub_tokens.total_tokens} tokens")
    if sub_tokens.reasoning_tokens:
        print(f"  └─ Reasoning: {sub_tokens.reasoning_tokens}")

print(f"Delegation chain: {response.metadata.delegation_chain}")

Note: Token counts include prompt + completion + reasoning (when available). All categories are aggregated across coordinator and all subagents.

Subagent errors

Problem: Subagent fails during execution

Behavior: The error is returned to the coordinator as a tool result, so the LLM can retry or handle it.

# LLM sees error message and can:
# 1. Retry with different parameters
# 2. Try different subagent
# 3. Handle error in response
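The pattern of surfacing a failure as a tool result rather than raising can be sketched as follows. call_subagent_safely and the two fake subagents are hypothetical, and the error-message format is an assumption. The library's actual wording may differ.

```python
import asyncio


async def call_subagent_safely(name, subagent, task):
    """Run a subagent and turn any failure into a tool-result payload."""
    try:
        return {"tool": name, "ok": True, "content": await subagent(task)}
    except Exception as exc:             # report the failure instead of aborting the run
        message = f"Error from {name}: {type(exc).__name__}: {exc}"
        return {"tool": name, "ok": False, "content": message}


async def flaky_analyst(task):
    raise TimeoutError("upstream model timed out")


async def steady_writer(task):
    return f"Draft for: {task}"


failed = asyncio.run(call_subagent_safely("analyst", flaky_analyst, "Q4 trends"))
succeeded = asyncio.run(call_subagent_safely("writer", steady_writer, "Q4 trends"))
```

Because the coordinator's loop keeps running, the model sees the error text on its next turn and can pick any of the three options listed above.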

Performance Considerations

Memory Usage

Each subagent maintains its own conversation history if conversation=True:

# ✅ GOOD: Disable conversation for stateless subagents
data_cleaner = Agent(
    name="data_cleaner",
    model="gpt-5.4-mini",
    conversation=False,  # Saves memory
)

# ❌ BAD: Unnecessary conversation history
data_cleaner = Agent(
    name="data_cleaner",
    model="gpt-5.4-mini",
    conversation=True,  # Wastes memory if not needed
)

Parallel Execution

Subagents execute in parallel when the LLM requests several of them in a single turn:

# LLM decides to call both in one turn
# → Both execute in parallel automatically
# → Results returned together
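One common way to get this behavior in asyncio code is asyncio.gather, which runs awaitables concurrently and returns results in call order. The sketch below uses it with two fake subagents. It illustrates the concurrency pattern only and is not a claim about how the library's executor is implemented. fake_subagent and run_turn are hypothetical.

```python
import asyncio
import time


async def fake_subagent(name, delay):
    await asyncio.sleep(delay)           # stands in for a model call
    return f"{name} done"


async def run_turn(calls):
    """Execute every tool call from a single LLM turn concurrently."""
    return await asyncio.gather(*(fake_subagent(name, delay) for name, delay in calls))


start = time.perf_counter()
results = asyncio.run(run_turn([("analyst", 0.2), ("researcher", 0.2)]))
elapsed = time.perf_counter() - start    # close to 0.2s, not 0.4s
```

The practical consequence is that wall-clock time for a turn tracks the slowest subagent rather than the sum of all of them.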

Model Selection

Use appropriate models for each role:

# ✅ GOOD: Match model to task complexity
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",  # Complex orchestration
    subagents={
        "summarizer": Agent(name="summarizer", model="gpt-5.4-mini"),  # Simple task
        "analyst": Agent(name="analyst", model="gpt-5.4"),  # Complex analysis
    },
)

Examples

See:

- examples/subagent/simple_delegation.py: Minimal delegation
- examples/a2a/delegation.py: Remote agents alongside local agents
- examples/a2a/showcase.py: Discovery, streaming, summarization, and context forwarding
- examples/a2a/input_contracts.py: Input contracts with accepts=
- examples/a2a/data_forwarding.py: Automatic data forwarding
- examples/a2a/structured_messages.py: Structured data between agents

Comparison: Single Agent vs Multi-Agent

Aspect	Single Agent	Multi-Agent (Subagents)
Memory	Full memory architecture	Must share via RunContext
Complexity	Simple, one config	More setup, multiple configs
Specialization	Generalist approach	Focused specialists
Token cost	Usually lower	Higher (multiple calls)
Observability	One agent trace	Full delegation chain
Best for	Linear workflows	Complex coordination

Rule of Thumb: Start with a single agent. Add subagents when you need:

- Clear specialist roles (SQL expert, data cleaner, etc.)
- Separation of concerns (analysis vs presentation)
- Delegated decision-making (coordinator decides who handles what)

FAQ

Q: Can subagents have different models?
A: Yes! Each agent can use a different model.

Q: Do subagents share conversation history?
A: No. Each agent has its own conversation if conversation=True. Use RunContext to share state.

Q: Can I mix subagents= and tools=?
A: Yes! They work together seamlessly.

Q: Are token counts accurate?
A: Yes - coordinator + all subagent tokens are aggregated automatically.

Q: Can subagents call each other?
A: Not directly. But nested subagents work (subagent has its own subagents).

Q: What if a subagent fails?
A: The error is returned to the coordinator as a tool result, and the LLM can retry or otherwise handle it.

Q: How deep can I nest?
A: No hard limit, but 2-3 levels max is recommended for clarity.

Q: Does this work with all models?
A: Yes, with any model that supports tool calling (OpenAI, Anthropic, Gemini, etc.).