Agentic Context Compression (ACC)

Bounded memory for long conversations with drift prevention.

Overview

ACC (Agentic Context Compression) maintains bounded internal state instead of unbounded transcript replay. Based on arXiv:2601.11653, it prevents:

Context drift — Maintains constraints and entities across turns
Memory poisoning — Verifies artifacts before committing
Context overflow — Bounded state regardless of conversation length

Quick Start

Enable ACC with acc=True on Agent:

import asyncio

from cogent import Agent

agent = Agent(name="Assistant", model="gpt-5.4", acc=True)


async def main() -> None:
    # Reuse the same thread_id so context persists across turns
    await agent.run("My name is Alice", thread_id="session-1")
    await agent.run("I prefer dark mode", thread_id="session-1")
    await agent.run("What's my name?", thread_id="session-1")  # Remembers!


asyncio.run(main())

Custom Bounds

For fine-grained control, pass custom bounds directly to AgentCognitiveCompressor:

from cogent import Agent
from cogent.memory.acc import AgentCognitiveCompressor

# Create ACC with custom bounds
acc = AgentCognitiveCompressor(
    max_constraints=10,  # Rules, guidelines (default: 10)
    max_entities=30,     # Facts, knowledge (default: 50)
    max_actions=20,      # Past actions (default: 30)
    max_context=15,      # Relevant context (default: 20)
)

# Pass directly to Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=acc)

# Access state for monitoring
print(f"Entities: {len(acc.state.entities)}/{acc.state.max_entities}")
print(f"Actions: {len(acc.state.actions)}/{acc.state.max_actions}")

Extraction Modes

ACC supports two extraction modes:

Mode	Description	Speed	Quality
`model`	LLM-based semantic extraction (default)	Moderate	High
`heuristic`	Rule-based fallback	⚡ Fast	Low

Model Mode (Default)

Uses an LLM to semantically extract constraints, entities, and actions. When no model= is specified, ACC uses the agent's model automatically:

# Default: uses agent's model automatically
acc = AgentCognitiveCompressor()

# Dedicated model for extraction (reduces cost if agent uses a large model)
acc = AgentCognitiveCompressor(model="gpt-5.4-mini")

# Any BaseChatModel works
from cogent.models import AnthropicChat
acc = AgentCognitiveCompressor(
    model=AnthropicChat(model="claude-3-haiku-20240307"),
)

Heuristic Mode

Fast rule-based fallback with no LLM call. Useful when extraction cost matters more than quality:

acc = AgentCognitiveCompressor(extraction_mode="heuristic")
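To give a feel for what rule-based extraction can look like, here is a minimal sketch built on regular expressions alone. The patterns and the `extract_heuristic` function are hypothetical and are not taken from cogent; the library's actual rules may differ substantially.

```python
import re

# Hypothetical patterns for illustration; a real rule set would be broader.
ENTITY_PATTERNS = [
    re.compile(r"\bmy name is \w+", re.IGNORECASE),
    re.compile(r"\bI (?:prefer|like) [^.!?]+", re.IGNORECASE),
]
CONSTRAINT_PATTERN = re.compile(
    r"\b(?:always|never|must|do not|don't)\b[^.!?]*", re.IGNORECASE
)


def extract_heuristic(message: str) -> dict[str, list[str]]:
    """Pull candidate entities and constraints from one message, no LLM call."""
    entities = [
        match.group(0).strip()
        for pattern in ENTITY_PATTERNS
        for match in pattern.finditer(message)
    ]
    constraints = [m.group(0).strip() for m in CONSTRAINT_PATTERN.finditer(message)]
    return {"entities": entities, "constraints": constraints}


result = extract_heuristic(
    "My name is Alice. I prefer dark mode. Never send emails without asking."
)
print(result["entities"])
print(result["constraints"])
```

Rules like these are cheap but brittle: a paraphrase such as "call me Alice" slips through, which is the quality trade-off the table above describes.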

When to Use ACC

Use ACC When	Don't Use When
Long conversations (>10 turns)	Short, stateless queries
Need to prevent drift	Simple Q&A
Bounded memory is critical	Need full transcript replay
Multi-turn workflows	One-off operations

How ACC Works

ACC maintains bounded internal state with four categories:

Category	Purpose	Default Max
Constraints	Rules, guidelines, requirements	10
Entities	Facts, knowledge, data	50
Actions	What worked/failed	30
Context	Relevant snippets	20

Total: ~110 items regardless of conversation length.

from cogent.memory.acc import BoundedMemoryState

# View state contents
state = BoundedMemoryState()
print(state.constraints)  # List of constraints
print(state.entities)     # List of entities
print(state.actions)      # List of actions
print(state.context)      # List of context items

ACC vs SemanticCache

Feature	ACC	SemanticCache
Purpose	Bounded conversation context	Cache tool outputs
Matching	Structured memory extraction	Semantic similarity
Use Case	Long conversations	Expensive tool calls
Thread-aware	Yes (thread_id)	No

Use together: ACC for conversation context, SemanticCache for tool output caching.

ACC vs ContextCompressor

ACC and the ContextCompressor interceptor both manage context size, but they solve different problems at different layers:

	ACC	ContextCompressor
Problem	Context drift over many turns	Message list too long for this turn
When	Every turn (LOAD/SAVE)	PRE_THINK — before each LLM call
How	Extracts structured artifacts into bounded state (~110 items max)	Summarises old messages into shorter text
Persistence	Cross-turn — remembers facts after messages are gone	None — only shortens the current message list
Drift prevention	Yes — semantic forget gate prunes by relevance + recency	No — summaries lose structure over time
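Pruning "by relevance + recency" can be pictured as a weighted score per stored item. The sketch below is an assumption about the general idea, not cogent's `SemanticForgetGate`: the `prune` function, the `alpha` weight, and the use of token overlap in place of real semantic similarity are all hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a cheap stand-in for semantic relevance."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def prune(
    items: list[tuple[str, int]],
    task: str,
    now: int,
    keep: int,
    alpha: float = 0.7,
) -> list[str]:
    """Keep the `keep` best items; each item is (text, turn_last_seen)."""

    def score(item: tuple[str, int]) -> float:
        text, turn = item
        recency = 1.0 / (1 + now - turn)  # 1.0 at the current turn, decays with age
        return alpha * jaccard(text, task) + (1 - alpha) * recency

    ranked = sorted(items, key=score, reverse=True)
    return [text for text, _ in ranked[:keep]]


memory = [
    ("user prefers dark mode", 1),
    ("user asked about the weather", 2),
    ("dark mode toggle lives in settings", 3),
    ("user said hello", 9),
]
kept = prune(memory, task="enable dark mode in settings", now=10, keep=2)
print(kept)
```

In this toy run the two dark-mode items survive even though "user said hello" is more recent, because relevance to the current task carries more weight than recency.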

Guidance:

Use ACC for any multi-turn conversation (>5 turns). It is the right default.
Add ContextCompressor as an extra safety layer when the message list for a single turn could still overflow the model's context window despite ACC's bounded state.
Don't use ContextCompressor alone for long conversations — it has no drift prevention.

from cogent import Agent
from cogent.interceptors import ContextCompressor

# Recommended: ACC handles drift, ContextCompressor catches overflow
agent = Agent(
    name="assistant",
    model="gpt-5.4",
    acc=True,
    interceptors=[ContextCompressor(threshold_tokens=8000, keep_recent=4)],
)
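The roles of `threshold_tokens` and `keep_recent` can be sketched without the library. The code below is a simplified assumption about how such a compressor might behave, not the implementation of cogent's `ContextCompressor`: `estimate_tokens` and `compress` are hypothetical helpers, and the "summary" is plain truncation where a real compressor would call an LLM.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); real tokenizers differ."""
    return max(1, len(text) // 4)


def compress(messages: list[str], threshold_tokens: int, keep_recent: int) -> list[str]:
    """Collapse older messages into one summary line when over the threshold."""
    total = sum(estimate_tokens(message) for message in messages)
    if total <= threshold_tokens or len(messages) <= keep_recent:
        return messages  # nothing to do: under budget or too few messages
    split = len(messages) - keep_recent
    old, recent = messages[:split], messages[split:]
    # Placeholder summary: the first 20 characters of each old message.
    summary = f"[summary of {len(old)} earlier messages] " + " | ".join(
        message[:20] for message in old
    )
    return [summary] + recent


history = [f"message {i}: " + "x" * 400 for i in range(10)]
short = compress(history, threshold_tokens=500, keep_recent=4)
print(len(history), "->", len(short))
```

Note that nothing in this sketch survives to the next turn: it only shortens the list it is handed, which is why it complements bounded state and does not replace it.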

Best Practices

Always use thread_id — Required for context persistence across turns
Set appropriate bounds — Smaller bounds = less context but faster
Scope per user/session — Use unique thread_id per conversation
Monitor state — Check entity/action counts for debugging
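Two of these practices, scoping by session and monitoring counts, are easy to wrap in small helpers. The sketch below uses hypothetical names throughout: the `make_thread_id` naming scheme and the 90% warning level in `utilization` are assumptions, not cogent conventions.

```python
def make_thread_id(user_id: str, session_id: str) -> str:
    """Build a thread_id that is unique per user and session."""
    return f"user-{user_id}:session-{session_id}"


def utilization(
    counts: dict[str, int], bounds: dict[str, int], warn_at: float = 0.9
) -> dict[str, str]:
    """Return 'used/max' per category, flagging those near their bound."""
    report = {}
    for name, cap in bounds.items():
        used = counts.get(name, 0)
        flag = " (near limit)" if used >= warn_at * cap else ""
        report[name] = f"{used}/{cap}{flag}"
    return report


print(make_thread_id("42", "a1"))
report = utilization({"entities": 47, "actions": 12}, {"entities": 50, "actions": 30})
print(report)
```

With a real agent, the counts would come from values such as `len(acc.state.entities)`, as in the Custom Bounds example.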

Examples

See working examples:

examples/memory/acc.py — ACC usage patterns
examples/advanced/content_review.py — ACC with Memory integration

API Reference

BoundedMemoryState

class BoundedMemoryState:
    def __init__(
        self,
        max_constraints: int = 10,
        max_entities: int = 50,
        max_actions: int = 30,
        max_context: int = 20,
    ):
        """Initialize bounded state with category limits."""

    @property
    def constraints(self) -> list[str]: ...
    @property
    def entities(self) -> list[str]: ...
    @property
    def actions(self) -> list[str]: ...
    @property
    def context(self) -> list[str]: ...

AgentCognitiveCompressor

class AgentCognitiveCompressor:
    def __init__(
        self,
        state: BoundedMemoryState,
        forget_gate: SemanticForgetGate | None = None,
    ):
        """Initialize ACC with bounded state."""

    async def update_from_turn(
        self,
        user_message: str,
        assistant_message: str,
        tool_calls: list[dict],
        current_task: str,
    ) -> None:
        """Update memory state from a conversation turn."""