Skip to content

Agentive Context Control (ACC)

Bounded memory for long conversations with drift prevention.

Overview

ACC (Agentic Context Compression) maintains bounded internal state instead of unbounded transcript replay. Based on arXiv:2601.11653, it prevents:

  • Context drift — Maintains constraints and entities across turns
  • Memory poisoning — Verifies artifacts before committing
  • Context overflow — Bounded state regardless of conversation length

Quick Start

Enable ACC with acc=True on Agent:

from cogent import Agent

agent = Agent(name="Assistant", model="gpt-5.4", acc=True)

# Use thread_id to persist context across turns
await agent.run("My name is Alice", thread_id="session-1")
await agent.run("I prefer dark mode", thread_id="session-1")
await agent.run("What's my name?", thread_id="session-1")  # Remembers!

Custom Bounds

For fine-grained control, pass custom bounds directly to AgentCognitiveCompressor:

from cogent import Agent
from cogent.memory.acc import AgentCognitiveCompressor

# Create ACC with custom bounds
acc = AgentCognitiveCompressor(
    max_constraints=10,  # Rules, guidelines (default: 10)
    max_entities=30,     # Facts, knowledge (default: 50)
    max_actions=20,      # Past actions (default: 30)
    max_context=15,      # Relevant context (default: 20)
)

# Pass directly to Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=acc)

# Access state for monitoring
print(f"Entities: {len(acc.state.entities)}/{acc.state.max_entities}")
print(f"Actions: {len(acc.state.actions)}/{acc.state.max_actions}")

Extraction Modes

ACC supports two extraction modes:

Mode Description Speed Quality
model LLM-based semantic extraction (default) Moderate High
heuristic Rule-based fallback ⚡ Fast Low

Model Mode (Default)

Uses an LLM to semantically extract constraints, entities, and actions. When no model= is specified, ACC uses the agent's model automatically:

# Default: uses agent's model automatically
acc = AgentCognitiveCompressor()

# Dedicated model for extraction (reduces cost if agent uses a large model)
acc = AgentCognitiveCompressor(model="gpt-5.4-mini")

# Any BaseChatModel works
from cogent.models import AnthropicChat
acc = AgentCognitiveCompressor(
    model=AnthropicChat(model="claude-3-haiku-20240307"),
)

Heuristic Mode

Fast rule-based fallback with no LLM call. Useful when extraction cost matters more than quality:

acc = AgentCognitiveCompressor(extraction_mode="heuristic")

When to Use ACC

Use ACC When Don't Use When
Long conversations (>10 turns) Short, stateless queries
Need to prevent drift Simple Q&A
Bounded memory is critical Need full transcript replay
Multi-turn workflows One-off operations

How ACC Works

ACC maintains bounded internal state with four categories:

Category Purpose Default Max
Constraints Rules, guidelines, requirements 10
Entities Facts, knowledge, data 50
Actions What worked/failed 30
Context Relevant snippets 20

Total: ~110 items regardless of conversation length.

from cogent.memory.acc import BoundedMemoryState

# View state contents
state = BoundedMemoryState()
print(state.constraints)  # List of constraints
print(state.entities)     # List of entities
print(state.actions)      # List of actions
print(state.context)      # List of context items

ACC vs SemanticCache

Feature ACC SemanticCache
Purpose Bounded conversation context Cache tool outputs
Matching Structured memory extraction Semantic similarity
Use Case Long conversations Expensive tool calls
Thread-aware Yes (thread_id) No

Use together: ACC for conversation context, SemanticCache for tool output caching.

ACC vs ContextCompressor

ACC and the ContextCompressor interceptor both manage context size, but they solve different problems at different layers:

ACC ContextCompressor
Problem Context drift over many turns Message list too long for this turn
When Every turn (LOAD/SAVE) PRE_THINK — before each LLM call
How Extracts structured artifacts into bounded state (~110 items max) Summarises old messages into shorter text
Persistence Cross-turn — remembers facts after messages are gone None — only shortens the current message list
Drift prevention Yes — semantic forget gate prunes by relevance + recency No — summaries lose structure over time

Guidance:

  • Use ACC for any multi-turn conversation (>5 turns). It is the right default.
  • Add ContextCompressor as an extra safety layer when very large context windows could still overflow despite ACC's bounded state.
  • Don't use ContextCompressor alone for long conversations — it has no drift prevention.
from cogent import Agent
from cogent.interceptors import ContextCompressor

# Recommended: ACC handles drift, ContextCompressor catches overflow
agent = Agent(
    name="assistant",
    model="gpt-5.4",
    acc=True,
    interceptors=[ContextCompressor(threshold_tokens=8000, keep_recent=4)],
)

Best Practices

  1. Always use thread_id — Required for context persistence across turns
  2. Set appropriate bounds — Smaller bounds = less context but faster
  3. Scope per user/session — Use unique thread_id per conversation
  4. Monitor state — Check entity/action counts for debugging

Examples

See working examples: - examples/memory/acc.py — ACC usage patterns - examples/advanced/content_review.py — ACC with Memory integration

API Reference

BoundedMemoryState

class BoundedMemoryState:
    def __init__(
        self,
        max_constraints: int = 10,
        max_entities: int = 50,
        max_actions: int = 30,
        max_context: int = 20,
    ):
        """Initialize bounded state with category limits."""

    @property
    def constraints(self) -> list[str]: ...
    @property
    def entities(self) -> list[str]: ...
    @property
    def actions(self) -> list[str]: ...
    @property
    def context(self) -> list[str]: ...

AgentCognitiveCompressor

class AgentCognitiveCompressor:
    def __init__(
        self,
        state: BoundedMemoryState,
        forget_gate: SemanticForgetGate | None = None,
    ):
        """Initialize ACC with bounded state."""

    async def update_from_turn(
        self,
        user_message: str,
        assistant_message: str,
        tool_calls: list[dict],
        current_task: str,
    ) -> None:
        """Update memory state from a conversation turn."""

Further Reading