Models Module

The cogent.models module provides a 3-tier API for working with LLMs - from simple string-based models to full control with direct SDK access.

🎯 3-Tier Model API

Cogent offers three levels of abstraction - choose based on your needs:

Tier 1: High-Level (String Models)

The simplest way to get started. Pass a model name string:

from cogent import Agent

# Auto-resolves to gpt-4.1
agent = Agent("Helper", model="gpt4")

# Auto-resolves to gemini-2.5-flash
agent = Agent("Helper", model="gemini")

# Auto-resolves to claude-sonnet-4-6
agent = Agent("Helper", model="claude")

# Provider prefix for explicit control
agent = Agent("Helper", model="anthropic:claude-opus-4")
agent = Agent("Helper", model="openai:gpt-5.4")

# Reasoning effort with @suffix
agent = Agent("Helper", model="o3-mini@high")
agent = Agent("Helper", model="claude@high")
agent = Agent("Helper", model="gemini-pro@16k")  # thinking budget

@effort Shorthand:

Append @low, @medium, @high, or @max to any model string to set reasoning effort. Use @<number> or @<number>k for a token thinking budget:

| Suffix | Meaning | Example |
| --- | --- | --- |
| @low / @medium / @high / @max | Named effort level | "o3-mini@high" |
| @8192 / @16k | Token thinking budget | "claude-sonnet@16k" |

The suffix is mapped to the correct provider-native parameter automatically (reasoning_effort for OpenAI/xAI, adaptive_thinking + effort for Anthropic, thinking_budget for Gemini). Explicit model_kwargs override the @ suffix.
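
The suffix handling described above can be sketched in a few lines. This is a minimal illustration, not Cogent's actual parser: `split_effort_suffix` and `merge_kwargs` are hypothetical names, the option keys are placeholders rather than provider-native parameters, and treating `k` as 1024 tokens is an assumption (the library may use 1000).

```python
import re

NAMED_EFFORTS = {"low", "medium", "high", "max"}
_BUDGET_RE = re.compile(r"^(\d+)(k?)$")


def split_effort_suffix(model: str) -> tuple:
    """Split 'name@suffix' into (name, reasoning options). Hypothetical helper."""
    base, sep, suffix = model.rpartition("@")
    # No '@' at all, or '@' is the first character (e.g. Cloudflare "@cf/..." IDs).
    if not sep or not base:
        return model, {}
    suffix = suffix.lower()
    if suffix in NAMED_EFFORTS:
        return base, {"effort": suffix}
    match = _BUDGET_RE.match(suffix)
    if match:
        tokens = int(match.group(1))
        if match.group(2):
            tokens *= 1024  # ASSUMPTION: "k" means 1024 tokens, not 1000
        return base, {"thinking_budget": tokens}
    # Unrecognized suffix: leave the model string untouched.
    return model, {}


def merge_kwargs(suffix_options: dict, model_kwargs: dict) -> dict:
    """Explicit model_kwargs win over anything derived from the @ suffix."""
    return {**suffix_options, **model_kwargs}
```

Splitting on the last `@` and ignoring a leading `@` keeps Cloudflare-style IDs such as `@cf/...` intact when no effort suffix is present.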

Model Aliases (core providers):

  • gpt4, gpt4o, gpt4o-mini, gpt4-mini, gpt4-turbo, gpt35, gpt5, gpt5-mini, gpt5-nano
  • claude, claude-sonnet, claude-opus, claude-haiku
  • gemini, gemini-flash, gemini-flash-lite, gemini-pro, gemini3, gemini-3.1 ⚠️
  • deepseek, deepseek-r1
  • ollama

Other providers (use explicit provider:model syntax):

  • groq:llama-3.3-70b-versatile, xai:grok-4.20, mistral:mistral-large-latest
  • cerebras:llama3.1-8b, cohere:command-a-03-2025, openrouter:openai/gpt-4o

⚠️ = Preview model (not production-ready)

API Key Loading (Priority Order):

  1. Explicit api_key= parameter (highest)
  2. Environment variables (includes .env when loaded)
  3. Config file cogent.toml / cogent.yaml or ~/.cogent/config.* (lowest)
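
The priority order above amounts to a first-match lookup. The sketch below assumes the `<PROVIDER>_API_KEY` naming seen in the `.env` examples and a config mapping shaped like the `[models.<provider>]` tables; `resolve_api_key` is a hypothetical helper, not part of the Cogent API.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider: str,
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config: Optional[Mapping[str, Mapping[str, str]]] = None,
) -> Optional[str]:
    """Return the first key found: explicit argument, then env var, then config."""
    env = os.environ if env is None else env
    config = {} if config is None else config
    # 1. Explicit api_key= parameter (highest priority)
    if explicit:
        return explicit
    # 2. Environment variable, e.g. OPENAI_API_KEY
    env_value = env.get(f"{provider.upper()}_API_KEY")
    if env_value:
        return env_value
    # 3. Config file section for this provider (lowest priority)
    return config.get(provider, {}).get("api_key")
```

Passing `env` and `config` as plain mappings keeps the lookup easy to test without touching the real environment.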

Tier 2: Medium-Level (Factory Functions)

Use the factory when you need a model instance without an agent. It supports four usage patterns:

from cogent.models import create_chat

# Pattern 1: Model name only (auto-detects provider)
llm = create_chat("gpt-5.4")              # OpenAI
llm = create_chat("gemini-2.5-pro")      # Google Gemini
llm = create_chat("claude-sonnet-4")     # Anthropic
llm = create_chat("llama-3.1-8b-instant")  # Groq
llm = create_chat("mistral-small-latest")  # Mistral

# Pattern 2: Provider:model syntax (explicit provider prefix)
llm = create_chat("openai:gpt-5.4")
llm = create_chat("gemini:gemini-2.5-flash")
llm = create_chat("anthropic:claude-sonnet-4-20250514")

# Pattern 3: Separate provider and model arguments
llm = create_chat("openai", "gpt-5.4")
llm = create_chat("gemini", "gemini-2.5-pro")
llm = create_chat("anthropic", "claude-sonnet-4")

# Pattern 4: With additional configuration
llm = create_chat("gpt-5.4", temperature=0.7, max_tokens=1000)
llm = create_chat("openai", "gpt-5.4", api_key="sk-custom...")

# Use the model
response = await llm.ainvoke("What is 2+2?")
print(response.content)

Auto-Detection: Patterns 1 and 2 automatically detect the provider from model name prefixes:

  • OpenAI: gpt-, o1-, o3-, o4-, text-embedding-, gpt-audio, gpt-realtime, sora-
  • Gemini: gemini-, text-embedding-
  • Anthropic: claude-
  • xAI: grok-
  • DeepSeek: deepseek-
  • Cerebras: llama3.1- (opinionated default; use cerebras:* for explicit routing)
  • Mistral: mistral-, ministral-, magistral-, devstral-, codestral-, voxtral-, ocr-
  • Cohere: command-, c4ai-aya-, embed-, rerank-
  • Groq: llama-, mixtral-, qwen-, gemma-
  • Cloudflare: @cf/
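
Prefix-based detection of this kind can be sketched as an ordered table lookup. The code below is an illustration under stated assumptions, not Cogent's resolver: `detect_provider` is a hypothetical name, only chat-model prefixes from the list above are included, and the `text-embedding-` prefix is left out because the list assigns it to both OpenAI and Gemini.

```python
from typing import Optional

KNOWN_PROVIDERS = {
    "openai", "gemini", "anthropic", "xai", "deepseek",
    "cerebras", "mistral", "cohere", "groq", "cloudflare",
}

# Ordered (provider, prefixes) pairs; the first match wins.
PREFIX_TABLE = [
    ("openai", ("gpt-", "o1-", "o3-", "o4-")),
    ("gemini", ("gemini-",)),
    ("anthropic", ("claude-",)),
    ("xai", ("grok-",)),
    ("deepseek", ("deepseek-",)),
    ("cerebras", ("llama3.1-",)),
    ("mistral", ("mistral-", "ministral-", "magistral-", "devstral-", "codestral-")),
    ("cohere", ("command-", "c4ai-aya-")),
    ("groq", ("llama-", "mixtral-", "qwen-", "gemma-")),
    ("cloudflare", ("@cf/",)),
]


def detect_provider(model: str) -> Optional[str]:
    """Return a provider name for a model string, or None if nothing matches."""
    # An explicit "provider:model" prefix takes precedence over name matching.
    head, sep, _ = model.partition(":")
    if sep and head in KNOWN_PROVIDERS:
        return head
    for provider, prefixes in PREFIX_TABLE:
        if model.startswith(prefixes):  # str.startswith accepts a tuple
            return provider
    return None
```

Checking that the text before `:` is a known provider avoids misreading tags such as `qwen2.5:7b`, where the colon is part of the model name.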

Tier 3: Low-Level (Direct Model Classes)

For maximum control over model configuration:

from cogent.models import OpenAIChat, AnthropicChat, GeminiChat

# Full control over all parameters
model = OpenAIChat(
    model="gpt-5.4",
    temperature=0.7,
    max_tokens=2000,
    api_key="sk-...",
    organization="org-...",
)

model = GeminiChat(
    model="gemini-2.5-flash",
    temperature=0.9,
    api_key="...",
)

model = AnthropicChat(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    api_key="sk-ant-...",
)

When to Use Each Tier:

| Tier | Use Case | Example |
| --- | --- | --- |
| Tier 1 (Strings) | Quick prototyping, simple agents | Agent(model="gpt4") |
| Tier 2 (Factory) | Reusable model instances | create_chat("claude") |
| Tier 3 (Direct) | Custom config, advanced features | OpenAIChat(temperature=0.9) |

Model Catalog

ModelCatalog is a queryable collection of model metadata populated by live provider API calls. There is no bundled static catalog — every fetch hits the real API and returns current model IDs, pricing, and context windows.

Fetch models from a provider

from cogent.models.catalog import ModelCatalog

# Fetch from one provider
catalog = await ModelCatalog.from_provider("openai")

# List active models
for m in catalog.list_models():
    print(m.id, m.context_window)

# Query helpers
catalog.list_models(provider="openai", capability="tools")
catalog.get_model("gpt-5.4")
catalog.is_available("claude-sonnet-4-6")
catalog.find_latest(family="gpt-5.4")
catalog.cheapest(capability="tools", by="input")
catalog.summary()   # {provider: {status: count}}

Supported provider names: "openai", "anthropic", "groq", "mistral", "gemini", "xai", "deepseek", "cerebras", "cohere", "openrouter".

Fetch from OpenRouter (all providers in one call)

OpenRouter exposes 200+ models from all major providers with live pricing and context-window data in a single request.

catalog = await ModelCatalog.from_openrouter()

# Models from every provider, with pricing
for m in catalog.list_models(status=None):
    print(f"{m.provider}/{m.id}  ${m.input_cost_per_1m}/1M in")

# Find the cheapest tool-capable model across all providers
best = catalog.cheapest(capability="tools")
print(best.id, best.input_cost_per_1m)
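
To make the `cheapest(capability="tools")` query concrete, here is a self-contained sketch of the selection logic on stand-in data. `ModelInfo` is a local stand-in, the `capabilities` field is an assumption about how capability tags are stored, and the model IDs and prices are invented placeholders rather than real catalog data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelInfo:
    """Local stand-in for a catalog entry (not the library's class)."""
    id: str
    provider: str
    input_cost_per_1m: Optional[float]
    capabilities: set = field(default_factory=set)  # ASSUMPTION: hypothetical field


def cheapest(models, capability=None):
    """Return the lowest input-cost model, optionally filtered by capability."""
    candidates = [
        m for m in models
        if m.input_cost_per_1m is not None  # skip entries without pricing
        and (capability is None or capability in m.capabilities)
    ]
    return min(candidates, key=lambda m: m.input_cost_per_1m, default=None)


# Placeholder entries for illustration only.
models = [
    ModelInfo("model-a", "provider-x", 2.50, {"tools"}),
    ModelInfo("model-b", "provider-y", 0.10, set()),
    ModelInfo("model-c", "provider-y", 0.40, {"tools"}),
    ModelInfo("model-d", "provider-z", None, {"tools"}),
]
best = cheapest(models, capability="tools")
```

Entries without pricing are skipped rather than treated as free, which is a design choice of this sketch.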

Cache results locally

Save a snapshot to disk and reload it for offline use or to avoid redundant API calls.

# Save
catalog = await ModelCatalog.from_openrouter()
catalog.save("~/.cogent/models.json")

# Load
catalog = ModelCatalog.load("~/.cogent/models.json")
print(catalog.fetched_at)   # ISO-8601 timestamp of when it was fetched

The saved format is {"fetched_at": "...", "models": [...]} — a plain snapshot with no schema versioning. Use fetched_at to decide whether to refresh.
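
A refresh decision based on fetched_at could look like the sketch below. It works on the plain snapshot dictionary described above; `is_stale` is a hypothetical helper, and treating a timestamp without a timezone as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_stale(snapshot: dict, max_age: timedelta, now=None) -> bool:
    """Return True when the snapshot's fetched_at is older than max_age."""
    now = now or datetime.now(timezone.utc)
    raw = snapshot["fetched_at"]
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    fetched = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if fetched.tzinfo is None:
        # ASSUMPTION: timestamps without an offset are UTC
        fetched = fetched.replace(tzinfo=timezone.utc)
    return now - fetched > max_age
```

A caller might load the JSON file, call `is_stale(data, timedelta(days=1))`, and refetch from the provider only when it returns True.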

discover_models.py script

The scripts/discover_models.py utility probes provider APIs and prints available models. It delegates to ModelCatalog.from_provider() internally.

# Print all providers
uv run python scripts/discover_models.py

# Single provider
uv run python scripts/discover_models.py --provider anthropic

# OpenRouter (200+ models with pricing)
uv run python scripts/discover_models.py --provider openrouter

# Save to cache
uv run python scripts/discover_models.py --save ~/.cogent/models.json

Configuration

Create a .env file in your project root:

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
GROQ_API_KEY=gsk_...

Cogent automatically loads .env files using python-dotenv.

Model Overrides (Environment + Config)

You can override default chat or embedding models via env vars or config files.

Environment variables (highest):

OPENAI_CHAT_MODEL=gpt-4.1
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
GEMINI_CHAT_MODEL=gemini-2.5-flash
GEMINI_EMBEDDING_MODEL=gemini-embedding-001
MISTRAL_CHAT_MODEL=mistral-small-latest
MISTRAL_EMBEDDING_MODEL=mistral-embed
GROQ_CHAT_MODEL=llama-3.1-8b-instant
COHERE_CHAT_MODEL=command-a-03-2025
COHERE_EMBEDDING_MODEL=embed-english-v3.0
CLOUDFLARE_CHAT_MODEL=@cf/meta/llama-3.1-8b-instruct
CLOUDFLARE_EMBEDDING_MODEL=@cf/baai/bge-base-en-v1.5
GITHUB_CHAT_MODEL=gpt-4.1
GITHUB_EMBEDDING_MODEL=text-embedding-3-large
OLLAMA_CHAT_MODEL=qwen2.5:7b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

Config file (fallback):

[models.openai]
chat_model = "gpt-4.1"
embedding_model = "text-embedding-3-large"
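
The override precedence (environment variable first, config file as fallback) can be sketched as follows. `resolve_model` is a hypothetical helper; the `<PROVIDER>_<KIND>_MODEL` variable names and the `chat_model` / `embedding_model` keys follow the examples above, and the config argument is assumed to be the already-parsed file as a nested dictionary.

```python
def resolve_model(provider, kind, env, config, default=None):
    """Pick a model name for kind in {"chat", "embedding"}.

    Order: environment variable, then config file, then the given default.
    """
    env_name = f"{provider.upper()}_{kind.upper()}_MODEL"  # e.g. OPENAI_CHAT_MODEL
    if env.get(env_name):
        return env[env_name]
    section = config.get("models", {}).get(provider, {})
    return section.get(f"{kind}_model", default)


config = {"models": {"openai": {"chat_model": "gpt-4.1"}}}
from_env = resolve_model("openai", "chat", {"OPENAI_CHAT_MODEL": "gpt-5.4"}, config)
from_config = resolve_model("openai", "chat", {}, config)
```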

Create a config file at one of these locations:

TOML Format (cogent.toml or ~/.cogent/config.toml):

[models]
default = "gpt4"

[models.openai]
api_key = "sk-..."
organization = "org-..."

[models.anthropic]
api_key = "sk-ant-..."

[models.gemini]
api_key = "..."

[models.groq]
api_key = "gsk_..."

YAML Format (cogent.yaml or ~/.cogent/config.yaml):

models:
  default: gpt4

  openai:
    api_key: sk-...
    organization: org-...

  anthropic:
    api_key: sk-ant-...

  gemini:
    api_key: ...

Environment Variables

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=AIza...
export GROQ_API_KEY=gsk_...

Provider Support

All chat models accept three input formats:

1. Simple String (Most Convenient)

response = await model.ainvoke("What is the capital of France?")

2. List of Dicts (Standard Format)

response = await model.ainvoke([
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"},
])

3. Message Objects (Type-Safe)

from cogent.core.messages import SystemMessage, HumanMessage

response = await model.ainvoke([
    SystemMessage(content="You are helpful"),
    HumanMessage(content="Hello"),
])
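
One way to picture how the three formats converge is a normalization step that turns each of them into a list of role/content dictionaries. The sketch below uses local stand-in message classes instead of `cogent.core.messages`; `normalize_messages` is hypothetical, and treating a bare string as a single user message is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SystemMessage:  # local stand-in, not the cogent class
    content: str
    role: str = "system"


@dataclass
class HumanMessage:  # local stand-in, not the cogent class
    content: str
    role: str = "user"


def normalize_messages(messages):
    """Convert a string, list of dicts, or list of message objects to dicts."""
    if isinstance(messages, str):
        # ASSUMPTION: a bare string becomes one user message
        return [{"role": "user", "content": messages}]
    normalized = []
    for message in messages:
        if isinstance(message, dict):
            normalized.append({"role": message["role"], "content": message["content"]})
        else:
            normalized.append({"role": message.role, "content": message.content})
    return normalized
```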


Factory Function

Create models dynamically by provider:

from cogent.models import create_chat, create_embedding

# OpenAI
model = create_chat("openai", model="gpt-5.4")

# Azure (AzureEntraAuth must be imported separately; its module path is not shown in this section)
model = create_chat(
    "azure",
    deployment="gpt-5.4",
    azure_endpoint="https://your-resource.openai.azure.com",
    entra=AzureEntraAuth(method="default"),
)

# Anthropic
model = create_chat("anthropic", model="claude-sonnet-4-20250514")

# Groq
model = create_chat("groq", model="llama-3.3-70b-versatile")

# Gemini
model = create_chat("gemini", model="gemini-2.0-flash")

# Ollama
model = create_chat("ollama", model="llama3.2")

# xAI (Grok)
model = create_chat("xai", model="grok-4-1-fast")

# DeepSeek
model = create_chat("deepseek", model="deepseek-chat")
model = create_chat("deepseek", model="deepseek-reasoner")  # Reasoning model

# Custom
model = create_chat(
    "custom",
    base_url="http://localhost:8000/v1",
    model="my-model",
)

Mock Models

For testing without API calls:

from cogent.models import MockChatModel, MockEmbedding

# Predictable responses
model = MockChatModel(responses=["Hello!", "How can I help?"])

response = await model.ainvoke([{"role": "user", "content": "Hi"}])
print(response.content)  # "Hello!"

response = await model.ainvoke([{"role": "user", "content": "Help"}])
print(response.content)  # "How can I help?"

# Mock embeddings
embeddings = MockEmbedding(dimension=384)
vectors = await embeddings.embed_texts(["test"])
print(len(vectors[0]))  # 384
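
The testing pattern behind a mock model is that code under test depends only on the `ainvoke()` interface, so a scripted stand-in can replace a real provider. The sketch below is self-contained and does not import Cogent: `ScriptedChat`, `FakeResponse`, and `greet` are hypothetical names, and cycling through replies once they run out is an assumption about behavior, not a statement about MockChatModel.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    content: str


class ScriptedChat:
    """Stand-in mock: returns scripted replies in order and records calls."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []

    async def ainvoke(self, messages, **kwargs):
        self.calls.append(messages)
        # ASSUMPTION: wrap around when the scripted replies are exhausted
        index = (len(self.calls) - 1) % len(self._responses)
        return FakeResponse(content=self._responses[index])


async def greet(model, name):
    """Code under test: only relies on the ainvoke() interface."""
    response = await model.ainvoke([{"role": "user", "content": f"Greet {name}"}])
    return response.content.upper()


model = ScriptedChat(["Hello!"])
result = asyncio.run(greet(model, "Ada"))
```

Recording `calls` lets a test assert on the prompt that was sent as well as on the result.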


Base Classes

BaseChatModel

Protocol for all chat models:

from cogent.models.base import BaseChatModel

class BaseChatModel(Protocol):
    async def ainvoke(
        self,
        messages: list[dict],
        **kwargs,
    ) -> AIMessage: ...

    async def astream(
        self,
        messages: list[dict],
        **kwargs,
    ) -> AsyncIterator[AIMessage]: ...

    def bind_tools(
        self,
        tools: list[BaseTool],
    ) -> BaseChatModel: ...

AIMessage

Response type from chat models:

from cogent.models.base import AIMessage

@dataclass
class AIMessage:
    content: str
    tool_calls: list[dict] | None = None
    usage: dict | None = None  # {"input_tokens": ..., "output_tokens": ...}
    raw: Any = None  # Original provider response
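
Because usage is optional, code that aggregates token counts has to tolerate responses without it. The sketch below mirrors the dataclass fields locally so that it runs on its own; `total_usage` is a hypothetical helper, and the key names follow the comment on the usage field.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AIMessage:  # local mirror of the fields listed above
    content: str
    tool_calls: Optional[list] = None
    usage: Optional[dict] = None
    raw: Any = None


def total_usage(messages):
    """Sum input and output tokens across responses, skipping missing usage."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for message in messages:
        usage = message.usage or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


history = [
    AIMessage("first", usage={"input_tokens": 10, "output_tokens": 5}),
    AIMessage("second"),  # provider returned no usage data
    AIMessage("third", usage={"input_tokens": 7, "output_tokens": 3}),
]
totals = total_usage(history)
```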

BaseEmbedding

Standardized protocol for all embedding models:

from cogent.models.base import BaseEmbedding
from cogent.core.messages import EmbeddingResult

class BaseEmbedding(ABC):
    # Primary methods - return full metadata
    @abstractmethod
    def embed(self, texts: list[str]) -> EmbeddingResult:
        """Embed texts synchronously with metadata."""
        ...

    @abstractmethod
    async def aembed(self, texts: list[str]) -> EmbeddingResult:
        """Embed texts asynchronously with metadata."""
        ...

    # Convenience methods - single text, no metadata
    def embed_one(self, text: str) -> list[float]:
        """Embed single text synchronously, returns vector only."""
        ...

    async def aembed_one(self, text: str) -> list[float]:
        """Embed single text asynchronously, returns vector only."""
        ...

    # VectorStore protocol - async, no metadata
    async def embed_texts(self, texts: list[str]) -> list[list[float]]:
        """Embed texts for VectorStore (async, returns vectors only)."""
        ...

    async def embed_query(self, text: str) -> list[float]:
        """Embed query for VectorStore (async, returns vector only)."""
        ...

    @property
    def dimension(self) -> int:
        """Return embedding dimension."""
        ...
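
The layering in this interface (metadata-returning primary methods, with vector-only helpers built on top) can be illustrated with a toy embedder. This sketch does not subclass the real BaseEmbedding: `HashEmbedding` is hypothetical, the `EmbeddingResult` stand-in with an `embeddings` field is an assumption since its real fields are not shown here, and the hash-based vectors carry no semantic meaning.

```python
import asyncio
import hashlib
from dataclasses import dataclass


@dataclass
class EmbeddingResult:  # ASSUMPTION: stand-in; the real fields may differ
    embeddings: list


class HashEmbedding:
    """Toy embedder that derives deterministic vectors from SHA-256 digests."""

    def __init__(self, dimension: int = 8):
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        return self._dimension

    def embed(self, texts):
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale bytes to [0, 1]; repeat the digest if dimension > 32.
            vectors.append(
                [digest[i % len(digest)] / 255.0 for i in range(self._dimension)]
            )
        return EmbeddingResult(embeddings=vectors)

    async def aembed(self, texts):
        return self.embed(texts)

    def embed_one(self, text):
        return self.embed([text]).embeddings[0]

    async def embed_texts(self, texts):
        return (await self.aembed(texts)).embeddings

    async def embed_query(self, text):
        return (await self.embed_texts([text]))[0]


embedder = HashEmbedding(dimension=4)
vector = embedder.embed_one("hello")
query_vector = asyncio.run(embedder.embed_query("hello"))
```

Each convenience method delegates to `embed` or `aembed`, so a concrete class only has to get the two primary methods right.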

All 9 providers implement this API:

  • OpenAIEmbedding
  • AzureOpenAIEmbedding
  • OllamaEmbedding
  • CohereEmbedding
  • GeminiEmbedding
  • CloudflareEmbedding
  • MistralEmbedding
  • CustomEmbedding
  • MockEmbedding


API Reference

ChatModel Aliases

| Alias | Actual Class |
| --- | --- |
| ChatModel | OpenAIChat |
| EmbeddingModel | OpenAIEmbedding |

Provider Classes

| Provider | Chat Class | Embedding Class |
| --- | --- | --- |
| OpenAI | OpenAIChat | OpenAIEmbedding |
| Azure | AzureOpenAIChat | AzureOpenAIEmbedding |
| Anthropic | AnthropicChat | - |
| Groq | GroqChat | - |
| Gemini | GeminiChat | GeminiEmbedding |
| xAI | XAIChat | - |
| DeepSeek | DeepSeekChat | - |
| Ollama | OllamaChat | OllamaEmbedding |
| OpenRouter | OpenRouterChat | - |
| Custom | CustomChat | CustomEmbedding |

Factory Functions

| Function | Description |
| --- | --- |
| create_chat(provider, **kwargs) | Create chat model for any provider |
| create_embedding(provider, **kwargs) | Create embedding model for any provider |

Further Reading

  • Providers — Setup for OpenAI, Anthropic, Gemini, Groq, Azure, xAI, DeepSeek, Ollama, OpenRouter, and more
  • Embeddings — Standardized embedding API across 9 providers
  • Reasoning & Streaming — Thinking models, streaming metadata, structured output from models