# Models Module

The `cogent.models` module provides a three-tier API for working with LLMs, from simple string-based models to full control with direct SDK access.

## 🎯 3-Tier Model API
Cogent offers three levels of abstraction; choose the one that fits your needs.

### Tier 1: High-Level (String Models) ⭐ Recommended

The simplest way to get started is to pass a model name string:
```python
from cogent import Agent

# Auto-resolves to gpt-4.1
agent = Agent("Helper", model="gpt4")

# Auto-resolves to gemini-2.5-flash
agent = Agent("Helper", model="gemini")

# Auto-resolves to claude-sonnet-4-6
agent = Agent("Helper", model="claude")

# Provider prefix for explicit control
agent = Agent("Helper", model="anthropic:claude-opus-4")
agent = Agent("Helper", model="openai:gpt-5.4")

# Reasoning effort with @suffix
agent = Agent("Helper", model="o3-mini@high")
agent = Agent("Helper", model="claude@high")
agent = Agent("Helper", model="gemini-pro@16k")  # thinking budget
```
@effort Shorthand:
Append `@low`, `@medium`, `@high`, or `@max` to any model string to set the reasoning effort, or `@<number>` / `@<number>k` to set a thinking budget in tokens:

| Suffix | Meaning | Example |
|---|---|---|
| `@low` / `@medium` / `@high` / `@max` | Named effort level | `"o3-mini@high"` |
| `@8192` / `@16k` | Thinking budget in tokens | `"claude-sonnet@16k"` |
The suffix is mapped to the correct provider-native parameter automatically (reasoning_effort for OpenAI/xAI, adaptive_thinking + effort for Anthropic, thinking_budget for Gemini). Explicit model_kwargs override the @ suffix.
Model Aliases (core providers):
- gpt4, gpt4o, gpt4o-mini, gpt4-mini, gpt4-turbo, gpt35, gpt5, gpt5-mini, gpt5-nano
- claude, claude-sonnet, claude-opus, claude-haiku
- gemini, gemini-flash, gemini-flash-lite, gemini-pro, gemini3, gemini-3.1 ⚠️
- deepseek, deepseek-r1
- ollama
For other providers, use the explicit provider:model syntax:
- groq:llama-3.3-70b-versatile, xai:grok-4.20, mistral:mistral-large-latest
- cerebras:llama3.1-8b, cohere:command-a-03-2025, openrouter:openai/gpt-4o
⚠️ = Preview model (not production-ready)
API Key Loading (Priority Order):
1. Explicit api_key= parameter (highest)
2. Environment variables (including values loaded from `.env`)
3. Config file cogent.toml / cogent.yaml or ~/.cogent/config.* (lowest)
### Tier 2: Medium-Level (Factory Functions)

Use this tier when you need a model instance without an agent. `create_chat` supports four usage patterns:

```python
from cogent.models import create_chat

# Pattern 1: Model name only (auto-detects provider)
llm = create_chat("gpt-5.4")  # OpenAI
llm = create_chat("gemini-2.5-pro")  # Google Gemini
llm = create_chat("claude-sonnet-4")  # Anthropic
llm = create_chat("llama-3.1-8b-instant")  # Groq
llm = create_chat("mistral-small-latest")  # Mistral

# Pattern 2: Provider:model syntax (explicit provider prefix)
llm = create_chat("openai:gpt-5.4")
llm = create_chat("gemini:gemini-2.5-flash")
llm = create_chat("anthropic:claude-sonnet-4-20250514")

# Pattern 3: Separate provider and model arguments
llm = create_chat("openai", "gpt-5.4")
llm = create_chat("gemini", "gemini-2.5-pro")
llm = create_chat("anthropic", "claude-sonnet-4")

# Pattern 4: With additional configuration
llm = create_chat("gpt-5.4", temperature=0.7, max_tokens=1000)
llm = create_chat("openai", "gpt-5.4", api_key="sk-custom...")

# Use the model (inside an async function or a notebook with top-level await)
response = await llm.ainvoke("What is 2+2?")
print(response.content)
```
Auto-Detection: When no provider is given (Pattern 1, and the first call in Pattern 4), the provider is inferred from the model-name prefix:
- OpenAI: gpt-, o1-, o3-, o4-, text-embedding-, gpt-audio, gpt-realtime, sora-
- Gemini: gemini-, text-embedding- (this prefix is also listed for OpenAI; use the `gemini:` prefix to route explicitly)
- Anthropic: claude-
- xAI: grok-
- DeepSeek: deepseek-
- Cerebras: llama3.1- (opinionated default; use `cerebras:*` for explicit routing)
- Mistral: mistral-, ministral-, magistral-, devstral-, codestral-, voxtral-, ocr-
- Cohere: command-, c4ai-aya-, embed-, rerank-
- Groq: llama-, mixtral-, qwen-, gemma-
- Cloudflare: @cf/
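The prefix rules above amount to an ordered lookup. This is a minimal sketch of that idea, not cogent's resolver, and it makes several assumptions:

- `detect_provider`, `PREFIX_TABLE` and `KNOWN_PROVIDERS` are hypothetical names.
- A `provider:` prefix is honored only when it names a known provider, so an Ollama-style tag such as `qwen2.5:7b` is not mistaken for one. Because no prefix matches that tag either, it raises `ValueError`.
- The first matching prefix wins, which sends the shared `text-embedding-` prefix to OpenAI.

```python
# Ordered (prefix, provider) rules; the first match wins.
PREFIX_TABLE: list[tuple[str, str]] = [
    # OpenAI ("gpt-" also covers gpt-audio / gpt-realtime)
    ("gpt-", "openai"), ("o1-", "openai"), ("o3-", "openai"), ("o4-", "openai"),
    ("text-embedding-", "openai"), ("sora-", "openai"),
    ("gemini-", "gemini"),
    ("claude-", "anthropic"),
    ("grok-", "xai"),
    ("deepseek-", "deepseek"),
    ("llama3.1-", "cerebras"),
    ("mistral-", "mistral"), ("ministral-", "mistral"), ("magistral-", "mistral"),
    ("devstral-", "mistral"), ("codestral-", "mistral"), ("voxtral-", "mistral"),
    ("ocr-", "mistral"),
    ("command-", "cohere"), ("c4ai-aya-", "cohere"), ("embed-", "cohere"),
    ("rerank-", "cohere"),
    ("llama-", "groq"), ("mixtral-", "groq"), ("qwen-", "groq"), ("gemma-", "groq"),
    ("@cf/", "cloudflare"),
]

# Providers accepted in explicit "provider:model" strings.
KNOWN_PROVIDERS = {p for _, p in PREFIX_TABLE} | {"azure", "ollama", "openrouter", "custom"}


def detect_provider(spec: str) -> tuple[str, str]:
    """Return (provider, model) for a model string (hypothetical helper)."""
    head, sep, rest = spec.partition(":")
    if sep and head in KNOWN_PROVIDERS:
        return head, rest  # explicit provider:model form
    for prefix, provider in PREFIX_TABLE:
        if spec.startswith(prefix):
            return provider, spec
    raise ValueError(f"cannot infer provider for {spec!r}; use 'provider:model'")
```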
### Tier 3: Low-Level (Direct Model Classes)

For maximum control over model configuration:

```python
from cogent.models import OpenAIChat, AnthropicChat, GeminiChat

# Full control over all parameters
model = OpenAIChat(
    model="gpt-5.4",
    temperature=0.7,
    max_tokens=2000,
    api_key="sk-...",
    organization="org-...",
)

model = GeminiChat(
    model="gemini-2.5-flash",
    temperature=0.9,
    api_key="...",
)

model = AnthropicChat(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    api_key="sk-ant-...",
)
```
When to Use Each Tier:
| Tier | Use Case | Example |
|---|---|---|
| Tier 1 (Strings) | Quick prototyping, simple agents | Agent(model="gpt4") |
| Tier 2 (Factory) | Reusable model instances | create_chat("claude") |
| Tier 3 (Direct) | Custom config, advanced features | OpenAIChat(temperature=0.9) |
## Model Catalog

ModelCatalog is a queryable collection of model metadata populated by live provider API calls. There is no bundled static catalog: every fetch hits the real API and returns current model IDs, pricing, and context windows.

### Fetch models from a provider

```python
from cogent.models.catalog import ModelCatalog

# Fetch from one provider (await requires an async context)
catalog = await ModelCatalog.from_provider("openai")

# List active models
for m in catalog.list_models():
    print(m.id, m.context_window)

# Query helpers
catalog.list_models(provider="openai", capability="tools")
catalog.get_model("gpt-5.4")
catalog.is_available("claude-sonnet-4-6")
catalog.find_latest(family="gpt-5.4")
catalog.cheapest(capability="tools", by="input")
catalog.summary()  # {provider: {status: count}}
```
Supported provider names: "openai", "anthropic", "groq", "mistral", "gemini", "xai", "deepseek", "cerebras", "cohere", "openrouter".
### Fetch from OpenRouter (all providers in one call)

OpenRouter exposes 200+ models from all major providers with live pricing and context-window data in a single request.

```python
catalog = await ModelCatalog.from_openrouter()

# Models from every provider, with pricing
for m in catalog.list_models(status=None):
    print(f"{m.provider}/{m.id} ${m.input_cost_per_1m}/1M in")

# Find the cheapest tool-capable model across all providers
best = catalog.cheapest(capability="tools")
print(best.id, best.input_cost_per_1m)
```
### Cache results locally

Save a snapshot to disk and reload it for offline use or to avoid redundant API calls.

```python
# Save
catalog = await ModelCatalog.from_openrouter()
catalog.save("~/.cogent/models.json")

# Load
catalog = ModelCatalog.load("~/.cogent/models.json")
print(catalog.fetched_at)  # ISO-8601 timestamp of when it was fetched
```

The saved format is `{"fetched_at": "...", "models": [...]}`, a plain snapshot with no schema versioning. Use `fetched_at` to decide whether to refresh.
### `discover_models.py` script

The `scripts/discover_models.py` utility probes provider APIs and prints available models. It delegates to `ModelCatalog.from_provider()` internally.

```bash
# Print all providers
uv run python scripts/discover_models.py

# Single provider
uv run python scripts/discover_models.py --provider anthropic

# OpenRouter (200+ models with pricing)
uv run python scripts/discover_models.py --provider openrouter

# Save to cache
uv run python scripts/discover_models.py --save ~/.cogent/models.json
```
## Configuration

### .env File (Recommended for Development)

Create a `.env` file in your project root:

```bash
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
GROQ_API_KEY=gsk_...
```

Cogent automatically loads `.env` files using python-dotenv.
### Model Overrides (Environment + Config)

You can override the default chat or embedding model for each provider via environment variables or config files.

Environment variables (highest priority):

```bash
OPENAI_CHAT_MODEL=gpt-4.1
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
GEMINI_CHAT_MODEL=gemini-2.5-flash
GEMINI_EMBEDDING_MODEL=gemini-embedding-001
MISTRAL_CHAT_MODEL=mistral-small-latest
MISTRAL_EMBEDDING_MODEL=mistral-embed
GROQ_CHAT_MODEL=llama-3.1-8b-instant
COHERE_CHAT_MODEL=command-a-03-2025
COHERE_EMBEDDING_MODEL=embed-english-v3.0
CLOUDFLARE_CHAT_MODEL=@cf/meta/llama-3.1-8b-instruct
CLOUDFLARE_EMBEDDING_MODEL=@cf/baai/bge-base-en-v1.5
GITHUB_CHAT_MODEL=gpt-4.1
GITHUB_EMBEDDING_MODEL=text-embedding-3-large
OLLAMA_CHAT_MODEL=qwen2.5:7b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

Config file (fallback, used when no environment variable is set): see the config file locations below.
### Config File (Recommended for Production)

Create a config file at one of these locations:

TOML format (`cogent.toml` or `~/.cogent/config.toml`):

```toml
[models]
default = "gpt4"

[models.openai]
api_key = "sk-..."
organization = "org-..."

[models.anthropic]
api_key = "sk-ant-..."

[models.gemini]
api_key = "..."

[models.groq]
api_key = "gsk_..."
```

YAML format (`cogent.yaml` or `~/.cogent/config.yaml`):

```yaml
models:
  default: gpt4
  openai:
    api_key: sk-...
    organization: org-...
  anthropic:
    api_key: sk-ant-...
  gemini:
    api_key: "..."
```
### Environment Variables

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=AIza...
export GROQ_API_KEY=gsk_...
```
## Provider Support

All chat models accept three input formats:

### 1. Simple String (Most Convenient)

Pass a prompt string directly, for example `response = await model.ainvoke("Hello")`.

### 2. List of Dicts (Standard Format)

```python
response = await model.ainvoke([
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"},
])
```

### 3. Message Objects (Type-Safe)

```python
from cogent.core.messages import SystemMessage, HumanMessage

response = await model.ainvoke([
    SystemMessage(content="You are helpful"),
    HumanMessage(content="Hello"),
])
```
## Factory Function

Create models dynamically by provider:

```python
from cogent.models import create_chat, create_embedding

# OpenAI
model = create_chat("openai", model="gpt-5.4")

# Azure (AzureEntraAuth must also be imported; see the Providers guide)
model = create_chat(
    "azure",
    deployment="gpt-5.4",
    azure_endpoint="https://your-resource.openai.azure.com",
    entra=AzureEntraAuth(method="default"),
)

# Anthropic
model = create_chat("anthropic", model="claude-sonnet-4-20250514")

# Groq
model = create_chat("groq", model="llama-3.3-70b-versatile")

# Gemini
model = create_chat("gemini", model="gemini-2.0-flash")

# Ollama
model = create_chat("ollama", model="llama3.2")

# xAI (Grok)
model = create_chat("xai", model="grok-4-1-fast")

# DeepSeek
model = create_chat("deepseek", model="deepseek-chat")
model = create_chat("deepseek", model="deepseek-reasoner")  # Reasoning model

# Custom
model = create_chat(
    "custom",
    base_url="http://localhost:8000/v1",
    model="my-model",
)
```
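A factory of this kind can be understood as a lookup from provider name to class, with keyword arguments forwarded to the constructor. The sketch below illustrates that pattern with stand-in classes. `StubChat`, `CHAT_REGISTRY` and `make_chat` are hypothetical, and this is not cogent's implementation.

```python
from typing import Any


class StubChat:
    """Hypothetical stand-in for a provider chat class."""

    def __init__(self, model: str, **options: Any) -> None:
        self.model = model
        self.options = options


class OpenAIStub(StubChat):
    pass


class AnthropicStub(StubChat):
    pass


# Provider name -> class; supporting a new provider is one more entry.
CHAT_REGISTRY: dict[str, type[StubChat]] = {
    "openai": OpenAIStub,
    "anthropic": AnthropicStub,
}


def make_chat(provider: str, **kwargs: Any) -> StubChat:
    """Look up the class for `provider` and forward keyword arguments to it."""
    try:
        cls = CHAT_REGISTRY[provider]
    except KeyError:
        known = ", ".join(sorted(CHAT_REGISTRY))
        raise ValueError(f"unknown provider {provider!r} (known: {known})") from None
    return cls(**kwargs)
```

A registry keeps the dispatch logic independent of the provider list, and an unknown name fails early with a message listing the valid choices.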
## Mock Models

For testing without API calls:

```python
from cogent.models import MockChatModel, MockEmbedding

# Predictable responses, returned in order
model = MockChatModel(responses=["Hello!", "How can I help?"])
response = await model.ainvoke([{"role": "user", "content": "Hi"}])
print(response.content)  # "Hello!"
response = await model.ainvoke([{"role": "user", "content": "Help"}])
print(response.content)  # "How can I help?"

# Mock embeddings (embed_texts is part of the BaseEmbedding API)
embeddings = MockEmbedding(dimension=384)
vectors = await embeddings.embed_texts(["test"])
print(len(vectors[0]))  # 384
```
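## Base Classes

### BaseChatModel

Protocol implemented by all chat models:

```python
from __future__ import annotations

from collections.abc import AsyncIterator
from typing import Protocol

# Import the real protocol with: from cogent.models.base import BaseChatModel
# Outline of its interface (AIMessage and BaseTool are cogent types):
class BaseChatModel(Protocol):
    async def ainvoke(
        self,
        messages: list[dict],
        **kwargs,
    ) -> AIMessage: ...

    async def astream(
        self,
        messages: list[dict],
        **kwargs,
    ) -> AsyncIterator[AIMessage]: ...

    def bind_tools(
        self,
        tools: list[BaseTool],
    ) -> BaseChatModel: ...
```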
### AIMessage

Response type returned by chat models:

```python
from dataclasses import dataclass
from typing import Any

# Import the real class with: from cogent.models.base import AIMessage
# Field layout:
@dataclass
class AIMessage:
    content: str
    tool_calls: list[dict] | None = None
    usage: dict | None = None  # {"input_tokens": ..., "output_tokens": ...}
    raw: Any = None  # Original provider response
```
### BaseEmbedding

Standardized interface for all embedding models:

```python
from abc import ABC, abstractmethod

from cogent.core.messages import EmbeddingResult

# Import the real class with: from cogent.models.base import BaseEmbedding
# Outline of its interface (method bodies elided):
class BaseEmbedding(ABC):
    # Primary methods - return full metadata
    @abstractmethod
    def embed(self, texts: list[str]) -> EmbeddingResult:
        """Embed texts synchronously with metadata."""
        ...

    @abstractmethod
    async def aembed(self, texts: list[str]) -> EmbeddingResult:
        """Embed texts asynchronously with metadata."""
        ...

    # Convenience methods - single text, no metadata
    def embed_one(self, text: str) -> list[float]:
        """Embed single text synchronously, returns vector only."""
        ...

    async def aembed_one(self, text: str) -> list[float]:
        """Embed single text asynchronously, returns vector only."""
        ...

    # VectorStore protocol - async, no metadata
    async def embed_texts(self, texts: list[str]) -> list[list[float]]:
        """Embed texts for VectorStore (async, returns vectors only)."""
        ...

    async def embed_query(self, text: str) -> list[float]:
        """Embed query for VectorStore (async, returns vector only)."""
        ...

    @property
    def dimension(self) -> int:
        """Return embedding dimension."""
        ...
```

All 9 providers implement this API:

- OpenAIEmbedding
- AzureOpenAIEmbedding
- OllamaEmbedding
- CohereEmbedding
- GeminiEmbedding
- CloudflareEmbedding
- MistralEmbedding
- CustomEmbedding
- MockEmbedding
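Vectors returned by `embed_texts` and `embed_query` are commonly compared with cosine similarity, for example to rank documents against a query. Below is a dependency-free sketch of that comparison. The helper names are hypothetical and not part of cogent. With a real model, `query_vec` would come from `await embedder.embed_query(query)` and `doc_vecs` from `await embedder.embed_texts(docs)`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_vec: list[float], doc_vecs: list[list[float]], docs: list[str]
) -> list[tuple[float, str]]:
    """Return (score, doc) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```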
## API Reference

### ChatModel Aliases

| Alias | Actual Class |
|---|---|
| `ChatModel` | `OpenAIChat` |
| `EmbeddingModel` | `OpenAIEmbedding` |

### Provider Classes

| Provider | Chat Class | Embedding Class |
|---|---|---|
| OpenAI | `OpenAIChat` | `OpenAIEmbedding` |
| Azure | `AzureOpenAIChat` | `AzureOpenAIEmbedding` |
| Anthropic | `AnthropicChat` | - |
| Groq | `GroqChat` | - |
| Gemini | `GeminiChat` | `GeminiEmbedding` |
| xAI | `XAIChat` | - |
| DeepSeek | `DeepSeekChat` | - |
| Ollama | `OllamaChat` | `OllamaEmbedding` |
| OpenRouter | `OpenRouterChat` | - |
| Custom | `CustomChat` | `CustomEmbedding` |

### Factory Functions

| Function | Description |
|---|---|
| `create_chat(provider, **kwargs)` | Create a chat model for any provider |
| `create_embedding(provider, **kwargs)` | Create an embedding model for any provider |
## Further Reading

- Providers: setup for OpenAI, Anthropic, Gemini, Groq, Azure, xAI, DeepSeek, Ollama, OpenRouter, and more
- Embeddings: standardized embedding API across 9 providers
- Reasoning & Streaming: thinking models, streaming metadata, and structured output from models