Embeddings
All 9 embedding providers share one standardized API: batch and single-text methods, sync and async variants, and results that carry request metadata.
See Models Overview for the 3-tier API and factory functions.
Basic usage:
from cogent.models import OpenAIEmbedding, GeminiEmbedding, OllamaEmbedding
embedder = OpenAIEmbedding(model="text-embedding-3-small")
# Primary API: embed() / aembed() - returns EmbeddingResult with full metadata
# (the await calls below assume an async context, e.g. inside an async function)
result = await embedder.aembed(["Hello world", "Cogent"])
print(result.embeddings) # list[list[float]] - the actual vectors
print(result.metadata.model) # "text-embedding-3-small"
print(result.metadata.tokens) # TokenUsage(prompt=4, completion=0, total=4)
print(result.metadata.dimensions) # 1536
print(result.metadata.duration) # 0.181 seconds
print(result.metadata.num_texts) # 2
# Convenience: embed_one() / aembed_one() - Single text, returns vector only
vector = await embedder.aembed_one("Single text")
print(len(vector)) # 1536
# Sync versions
result = embedder.embed(["Text 1", "Text 2"])
vector = embedder.embed_one("Single text")
# VectorStore protocol: embed_texts() / embed_query() - Async, no metadata
vectors = await embedder.embed_texts(["Doc1", "Doc2"]) # list[list[float]]
query_vec = await embedder.embed_query("Search query") # list[float]
Standardized API Summary:
| Method | Input | Returns | Async | Metadata |
|---|---|---|---|---|
| embed(texts) | list[str] | EmbeddingResult | ❌ | ✅ |
| aembed(texts) | list[str] | EmbeddingResult | ✅ | ✅ |
| embed_one(text) | str | list[float] | ❌ | ❌ |
| aembed_one(text) | str | list[float] | ✅ | ❌ |
| embed_texts(texts) | list[str] | list[list[float]] | ✅ | ❌ |
| embed_query(text) | str | list[float] | ✅ | ❌ |
| dimension | property | int | - | - |
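One way to read the table is that every method can be derived from a single async batch call. The sketch below illustrates that layering with a toy class; `ToyEmbedder` and `ToyResult` are hypothetical stand-ins written for this example, and nothing here is taken from cogent's actual implementation.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyResult:
    """Hypothetical stand-in for EmbeddingResult (metadata omitted)."""
    embeddings: list[list[float]]


class ToyEmbedder:
    """Hypothetical embedder used only to illustrate the method layering."""

    dimension = 3  # fixed vector size for this toy

    async def aembed(self, texts: list[str]) -> ToyResult:
        # Fake vectors: [character count, vowel count, word count] per text.
        vectors = [
            [
                float(len(t)),
                float(sum(c in "aeiou" for c in t.lower())),
                float(len(t.split())),
            ]
            for t in texts
        ]
        return ToyResult(embeddings=vectors)

    def embed(self, texts: list[str]) -> ToyResult:
        # Sync wrapper; only valid when no event loop is already running.
        return asyncio.run(self.aembed(texts))

    async def aembed_one(self, text: str) -> list[float]:
        return (await self.aembed([text])).embeddings[0]

    def embed_one(self, text: str) -> list[float]:
        return self.embed([text]).embeddings[0]

    async def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return (await self.aembed(texts)).embeddings

    async def embed_query(self, text: str) -> list[float]:
        return await self.aembed_one(text)


embedder = ToyEmbedder()
batch = embedder.embed(["Hello world", "Cogent"])
single = embedder.embed_one("Hello world")
```

With this layering, the single-text and protocol methods return exactly the vectors that the batch call would produce for the same input, so callers can pick whichever entry point fits without changing results.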
Embedding Metadata
All 9 embedding providers return an EmbeddingMetadata object with every result. Token usage is populated only when the provider's API reports it:
| Provider | Token Usage | Notes |
|---|---|---|
| OpenAI | ✅ | Extracts from response.usage.prompt_tokens |
| Cohere | ✅ | Extracts from response.meta.billed_units.input_tokens |
| Mistral | ✅ | Uses OpenAI SDK, provides token counts |
| Azure OpenAI | ✅ | Extracts from response.usage like OpenAI |
| Gemini | ❌ | API doesn't provide token counts for embeddings |
| Ollama | ❌ | Local embeddings, no token tracking |
| Cloudflare | ❌ | API doesn't track tokens |
| Mock | ❌ | Test embedding, no real tokens |
| Custom | ⚡ | Conditional - depends on underlying API |
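Because several providers report no token counts, code that reads `metadata.tokens` should expect `None`. The sketch below shows one defensive pattern; the `TokenUsage` class is a local stand-in whose field names are assumed from the repr shown earlier (`TokenUsage(prompt=4, completion=0, total=4)`), and `billed_tokens` and `describe_usage` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    # Field names assumed from the printed repr in the usage example.
    prompt: int
    completion: int
    total: int


def billed_tokens(tokens: Optional[TokenUsage]) -> Optional[int]:
    """Return the total token count, or None when no usage was reported."""
    return None if tokens is None else tokens.total


def describe_usage(provider: str, tokens: Optional[TokenUsage]) -> str:
    """Format a one-line usage summary that distinguishes 'unknown' from zero."""
    total = billed_tokens(tokens)
    if total is None:
        return f"{provider}: token usage not reported"
    return f"{provider}: {total} tokens"


openai_line = describe_usage("OpenAI", TokenUsage(prompt=4, completion=0, total=4))
ollama_line = describe_usage("Ollama", None)
```

Keeping `None` distinct from `0` avoids reporting a provider as free when its usage is simply unknown.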
Metadata Structure:
@dataclass
class EmbeddingMetadata:
    id: str                     # Unique request ID
    timestamp: str              # ISO 8601 timestamp
    model: str | None           # Model name/version
    tokens: TokenUsage | None   # Token usage (if available)
    duration: float             # Request duration (seconds)
    dimensions: int | None      # Vector dimensions
    num_texts: int              # Number of texts embedded

@dataclass
class EmbeddingResult:
    embeddings: list[list[float]]   # The actual embedding vectors
    metadata: EmbeddingMetadata     # Complete metadata
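To make the fields concrete, here is a minimal sketch of how a custom backend might fill in this structure. The dataclasses are local mirrors of the ones above (with `tokens` loosely typed), and `embed_with_metadata` and the model name `"toy-embedding"` are hypothetical. The choice of a UUID for the request ID and `time.perf_counter()` for the duration is an assumption, not a description of how cogent generates these values.

```python
import time
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EmbeddingMetadata:  # local mirror of the structure above
    id: str
    timestamp: str
    model: Optional[str]
    tokens: Optional[object]  # simplified: the real field holds TokenUsage
    duration: float
    dimensions: Optional[int]
    num_texts: int


@dataclass
class EmbeddingResult:
    embeddings: list[list[float]]
    metadata: EmbeddingMetadata


def embed_with_metadata(texts: list[str]) -> EmbeddingResult:
    """Hypothetical backend: compute toy vectors and record request metadata."""
    started = time.perf_counter()
    # Placeholder "model": one 2-d vector per text (character count, word count).
    vectors = [[float(len(t)), float(len(t.split()))] for t in texts]
    metadata = EmbeddingMetadata(
        id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        model="toy-embedding",  # hypothetical model name
        tokens=None,  # this toy backend reports no token usage
        duration=time.perf_counter() - started,
        dimensions=len(vectors[0]) if vectors else None,
        num_texts=len(texts),
    )
    return EmbeddingResult(embeddings=vectors, metadata=metadata)


result = embed_with_metadata(["Text 1", "Text 2"])
```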
Usage Examples:
# Use case 1: Need metadata for cost tracking
result = await embedder.aembed(["Text 1", "Text 2"])
vectors = result.embeddings
tokens = result.metadata.tokens # Track token usage for billing
duration = result.metadata.duration # Monitor performance
# Use case 2: Simple embedding without metadata
vector = await embedder.aembed_one("Query text") # Just returns the vector
# Use case 3: VectorStore integration (protocol compliance)
# These methods are used internally by VectorStore
vectors = await embedder.embed_texts(["Document 1", "Document 2"])
query_vec = await embedder.embed_query("Search query")
# Use case 4: Sync batch embedding (large_batch is a list[str] defined elsewhere)
result = embedder.embed(large_batch) # Sync version for compatibility
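The async calls in these examples need a running event loop. In a plain script, one option is to wrap them in a coroutine and start it with `asyncio.run`, as in the sketch below. `StubEmbedder` and `index_and_query` are hypothetical names for this example; the stub only mimics the two protocol methods so that the snippet runs without any provider.

```python
import asyncio


class StubEmbedder:
    """Hypothetical stand-in exposing the async protocol methods."""

    async def embed_texts(self, texts: list[str]) -> list[list[float]]:
        # Toy 1-d vectors: the character count of each text.
        return [[float(len(t))] for t in texts]

    async def embed_query(self, text: str) -> list[float]:
        return [float(len(text))]


async def index_and_query(embedder) -> tuple[list[list[float]], list[float]]:
    # The two calls are independent, so they can be awaited concurrently.
    vectors, query_vec = await asyncio.gather(
        embedder.embed_texts(["Document 1", "Document 2"]),
        embedder.embed_query("Search query"),
    )
    return vectors, query_vec


# Entry point for a synchronous script: creates and closes the event loop.
vectors, query_vec = asyncio.run(index_and_query(StubEmbedder()))
```

Inside an already running loop (for example a notebook or an async web handler), `await index_and_query(...)` would be used directly instead of `asyncio.run`.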
Observability Benefits:
- Cost tracking — Monitor token usage across providers
- Performance — Track request duration and batch sizes
- Debugging — Trace requests with unique IDs and timestamps
- Model versioning — Know which embedding model version was used
- Capacity planning — Understand dimensions and text counts
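As an illustration of the cost and performance points above, the sketch below accumulates a few metadata fields across requests. `UsageTracker` is a hypothetical helper, not part of cogent; it takes plain values so that it can be fed from `result.metadata.num_texts`, `result.metadata.duration`, and the token total when one is reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageTracker:
    """Hypothetical accumulator for values read from EmbeddingMetadata."""
    requests: int = 0
    texts: int = 0
    tokens: int = 0
    seconds: float = 0.0
    untracked_requests: int = 0  # calls where the provider reported no tokens

    def record(self, num_texts: int, duration: float, total_tokens: Optional[int]) -> None:
        """Add one request's figures to the running totals."""
        self.requests += 1
        self.texts += num_texts
        self.seconds += duration
        if total_tokens is None:
            self.untracked_requests += 1
        else:
            self.tokens += total_tokens

    def mean_latency(self) -> float:
        """Average request duration in seconds (0.0 before any request)."""
        return self.seconds / self.requests if self.requests else 0.0


tracker = UsageTracker()
tracker.record(num_texts=2, duration=0.2, total_tokens=4)     # provider with token counts
tracker.record(num_texts=3, duration=0.4, total_tokens=None)  # provider without token counts
```

Counting untracked requests separately keeps the token total honest when traffic is split between providers that do and do not report usage.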