Vector Embeddings (pgvector)¶

HyperDjango provides first-class pgvector integration for storing and searching vector embeddings directly in PostgreSQL. Combine the full power of a relational database with sub-millisecond approximate nearest neighbor (ANN) search -- no separate vector database needed.

Why Vector Search¶

Vector embeddings are fixed-length float arrays that encode semantic meaning. Text, images, and other unstructured data can be projected into vector space by an embedding model (OpenAI, Cohere, sentence-transformers, etc.), then searched by geometric distance. Use cases include:

Semantic search -- find documents by meaning, not keywords
Retrieval-augmented generation (RAG) -- feed relevant context into an LLM
Recommendation engines -- find items similar to a user's history
Deduplication -- detect near-duplicate content
Clustering -- group similar records without manual labels

pgvector stores vectors alongside your relational data in the same transaction, same backup, same replication stream. No ETL pipeline to a separate system.

Installation¶

Enable the pgvector extension in your PostgreSQL database:

CREATE EXTENSION IF NOT EXISTS vector;

Requires PostgreSQL 15+ and pgvector 0.5+. pgvector ships with most managed PostgreSQL providers (Supabase, Neon, RDS, Cloud SQL).

HyperDjango's pg.zig driver automatically registers the vector type OID at connection time, so vector columns are decoded natively in the binary protocol path -- no Python-side parsing overhead.

VectorField¶

VectorField declares a pgvector vector(N) column on a model. Import it alongside Model and Field:

from hyperdjango import Model, Field
from hyperdjango.models import VectorField


class Document(Model):
    class Meta:
        table = "documents"

    id: int = Field(primary_key=True, auto=True)
    title: str = Field()
    content: str = Field(default="")
    embedding: list[float] = VectorField(dimensions=1536)

Constructor Parameters¶

Parameter	Type	Default	Description
`dimensions`	`int`	`1536`	Fixed vector length. Must match your embedding model.
`index_type`	`str`	`"hnsw"`	Index algorithm: `"hnsw"` or `"ivfflat"`.
`index_ops`	`str`	`"vector_cosine_ops"`	Operator class for the distance metric (see table below).
`index`	`bool`	`True`	Whether to create a vector index on this column.
`index_params`	`dict[str, int]`	`None`	Index tuning parameters for the WITH clause (see below).

Operator Classes¶

`index_ops` value	Distance metric	pgvector operator	Best for
`"vector_cosine_ops"`	Cosine distance	`<=>`	Normalized embeddings (most APIs)
`"vector_l2_ops"`	Euclidean (L2) distance	`<->`	Spatial / geometric data
`"vector_ip_ops"`	Negative inner product	`<#>`	Maximum inner product search

Common Dimension Values¶

Embedding model	Dimensions
OpenAI `text-embedding-ada-002`	1536
OpenAI `text-embedding-3-small`	1536
OpenAI `text-embedding-3-large`	3072
Cohere `embed-english-v3.0`	1024
`all-MiniLM-L6-v2` (SBERT)	384
`all-mpnet-base-v2` (SBERT)	768

# 384-dim model with L2 distance
embedding: list[float] = VectorField(
    dimensions=384,
    index_ops="vector_l2_ops",
)

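# 3072-dim model with auto-indexing disabled. pgvector cannot build HNSW or
# IVFFlat indexes on vector columns above 2,000 dimensions, so queries on
# this column use exact (sequential) search.
embedding: list[float] = VectorField(
    dimensions=3072,
    index_type="ivfflat",
    index=False,
)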

# HNSW with tuning parameters (generates WITH clause in DDL)
embedding: list[float] = VectorField(
    dimensions=1536,
    index_params={"m": 16, "ef_construction": 64},
)
# → CREATE INDEX ... USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)

Index Tuning Parameters¶

The index_params dict is passed as a WITH (...) clause on the index DDL. Common parameters:

Index type	Parameter	Default	Description
HNSW	`m`	16	Max connections per node (higher = better recall, more memory)
HNSW	`ef_construction`	64	Build-time beam width (higher = better quality, slower build)
IVFFlat	`lists`	100	Number of inverted list partitions

Distance Lookups¶

HyperDjango registers four pgvector lookups on QuerySet.filter(). Each takes a tuple of (query_vector, threshold_or_metric).
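
When choosing thresholds it can help to compute the three distances by hand. The following is a plain-Python sketch of the quantities that pgvector's <=>, <-> and <#> operators return; the function names are illustrative and not part of HyperDjango.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the quantity returned by pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Negated dot product, the quantity returned by pgvector's <#> operator."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity, the quantity returned by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]
print(l2_distance(a, b))             # distance between two orthogonal unit vectors
print(negative_inner_product(a, a))  # more negative means more similar
print(cosine_distance(a, b))         # 1.0 for orthogonal vectors
```

In all three cases a smaller value means a closer match, which is why every threshold lookup uses a "less than" comparison.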

cosine_distance¶

Filter rows where cosine distance to the query vector is below a threshold. Cosine distance ranges from 0 (identical) to 2 (opposite).

query_vec = await get_embedding("machine learning tutorial")

# Find documents within cosine distance 0.3
docs = await Document.objects.filter(
    embedding__cosine_distance=(query_vec, 0.3)
).all()

Generated SQL:

SELECT * FROM documents WHERE embedding <=> $1::vector < $2
-- $1 = '[0.012, -0.045, ...]', $2 = 0.3

l2_distance¶

Filter by Euclidean (L2) distance. Useful for spatial or geometric embeddings.

nearby = await Point.objects.filter(
    embedding__l2_distance=(query_vec, 1.5)
).all()

Generated SQL:

SELECT * FROM points WHERE embedding <-> $1::vector < $2

inner_product¶

Filter by negative inner product. pgvector's <#> operator returns the negative inner product, so smaller values mean higher similarity.

similar = await Item.objects.filter(
    embedding__inner_product=(query_vec, -0.8)
).all()

Generated SQL:

SELECT * FROM items WHERE embedding <#> $1::vector < $2

nearest¶

Order results by distance for K-nearest-neighbor queries. Pair with .limit() to get the top-K results. The second element of the tuple selects the distance metric: "cosine", "l2", or "inner_product".

# Top 10 nearest by cosine distance
top_10 = await Document.objects.filter(
    embedding__nearest=(query_vec, "cosine")
).limit(10).all()

# Top 5 nearest by L2 distance
top_5 = await Document.objects.filter(
    embedding__nearest=(query_vec, "l2")
).limit(5).all()

The nearest lookup produces an IS NOT NULL condition (always true for non-null vectors) combined with an ORDER BY clause using the selected distance operator. The actual filtering happens via the LIMIT.

Vector Indexes¶

Indexing is critical for vector search performance. Without an index, pgvector scans every row (exact search). With an index, it uses approximate nearest neighbor algorithms that trade a small amount of recall for orders-of-magnitude speed improvement.
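
Recall here means the share of the true nearest neighbors that the approximate search actually returns. One way to measure it, sketched below under the assumption that you can run the same query once with the index and once as an exact scan, is to compare the two id lists. The helper name recall_at_k is illustrative.

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the exact top-k ids that the approximate search also returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


exact = [1, 2, 3, 4, 5]   # ids from an exact (unindexed) scan
approx = [1, 2, 3, 5, 9]  # ids from the indexed query
print(recall_at_k(approx, exact))  # 4 of the 5 true neighbors were found
```

Averaging this value over a sample of representative queries gives a number you can track while tuning index parameters.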

HNSW vs IVFFlat¶

Property	HNSW	IVFFlat
Query speed	Faster	Slower
Build speed	Slower	Faster
Memory usage	Higher	Lower
Recall at same speed	Higher	Lower
Insert performance	Good (no rebuild needed)	Requires periodic re-clustering
Best for	Most workloads, real-time apps	Large static datasets, bulk loads

Default recommendation: Use HNSW unless you have a specific reason to use IVFFlat (e.g., very large datasets where build time matters more than query latency).

CreateVectorIndex¶

The migration system provides CreateVectorIndex for managing vector indexes:

from hyperdjango.migrations import CreateVectorIndex

# HNSW index (default)
CreateVectorIndex(
    table="documents",
    column="embedding",
    index_type="hnsw",
    index_ops="vector_cosine_ops",
    m=16,                # Max connections per layer (default: 16)
    ef_construction=64,  # Construction search width (default: 64)
)

# IVFFlat index
CreateVectorIndex(
    table="documents",
    column="embedding",
    index_type="ivfflat",
    index_ops="vector_cosine_ops",
    lists=100,  # Number of IVF lists (default: 100)
)

CreateVectorIndex Parameters¶

Parameter	Type	Default	Description
`table`	`str`	--	Table name.
`column`	`str`	--	Column name.
`index_type`	`str`	`"hnsw"`	`"hnsw"` or `"ivfflat"`.
`index_ops`	`str`	`"vector_cosine_ops"`	Operator class matching the distance metric you query with.
`m`	`int`	`16`	HNSW only. Max connections per node per layer.
`ef_construction`	`int`	`64`	HNSW only. Size of the dynamic candidate list during build.
`lists`	`int`	`100`	IVFFlat only. Number of inverted lists (clusters).

Generated SQL for HNSW:

CREATE INDEX idx_documents_embedding_hnsw ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

Generated SQL for IVFFlat:

CREATE INDEX idx_documents_embedding_ivfflat ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

Rollback automatically drops the index:

DROP INDEX IF EXISTS idx_documents_embedding_hnsw;

Tuning HNSW Parameters¶

m (default 16): Higher values improve recall but increase memory and build time. Typical range: 4--64. Use 32--48 for high-recall requirements.
ef_construction (default 64): Higher values improve index quality at the cost of build time. Typical range: 16--512. Use 128--256 for production datasets. pgvector requires ef_construction to be at least 2 * m, so raise it together with m.
ef_search (runtime): Set via SET hnsw.ef_search = 100 before querying. Higher values improve recall at query time. Default is 40.
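
A small pre-flight check can catch invalid build parameters before a migration reaches the database. The sketch below assumes the limits pgvector enforces at the time of writing (m between 2 and 100, ef_construction between 4 and 1000, and ef_construction at least 2 * m); verify them against your installed pgvector version. The function name is hypothetical.

```python
def validate_hnsw_params(m: int = 16, ef_construction: int = 64) -> None:
    """Raise ValueError if the HNSW build parameters would be rejected."""
    # Bounds below are assumptions based on pgvector's documented limits.
    if not 2 <= m <= 100:
        raise ValueError(f"m must be between 2 and 100, got {m}")
    if not 4 <= ef_construction <= 1000:
        raise ValueError(
            f"ef_construction must be between 4 and 1000, got {ef_construction}"
        )
    if ef_construction < 2 * m:
        raise ValueError(
            f"ef_construction ({ef_construction}) must be at least 2 * m ({2 * m})"
        )


validate_hnsw_params(m=16, ef_construction=64)   # defaults pass
validate_hnsw_params(m=48, ef_construction=256)  # high-recall settings pass
```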

Tuning IVFFlat Parameters¶

lists (default 100): Number of clusters. Rule of thumb from the pgvector documentation: num_rows / 1000 for up to 1M rows, sqrt(num_rows) for larger datasets.
probes (runtime): Set via SET ivfflat.probes = 10 before querying. Higher values check more lists, improving recall. Default is 1.
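
The starting points suggested in the pgvector documentation (rows / 1000 lists for up to 1M rows, sqrt(rows) beyond that, and sqrt(lists) probes) can be turned into a small helper. This is a sketch of that heuristic only; the function name is illustrative, and the results are starting values to be tuned against measured recall.

```python
import math


def suggest_ivfflat_settings(num_rows: int) -> tuple[int, int]:
    """Return (lists, probes) starting points for an IVFFlat index."""
    if num_rows <= 1_000_000:
        lists = max(1, num_rows // 1000)
    else:
        lists = max(1, round(math.sqrt(num_rows)))
    # Probing roughly sqrt(lists) clusters is a common first guess.
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


for rows in (50_000, 500_000, 4_000_000):
    lists, probes = suggest_ivfflat_settings(rows)
    print(f"{rows} rows: lists={lists}, probes={probes}")
```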

Examples¶

Semantic Search¶

Store document embeddings and search by meaning:

from hyperdjango import Model, Field
from hyperdjango.models import VectorField


class Article(Model):
    class Meta:
        table = "articles"

    id: int = Field(primary_key=True, auto=True)
    title: str = Field()
    body: str = Field(default="")
    embedding: list[float] = VectorField(dimensions=1536)


async def search_articles(query: str, limit: int = 10) -> list[Article]:
    query_vec = await embed(query)  # Your embedding function

    return await Article.objects.filter(
        embedding__nearest=(query_vec, "cosine")
    ).limit(limit).all()


async def find_similar(article: Article, limit: int = 5) -> list[Article]:
    return await Article.objects.filter(
        embedding__nearest=(article.embedding, "cosine")
    ).exclude(id=article.id).limit(limit).all()

RAG Pattern¶

Retrieve relevant context and feed it to an LLM:

async def answer_question(question: str) -> str:
    # 1. Embed the question
    query_vec = await embed(question)

    # 2. Retrieve the 5 nearest chunks, ranked by cosine distance
    #    (a threshold filter alone would return matches in arbitrary order)
    chunks = await Chunk.objects.filter(
        embedding__nearest=(query_vec, "cosine")
    ).limit(5).all()

    # 3. Build context
    context = "\n\n".join(chunk.text for chunk in chunks)

    # 4. Generate answer with context
    return await llm_complete(
        system="Answer based on the provided context.",
        user=f"Context:\n{context}\n\nQuestion: {question}",
    )

Recommendation Engine¶

Recommend products based on a user's purchase history:

class Product(Model):
    class Meta:
        table = "products"

    id: int = Field(primary_key=True, auto=True)
    name: str = Field()
    description: str = Field(default="")
    embedding: list[float] = VectorField(dimensions=768)


async def recommend_for_user(user_id: int, limit: int = 10) -> list[Product]:
    # Get embeddings of the user's five most recent purchases
    # (`db` stands for your application's database handle)
    purchased = await db.query(
        "SELECT p.embedding FROM products p "
        "JOIN orders o ON o.product_id = p.id "
        "WHERE o.user_id = $1 ORDER BY o.created_at DESC LIMIT 5",
        user_id,
    )

    if not purchased:
        return []

    # Average the purchase embeddings into a user preference vector
    dims = len(purchased[0]["embedding"])
    avg_vec = [0.0] * dims
    for row in purchased:
        for i, v in enumerate(row["embedding"]):
            avg_vec[i] += v
    avg_vec = [v / len(purchased) for v in avg_vec]

    # Find nearest products, over-fetching so that already-purchased items
    # can be filtered out without falling short of `limit`
    purchased_ids = await db.query(
        "SELECT product_id FROM orders WHERE user_id = $1", user_id
    )
    # A set removes duplicate orders and gives O(1) membership checks
    exclude_ids = {r["product_id"] for r in purchased_ids}

    candidates = await Product.objects.filter(
        embedding__nearest=(avg_vec, "cosine")
    ).limit(limit + len(exclude_ids)).all()

    return [p for p in candidates if p.id not in exclude_ids][:limit]
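
The averaging step in the function above can be factored into a small, testable helper. The sketch below is one possible refactoring; the name mean_vector is illustrative, and it adds a dimension check that the inline loop does not perform.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dims = len(vectors[0])
    if any(len(v) != dims for v in vectors):
        raise ValueError("all vectors must have the same number of dimensions")
    # zip(*vectors) yields one tuple per dimension across all vectors.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


print(mean_vector([[1.0, 2.0], [3.0, 4.0]]))
```

A plain mean weights every purchase equally; weighting recent purchases more heavily is a common variation.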

Performance¶

SIMD Binary Decode¶

pg.zig decodes vector columns directly from the PostgreSQL binary wire format using SIMD instructions. A 1536-dimensional vector (6 KB on the wire) is decoded without Python-side float parsing -- the raw IEEE 754 float32 bytes are read directly into a Python list.
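
For reference, the wire layout can be reproduced in a few lines of Python. The sketch below assumes pgvector's binary format is a 16-bit dimension count, a 16-bit unused field, and then one big-endian float32 per dimension. It illustrates the layout only and is not how pg.zig is implemented.

```python
import struct


def encode_vector_binary(values: list[float]) -> bytes:
    """Pack floats into the assumed pgvector binary layout."""
    return struct.pack(f">HH{len(values)}f", len(values), 0, *values)


def decode_vector_binary(payload: bytes) -> list[float]:
    """Unpack the assumed pgvector binary layout into a list of floats."""
    dim, _unused = struct.unpack_from(">HH", payload, 0)
    expected = 4 + 4 * dim  # 4-byte header plus 4 bytes per float32
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    return list(struct.unpack_from(f">{dim}f", payload, 4))


payload = encode_vector_binary([0.5, -1.25, 2.0])
print(decode_vector_binary(payload))
print(len(encode_vector_binary([0.0] * 1536)))  # header + 1536 * 4 bytes
```

Values come back as float32 precision, so a round trip through the database can differ slightly from the Python floats that were inserted.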

Vector Formatting¶

Vectors are formatted for PostgreSQL using _format_vector(), which converts a Python list[float] to pgvector's text format: [0.1,0.2,0.3]. String vectors pass through unchanged.

Index Tuning Tips¶

Always index vector columns in production. Without an index, every query is an O(n) sequential scan. VectorField creates an index by default (index=True).
Match your operator class to your queries. If you query with cosine_distance, index with vector_cosine_ops. Mismatched operator classes force sequential scans.
Build IVFFlat indexes after loading data. IVFFlat clusters the data during index creation. Building on an empty table produces a useless index. HNSW does not have this limitation.
Increase ef_search for higher recall. The default hnsw.ef_search = 40 is tuned for speed. For applications where missing a relevant result is costly (RAG, medical search), increase to 100--200:

SET hnsw.ef_search = 200;

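Use EXPLAIN ANALYZE to verify index usage. pgvector only uses an ANN index for queries that order by a distance operator in ascending order and apply a LIMIT, which is the shape the nearest lookup generates; a threshold filter on its own is evaluated by scanning. If a nearest query still shows Seq Scan instead of Index Scan, check that your operator class matches and that the table has been VACUUMed and ANALYZEd after bulk inserts.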
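Normalize your vectors. For normalized (unit-length) vectors, cosine distance equals 1 minus the inner product, so cosine and inner product searches return the same ranking. The distance values themselves differ, so thresholds are not interchangeable, and the index operator class must still match the operator you query with. Many embedding APIs return normalized vectors by default; check your model's documentation.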
Batch your embedding API calls. The database query is fast (sub-millisecond with an index). The bottleneck is usually the embedding API call. Batch multiple texts into a single API request where possible.
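
The normalization tip can be checked numerically. The sketch below normalizes two vectors and shows that, once both have unit length, cosine distance is the negative inner product shifted by a constant of 1, which is why both metrics rank results identically. The helper name normalize is illustrative.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])

dot = sum(x * y for x, y in zip(a, b))
cosine_dist = 1.0 - dot   # both norms are 1, so no division is needed
neg_inner_product = -dot  # what the <#> operator would return

# The two values differ by exactly 1, so sorting by either gives the same order.
print(math.isclose(cosine_dist, 1.0 + neg_inner_product))
```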