DuckDB Vector Search Extension

HNSW

Graph-based ANN

Hierarchical Navigable Small World: a multi-layer proximity graph. Upper layers provide long-range shortcuts and lower layers refine locally, giving roughly logarithmic search cost in the number of vectors with state-of-the-art recall.

Recall: 5/5

Latency: 5/5

Build: 2/5

Memory: 1/5

Scale: 3/5

Writes: 4/5

Best-in-class recall/latency trade-off for in-memory workloads
Simple online inserts — no rebuild needed
Tunable via a single knob (ef_search) at query time

Graph must fit in RAM — billion-scale needs sharding
Build throughput slower than IVF (global graph, not independent cells)
Delete is a tombstone; heavy churn benefits from periodic rebuild
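
To make the ef_search knob concrete, here is a minimal sketch of the best-first search that HNSW-style graphs run on a single layer. It is not this extension's code: the function name `search_layer`, the adjacency-dict layout, and the toy 1-D data are assumptions for illustration, and the multi-layer descent is left out.

```python
import heapq
import math

def search_layer(vectors, neighbors, query, entry, ef_search):
    """Best-first search on one layer; returns up to ef_search ids, nearest first."""
    def dist(i):
        return math.dist(vectors[i], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest remaining candidate is worse than the worst kept result
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef_search or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef_search:
                    heapq.heappop(results)  # drop the current worst
    return [i for _, i in sorted((-nd, i) for nd, i in results)]

# Toy graph (hypothetical): node 3 is the true nearest neighbor of the query,
# but it is only reachable through node 2, which looks unpromising at first.
vectors = [(0.0,), (4.0,), (10.0,), (5.0,)]
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = (5.2,)
narrow = search_layer(vectors, neighbors, query, entry=0, ef_search=1)
wide = search_layer(vectors, neighbors, query, entry=0, ef_search=2)
```

With ef_search=1 the search settles on node 1 and never expands node 2, so it misses node 3. With ef_search=2 the wider candidate set keeps node 2 alive long enough to reach node 3. That is the recall-for-latency trade the knob controls.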

IVF

Inverted File / Coarse Partitioning

Inverted File index: k-means partitions the space into nlist cells, and each query probes the cells of the nprobe closest centroids. Cheap to build; the recall/speed trade-off is tuned with a single knob (nprobe).

Recall: 3/5

Latency: 3/5

Build: 5/5

Memory: 4/5

Scale: 4/5

Writes: 4/5

Very fast build: cells are independent, embarrassingly parallel
Memory overhead is tiny (just the centroids)
Pairs naturally with PQ/RaBitQ for billion-scale

Boundary vectors can be missed if nprobe is too small
Recall improves more slowly per unit of added latency than with HNSW
nlist must be re-tuned if data distribution drifts
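
The boundary-miss behaviour is easy to reproduce on a toy example. The sketch below assumes the centroids are already trained (no k-means step is shown) and uses hypothetical helper names; it illustrates the probe logic, not this extension's implementation.

```python
import math

def nearest_centroids(centroids, v, n):
    """Ids of the n centroids closest to v, nearest first."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(centroids[c], v))
    return order[:n]

def build_ivf(centroids, vectors):
    """Assign every vector id to the posting list of its single nearest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        cells[nearest_centroids(centroids, v, 1)[0]].append(i)
    return cells

def ivf_search(centroids, cells, vectors, query, nprobe, k=1):
    """Scan only the nprobe closest cells and return the k best ids found there."""
    candidates = [i for c in nearest_centroids(centroids, query, nprobe) for i in cells[c]]
    return sorted(candidates, key=lambda i: math.dist(vectors[i], query))[:k]

# Two cells; vector 1 sits just on the cell-0 side of the boundary.
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (4.9, 0.0), (9.0, 0.0)]
cells = build_ivf(centroids, vectors)

query = (5.2, 0.0)  # lands in cell 1, but its true neighbor is vector 1 in cell 0
one_probe = ivf_search(centroids, cells, vectors, query, nprobe=1)
two_probes = ivf_search(centroids, cells, vectors, query, nprobe=2)
```

Here `one_probe` returns vector 2, the best candidate inside cell 1, while `two_probes` returns vector 1, the true nearest neighbor. Raising nprobe recovers the boundary vector at the cost of scanning a second posting list.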

DiskANN

Out-of-core Vamana Graph

Vamana graph with codes stored out-of-band — the graph blocks evict through the DuckDB buffer pool so the index can exceed RAM while keeping billion-scale recall.

Recall: 4/5

Latency: 3/5

Build: 2/5

Memory: 5/5

Scale: 5/5

Writes: 2/5

Index size can exceed RAM — DuckDB buffer pool evicts graph blocks
Codes are out-of-band → graph block is tiny, cache-friendly
Recall rivals HNSW at billion scale with PQ / RaBitQ

Requires a compressing quantizer (FLAT is rejected at bind time)
Slower build: Vamana does two refinement passes
Query tail latency is SSD-bound when the hot set is not yet cached
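
The build cost comes largely from the pruning step that Vamana applies to each node's candidate neighbors. Below is a minimal sketch of that alpha-pruning rule as described in the published DiskANN work. The names are hypothetical, and the note that the first pass uses alpha = 1 and the second a larger alpha comes from that publication, so it is an assumption about this extension.

```python
import math

def robust_prune(points, p, candidates, alpha, max_degree):
    """Choose up to max_degree out-neighbors for node p using alpha-pruning."""
    pool = sorted((c for c in candidates if c != p),
                  key=lambda c: math.dist(points[p], points[c]))
    kept = []
    while pool and len(kept) < max_degree:
        best = pool.pop(0)          # closest remaining candidate to p
        kept.append(best)
        # Keep a candidate only if `best` does not already cover it:
        # it survives when alpha * d(best, c) > d(p, c).
        pool = [c for c in pool
                if alpha * math.dist(points[best], points[c])
                > math.dist(points[p], points[c])]
    return kept

# Toy layout (hypothetical): node 2 is roughly "behind" node 1 as seen from node 0.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-3.0, 0.0)]
strict = robust_prune(points, 0, [1, 2, 3], alpha=1.0, max_degree=4)
relaxed = robust_prune(points, 0, [1, 2, 3], alpha=1.2, max_degree=4)
```

With alpha = 1.0 node 2 is pruned because node 1 already covers it, leaving neighbors [1, 3]. With alpha = 1.2 the longer edge to node 2 survives, giving [1, 2, 3]. Keeping those extra long edges is what reduces the number of hops, and therefore block reads, per query.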

SPANN

IVF + Closure Replicas

SPANN augments IVF with closure-based replica writes: a boundary point is copied into every cell whose centroid lies within closure_factor × d_best of it, where d_best is the point's distance to its nearest centroid, so a single-cell probe still finds it.

Recall: 4/5

Latency: 4/5

Build: 3/5

Memory: 4/5

Scale: 5/5

Writes: 2/5

Recall of an in-memory index at billion scale
Single-cell probe is enough — no expensive multi-probe
Posting lists are independently cacheable

Writes amplify by replica_count (typically 4–8×)
Closure assignment requires global centroid pass
Still bounded by IVF quality in pathological cases
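
The closure rule can be sketched in a few lines. This assumes d_best means the distance from the vector to its nearest centroid, and it treats replica_count as a hard cap on copies. Both are illustrative assumptions, as is the function name.

```python
import math

def closure_assign(centroids, v, closure_factor, replica_count):
    """Cells that receive a copy of v: every centroid within closure_factor * d_best."""
    dists = sorted((math.dist(c, v), cid) for cid, c in enumerate(centroids))
    d_best = dists[0][0]  # distance to the nearest centroid
    chosen = [cid for d, cid in dists if d <= closure_factor * d_best]
    return chosen[:replica_count]  # assumed cap on replicas per vector

centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
boundary = closure_assign(centroids, (4.9, 0.0), closure_factor=1.1, replica_count=4)
interior = closure_assign(centroids, (1.0, 0.0), closure_factor=1.1, replica_count=4)
```

The boundary vector at (4.9, 0) is written to cells 0 and 1, so a query that probes only cell 1 still sees it. The interior vector at (1, 0) is written once. The extra copies of boundary vectors are where the write amplification comes from.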

Quantizers

FLAT

Uncompressed float32

No compression — each vector stored as the original FLOAT[d]. Baseline for recall; use when memory isn't the bottleneck.

Compress: 0/5

Fidelity: 5/5

Encode: 5/5

Train: 5/5

Headroom: 5/5

SIMD: 4/5

Recall ceiling — no approximation error
Zero training cost, zero codebook state
Fastest encode / decode path

Memory = 4 bytes × dim × N — blows up past ~10M rows
Rejected by DiskANN (defeats the >RAM layout)
No rerank needed, but also no headroom for tuning
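
As a worked instance of the 4 bytes × dim × N formula, the snippet below computes the raw vector payload. The 768-dimension figure is an arbitrary illustrative choice, and index or row overhead is ignored.

```python
def flat_bytes(n_rows, dim):
    """Raw FLOAT[d] payload: 4 bytes per float32 component."""
    return 4 * dim * n_rows

# Hypothetical workload: 10M rows of 768-dimensional embeddings.
payload = flat_bytes(10_000_000, 768)
payload_gb = payload / 1e9  # decimal gigabytes
```

For this workload the payload is 30,720,000,000 bytes, about 30.7 GB before any index structure is added.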

RaBitQ

Bit-packed Quantizer

Randomized rotation + per-dimension bit packing with a provable distance-error bound. 3-bit codes hit >99% Recall@10 on SIFT1M when paired with a rerank pass.

Compress: 5/5

Fidelity: 4/5

Encode: 5/5

Train: 4/5

Headroom: 5/5

SIMD: 5/5

Provable unbiased error bound — predictable recall
SIMD-friendly: distance is popcount + dot
1–4 bits per dim; 3 bits already matches PQ m=16

Needs a rerank pass for full recall
Random rotation matrix adds 4·d² bytes of state
IP metric requires vectors to be normalized
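
A rough footprint estimate shows why the rotation matrix is a fixed cost worth noting. The sketch counts only the packed codes and a float32 d × d rotation. Any per-vector correction scalars the real format stores are ignored, and the function name is hypothetical.

```python
def rabitq_footprint(n_rows, dim, bits):
    """Approximate storage: packed codes plus one float32 rotation matrix."""
    code_bytes = (dim * bits + 7) // 8      # bits per dim, packed and rounded up to bytes
    return {
        "codes": n_rows * code_bytes,
        "rotation": 4 * dim * dim,          # the 4·d² bytes of shared state
        "ratio_vs_float32": (4 * dim) / code_bytes,
    }

# SIFT1M-sized example: 1M vectors, 128 dimensions, 3-bit codes.
sift = rabitq_footprint(1_000_000, 128, 3)
```

For this case each code takes 48 bytes against 512 bytes for float32, the codes total 48 MB, and the rotation adds 65,536 bytes once. The rotation grows quadratically with dimension, so at 1,536 dimensions it is already about 9.4 MB.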

PQ

Product Quantization

Classical product quantization: each vector is split into m sub-vectors, and each sub-vector is quantized against its own k-means codebook. Compact codes with asymmetric distance computation (ADC) table lookups.

Compress: 4/5

Fidelity: 3/5

Encode: 4/5

Train: 2/5

Headroom: 4/5

SIMD: 4/5

Battle-tested: underpins most billion-scale ANN systems since 2011
Code size is m·log2(k) bits per vector, so m and k trade memory for fidelity
ADC distance lookup is cache-friendly

Training is m independent k-means runs, each costing O(N·k·d/m) per iteration (O(N·k·d) across all segments)
Isotropic L2 loss treats all error directions equally, including the datapoint-parallel direction that matters most for inner products
Asymmetric distance is biased without rerank
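
Encoding and ADC lookup can be sketched with hand-written codebooks. No k-means training is shown; the codebook values and helper names are illustrative assumptions.

```python
def split(v, m):
    """Cut v into m equal-length sub-vectors (len(v) must be divisible by m)."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(codebooks, v):
    """One centroid id per segment: the nearest codeword in that segment's codebook."""
    return [min(range(len(cb)), key=lambda j: sq_dist(cb[j], sub))
            for cb, sub in zip(codebooks, split(v, len(codebooks)))]

def adc_table(codebooks, query):
    """Per-segment squared distances from the uncompressed query to every codeword."""
    return [[sq_dist(c, sub) for c in cb]
            for cb, sub in zip(codebooks, split(query, len(codebooks)))]

def adc_distance(table, code):
    """Approximate squared L2 distance: m table lookups and a sum."""
    return sum(table[seg][j] for seg, j in enumerate(code))

# m = 2 segments, k = 2 codewords each, so a code is m * log2(k) = 2 bits.
codebooks = [[(0, 0), (1, 1)], [(0, 0), (2, 2)]]
v = (0.9, 1.2, 0.1, -0.1)
code = pq_encode(codebooks, v)

query = (1, 0, 0, 1)
table = adc_table(codebooks, query)   # built once per query
approx = adc_distance(table, code)
exact = sq_dist(query, v)
```

The vector encodes to [1, 0], which reconstructs as (1, 1, 0, 0). The ADC estimate is 2, while the exact squared distance to the original vector is about 2.67. That gap is the quantization error a rerank pass over the original vectors removes.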

ScaNN

Anisotropic Vector Quantization

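ScaNN's anisotropic loss penalizes the component of the quantization error parallel to the datapoint more heavily than the orthogonal component. Because high-scoring queries point in roughly the datapoint's direction, this produces more accurate inner-product estimates than classical PQ.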

Compress: 4/5

Fidelity: 5/5

Encode: 4/5

Train: 1/5

Headroom: 4/5

SIMD: 4/5

More accurate inner-product estimates than classical PQ at same memory
Anisotropy knob (eta) trades fidelity for training cost
Same code layout as PQ — drop-in swap
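
The effect of the eta knob can be seen by scoring two candidate codewords for the same datapoint. The sketch follows the published ScaNN formulation, where the residual is split into components parallel and orthogonal to the datapoint and eta weights the parallel part. Whether this extension uses exactly that form is an assumption, and the function name is hypothetical.

```python
def anisotropic_loss(x, x_quantized, eta):
    """eta * |parallel residual|^2 + |orthogonal residual|^2, relative to datapoint x."""
    r = [a - b for a, b in zip(x, x_quantized)]
    scale = sum(a * b for a, b in zip(r, x)) / sum(a * a for a in x)
    r_par = [scale * a for a in x]                  # projection of r onto x
    r_orth = [a - b for a, b in zip(r, r_par)]      # remainder, orthogonal to x
    par = sum(a * a for a in r_par)
    orth = sum(a * a for a in r_orth)
    return eta * par + orth

x = (1.0, 0.0)
shrunk = (0.75, 0.0)   # candidate A: error lies along x
tilted = (1.0, 0.25)   # candidate B: same error magnitude, orthogonal to x

iso = (anisotropic_loss(x, shrunk, eta=1.0), anisotropic_loss(x, tilted, eta=1.0))
aniso = (anisotropic_loss(x, shrunk, eta=4.0), anisotropic_loss(x, tilted, eta=4.0))
```

With eta = 1 (plain L2) both candidates score 0.0625, so the quantizer has no preference. With eta = 4 candidate A scores 0.25 and candidate B stays at 0.0625. For a query pointing along x, A estimates the inner product as 0.75 instead of 1 while B gets it exactly right, which is why the anisotropic loss prefers B.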