-- Create a table with vector columnsCREATE TABLE items (id INTEGER, embedding FLOAT[1536]);-- Create an approximate-nearest-neighbor index using HNSW + RaBitQCREATE INDEX idx ON items USING hnsw(embedding)WITH ( metric = 'cosine', quantizer = 'rabitq', bits = 3, rerank = 10, m = 16, ef_construction = 128, ef_search = 64);-- Perform approximate nearest-neighbor searchSELECT id, array_distance(embedding, [0.1, 0.2, ...]::FLOAT[1536]) AS distFROM itemsORDER BY dist ASCLIMIT 10;HNSW
Hierarchical Navigable Small World: a multi-layer proximity graph. Upper layers give long-range shortcuts, lower layers refine locally — log-scale search with state-of-the-art recall.
- Best-in-class recall/latency trade-off for in-memory workloads
- Simple online inserts — no rebuild needed
- Tunable via a single knob (ef_search) at query time
- Graph must fit in RAM — billion-scale needs sharding
- Build throughput slower than IVF (global graph, not independent cells)
- Delete is a tombstone; heavy churn benefits from periodic rebuild
IVF
Inverted File index: k-means partitions the space into nlist cells, queries probe the nprobe closest centroids. Cheap to build, recall/speed are tunable in one knob.
- Very fast build: cells are independent, embarrassingly parallel
- Memory overhead is tiny (just the centroids)
- Pairs naturally with PQ/RaBitQ for billion-scale
- Boundary vectors can be missed if nprobe is too small
- Recall/latency curve is less steep than HNSW
- nlist must be re-tuned if data distribution drifts
DiskANN
Vamana graph with codes stored out-of-band — the graph blocks evict through the DuckDB buffer pool so the index can exceed RAM while keeping billion-scale recall.
- Index size can exceed RAM — DuckDB buffer pool evicts graph blocks
- Codes are out-of-band → graph block is tiny, cache-friendly
- Recall rivals HNSW at billion scale with PQ / RaBitQ
- Requires a compressing quantizer (FLAT is rejected at bind time)
- Slower build: Vamana does two refinement passes
- Query tail latency is SSD-bound when hot set is cold
SPANN
SPANN augments IVF with closure-based replica writes: boundary points are copied into every cell inside closure_factor × d_best, so a single-cell probe still finds them.
- Recall of an in-memory index at billion scale
- Single-cell probe is enough — no expensive multi-probe
- Posting lists are independently cacheable
- Writes amplify by replica_count (typically 4–8×)
- Closure assignment requires global centroid pass
- Still bounded by IVF quality in pathological cases
FLAT
No compression — each vector stored as the original FLOAT[d]. Baseline for recall; use when memory isn't the bottleneck.
- Recall ceiling — no approximation error
- Zero training cost, zero codebook state
- Fastest encode / decode path
- Memory = 4 bytes × dim × N — blows up past ~10M rows
- Rejected by DiskANN (defeats the >RAM layout)
- No rerank needed, but also no headroom for tuning
RaBitQ
Randomized rotation + per-dimension bit packing with a provable distance-error bound. 3-bit codes hit >99% Recall@10 on SIFT1M when paired with a rerank pass.
- Provable unbiased error bound — predictable recall
- SIMD-friendly: distance is popcount + dot
- 1–4 bits per dim; 3 bits already matches PQ m=16
- Needs a rerank pass for full recall
- Random rotation matrix adds 4·d² bytes of state
- IP metric requires vectors to be normalized
PQ
Classical product quantization: each vector split into m sub-vectors, each independently k-means quantized. Compact codes, ADC-based distance lookup.
- Battle-tested — underpins every billion-scale ANN system since 2011
- Memory scales in m·log2(k) bits — very flexible
- ADC distance lookup is cache-friendly
- Training is m independent k-means — costs O(N·k·d) per segment
- Isotropic L2 loss leaks error into query-parallel direction
- Asymmetric distance is biased without rerank
ScaNN
ScaNN weights quantization errors parallel to the query-vector direction more heavily than orthogonal ones — producing more accurate inner-product estimates than classical PQ.
- More accurate inner-product estimates than classical PQ at same memory
- Anisotropy knob (eta) trades fidelity for training cost
- Same code layout as PQ — drop-in swap
- Cosine metric is unsupported — must normalize + use IP
- Lloyd loop needs the eta-weighted loss; slower than PQ training
- Benefit shrinks when the data is near-isotropic