vindex

DuckDB Vector Search Extension

High-performance vector similarity search natively integrated into DuckDB. Supporting HNSW, IVF, DiskANN, and SPANN with advanced quantization layers.

GitHub
Quick Install
Version v0.1.0
Hierarchy & Architecture
SQL Implementation Example
-- Create a table with vector columnsCREATE TABLE items (id INTEGER, embedding FLOAT[1536]);-- Create an approximate-nearest-neighbor index using HNSW + RaBitQCREATE INDEX idx ON items USING hnsw(embedding)WITH (  metric = 'cosine',  quantizer = 'rabitq',  bits = 3,  rerank = 10,  m = 16,  ef_construction = 128,  ef_search = 64);-- Perform approximate nearest-neighbor searchSELECT id, array_distance(embedding, [0.1, 0.2, ...]::FLOAT[1536]) AS distFROM itemsORDER BY dist ASCLIMIT 10;
Feature Details
Algorithms
Algorithm

HNSW

Graph-based ANN

Hierarchical Navigable Small World: a multi-layer proximity graph. Upper layers give long-range shortcuts, lower layers refine locally — log-scale search with state-of-the-art recall.

Typical Recall@10 ceilingRecallSingle-query speedLatencyIndex construction throughputBuildRAM efficiency per vectorMemoryMax dataset size supportedScaleInsert / delete costWrites
Recall5/5
Latency5/5
Build2/5
Memory1/5
Scale3/5
Writes4/5
Strengths
  • Best-in-class recall/latency trade-off for in-memory workloads
  • Simple online inserts — no rebuild needed
  • Tunable via a single knob (ef_search) at query time
Trade-offs
  • Graph must fit in RAM — billion-scale needs sharding
  • Build throughput slower than IVF (global graph, not independent cells)
  • Delete is a tombstone; heavy churn benefits from periodic rebuild
Algorithm

IVF

Inverted File / Coarse Partitioning

Inverted File index: k-means partitions the space into nlist cells, queries probe the nprobe closest centroids. Cheap to build, recall/speed are tunable in one knob.

Typical Recall@10 ceilingRecallSingle-query speedLatencyIndex construction throughputBuildRAM efficiency per vectorMemoryMax dataset size supportedScaleInsert / delete costWrites
Recall3/5
Latency3/5
Build5/5
Memory4/5
Scale4/5
Writes4/5
Strengths
  • Very fast build: cells are independent, embarrassingly parallel
  • Memory overhead is tiny (just the centroids)
  • Pairs naturally with PQ/RaBitQ for billion-scale
Trade-offs
  • Boundary vectors can be missed if nprobe is too small
  • Recall/latency curve is less steep than HNSW
  • nlist must be re-tuned if data distribution drifts
Algorithm

DiskANN

Out-of-core Vamana Graph

Vamana graph with codes stored out-of-band — the graph blocks evict through the DuckDB buffer pool so the index can exceed RAM while keeping billion-scale recall.

Typical Recall@10 ceilingRecallSingle-query speedLatencyIndex construction throughputBuildRAM efficiency per vectorMemoryMax dataset size supportedScaleInsert / delete costWrites
Recall4/5
Latency3/5
Build2/5
Memory5/5
Scale5/5
Writes2/5
Strengths
  • Index size can exceed RAM — DuckDB buffer pool evicts graph blocks
  • Codes are out-of-band → graph block is tiny, cache-friendly
  • Recall rivals HNSW at billion scale with PQ / RaBitQ
Trade-offs
  • Requires a compressing quantizer (FLAT is rejected at bind time)
  • Slower build: Vamana does two refinement passes
  • Query tail latency is SSD-bound when hot set is cold
Algorithm

SPANN

IVF + Closure Replicas

SPANN augments IVF with closure-based replica writes: boundary points are copied into every cell inside closure_factor × d_best, so a single-cell probe still finds them.

Typical Recall@10 ceilingRecallSingle-query speedLatencyIndex construction throughputBuildRAM efficiency per vectorMemoryMax dataset size supportedScaleInsert / delete costWrites
Recall4/5
Latency4/5
Build3/5
Memory4/5
Scale5/5
Writes2/5
Strengths
  • Recall of an in-memory index at billion scale
  • Single-cell probe is enough — no expensive multi-probe
  • Posting lists are independently cacheable
Trade-offs
  • Writes amplify by replica_count (typically 4–8×)
  • Closure assignment requires global centroid pass
  • Still bounded by IVF quality in pathological cases
Quantizers
Quantizer

FLAT

Uncompressed float32

No compression — each vector stored as the original FLOAT[d]. Baseline for recall; use when memory isn't the bottleneck.

Bytes saved vs float32CompressDistance-estimate accuracy pre-rerankFidelityEncode throughputEncodeTraining cost (higher = cheaper)TrainRecall ceiling with rerankHeadroomSIMD-friendliness of distance kernelSIMD
Compress0/5
Fidelity5/5
Encode5/5
Train5/5
Headroom5/5
SIMD4/5
Strengths
  • Recall ceiling — no approximation error
  • Zero training cost, zero codebook state
  • Fastest encode / decode path
Trade-offs
  • Memory = 4 bytes × dim × N — blows up past ~10M rows
  • Rejected by DiskANN (defeats the >RAM layout)
  • No rerank needed, but also no headroom for tuning
Quantizer

RaBitQ

Bit-packed Quantizer

Randomized rotation + per-dimension bit packing with a provable distance-error bound. 3-bit codes hit >99% Recall@10 on SIFT1M when paired with a rerank pass.

Bytes saved vs float32CompressDistance-estimate accuracy pre-rerankFidelityEncode throughputEncodeTraining cost (higher = cheaper)TrainRecall ceiling with rerankHeadroomSIMD-friendliness of distance kernelSIMD
Compress5/5
Fidelity4/5
Encode5/5
Train4/5
Headroom5/5
SIMD5/5
Strengths
  • Provable unbiased error bound — predictable recall
  • SIMD-friendly: distance is popcount + dot
  • 1–4 bits per dim; 3 bits already matches PQ m=16
Trade-offs
  • Needs a rerank pass for full recall
  • Random rotation matrix adds 4·d² bytes of state
  • IP metric requires vectors to be normalized
Quantizer

PQ

Product Quantization

Classical product quantization: each vector split into m sub-vectors, each independently k-means quantized. Compact codes, ADC-based distance lookup.

Bytes saved vs float32CompressDistance-estimate accuracy pre-rerankFidelityEncode throughputEncodeTraining cost (higher = cheaper)TrainRecall ceiling with rerankHeadroomSIMD-friendliness of distance kernelSIMD
Compress4/5
Fidelity3/5
Encode4/5
Train2/5
Headroom4/5
SIMD4/5
Strengths
  • Battle-tested — underpins every billion-scale ANN system since 2011
  • Memory scales in m·log2(k) bits — very flexible
  • ADC distance lookup is cache-friendly
Trade-offs
  • Training is m independent k-means — costs O(N·k·d) per segment
  • Isotropic L2 loss leaks error into query-parallel direction
  • Asymmetric distance is biased without rerank
Quantizer

ScaNN

Anisotropic Vector Quantization

ScaNN weights quantization errors parallel to the query-vector direction more heavily than orthogonal ones — producing more accurate inner-product estimates than classical PQ.

Bytes saved vs float32CompressDistance-estimate accuracy pre-rerankFidelityEncode throughputEncodeTraining cost (higher = cheaper)TrainRecall ceiling with rerankHeadroomSIMD-friendliness of distance kernelSIMD
Compress4/5
Fidelity5/5
Encode4/5
Train1/5
Headroom4/5
SIMD4/5
Strengths
  • More accurate inner-product estimates than classical PQ at same memory
  • Anisotropy knob (eta) trades fidelity for training cost
  • Same code layout as PQ — drop-in swap
Trade-offs
  • Cosine metric is unsupported — must normalize + use IP
  • Lloyd loop needs the eta-weighted loss; slower than PQ training
  • Benefit shrinks when the data is near-isotropic