vindex

DuckDB Vector Search Extension

SPANN augments IVF with closure-based replica writes: boundary points are copied into every cell inside closure_factor × d_best, so a single-cell probe still finds them.

Version v0.1.0
Algorithm

SPANN


Reference Paper — Credits

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

Chen, Q.; Zhao, B.; Wang, H.; Li, M.; Liu, C.; Li, Z.; Yang, M.; Wang, J.

NeurIPS 2021

Memory-disk hybrid inverted-index ANN with closure-replica assignment and query-aware posting-list pruning — matches in-memory recall at billion scale.


At a glance

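SPANN is IVF with a twist: boundary points (vectors that sit near multiple centroids) are written into every cell whose centroid lies within closure_factor × d_best of the vector, where d_best is the distance from the vector to its nearest centroid. The number of cells per vector is capped at replica_count. A single-cell probe then still finds these points, which narrows the classic IVF recall gap at low nprobe. Closure assignment is one of the paper's key contributions; because a replicated vector can appear in more than one probed cell, query-time dedup removes the duplicate hits.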
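
The assignment rule and the query-time dedup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not vindex's implementation: the names `assign_cells`, `build`, and `probe` are hypothetical, the distance is plain Euclidean, and the toy centroids and vectors are made up for the example.

```python
import math

def assign_cells(vec, centroids, closure_factor=1.1, replica_count=8):
    """Return the ids of the cells that should hold a copy of `vec`."""
    if closure_factor < 1.0:
        raise ValueError("closure_factor must be >= 1.0")
    # Rank centroids by distance to the vector; the first one gives d_best.
    ranked = sorted((math.dist(vec, c), cid) for cid, c in enumerate(centroids))
    d_best = ranked[0][0]
    limit = closure_factor * d_best
    # Keep every cell inside the closure, nearest first, up to the cap.
    return [cid for d, cid in ranked if d <= limit][:replica_count]

def build(vectors, centroids, **opts):
    """Build posting lists: cell id -> list of vector ids (with replicas)."""
    cells = {cid: [] for cid in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        for cid in assign_cells(vec, centroids, **opts):
            cells[cid].append(vid)
    return cells

def probe(query, vectors, centroids, cells, nprobe=1, k=3):
    """Scan the nprobe nearest cells and return the top-k vector ids."""
    nearest = sorted(range(len(centroids)),
                     key=lambda cid: math.dist(query, centroids[cid]))[:nprobe]
    seen = set()   # query-time dedup: a replica may sit in several cells
    hits = []
    for cid in nearest:
        for vid in cells[cid]:
            if vid in seen:
                continue
            seen.add(vid)
            hits.append((math.dist(query, vectors[vid]), vid))
    return [vid for _, vid in sorted(hits)[:k]]

# Toy data (hypothetical): vector 1 sits almost halfway between the centroids.
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (5.1, 0.0), (9.0, 0.0)]
cells = build(vectors, centroids, closure_factor=1.1, replica_count=8)

query = (4.8, 0.0)  # nearest centroid is cell 0, true neighbor is vector 1
print(cells)
print(probe(query, vectors, centroids, cells, nprobe=1, k=1))
```

In this toy setup vector 1 is nearest to centroid 1, so without replicas a probe of cell 0 alone would miss it. With closure_factor=1.1 it is also written into cell 0, and the single-cell probe returns it. Setting closure_factor=1.0 in `build` reproduces the plain IVF behavior for comparison.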

Parameters

Option            Default      Notes
nlist / nprobe    1024 / 32    Same meaning as IVF.
replica_count     8            Hard cap on cells per vector. 8 matches the paper.
closure_factor    1.1          Distance threshold multiplier. Must be ≥ 1.0.
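
To get a feel for how closure_factor and replica_count interact, the following Python sketch counts how many cells one boundary vector would be written to. It is illustrative only: the helper `replicas_for` and the sample centroid distances are hypothetical, and it assumes d_best is the distance to the nearest centroid.

```python
def replicas_for(centroid_dists, closure_factor, replica_count):
    """Number of cells a vector lands in, given its distances to all centroids."""
    if closure_factor < 1.0:
        raise ValueError("closure_factor must be >= 1.0")
    d_best = min(centroid_dists)
    limit = closure_factor * d_best
    in_closure = sum(1 for d in centroid_dists if d <= limit)
    # replica_count is a hard cap applied after the closure test.
    return min(in_closure, replica_count)

# Hypothetical distances from one vector to five centroids.
dists = [2.0, 2.1, 2.3, 2.9, 6.0]

for cf in (1.0, 1.1, 1.2, 1.5):
    print(cf, replicas_for(dists, cf, replica_count=8))
# 1.0 -> 1, 1.1 -> 2, 1.2 -> 3, 1.5 -> 4

# A tighter cap wins over a wide closure.
print(replicas_for(dists, 1.5, replica_count=2))  # 2
```

A closure_factor of 1.0 keeps only the nearest cell (plus exact ties), so it behaves like plain IVF assignment. Larger values trade index size for recall at low nprobe, and replica_count bounds the worst-case growth.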

Example

CREATE INDEX docs_idx ON docs USING SPANN (embedding)
    WITH (metric='cosine', quantizer='rabitq', bits=3, rerank=10,
          nlist=1024, nprobe=32,
          replica_count=8, closure_factor=1.1);