SPANN
SPANN extends IVF with closure-based replica writes: a boundary point is copied into every cell whose centroid lies within closure_factor × d_best of it (d_best being its distance to the nearest centroid), so probing any one of those cells finds it.
SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
Chen, Q.; Zhao, B.; Wang, H.; Li, M.; Liu, C.; Li, Z.; Yang, M.; Wang, J.
NeurIPS 2021
Memory-disk hybrid inverted-index ANN with closure-replica assignment and query-aware posting-list pruning — matches in-memory recall at billion scale.
At a glance
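SPANN is IVF with a twist: boundary points (vectors that sit near several centroids) are written into every cell whose centroid lies within closure_factor × d_best, where d_best is the distance from the vector to its nearest centroid, up to a cap of replica_count cells. A query whose nearest cell is any one of those cells therefore still finds such a point with a single-cell probe, which narrows the classic IVF recall gap at low nprobe.
The closure assignment is one of the paper's key contributions; at query time, deduplication removes the repeated hits that replicas produce.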
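The assignment rule and the query-time dedup can be sketched in a few lines of Python. This is a minimal brute-force sketch, assuming Euclidean distance, small in-memory lists of centroids and vectors, and hypothetical helper names (`assign_cells`, `build_index`, `search`). It is not SPANN's actual implementation: it covers only the closure rule described above, with no disk layout, quantization, or further replica pruning.

```python
import math
from collections import defaultdict


def assign_cells(vec, centroids, closure_factor=1.1, replica_count=8):
    """Return the ids of every cell that should store `vec`.

    A cell qualifies if its centroid lies within closure_factor * d_best,
    where d_best is the distance to the nearest centroid. The nearest cell
    always qualifies; the result is capped at replica_count cells.
    """
    if closure_factor < 1.0:
        raise ValueError("closure_factor must be >= 1.0")
    # (distance, cell id) pairs, nearest centroid first.
    dists = sorted((math.dist(vec, c), i) for i, c in enumerate(centroids))
    threshold = closure_factor * dists[0][0]
    cells = [i for d, i in dists if d <= threshold]
    return cells[:replica_count]


def build_index(vectors, centroids, closure_factor=1.1, replica_count=8):
    """Write each vector id into every cell chosen by assign_cells."""
    postings = defaultdict(list)
    for vid, vec in enumerate(vectors):
        for cell in assign_cells(vec, centroids, closure_factor, replica_count):
            postings[cell].append(vid)
    return postings


def search(query, k, vectors, centroids, postings, nprobe=1):
    """Probe the nprobe nearest cells; return the k nearest distinct ids."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    seen = set()
    candidates = []
    for cell in probe:
        for vid in postings.get(cell, []):
            if vid in seen:  # a replica of a vector already scored: skip it
                continue
            seen.add(vid)
            candidates.append((math.dist(query, vectors[vid]), vid))
    candidates.sort()
    return [vid for _, vid in candidates[:k]]


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (2.0, 0.0)]
    vectors = [(0.1, 0.0), (0.98, 0.0), (1.9, 0.1)]
    query = (1.1, 0.0)
    for factor in (1.0, 1.1):
        idx = build_index(vectors, centroids, closure_factor=factor)
        print(factor, dict(idx), search(query, 1, vectors, centroids, idx))
```

In the toy data, vector 1 at (0.98, 0) is 0.98 from centroid 0 and 1.02 from centroid 1. Since 1.02 ≤ 1.1 × 0.98 = 1.078, it is written to both cells, so a query at (1.1, 0) that probes only its nearest cell (cell 1) still returns vector 1. With closure_factor=1.0 the vector lives only in cell 0, and the same single-cell probe returns vector 2 instead.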
Parameters
| Option | Default | Notes |
|---|---|---|
| nlist / nprobe | 1024 / 32 | Same meaning as in IVF: number of cells built, and number of cells probed per query. |
| replica_count | 8 | Hard cap on cells per vector. 8 matches the paper. |
| closure_factor | 1.1 | Multiplier on d_best that sets the replication distance threshold. Must be ≥ 1.0. |
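The rules in the table can be captured in a small validated config object. This is a hypothetical sketch: `SpannParams` and `max_posting_entries` are illustrative names, not the index's API, and the nprobe ≤ nlist check is an added assumption that the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpannParams:
    """Hypothetical mirror of the SPANN options table and its defaults."""
    nlist: int = 1024
    nprobe: int = 32
    replica_count: int = 8
    closure_factor: float = 1.1

    def __post_init__(self):
        if self.nlist < 1:
            raise ValueError("nlist must be >= 1")
        # Assumption: probing more cells than exist is treated as an error.
        if not 1 <= self.nprobe <= self.nlist:
            raise ValueError("nprobe must be between 1 and nlist")
        # replica_count is a hard cap on cells per vector, so it must allow one.
        if self.replica_count < 1:
            raise ValueError("replica_count must be >= 1")
        # From the table: the threshold may not shrink below d_best.
        if self.closure_factor < 1.0:
            raise ValueError("closure_factor must be >= 1.0")

    def max_posting_entries(self, n_vectors: int) -> int:
        """Upper bound on posting-list entries: each vector in <= replica_count cells."""
        return n_vectors * self.replica_count
```

Because replica_count caps the cells per vector, posting-list storage is bounded by replica_count × N entries, e.g. 8,000,000 entries for one million vectors at the default of 8. Setting closure_factor=1.0 makes the threshold equal to d_best, which in effect reduces to plain IVF single assignment apart from exact distance ties.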
Example
```sql
CREATE INDEX docs_idx ON docs USING SPANN (embedding)
    WITH (metric='cosine', quantizer='rabitq', bits=3, rerank=10,
          nlist=1024, nprobe=32,
          replica_count=8, closure_factor=1.1);
```