DiskANN
Vamana graph with vector codes stored out-of-band. Graph blocks are evicted through the DuckDB buffer pool, so the index can exceed RAM while keeping recall high at billion scale.
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
Subramanya, S. J.; Devvrit, F.; Simhadri, H. V.; Krishnaswamy, R.; Kadekodi, R.
NeurIPS 2019
Presents the Vamana graph-construction algorithm plus an SSD-resident layout enabling billion-scale ANN search on one commodity machine.
At a glance
DiskANN builds a Vamana graph tuned for billion-scale, on-disk ANN search. Unlike HNSW, its neighbor blocks do not inline the vector codes. The codes live in a separate, always-resident array, while the graph blocks are evicted through DuckDB's buffer pool. This decoupling lets the index grow past RAM while maintaining >95% Recall@10 at L_search=100.
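A minimal sketch of why the split works, assuming a toy in-memory model rather than the extension's real storage: `BlockCache` (hypothetical) stands in for the buffer pool, and `codes` are plain float tuples standing in for decoded PQ or RaBitQ approximations. Every distance is computed from the resident codes; only adjacency lists are loaded block by block, and may be evicted. The search loop is a simplified form of the paper's greedy beam search with beam width L, not the extension's actual code.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical stand-in for DuckDB's buffer pool: an LRU cache of
    neighbor-list blocks. `backing` plays the role of disk; at most
    `capacity` blocks stay resident and the least recently used is evicted."""

    def __init__(self, backing, capacity):
        self.backing = backing
        self.capacity = capacity
        self.resident = OrderedDict()
        self.loads = 0  # number of simulated block reads from "disk"

    def neighbors(self, node):
        if node in self.resident:
            self.resident.move_to_end(node)  # mark as most recently used
        else:
            self.loads += 1
            self.resident[node] = self.backing[node]
            if len(self.resident) > self.capacity:
                self.resident.popitem(last=False)  # evict the LRU block
        return self.resident[node]


def beam_search(query, start, codes, blocks, L, k):
    """Greedy beam search with beam width L. Distances use only the
    always-resident `codes`; adjacency is fetched through `blocks`."""

    def dist(i):
        # Squared Euclidean distance to the (approximate) code of node i.
        return sum((q - c) ** 2 for q, c in zip(query, codes[i]))

    candidates = {start: dist(start)}
    visited = set()
    while True:
        ranked = sorted(candidates, key=candidates.get)
        frontier = [n for n in ranked if n not in visited]
        if not frontier:
            break  # every candidate in the beam has been expanded
        node = frontier[0]  # closest unexpanded candidate
        visited.add(node)
        for nb in blocks.neighbors(node):
            if nb not in candidates:
                candidates[nb] = dist(nb)
        # Keep only the L closest candidates (the beam).
        ranked = sorted(candidates, key=candidates.get)[:L]
        candidates = {n: candidates[n] for n in ranked}
    return sorted(candidates, key=candidates.get)[:k]
```

On a six-node chain with a two-block cache, a query near node 4 still finds nodes 4 and 5 even though older blocks are evicted along the way; the codes never leave memory.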
Parameters
| Option | Default | Notes |
|---|---|---|
| diskann_r | 64 | Max out-degree per node (paper's R). |
| diskann_l | 100 | Beam size for both build and query (paper's L / Lsearch). |
| diskann_alpha | 1.2 | Robust-prune parameter (α ≥ 1). Higher values prune less aggressively and keep more long-range edges. |
| quantizer | required | Must be pq or rabitq. The flat quantizer is rejected because the out-of-band code layout assumes compressed codes. |
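diskann_alpha and diskann_r both enter Vamana's RobustPrune step. A minimal sketch of that step as the paper describes it, assuming Euclidean distance over small coordinate tuples and a caller that has already merged the candidate set with p's current out-neighbors. The function name and signature are illustrative, not the extension's API.

```python
import math


def robust_prune(p, candidates, points, alpha, R):
    """Sketch of RobustPrune: choose at most R out-neighbors for node p.
    `points` maps node id -> coordinate tuple."""

    def d(a, b):
        return math.dist(points[a], points[b])

    pool = set(candidates) - {p}
    out = []
    while pool:
        # The closest remaining candidate becomes an out-neighbor
        # (ties broken by node id to keep the result deterministic).
        best = min(pool, key=lambda v: (d(p, v), v))
        out.append(best)
        if len(out) == R:
            break  # out-degree cap reached
        # Drop every candidate v that `best` already covers, i.e. where
        # alpha * d(best, v) <= d(p, v). `best` itself is dropped too.
        # A larger alpha makes this test harder to pass, so fewer
        # candidates are dropped and more long-range edges survive.
        pool = {v for v in pool if alpha * d(best, v) > d(p, v)}
    return out
```

With p at the origin and candidates at (1, 0), (2, 0) and (0, 1), α = 1 keeps only the two short edges, because (2, 0) is reachable through (1, 0); raising α to 2.5 also keeps the longer edge to (2, 0).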
Example
```sql
CREATE INDEX docs_idx ON docs USING DISKANN (embedding)
WITH (metric='cosine',
      quantizer='pq', m=16, bits=8,  -- compressed codes; flat is rejected
      rerank=10,
      diskann_r=64, diskann_l=100);  -- paper's R and L (the defaults)
```
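Rough arithmetic for the example's settings shows why the codes, not the graph, are the part kept resident. This sketch assumes the usual PQ convention (m sub-quantizers, each emitting a `bits`-bit code) and 4-byte neighbor ids, both assumptions; the extension's real block format may add headers or padding, so treat the numbers as estimates.

```python
def pq_code_bytes(m: int, bits: int) -> int:
    """Bytes per vector for a PQ code: m sub-quantizers of `bits` bits each,
    rounded up to whole bytes."""
    return (m * bits + 7) // 8


def adjacency_bytes(r: int, id_bytes: int = 4) -> int:
    """Worst-case bytes for one node's neighbor list: r ids of id_bytes each.
    The 4-byte (uint32) id width is an assumption."""
    return r * id_bytes


def footprint_gb(n: int, m: int = 16, bits: int = 8, r: int = 64) -> tuple[float, float]:
    """(resident code array, evictable graph) sizes in GB for n vectors."""
    return n * pq_code_bytes(m, bits) / 1e9, n * adjacency_bytes(r) / 1e9


codes_gb, graph_gb = footprint_gb(1_000_000_000)
print(f"codes: {codes_gb:.0f} GB resident, graph: {graph_gb:.0f} GB evictable")
```

At one billion vectors, m=16 and bits=8 give 16 bytes per code, about 16 GB resident, while R=64 neighbor lists alone can reach about 256 GB. That gap is why the adjacency is the part routed through the buffer pool.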