vindex

DuckDB Vector Search Extension

Vamana graph with codes stored out-of-band: the graph blocks are evicted through the DuckDB buffer pool, so the index can exceed RAM while keeping high recall at billion scale.

Version v0.1.0
Algorithm

DiskANN


Reference Paper — Credits

DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Subramanya, S. J.; Devvrit, F.; Simhadri, H. V.; Krishnaswamy, R.; Kadekodi, R.

NeurIPS 2019

Presents the Vamana graph-construction algorithm plus an SSD-resident layout enabling billion-scale ANN search on one commodity machine.


At a glance

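DiskANN builds a Vamana graph tuned for billion-scale, disk-resident ANN search. Unlike HNSW, the graph's neighbor blocks do not inline the vector codes: the codes live in a separate, always-resident array, while the graph blocks are evicted through DuckDB's buffer pool. Because approximate distances are computed from the resident codes, only the neighbor blocks of nodes the search actually expands have to be paged in. This decoupling lets the index grow past RAM while maintaining >95% Recall@10 at L_search=100.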
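The split between resident codes and evictable graph blocks can be pictured with a toy greedy beam search. This is a minimal Python sketch, not vindex's code: `BlockCache` is a hypothetical LRU stand-in for DuckDB's buffer pool, exact Euclidean distances on small tuples stand in for PQ or RaBitQ code distances, and `L` plays the role of the beam size. Only neighbor lists go through the cache; the code array is read directly.

```python
import heapq
import math
from collections import OrderedDict


class BlockCache:
    """Hypothetical LRU stand-in for DuckDB's buffer pool: keeps at most
    `capacity` neighbor blocks in memory and counts simulated fetches."""

    def __init__(self, graph_on_disk, capacity):
        self.disk = graph_on_disk  # node id -> list of neighbor ids ("on disk")
        self.capacity = capacity
        self.cache = OrderedDict()
        self.reads = 0  # number of blocks fetched from "disk"

    def neighbors(self, node):
        if node in self.cache:
            self.cache.move_to_end(node)  # cache hit: mark block as recently used
            return self.cache[node]
        self.reads += 1  # cache miss: fetch the block
        block = self.disk[node]
        self.cache[node] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return block


def beam_search(codes, blocks, entry, query, k, L):
    """Greedy beam search with a candidate list trimmed to the L closest nodes.
    `codes` is fully in memory; only neighbor lists are fetched via `blocks`."""

    def dist(node):
        # exact distance on the stored tuple stands in for a quantized-code distance
        return math.dist(codes[node], query)

    candidates = {entry: dist(entry)}  # node -> distance
    visited = set()
    while True:
        unvisited = [(d, n) for n, d in candidates.items() if n not in visited]
        if not unvisited:
            break
        _, node = min(unvisited)  # expand the closest candidate not yet expanded
        visited.add(node)
        for nb in blocks.neighbors(node):
            if nb not in candidates:
                candidates[nb] = dist(nb)
        if len(candidates) > L:
            closest = heapq.nsmallest(L, candidates.items(), key=lambda kv: kv[1])
            candidates = dict(closest)
    ranked = sorted(candidates, key=candidates.get)
    return ranked[:k]


codes = {i: (float(i), 0.0) for i in range(10)}  # ten points on a line
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
blocks = BlockCache(graph, capacity=2)
print(beam_search(codes, blocks, entry=0, query=(7.2, 0.0), k=2, L=4))  # [7, 8]
print(blocks.reads)  # 10
```

On this path graph every expansion touches a new block, so all ten blocks are fetched once while at most two stay cached; the code dictionary is never paged.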

Parameters

Option          Default    Notes
diskann_r       64         Maximum out-degree per node (the paper's R).
diskann_l       100        Beam (candidate-list) size used for both build and query (the paper's L and L_search).
diskann_alpha   1.2        Robust-prune parameter; higher values prune less aggressively and keep more long-range edges.
quantizer       required   Must be pq or rabitq. flat is rejected because the out-of-band code layout assumes compressed codes.
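The robust-prune rule that diskann_alpha controls can be sketched as follows. This is a minimal Python rendering of RobustPrune as described in the DiskANN paper, using exact Euclidean distances on small tuples; `robust_prune` and its arguments are illustrative names, not part of vindex, and the extension's actual pruning code may differ.

```python
import math


def robust_prune(p, candidates, points, alpha, R):
    """Choose at most R out-neighbors for node p (RobustPrune, Vamana paper).

    candidates -- candidate ids; in the paper this is the set of nodes visited
                  by the search for p united with p's current out-neighbors
    points     -- id -> coordinate tuple
    """

    def d(a, b):
        return math.dist(points[a], points[b])

    pool = set(candidates) - {p}
    out = []
    while pool:
        # the closest remaining candidate (ties broken by id) becomes an out-neighbor
        best = min(pool, key=lambda c: (d(p, c), c))
        out.append(best)
        if len(out) == R:
            break
        # drop every candidate c that `best` already covers: alpha * d(best, c) <= d(p, c);
        # a larger alpha makes this harder to satisfy, so fewer (longer) edges are pruned
        pool = {c for c in pool if alpha * d(best, c) > d(p, c)}
    return out


points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (4.0, 0.0), 4: (0.0, 3.0)}
print(robust_prune(0, [1, 2, 3, 4], points, alpha=1.0, R=4))  # [1, 4]
print(robust_prune(0, [1, 2, 3, 4], points, alpha=2.0, R=4))  # [1, 4, 3]
```

With alpha=1.0 the far node 3 is pruned because node 1 lies between it and node 0; with alpha=2.0 the long-range edge to node 3 survives, which is the trade-off the parameter exposes.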

Example

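-- Build a DiskANN index over docs.embedding with cosine distance and PQ codes.
-- diskann_r and diskann_l are spelled out at their default values.
CREATE INDEX docs_idx ON docs USING DISKANN (embedding)
    WITH (metric='cosine', quantizer='pq', m=16, bits=8, rerank=10,
          diskann_r=64, diskann_l=100);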
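A back-of-the-envelope sizing helps show why the graph, rather than the codes, is what pages through the buffer pool. This Python sketch rests on stated assumptions: m and bits are read as the usual PQ parameters (m sub-codes of bits bits each), neighbor ids are taken to be 4-byte integers, every node is assumed to have the full out-degree, and `pq_code_bytes` and `footprint` are hypothetical helpers, not part of vindex.

```python
import math


def pq_code_bytes(m, bits):
    """Bytes per vector if a PQ code is m sub-codes of `bits` bits each (assumed layout)."""
    return math.ceil(m * bits / 8)


def footprint(n_vectors, m=16, bits=8, r=64, id_bytes=4):
    """Rough sizes in bytes of the resident code array and of the graph blocks.

    Assumes full out-degree r and id_bytes per neighbor id; codebooks,
    rerank data and per-block headers are ignored.
    """
    resident_codes = n_vectors * pq_code_bytes(m, bits)
    graph_blocks = n_vectors * r * id_bytes
    return resident_codes, graph_blocks


codes, graph = footprint(1_000_000_000)
print(f"resident codes: {codes / 1e9:.0f} GB, graph blocks: {graph / 1e9:.0f} GB")
# resident codes: 16 GB, graph blocks: 256 GB
```

Under these assumptions the graph is 16 times larger than the code array at one billion vectors, which is consistent with keeping codes resident while letting graph blocks be evicted.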