vindex

DuckDB Vector Search Extension

Inverted File index: k-means partitions the space into nlist cells, queries probe the nprobe closest centroids. Cheap to build, recall/speed are tunable in one knob.

GitHub
Quick Install
Version v0.1.0
← Back to overview
Algorithm

IVF

Inverted File index: k-means partitions the space into nlist cells, queries probe the nprobe closest centroids. Cheap to build, recall/speed are tunable in one knob.

Reference Paper — Credits

Product Quantization for Nearest Neighbor Search

Jégou, H.; Douze, M.; Schmid, C.

IEEE TPAMI 2011

Formalizes the IVFADC structure (coarse quantizer + residual PQ) that underpins every modern billion-scale ANN system.

Read paper at official venue →

At a glance

Inverted File index partitions the vector space with k-means into nlist Voronoi cells. Each vector is assigned to its closest centroid and stored in that cell's posting list. At query time we probe the nprobe closest centroids and scan their posting lists. Build is cheap, recall/speed is tunable with a single knob, and any of the quantizers plug in orthogonally (IVF-Flat / IVF-RaBitQ / IVF-PQ / IVF-ScaNN).

Parameters

OptionDefaultNotes
nlist1024Number of Voronoi cells. Rule of thumb: √N for small datasets, 4√N at scale.
nprobe32Cells visited per query. Higher recall ↔ higher latency.
quantizer'flat'flat / rabitq / pq / scann all supported.

Example

CREATE INDEX docs_idx ON docs USING IVF (embedding)
    WITH (metric='cosine', quantizer='rabitq', bits=3, rerank=10,
          nlist=1024, nprobe=32);