Why HNSW is not the answer and disk-based alternatives might be more practical

(blog.pgvecto.rs)

131 points by kevlened a day ago | 60 comments

SIMD-accelerated search over binary vectors is very fast and you can fit a lot in memory. Then rerank over f32 from disk.

I tried a few alternatives and found that SQLite + usearch extension wins for my use cases (< 1 million records), as measured by latency and simplicity. I believe this approach can scale up to hundreds of millions of records.

wild_egg 13 hours ago | root | parent |

I've been using DuckDB similarly and loving the simplicity. What sort of latency are you seeing with your setup? I'm hitting 300ms to query against 10M vectors and could see that becoming a bottleneck if going into the hundreds of millions.

ashvardanian 8 hours ago | root | parent | next |

DuckDB also uses USearch and the underlying SimSIMD library for kernels, but both were originally designed for a Billion+ scale.

Depending on how the wrapper is implemented you can get different numbers. With the raw libraries, on larger machines, I'd expect 200'000 requests per second for 1 Billion entries.

VoVAllen 13 hours ago | root | parent | prev |

Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 60ms-level latency when using SSD with limited memory.

[1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

generall 20 hours ago | prev | next |

IVF, unfortunately, is barely compatible with filtered search. It have to rely on post-filtering and retrieve more and more candidates if the result set is not big enough. If the query is in some form correlated with the filter, this approach quickly degrades to brute-force.

Surprised that the article doesn't mention filtering use-case at all.

VoVAllen 13 hours ago | root | parent |

Hi, I'm the author of the article. I actually think the opposite of what you mentioned. IVF is more suitable for prefiltering compared to HNSW. For prefiltering in HNSW, there is a certain requirement for the filtering rate—it can't be too low, or the entire graph may become disconnected. For instance, with the commonly used parameter m=16, each node can have at most 16 neighbors. If the filtering rate is below 5%, it can directly result in no neighbors meeting the condition, causing the algorithm to fail entirely. This is why the academic community has proposed alternatives like ACRON[1]. On the other hand, IVF doesn't have this problem at all—you can check whether a candidate meets the condition before calculating distances.

[1] https://arxiv.org/abs/2403.04871

generall 8 hours ago | root | parent |

In IVF you can start checking conditions only in the final bucket. There are no guarantees if the bucket has any acceptable value, and there are no procedures to find the bucket which has acceptable values before scanning it.

VoVAllen 7 hours ago | root | parent |

I'm not sure I fully understand your point. This can be implemented quite easily by organizing each posting list with specific attributes to locate the values efficiently. For example, you could build each posting list into a B-tree, or similar to a column-oriented format that stores attribute statistics to enable skip scanning. Could you elaborate more?

tveita 20 hours ago | prev | next |

> For example, the typical distance computation complexity between two D-dimensional vectors is O(D^2), but compressing floats into bits reduces this by a factor of 1024 (32x32).

This isn't right, cosine distance between two vectors is definitely O(D)…

Of course replacing float multiplications with xor+popcount is a nice speedup in computation, but assuming you're memory bandwidth limited, speedup should be linear.

VoVAllen 13 hours ago | root | parent | next |

Hi, I'm the author of the article. I think you're right, and we’ll update the description in the blog. Since binary operations are simpler than floating-point operations, the speedup could indeed be greater than 32x.

DoctorOetker 16 hours ago | root | parent | prev |

Some snippets:

> Leverages concentration of measure phenomena

> Uses anisotropic vector quantization to optimize inner product accuracy by penalizing errors in directions that impact high inner products, achieving superior recall and speed.

I only skimmed the article, but the 2 words I emphasized seems to imply they apply a quadratic metric for distance, i.e. they assume the data coordinates are with respect to non-orthogonal basis vectors, resulting in off-diagonal distance metric terms.

VoVAllen 13 hours ago | root | parent |

Hi, I'm the author of the article. We actually rely on a new quantization method called RaBitQ instead of ScaNN. You can read more about it at https://dev.to/gaoj0017/quantization-in-the-counterintuitive....

ahevenor 2 hours ago | prev | next |

HNSW can be used for large indexes when combined with an effective caching strategy. Aerospike uses a distributed cache with query steering approach which allows you to have many indexes, including large ones and load them into memory as your application needs them.

ahevenor 2 hours ago | root | parent |

https://aerospike.com/docs/vector/learn/caching

ayende 20 hours ago | prev | next |

The problem with IVF is that you need to find the right centroids. And that doesn't work well if your data grow and mutate over time.

Splitting a centroid is a pretty complex issue.

As are clustering in an area. For example, let's assume that you hold StackOverflow questions & answers. Now you have a massive amount of additional data (> 25% of the existing dataset) that talks about Rust.

You either need to re-calculate the centroids globally, or find a good way to split.

The posting list are easy to use, but if you are unbalanced, it gets really bad.

VoVAllen 13 hours ago | root | parent |

Hi, I'm the author of the article. Meta have conducted some experiments on dynamic IVF with datasets of several hundred million records. The conclusion was that recall can be maintained through simple repartitioning and rebalancing strategies. You can find more details here: DEDRIFT: Robust Similarity Search under Content Drift https://arxiv.org/pdf/2308.02752. Additionally, with the help of GPUs, KMeans can be computed quickly, making the cost of rebuilding the entire index acceptable in many cases.

PaulHoule 21 hours ago | prev | next |

I've been doing vector search since 2002 or so and it's amazing how far you can get keeping vectors in RAM and using primitive search algorithms, enough that I'm afraid the market for vector databases is 1% of what VC's think it is. (e.g. full scan was good enough for a search engine of international patents and non-patent literature)

montebicyclelo 20 hours ago | root | parent | next |

This 100%. Vector DBs have been heavily pushed by vector DB companies and cloud providers, and despite companies often having mere MBs of documents to search through for their use cases, amounts that trivially fit into RAM / can be searched via dot product in milliseconds, managers and engineers less familiar with the space think people are doing it wrong if they don't use a vector db. So.. vector DBs end up getting used when actually a simpler in-memory, non-approximate, solution would be fine, (but less monetizable)

This is so true. A plain old exhaustive SIMD-optimized similarity search will do just fine in many cases and not have any of the approximation tradeoffs of HNSW.

PhilippGille 21 hours ago | root | parent |

In chromem-go [1] I'm searching through 100,000 vectors in 40ms on a mid-range laptop CPU, even without SIMD. It's quick enough for many use cases.

[1] https://github.com/philippgille/chromem-go

PaulHoule 21 hours ago | root | parent |

It would be very hard to find a problem that has better mechanical sympathy than full-scan similarity search. Even if the operation count of some other algorithm was 1/10 on paper it might not be faster if the prefetcher and branch predictor aren't running at their best.

People who want to start with RAG should not start with a vector search engine but instead download

https://sbert.net/

and start messing around w/ Jupyter notebooks and maybe FAISS. People who think I'm going to upload vectors to an expensive cloud service over a slow ADSL connection are delusional.

rockwotj 20 hours ago | root | parent | next |

Swap out FAISS with usearch, you get all the awesome SIMD acceleration (via dynamic dispatch), optional compression. Not affiliated but really cool tech.

https://github.com/unum-cloud/usearch

vegabook 17 hours ago | root | parent | next |

Just tried usearch against ol’ faithful np.dot, and found the latter to be 8x faster than usearch on 10m brute force scan as described in their readme [1] for top 50 matches. Identical output result. 1.74 seconds for numpy and around 12 seconds for usearch on an M2 max with enough ram to hold the vectors without swapping.

[1] https://github.com/unum-cloud/usearch?tab=readme-ov-file#exa...

ashvardanian 8 hours ago | root | parent |

Author here :)

This might not be an apples-to-apples comparison. NumPy uses BLAS for matrix multiplication, which benefits from tiling to make better use of CPU caches.

USearch, on the other hand, computes L2 distance directly (not the dot product) and supports a variety of metrics. It doesn't use tiling, so it's expected to be slower than BLAS GEMM routines for single or double-precision vectors.

Things might get more interesting with half-precision, brain-float16, or integer representations, where the trade-offs are less straightforward. Let me know if you decide to try it with those — I'd love to hear how it performs.

PS: You may find related benchmarks here: https://github.com/ashvardanian/SimSIMD

vegabook 6 hours ago | root | parent |

It turns out, my bad and I apologise, that although 10e6 x 1e3 FP32 fits well within 96GB of RAM, during the np.random.rand initialization phase intermediate allocations mean we go to about 32GB of swap files. These only get cleared if more ram is demanded and that happens on the first bench run. So whichever gets run first, np or usearch, gets penalised bigtime. So now I have re-run with sizes that never reach swap threshold, and the results are MUCH more impressive for usearch. Basically usearch is twice as fast. 7e6x1e3 scan for 1e3 top 50 is 1.32 seconds for numpy and 0.633 seconds for usearch. Swapped the order of benchmarks as well to rerun, and results check out. Nice work. usearch is now in my toolkit and I apologise again for the misleading comment.

As an aside, it's kind of amazing how it takes essentially just over half a second to scan 7m 1032-size vectors for semantic similarity, on a (beefy but not extraordinary) desktop computer. Modern hardware is so awesome. And I'm guessing I could get another order of magnitude or two speedup if I got Metal involved.

EDIT: Linux on tiny el-cheapo 100 dollar Intel n95 mini PC with 32GIG of (single channel) RAM, and dropping size to 3mx1024: usearch: 0.65 seconds numpy: 0.99 seconds. Amazing.

ashvardanian 5 hours ago | root | parent |

Oh, epic! Thanks for taking the time to double check :)

zackangelo 18 hours ago | root | parent | prev |

fwiw faiss, although a bit unwieldy, has an optimized full scan search built into it as well

ashvardanian 5 hours ago | root | parent |

Yes, it does, but it may be somewhat limited. If you need AVX2 float32 kernels for cosine distance, you are in luck. If you want dynamic dispatch from 200+ SIMD kernels across 4 generations of AVX-512 (Skylake-X, Ice Lake, Sapphire Rapids, and AMD Genoa), AVX2, Arm NEON, Arm SVE, and SVE2 for different mixed-precision distance functions, you may find other options more capable :)

zackangelo 3 hours ago | root | parent |

lol good to know! I've had the luxury of only needing the first one :)

abhgh 20 hours ago | root | parent | prev |

I strongly advocate this! If you're starting off in this space, check if this barebones implementation isn't all you need. You can't beat the accuracy of a full search; in theory, you're trading off scalability, but validate if you need the scale where this tradeoff begins to show. And yes, sbert is great, and it gives you options to choose [1] between accuracy (MPNet) and speed (MiniLM). There are also multi-lingual options. And remember, you can also fine-tune MPNet with SetFit. And there are always new and interesting embeddings being released, so remember to re-assess the fitment of embeddings once in a while against what you're using, e.g., LLM2vec or ModernBERT. A good idea would be to keep checking MTEB [2].

[1] https://www.sbert.net/docs/sentence_transformer/pretrained_m...

[2] https://huggingface.co/spaces/mteb/leaderboard

VoVAllen 13 hours ago | root | parent |

Hi, I'm the author of the article. I agree with your point. The model from https://www.mixedbread.ai/blog/mxbai-embed-xsmall-v1 also looks great, though I haven’t had the chance to try it yet.

LunaSea 21 hours ago | root | parent | prev |

VC are terrible at technical due diligence so the money will keep pouring in.

However, CTOs are also terrible at assessing their technical necessities and vector databases will have customers the same way web scale databases did.

antirez 17 hours ago | prev | next |

Quantization can be applied exactly in the same way in HNSW. I'm using quantization in the implementation of Redis vector sets and it works very well. I have very big issues with the other points as well, but I prefer to reply to these concerns with the implementation I hope to have into Redis in a short time (early 2025).

About insertion / deletion cost. Sure, they are costly, but if instead of grabbing one of the available implementations you start from scratch, extend the paper in sensible ways, and experiment with it, I think it is possible to do much better than one could initially believe.

binarymax 21 hours ago | prev | next |

HNSW became the defacto default specifically because you don’t need to precalculate the index and it updates as writes come in.

This is a great article, and all the points are there, but the truth is that most teams running million scale vectors don’t want the operational complexity of quantizing offline in some frequency. They’ll gladly outsource the costs to paying for RAM instead of some IVFPQ calculation.

However if there were a solution that “just worked” to handle PQ for shards in near real time for updates that also had sane filtering, that would be really nice.

bddicken 20 hours ago | root | parent | next |

At PlanetScale, we have a really nice solution for this in MySQL using SPANN + SPFresh. The way we've implemented it allows for pre and post filtering, full compatibility with where clause filtering, and has the same kind of acid compliance you'd expect from a relational database. You can read about it here:

https://planetscale.com/blog/announcing-planetscale-vectors-...

Hi, I'm the author of the article. In our product, VectorChord, we use a quantization algorithm called RaBitQ, which doesn’t require a separate codebook. Unlike IVFPQ, it avoids the need to maintain and update the corresponding codebook, so the update issue you mentioned is not a problem. Regarding filtering, I’m not sure which specific scenario you’re referring to, but we currently support iterative post-filtering and are technically capable of perfectly supporting pre-filtering as well.

binarymax 6 hours ago | root | parent |

Pre and post filtering are both not great. Some HNSW implementations in products like Vespa and Qdrant have filter-during-search.

This remains an unsolved problem in cluster-based indices.

nostrebored 20 hours ago | root | parent | prev |

Yes, real time updates, filtering, and multi vector support make most of these on device, in memory approaches untenable. If you really are just doing a similarity search against a fixed set of things, often you know the queries ahead of time and can just make a lookup table.

srajabi 17 hours ago | prev | next |

What about DiskANN? https://milvus.io/docs/disk_index.md

rekoros a day ago | prev | next |

Yep, I believe it

HNSW has been great for relatively small datasets (a website with 30K pages), but it’s a dangerous stretch for anything bigger

bluecoconut 21 hours ago | root | parent | next |

I don’t quite understand this - by 30k pages, is this the number of entries in your index? Did you mean 30M?

At the <100k scale I just full compute / inner product directly, and I don’t mess with vector stores or added complexity. No ANN algo needed — they’ll all be slower than actual exact kNN re ranking. (10k7684 =30MB, a scan over it and a sort is on the ~100us or faster). frankly, I’ve even sent at the 5k scale to client and done that client side in JS.

Often, I find i use an ANN algo / index to get me my nearest 10k then I do final re ranking with more expensive algorithms/compute in that reduced space.

The original HNSW paper was testing/benchmarking at the 5M-15M scales. That’s where it shines compared to alternatives.

When pushing to the 1B scale (I have an instance at 200M) the memory consumption does become a frustration (100GB of ram usage). Needing to vertically scale nodes that use the index. But it’s still very fast and good. I wouldn’t call it “dangerous” just “expensive”.

Interestingly though, I found that usearch package worked great and let me split and offload indexes into separate files on disk, greatly lowered ram usage and latency is still quite good on average, but has some spikes (eg. sometimes when doing nearest 10k though can be ~1-3 seconds on the 200M dataset)

VoVAllen 12 hours ago | root | parent | next |

Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It's based on RabitQ (a new quantization method) + IVF. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 100ms-level latency when using SSD with limited memory.

[1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

rekoros 12 hours ago | root | parent | prev |

You're dealing with much larger datasets than I have, so far - mine is only a few million vectors. I have a hard constraint on resources, so had to get things working quickly in a relatively gutless environment.

aabhay a day ago | root | parent | prev |

What systems would you recommend for larger deployments?

rekoros 21 hours ago | root | parent | next |

I ended up breaking up/sharding HNSW across multiple tables, but I'm dealing with many distinct datasets, each one just small enough to make HNSW effective in terms of index build/rebuild performance.

The article suggests IVF for larger datasets - this is the direction I'd certainly explore, but I've not personally had to deal with it. HNSW sharding/partitioning might actually work even for a very large - sharded/partitioned - dataset, where each query is a parallelized map/reduce operation.

VoVAllen 12 hours ago | root | parent | next |

[1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

nostrebored 20 hours ago | root | parent | prev |

What are you using for HNSW? Is the implementation handwritten? I’ve seen people using it well over XXm at full precision vectors with real time updates

rekoros 12 hours ago | root | parent |

pgvector - I wasn't able to get HNSW to build/rebuild quickly [enough] with a few million vectors. Very possibly I was doing something wrong, but fun research time ran out and I needed to get back to building features.

redskyluan 16 hours ago | root | parent | prev |

why not try milvus? you get multiple index types, SIMD based brute force search, IVF, HNSW and DiskANN and you never bother to scale

nostrebored 20 hours ago | prev | next |

Nit: the drawback of “not working well in disk based systems” isn’t a drawback unless you’re already using disk based systems.

The difference in recall is also significant — what you really get with HNSW is a system made to give good cost:approximation quality. These IVFPQ based systems are ones I’ve seen people rip and replace if the use case is high value.

I really don’t understand the push to make pg do everything. It wasn’t designed for search, and trying to shove these features into the platform feels like some misguided cost optimization that puts all of your data infrastructure on the same critical path.

VoVAllen 13 hours ago | root | parent |

Hi, I'm the author of the article. In our actual product, VectorChord, we adopted a new quantization algorithm called RaBitQ. The accuracy has not been compromised by the quantization process. We’ve provided recall-QPS comparison curves against HNSW, which you can find in our blog: https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1....

Many users choose PostgreSQL because they want to query their data across multiple dimensions, including leveraging time indexes, inverted indexes, geographic indexes, and more, while also being able to reuse their existing operational experiences. From my perspective, vector search in PostgreSQL does not have any disadvantages compared to specialized vector databases so fat.

cratermoon 17 hours ago | prev | next |

"Its reliance on frequent random access patterns makes it incompatible with the sequential access nature of disk storage."

Is this true for SSD storage, or does it only apply to spinning metal platters?

VoVAllen 13 hours ago | root | parent |

Hi, I'm the author of the article. The sequential access pattern of IVF makes prefetching and large block sequential reads much easier, whereas it's almost impossible for HNSW to achieve efficient prefetching.

weiichongy24 16 hours ago | prev | next |

[dead]

634636346 19 hours ago | prev | next |

[dead]

tw04 20 hours ago | prev |

> Its reliance on frequent random access patterns makes it incompatible with the sequential access nature of disk storage.

So use SSD? Are people seriously still trying to use spinning disk for database workloads in 2024?

jandrewrogers 19 hours ago | root | parent | next |

An SSD does not solve the problem of page fault chasing, it just makes it slightly less bad. This is fundamentally a software architecture problem.

This is solved with latency-hiding I/O schedulers, which don’t rely on cache locality for throughput.

Even SSD won't be fast enough for most indexes due to the random access nature. I've seen more than 1M iops on a huge nvme disk when use DiskANN index

JoeAltmaier 20 hours ago | root | parent | prev |

Data farms are all about cost per GB. Spinning media is a fraction of the cost per.