From Wikipedia:
> The Hierarchical navigable small world (HNSW) algorithm is a graph-based approximate nearest neighbor search technique used in many vector databases.[1] Nearest neighbor search without an index involves computing the distance from the query to each point in the database, which for large datasets is computationally prohibitive. For high-dimensional data, tree-based exact vector search techniques such as the k-d tree and R-tree do not perform well enough because of the curse of dimensionality.
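Just to make that cost concrete, brute-force search is one distance computation per stored vector per query. A minimal sketch (Python/NumPy, illustrative names only, not any particular database's code):

```python
import numpy as np

def brute_force_knn(query, vectors, k=10):
    """Exact k-NN: compute the distance from the query to every stored vector.

    Cost is O(n * d) per query, which is why this stops being viable once n
    reaches hundreds of millions or billions of vectors.
    """
    # Squared Euclidean distance to every row of `vectors` (shape: n x d).
    dists = np.sum((vectors - query) ** 2, axis=1)
    # Indices of the k smallest distances (unordered).
    return np.argpartition(dists, k)[:k]
```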
At very high scale there's less usage of graphs, or a layer of clustering is put on top of them.
Graphs can be complex to build and rebalance, and graph-like data structures, one node pointing out to another node somewhere else in memory, aren't very cache friendly.
Add to that, people almost always want to *filter* vector search results, and this is a huge blind spot for both consumers and providers. It's where the ugly performance surprises come from. Filtered HNSW isn't straightforward: you just have to keep traversing the graph looking for results that satisfy your filter.
HNSW came out of a benchmark regime where we just indexed some vectors and tried to maximize recall at a given query latency. It doesn't take into account the filtering / indexing almost everyone wants.
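A rough sketch of that "keep traversing until enough results pass the filter" behavior, for one HNSW layer (hypothetical Python, illustrative names; not Redis's or any library's actual implementation):

```python
import heapq
import numpy as np

def filtered_layer_search(query, entry, neighbors, vectors, passes_filter, k=10, ef=64):
    """Beam search over one HNSW layer with a metadata filter applied.

    Nodes failing the filter are still expanded (they are needed to navigate
    the graph), but only nodes passing the filter may enter the result set,
    so a selective filter forces the traversal to visit far more nodes than
    unfiltered HNSW before k acceptable results turn up. All names here are
    illustrative assumptions, not any library's API.
    """
    dist = lambda i: float(np.sum((vectors[i] - query) ** 2))
    visited = {entry}
    candidates = [(dist(entry), entry)]                  # min-heap of nodes to expand
    results = [(-dist(entry), entry)] if passes_filter(entry) else []  # max-heap via negation
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                                        # nothing closer is reachable
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            heapq.heappush(candidates, (dist(nb), nb))
            if passes_filter(nb):                        # only filtered-in nodes become results
                heapq.heappush(results, (-dist(nb), nb))
                if len(results) > ef:
                    heapq.heappop(results)
    results.sort(key=lambda item: -item[0])              # ascending true distance
    return [node_id for _, node_id in results[:k]]
```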
Turbopuffer, for example, doesn't use graphs at all; it uses SPFresh. And they recently got 200ms latency on 100B vectors.
https://turbopuffer.com/docs/vector
There is an entire section of the post about that. I believe it's more the illusion of a problem, caused by product design issues, than a real challenge, since far-away results that match the filter are totally useless.
I'm facing the problem you describe daily. It's especially bad because it's very difficult for me to predict if the set of filters will reduce the dataset by ~1% (in which case following the original vector index is fine) or by 99.99% (in which case you just want to brute force the remaining vectors).
I've tried a million different things, but hadn't heard of Turbopuffer until now. Any references on how they perform with such additional filters?
Lucene and ES implement a shortcut for filters that are restrictive enough. Since the engine is already optimized for figuring out which documents fall into your filter set, you first determine that set's size. Then you traverse the HNSW graph normally, and if you have visited more nodes than the filter set's cardinality, you switch to brute-forcing the distance comparisons over the filter set. So the worst case is roughly 2x your filter set's size in vector distance operations. Quite neat.
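A rough sketch of that strategy (hypothetical Python with illustrative signatures; not Lucene's or Elasticsearch's actual code):

```python
import numpy as np

def adaptive_filtered_search(query, filter_ids, vectors, hnsw_search, k=10):
    """Adaptive filtered k-NN in the spirit of the Lucene/ES shortcut.

    `filter_ids` is the precomputed set of doc ids matching the filter
    (cheap to obtain from the inverted index). We start a normal graph
    traversal, capping the number of visited nodes at the filter set's
    cardinality; if the cap is hit, we abandon the graph and brute-force
    the filter set instead, so the worst case is ~2x |filter set| distance
    computations. `hnsw_search` is an assumed callable that returns the
    top-k filtered results, or None if it exhausts the visit budget first.
    """
    budget = len(filter_ids)
    hits = hnsw_search(query, k=k, allowed=filter_ids, max_visited=budget)
    if hits is not None:
        return hits
    # Fallback: exact search restricted to the filter set.
    ids = np.fromiter(filter_ids, dtype=np.int64)
    dists = np.sum((vectors[ids] - query) ** 2, axis=1)
    order = np.argsort(dists)[:k]
    return [int(ids[i]) for i in order]
```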
Oh that's nice! Any references on this shortcut? How do you activate that behavior? I was playing around with ES, but the only suggestion I found was to use `count` on filters before deciding (manually) which path to take.
Here you go https://github.com/apache/lucene/pull/656 - no need to do anything from the user side to trigger it as far as I know.
Just curious what the state of the art around filtered vector search results is? I took a quick look at the SPFresh paper and didn't see it specifically address filtering.
Hybrid search with vector similarity and filtering has, I think, mostly been solved by Vespa, and not even recently.
https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-sear...
For sure. But it's "solved" differently by every vector database. You have to pay attention to how it's solved.
Full text search has this same issue.
This is well worth reading in full. The section about threading is particularly interesting: most of Redis is single-threaded, but antirez decided to use threads for the HNSW implementation and explains why.
Thanks! Appreciate your words.
> many programmers are smart, and if instead of creating a magic system they have no access to, you show them the data structure, the tradeoffs, they can build more things, and model their use cases in specific ways. And your system will be simpler, too.
Basically my entire full-time job is spent prosecuting this argument. It is indeed true that many programmers are smart, but it is equally true that many programmers _are not_ smart, and those programmers have to contribute too. More hands is usually better than simpler systems for reasons that have nothing to do with technical proficiency.
>> More hands is usually better than simpler systems for reasons that have nothing to do with technical proficiency.
If you are working on open source databases or something close to the metal, I agree with antirez; if you are working at some established tech business (e.g. a very old ecommerce site), I agree with you.
To be clear, I'm not disagreeing with antirez at all. I feel his argument in my bones. I am a smart programmer. I want simple, powerful systems that leave the kid gloves in the drawer.
The unfortunate reality is that a large cadre of people cannot handle such tools, and those people still have extremely valuable contributions to make.
I say this as a full-time research engineer at a top-10 university. We are not short on talent, new problems, or funding. There is ample opportunity to make our systems as simple/"pure" as possible, and I make that case vigorously. The fact remains that intentionally limiting scope for the sake of the many is often better than cultivating an elite few.