59 comments

  • intalentive a day ago

    SIMD-accelerated search over binary vectors is very fast and you can fit a lot in memory. Then rerank over f32 from disk.

    I tried a few alternatives and found that SQLite + usearch extension wins for my use cases (< 1 million records), as measured by latency and simplicity. I believe this approach can scale up to hundreds of millions of records.
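
    For anyone curious what that looks like concretely, here is a minimal numpy sketch of the idea (illustrative only, not the poster's actual SQLite + usearch setup; dimensions and candidate counts are made up):

    ```python
    # Hypothetical sketch: binarize embeddings, scan with Hamming distance
    # (xor + popcount), then rerank the best candidates with full f32 cosine.
    import numpy as np

    def binarize(vectors):
        """Pack each float vector into bits (one sign bit per dimension)."""
        return np.packbits(vectors > 0, axis=1)              # (n, d) -> (n, d // 8) uint8

    def hamming_candidates(packed_db, packed_query, top_n):
        """Candidate generation over the binary vectors."""
        xored = np.bitwise_xor(packed_db, packed_query)      # (n, d // 8)
        dists = np.unpackbits(xored, axis=1).sum(axis=1)     # popcount per row
        return np.argpartition(dists, top_n)[:top_n]

    def search(db_f32, packed_db, query_f32, k=10, top_n=200):
        """Binary scan for candidates, then exact f32 rerank (here from RAM)."""
        cands = hamming_candidates(packed_db, binarize(query_f32[None, :]), top_n)
        sims = db_f32[cands] @ query_f32                     # assumes normalized vectors
        return cands[np.argsort(-sims)[:k]]

    rng = np.random.default_rng(0)
    db = rng.standard_normal((100_000, 768)).astype(np.float32)
    db /= np.linalg.norm(db, axis=1, keepdims=True)
    packed = binarize(db)
    print(search(db, packed, db[42]))                        # db[42] should rank first
    ```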

    • wild_egg 14 hours ago

      I've been using DuckDB similarly and loving the simplicity. What sort of latency are you seeing with your setup? I'm hitting 300ms to query against 10M vectors and could see that becoming a bottleneck if going into the hundreds of millions.

      • ashvardanian 9 hours ago

        DuckDB also uses USearch and the underlying SimSIMD library for kernels, but both were originally designed for billion+ scale.

        Depending on how the wrapper is implemented, you can get different numbers. With the raw libraries, on larger machines, I'd expect 200'000 requests per second over 1 billion entries.

      • VoVAllen 14 hours ago

        Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 60ms-level latency when using SSD with limited memory.

        [1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

  • generall a day ago

    IVF, unfortunately, is barely compatible with filtered search. It has to rely on post-filtering, retrieving more and more candidates until the result set is big enough. If the query is in some way correlated with the filter, this approach quickly degrades to brute force.

    Surprised that the article doesn't mention the filtering use case at all.

    • VoVAllen 15 hours ago

      Hi, I'm the author of the article. I actually think the opposite of what you mentioned: IVF is more suitable for prefiltering than HNSW. For prefiltering in HNSW, there is a certain requirement on the filtering rate: it can't be too low, or the entire graph may become disconnected. For instance, with the commonly used parameter m=16, each node can have at most 16 neighbors. If the filtering rate is below 5%, it can easily happen that no neighbors meet the condition, causing the algorithm to fail entirely. This is why the academic community has proposed alternatives like ACORN [1]. IVF, on the other hand, doesn't have this problem at all: you can check whether a candidate meets the condition before calculating any distances.

      [1] https://arxiv.org/abs/2403.04871
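
      To make the prefiltering point concrete, here is a rough sketch of how an IVF index can apply the predicate before computing any distances (illustrative only, not VectorChord's actual implementation):

      ```python
      # Illustrative IVF prefiltering sketch: inside each probed posting list,
      # evaluate the filter first and only compute distances for rows that pass,
      # so even a highly selective filter doesn't break the index.
      import numpy as np

      def ivf_prefiltered_search(query, centroids, posting_lists, metadata,
                                 predicate, nprobe=8, k=10):
          # Probe the nprobe posting lists whose centroids are closest to the query.
          probed = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
          candidates = []
          for list_id in probed:
              ids, vectors = posting_lists[list_id]          # row ids and their vectors
              mask = predicate(metadata[ids])                # filter BEFORE any distances
              if not mask.any():
                  continue
              dists = np.linalg.norm(vectors[mask] - query, axis=1)
              candidates.extend(zip(dists, ids[mask]))
          candidates.sort(key=lambda pair: pair[0])
          return [row_id for _, row_id in candidates[:k]]
      ```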

      • generall 10 hours ago

        In IVF you can only start checking conditions once you're inside the final bucket. There is no guarantee that a bucket contains any acceptable values, and there is no procedure to find the buckets that do before scanning them.

        • VoVAllen 8 hours ago

          I'm not sure I fully understand your point. This can be implemented quite easily by organizing each posting list with auxiliary attribute data so that matching values can be located efficiently. For example, you could build each posting list as a B-tree, or use something like a column-oriented format that stores attribute statistics to enable skip scanning. Could you elaborate more?
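
          As a toy illustration (made-up structure, not any product's actual layout), a posting list broken into blocks with per-block attribute statistics supports skip scanning like this:

          ```python
          # Toy sketch: per-block min/max statistics let whole blocks be skipped
          # without touching their vectors when they cannot satisfy the filter.
          from dataclasses import dataclass
          import numpy as np

          @dataclass
          class Block:
              price_min: float        # block-level attribute statistics
              price_max: float
              ids: np.ndarray         # row ids stored in this block
              vectors: np.ndarray     # their vectors, stored contiguously
              prices: np.ndarray      # the attribute values themselves

          def scan_posting_list(blocks, query, price_limit, k=10):
              """Skip-scan one posting list under the predicate price <= price_limit."""
              hits = []
              for block in blocks:
                  if block.price_min > price_limit:     # block can't contain a match
                      continue
                  mask = block.prices <= price_limit    # row-level filter within block
                  dists = np.linalg.norm(block.vectors[mask] - query, axis=1)
                  hits.extend(zip(dists, block.ids[mask]))
              hits.sort(key=lambda pair: pair[0])
              return hits[:k]
          ```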

  • tveita a day ago

    > For example, the typical distance computation complexity between two D-dimensional vectors is O(D^2), but compressing floats into bits reduces this by a factor of 1024 (32x32).

    This isn't right, cosine distance between two vectors is definitely O(D)…

    Of course replacing float multiplications with xor+popcount is a nice speedup in computation, but assuming you're memory-bandwidth limited, the speedup should be linear in the compression factor (roughly 32x), not 1024x.

    • VoVAllen 15 hours ago

      Hi, I'm the author of the article. I think you're right, and we’ll update the description in the blog. Since binary operations are simpler than floating-point operations, the speedup could indeed be greater than 32x.

    • DoctorOetker 17 hours ago

      Some snippets:

      > Leverages concentration of measure phenomena

      > Uses anisotropic vector quantization to optimize inner product accuracy by penalizing errors in directions that impact high inner products, achieving superior recall and speed.

      I only skimmed the article, but the two terms I emphasized seem to imply they apply a quadratic metric for distance, i.e. they assume the data coordinates are with respect to non-orthogonal basis vectors, resulting in off-diagonal distance metric terms.

  • ayende a day ago

    The problem with IVF is that you need to find the right centroids. And that doesn't work well if your data grows and mutates over time.

    Splitting a centroid is a pretty complex issue.

    So is dealing with data that clusters heavily in one area. For example, let's assume that you hold StackOverflow questions & answers. Now you get a massive amount of additional data (> 25% of the existing dataset) that talks about Rust.

    You either need to re-calculate the centroids globally, or find a good way to split.

    Posting lists are easy to use, but if they become unbalanced, it gets really bad.

    • VoVAllen 15 hours ago

      Hi, I'm the author of the article. Meta has conducted some experiments on dynamic IVF with datasets of several hundred million records. The conclusion was that recall can be maintained through simple repartitioning and rebalancing strategies. You can find more details in DEDRIFT: Robust Similarity Search under Content Drift (https://arxiv.org/pdf/2308.02752). Additionally, with the help of GPUs, KMeans can be computed quickly, making the cost of rebuilding the entire index acceptable in many cases.

  • PaulHoule a day ago

    I've been doing vector search since 2002 or so and it's amazing how far you can get keeping vectors in RAM and using primitive search algorithms, enough that I'm afraid the market for vector databases is 1% of what VCs think it is. (e.g. full scan was good enough for a search engine of international patents and non-patent literature)

    • montebicyclelo a day ago

      This 100%. Vector DBs have been heavily pushed by vector DB companies and cloud providers. Despite companies often having mere MBs of documents to search through for their use cases, amounts that trivially fit into RAM and can be searched via dot product in milliseconds, managers and engineers less familiar with the space think people are doing it wrong if they don't use a vector DB. So vector DBs end up getting used when a simpler in-memory, non-approximate solution would be fine (but less monetizable).

    • zackangelo a day ago

      This is so true. A plain old exhaustive SIMD-optimized similarity search will do just fine in many cases and not have any of the approximation tradeoffs of HNSW.

      • PhilippGille a day ago

        In chromem-go [1] I'm searching through 100,000 vectors in 40ms on a mid-range laptop CPU, even without SIMD. It's quick enough for many use cases.

        [1] https://github.com/philippgille/chromem-go

        • PaulHoule a day ago

          It would be very hard to find a problem that has better mechanical sympathy than full-scan similarity search. Even if the operation count of some other algorithm was 1/10 on paper it might not be faster if the prefetcher and branch predictor aren't running at their best.

          People who want to start with RAG should not start with a vector search engine but instead download

          https://sbert.net/

          and start messing around w/ Jupyter notebooks and maybe FAISS. People who think I'm going to upload vectors to an expensive cloud service over a slow ADSL connection are delusional.
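
          For anyone following that advice, a minimal exact-search starting point looks roughly like this (the model name is just one of the pretrained options listed on sbert.net; no index, no vector DB):

          ```python
          # Minimal sketch: embed with sentence-transformers, search with a full scan.
          import numpy as np
          from sentence_transformers import SentenceTransformer

          model = SentenceTransformer("all-MiniLM-L6-v2")   # one of the sbert.net models

          docs = [
              "HNSW is a graph-based ANN index.",
              "IVF partitions vectors into posting lists.",
              "Full-scan search is exact and simple.",
          ]
          doc_emb = model.encode(docs, normalize_embeddings=True)    # (n, 384) float32

          query_emb = model.encode("exact nearest neighbor search",
                                   normalize_embeddings=True)
          scores = doc_emb @ query_emb                               # cosine similarity
          for i in np.argsort(-scores):
              print(f"{scores[i]:.3f}  {docs[i]}")
          ```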

          • rockwotj a day ago

            Swap out FAISS for usearch and you get all the awesome SIMD acceleration (via dynamic dispatch) plus optional compression. Not affiliated, but really cool tech.

            https://github.com/unum-cloud/usearch

            • vegabook 19 hours ago

              Just tried usearch against ol’ faithful np.dot, and found the latter to be 8x faster than usearch on a 10M brute-force scan, as described in their readme [1], for the top 50 matches. Identical output results. 1.74 seconds for numpy and around 12 seconds for usearch on an M2 Max with enough RAM to hold the vectors without swapping.

              [1] https://github.com/unum-cloud/usearch?tab=readme-ov-file#exa...
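
              For reference, the numpy side of such a brute-force comparison is only a few lines. The usearch call below mirrors the exact-search example from the README linked above, but treat the exact signature as an assumption and check the current docs (sizes reduced here; timings will differ):

              ```python
              # Sketch of the comparison: exact top-50 by dot product in numpy vs.
              # usearch's brute-force (exact=True) search. Not a rigorous benchmark.
              import time
              import numpy as np
              from usearch.index import search, MetricKind   # assumed API, per the README

              n, d, k = 1_000_000, 1024, 50
              vectors = np.random.rand(n, d).astype(np.float32)
              query = np.random.rand(d).astype(np.float32)

              t0 = time.perf_counter()
              scores = vectors @ query                       # brute-force dot products
              top = np.argpartition(-scores, k)[:k]
              top = top[np.argsort(-scores[top])]
              print("numpy:  ", time.perf_counter() - t0, "s")

              t0 = time.perf_counter()
              matches = search(vectors, query, k, MetricKind.IP, exact=True)
              print("usearch:", time.perf_counter() - t0, "s")
              ```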

              • ashvardanian 9 hours ago

                Author here :)

                This might not be an apples-to-apples comparison. NumPy uses BLAS for matrix multiplication, which benefits from tiling to make better use of CPU caches.

                USearch, on the other hand, computes L2 distance directly (not the dot product) and supports a variety of metrics. It doesn't use tiling, so it's expected to be slower than BLAS GEMM routines for single or double-precision vectors.

                Things might get more interesting with half-precision, brain-float16, or integer representations, where the trade-offs are less straightforward. Let me know if you decide to try it with those — I'd love to hear how it performs.

                PS: You may find related benchmarks here: https://github.com/ashvardanian/SimSIMD

                • vegabook 7 hours ago

                  It turns out (my bad, and I apologise) that although 10e6 x 1e3 FP32 fits well within 96GB of RAM, intermediate allocations during the np.random.rand initialization phase push about 32GB into swap. These only get cleared when more RAM is demanded, and that happens on the first bench run, so whichever gets run first, np or usearch, gets penalised big time. So now I have re-run with sizes that never reach the swap threshold, and the results are MUCH more impressive for usearch. Basically usearch is twice as fast: a 7e6x1e3 scan for 1e3 top 50 is 1.32 seconds for numpy and 0.633 seconds for usearch. Swapped the order of the benchmarks as well to re-run, and the results check out. Nice work. usearch is now in my toolkit and I apologise again for the misleading comment.

                  As an aside, it's kind of amazing how it takes essentially just over half a second to scan 7m 1032-size vectors for semantic similarity, on a (beefy but not extraordinary) desktop computer. Modern hardware is so awesome. And I'm guessing I could get another order of magnitude or two speedup if I got Metal involved.

                  EDIT: Linux on a tiny el-cheapo 100 dollar Intel N95 mini PC with 32GB of (single channel) RAM, and dropping the size to 3mx1024: usearch: 0.65 seconds, numpy: 0.99 seconds. Amazing.

                  • ashvardanian 6 hours ago

                    Oh, epic! Thanks for taking the time to double check :)

            • zackangelo 20 hours ago

              FWIW FAISS, although a bit unwieldy, has an optimized full-scan search built into it as well.

              • ashvardanian 6 hours ago

                Yes, it does, but it may be somewhat limited. If you need AVX2 float32 kernels for cosine distance, you are in luck. If you want dynamic dispatch from 200+ SIMD kernels across 4 generations of AVX-512 (Skylake-X, Ice Lake, Sapphire Rapids, and AMD Genoa), AVX2, Arm NEON, Arm SVE, and SVE2 for different mixed-precision distance functions, you may find other options more capable :)

                • zackangelo 5 hours ago

                  lol good to know! I've had the luxury of only needing the first one :)

          • abhgh a day ago

            I strongly advocate this! If you're starting off in this space, check if this barebones implementation isn't all you need. You can't beat the accuracy of a full search; in theory, you're trading off scalability, but validate if you need the scale where this tradeoff begins to show. And yes, sbert is great, and it gives you options to choose [1] between accuracy (MPNet) and speed (MiniLM). There are also multi-lingual options. And remember, you can also fine-tune MPNet with SetFit. And there are always new and interesting embeddings being released, so remember to re-assess the fitment of embeddings once in a while against what you're using, e.g., LLM2vec or ModernBERT. A good idea would be to keep checking MTEB [2].

            [1] https://www.sbert.net/docs/sentence_transformer/pretrained_m...

            [2] https://huggingface.co/spaces/mteb/leaderboard

    • LunaSea a day ago

      VCs are terrible at technical due diligence, so the money will keep pouring in.

      However, CTOs are also terrible at assessing their technical needs, so vector databases will have customers the same way web-scale databases did.

  • antirez 19 hours ago

    Quantization can be applied in exactly the same way in HNSW. I'm using quantization in the implementation of Redis vector sets and it works very well. I have big issues with the other points as well, but I prefer to reply to these concerns with the implementation I hope to land in Redis soon (early 2025).

    About insertion/deletion cost: sure, they are costly, but if instead of grabbing one of the available implementations you start from scratch, extend the paper in sensible ways, and experiment with it, I think it is possible to do much better than one might initially believe.

  • binarymax a day ago

    HNSW became the de facto default specifically because you don’t need to precalculate the index and it updates as writes come in.

    This is a great article, and all the points are there, but the truth is that most teams running million-scale vector workloads don’t want the operational complexity of quantizing offline on some schedule. They’ll gladly pay for RAM instead of managing some IVFPQ calculation.

    However if there were a solution that “just worked” to handle PQ for shards in near real time for updates that also had sane filtering, that would be really nice.

    • bddicken a day ago

      At PlanetScale, we have a really nice solution for this in MySQL using SPANN + SPFresh. The way we've implemented it allows for pre- and post-filtering, full compatibility with WHERE-clause filtering, and the same kind of ACID compliance you'd expect from a relational database. You can read about it here:

      https://planetscale.com/blog/announcing-planetscale-vectors-...

    • VoVAllen 14 hours ago

      Hi, I'm the author of the article. In our product, VectorChord, we use a quantization algorithm called RaBitQ, which, unlike IVFPQ, doesn’t require a separate codebook to maintain and update, so the update issue you mentioned is not a problem. Regarding filtering, I’m not sure which specific scenario you’re referring to, but we currently support iterative post-filtering and are technically capable of fully supporting pre-filtering as well.

      • binarymax 8 hours ago

        Pre and post filtering are both not great. Some HNSW implementations in products like Vespa and Qdrant have filter-during-search.

        This remains an unsolved problem in cluster-based indices.

    • nostrebored a day ago

      Yes, real-time updates, filtering, and multi-vector support make most of these on-device, in-memory approaches untenable. If you really are just doing a similarity search against a fixed set of things, often you know the queries ahead of time and can just build a lookup table.

  • ahevenor 4 hours ago

    HNSW can be used for large indexes when combined with an effective caching strategy. Aerospike uses a distributed cache with a query-steering approach, which allows you to have many indexes, including large ones, and load them into memory as your application needs them.

  • rekoros a day ago

    Yep, I believe it

    HNSW has been great for relatively small datasets (a website with 30K pages), but it’s a dangerous stretch for anything bigger

    • bluecoconut a day ago

      I don’t quite understand this - by 30k pages, is this the number of entries in your index? Did you mean 30M?

      At the <100k scale I just do the full compute / inner product directly, and I don’t mess with vector stores or added complexity. No ANN algo needed - they’ll all be slower than actual exact kNN reranking (10k x 768 x 4 bytes = ~30MB; a scan over it plus a sort runs in ~100us or faster). Frankly, I’ve even sent the vectors at the 5k scale to the client and done it client-side in JS.

      Often, I find I use an ANN algo / index to get my nearest 10k, then do final reranking with more expensive algorithms/compute in that reduced space.

      The original HNSW paper was testing/benchmarking at the 5M-15M scales. That’s where it shines compared to alternatives.

      When pushing to the 1B scale (I have an instance at 200M), the memory consumption does become a frustration (100GB of RAM usage), since you need to vertically scale the nodes that use the index. But it’s still very fast and good. I wouldn’t call it “dangerous”, just “expensive”.

      Interestingly though, I found that the usearch package worked great and let me split and offload indexes into separate files on disk, which greatly lowered RAM usage. Latency is still quite good on average, but has some spikes (e.g. a nearest-10k query can sometimes take ~1-3 seconds on the 200M dataset).

      • VoVAllen 14 hours ago

        Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It's based on RabitQ (a new quantization method) + IVF. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 100ms-level latency when using SSD with limited memory.

        [1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

      • rekoros 14 hours ago

        You're dealing with much larger datasets than I have, so far - mine is only a few million vectors. I have a hard constraint on resources, so had to get things working quickly in a relatively gutless environment.

    • aabhay a day ago

      What systems would you recommend for larger deployments?

      • rekoros a day ago

        I ended up breaking up/sharding HNSW across multiple tables, but I'm dealing with many distinct datasets, each one just small enough to make HNSW effective in terms of index build/rebuild performance.

        The article suggests IVF for larger datasets - this is the direction I'd certainly explore, but I've not personally had to deal with it. HNSW sharding/partitioning might actually work even for a very large - sharded/partitioned - dataset, where each query is a parallelized map/reduce operation.

        • VoVAllen 14 hours ago

          Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It's based on RabitQ (a new quantization method) + IVF. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 100ms-level latency when using SSD with limited memory.

          [1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...

        • nostrebored a day ago

          What are you using for HNSW? Is the implementation handwritten? I’ve seen people use it well past XXm full-precision vectors with real-time updates.

          • rekoros 14 hours ago

            pgvector - I wasn't able to get HNSW to build/rebuild quickly [enough] with a few million vectors. Very possibly I was doing something wrong, but fun research time ran out and I needed to get back to building features.

      • redskyluan 18 hours ago

        Why not try Milvus? You get multiple index types (SIMD-based brute-force search, IVF, HNSW, and DiskANN) and you never have to worry about scaling.

  • nostrebored 21 hours ago

    Nit: the drawback of “not working well in disk based systems” isn’t a drawback unless you’re already using disk-based systems.

    The difference in recall is also significant: what you really get with HNSW is a system built to give a good cost-to-approximation-quality ratio. These IVFPQ-based systems are ones I’ve seen people rip out and replace when the use case is high value.

    I really don’t understand the push to make pg do everything. It wasn’t designed for search, and trying to shove these features into the platform feels like some misguided cost optimization that puts all of your data infrastructure on the same critical path.

    • VoVAllen 14 hours ago

      Hi, I'm the author of the article. In our actual product, VectorChord, we adopted a new quantization algorithm called RaBitQ. The accuracy has not been compromised by the quantization process. We’ve provided recall-QPS comparison curves against HNSW, which you can find in our blog: https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1....

      Many users choose PostgreSQL because they want to query their data across multiple dimensions, leveraging time indexes, inverted indexes, geographic indexes, and more, while also being able to reuse their existing operational experience. From my perspective, vector search in PostgreSQL does not have any disadvantages compared to specialized vector databases so far.

      • nostrebored 34 minutes ago

        But why are you benchmarking against pgvector HNSW, which is known to struggle with recall and performance at large numbers of vectors?

        Why is the graph measuring precision and not recall?

        The feature dump is entirely a subset of Vespa features.

        This is just an odd benchmark. I can tell you that in the wild, for revenue-attached use cases, I saw _zero_ companies choose pg for embedding retrieval.

  • cratermoon 18 hours ago

    "Its reliance on frequent random access patterns makes it incompatible with the sequential access nature of disk storage."

    Is this true for SSD storage, or does it only apply to spinning metal platters?

    • VoVAllen 14 hours ago

      Hi, I'm the author of the article. The sequential access pattern of IVF makes prefetching and large block sequential reads much easier, whereas it's almost impossible for HNSW to achieve efficient prefetching.

  • tw04 21 hours ago

    > Its reliance on frequent random access patterns makes it incompatible with the sequential access nature of disk storage.

    So use SSD? Are people seriously still trying to use spinning disk for database workloads in 2024?

    • jandrewrogers 20 hours ago

      An SSD does not solve the problem of page-fault chasing; it just makes it slightly less bad. This is fundamentally a software architecture problem.

      This is solved with latency-hiding I/O schedulers, which don’t rely on cache locality for throughput.

    • VoVAllen 14 hours ago

      Hi, I'm the author of the article. The sequential access pattern of IVF makes prefetching and large block sequential reads much easier, whereas it's almost impossible for HNSW to achieve efficient prefetching.

    • redskyluan 18 hours ago

      Even an SSD won't be fast enough for most indexes due to their random-access nature. I've seen more than 1M IOPS on a huge NVMe disk when using a DiskANN index.

    • JoeAltmaier 21 hours ago

      Data farms are all about cost per GB. Spinning media is a fraction of the cost per GB.