Ask HN: Best Embedding Models?

14 points | by devstein 12 hours ago

11 comments

  • pstorm 4 minutes ago

    Just FYI: for RAG/similarity search, adding a reranker was a much bigger payoff than switching embedding models.
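
To illustrate the retrieve-then-rerank idea, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `rerank_score` is a toy stand-in for a real reranker (for example a cross-encoder), and `DOCS`/`QUERY` are made-up data. The point is only the shape of the pipeline: a cheap first stage picks candidates, then a second stage that looks at query and document together reorders them.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k):
    # Stage 1: rank every document by embedding similarity, keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank_score(query, doc):
    # Toy stand-in for a reranker: counts word bigrams shared by query and
    # document, so unlike the bag-of-words stage it is sensitive to word order.
    q, d = tokenize(query), tokenize(doc)
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))


def search(query, docs, k_retrieve=3, k_final=1):
    # Stage 2: reorder only the retrieved candidates with the reranker.
    candidates = retrieve(query, docs, k_retrieve)
    ranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return ranked[:k_final]


# Hypothetical example data.
DOCS = [
    "integer to string convert",
    "how to convert string to integer in python",
    "best pizza in town",
]
QUERY = "convert string to integer"
```

With this toy data, `retrieve(QUERY, DOCS, 1)` puts the word-scrambled first document on top (its bag of words matches the query exactly), while `search(QUERY, DOCS, k_retrieve=2)` promotes the second document. In a real setup you would swap both stand-ins for actual models and keep the two-stage structure.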

  • emschwartz 3 hours ago

    I’ve been using MixedBread, which is a pretty old model at this point. Recently, I tried comparing it to some newer models and was disappointed that the results weren’t dramatically and uniformly better.

    You probably can’t go wrong if you pick a recent one that scores decently well on benchmarks and is at the right price point (or memory requirement) for whatever you’re trying to do.
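
If you want to run that kind of comparison on your own data rather than rely on benchmarks, a small recall@k harness may be enough. The sketch below is only an illustration under stated assumptions: `word_embed` and `trigram_embed` are toy stand-ins for two real embedding models, and `DOCS` and `LABELED` are a hypothetical hand-labeled set of (query, index of the relevant document) pairs.

```python
import math
from collections import Counter


def word_embed(text):
    # Toy "model A": bag of whole words.
    return Counter(text.lower().split())


def trigram_embed(text):
    # Toy "model B": bag of character trigrams (tolerant of word variants).
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(embed, docs, labeled_queries, k):
    # Fraction of queries whose labeled relevant document is in the top k.
    doc_vecs = [embed(d) for d in docs]
    hits = 0
    for query, relevant_idx in labeled_queries:
        q = embed(query)
        ranked = sorted(
            range(len(docs)),
            key=lambda i: cosine(q, doc_vecs[i]),
            reverse=True,
        )
        if relevant_idx in ranked[:k]:
            hits += 1
    return hits / len(labeled_queries)


# Hypothetical evaluation set: (query, index of the relevant doc in DOCS).
DOCS = [
    "reranking search results",
    "fine-tuning transformers",
    "embedding models for search",
]
LABELED = [
    ("embeddings", 2),
    ("rerank results", 0),
]

for name, fn in [("word", word_embed), ("trigram", trigram_embed)]:
    print(name, recall_at_k(fn, DOCS, LABELED, k=1))
```

To compare real models, replace the two toy functions with calls that return vectors from each model (and adapt `cosine` to dense vectors). A few dozen labeled queries from your actual workload will usually tell you more than a leaderboard delta.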

  • rapatel0 12 hours ago

    I've liked qwen and embeddinggemma for local search. Qwen because a 32K context window is enough to fit basically a whole page, and embeddinggemma because it's crazy efficient.

  • LogicCraft678 4 hours ago

    Feels like embeddings are underrated compared to the LLM hype, but they're doing great.

    • Alifatisk 24 minutes ago

      Why do you feel like embeddings are underrated? What is it about embeddings that deserves more attention?

  • PhilippGille 9 hours ago

    Benchmarks only paint part of the picture, but they're still a decent place to start looking into recent models:

    https://huggingface.co/spaces/mteb/leaderboard

  • didgeoridoo 5 hours ago

    I’m partial to jina.ai — they have open models for code and prose, all easily runnable locally.

  • Yogeshshirsath 2 hours ago

    E5 (Microsoft)

  • jayshah5696 10 hours ago

    Embeddings are easy to fine-tune. Try ModernBERT.

  • frederickabrah 4 hours ago

    who knows a tool for rug check in crypto

  • halvorbuilds 4 hours ago

    gemma4