  • Taranovski 10 hours ago

    EmbeddingAdapters is a Python library for translating between embedding model vector spaces.

    It provides plug-and-play adapters that map embeddings produced by one model into the vector space of another — locally or via provider APIs — enabling cross-model retrieval, routing, interoperability, and migration without re-embedding an existing corpus.

    If a vector index is already built using one embedding model, embedding-adapters allows it to be queried using another, without rebuilding the index.

    GitHub: https://github.com/PotentiallyARobot/EmbeddingAdapters/

    PyPI: https://pypi.org/project/embedding-adapters/

    Example

    Generate an OpenAI embedding locally from MiniLM + adapter:

    pip install embedding-adapters

    embedding-adapters embed \
      --source sentence-transformers/all-MiniLM-L6-v2 \
      --target openai/text-embedding-3-small \
      --flavor large \
      --text "where are restaurants with a hamburger near me"

    The command returns:

    an embedding in the target (OpenAI) space

    a confidence / quality score estimating adapter reliability

    Model input

    At inference time, the adapter’s only input is an embedding vector from a source model. No text, tokens, prompts, or provider embeddings are used.

    A pure vector → vector mapping is sufficient to recover most of the retrieval behavior of larger proprietary embedding models for in-domain queries.
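
    To make the "vector in, vector out" shape concrete, here is a minimal sketch that assumes the adapter is a single affine map followed by L2 normalization. That is an assumption for illustration only: the adapter architecture is not described in this post, and the random weights below are placeholders, not trained parameters. The dimensions are those of all-MiniLM-L6-v2 (384) and the default output of text-embedding-3-small (1536).

```python
import math
import random

SOURCE_DIM = 384   # output size of all-MiniLM-L6-v2
TARGET_DIM = 1536  # default output size of text-embedding-3-small


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def apply_adapter(source_vec, weights, bias):
    """Hypothetical affine adapter: target = normalize(W @ source + b).

    weights: TARGET_DIM rows of SOURCE_DIM floats; bias: TARGET_DIM floats.
    """
    projected = [
        sum(w * x for w, x in zip(row, source_vec)) + b
        for row, b in zip(weights, bias)
    ]
    return l2_normalize(projected)


# Placeholder parameters: random values stand in for trained weights.
rng = random.Random(0)
weights = [
    [rng.gauss(0.0, 0.05) for _ in range(SOURCE_DIM)] for _ in range(TARGET_DIM)
]
bias = [0.0] * TARGET_DIM

# The only input is a source-space vector; no text reaches the adapter.
source_vec = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(SOURCE_DIM)])
target_vec = apply_adapter(source_vec, weights, bias)
print(len(source_vec), "->", len(target_vec))  # 384 -> 1536
```

    A real adapter may well be nonlinear or domain-conditioned; the point here is only that the function signature takes a source vector and returns a target-space vector.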

    Benchmark results

    Dataset: SQuAD (8,000 Q/A pairs)

    Latency (answer embeddings):

    MiniLM embed: 1.08 s

    Adapter transform: 0.97 s

    OpenAI API embed: 40.29 s

    ≈ 20× faster for local MiniLM + adapter (1.08 s + 0.97 s = 2.05 s) vs OpenAI API calls (40.29 s), based on the timings above.

    Retrieval quality (Recall@10):

    MiniLM → MiniLM: 10.32%

    Adapter → Adapter: 15.59%

    Adapter → OpenAI: 16.93%

    OpenAI → OpenAI: 18.26%

    Bootstrap difference (OpenAI → OpenAI minus Adapter → OpenAI): ~1.34 percentage points
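
    For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@k. It assumes each question has exactly one correct answer (stored at the same index), that ranking is by cosine similarity, and that "X → Y" means queries embedded in space X are searched against answers embedded in space Y. The exact evaluation protocol behind the numbers above is not given here, so treat this as an illustration rather than the benchmark code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(query_vecs, answer_vecs, k=10):
    """Fraction of queries whose paired answer (same index) ranks in the top k."""
    hits = 0
    for qi, q in enumerate(query_vecs):
        ranked = sorted(
            range(len(answer_vecs)),
            key=lambda ai: cosine(q, answer_vecs[ai]),
            reverse=True,
        )
        if qi in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy data: three query/answer pairs in a 2-d space.
answers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# The third query sits closest to answer 1 rather than its own answer 2.
queries = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]

print(recall_at_k(queries, answers, k=1))  # 2 of 3 queries hit
```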

    For in-domain queries, the MiniLM → OpenAI adapter recovers ~93% of OpenAI retrieval performance and substantially outperforms MiniLM-only baselines.
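
    The ~93% figure can be checked directly from the Recall@10 numbers listed above. This short calculation uses only those reported values; the variable names are mine.

```python
# Recall@10 values (percent) as reported in the benchmark above.
recall_at_10 = {
    "MiniLM -> MiniLM": 10.32,
    "Adapter -> Adapter": 15.59,
    "Adapter -> OpenAI": 16.93,
    "OpenAI -> OpenAI": 18.26,
}

openai = recall_at_10["OpenAI -> OpenAI"]
adapter = recall_at_10["Adapter -> OpenAI"]
minilm = recall_at_10["MiniLM -> MiniLM"]

recovered = adapter / openai      # share of OpenAI recall retained
gap_points = openai - adapter     # absolute gap, percentage points
gain_points = adapter - minilm    # absolute gain over the MiniLM baseline

print(f"recovered: {recovered:.1%}")                      # recovered: 92.7%
print(f"gap: {gap_points:.2f} points")                    # gap: 1.33 points
print(f"gain over MiniLM: {gain_points:.2f} points")      # gain over MiniLM: 6.61 points
```

    The point difference of 1.33 is close to the reported bootstrap estimate of ~1.34; a resampled mean need not match the point difference exactly.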

    How it works (high level)

    Each adapter is trained on a restricted domain, allowing it to specialize in interpreting the semantic signals of smaller models and projecting them into higher-dimensional provider spaces while preserving retrieval-relevant structure.

    A quality score is provided to determine whether an input is well-covered by the adapter’s training distribution.
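
    One way an application might act on the quality score is to gate on it: keep the adapted embedding when the score is high enough and call the provider otherwise. The sketch below assumes the score lies in [0, 1] with higher meaning better coverage, and uses an arbitrary 0.5 threshold. Neither is stated in this post, and `adapt`, `provider_embed`, and the stub functions are hypothetical stand-ins, not the library's API.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def embed_with_fallback(
    text: str,
    adapt: Callable[[str], Tuple[Vector, float]],
    provider_embed: Callable[[str], Vector],
    min_quality: float = 0.5,  # assumed threshold; tune per domain
) -> Tuple[Vector, str]:
    """Use the adapted embedding when its quality score clears the threshold."""
    vector, quality = adapt(text)
    if quality >= min_quality:
        return vector, "adapter"
    return provider_embed(text), "provider"


# Stand-in callables so the sketch runs without any model or network access.
def fake_adapt(text: str) -> Tuple[Vector, float]:
    # Pretend restaurant queries are in-domain and everything else is not.
    return [0.1, 0.2], (0.9 if "restaurant" in text else 0.2)


def fake_provider(text: str) -> Vector:
    return [0.3, 0.4]


print(embed_with_fallback("restaurants near me", fake_adapt, fake_provider)[1])
print(embed_with_fallback("quantum chromodynamics", fake_adapt, fake_provider)[1])
```

    With these stubs, the first query is routed to the adapter and the second to the provider.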

    Practical uses in Python applications

    Query an existing vector index built with one embedding model using another

    Operate mixed vector indexes and route queries to the most effective embedding space

    Reduce cost and latency by embedding locally for in-domain queries

    Evaluate embedding providers before committing to a full re-embed

    Gradually migrate between embedding models

    Handle provider outages or rate limits gracefully

    Run RAG pipelines in air-gapped or restricted environments

    Maintain a stable “canonical” embedding space while changing edge models
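
    As an illustration of the outage and rate-limit case from the list above, the sketch below prefers the provider and switches to a local model plus adapter when the provider is unavailable. Everything named here (`ProviderUnavailable`, `embed_resilient`, the stub functions) is hypothetical; a real client would raise its own error types, and the local path would call the actual model and adapter.

```python
from typing import Callable, List, Tuple

Vector = List[float]


class ProviderUnavailable(Exception):
    """Hypothetical error for outages or rate limits (not a real library class)."""


def embed_resilient(
    text: str,
    provider_embed: Callable[[str], Vector],
    local_adapted_embed: Callable[[str], Vector],
) -> Tuple[Vector, str]:
    """Prefer the provider; use local model + adapter if it is unavailable."""
    try:
        return provider_embed(text), "provider"
    except ProviderUnavailable:
        # Both paths are meant to yield vectors in the same target space,
        # so the downstream index does not need to know which one ran.
        return local_adapted_embed(text), "local+adapter"


# Stubs that simulate a rate-limited provider and a working local path.
def flaky_provider(text: str) -> Vector:
    raise ProviderUnavailable("simulated rate limit")


def local_stub(text: str) -> Vector:
    return [0.0, 1.0]


vector, route = embed_resilient("restaurants near me", flaky_provider, local_stub)
print(route)  # local+adapter
```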

    Supported adapters

    MiniLM ↔ OpenAI

    OpenAI ↔ Gemini

    E5 ↔ MiniLM

    E5 ↔ OpenAI

    E5 ↔ Gemini

    MiniLM ↔ Gemini

    The project is under active development, with ongoing work on additional adapter pairs, domain specialization, evaluation tooling, and training efficiency.

    Please Like/Upvote if you found this interesting