Took me a little poking around to figure out what the underlying search engine was: it's https://typesense.org/ hosted in a Docker container.
Yes, we initially started with ParadeDB but moved to Typesense for a search-as-you-type experience. We also have an additional layer that transforms queries using an LLM (only when the query is "question-like").
E.g., if you go to https://docs.litellm.ai/ and search for 'how to limit API cost', it will map the query to 'budget'.
Oh neat, is that this bit? https://raw.githubusercontent.com/fastrepl/canary/c1f03cbbee...
yes :)
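For readers curious what search-as-you-type against Typesense looks like at the request level, here is a minimal sketch that builds a URL for Typesense's `/collections/<name>/documents/search` endpoint, meant to be called on each keystroke. The collection name and the `title,content` field names are hypothetical, not Canary's actual schema, and sending the request (with an `X-TYPESENSE-API-KEY` header) is left out.

```python
from urllib.parse import urlencode


def build_search_url(host: str, collection: str, query: str) -> str:
    """Build a Typesense search URL for whatever the user has typed so far."""
    params = {
        "q": query,
        # Fields to match against; these names are placeholders.
        "query_by": "title,content",
        # Treat the last word as a prefix so partial input like "sage" can match "sagemaker".
        "prefix": "true",
        # Tolerate up to two typos while the user is still typing.
        "num_typos": 2,
        "per_page": 10,
    }
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


print(build_search_url("http://localhost:8108", "docs", "sage"))
```

Because each keystroke is just a cheap keyword query, no model call is involved on this path.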
Canary is awesome! we use Canary for our doc search at LiteLLM (you can see it here: https://docs.litellm.ai/docs/)
It's really useful to be able to specify the search space for a specific query (e.g., Canary lets you search for "sagemaker" in our docs or in our GitHub issues).
The search modal says, "Search by Algolia".
Click the cute yellow bird next to the search bar.
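One way to model the "search space" idea is to send the same query to several collections at once and let the UI decide which results to show. The sketch below builds a request body for Typesense's `/multi_search` endpoint, which takes a list of searches. The `docs` and `github_issues` collections and their fields are assumptions, and this is not necessarily how Canary scopes queries.

```python
# Hypothetical mapping from a user-facing "search space" to a Typesense collection.
SOURCES = {
    "docs": {"collection": "docs", "query_by": "title,content"},
    "github_issues": {"collection": "github_issues", "query_by": "title,body"},
}


def build_multi_search(query: str, scopes: list[str]) -> dict:
    """Build a /multi_search body that runs one query against each selected source."""
    unknown = [s for s in scopes if s not in SOURCES]
    if unknown:
        raise ValueError(f"unknown search scope(s): {unknown}")
    # Each entry copies the source's settings and adds the shared query text.
    return {"searches": [{**SOURCES[s], "q": query} for s in scopes]}


print(build_multi_search("sagemaker", ["docs", "github_issues"]))
```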
You should add support for tinkerbird, so the index can be statically generated and queried without a backend.
https://github.com/wizenheimer/tinkerbird
Just played around with tinkerbird on Tinkerboard[0]... it doesn't seem to get good results with the provided example data. Why do you think support for it would be worthwhile?
[0]: https://tinkerboard.vercel.app/
Getting good results involves tuning, good models, and well-defined prompts; the demo not implementing good RAG has nothing to do with its vector search performance. I suggest reading up on how the technology works.
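To make the "statically generated index, no backend" idea concrete, here is a deliberately naive sketch: embeddings are computed at build time and shipped as JSON, and the client ranks documents by brute-force cosine similarity. This is not tinkerbird's API (a real vector store would typically use an approximate nearest-neighbor index rather than a linear scan), and it assumes the query vector has already been computed somehow, which is itself the hard part without a backend.

```python
import json
import math


def build_static_index(docs: dict[str, list[float]]) -> str:
    """Serialize precomputed embeddings at build time, e.g. to ship as a static JSON file."""
    return json.dumps(docs)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_static_index(index_json: str, query_vec: list[float], k: int = 3) -> list[str]:
    """Rank every document by cosine similarity to the query; no server involved."""
    index = json.loads(index_json)
    ranked = sorted(index, key=lambda doc_id: cosine(index[doc_id], query_vec), reverse=True)
    return ranked[:k]


# Toy 2-dimensional vectors, purely for illustration.
idx = build_static_index({"budget": [1.0, 0.0], "sagemaker": [0.0, 1.0], "proxy": [0.7, 0.7]})
print(query_static_index(idx, [1.0, 0.1], k=2))
```

As the reply above points out, retrieval quality then depends mostly on the embedding model and how the vectors are used, not on the lookup itself.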
I have to say Algolia is underwhelming (even after all these years). Perhaps I'm using it wrong, but I often find the comment or story I'm searching for more quickly via a targeted Google search. I should give Bing a try, as I've been getting better finance-related results there lately, especially when trying to locate ratings and/or other docs related to newly issued securities.
I had to use Algolia in a recent e-commerce solution. I think e-commerce really is the sweet spot for what Algolia offers: quick setup, not much need to mess around with your rankings, etc., with very simple content.
I'm used to Solr and Elasticsearch for most sites I've run, which tend to be information-dense sites where you need to be able to control rankings to get the best results. HN is much closer to that than to an e-commerce site.
Have you tried Vertex AI Search for Retail?
no
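As an illustration of the ranking control that Solr and Elasticsearch expose, here is a sketch of an Elasticsearch query body that boosts title matches over body matches using the `^` field-boost syntax of a `multi_match` query. The `title` and `body` field names and the default boost of 3 are placeholders, not a tuning recommendation.

```python
def build_boosted_query(text: str, title_boost: float = 3.0) -> dict:
    """Elasticsearch query DSL body that ranks title matches above body matches."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "field^N" boosts that field's score by a factor of N.
                "fields": [f"title^{title_boost:g}", "body"],
                # best_fields (the default type): score by the best-matching field.
                "type": "best_fields",
            }
        }
    }


print(build_boosted_query("sagemaker"))
```

In Solr's eDisMax query parser, the rough equivalent is the `qf` parameter, e.g. `qf=title^3 body`.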
Agreed. I dread having to use Algolia search on documentation these days. The search results feel pretty naively selected, and the UI is pretty poor. I get that people want to deploy static sites, but can we please find a way to bring back search _pages_?
> I dread having to use Algolia search on documentation these days.
agreed.
> but can we please find a way to bring back search _pages_?
could you please explain what you mean?
In Firefox the "Search for anything" input does not get focused after opening the search dialog.
Nice catch! Just downloaded Firefox to test it :) Will fix it shortly.
Can you talk about how you implemented search-as-you-type? Doing so with semantic search seems tricky given the round trips needed to compute embeddings on the fly (assuming the use of OpenAI embeddings).
Sure. Implementing a search-as-you-type experience with an AI-powered feature was what I wanted to do as well. It doesn't use embeddings at the moment. When you type a short query like 'openai', it simply runs a basic query using Typesense. However, if you enter a question-like query, such as 'how to limit api cost', it transforms it into multiple queries, like 'budget' and 'limit'.
In the self-hosted version, it uses the CHAT_COMPLETION_MODEL env variable to select the LLM. In our cloud version, we use a fine-tuned version of gpt-4o-mini, which we will eventually replace with a smaller model like Llama 8B or even 1B.
Got it! I saw this in the code and assumed you were using embeddings:
  def evaluate(input: shared.EvaluationInput):
      ds = Dataset.from_list(input.dataset)
      metrics = [metric_map[metric] for metric in input.metrics]
That piece of code is for LLM response evaluation, but we aren't really using it at the moment.
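A minimal sketch of the routing described above: short keyword queries go straight to the search engine, and only question-like queries are expanded into several keyword queries by an LLM whose model name is read from `CHAT_COMPLETION_MODEL`. The `is_question_like` heuristic is purely hypothetical (the thread doesn't say how Canary detects questions), and the LLM call is injected as a `rewrite` callable rather than tied to any particular client library.

```python
import os
from typing import Callable

# Hypothetical heuristic; the thread only says transformation kicks in for "question-like" queries.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who", "can", "does", "is"}


def is_question_like(query: str) -> bool:
    """Treat a query as a question if it ends with '?' or starts with a question word and has 3+ words."""
    words = query.lower().split()
    return query.strip().endswith("?") or (len(words) >= 3 and words[0] in QUESTION_WORDS)


def plan_queries(query: str, rewrite: Callable[[str, str], list[str]]) -> list[str]:
    """Return the keyword queries to run against the search engine.

    `rewrite(model, query)` stands in for a chat-completion call that turns a
    question into short keyword queries (e.g. "budget", "limit").
    """
    if not is_question_like(query):
        # Short keyword queries skip the LLM entirely and go straight to search.
        return [query]
    model = os.environ.get("CHAT_COMPLETION_MODEL", "")
    queries = [q.strip() for q in rewrite(model, query) if q.strip()]
    # Fall back to the raw query if the LLM returns nothing usable.
    return queries or [query]


print(plan_queries("openai", lambda model, q: []))
print(plan_queries("how to limit api cost", lambda model, q: ["budget", "limit"]))
```

One likely benefit of this split is that plain keyword queries never wait on a model round trip; only the rarer question-like queries pay that cost.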
This is sweet. I do think the styling on the component could be a bit cleaner though.
Thanks! Could you point out any specific part of the UI that you think could be improved?
Opening up the search needs a softer animation. Take https://ui.shadcn.com/ as an example
Got it. I will add some animation to the default `canary-modal` implementation.
Just FYI: it's very easy to implement a custom modal component and swap out the default one.
https://github.com/fastrepl/canary/blob/72723b0/js/apps/docs...
Been looking for something like this! Doc search just hasn't kept up with what's possible now and is such a hassle to get the indexing to work properly. Will try it out!
Please let me know how it goes! We have a Discord link in the top navbar: https://getcanary.dev/
Does it have the same API? Have been looking for a way to mock the service in development
No, it uses a different API from Algolia DocSearch.
> Have been looking for a way to mock the service in development
This is what I wanted too! We have multiple providers, including `canary-provider-mock`, which you can use to mock the service in development.
example:
https://github.com/noxify/renoun-docs-template/blob/055b54/s...
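`canary-provider-mock` is a web component, so the following is only a language-neutral illustration (written in Python) of the pattern it enables: UI code talks to a provider interface, and in development a mock that returns canned hits stands in for the real backend. `SearchProvider`, `MockProvider`, `SearchHit`, and `render_results` are hypothetical names, not Canary APIs.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchHit:
    title: str
    url: str


class SearchProvider(Protocol):
    """Anything with a search(query) method can back the UI."""

    def search(self, query: str) -> list[SearchHit]: ...


class MockProvider:
    """Returns canned hits so the UI can be developed without a search backend."""

    def __init__(self, hits: list[SearchHit]):
        self.hits = hits

    def search(self, query: str) -> list[SearchHit]:
        q = query.lower()
        return [h for h in self.hits if q in h.title.lower()]


def render_results(provider: SearchProvider, query: str) -> list[str]:
    # UI code depends only on the provider interface, so swapping mock and real is one line.
    return [f"{h.title} ({h.url})" for h in provider.search(query)]


mock = MockProvider([SearchHit("Budgets", "/docs/budget"), SearchHit("SageMaker", "/docs/sagemaker")])
print(render_results(mock, "sage"))
```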
Ignorant question: how difficult would it be to integrate this with docusaurus?
It is not that difficult :)
If you want Pagefind-based local search:
Doc: https://getcanary.dev/docs/local/integrations/docusaurus
Example PR: https://github.com/microsoft/fast/pull/7031/files
If you want hosted search:
Doc: https://getcanary.dev/docs/cloud/integrations/docusaurus
Example PR: https://github.com/BerriAI/litellm/pull/6160/files
looks interesting. there is a typo in the headline ("techincal docs") on https://getcanary.dev/
thanks, fixed!
How does it compare to Glean?
Glean is used for searching the workspace (AFAIK, for internal use). Canary is used for searching technical documentation, GitHub issues, etc., and is intended for the users of the project.
+1 on the github issues. It's very useful to have this on the litellm docs
nice! lmk if you have any feedback while using it in the litellm docs.
How did you manage to keep the size so small :O
thanks for noticing!
1. All UI components are written using Lit (lit.dev).
2. I put a lot of effort into making the components as composable as possible, so you can load only what you need.
For anyone interested, we have a chart here: https://getcanary.dev/docs/why#tiny-components-that-work-any...
The name Canary is a bit confusing, since a lot of companies already use "canary" to indicate early signs of issues (as in "canary in a coal mine"). However, the app doesn't fulfill that need.
I will give it a try. Impressive compression!
that's a fair point. I don't think I can rename it at this point, but I'll keep in mind that some people might be confused by the name.
please do try it out, and come to Discord if you want to chat.