If I understand Kagi's blog post correctly, then here's what happened, chronologically:
Kagi makes deals with many search engines so they can have raw search results in exchange for money.
Google says: no, you can't have raw search results because only whales can get those. Only thing we can offer you is search results riddled with ads and we won't allow you to reorder or filter them.
Kagi thinks Google's offer is unacceptable, so Kagi goes to a third party SERP API, which scrapes Google at scale and sells the raw search results to Kagi and others.
August 2024: Court says Google is breaking the law by selling raw search results only to whales.
December 2025: Court orders that for the next six years, 1. Google must no longer exclude non-whales from buying raw search results, 2. Google must offer the raw search results for a reasonable price, and 3. Google can no longer force partners to bundle the results with ads.
December 2025: Google sues the third-party scraping companies.
January 2026: Google says "hey, the old search offering is going to go away, there's going to be a new API by 2027, stay tuned."
This will significantly impact (quite possibly kill) Startpage and Ecosia, who are effectively white-label Google, right?
What alternatives are there besides Bing? Is it really so hard that it’s not considered worth doing? Some of the AI companies (Perplexity, Anthropic) seem to have managed to get their own indexing up and running.
> The French index is at an advanced stage of completion, we have started creating the German language index, and the English one should start shortly. All progress is quickly integrated into the Qwant STAAN API.
Seriously though. Five years ago Google had already become unusable without "site:reddit.com", which is actually hilarious for a search engine that's supposed to search the entire internet. Nowadays Reddit is also shit, which means the only reason for me to use Google or any search engine is to find products that for some reason I don't want to buy on Amazon.
The internet isn't a global village, it's a global ghetto, and it's becoming increasingly true that the only way not to lose is not to play.
The old internet is still there. It hasn’t gone away; it’s just undiscoverable with ad-based search. The more slop there is, the more necessary it is to have good search engines.
Recently, I set up a fresh system on a laptop. Ahahahaaa, how utterly crap Google search results are now! It fills me with some stress and disgust to use it. Now one of the first things I do, right after an emergency DuckDuckGo search for uBlock Origin and NoScript, is to set Kagi as the default search engine. Then I can continue setting things up more calmly.
The 'Google Graveyard is real' sentiment captures something important: every dependency on a large platform is a loan that can be called in. The 34-million-document indie index project someone mentioned is the right response - own your core infrastructure. Easier said than done for whole-web search, but the same principle applies everywhere.
Google has consistently ruined its search engine over the last (almost) 10 years. You can find numerous articles about this, as well as videos on YouTube (which is also controlled by Google).

Not long ago they ruined uBlock Origin (for Chrome; uBlock Origin Lite is nowhere near as good or as effective, in my own experience).

Now Google is also committing to more evil, trying to ruin things for more people, competitors, you name it. We cannot allow Google to continue on its wicked path here. It'll just further erode the quality. There is a reason why "killed by Google" is more than a mere meme - a graveyard of things killed by Google.

We need alternatives, viable ones, for ALL Google services. Let's all work to make this world better - a place without Google.
Are competing search indexes (Bing, Ecosia/Qwant, etc.) objectively worse in significant ways, or is Google just so entrenched that people don't want to "risk it" with another provider (and/or preference and inertia)?
I suppose I'm asking whether this is actually a _good thing_ in that it will stimulate competition in the space, or if it's just a case that Google's index is now too good for anyone to reasonably catch up at this point.
The beauty of Google Programmable Search across the entire web is that it's free, and users can make money by linking it to their AdSense account.
Bing charges per query for the average user. Ecosia and Qwant use Bing to power their results, probably under some type of license, which means they pay much less per query than a normal user.
I can manage fine with other search indexes for English-language searches; whether that is because others got better or Google got worse I cannot tell, though I suspect the latter.
But for searching in more niche languages, Google is usually the only decent option, and I have little hope that others will ever reach the scale where they could compete.
Bing's index is smaller than Google's, and anecdotally I get fewer relevant results when using it, particularly from sites like Reddit that have exclusive search deals with Google.
"Google will discontinue third-party niche search engine access to full-web search" would be far clearer.
Given that the title supplied is effectively editorialised, and the original article's title is effectively content-free ("Updates to our Web Search Products & Programmable Search Engine Capabilities"), my rewording would be at least as fair.
HN's policy is to try to use text from the article itself where the article title is clickbait, sensational, vague, etc., however. I suspect Google's blog authors are aware of this, and they've carefully avoided any readily-extracted clear statements, though I'll take a stab...
Here's the most direct 'graph from TFA:
Custom Search JSON API: Vertex AI Search is a favorable alternative for up to 50 domains. Alternatively, if your use case necessitates full web search, contact us to express your interest in and get more information about our full web search solution. Your transition to an alternative solution needs to be completed by January 1, 2027.
We can get a clearer, 80-character head that's somewhat faithful to that with:
"Google Search API alternative Vertex AI Search limited to 50 domains" (70 chars).
That's still pretty loosely adherent, though it (mostly) uses words from the original article. I'm suggesting it to the mods via email at hn@ycombinator.com; others may wish to suggest their own formulations.
Kagi doesn't have a partnership with Google - they work under adversarial interoperability, stealing results from Google against their will, and paying some third-party to enable this. They'd like to simply pay Google, but Google doesn't want their money.
Yeah that's where I started out in 2021. Been at it for almost 5 years now, last three of which full time. I'm indexing about 1.1 billion documents now off a single server.
Hard part is doing it at any sort of scale and producing useful results. It's easy to build something that indexes a few million documents. Pushing into billions is a bigger challenge, as you start needing a lot of increasingly intricate bespoke solutions.
(... though it operates a bit sub-optimally now as I'm using a ton of CPU cores to migrate the index to use postings lists compression, will take about 4-5 days I think).
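For readers wondering what a postings-list compression migration like that involves: a common scheme (not necessarily the one used here) stores the gaps between sorted doc ids as variable-length bytes, since gaps are small and fit in one or two bytes. A minimal sketch:

```python
def varbyte_encode(numbers):
    """Encode non-negative ints as variable-length bytes: 7 payload bits
    per byte, high bit set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128            # mark terminating byte
        out.extend(chunk)
    return bytes(out)

def varbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        if b < 128:
            n = n * 128 + b
        else:
            numbers.append(n * 128 + (b - 128))
            n = 0
    return numbers

def compress_postings(doc_ids):
    # delta-encode sorted doc ids: small gaps -> few bytes each
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return varbyte_encode(gaps)

def decompress_postings(data):
    ids, total = [], 0
    for gap in varbyte_decode(data):
        total += gap
        ids.append(total)
    return ids
```

Real engines layer block-level skipping and SIMD-friendly codecs on top, but the space win comes from this gap-plus-varbyte idea.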
Might find YaCy interesting. It’s meant to be a decentralised search engine where users scrape the internet and can search other users’ indexes in a kind of torrent-like way.
I found it didn’t really work as a real search engine but it was interesting.
Well you'll get blocked some places but it's not too big of a deal. If you're running an above board operation, you can surprisingly often successfully just email the admin explaining what you're doing, and ask to be unblocked.
Never build a product with a core feature depending on a third party - you will eventually get fucked for sure. Always have a 70:30 rule for revenue, where 70% comes from core independent features.
Soon you'll find that you cannot exist on the web without relying on third parties. Sometimes you'll even have trouble getting paid thanks to the painful existence of payment processors.
True, you can’t exist without third parties. But you shouldn’t let them be your core moat/USP. Jasper is a great example: they depended too much on LLM access, then ChatGPT launched and ate the value. Using third-party APIs is fine, but building a product whose core depends on them is suicide.
This is the type of monopoly abuse these laws were designed to target, and antitrust laws actually do work against large companies.
If you actually enforce them.
Unfortunately, during the Reagan administration, political sentiment toward monopolies shifted and since then antitrust law has been a paper tiger at best.
Kind of. However, the Google search bar present on a website is usually there to search across that site's own domain - the results are limited to it, e.g. example.com/page1, example.com/page2. Google will carry on supporting this.
What they are ending is support for websites that search across the entire web. The websites that search across the entire web are usually niche search engines.
There are literally thousands of independent search engines that use Programmable Search to search the entire web. Many ISPs use it on their homepages, kids-focused search engines like wackysafe.com use it, and so do privacy-focused search engines like gprivate.com, etc.
Also LLM tools. The Programmable Search Engine API was a way to give third-party LLM frontends a web search tool. Notably, this was common practice long before any of the major LLM providers added search capabilities to their frontends.
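For context, those frontends call the Custom Search JSON API. A minimal client sketch - the endpoint and the `key`/`cx`/`q` parameters are Google's documented ones, while the credentials below are placeholders:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(api_key, engine_id, query, num=10):
    """Build a Custom Search JSON API request URL.
    'cx' identifies the Programmable Search engine; engines configured
    to 'search the entire web' are what the deprecation notice affects."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{API_ENDPOINT}?{urlencode(params)}"

def search(api_key, engine_id, query):
    """Fetch results; each item carries 'title', 'link', 'snippet'."""
    with urlopen(build_search_url(api_key, engine_id, query)) as resp:
        data = json.load(resp)
    return [(item["title"], item["link"]) for item in data.get("items", [])]
```

An LLM tool wrapper would just call `search(...)` and feed the (title, link) pairs back into the model's context.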
Google quietly announced that Programmable Search (ex-Custom Search) won’t allow new engines to “search the entire web” anymore. New engines are capped at searching up to 50 domains, and existing full-web engines have until Jan 1, 2027 to transition.
If you actually need whole-web search, Google now points you to an “interest form” for enterprise solutions (Vertex AI Search etc.), with no public pricing and no guarantee they’ll even reply.
This seems like it effectively ends the era of indie / niche search engines being able to build on Google’s index. Anything that looks like general web search is getting pushed behind enterprise gates.
I haven’t seen much discussion about this yet, but for anyone who built a small search product on Programmable Search, this feels like a pretty big shift.
Curious if others here are affected or already planning alternatives.
UPDATE: I logged into Programmable Search and the message is even more explicit: Full web search via the "Search the entire web" feature will be discontinued within the next year. Please update your search engine to specify specific sites to search. With this link: https://support.google.com/programmable-search/answer/123971...
What are some of the niche search engines built on Google's index affected by this?
Kagi
> Kagi

This seems to be true, but more indirectly. From Kagi’s blog [0], which is a follow-up to a Kagi blog post from last year [1].
[0]> Google: Google does not offer a public search API. The only available path is an ad-syndication bundle with no changes to result presentation - the model Startpage uses. Ad syndication is a non-starter for Kagi’s ad-free subscription model.[^1]
[0]> The current interim approach (current as of Jan 21, 2026)
[0]> Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results (SERP meaning search engine results page). These providers serve major enterprises (according to their websites) including Nvidia, Adobe, Samsung, Stanford, DeepMind, Uber, and the United Nations.
I’m an avid Kagi user, and it seems like Kagi and some other notable interested parties have _already_ been unable to get what they want/need from Google’s index.
[0]> The fact that we - and companies like Stanford, Nvidia, Adobe, and the United Nations - have had to rely on third-party vendors is a symptom of the closed ecosystem, not a preference.
Hopefully someone here can clarify for me, or enumerate some of these “third-party vendors” who seem like they will/might/could be directly affected by this.
[0] https://blog.kagi.com/waiting-dawn-search
[1] https://blog.kagi.com/dawn-new-era-search
[0]> [^1]: A note on Google’s existing APIs: Google offers PSE, designed for adding search boxes to websites. It can return web results, but with reduced scope and terms tailored for that narrow use case. More recently, Google offers Grounding with Google Search through Vertex AI, intended for grounding LLM responses. Neither is general-purpose index access. Programmable Search Engine is not designed for building competitive search. Grounding with Google Search is priced at $35 per 1,000 requests - economically unviable for search at scale, and structured as an AI add-on rather than standalone index syndication. These are not the FRAND terms the market needs
I believe they're trying to indirectly say they're using SerpApi or a similar product that scrapes Google search results. And other big companies use it too, so it must be OK...
That must be the reason why they limit the searches you can do in the starter plan. Every SerpApi call costs money.
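For anyone unfamiliar with how these SERP vendors are consumed: the request/response shape below mirrors SerpApi's documented `search.json` interface, but treat the field names as assumptions and check the vendor's docs. Each request built this way is a metered, billable call, which is why downstream plans cap searches:

```python
from urllib.parse import urlencode

def build_serp_request(api_key, query, engine="google"):
    """Build a SerpApi-style request URL. Every call like this costs
    money, hence the per-plan search limits downstream."""
    params = {"engine": engine, "q": query, "api_key": api_key}
    return "https://serpapi.com/search.json?" + urlencode(params)

def parse_organic_results(response_json):
    """Extract (position, title, link) tuples from a SERP-API JSON
    response; 'organic_results' is the field SerpApi documents."""
    return [
        (r.get("position"), r.get("title"), r.get("link"))
        for r in response_json.get("organic_results", [])
    ]
```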
Google is also suing SerpAPI
And I can't prove a connection, but they refused to index one of my domains, and I think it _might_ be because we had some content on there about how to use SerpAPI.
They published this the other day:
https://blog.kagi.com/waiting-dawn-search
Which saw some discussion on HN.
> some discussion
~450 score, ~247 comments and still on /best ("Most-upvoted stories of the last 48 hours"):
https://news.ycombinator.com/item?id=46708678 - "Waiting for dawn in search: Search index, Google rulings and impact on Kagi"
Kagi does not use Google's search index. From their post which made the front page of HN yesterday [1]:
> Google does not offer a public search API. The only available path is an ad-syndication bundle with no changes to result presentation - the model Startpage uses. Ad syndication is a non-starter for Kagi’s ad-free subscription model.
[1]: https://news.ycombinator.com/item?id=46708678
They then go on to say that they pay a 3rd party company to scrape Google results (and serve those scraped results to their users). So their search engine is indeed based on unauthorized and uncompensated use of Google's index.
But since they're not using/paying for a supported API but just taking what they want, they indeed are unlikely to be impacted by this API turndown.
I think Kagi buys search engine results from SERP vendors who typically scrape Google’s results and offer an API experience on top of it.
No wonder Kagi is angry.
Google is a monopoly across several broad categories. They're also a taxation enterprise.
Google Search took over as the URL bar for 91% of all web users across all devices.
Since this intercepts trademarks and brand names, Google gets to tax all businesses unfairly.
Tell your legislators in the US and the EU that Google shouldn't be able to sell ads against registered trademarks (+/- some edit distance). They re-engineered the web to be a taxation system for all businesses across all categories.
Searching for Claude -> Ads in first place
Searching for ChatGPT -> Ads in first place
Searching for iPhone -> Ads in first place
This is inexcusable.
Only searches for "ChatGPT versus", "iPhone reviews", or "Nintendo game comparison" should allow ads. And one could argue that the "URL Bar" shouldn't auto suggest these either when a trademark is in the URL bar.
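A sketch of what the proposed "+/- some edit distance" trademark check could look like - the trademark list and distance threshold here are made up for illustration:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

TRADEMARKS = {"chatgpt", "claude", "iphone"}  # hypothetical registry

def is_trademark_query(query, max_distance=1):
    """True if the whole query is within max_distance edits of a mark -
    i.e. a navigational query that, per the argument above, should not
    carry ads. Multi-word queries like 'iphone reviews' pass through."""
    q = query.strip().lower()
    return any(levenshtein(q, tm) <= max_distance for tm in TRADEMARKS)
```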
If Google won't play fair, we have to kill 50% of their search revenue for being egregiously evil.
If you own a trademark, Google shouldn't be able to sell ads against you.
--
Google's really bad. Ideally we'd get an antitrust breakup. They're worse than Ma Bell. I wouldn't even split Google into multiple companies by division - I'd force them to be multiple copies of the same exact entity that then have to compete with each other:
Bell Systems -> {BellSouth, Bell Atlantic, Southwestern Bell, ...}
Google -> {GoogleA, GoogleB, GoogleC, ...}
They'd each have cloud, search, browser, and YouTube. But new brand names for new parent companies. That would create all-out war and lead to incredible consumer wins.
It could probably be argued that search access is an essential facility[1], though antitrust law doesn't appear to have anywhere near the same sort of enforcement it did in the past.
[1] https://en.wikipedia.org/wiki/Essential_facilities_doctrine
What stops Kagi from indexing the internet themselves, instead of paying some guys to scrape search results from Google? One guy at Marginalia can do it, and an entire dev team at a PAID search engine can't?
> “search the entire web”
TIL they allowed that before. It sounds a bit crazy - like Google inviting people to repackage Google Search itself and sell it / serve it with their own ads.
I tried it and contributed to searx. It didn't give the same results as Google, and it also has a 10k-request rate limit (per month, I believe). For more than that you'll have to "contact us".
You know, back in the day, the web used to be more open. Also, just because you CAN do something doesn't mean you HAVE to.
It basically means that Google is now transitioning into a private web.
Others have to replace Google. We need access to public information. States cannot allow corporations to hold us hostage here.
I built my own web search index on bare metal, index now up to 34m docs: https://greppr.org/
People rely too much on other people's infra and services, which can be decommissioned anytime. The Google Graveyard is real.
This is pretty cool. Don't let the naysayers stop you. Taking a stab at beating Google at their core product is bravery in my book. The best of luck to you!
The input on the results page doesn't work; you always need to return to the start page, on which browser history is disabled. That's just confusing behaviour.
Number of docs isn’t the limiting factor.
I just searched for “stackoverflow” and the first result was this: https://www.perl.com/tags/stackoverflow/
The actual Stackoverflow site was ranked way down, below some weird twitter accounts.
I don't weight home pages in any way yet to bump them up; it's just raw search on keyword relevance.
Google's entire (initial) claim-to-fame was "PageRank", referring both to the ranking of pages and co-founder Larry Page, which strongly prioritised a relevance attribute over raw keyword findings (which then-popular alternatives such as Alta Vista, Yahoo, AskJeeves, Lycos, Infoseek, HotBot, etc., relied on, or the rather more notorious paid-rankings schemes in which SERP order was effectively sold). When it was first introduced, Google Web Search was absolutely worlds ahead of any competition. I remember this well having used them previously and adopted Google quite early (1998/99).
Even with PageRank, result prioritisation is highly subject to gaming. Raw keyword search is far more so (keyword stuffing and other shenanigans), increasingly so as any given search engine becomes popular and catches the attention of publishers.
Google now applies additional ordering factors as well. And it has of course come to dominate SERP results with paid, advertised listings, which are all but impossible to discern from "organic" search results.
(I've not used Google Web Search as my primary tool for well over a decade, and probably only run a few searches per month. DDG is my primary, though I'll look at a few others including Kagi and Marginalia, though those rarely.)
<https://en.wikipedia.org/wiki/PageRank>
"The anatomy of a large-scale hypertextual Web search engine" (1998) <http://infolab.stanford.edu/pub/papers/google.pdf> (PDF)
Early (1990s) search engines: <https://en.wikipedia.org/wiki/Search_engine#1990s:_Birth_of_...>.
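For readers who want the concrete mechanics: PageRank's core is a short power iteration over the link graph. A toy version, using the 0.85 damping factor from the 1998 paper (the graph here is invented):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: rank}; ranks sum to 1.
    Each step, a page passes damping * rank, split among its outlinks, and
    every page receives a (1 - damping) 'teleport' share."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page: spread its rank evenly everywhere
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank
```

Run on a graph where two pages link to a third, the linked-to page ends up ranked highest even though all pages contain the same "keywords" - the relevance-over-raw-keywords property described above.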
PageRank was an innovative idea in the early days of the Internet when trust was high, but yes it's absolutely gamed now and I would be surprised if Google still relies on it.
Fair play to them though, it enabled them to build a massive business.
Anchor text information is arguably a better source for relevance ranking in my experience.
I publish exports of the ones Marginalia is aware of[1] if you want to play with integrating them.
[1] https://downloads.marginalia.nu/exports/ grab 'atags-25-04-20.parquet'
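One hedged sketch of integrating an anchor-text export like this: aggregate (target URL, anchor text) pairs into per-page term counts usable as a ranking signal. The pair layout is an assumption - inspect the parquet's actual schema (e.g. with pyarrow) before relying on it; plain tuples are used here so the aggregation logic stays clear:

```python
from collections import Counter, defaultdict

def build_anchor_index(anchor_rows):
    """anchor_rows: iterable of (target_url, anchor_text) pairs, e.g. loaded
    from the parquet export via pyarrow/pandas (column names are a guess).
    Returns {target_url: Counter of anchor terms}: how the rest of the web
    describes each page, independent of the page's own keyword stuffing."""
    index = defaultdict(Counter)
    for target_url, anchor_text in anchor_rows:
        for term in anchor_text.lower().split():
            index[target_url][term] += 1
    return index

def anchor_score(index, url, query_terms):
    """Simple signal: how often the query terms appear in anchors
    pointing at url; would be blended with other ranking factors."""
    counts = index.get(url, Counter())
    return sum(counts[t] for t in query_terms)
```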
Very interesting, and it is very kind of you to share your data like that. Will review!
Sure, but the point is results are not relevant at all?
It’s cool though, and really fast
I'll work on that adjustment - it's fair feedback, thanks!
Unfortunately this is the bulk of search engine work. Recursive scraping is easy in comparison, even with CAPTCHA bypassing. You either limit the index to only highly relevant sites (as Marginalia does) or you must work very hard to separate the spam from the ham. And spam in one search may be ham in another.
I limit it to highly relevant curated seed sites, and don't allow public submissions. I'd rather have a small high-quality index.
You are absolutely right, it is the hardest part!
What do you mean they're not relevant? The top result you linked contained the word stackoverflow didn't it? It's showing you exactly what you searched for. Why would you need a search engine at all if you already know the name of the thing? Just type stackoverflow.com into your address bar.
I feel like Google-style "search" has made people really dumb and unable to help themselves.
The query is just to highlight that relevance is a complex topic. Few people would consider "perl blog posts from 2016 that have the stack overflow tag" the most relevant result for that query.
Unfortunately the index is the easy part. Transforming user input into a series of tokens which get used to rank possible matches and return the top N, based on likely relevance, is the hard part, and I'm afraid this doesn't appear to do an acceptable job with any of the queries I tested.
There's a reason Google became so popular as quickly as it did. It's even harder to compete in this space nowadays, as the volume of junk and SEO spam is many orders of magnitude worse as a percentage of the corpus than it was back then.
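The tokenize-then-rank pipeline described above, in its simplest form, is a tokenizer plus TF-IDF scoring over an inverted view of the corpus. This is a toy sketch with made-up documents, not how any production engine actually weights results:

```python
import math
import re
from collections import Counter

# Toy corpus (hypothetical documents).
DOCS = {
    "doc1": "perl regular expressions tutorial for beginners",
    "doc2": "stack overflow questions about perl",
    "doc3": "python blog posts from 2016",
}

def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Per-document term counts and corpus-wide document frequencies.
doc_terms = {d: Counter(tokenize(t)) for d, t in DOCS.items()}
df = Counter(term for terms in doc_terms.values() for term in set(terms))
N = len(DOCS)

def search(query: str, k: int = 2) -> list[str]:
    """Rank documents by a simple TF-IDF sum over the query tokens."""
    scores = {}
    for doc, terms in doc_terms.items():
        s = sum(terms[t] * math.log(1 + N / df[t])
                for t in tokenize(query) if df[t])
        if s > 0:
            scores[doc] = s
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(search("perl tutorial"))
```

Real engines layer far more on top (stemming, phrase matching, link signals, spam scoring), which is exactly why relevance, not indexing, is the hard part.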
I tested it using a local keyword, as I normally do, and it took me to a Wikipedia page I didn’t know existed. So thanks for that.
It will throw up weird and interesting results sometimes ;-)
I also made something for my own search needs. It's just an SQLite table of domains and places. I have your search engine in there too ;-)
https://github.com/rumca-js/Internet-Places-Database
Demo for most important ones https://rumca-js.github.io/search
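For a domains-in-SQLite setup like this, full-text search comes almost for free via SQLite's FTS5 extension. The schema and rows below are hypothetical (not the actual schema of the linked repo); this just shows the pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: an FTS5 virtual table of domains with descriptions.
con.execute("CREATE VIRTUAL TABLE places USING fts5(domain, description)")
con.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [
        ("marginalia-search.com", "independent small-web search engine"),
        ("wiby.me", "search engine for the classic web"),
        ("example.org", "placeholder domain"),
    ],
)
# FTS5 MATCH with two terms means both must appear; rank orders by BM25.
rows = con.execute(
    "SELECT domain FROM places WHERE places MATCH ? ORDER BY rank",
    ("search engine",),
).fetchall()
print([r[0] for r in rows])
```

This requires an SQLite build with FTS5 enabled, which standard Python distributions generally include.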
Thank you, will check it out!
Lol, a Google+ URL was mentioned on a webpage I browsed this week. #blastFromThePast
It's been clear for the last decade that we have to wean ourselves off of centralized search indexes, if only to inoculate the Net against censorship and politically motivated black-holing.
I can only weep at this point. The heroes that were the Silent and Greatest generations (in the U.S.), who fought hard to pass on as much institutional knowledge as possible through hardcore organization and distribution via public and university libraries, have had that legacy shit on by these ad-obsessed cretins. The entirety of human published understanding, and we make it nigh impossible for all but the most determined to actually avail themselves of it.
Relevant: Waiting for dawn in search: Search index, Google rulings and impact on Kagi https://news.ycombinator.com/item?id=46708678
This might be me reading it wrong, but isn't shutting down the full-web search going against the ruling mentioned in the Kagi post?
> Google must provide Web Search Index data (URLs, crawl metadata, spam scores) at marginal cost.
Maybe they're shutting down the good integration and then Kagi, Ecosia and others can buy index data in an inconvenient way going forward?
If I understand Kagi's blog post correctly, then here's what happened, chronologically:
Kagi makes deals with many search engines so they can have raw search results in exchange for money.
Google says: no, you can't have raw search results because only whales can get those. Only thing we can offer you is search results riddled with ads and we won't allow you to reorder or filter them.
Kagi thinks Google's offer is unacceptable, so Kagi goes to a third party SERP API, which scrapes Google at scale and sells the raw search results to Kagi and others.
August 2024: Court says Google is breaking the law by selling raw search results only to whales.
December 2025: Court orders that for the next six years, 1. Google must no longer exclude non-whales from buying raw search results, 2. Google must offer the raw search results for a reasonable price, and 3. Google can no longer force partners to bundle the results with ads.
December 2025: Google sues the third-party scraping companies.
January 2026: Google says "hey, the old search offering is going to go away, there's going to be a new API by 2027, stay tuned."
This will significantly impact (quite possibly kill) Startpage and Ecosia, who are effectively white-label Google, right?
What alternatives are there besides Bing? Is it really so hard that it’s not considered worth doing? Some of the AI companies (Perplexity, Anthropic) seem to have managed to get their own indexing up and running.
Excuse the self-promotion but Mojeek offers a web search API (>9 billion pages): https://www.mojeek.com/services/search/web-search-api/
Meanwhile in Europe: Qwant and Ecosia team up to build their own search index: https://blog.ecosia.org/eusp/
It's a noble effort, but they're so late to the game that it's hard to see them making a significant dent. I hope I'm wrong.
They were:
> aiming to serve 30% of French search queries [by end of 2025]
https://blog.ecosia.org/launching-our-european-search-index/
Better late than never.
> The French index is at an advanced stage of completion, we have started creating the German language index, and the English one should start shortly. All progress is quickly integrated into the Qwant STAAN API.
https://noc.social/@327ppm/115934198650900394
They can build whatever they want with lots of #hashtags and public money, but that doesn't mean they'll get 30% of French people to use it.
But of course they managed to cut themselves a nice salary with EU funds, paid in part by me and you, so that's all that matters.
I feel like soon there won’t even be a point having a search engine since almost the entire internet will be useless AI slop.
A search engine doesn't have to search the entire internet. Most of them are extremely opinionated about what they index.
It's as though full-text search of websites you've never heard of was a mistake :)
PageRank wouldn't exist without webrings, directories, and forums you could only search individually, and we thrived on that Internet.
Welcome back, ye olde Internet.
Seriously though. Five years ago Google already became unusable without "site:reddit.com" which is actually hilarious for a search engine that's supposed to search the entire internet. Nowadays reddit is also shit, which means that the only use case for me to use Google or any search engine is to find products that for some reason I don't want to buy on Amazon.
Internet isn't a global village, it's a global ghetto, and it's becoming increasingly true that the only way not to lose is not to play.
The old internet is still there. It hasn't gone away; it's just undiscoverable with ad-based search. The more slop there is, the more necessary it is to have good search engines.
If you haven't tried Marginalia Search yet, do so. It's a small web search.
Recently, I set up a fresh system on a laptop. Ahahahaaa, how utterly crap Google search results now are! It fills me with some stress and disgust to use that. Now one of the first things I do, right after emergency using duckduckgo to search for uBlock Origin and NoScript, is to get Kagi search installed as default search. Then I can continue setting things up more calmly.
The 'Google Graveyard is real' sentiment captures something important: every dependency on a large platform is a loan that can be called in. The 34-million-document indie index project someone mentioned is the right response - own your core infrastructure. Easier said than done for whole-web search, but the same principle applies everywhere.
Much easier said than done, especially if you are serving users at scale.
Google has consistently ruined its search engine over the last (almost) 10 years. You can find numerous articles about this, as well as videos on YouTube (which is also controlled by Google).
Not long ago they ruined ublock origin (for chrome; ublock origin lite is nowhere near as good and effective, from my own experience here).
Now Google is also committing to more evil and trying to ruin things for more people, competitors, you name it. We cannot allow Google to continue on its wicked path here. It'll just further erode the quality. There is a reason why "killed by google" is more than a mere meme - a graveyard of things killed by Google.
We need alternatives, viable ones, for ALL Google services. Let's all work to make this world better - a place without Google.
Are competing search indexes (Bing, Ecosia/Qwant, etc.) objectively worse in significant ways, or is Google just so entrenched that people don't want to "risk it" with another provider (or is it simply preference and inertia)?
I suppose I'm asking whether this is actually a _good thing_ in that it will stimulate competition in the space, or if it's just a case that Google's index is now too good for anyone to reasonably catch up at this point.
The beauty of Google Programmable Search across the entire web is that it's free, and users can make money by linking it to their AdSense account.
Bing charges per query for the average user. Ecosia and Qwant use Bing to power their results, presumably under some type of license that has them paying much less per query than a normal user.
Bing recently shut down their API product, which was already very expensive.
If you want programmatic access to search results there aren't really many options left.
I can manage fine with other search indexes for English-language searches; whether that is because the others got better or Google got worse, I cannot tell, though I suspect the latter.
But for searching in more niche languages google is usually the only decent option and I have little hope that others will ever reach the scale where they could compete.
Bing's index is smaller than Google's, and anecdotally I get fewer relevant results when using it, particularly from sites like Reddit that have exclusive search deals with Google.
I had misread the title as "Google is ending (full-web search) for [aka in favour of] (niche search engines)"
The correct parsing is: "Google is ending (full-web search for niche search engines)"
"Google will discontinue third-party niche search engine access to full-web search" would be far clearer.
Given that the title supplied is effectively editorialised, and the original article's title is effectively content-free ("Updates to our Web Search Products & Programmable Search Engine Capabilities"), my rewording would be at least as fair.
HN's policy is to try to use text from the article itself where the article title is clickbait, sensational, vague, etc., however. I suspect Google's blog authors are aware of this, and they've carefully avoided any readily-extracted clear statements, though I'll take a stab...
Here's the most direct 'graph from TFA:
Custom Search JSON API: Vertex AI Search is a favorable alternative for up to 50 domains. Alternatively, if your use case necessitates full web search, contact us to express your interest in and get more information about our full web search solution. Your transition to an alternative solution needs to be completed by January 1, 2027.
We can get a clearer, 80-character head that's somewhat faithful to that with:
"Google Search API alternative Vertex AI Search limited to 50 domains" (70 chars).
That's still pretty loosely adherent, though it (mostly) uses words from the original article. I'm suggesting it to mods via email at hn@ycombinator.com; others may wish to suggest their own formulations.
Are search engines like Kagi completely screwed by this or is there a way for them to keep operating?
Kagi doesn't have a partnership with Google - they work under adversarial interoperability, stealing results from Google against their will, and paying some third-party to enable this. They'd like to simply pay Google, but Google doesn't want their money.
I'm curious about what it would take to build my own "toy" search engine with its own index. Anyone ever tried this?
Yeah that's where I started out in 2021. Been at it for almost 5 years now, last three of which full time. I'm indexing about 1.1 billion documents now off a single server.
Hard part is doing it at any sort of scale and producing useful results. It's easy to build something that indexes a few million documents. Pushing into billions is a bigger challenge, as you start needing a lot of increasingly intricate bespoke solutions.
Devlog here:
https://www.marginalia.nu/tags/search-engine/
And search engine itself:
https://marginalia-search.com/
(... though it operates a bit sub-optimally now as I'm using a ton of CPU cores to migrate the index to use postings lists compression, will take about 4-5 days I think).
Curious what (and how much) hardware you're running this on.
Currently running off
AMD EPYC 7543 x2 for 64 cores/128 threads
512 GB RAM
~ 90 TB of PM9A3 SSDs across 12 physical devices
Storage is not very full though. I'm probably using about a third of it at this point.
You might find YaCy interesting. It's meant to be a decentralised search engine where users scrape the internet and can search other users' indexes in a kind of torrent-like way.
I found it didn’t really work as a real search engine but it was interesting.
Good luck scraping websites without being blocked, if you're not Google.
Well you'll get blocked some places but it's not too big of a deal. If you're running an above board operation, you can surprisingly often successfully just email the admin explaining what you're doing, and ask to be unblocked.
Never build a product whose core feature depends on a third party; you will eventually get screwed for sure. Always follow a 70:30 rule for revenue, where 70% comes from core independent features.
Soon you'll find that you cannot exist on the web without relying on third parties. Sometimes you'll even have trouble getting paid thanks to the painful existence of payment processors.
True, you can't exist without third parties. But you shouldn't let them be your core moat/USP. Jasper is a great example: they depended too much on LLM access, then ChatGPT launched and ate the value. Using third-party APIs is fine, but building a product whose core depends on them is suicide.
That's why I eschew HTTPS.
Antitrust does not work against large companies.
Just dissolve them in acid.
This is the type of monopoly abuse these laws were designed to target, and antitrust laws actually do work against large companies.
If you actually enforce them.
Unfortunately, during the Reagan administration, political sentiment toward monopolies shifted and since then antitrust law has been a paper tiger at best.
I heard that when Bush came to power, the government's antitrust case against the Microsoft monopoly was effectively dropped.
Is this about the little Google search bar that is present on some websites? Or am I mistaken about something?
Kind of. However, the Google search bar present on a website is usually there to search across that site's own domain; the search results are limited to it, e.g. example.com/page1, example.com/page2. Google will carry on supporting this.
What they are ending is their support for websites to search across the entire web. The websites that search across the entire web are usually niche search engine websites.
Ahh; so that's the difference. Thanks!
This underscores the core value of cross-platform unified memory.
What examples are there of people using this?
There are literally thousands of independent search engines that use Programmable Search to search the entire web. Many ISPs use it on their homepages, kids-focused search engines like wackysafe.com use it, and so do privacy-focused search engines like gprivate.com, etc.
Also LLM tools. The Programmable Search Engine API was a way to give third-party LLM frontends the ability to give LLMs a web-search tool. Notably, this was common practice long before any of the major LLM providers added search capabilities to their frontends.
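That integration was typically a single GET request against the Custom Search JSON API endpoint, with the result snippets fed to the model. A minimal sketch of building such a request (the key and engine ID are placeholders; no network call is made here):

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(api_key: str, engine_id: str, query: str, num: int = 5) -> str:
    """Build a Custom Search JSON API request URL.
    key = API key, cx = search engine ID, q = query, num = result count."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{API_ENDPOINT}?{urlencode(params)}"

url = build_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "niche search engines")
print(url)
# Fetching this URL returns JSON whose "items" list carries a
# title/link/snippet per result, which is what an LLM frontend
# would pass to the model as web-search tool output.
```

It's this kind of cheap, generic access that goes away when full-web search moves behind an enterprise interest form.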
Exactly. Google wants everyone dependent on Gemini.
Is this perhaps to prevent ChatGPT, Claude, and Grok from using Google Search? It would make sense for Google to keep that ability for Gemini.
They'll go adversarial interop through SerpAPI, just like Kagi does. SerpAPI will get the money instead of Google getting it.
I suspect it's going to hurt the indie developers and small start-ups who do not have special licensing agreements.