UPDATE:
Following a suggestion from @irskep, I've added CLI command support for the search and download features.
This raised a valid concern: while we're focused on building MCP servers, we shouldn't overlook whether users already have preferred (T/G)UIs available. When they don't, we should consider the user experience and make our functionality accessible through interfaces beyond just MCP.
https://github.com/iosifache/annas-mcp/releases/tag/v0.0.2
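As an illustration of the idea (not the actual v0.0.2 interface), here is a minimal sketch of exposing the same two operations as CLI subcommands with Go's standard `flag` package. The subcommand layout, the `-out` flag and the book-identifier argument are hypothetical. The bodies only echo what they would do; in a real tool they would call the same search and download code that backs the MCP tools.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"strings"
)

// run dispatches a hypothetical "search" or "download" subcommand and
// returns a message describing what it would do.
func run(args []string) (string, error) {
	if len(args) < 1 {
		return "", errors.New("usage: <search|download> [flags] args")
	}
	switch args[0] {
	case "search":
		fs := flag.NewFlagSet("search", flag.ContinueOnError)
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		// Everything after the flags is treated as the free-text query.
		query := strings.Join(fs.Args(), " ")
		if query == "" {
			return "", errors.New("search: missing query")
		}
		return fmt.Sprintf("searching for %q", query), nil
	case "download":
		fs := flag.NewFlagSet("download", flag.ContinueOnError)
		out := fs.String("out", ".", "directory to save the file in")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		if fs.NArg() != 1 {
			return "", errors.New("download: expected exactly one book identifier")
		}
		return fmt.Sprintf("downloading %s to %s", fs.Arg(0), *out), nil
	default:
		return "", fmt.Errorf("unknown command %q", args[0])
	}
}

func main() {
	msg, err := run(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```

Keeping the dispatch in a `run` function that takes the argument slice makes it easy to test, and lets the CLI and the MCP handlers share one implementation underneath.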
What advantage do you get from this being an MCP server rather than simply a command line tool? Genuinely curious, as I'm trying to develop my mental model of when to use one or the other.
Lovely project!
Cheers, glad you like it!
I justified the hours I invested by thinking I could search, download, and explore books directly from Claude Desktop. While the initial steps are achievable with a CLI tool, the integration opens up new possibilities.
Some general thoughts:
- You’ll find the MCP mental model similar to the API one.
- MCP integrations make it easier for non-technical users to access tools that were previously too technical.
- An MCP integration implicitly respects a contract, unlike CLIs and GUIs which involve human aspects (aesthetics, information organisation, etc.).
- MCP is an excuse for people to democratize data access. I wrote about this aspect here: https://x.com/iosifache/status/1941049600162574676?s=46
And BTW, that’s a good idea! The functionality should probably also be exposed via CLI.
https://neon.com/blog/building-a-cli-client-for-model-contex... might be of interest.
An MCP server provides enough metadata and self-documentation that it's quite straightforward to make an MCP-agnostic CLI client that adapts an arbitrary MCP server into a set of flags, letting you call its explicit tools with explicit arguments without ever needing to involve an LLM in the mix! You could even have that CLI tool launch the MCP server as a local subprocess if you wanted, again all deterministically.
And if you want an SDK in any language under the sun, once you have an MCP server outputting reasonable tool descriptions, any LLM could make a best-in-class SDK for you in a heartbeat, following that language's best practices.
So it's not unreasonable for someone working on a greenfield project to make an MCP server first nowadays!
Agreed on all of this! I'm expecting MCP server creation to be natively supported by API libraries. The abstractions are very similar.
FastAPI -> MCP is like 1 LoC.
Thank you, sir!
https://news.ycombinator.com/item?id=44518393
MCP gives the AI agent a list of commands and instructions on how/when to use them in a standard way.
A CLI tool, while it can do the same, doesn't do it in a standard way, so if the tool isn't in the agent's training set or its context, the agent won't know how to use it.
COMMAND LINE: You have to instruct your AI what this tool is and what it should be used for.
MCP: You paste one command to register the MCP and your AI will always know what it is and where/why it should be used.
This is pretty naive, based on my experience. I find it hard to get the LLM to consistently use the tools defined, so "always" is doing some questionable lifting here. I've had much better luck with pretty simple, linear workflows, so telling your LLM to "now use this CLI tool to..." might be more effective than an agent that has a bunch of tools and may or may not use the expected one. YMMV
> COMMAND LINE: You have to instruct your AI what this tool is and what it should be used for.
Or—and please bear with me, I know this may sound insane—you call the command-line tool yourself, reliably, fast, with little overhead, just like it has worked for decades.
I know, I know, soon most people won’t even know how to unzip their pants without spending unnecessary amounts of electricity and waiting for several seconds for an LLM response, but believe me that just using your hands is a solution worth checking out.
I think the goal of this project is not to simplify the use of Anna's Archive for humans (as they could just use the website anyway), but to allow AI "agents" to automatically source information from books while researching, without requiring user interaction.
The LLM's default web-browsing tool probably won't or can't download books from AA while looking for information on a subject. This enables it to do so.
Are you sure? From OP's comment:
> I justified the hours I invested by thinking I could search, download, and explore books directly from Claude Desktop.
Although you're right that it can be used for use cases like the one you're describing.
Insane how HN cheers on piracy nowadays, all because it helps them train their LLMs
This isn’t about training LLMs at all.
Also, HN like the rest of the world was always pro-piracy and getting the fruits of your labor without paying for it.
The only time I’ve seen anti-piracy comments has been wrt LLM training. Suddenly people pretend to care, but it feels performative.
I wonder if you're considering the huge difference between:
- individuals pirating copies of things for sole consumption or relatively minuscule distribution; and
- large highly-funded institutions that pirate content for the sole purpose of generating revenue from it
...and why that might lead people to feel differently about one and the other (not to mention the outsized punitive response to the former compared to the latter).
I only see comments complaining about imaginary property "violations" on ai threads funnily enough.
The evolution of intelligence and its intersection with universal access to knowledge is more important than copyright.
To the extent copyright interests want to pick a fight with AI, they must lose, and decisively so.
Lots of things are more important than the evolution of intelligence. This blind faith in technological progress is becoming grossly incompatible with the interests of ordinary people.
TBH, I'm not super impressed with "ordinary people" lately. Most of the time, I don't spare "ordinary people" a thought. Lately, I have, though. The thought is primarily something along the lines of, "How can these people be stopped before they hose us all?"
The only sector of our society that needs to be stopped before they hose us all is the AI labs IMHO.
IMHO the whole of Anna's Archive is a false-flag op by a major AI startup.
Information wants to be free.
love this. god bless anna's archive
Cheers!
Edit: Would you accept a PR to override the search and download endpoint hostnames with env vars? For someone who has their own copy and ES index, it might be helpful to support overriding the public endpoint hostnames (/internal/anna/anna.go#L22-L23).
> local copy of this corpus
Are you referring to the JSON index (https://annas-archive.org/faq#api)?
I'm an LLM noob, but how feasible is it to make a research agent that can not only download articles, but also read and reference them in its process?
It's doable, as you'll also find MCP servers for reading files [1].
Claude Desktop also has a built-in file reader [2], so you can ask it to read the file and process the content (e.g., generate a summary or even a meta-summary [3]).
[1] https://github.com/sylphxltd/pdf-reader-mcp
[2] https://github.com/modelcontextprotocol/servers/tree/main/sr...
[3] https://x.com/iosifache/status/1942247320302547175
Firecrawl -> RAG -> MCP is the general path.
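The "read and reference" part of that path boils down to retrieval with source tracking. Here is a toy sketch of the retrieval step: it chunks a downloaded text, ranks chunks by keyword overlap with the question, and keeps the source name so the agent can cite it. Keyword overlap is a deliberate stand-in for the embedding search a real RAG setup would use, and fetching or crawling (Firecrawl's part) is not shown.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Chunk is one passage of a downloaded document, tagged with its source so
// the agent can cite where an answer came from.
type Chunk struct {
	Source string
	Text   string
}

// chunkWords splits text into passages of at most n words each.
func chunkWords(source, text string, n int) []Chunk {
	words := strings.Fields(text)
	var out []Chunk
	for i := 0; i < len(words); i += n {
		end := i + n
		if end > len(words) {
			end = len(words)
		}
		out = append(out, Chunk{Source: source, Text: strings.Join(words[i:end], " ")})
	}
	return out
}

// tokens lower-cases s and splits it on anything that is not a letter or digit.
func tokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// score counts the distinct query terms that appear as words in the chunk.
func score(query string, c Chunk) int {
	words := map[string]bool{}
	for _, w := range tokens(c.Text) {
		words[w] = true
	}
	counted := map[string]bool{}
	s := 0
	for _, t := range tokens(query) {
		if words[t] && !counted[t] {
			counted[t] = true
			s++
		}
	}
	return s
}

// topK returns the k highest-scoring chunks, best first; ties keep document order.
func topK(query string, chunks []Chunk, k int) []Chunk {
	ranked := append([]Chunk(nil), chunks...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return score(query, ranked[i]) > score(query, ranked[j])
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	text := "MCP servers expose tools to agents. Agents call tools with JSON arguments. Results come back as text."
	for _, c := range topK("json arguments", chunkWords("notes.txt", text, 6), 1) {
		fmt.Printf("[%s] %s\n", c.Source, c.Text)
	}
}
```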
My understanding of Anna's Archive is that one has to download large zip files (>10 GB) containing thousands of books even if one wants only a single book.
Am I correct here?
Does this MCP server allow one to download just a single book?
I remember once using an IPFS-based tool to download a single 200-year-old, out-of-copyright copy of "Last of the Mohicans" from Anna's Archive. It worked, but it was very, very complicated to figure out how to make it work.
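I've downloaded single books several times recently (annas-archive.org in the browser):
- search for the book
- tap a result
- see a list of links to download mirrors (under 'slow downloads'), tap a link
- get a countdown timer
- timer expires, download links appear
- click a link, book downloads just like any other download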
The waiting part is nonexistent if you have an active donation (which is also required by this MCP server for API access). The fast downloads mean you request a book and start downloading it immediately.
"Donation". Let's call it what it is - a subscription.
So just 8 or 9 steps to grab an e-book, instead of just letting the LLM recall it from training data... /s
You can just go on the website, search for the book you want, and download it. I'd think this is pretty obvious, as if you visit https://annas-archive.org, there are multiple search boxes there that lead directly to downloads.
You are incorrect in your assumption. Though I would also like you to search for "IRC books reddit". Unlike Anna's Archive, you get high quality books with fast download speeds.
Why do people keep building servers for such a silly protocol?
It's easy and has the word AI in it.
It's infinitely useful for people whose workflows involve LLM agents.
Let's say you want to use the tool `ssh`, and let's imagine ssh has just been released and isn't in the AI agent's training set yet.
It won't know in this case how to use the ssh command line tool.
MCP provides a standard way to tell an AI agent how and when to use tools. So if you had an SSH MCP server, you'd simply plug that in, now your AI agent automagically has SSH capabilities.
People may do things you have no need for or don't agree with.
Welcome to Planet Earth.
> This software does not endorse unauthorized acquisition of copyrighted content and should be regarded solely as a utility. Users are urged to respect the intellectual property rights of authors and acknowledge the considerable effort invested in document creation.
How sincere is that statement?
I just provide a hammer. Users decide whether they're hitting their own nail or the metal one.
The comparison might be loose, but the problem is similar to releasing a browser. Do you prevent users from accessing websites you think are malicious or illegal? Or do you delegate that responsibility?
I was hesitant about releasing the MCP server as open source software, but I hope (1) it proves useful for others and (2) people understand that the authors of the books they're reading need money to eat, live, and support their families.
IMO, you're needlessly taking a defensive stand. It's ok to take a forward looking stand on how access to knowledge should be.
Oh come on. We all know there’s pretty much every novel you’ll find at Barnes&Noble on Anna's Archive as a pirated copy, not just scientific papers. At least be honest; it’s as much a mundane piracy tool as it is a knowledge repository.
I was absolutely saying that. Novels are part of the knowledge too - a scientific paper and a novel have equal weight
Phrasing "here's a way to pirate novels using an LLM" as "I'm on a mission to grant everyone access to the wealth of humanity's knowledge" is just disingenuous. Sure, novels count as knowledge, but there's a moral difference between making scientific content available to researchers for use in research and saving money by pirating books.
Why is scientific content not saving money, and why can't novels be useful?
> The comparison might be loose, but the problem is similar to releasing a browser. Do you prevent users from accessing websites you think are malicious or illegal? Or do you delegate that responsibility?
I might liken the situation more to releasing a browser and setting thepiratebay as the homepage.
That would imply constantly reminding users of an available action, which isn't the case since the MCP server is just a dormant capability that needs to be triggered.
Hilariously disingenuous.
As sincere as LLM providers not wanting to get sued for the copyrighted content they used.
I'd bet you won't find a single string containing "acquire copyrighted content arrr!", so pretty sincere. The software doesn't endorse it.
As sincere as the user takes it.
I recently downloaded a number of magazines from a now-defunct publication from the late eighties. No one sells them, and Anna's Archive was the only place I found them. It's not exclusively pirating; it's a source for a lot of out-of-print material.
The real crime is an economic system that limits the spread of knowledge and access to other "human rights" by requiring everyone to hustle to survive (and, if possible, increase capital gains for the financial overlords) when we would already be technologically equipped to feed and house well all of mankind, instead of letting thousands of children starve to death each day and restricting access to education so that billions miss out on their intellectual development - a void easily filled with addictive media full of rage and distraction. Pirating books is just a symptom of this wretched system. And it is not enough - RISE, HN! .. towards RBE & beyond..
AI is dependent on piracy... and apparently that's fine now.
It's not piracy if you download the books to train your LLM.... Apparently.
https://hn.algolia.com/?q=anna%27s+archive
Interesting project! I’m a little surprised that Claude is willing to call these functions. The demo screenshot is downloading a public domain work, I wonder if it would also happily go along with requests for Harry Potter or other copyrighted material?
The ironic universe theory would dictate that LLMs should tell us that downloading and consuming copyrighted material from pirated books is wrong.
Technically, those are already in Claude...
Just give the AI something worse that would happen if it doesn’t call these functions.
Aren’t they all trained on copyrighted material, and lobbying governments to make that legal? Should copyright law only apply to the plebs?
>Aren’t they all trained on copyrighted material, and lobbying governments to make that legal?
Only for them, not for you.
>Should copyright law only apply to the plebs?
In a practical sense, hasn't that been the case for most laws?
Last I checked downloading isn't the issue. It's distributing. Not an expert though.
> I wonder if it would also happily go along with requests for Harry Potter or other copyrighted material?
There's no way to protect against this. Anna's Archive doesn't include licence information in their data fields. It would be helpful to integrate with another data source that could warn MCP server users when they're attempting potentially risky actions. Please let me know if you have ideas on how to achieve this.
On a related note, please see this reply:
https://news.ycombinator.com/reply?id=44515205
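One partial idea, sketched under heavy assumptions: even without licence data, a publication year (if the record has one) supports a crude, US-only check. Works published in the US before 1978 get at most 95 years of protection, running to the end of the calendar year, so anything first published 96 or more years ago is out of copyright there. The function names are hypothetical, the rule says nothing about other jurisdictions, and it can only ever answer "old enough", never "safe", so everything newer or undated still gets a warning.

```go
package main

import (
	"fmt"
	"time"
)

// usPublicDomainByAge reports whether a work first published in year pub is,
// as of year now, past the longest US term for pre-1978 publications
// (95 years, running to the end of that calendar year).
// false means "not provably clear by age alone", not "protected".
func usPublicDomainByAge(pub, now int) bool {
	return pub+95 < now
}

// copyrightWarning returns a warning the MCP server could attach to a result,
// or "" when the work is old enough under the rule above.
// pubYear is assumed to come from the record's metadata; 0 means unknown.
func copyrightWarning(pubYear int) string {
	if pubYear > 0 && usPublicDomainByAge(pubYear, time.Now().Year()) {
		return ""
	}
	return "copyright status unknown: this work may still be protected"
}

func main() {
	for _, year := range []int{1826, 1997, 0} {
		fmt.Printf("%d: %q\n", year, copyrightWarning(year))
	}
}
```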
It's worse than you'd expect: there's a sizable subset of works with unknown or uncertain license status.
https://wikipedia.org/wiki/Orphan_works_in_the_United_States
https://martincopenhaver.github.io/files/orphanworks.pdf [PDF]
An estimated 25-50 million books are orphan works. Their copyright holder may step forward at any time after you've treated a work as unlicensed, so its status is perpetually uncertain (though showing due diligence in finding and contacting the rights holder is considered adequate).
For US works published after 1977, and most works published between 1898 and 1945, the US Copyright Office has a database:
https://publicrecords.copyright.gov/
but I don't know of a good catalog for non-US works.
Ah, wow, thank you for the links and additional context. This is new information to me.
> Their copyright holder may step forward at any time after you've treated it as unlicensed.
Does this mean that Satoshi can just come and claim that a whole industry is using his/her copyrighted material?
The above is specifically about book copyright on orphaned works. AFAIK there is no copyright on the contents of a blockchain that they could defend (I'm assuming that's what you're referring to). That copyright would hypothetically cover redistribution, which is kind of a necessary aspect of a shared ledger in the first place.
The more appropriate coverage for that would be patents, which protect a process while making its methods public. Since Bitcoin is open and not patented, that isn't a concern either. There are, however, methods of using blockchains that are patented.
Satoshi couldn't come forth and say that all bitcoin is violating a patent, but it's possible that some aspect of other blockchains or some aspect of an application layer built on blockchain is covered by a patent. For more details consult a lawyer specializing in IP law (and btw I'm not a lawyer, in case I accidentally gave that impression).
To complicate things further with patents, be careful about reading patents if you plan on possibly building anything related. If it can be shown that you were aware of a (US) patent that you're infringing, the court may award up to triple the damages decided in the case. It's very likely that some applications are violating blockchain-related patents, and someone could suddenly come forward and sue a lot of people who were being willfully ignorant about it.
Hopefully that comes close to answering your question; this space is far too small for all the nuances, and I'm not an expert on licenses.
Thanks, good context.
I think (also not an expert) that depends on jurisdiction. E.g., in some places you can download but not distribute, while in others you can’t even download. Others still turn a blind eye to both (there’s a reason Russian trackers/teams are so prevalent).