Site-Llms.xml – An XML Sitemap Standard for AI-Friendly ECommerce Data

(github.com)

1 point | by nicola_alessi 2 months ago

4 comments

mubou 2 months ago

I don't get why people keep trying to make llms.txt happen. Use the standards that already exist.

1. sitemap.xml says /foo exists

2. LLM requests /foo with:

    Accept: text/markdown, text/html;q=0.9

3. Site responds with a markdown rendering of /foo

Done. Alternatively, use <link rel="alternate">. This is a solved problem, and the tools that are already available are more flexible, don't require specific URLs, and aren't LLM-specific.
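For illustration, the negotiation in steps 2 and 3 could be sketched server-side roughly as follows. This is a minimal sketch, not anything from the linked proposal: the function names and the `available` dictionary are hypothetical, and the parser skips the full specificity rules that HTTP defines for `Accept` (it only orders entries by q-value and treats `*/*` as "use the default").

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type.lower(), q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_representation(accept_header: str, available: dict[str, str]) -> tuple[str, str]:
    """Return (media_type, body) for the best match, falling back to HTML.

    `available` maps media types to pre-rendered bodies (hypothetical storage).
    """
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type, available[media_type]
        if media_type == "*/*":
            break  # client accepts anything: use the default below
    return "text/html", available["text/html"]


# Hypothetical renderings of /foo
renderings = {"text/html": "<h1>Foo</h1>", "text/markdown": "# Foo"}
print(choose_representation("text/markdown, text/html;q=0.9", renderings))
```

A site that has no markdown rendering simply omits `text/markdown` from `available`, and the same request falls back to HTML.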

nicola_alessi 2 months ago

You're absolutely right that existing standards like sitemap.xml + Accept headers could work in theory, but here's why we built this for eCommerce specifically:

The HTML-to-Markdown Problem: Even with Accept: text/markdown, most eCommerce sites will return HTML (then converted server-side). This means:

Scripts/popups in <div> hell ("Subscribe to newsletter!" embedded in product specs)

Ad fragments ("Customers also bought...") polluting context windows

Layout cruft (header/footer markup in every response)

llms.txt files are handcrafted Markdown – no noise, just atomic product data.

Control Over Exposure: Retailers want to:

Expose only approved fields (e.g., hide "Compare at $X" prices)

Sanitize dynamically (e.g., remove out-of-stock variants)

Avoid scrapers misusing their HTML endpoints

/site-llms.xml lets them curate what LLMs see, separate from human-facing HTML.

Performance at Scale: For catalogs with 100K+ products:

Generating Markdown per-request via Accept headers is expensive

Pre-rendered llms.txt files can be CDN-cached

Sitemap indexes (>50K URLs) are already battle-tested

We’re not replacing sitemap.xml – we’re extending it for a specific use case where clean, pre-processed data matters more than flexibility.
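As a rough illustration of the sitemap-index point, the sketch below splits a large catalog into files of at most 50,000 URLs (the per-file limit in the standard sitemaps protocol) and emits a standard sitemap index pointing at them. The file names and example URLs are hypothetical and are not taken from the site-llms.xml proposal, whose exact schema is not shown in this thread.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps protocol


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(sitemap_urls: list[str]) -> str:
    """Render a standard sitemap index that points at each child sitemap."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical catalog of 120,000 pre-rendered product pages
product_urls = [f"https://shop.example/products/{n}/llms.txt" for n in range(120_000)]
chunks = chunk_urls(product_urls)

# Hypothetical naming scheme for the child sitemap files
child_files = [f"https://shop.example/site-llms-{i + 1}.xml" for i in range(len(chunks))]
index_xml = sitemap_index(child_files)
print(len(chunks), "child sitemaps")
```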

mubou 2 months ago

Come on, this is clearly copy-pasted from chatgpt. You can plainly see where the headings and bullet points were. And it's just rehashing the benefits of markdown, anyway, which isn't relevant. Did you even bother to read this slop?

> Accept: text/markdown, most eCommerce sites will return HTML

Which means the site doesn't have markdown. So, add it? There are plenty of ways to tackle this, even if you can't modify the server code.

> Generating Markdown per-request via Accept headers is expensive

No one's saying the markdown can't be pre-rendered.

> Pre-rendered llms.txt files can be CDN-cached

Every CDN I've used has a way to vary by Accept. You can even have it redirect to a different url, or use a <link> tag that points to a markdown file. Which might even be called "llms.txt", who knows, who cares. That's the beauty of the existing standards: they're flexible.
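A minimal sketch of the two mechanisms mentioned here, assuming a hypothetical pre-rendered markdown file at `/foo.md`: response headers that tell caches to vary on `Accept` and advertise the alternate, plus the equivalent `<link>` tag for the HTML page. The helper names are illustrative, and the cache-key function uses a deliberately crude substring check rather than full `Accept` parsing.

```python
from html import escape


def markdown_alternate_headers(markdown_url: str) -> dict[str, str]:
    """Headers for the HTML response: vary the cache on Accept and
    point to the markdown rendering via a Link header."""
    return {
        "Vary": "Accept",
        "Link": f'<{markdown_url}>; rel="alternate"; type="text/markdown"',
    }


def alternate_link_tag(markdown_url: str) -> str:
    """The same hint expressed as an HTML <link> element."""
    href = escape(markdown_url, quote=True)
    return f'<link rel="alternate" type="text/markdown" href="{href}">'


def cache_key(path: str, accept_header: str) -> str:
    """Normalize Accept into two buckets so a cache stores at most two
    variants per path (assumption: substring check, q-values ignored)."""
    variant = "markdown" if "text/markdown" in accept_header.lower() else "html"
    return f"{path}|{variant}"


print(markdown_alternate_headers("/foo.md"))
print(alternate_link_tag("/foo.md"))
print(cache_key("/foo", "text/markdown, text/html;q=0.9"))
```

Normalizing the header into a small set of variants keeps the cache from fragmenting across every distinct `Accept` string that clients send.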

Good god. I'm not going to debate against an AI, so don't bother generating a reply.

2 months ago

[deleted]