So You Want to Define a Well-Known URI

(mnot.net)

80 points | by ingve 6 hours ago

41 comments

  • reddalo 4 hours ago

    I wish people would follow this, instead of coming up with new standards in the root namespace. "llms.txt" [1] comes to mind, for example.

    Let's stop polluting the root of a domain!

    [1] https://llmstxt.org/

    • rickette 4 hours ago

      LLMs.txt is also nonsense since it isn't adopted by any of the major AI players.

      • networked 3 hours ago

        Google has recently added `llms.txt` to Chrome's Lighthouse check for agentic browsing (https://searchengineland.com/google-llms-txt-chrome-lighthou...), so adoption may be coming. Admittedly, I put more faith in

          <link rel="alternate" type="text/markdown" href="https://example.com/foo.md" title="Markdown version of the &lt;Foo&gt; page">
        
        that I copied from Gwern.net. This convention is discoverable (just read the HTML) and naturally adapts to any website size and structure.
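A minimal sketch of what "just read the HTML" could look like for a client, using only Python's standard `html.parser`. The class and function names are hypothetical and chosen for illustration; the sample page reuses the `<link>` element quoted above.

```python
from html.parser import HTMLParser


class MarkdownAlternateFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated token list, so split it before matching.
        rel_tokens = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").lower()
        if "alternate" in rel_tokens and media_type == "text/markdown":
            href = attr.get("href")
            if href:
                self.hrefs.append(href)


def find_markdown_alternates(html_text):
    """Return every Markdown alternate URL advertised in the given HTML."""
    finder = MarkdownAlternateFinder()
    finder.feed(html_text)
    return finder.hrefs


page = (
    '<html><head><link rel="alternate" type="text/markdown" '
    'href="https://example.com/foo.md" title="Markdown version"></head></html>'
)
print(find_markdown_alternates(page))
```

Because the link lives in each page, a crawler needs no site-wide index file to find the Markdown version.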

        I have created an `llms.txt` for my website anyhow. I use a fixed LLM prompt to generate it from the internal links in `index.md`.

        • iamacyborg 2 hours ago

          Giving a markdown version of a page seems like an interesting choice instead of just embedding a schema marked up one

          • vidarh an hour ago

            Every page on code.claude.com has a markdown version available by just appending ".md", and Claude Code knows about it. E.g:

            https://code.claude.com/docs/en/overview and

            https://code.claude.com/docs/en/overview.md

            • 9dev 44 minutes ago

              After some consideration, I also applied this convention to every site I build, including content negotiation: clients can either send an Accept header with their preference, or append an explicit extension (.md|.markdown for Markdown, .json for JSON API responses, or .html for the human HTML page). Together with the content negotiation part, it feels very much like how HTTP was intended to work, especially the fact that API clients, AI agents, and humans all use the same URLs but get the content in the shape they need.
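One way the extension-or-Accept rule described above might be sketched, as a pure function with no web framework. The mapping tables, the function name, and the choice to let an explicit extension win over the Accept header are all assumptions for illustration, not the commenter's actual implementation; wildcards such as `*/*` are ignored and fall through to the HTML default.

```python
import posixpath

# Hypothetical mappings, mirroring the extensions named in the comment above.
EXTENSION_FORMATS = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".json": "json",
    ".html": "html",
}
MEDIA_TYPE_FORMATS = {
    "text/markdown": "markdown",
    "application/json": "json",
    "text/html": "html",
}


def choose_format(path, accept_header=""):
    """Return (resource_path, format). An explicit extension wins over Accept."""
    base, ext = posixpath.splitext(path)
    if ext.lower() in EXTENSION_FORMATS:
        return base, EXTENSION_FORMATS[ext.lower()]

    # Parse "type/subtype;q=0.8" entries and keep the supported type
    # with the highest q value; default to HTML when nothing matches.
    best_format, best_q = "html", 0.0
    for entry in accept_header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        media_type = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in MEDIA_TYPE_FORMATS and q > best_q:
            best_format, best_q = MEDIA_TYPE_FORMATS[media_type], q
    return path, best_format


print(choose_format("/docs/overview.md"))
print(choose_format("/docs/overview", "text/html;q=0.5, application/json"))
```

Both calls resolve to the same resource path, which is the point of the convention: one URL, several representations.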

              • vidarh 12 minutes ago

                I've done this off and on for various sites over the years too, and probably should be more consistent about it. A number of sites do or used to do some variation of this, and I wish it was more widespread. E.g. Reddit will serve up a JSON version of a subreddit if you do /r/subreddit.json

      • dspillett 2 hours ago

        The same could be said of robots.txt

        And anything else that might tell them not to access something.

  • sandblast 4 hours ago

    No, in fact I don't. But this post wouldn't be of any help anyway. It feels like it's about nothing, there is no substance, just stating some obvious facts. Without examples that lead to some real recommendations, this whole expertise claimed by the author is of no use.

    • wseqyrku 3 hours ago

      The point of the post was that you need to add robots.txt (or similar) because it's a thing, and also tell us where they are.

      • Geezus_42 an hour ago

        > add a robots.txt

        Which bots will then ignore.

  • jiggunjer 15 minutes ago

    The title says URI, but the post is only about URLs, which are one type of URI.

  • einpoklum 4 hours ago

    How well-known are those URIs though? :-\

    • eschatology 4 hours ago

      I spent 10 minutes searching the article, the RFC, the Wikipedia page, and Google for a .well-known example. Couldn't find one.

      I did read one before while working with github oidc, and I did find it very useful.

      What is it with technical documentation that goes deep into describing what something is, in plenty of words, but refuses to give a single example? This is far from the first case I've run into, either.

    • reddalo 4 hours ago

      There's an interesting list on Wikipedia: https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-kn...

      • eschatology 4 hours ago

        Not one of them links to the actual well-known resource, only PDF specifications. And several I picked randomly lead to dead ends.

        Here's one I could find: https://accounts.google.com/.well-known/openid-configuration

        But how does one even find this?

        • masklinn 3 hours ago

          well-known is for programmatic access, it either namespaces something you’re told to look for (e.g. various types of domain markers) or it lets you discover a feature / endpoint.

          In the latter case you just probe: for instance, if you’re a password manager and you have a password for site A, you hit A/.well-known/change-password, and if it returns something you can surface a change password link to your user.

          The one you found is for OIDC provider discovery (https://openid.net/specs/openid-connect-discovery-1_0.html#P...), so when someone tells you they want to log in via Google, you hit that endpoint, and it lets you set up Google as an OIDC provider without needing to hard-code providers. Even if you just want to support Google as a provider, you hit that and you get the entire configuration rather than having to hunt down the same information in the docs.
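A rough sketch of the discovery step described above, without any network I/O so it stays self-contained. The function names and the sample document (for a made-up issuer `https://idp.example`) are hypothetical; a real provider's document carries many more fields. The issuer comparison reflects the OIDC Discovery rule that the returned `issuer` must match the one you asked about.

```python
import json


def discovery_url(issuer):
    """Build the OIDC discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def parse_provider_config(document_text, expected_issuer):
    """Extract the endpoints a client typically needs from a discovery document."""
    config = json.loads(document_text)
    if config.get("issuer") != expected_issuer:
        raise ValueError("issuer in discovery document does not match")
    return {
        "authorization_endpoint": config["authorization_endpoint"],
        "token_endpoint": config.get("token_endpoint"),
        "jwks_uri": config["jwks_uri"],
    }


# Hypothetical, heavily trimmed discovery document for illustration only.
sample_document = json.dumps({
    "issuer": "https://idp.example",
    "authorization_endpoint": "https://idp.example/authorize",
    "token_endpoint": "https://idp.example/token",
    "jwks_uri": "https://idp.example/jwks",
})

print(discovery_url("https://accounts.google.com"))
print(parse_provider_config(sample_document, "https://idp.example"))
```

In practice the client would fetch `discovery_url(issuer)` over HTTPS and pass the response body to `parse_provider_config`, which is what replaces the hard-coded provider table.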

          • eschatology 3 hours ago

            Thank you, that it is part of OIDC provider discovery spec explains a lot.

            That said, I still find it very bizarre that it's so hard to find a tangible example to see how it is in practice.

            The rfc has none. Another spec including the use of it has none. In the end only completed service provider/implementers show it.

            Before programmatic access happens, it needs to be written by a human. Yet the whole thing feels so human-unfriendly.

            Perhaps I am biased because robots.txt sets a high bar on how easy it is to find and work with?

    • timwis 4 hours ago

      I agree. I was hoping for a few positive examples, but didn't see any. The only one I know of is the OIDC discovery endpoint.

  • jvuygbbkuurx 4 hours ago

    Why are they so specific?

    Why password-reset instead of a more generic link tree?

    Why discord domain verification instead of domain-verifications with a dynamic list of entries?

    Seems like a waste of time. I would just define my own spec outside of well known for my use case.

    • reddalo 4 hours ago

      Your own spec wouldn't be used by anyone else.

      The well-known endpoint for this (/.well-known/change-password) is used by password managers to show a "Change password..." button in their interface, which links to the password change page that the well-known URL redirects to.

      • jvuygbbkuurx 4 hours ago

        If the website implements it. What about email preferences? Removing account links? There are many use cases you might want to redirect a user to, and having to define a separate well-known URI for each seems dumb compared to using a more generic one. I guess the more flexible it is, the harder adoption becomes, as usage within a spec might diverge, or it grows outside of the spec and becomes unofficial. So maybe password-reset is the correct level of specification.

        Anyway, Discord can tell you in their onboarding docs to put the domain verification anywhere. It being well-known does nothing. If there were a root-level domain verification, then you might as well put it under that. But otherwise why go through a process?

        • notpushkin 3 hours ago

          It’s just easier for everybody to implement. Password manager opens https://<some-website>/.well-known/change-password in the user’s browser, it gets redirected to the actual page where password change form is located. You could make the password manager look it up in a link tree and then open a correct page, yes, but...

          > I guess the more flexible it is, the harder adoption becomes

          Yeah. If there is one account management related URL that password managers care about, it’s the change password page. You don’t really need to change email on your account that often, but it is probably a good idea to rotate your password once in a while. So I guess it’s a good idea to make it as easy as possible to adopt – which means just a single URL redirecting to another.
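On the site's side, "a single URL redirecting to another" can be as small as the sketch below. It is framework-free on purpose: `handle_request` and the target path `/settings/password` are hypothetical names, and using a 302 temporary redirect is an assumption made for illustration.

```python
# Hypothetical location of the site's real password change form.
CHANGE_PASSWORD_PAGE = "/settings/password"


def handle_request(path):
    """Return (status, headers) for a request path."""
    if path == "/.well-known/change-password":
        # Send the browser on to wherever the form actually lives.
        return 302, {"Location": CHANGE_PASSWORD_PAGE}
    return 404, {}


print(handle_request("/.well-known/change-password"))
```

The password manager never needs to know the real path, so the site can move its settings page without breaking anything.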

          > If the website implements it.

          That’s a good catch, though. I guess right now password managers would still have to make a “preflight” request just to see if /.well-known/change-password is implemented before showing it to the user. (But that can go away if most websites adopt it.)

          • masklinn 3 hours ago

            > That’s a good catch, though. I guess right now password managers would still have to make a “preflight” request just to see if /.well-known/change-password is implemented before showing it to the user. (But that can go away if most websites adopt it.)

            It’s not really a catch? Like robots.txt it’s just something you probe if you have the capabilities to use it. You can just cache the info afterwards.
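The probe-then-cache idea could be sketched like this. `WellKnownProbe` and the injected `fetch_status` callable (which is assumed to return the HTTP status code without following redirects) are hypothetical, and treating any 2xx or 3xx status as "supported" is a simplifying assumption; a production client would also want cache expiry.

```python
class WellKnownProbe:
    """Probes /.well-known/<name> once per origin and remembers the answer."""

    def __init__(self, fetch_status):
        # fetch_status(url) -> int is supplied by the caller (hypothetical).
        self._fetch_status = fetch_status
        self._cache = {}

    def supports(self, origin, name):
        key = (origin, name)
        if key not in self._cache:
            url = f"{origin}/.well-known/{name}"
            status = self._fetch_status(url)
            # Assumption: a success or redirect status means "implemented".
            self._cache[key] = 200 <= status < 400
        return self._cache[key]


# Stand-in for a real HTTP client, so the example runs offline.
calls = []


def fake_fetch_status(url):
    calls.append(url)
    return 302 if url.endswith("/change-password") else 404


probe = WellKnownProbe(fake_fetch_status)
first = probe.supports("https://a.example", "change-password")
second = probe.supports("https://a.example", "change-password")
print(first, second, len(calls))
```

The second lookup is answered from the cache, so the site is only probed once per well-known name.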

    • arcfour 4 hours ago

      > Why discord domain verification instead of domain-verifications with a dynamic list of entries?

      The TXT record itself is already a dynamic list of entries. It's far simpler and easier to iterate through the list and compare the start of each value with your search string until you find "discord domain verification" directly than it would be to do anything else.

      Example:

          ;; ANSWER SECTION:
          ycombinator.com.        300     IN      TXT     "openai-domain-verification=dv-QbhxxK0G0JK0dnyZ4YTsNAfw"
          ycombinator.com.        300     IN      TXT     "v=spf1 include:_spf.google.com include:mailgun.org a:rsweb1-36.investorflow.com include:_spf.createsend.com include:servers.mcsv.net -all"
          ycombinator.com.        300     IN      TXT     "MS=ms37374900"
          ycombinator.com.        300     IN      TXT     "anthropic-domain-verification-0qe2ww=yK576oHdDgyTcXgkPfj1KXgGt"
          ycombinator.com.        300     IN      TXT     "ZOOM_verify_2ndw8KZxSRa8PT8NmdyXvw"
          ycombinator.com.        300     IN      TXT     "google-site-verification=KsI69Y_jEVkp4eXqSQ9R9gwxjIpZznvuvrus6UolB9Y"
          ycombinator.com.        300     IN      TXT     "ca3-4861b957e83847c188e45d04ec314ee3"
          ycombinator.com.        300     IN      TXT     "apple-domain-verification=WG0sP5Alm7N6h1Te"
          ycombinator.com.        300     IN      TXT     "dropbox-domain-verification=asc63coma4mv"
          ycombinator.com.        300     IN      TXT     "google-site-verification=GJKdQskycEclAGPua3yXB9m_nVhxbrsVps_y-t9SXV0"
          ycombinator.com.        300     IN      TXT     "Wayback verify for support request 741082"
          ycombinator.com.        300     IN      TXT     "google-site-verification=rivq8jKu6AADGtbbEzJhmOpcqq08B7QxIzXxYV8DtyU"
          ycombinator.com.        300     IN      TXT     "rippling-domain-verification=a660f7a4ab77a3de"
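The "iterate and compare the start of each value" approach might look like the following sketch. The function name is hypothetical, the sample values are copied from the listing above, and the actual DNS lookup is left out so the example stays self-contained.

```python
def find_verification(txt_values, prefix):
    """Return the token after '<prefix>=' from the first matching TXT value."""
    marker = prefix + "="
    for value in txt_values:
        if value.startswith(marker):
            return value[len(marker):]
    return None


# A few TXT values taken from the answer section shown above.
records = [
    "MS=ms37374900",
    "apple-domain-verification=WG0sP5Alm7N6h1Te",
    "dropbox-domain-verification=asc63coma4mv",
]

print(find_verification(records, "dropbox-domain-verification"))
```

Each verifier only looks for its own prefix and ignores everything else in the record set.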
      • bombcar 28 minutes ago

        Domain verifications leak information that they shouldn't - it should be "random key.domain.com in TXT randomkey"

      • teddyh 19 minutes ago

        Having all those TXT records at the domain apex like that makes the TXT query reply huge, which affects, for instance, every mail recipient who merely wants to check the SPF record. This is a bad pattern to follow.

      • sandblast 4 hours ago

        "Domain-verifications" is an invitation for everyone else that might need it to use the same standard and convention. "Discord-domain-verification" is not, it's what feels like polluting the global namespace with the company name that might cease to exist in a few years.

        At the very least, it should be "domain-verification-discord", "-google" and so on. Maybe even "-com.discord", "-com.google"? And the first part clearly standardized and registered, instead of one entity using "domain" and another one "site".

        • arcfour 3 hours ago

          Why?

          • zamadatix an hour ago

            Why reinvent the wheel differently 50,000 times instead? I'll usually even prefer a badly designed, but standard, format/encoding over a NIH one from each company - it's just less friction in the end. Heck - include a common format for the value too, then it opens up doors to automating generation with new sites & automatically validating this config for any site following the common format.

    • notpushkin 3 hours ago

      > discord domain verification

      That’s on Discord. They’re not in the registry: https://www.iana.org/assignments/well-known-uris/well-known-...

      > Why password-reset instead of a more generic link tree?

      [edit: answered in more detail in a sibling thread https://news.ycombinator.com/item?id=48596286]

  • philipwhiuk 3 hours ago

    I'm not sure I like `https://domain.com/.well-known/robots.txt` any better frankly

    • russellbeattie 2 hours ago

      Whoever decided it would be a good idea for ".well-known" to be a "hidden" directory is a complete fool. All it does is provide the opportunity for confusion, misconfiguration, skipped backups, missed git check-ins, forgotten updates and more. Literally the only person a folder like that is hidden from is whoever is managing the web server.

      Sure, if everyone knows what they're doing, it's not a problem. But we all know how long that assumption lasts.

      • 9dev 38 minutes ago

        The main point of consideration here probably was how to avoid conflicts with URLs of existing sites, not exactly people who aren't able to serve an endpoint with a dot within its path...

      • zamadatix an hour ago

        I think the blog author is the one who wrote the original RFC. To be fair to him, there once was a time web servers were more commonly thought of as truly being remote directories of files you can view or link to, not just domains the browser hides the rest of, and dotfiles would commonly act like dotfiles in local file listings. Nowadays, the assumption is if you go to the base URL it should only ever serve the default page and if you try to go to a directory it should throw an error. Well, unless you're one of those ancient sites like https://ftp.mozilla.org/

        I'm not saying it's good or bad how things turned out, but the choice of a dotfile for this sure did not pan out well, as the web went in the exact opposite direction from the one where it would have been relevant.