Understanding Round Robin DNS

(blog.hyperknot.com)

117 points | by hyperknot 4 hours ago

52 comments

  • jgrahamc 3 hours ago

    Hmm. I've asked the authoritative DNS team to explain what's happening here. I'll let HN know when I get an authoritative answer. It's been a few years since I looked at the code and a whole bunch of people keep changing it :-)

    My suspicion is that this is to do with the fact that we want to keep affinity between the client IP and a backend server (which OP mentions in their blog). And the question is "do you break that affinity if the backend server goes down?" But I'll reply to my own comment when I know more.

    • delusional 2 hours ago

      > I'll let HN know when I get an authoritative answer

      Please remember to include a TTL so I know how long I can cache that answer.

      • jgrahamc an hour ago

        Thank you for appreciating my lame joke.

  • unilynx an hour ago

    > So what happens when one of the servers is offline? Say I stop the US server:

    > service nginx stop

    But that's not how you should test this. A client will see the connection being refused, and go on to the next IP. But in practice, a server may not respond at all, or accept the connection and then go silent.

    Now you're dependent on client timeouts, and round robin DNS will suddenly look a whole lot less attractive to increase reliability.
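
    To make those failure modes concrete, here's a rough sketch (not any particular client's actual logic) of walking the resolved addresses with a connect timeout; the hostname, port and timeout are illustrative. A refused connection fails within one round trip, while a blackholed server burns the whole timeout before the next IP is tried:

      import socket

      def connect_any(host, port=443, timeout=5.0):
          # Gather every address behind the round robin name.
          addrs = [ai[4][0] for ai in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)]
          for ip in addrs:
              try:
                  # "connection refused" raises almost instantly;
                  # a silently dropped SYN only fails after `timeout` seconds.
                  return socket.create_connection((ip, port), timeout=timeout)
              except OSError:
                  continue
          raise OSError(f"all {len(addrs)} addresses failed for {host}")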

  • teddyh 2 hours ago

    One of the early proposed solutions for this was the SRV DNS record, which was similar to the MX record, but for every service, not just e-mail. With MX and SRV records, you can specify a list of servers with associated priority for clients to try. SRV also had an extra “weight” parameter to facilitate load balancing. However, the SRV designers did not want the political fight of effectively hijacking every standard protocol to force all clients of every protocol to also check SRV records, so they specified that SRV should only be used by a client if the standard for that protocol explicitly specifies the use of SRV records. This technically prohibited HTTP clients from using SRV. Also, when the HTTP/2 (and later) HTTP standards were being written, bogus arguments from Google (and others) prevented the new HTTP protocols from specifying SRV. SRV seems to be effectively dead for new development, only used by some older standards.
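
    As a sketch of what an SRV-aware client is supposed to do (this uses dnspython, which is an assumption here, and an illustrative _imaps._tcp name since HTTP never got SRV): take the lowest-priority group, then make a weight-biased random pick within it.

      import random
      import dns.resolver  # pip install dnspython

      def pick_srv_target(name="_imaps._tcp.example.com"):
          rrset = dns.resolver.resolve(name, "SRV")
          best = min(r.priority for r in rrset)
          candidates = [r for r in rrset if r.priority == best]
          # Weight-biased pick; fall back to uniform if all weights are zero.
          weights = [r.weight or 1 for r in candidates]
          chosen = random.choices(candidates, weights=weights, k=1)[0]
          return str(chosen.target).rstrip("."), chosen.port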

    The new solution for load balancing seems to be the new HTTPS and SVCB DNS records. As I understand it, they are standardized by people wanting to add extra parameters to the DNS in order to jump-start the TLS 1.3 handshake, thereby making fewer roundtrips. (The SVCB record type is the same as HTTPS, but generalized like SRV.) The HTTPS and SVCB DNS record types both have the priority parameter from the SRV and MX record types, but HTTPS/SVCB lack the weight parameter from SRV. The standards have been published, and support seems to have been implemented in some browsers, but not all have enabled it. We will see what browsers will actually do in the near future.
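
    The new records are easy to peek at if you're curious; a small dnspython sketch (assumes dnspython >= 2.1, and a domain that actually publishes HTTPS records, e.g. cloudflare.com at the time of writing):

      import dns.resolver

      # ServiceMode records have priority >= 1 plus parameters such as "alpn"
      # and "ipv4hint"; priority 0 is AliasMode, i.e. CNAME-like delegation
      # that is legal even at the zone apex.
      for rr in dns.resolver.resolve("cloudflare.com", "HTTPS"):
          print(rr.priority, rr.target, rr.to_text())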

    • jsheard 2 hours ago

      > The new solution for load balancing seems to be the new HTTPS and SVCB DNS records. As I understand it, they are standardized by people wanting to add extra parameters to the DNS in order to jump-start the TLS 1.3 handshake, thereby making fewer roundtrips.

      The other big advantage of the HTTPS record is that it allows for proper CNAME-like delegation at the domain apex, rather than requiring CNAME flattening hacks that can cause routing issues on CDNs which use GeoDNS in addition to or instead of anycast. If you've ever seen a platform recommend using a www subdomain instead of an apex domain, that's why, and it's part of why Akamai pushed for HTTPS records to be standardized since they use GeoDNS.

  • realchaika 7 minutes ago

    May be worth mentioning that Zero Downtime Failover is a Pro-or-higher feature, I believe; that's how it was documented before as well, back when the "protect your origin server" docs were split by plan level. So you may see different behavior/retries.

  • tetha 3 hours ago

    > As you can see, all clients correctly detect it and choose an alternative server.

    This is the nasty key point. The reliability is decided client-side.

    For example, systemd-resolved at times enacted maximum technical correctness by always returning the lowest IP address. After all, DNS-RR is not well-defined, so always returning the lowest IPs is not wrong. It got changed after some riots, but as far as I know, Debian 11 is stuck with that behavior, or was for a long time.

    Or, I deal with many applications with shitty or no retry behavior. They go "Oh no, I have one connection refused, gotta cancel everything, shutdown, never try again". So now 20% - 30% of all requests die in a fire.

    It's an acceptable solution if you have nothing else. As the article notes, if you have quality HTTP clients with a few retries configured on them (like browsers), DNS-RR is fine for finding an actual load balancer with health checks and everything, which can provide a 100% success rate.

    But DNS-RR is no load balancer, and load balancers are better.

    • nerdile 2 hours ago

      It's putting reliability in the hands of the client, or whatever random caching DNS resolver they're sitting behind.

      It also puts failover in those same hands. If one of your regions goes down, do you want the traffic to spread evenly to your other regions? Or pile on to the next nearest neighbor? If you care what happens, then you want to retain control of your traffic management and not cede it to others.

    • latchkey 3 hours ago

      > It's an acceptable solution if you have nothing else.

      I'd argue it isn't acceptable at all in this day and age and that there are other solutions one should pick today long before you get to the "nothing else" choice.

      • toast0 2 hours ago

        Anycast is nice, but it's not something you can do yourself well unless you have large scale. You need to have a large number of PoPs, and direct connectivity to many/most transit providers, or you'll get weird routing.

        You also need to find yourself some IP ranges. And learn BGP and find providers where you can use it.

        DNS round robin works as long as you can manage to find two boxes to run your stuff on, and it scales pretty high too. When I was at WhatsApp, we used DNS round robin until we moved into Facebook's hosting where it was infeasible due to servers not having public addresses. Yes, mostly not browsers, but not completely browserless.

        • latchkey 2 hours ago

          Back in 2013, that might have been the best solution for you. But there were still plenty of headlines... https://www.wamda.com/2013/11/whatsapp-goes-down

          We're talking about today.

          The reason why I said Anycast is because the vast majority of people trying to solve the need for having multiple servers in multiple locations will just use CF or any one of the various anycast-based CDN providers available today.

          • toast0 2 minutes ago

            Oh sure, we had many outages. More outages on the one service where we tried using load balancers, because the load balancers would take a one-hour break every 30 days (which is pretty shitty, but that was the load balancer available, unless we wanted to run a software load balancer, which didn't make any sense).

            We didn't have many outages due to DNS, because we had fallback ips to contact chat in our clients. Usage was down in the 24 hours after our domain was briefly hijacked (thanks Network Solutions), and I think we lost some usage when our DNS provider was DDoSed by 'angry gamers'. But when FB broke most of their load balancers, that was a much bigger outage. BGP based outages broke everything, DNS and load balancers, so no wins there.

  • metadat 2 hours ago

    > This allows you to share the load between multiple servers, as well as to automatically detect which servers are offline and choose the online ones.

    To [hesitantly] offer a pedantic clarification regarding "DNS automatic offline detection":

    Out of the box, RR-DNS is only good for load balancing.

    Nothing automatic happens on the availability state detection front unless you build smarts into the client. TFA introduction does sort of mention this, but it took me several re-reads of the intro to get their meaning (which to be fair could be a PEBKAC). Then I read the rest of TFA, which is all about the smarts.

    If the 1/N server record selected by your browser ends up being unavailable, no automatic recovery / retry occurs at the protocol level.

    p.s. "Related fun": Don't forget about Java's DNS TTL [1] and `.equals()` [2] behaviors.

    [1] https://stackoverflow.com/questions/1256556/how-to-make-java...

    [2] https://news.ycombinator.com/item?id=21765788 (5y ago, 168 comments)

    • encoderer 2 hours ago

      We accomplish this on Route53 by having it pull servers out of the DNS response if they are not healthy, and serving all responses with a very low TTL. A few clients out there ignore TTL but it’s pretty rare.
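
      For anyone wanting to replicate that setup, here's a rough boto3 sketch of the pattern (zone ID, names and IPs are placeholders, not our actual configuration): one health check per origin, multivalue answer records with a short TTL, and Route 53 withholds any IP whose check is failing.

        import boto3

        r53 = boto3.client("route53")

        # Health check that probes the origin directly.
        hc_id = r53.create_health_check(
            CallerReference="us-origin-1",
            HealthCheckConfig={"IPAddress": "203.0.113.10", "Port": 443,
                               "Type": "HTTPS", "ResourcePath": "/healthz",
                               "RequestInterval": 10, "FailureThreshold": 3},
        )["HealthCheck"]["Id"]

        # Multivalue answer record tied to the health check, with a low TTL.
        r53.change_resource_record_sets(
            HostedZoneId="Z0000000000000",
            ChangeBatch={"Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "www.example.com.", "Type": "A", "TTL": 30,
                    "SetIdentifier": "us-origin-1",
                    "MultiValueAnswer": True,
                    "HealthCheckId": hc_id,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            }]},
        )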

      • d_k_f an hour ago

        Honest question to somebody who seems to have a bit of knowledge about this in the real world: several (German, if relevant) providers default to a TTL of ~4 hours. Lovely if everything is more or less finally set up, but usually our first step is to decrease pretty much everything down to 60 seconds so we can change things around in emergencies.

        On average, does this really matter/make sense?

        • stackskipton 3 minutes ago

          Lower TTLs are cheap insurance so you can move hostnames around.

          However, you should understand that not ALL clients will respect those TTLs. Some resolvers enforce a minimum TTL threshold (if TTL < threshold, then TTL = threshold), which is common with some ISPs, and there are also cases where browsers and operating systems will ignore TTLs or fudge them.

      • ChocolateGod 2 hours ago

        I once achieved something similar with PowerDNS, where you can use Lua rules to do health checks on a pool of servers and only return healthy servers as part of the DNS record, but I found odd occurrences of clients not respecting the TTL on DNS records and caching for too long.
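
        For reference, this is roughly what the PowerDNS Lua-records feature looks like (a hedged sketch: it needs enable-lua-records=yes in pdns.conf, and the IPs are placeholders); ifportup() only hands back the addresses whose port actually answers:

          www.example.com.  30  IN  LUA  A  "ifportup(443, {'192.0.2.10', '198.51.100.10'})"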

        • tetha an hour ago

          You usually do this with servers that should be rock-solid and stateless. HAProxy, Traefik, F5. That way, you can pull the DNS record for maintenance 24 - 48 hours in advance. If something overrides DNS TTLs that much, there is probably some reason.

  • edm0nd 19 minutes ago

    The dark remix version of this is fast flux hosting and what a lot of the bulletproof hosting providers use.

    https://unit42.paloaltonetworks.com/fast-flux-101/

  • latchkey 3 hours ago

      > "It's an amazingly simple and elegant solution that avoids using Load Balancers."
    
    When a server is down, you have a globally distributed / cached IP address that you can't prevent people from hitting.

    https://www.cloudflare.com/learning/dns/glossary/round-robin...

    • toast0 2 hours ago

      Skipping an unnecessary intermediary is worth considering.

      Load balancing isn't without cost, and load balancers subtly (or unsubtly) messing up connections is an issue. I've also used providers where their load balancers had worse availability than our hosts.

      If you control the clients, it's reasonable to call the platform DNS API to get a list of IPs and shuffle and iterate through them in an appropriate way. Even better if you have a few stably allocated IPs you can distribute in client binaries for when DNS is broken; but DNS is often not broken, and it's nice to use for operational changes without having to push new configuration/binaries every time you update the cluster.
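
      A rough sketch of that client-side pattern (names and IPs are illustrative, not what we actually shipped): resolve, shuffle, iterate, and keep a couple of baked-in fallback IPs for when DNS itself is broken.

        import random, socket

        FALLBACK_IPS = ["203.0.113.5", "198.51.100.7"]  # shipped inside the client binary

        def open_connection(host="chat.example.com", port=443, timeout=5.0):
            try:
                ips = [ai[4][0] for ai in
                       socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)]
            except OSError:
                ips = []  # DNS itself is broken; rely on the baked-in addresses
            random.shuffle(ips)
            for ip in ips + FALLBACK_IPS:
                try:
                    return socket.create_connection((ip, port), timeout=timeout)
                except OSError:
                    continue
            raise OSError("no reachable server")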

      If your clients are browsers, default behavior is ok; they usually use IPs in order, which can be problematic [1], but otherwise, they have good retry behavior: on connection refused they try another IP right away, in case of timeout, they try at least a few different IPs. It's not ideal, and I'd use a load balancer for browsers, at least to serve the initial page load if feasible, and maybe DNS RR and semi-smart client logic in JS for websockets/etc; but DNS RR is workable for a whole site too.

      If your clients are not browsers and not controlled by you, best of luck?

      I will 100% admit that sometimes you have to assume someone built their DNS caching resolver to interpret the TTL field as a number of days, rather than number of seconds. And that clients behind those resolvers will have trouble when you update DNS. But if your load balancer is behind a DNS name, when it needs to change addresses, you'll have to deal with that then anyway, and without the experience.

      [1] one of the RFCs suggests that OS APIs should sort responses by prefix match, which might make sense if IP prefixes were hierarchical, as a proxy for reaching the least-network-distance server. But in the real world, numerically adjacent /24s are often not network adjacent; and if your servers have widely disparate addresses, you may see traffic from some client IPs gravitate towards numerically similar server IPs.

      • ectospheno 2 hours ago

        > I will 100% admit that sometimes you have to assume someone built their DNS caching resolver to interpret the TTL field as a number of days, rather than number of seconds.

        I’ve run a min ttl of 3600 on my home network for over a year. No one has complained yet.

    • wongarsu 3 hours ago

      All clients tested in the article behaved correctly and chose one of the reachable servers instead.

      Of course somebody will inevitably misconfigure their local DNS or use a bad client. Either you accept an outage for people with broken setups or you reassign the IP to a different server in the same DC.

      • latchkey 3 hours ago

        If you know all of your clients, then you don't even need DNS. But, you don't know all of your clients. Nor do you always know your upstream DNS provider.

        Design for failure. Don't fabricate failure.

        • zamadatix 3 hours ago

          Why would knowing your clients change whether or not you want to use DNS? Even when you control all of the clients you'll almost always want to keep using DNS.

          A large number of services successfully achieve their failure tolerances via these kinds of DNS methods. That doesn't mean all services would or that it's always the best answer, it just means it's a path you can consider when designing for the needs of a system.

          • latchkey 3 hours ago

            I'm replying to the comment above. If the article picks a few clients and it happens to work, that is effectively "knowing your clients". At which point, it means you have control over the client/server relationship and if we are trying to simplify by not using load balancers, we might as well simplify things even further, and not use DNS.

            It is an absurd train of thought that nobody in their right mind would consider... just like using DNS-RR as a replacement for load balancing.

            • zamadatix 3 hours ago

              I must be having trouble following your train of thought here - many large web services like Cloudflare and Akamai serve large volumes of content through round robin DNS balancing, what's absurd about their success? They certainly don't know every client that'll ever connect to a CDN on the internet... it just happens to work almost every time anyways. That very few clients might not instantly flip over isn't always a design failure worth deploying full load balancers. I'm also still not following why the decisions for whether or not you need a load balancer are supposed to be in any way equivalent to the decisions of when using DNS would make sense or not?

              • latchkey 2 hours ago

                We are not talking about "large web services", we are talking about small end users spinning up their own DNS-RR "solution".

                LWS get away with it because of Anycast...

                https://www.cloudflare.com/en-gb/learning/cdn/glossary/anyca...

                • zamadatix 2 hours ago

                  Anycast is certainly a nice layer to add but it's not a requirement for DNS round robin to work reliably. It does save some of the concern around relying on selection of an efficiently close choice by the client though and can be a good option for failover.

                  More directly - is there some set of common web client I've been missing for many years that just doesn't follow DNS TTLs or try alternate records? I think the article gets it right with the wish list at the end containing an Amazon Route 53-like "pull dead entries automatically" note but maybe I'm missing something else? I've used this approach (pull the dead server entries from DNS, wait for TTL) and never caught any unexpected failures during outages but maybe I haven't been looking in the right places?

                  If you mean it's possible to design something with round-robin DNS in a way that more clients than you expect will fail then absolutely, you can do things the wrong way with most any solution. Sometimes you can be fine with a subset of clients not always working during an outage or you can be fine with a solution which provides slower failover than an active load balancer. What I'm trying to find is why round-robin DNS must always be the wrong answer in all design cases.

                  • buzer 2 hours ago

                    > More directly - is there some set of common web client I've been missing for many years that just doesn't follow DNS TTLs or try alternate records?

                    I don't know if there is such a list but older versions of Java are pretty famous for caching the DNS responses indefinitely. I don't hear much about it these days so I assume it was probably fixed around Java 8.

                  • latchkey 2 hours ago

                    > is there some set of common web client I've been missing for many years that just doesn't follow DNS TTLs or try alternate records?

                    Yes. There are tons of people with outdated and/or buggy software still using the internet today.

                    • zamadatix 2 hours ago

                      What % did you find to be "tons" with these specific bugs? I'm assuming it was quite a significant number (at least 10%?) that broke badly quite often, given the certainty that it's the wrong decision for all solutions. Any idea how to help me identify which clients I've been missing or might run into? DNS TTLs are also pretty necessary for most web systems to work reliably, regardless of load balancer or not, so how do you work around having large numbers of clients that don't obey them (beyond hoping to permanently occupy the same set of IPs for the life of the service, of course)?

                      • latchkey 6 minutes ago

                        The percentage is kind of irrelevant. The issue is that if you're running something like an e-commerce site and any percentage of people can't hit your site because of a TTL issue with one of your down servers, you're likely to never know how much revenue you've lost. Site is down, go to another store to buy what you need. You also have no control over fixing the issue, other than getting the server back up and running. This has downstream effects: how do you cycle the server for upgrades or maintenance?

                        I don't understand why anyone would argue for this as a solution when there are near zero effort better ways of doing this that don't have any of the negative downsides.

    • arrty88 3 hours ago

      The standard today is to use a relatively low TTL and to health check the members of the pool from the DNS server.

      • latchkey 3 hours ago

        That's like saying there are traffic rules in Saigon.

        Exact implementation of TTL is a suggestion.

  • stackskipton 9 minutes ago

    As SRE, I get a chuckle out of this article and some of the responses. Devs mess this up constantly.

    DNS has one job: hostname -> IP. Nothing further. You can mess with it on the server side, like checking whether the HTTP server is up before handing out the IP, but once the IP is given, the client takes over and DNS can do nothing further, so behavior will be wildly inconsistent IME.

    Assuming DNS RR means the standard setup where a hostname returns multiple IPs, it's only useful for load balancing across similar-latency datacenters. If you want fancy stuff like geographic load balancing or health checks, you need a fancy DNS server, but at the end of the day you should only return a single IP so the client will target the endpoint you want it to connect to.

  • hypeatei 2 hours ago

    The browser behavior is really nice, good to know that it falls back quickly and smoothly. Round robin DNS has always been referred to as a "poor man's load balancer", which it seems to be living up to.

    > Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.

    This took two tries for me, which raises the question of how curl is keeping track of RTT (round-trip times). Interesting.

  • rebelde an hour ago

    I have used round robin for years.

    Wish I could add instructions like:

    - random choice #round robin, like now

    - first response # usually connects to closest server

    - weights (1.0.0.1:40%; 2.0.0.2:60%)

    - failover: (quick | never)

    - etc: naming countries, continents

  • zamalek 3 hours ago

    Take a look at SRV records instead - they are very intentionally designed for this, and behave vaguely similarly to MX. Creating a DNS server (or a CoreDNS/whatever module) that dynamically updates weights based on backend metrics has been a pending pet project of mine for some time now.

    • jeroenhd 9 minutes ago

      Until the HTTP spec gets updated to include SRV records, using SRV records for HTTP(S) is technically spec-incompliant and practically useless.

      However, as is common with web tech, the old SRV record has been reinvented as the SVCB record with a smidge of DANE for good measure.

  • urbandw311er 3 hours ago

    What a great article! It’s often easy to forget just how flexible and self-correcting the “official” network protocols are. Thanks to the author for putting in the legwork.

  • cybice 3 hours ago

    Cloudflare results with a Worker as a reverse proxy can be much better.

    • easylion 2 hours ago

      But won't it add an additional hop, and hence additional latency, to every single request?

  • V__ 2 hours ago

    This seems like a nice solution for zero-downtime updates. Clone the server, add the specified IP, deny access to the main one, upgrade, and turn the cloned server off.

  • specto 3 hours ago

    Chrome and Firefox use the OS DNS resolver by default, which in most OSes has caching as well.

  • easylion 3 hours ago

    Did you try running a simple bash curl loop instead of manually printing? The data and statistics would become much clearer. I ask because I want to understand how to ensure my clients get the nearest edge data center.

  • meindnoch 3 hours ago

    So half of your content is served from another server? Sounds like a recipe for inconsistent states.

    • ChocolateGod 2 hours ago

      You can easily use something like an object store or shared database to keep data consistent.

  • easylion 3 hours ago