Major AWS Outage Happening

(old.reddit.com)

696 points | by vvoyer 3 hours ago ago

304 comments

  • littlecranky67 an hour ago

    Just a couple of days ago, in this HN thread [0], quite a few users claimed Hetzner is not an option because its uptime isn't as good as AWS's, hence the higher AWS pricing is worth the investment. Oh, the irony.

    [0]: https://news.ycombinator.com/item?id=45614922

    • jpalomaki an hour ago

      When AWS is down, everybody knows it. People don’t really question your hosting choice. It’s the IBM of cloud era.

      • patshead 9 minutes ago

        On the other side of that coin, I am excited to be up and running while everyone else is down!

      • sisve an hour ago

        That is 100% true. You can't be fired for picking AWS... But I doubt it's the best choice for most people. Sad but true

        • zejn an hour ago

          You can't be fired, but you burn through your runway quicker. No matter which option you choose, there is some exothermic oxidative process involved.

          • rafaelmn 32 minutes ago

            AWS is smart enough to throw you a few mill credits to get you started.

            • dijit 18 minutes ago

              MILL?!

              I only got €100.000 bounded to a year, then a 20% discount for spend in the next year.

              (I say "only" because that certainly would be a sweeter pill, €100.000 in "free" credits is enough to make you get hooked, because you can really feel the free-ness in the moment).

        • dijit 31 minutes ago

          Schrödinger's user:

          Simultaneously too confused to make their own UX choices, yet smart enough to understand the backend of your infrastructure well enough to know why it doesn't work, and to excuse you for it.

          • 1dom 25 minutes ago

            The morning national TV news (BBC) was interrupted with this as breaking news, about how many services (specifically Snapchat, for some reason) are down because of problems with "Amazon's Web Services, reported on DownDetector".

            I liked your point though!

          • Macha 27 minutes ago

            Well, at that level of user they just know "the internet is acting up this morning"

            • dijit 22 minutes ago

              I thought we didn't like it when things were "too big to fail" (like the banks being bailed out because, if we didn't, the entire fabric of our economy would collapse; which emboldens them to take more risks and do it again).

        • brazukadev an hour ago

          Usually, 2 founders creating a startup can't fire each other anyway so a bad decision can still be very bad for lots of people in this forum

      • abujazar 20 minutes ago

        That depends on the service. Far from everyone is on their PC or smartphone all day, and even fewer care about these kinds of news.

      • jamesbelchamber 41 minutes ago

        To back up this point, currently BBC News have it as their most significant story, with "live" reporting: https://www.bbc.co.uk/news/live/c5y8k7k6v1rt

        This is alongside "live" reporting on the Israel/Gaza conflict as well as news about Epstein and the Louvre heist.

        This is mainstream news.

        • addandsubtract 28 minutes ago

          I like how their headline starts with Snapchat and Roblox being affected.

          • Maxion 26 minutes ago

            The journalist found out about it from their tween.

      • petesergeant an hour ago

        100%. When AWS was down, we'd say "AWS is down!", and our customers would get it. Saying "Hetzner is down!" raises all sorts of questions your customers aren't interested in.

        • sph 39 minutes ago

          I've run a production application off Hetzner for a client for almost a decade and I don't think I have ever had to tell them "Hetzner is down", apart from planned maintenance windows.

      • throw-10-13 7 minutes ago

        most people dont even know aws exists

      • stefan_ an hour ago

        And yet they still all activate their on call people (wait why do we have them if we are on the cloud?) to do .. nothing at all.

    • jwr 14 minutes ago

      As a data point, I've been running stuff at Hetzner for 10 years now, in two datacenters (physical servers). There were brief network outages when they replaced networking equipment, and exactly ONE outage for hardware replacement, scheduled weeks in advance, with a 4-hour window and around 1-2h duration.

      It's just a single data point, but for me that's a pretty good record.

      It's not because Hetzner is miraculously better at infrastructure, it's because physical servers are way simpler than the extremely complex software and networking systems that AWS provides.

    • neverminder an hour ago

      You can argue about Hetzner's uptime, but you can't argue about Hetzner's pricing, which is hands down the best there is. I'd rather go with Hetzner and cobble together some failover than pay AWS extortion.

      • Lio an hour ago

        For the price of AWS you could run Hetzner, plus a second provider for resiliency, and still make a large saving.

        Your margin is my opportunity indeed.

        • k4rli 32 minutes ago

          I switched to netcup for even cheaper private vps for personal noncritical hosting. I'd heard of netcup being less reliable but so far 4 months+ uptime and no problems. Europe region.

          Hetzner has the better web interface and supposedly better uptime, but I've had no problems with either. Web interface not necessary at all either when using only ssh and paying directly.

          • wasmitnetzen 10 minutes ago

            I've been running my self-hosting stuff on Netcup for 5+ years and I don't remember any outages. There probably were some, but they were not significant enough for me to remember.

          • Aldipower 21 minutes ago

            I am on Hetzner with a primary + backup server and on Netcup (Vienna) with a secondary. For DNS I am using ClouDNS.

            I think I am more distributed than most of the AWS folks and it is still way cheaper.

        • benterix an hour ago

          Exactly. Hetzner is the equivalent of the original Raspberry Pi. It might not have all the fancy features, but it delivers, and at a price that essentially unblocks you and allows you to do things you wouldn't be able to do otherwise.

          • esskay 35 minutes ago

            They've been working pretty hard on those extra features. Their load balancing across locations is pretty decent for example.

      • motorest an hour ago

        > I'd rather go with Hetzner and cobble up together some failover than pay AWS extortion.

        Comments like this are so exaggerated that they risk moving the goodwill needle back to where it was before. Hetzner offers no service that is similar to DynamoDB, IAM or Lambda. If you are going to praise Hetzner as a valid alternative during a DynamoDB outage caused by DNS configuration, you would need to a) argue that Hetzner is a better option regarding DNS outages, and b) argue that Hetzner is a preferable option for those who use serverless offerings.

        I say this as a long-time Hetzner user. Hetzner is indeed cheaper, but don't pretend that Hetzner lets you click your way into a highly-available NoSQL data store. You need a non-trivial amount of your own work to develop, deploy, and maintain such a service.

        • 1dom 16 minutes ago

          > but don't pretend that Hetzner lets you click your way into a highly-available NoSQL data store.

          The idea you can click your way to a highly available, production configured anything in AWS - especially involving Dynamo, IAM and Lambda - is something I've only heard from people who've done AWS quickstarts but never run anything at scale in AWS.

          Of course nobody else offers AWS products, but people use AWS for their solutions to compute problems and it can be easy to forget virtually all other providers offer solutions to all the same problems.

        • esskay 33 minutes ago

          Are you Netflix? Because if not, there's a 99% probability you don't need any of those AWS services and just have a severe case of shiny object syndrome in your organisation.

          Plenty of heavy-traffic, high-redundancy applications exist without the need for AWS's (or any other cloud provider's) overpriced "bespoke" systems.

          • rafaelmn 25 minutes ago

            To be honest, I don't trust myself to run a HA PostgreSQL setup with correct backups without spending an exorbitant effort investigating everything (weeks/months) - do you? I'm not even sure what effort that would take. I can't remember the last time I worked with an unmanaged DB in prod where I did not have a dedicated DBA/sysadmin. And I've been doing this for 15 years now. AFAIK Hetzner offers no managed database solution. I know they offer some load balancer, so there's that at least.

            At some point in the scaling journey bare metal might be the right choice, but I get the feeling a lot of people here trivialize it.

          • sofixa 20 minutes ago

            > Plenty of heavy traffic, high redundancy applications exist without the need for AWS (or any other cloud providers) overpriced "bespoke" systems.

            And almost all of them need a database, a load balancer, maybe some sort of cache. AWS has got you covered.

            Maybe some of them need some async periodic reporting tasks. Or to store massive files or datasets and do analysis on them. Or transcode video. Or transform images. Or run another type of database for a third party piece of software. Or run a queue for something. Or capture logs or metrics.

            And on and on and on. AWS has got you covered.

            This is Excel all over again. "Excel is too complex and has too many features, nobody needs more than 20% of Excel. It's just that everyone needs a different 20%".

        • ViewTrick1002 28 minutes ago

          If you need the absolutely stupid scale DynamoDB enables, what is the difference compared to running, for example, FoundationDB on your own on Hetzner?

          You will in both cases need specialized people.

        • mschuster91 38 minutes ago

          > Hetzner offers no service that is similar to DynamoDB, IAM or Lambda.

          The key thing you should ask yourself: do you need DynamoDB or Lambda? Like "need need" or "my resume needs Lambda".

          • darkwater 29 minutes ago

            Well, Lambda scales down to 0 so I don't have to pay for the expensive EC2 instan... oh, wait!

      • sreekanth850 43 minutes ago

        TBH, in my last 3 years with Hetzner, I never saw downtime on my servers other than when I was doing some routine maintenance for OS updates. Location Falkenstein.

        • whizzter 32 minutes ago

          You really need your backup and failover procedures though: a friend bought a used server and the disk died fairly quickly, leaving him sour.

          • esseph 2 minutes ago

            THE disk?

            It's a server! What in the world is your friend doing running a single disk???

            At a bare minimum they should have been running a mirror.

        • ratg13 29 minutes ago

          And I have seen them delete my entire environment including my backups due to them not following their own procedures.

          Sure, if you configure offsite backups you can guard against this stuff, but with anything in life, you get what you pay for.

    • krsdcbl 23 minutes ago

      We've been running our services on Hetzner for 10 years, never experienced any significant outages.

      That might be datacenter dependent of course, since our root servers and cloud services are all hosted in Europe, but I really never understood why Hetzner is said to be less reliable.

    • bert-ye an hour ago

      I work at a small / medium company with about ~20 dedicated servers and ~30 cloud servers at Hetzner. Outages have happened, but we were lucky that the few times it did happen, it was never a problem / actual downtime.

      One thing to note is that there were some scheduled maintenances where we needed to react.

    • bigblind an hour ago

      I don't have an opinion either way, but for now, this is just anecdotal evidence.

      • brazukadev an hour ago

        Looks fine for pointing out an irony.

        • bigblind an hour ago

          In some ways yes. But in some ways this is like saying it's more likely to rain on your wedding day.

    • koliber 26 minutes ago

      My recommendation is to use AWS, but not the US-EAST-1 region. That way you get the benefits of AWS without the instability.

      • dijit 15 minutes ago

        AWS has internal dependencies on US-EAST-1.

        Admittedly they're getting fewer and fewer, but they exist.

        The same is also true in GCP, so as much as I prefer GCP from a technical standpoint: the truth is, if you can't see it, it doesn't mean it goes away.

      • maccard 14 minutes ago

        We have nothing deployed in us east 1, yet all of our CI was failing due to IAM errors this morning.

    • yard2010 38 minutes ago

      I'm not affiliated and won't be compensated in any way for saying this: Hetzner are the best business partners ever. Their service is rock solid, their pricing is fair, their support is kind and helpful.

      Going forward I expect American companies to follow this European vibe, it's like the opposite of enshitification.

    • YetAnotherNick 18 minutes ago

      Stop making things up. As someone who commented on that thread in favour of AWS, there is almost no mention of better uptime in any comment I could find.

      I could find one or two downvoted or heavily criticized comments, but I can find more people mentioning the opposite.

    • DataDaemon an hour ago

      Finally IT managers will start understanding that the cloud is no different from Hetzner.

      • benterix an hour ago

        Well, we have a naming issue (Hetzner also has Hetzner Cloud; it looks like people still equate "cloud" with the three biggest public cloud providers).

        In any case, in order for this to happen, someone would have to collect reliable data (not all big cloud providers like to publish precise data; usually they downplay the outages and use weasel words like "some customers... in some regions... might have experienced" just to avoid admitting they had an outage) and present stats comparing the availability of Hetzner Cloud vs the big three.

      • aembleton an hour ago

        When things go wrong, you can point at a news article and say it's not just us that have been affected.

        • zimpenfish an hour ago

          I tried that but Slack is broken and the message hasn't got through yet...

    • grebc an hour ago

      I got a downvote already for pointing this out :’)

      • brazukadev an hour ago

        Unfortunately, HN is full of company people; you can't say anything against Google, Meta, Amazon, Microsoft without being downvoted to death.

        • GoblinSlayer 6 minutes ago

          Isn't it just ads?

        • sph 36 minutes ago

          AWS and Cloudflare are HN darlings. Go so far as to even suggest a random personal blog doesn't need Cloudflare and get downvoted with inane comments like "but what about DDoS protection?!"

          The truth is no one under the age of 35 is able to configure a webserver any more, apparently. Especially now that static site generators are in vogue and you don't even need to worry about php-fpm.

        • gloosx 36 minutes ago

          Can't fully agree. People genuinely detest Microsoft on HN and all over the globe. My Microsoft-related rants are always upvoted to the skies.

  • stepri an hour ago

    “Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. We are working on multiple parallel paths to accelerate recovery.”

    It’s always DNS.

    • Nextgrid an hour ago

      I wonder how much of this is "DNS resolution" vs "underlying config/datastore of the DNS server is broken". I'd expect the latter.

      • wdfx an hour ago

        ... wonders if the dns config store is in fact dynamodb ...

      • huflungdung an hour ago

        I don’t think it is DNS. The DNS A records were 2h before they announced it was DNS but _after_ reporting it was a DNS issue.

    • us0r 15 minutes ago

      Or expired domains which I suppose is related?

    • commandersaki an hour ago

      Someone probably failed to lint the zone file.

      • DrewADesign an hour ago

        DNS strikes me as the kind of solution someone designed thinking “eh, this is good enough for now. We can work out some of the clunkiness when more organizations start using the Internet.” But it just ended up being pretty much the best approach indefinitely.

    • bayindirh an hour ago

      Even when it's not DNS, it's DNS.

    • koliber 25 minutes ago

      It's always US-EAST-1 :)

    • shamil0xff an hour ago

      Might just be BGP dressed as DNS

  • amadeoeoeo 2 hours ago

    Oh no... maybe LaLiga found out pirates are hosting on AWS?

    • agos 26 minutes ago

      this is how I discover that it's not just Serie A doing these shenanigans. I'm not really surprised

      • sofixa 15 minutes ago

        All the big leagues take "piracy" very seriously and constantly try to clamp down on it.

        TV rights are one of their main revenue sources, and that money is expected to always go up, so they see "piracy" as a fundamental threat. IMO, it's a fundamental misunderstanding on their side, because people "pirating" usually don't have a choice - either there is no option for them to pay for the content (e.g. UK's 3pm blackout), or it's too expensive and/or spread out. People in the UK have to pay 3-4 different subscriptions to access all local games.

        The best solution, by far, is what France's Ligue 1 just did (out of necessity though, nobody was paying them what they wanted for the rights after the previous debacles). A Ligue 1+ streaming service, owned and operated by them, which you can access in a variety of different ways (regular old paid TV channel, on Amazon Prime, on DAZN, via beIN Sports), whichever suits you best. Same acceptable price for all games.

  • fairity 2 hours ago

    As this incident unfolds, what’s the best way to estimate how many additional hours it’s likely to last? My intuition is that the expected remaining duration increases the longer the outage persists, but that would ultimately depend on the historical distribution of similar incidents. Is that kind of data available anywhere?

    • greybeard69 an hour ago

      To my understanding the main problem is DynamoDB being down, and DynamoDB is what a lot of AWS services use for their eventing systems behind the scenes. So there's probably like 500 billion unprocessed events that'll need to get processed even when they get everything back online. It's gonna be a long one.

      • jewba an hour ago

        500 billion events. Always blows my mind how many people use AWS.

        • Implicated an hour ago

          I know nothing. But I'd imagine the number of 'events' generated during this period of downtime will eclipse that number every minute.

          • nicce 4 minutes ago

            I wonder how many companies have properly designed their clients, so that the delay before re-attempting is randomised and the back-off between attempts grows exponentially.
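
            For illustration, a minimal sketch of that pattern - exponential backoff with full jitter - in Python (names and limits are made up, not from any particular SDK):

              import random
              import time

              def call_with_backoff(request, max_attempts=8, base=0.5, cap=60.0):
                  """Retry a flaky call with exponential backoff and full jitter."""
                  for attempt in range(max_attempts):
                      try:
                          return request()
                      except Exception:
                          if attempt == max_attempts - 1:
                              raise
                          # Cap the exponential delay, then sleep a random amount in
                          # [0, delay] so clients don't all retry in lockstep after an outage.
                          delay = random.uniform(0, min(cap, base * 2 ** attempt))
                          time.sleep(delay)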

          • zimpenfish 44 minutes ago

            "I felt a great disturbance in us-east-1, as if millions of outage events suddenly cried out in terror and were suddenly silenced"

            (Be interesting to see how many events currently going to DynamoDB are actually outage information.)

    • froobius an hour ago

      Yes, with no prior knowledge the mathematically correct estimate is:

      time left = time so far

      But as you note prior knowledge will enable a better guess.

      • matsemann an hour ago

        Yeah, the Copernican Principle.

        > I visited the Berlin Wall. People at the time wondered how long the Wall might last. Was it a temporary aberration, or a permanent fixture of modern Europe? Standing at the Wall in 1969, I made the following argument, using the Copernican principle. I said, Well, there’s nothing special about the timing of my visit. I’m just travelling—you know, Europe on five dollars a day—and I’m observing the Wall because it happens to be here. My visit is random in time. So if I divide the Wall’s total history, from the beginning to the end, into four quarters, and I’m located randomly somewhere in there, there’s a fifty-percent chance that I’m in the middle two quarters—that means, not in the first quarter and not in the fourth quarter.

        > Let’s suppose that I’m at the beginning of that middle fifty percent. In that case, one-quarter of the Wall’s ultimate history has passed, and there are three-quarters left in the future. In that case, the future’s three times as long as the past. On the other hand, if I’m at the other end, then three-quarters have happened already, and there’s one-quarter left in the future. In that case, the future is one-third as long as the past.

        https://www.newyorker.com/magazine/1999/07/12/how-to-predict...
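
        To spell out the arithmetic behind the quote (my sketch, not the article's wording): let $t$ be the time elapsed so far and $T$ the total lifetime, so the observed fraction is $f = t/T$. If the moment of observation is random, $f$ is roughly uniform on $[0,1]$, hence $P(1/4 \le f \le 3/4) = 1/2$. At $f = 1/4$ the remaining time is $T - t = 3t$; at $f = 3/4$ it is $t/3$. So with 50% confidence the remaining duration lies between one third of and three times the elapsed duration.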

        • hshdhdhehd 2 minutes ago

          > So if I divide the Wall’s total history, from the beginning to the end, into four quarters, and I’m located randomly somewhere in there, there’s a fifty-percent chance that I’m in the middle two quarters

          How come?

      • tsimionescu 15 minutes ago

        Note that this is equivalent to saying "there's no way to know". This guess doesn't give any insight, it's just the function that happens to minimize the total expected error for an unknowable duration.

  • emrodre an hour ago

    Their status page (https://health.aws.amazon.com/health/status) says the only disrupted service is DynamoDB, but it's impacting 37 other services. It is amazing to see how big a blast radius a single service can have.

    • jamesbelchamber an hour ago

      It's not surprising that it's impacting other services in the region because DynamoDB is one of those things that lots of other services build on top of. It is a little bit surprising that the blast radius seems to extend beyond us-east-1, mind.

      In the coming hours/days we'll find out if AWS still have significant single points of failure in that region, or if _so many companies_ are just not bothering to build in redundancy to mitigate regional outages.

      I'm looking forward to the RCA!

      • XorNot an hour ago

        I'm real curious how much of AWS GovCloud has continued through this actually. But even if it's fine, from a strategic perspective how much damage did we just discover you could do with a targeted disruption at the right time?

    • thmpp an hour ago

      AWS engineers are trained to use their internal services for each new system. They seem to like using DynamoDB. Dependencies like this should be made transparent.

      • Nextgrid an hour ago

        Not sure why this is downvoted - this is absolutely correct.

        A lot of AWS services under the hood depend on others, and especially us-east-1 is often used for things that require strong consistency like AWS console logins/etc (where you absolutely don't want a changed password or revoked session to remain valid in other regions because of eventual consistency).

      • bsjaux628 an hour ago

        Not "like using", they are mandated from the top to use DynamoDB for any storage. At my org in the retail page, you needed director approval if you wanted to use a relational DB for a production service.

    • nevada_scout an hour ago

      It's now listing 58 impacted services, so the blast radius is growing it seems

    • littlecranky67 an hour ago

      The same page now says 58 services - just 23 minutes after your post. Seems this is becoming a larger issue.

      • kalleboo 42 minutes ago

        When I first visited the page it said like 23 services, now it says 65

  • JCM9 13 minutes ago

    Have a meeting today with our AWS account team about how we're no longer going to be "All in on AWS" as we diversify workloads away. It was mostly about the pace of innovation on core services slowing and AWS being too far behind on AI services, so we're buying those from elsewhere.

    The AWS team keeps touting the rock solid reliability of AWS as a reason why we shouldn’t diversify our cloud. Should be a fun meeting!

  • SeanAnderson 2 hours ago

    Looks like it affected Vercel, too. https://www.vercel-status.com/

    My website is down :(

    (EDIT: website is back up, hooray)

    • l5870uoo9y an hour ago

      Static content resolves correctly but data fetching is still not functional.

    • maximefourny an hour ago

      Have you done anything for it to be back up? Looks like mines are still down.

      • LostMyLogin an hour ago

        Looks as if they are rerouting to a different region.

      • hugh-avherald an hour ago

        mines are generally down

    • TiredOfLife an hour ago

      Service that runs on aws is down when aws is down. Who knew.

  • rirze an hour ago

    We just had a power outage in Ashburn starting at 10 pm Sunday night. It was restored around 3:40 am, and I know datacenters have redundant power sources, but the timing is very suspicious. The AWS outage supposedly started at midnight.

    • Hilift an hour ago

      Even with redundancy, the response time between NYC and Amazon East in Ashburn is something like 10 ms. The impedance mismatch, dropped packets and increased latency would doom most organizations' craplications.

    • OliverGuy an hour ago

      Their latest update on the status page says it's a Dynamodb DNS issue

      • shawabawa3 an hour ago

        but the cause of that could be anything, including some kind of config getting wiped due to a temporary power outage

  • mittermayr 2 hours ago

    Careful: NPM _says_ they're up (https://status.npmjs.org/) but I am seeing a lot of packages not updating and npm install taking forever or never finishing. So hold off deploying now if you're dependent on that.

    • olex 29 minutes ago

      They've acknowledged an issue now on the status page. For me at least, it's completely down, package installation straight up doesn't work. Thankfully current work project uses a pull-through mirror that allows us to continue working.

      • tonyhart7 7 minutes ago

        "Thankfully current work project uses a pull-through mirror that allows us to continue working."

        so there is no free coffee time???? lmao

    • gjvr an hour ago

      Yep. It's the auditing part that is broken. As a (dangerous) workaround use --no-audit

    • drinchev 2 hours ago

      Also npm audit times out.

  • frays an hour ago

    Robinhood's completely down. Even their main website: https://robinhood.com/

    • mittermayr an hour ago

      Amazing, I wonder what their interview process is like, probably whiteboarding a next-gen LLM in WASM, meanwhile, their entire website goes down with us-east-1... I mean.

  • stavros 2 hours ago

    AWS truly does stand for "All Web Sites".

  • tedk-42 2 hours ago

    Internet, out.

    Very big day for an engineering team indeed. Can't vibe code your way out of this issue...

    • drevil-v2 2 hours ago

      Easiest day for engineers on-call everywhere except AWS staff. There’s nothing you can do except wait for AWS to come back online.

      Pour one out for the customer service teams of affected businesses instead

      • darkwater an hour ago

        Well, but tomorrow there will be CTOs asking for a contingency plan if AWS goes down, even if planning, preparing, executing and keeping it up to date as the infra evolves will cost more than the X hours of AWS outage.

        There are certainly organizations for which that cost is lower than the overall damage of services being down due to an AWS fault, but tomorrow we will hear from CTOs of smaller orgs as well.

        • noir_lord an hour ago

          They’ll ask, in a week they’ll have other priorities and in a month they’ll have forgotten about it.

          This will hold until the next time AWS has a major outage, rinse and repeat.

          • darkwater 30 minutes ago

            It's so true it hurts. If you are new in any infra/platform management position you will be scared as hell this week. Then you will just learn that feeling will just disappear by itself in a few days.

        • brazukadev an hour ago

          Lots of NextJS CTOs are gonna need to think about it for the first time too

      • mvdtnz 29 minutes ago

        Not really true for large systems. We are doing things like deploying mitigations to avoid scale-in (e.g. services not receiving traffic incorrectly autoscaling down), preparing services for the inevitable storm, managing various circuit breakers, changing service configurations to ease the flow of traffic through the system, etc. We currently have 64 engineers in our on-call room managing this. There's plenty of work to do.
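
        As one concrete illustration of the scale-in mitigation (a sketch with a hypothetical group name, not our actual tooling): temporarily pinning an Auto Scaling group's minimum capacity so that the drop in healthy traffic can't scale the fleet down below a safe floor.

          import boto3

          # Hypothetical group name; raise MinSize so reduced traffic during the
          # outage doesn't trigger scale-in below a safe floor.
          autoscaling = boto3.client("autoscaling", region_name="us-east-1")
          autoscaling.update_auto_scaling_group(
              AutoScalingGroupName="web-fleet-prod",
              MinSize=20,
          )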

      • aswegs8 an hour ago

        Can confirm, pretty chill we can blame our current issues on AWS.

      • codeduck an hour ago

        and by one I trust you mean a bottle.

    • DebtDeflation 42 minutes ago

      >Can't vibe code your way out of this issue...

      I feel bad for the people impacted by the outage. But at the same time there's a part of me that says we need a cataclysmic event to shake the C-Suite out of their current mindset of laying off all of their workers to replace them with AI, the cheapest people they can find in India, or in some cases with nothing at all, in order to maximize current quarter EPS.

    • SilverSlash 10 minutes ago

      I expect it's their SREs who are dealing with this mess.

    • LostMyLogin 2 hours ago

      Pour one out for everyone on-call right now.

      • xvector 2 hours ago

        After some thankless years preventing outages for a big tech company, I will never take an oncall position again in my life.

        Most miserable working years I have had. It's wild how normalized working on weekends and evenings becomes in teams with oncall.

        But it's not normal. Our users not being able to shitpost is simply not worth my weekend or evening.

        And outside of Google you don't even get paid for oncall at most big tech companies! Company losing millions of dollars an hour, but somehow not willing to pay me a dime to jump in at 3AM? Looks like it's not my problem!

        • scns an hour ago

          > And outside of Google you don't even get paid for oncall at most big tech companies.

          What the redacted?

        • einarfd an hour ago

          When I used to be on call for Cisco WebEx services, I got paid extra and got extra time off, even if nothing happened. In addition, there were enough of us on the rotation that I didn't have to do it that often.

          I believe the rules varied based on jurisdiction, and I think some had worse deals and some even better. But I was happy with our setup in Norway.

          Tbh I do not think we would have had what we had if it wasn't for the local laws and regulations. Sometimes worker-friendly laws can be nice.

        • sksksk 36 minutes ago

          It's also unnecessary at large companies, since there'll likely be enough offices globally to have a follow-the-sun model.

          • decimalenough 12 minutes ago

            Follow the sun does not happen by itself. Very few if any engineering teams are equally split across thirds of the globe in such a way that (say) Asia can cover if both EMEA and the Americas are offline.

            Having two sites cover the pager is common, but even then you only have 16 working hours at best and somebody has to take the pager early/late.

        • tilolebo an hour ago

          "Your shitposting is very important to us, please stay on the site"

        • sneak an hour ago

          > But this is not normal. Our users not being able to shitpost is simply not worth my weekend or evening.

          It is completely normal for staff to have to work 24/7 for critical services.

          Plumbing, HVAC, power plant engineers, doctors, nurses, hospital support staff, taxi drivers, system and network engineers - these people keep our modern world alive, all day, every day. Weekends, midnights, holidays, every hour of every day someone is AT WORK to make sure our society functions.

          Not only is it normal, it is essential and required.

          It’s ok that you don’t like having to work nights or weekends or holidays. But some people absolutely have to. Be thankful there are EMTs and surgeons and power and network engineers working instead of being with their families on holidays or in the wee hours of the night.

          • milutinovici an hour ago

            You know, there's this thing called shifts. You should look it up.

          • guitarbill an hour ago

            Nice try at guilt-tripping people doing on-call, and doing it for free.

            But to parent's points: if you call a plumber or HVAC tech at 3am, you'll pay for the privilege.

            And doctors and nurses have shifts/rotas. At some tech places, you are expected to do your day job plus on-call. For no overtime pay. "Salaried" in the US or something like that.

            • xvector an hour ago

              And these companies often say "it's baked into your comp!" But you can typically get the same exact comp working an adjacent role with no oncall.

              • sneak an hour ago

                Then do that instead. What’s the problem with simply saying “no”?

                • xvector 42 minutes ago

                  Yup, that is precisely what I did and what I'm encouraging others to do as well.

                  Edit: On-call is not always disclosed. When it is, it's often understated. And finally, you can never predict being re-orged into a team with oncall.

                  I agree employees should still have the balls to say "no" but to imply there's no wrongdoing here on companies' parts and that it's totally okay for them to take advantage of employees like this is a bit strange.

                  Especially for employees that don't know to ask this question (new grads) or can't say "no" as easily (new grads or H1Bs.)

            • sneak an hour ago

              Guilt tripping? Quite the opposite.

              If you or anyone else are doing on-call for no additional pay, precisely nobody is forcing you to do that. Renegotiate, or switch jobs. It was either disclosed up front or you missed your chance to say “sorry, no” when asked to do additional work without additional pay. This is not a problem with on call but a problem with spineless people-pleasers.

              Every business will ask you for a better deal for them. If you say “sure” to everything you’re naturally going to lose out. It’s a mistake to do so, obviously.

              An employee’s lack of boundaries is not an employer’s fault.

              • guitarbill 41 minutes ago

                First, you try to normalise it:

                > It is completely normal for staff to have to work 24/7 for critical services.

                > Not only is it normal, it is essential and required.

                Now you come with the weak "you don't have to take the job" and this gem:

                > An employee’s lack of boundaries is not an employer’s fault.

                As if there isn't a power imbalance, or employers always disclose everything or chance their mind. But of course, let's blame those entitled employees!

          • xvector an hour ago

            No one dies if our users can't shitpost until tomorrow morning.

            I'm glad there are people willing to do oncall. Especially for critical services.

            But the software engineering profession as a whole would benefit from negotiating concessions for oncall. We have normalized work interfering with life so the company can squeeze a couple extra millions from ads. And for what?

            Nontrivial amount of ad revenue lost? Not my problem if the company can't pay me to mitigate.

            • disgruntledphd2 an hour ago

              > Nontrivial amount of ad revenue lost? Not my problem if the company can't pay me to mitigate.

              Interestingly, when I worked on analytics around bugs we found that often (in the ads space), there actually wasn't an impact when advertisers were unable to create ads, as they just created all of them when the interface started working again.

              Now, if it had been the ad serving or pacing mechanisms then it would've been a lot of money, but not all outages are created equal.

            • sneak an hour ago

              Not all websites are for shitposting. I can’t talk to my clients for whom I am on call because Signal is down. I also can’t communicate with my immediate family. There are tons of systems positively critical to society downstream from these services.

              Some can tolerate downtime. Many can’t.

              • dist-epoch 15 minutes ago

                You could give them a Phone call, you know. Pretty reliable technology.

    • rvz 2 hours ago

      > Can't vibe code your way out of this issue...

      Exactly. This time, some LLM providers are also down and can't help vibe coders on this issue.

      • fragmede 2 hours ago

        Qwen3 on lm-studio running fine on my work Mac M3, what's wrong with yours?

  • cmiles8 6 minutes ago

    US-East-1 and its consistent problems are literally the Achilles Heel of the Internet.

  • Danborg 4 minutes ago

    r/aws not found

    There aren't any communities on Reddit with that name. Double-check the community name or start a new community.

  • ksajadi 2 hours ago

    A lot of status pages hosted by Atlassian Statuspage are down! The irony…

  • kryptn 2 hours ago
    • Titan2189 2 hours ago

      Yup

      > We have identified the underlying issue with one of our cloud service providers.

  • philipp-gayret 2 hours ago

    Our Alexas stopped responding and my girl couldn't log in to MyFitnessPal anymore.. Let me check HN for a major outage and here we are :^)

    At least when us-east is down, everything is down.

  • greatgib 17 minutes ago

    When I follow the link, I arrive on a "You broke reddit" page :-o

  • edtech_dev 2 hours ago

    Signal is also down for me.

    • ArcHound an hour ago

      My messages are not getting through, but status page seems ok.

  • __alexs 2 hours ago

    Is there any data on which AWS regions are most reliable? I feel like every time I hear about an AWS outage it's in us-east-1.

    • theherk 2 hours ago

      Trouble is one can't fully escape us-east-1. Many services are centralized there like: S3, Organizations, Route53, Cloudfront, etc. It is THE main region, hence suffering the most outages, and more importantly, the most troubling outages.

    • Macha an hour ago

      We're mostly deployed on eu-west-1 but still seeing weird STS and IAM failures, likely due to internal AWS dependencies.

      Also we use Docker Hub, NPM and a bunch of other services that are hosted by their vendors on us-east-1 so even non AWS customers often can't avoid the blast radius of us-east-1 (though the NPM issue mostly affects devs updating/adding dependencies, our CI builds use our internal mirror)

      • donavanm 6 minutes ago

        FYI:

        1. AWS IAM mutations all go through us-east-1 before being replicated to other public/commercial regions. Read/List operations should use local regional stacks. I expect you'll see a concept of "home region" give you flexibility on the write path in the future.

        2. STS has both global and regional endpoints. Make sure you're set up to use regional endpoints in your clients: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credenti...
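
        A minimal boto3 sketch of point 2, with an explicitly regional STS client (the region and endpoint URL here are illustrative):

          import boto3

          # Explicitly target the regional STS endpoint instead of the legacy
          # global one (sts.amazonaws.com), which is served out of us-east-1.
          sts = boto3.client(
              "sts",
              region_name="eu-west-1",
              endpoint_url="https://sts.eu-west-1.amazonaws.com",
          )
          print(sts.get_caller_identity()["Account"])

        Newer SDKs can also be switched via the AWS_STS_REGIONAL_ENDPOINTS=regional setting rather than hard-coding an endpoint.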

    • Thev00d00 2 hours ago

      Anywhere other than us-east-1 in my experience is rock solid.

    • bradhe 2 hours ago

      us-east-1 was, probably still is, AWS' most massive deployment. Huge percentage of traffic goes through that region. Also, lots of services backhaul to that region, especially S3 and CloudFront. So even if your compute is in a different region (at Tower.dev we use eu-central-1 mostly), outages in us-east-1 can have some halo effect.

      This outage seems really to be DynamoDB related, so the blast radius in services affected is going to be big. Seems they're still triaging.

      • gnaman 2 hours ago

        Your website loads for a second and then suddenly goes blank. There is one fatal error from Framer in the console.

        • fragmede 2 hours ago

          It is dark. You are likely to be eaten by a grue.

    • Dave3of5 an hour ago

      If you're using AWS then you are most likely using us-east-1; there is no escape. When big problems happen in us-east-1, they affect most AWS services.

    • nodesocket 2 hours ago

      I don't recommend that my clients use us-east-1. It's the oldest region and the most prone to outages. I usually recommend us-east-2 (Ohio) unless they require the West Coast.

  • o1o1o1 an hour ago

    I'm so happy we chose Hetzner instead but unfortunately we also use Supabase (dashboard affected) and Resend (dashboard and email sending affected).

    Probably makes sense to add "relies on AWS" to the criteria we're using to evaluate 3rd-party services.

  • croemer an hour ago
  • assimpleaspossi an hour ago

    Isn't there a better source of information than Reddit?

    • Havoc 2 minutes ago

      Probably not. The sysadmin sub is usually the first place stuff like this shows up because there’s a bunch of oncall guys there

    • elaus an hour ago

      Maybe the mods can change it to https://health.aws.amazon.com/health/status

      • codeduck an hour ago

        amazon's health page is widely enjoyed as a work of fiction. community reports on places like reddit are, actually, more reliable.

    • orthoxerox an hour ago

      Especially since it's down as well.

  • TrackerFF 13 minutes ago

    Lots of outages in Norway, starting approximately 1 hour ago for me.

  • spwa4 3 minutes ago

    Reddit seems to be having issues too:

    "upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout"

  • mumber_typhoon 2 hours ago

    >Oct 20 12:51 AM PDT We can confirm increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. This issue may also be affecting Case Creation through the AWS Support Center or the Support API. We are actively engaged and working to both mitigate the issue and understand root cause. We will provide an update in 45 minutes, or sooner if we have additional information to share.

    Weird that case creation uses the same region as the case you'd like to create for.

  • ctbellmar 2 hours ago

    Various AI services (e.g. Perplexity) are down as well

    • rvz 2 hours ago

      Just tried Perplexity and it has no answer.

      Damn, this is really bad.

      Looking forward to the postmortem.

  • croemer an hour ago
  • thomas_witt an hour ago

    DynamoDB is performing fine in production in eu-central-1.

    Seems to be really limited to us-east-1 (https://health.aws.amazon.com/health/status). I think they host a lot of console and backend stuff there.

    • okr an hour ago

      Yet. Everything goes down the ... Bach ;)

  • karel-3d an hour ago

    Slack is down. Is that related? Probably is.

  • raspasov an hour ago

    02:34 Pacific: Things seem to be recovering.

  • mrcsharp 2 hours ago

    Bitbucket seems affected too [1]. Not sure if this status page is regional though.

    [1] https://bitbucket.status.atlassian.com/incidents/p20f40pt1rg...

  • goinggetthem an hour ago

    This is from Amazon's latest earnings call, when Andy Jassy was asked why they aren't growing as much as their competitors:

    "I think if you look at what matters to customers, what they care they care a lot about what the operational performance is, you know, what the availability is, what the durability is, what the latency and throughput is of of the various services. And I think we have a pretty significant advantage in that area." also "And, yeah, you could just you just look at what's happened the last couple months. You can just see kind of adventures at some of these players almost every month. And so very big difference, I think, in security."

  • atymic 3 hours ago
  • danielpetrica 29 minutes ago

    In moments like this I think devs should invest in vendor independence if they can. While I'm not at that stage yet (Cloudflare dependence), using open technologies like Docker (or Kubernetes) and Traefik instead of managed services can help in these disaster situations by letting you switch to a different provider faster than having to rebuild from zero. As a disclosure, I'm still not at that point with my infrastructure, but I'm trying to slowly define one for myself.

  • jug 2 hours ago

    Of course this happens when I take a day off from work lol

    Came here after the Internet felt oddly "ill" and even got issues using Medium, and sure enough https://status.medium.com

  • andreygubarev an hour ago

    https://status.tailscale.com/ clients' auth down :( what a day

    • kondro an hour ago

      That just says the homepage and knowledge base are down and that admin access specifically isn't affected.

      • andreygubarev an hour ago

        yep, admin panel works, but in practice my devices are logged out and there is no way to re-authorize them.

        • kondro 28 minutes ago

          I can authenticate my devices just fine.

  • saejox 2 hours ago

    AWS has been the backbone of the internet. It is a single point of failure for most websites.

    Other hosting services like Vercel, package managers like npm, and even the Docker registries are down because of it.

  • shawn_w 29 minutes ago

    One of the radio stations I listen to is just dead air tonight. I assume this is the cause.

  • socalgal2 an hour ago

    Amazon itself appears to be out for some products. I get a "Sorry, we couldn't find that page" when clicking on products.

  • assimpleaspossi an hour ago

    I'm thinking about that one guy who clicked on "OK" or hit return.

    • rvitorper 8 minutes ago

      Somebody, somewhere tried to rollback something and it failed

  • rob 44 minutes ago

    My Alexa is hit or miss at responding to queries right now at 5:30 AM EST. Was wondering why it wasn't answering when I woke up.

  • rafa___ an hour ago

    "Oct 20 2:01 AM PDT We have identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region. Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1..."

    It's always DNS...

  • 00deadbeef an hour ago

    It's not DNS

    There's no way it's DNS

    It was DNS

  • littlecranky67 an hour ago

    Maybe unrelated, but yesterday I went to pick up my package from an Amazon Locker in Germany, and the display said "Service unavailable". I'll wait until later today before I go and try again.

  • tdiff an hour ago

    That strange feeling of the world getting cleaner for a while, without all these dependent services.

  • transitivebs 2 hours ago

    can't log into https://amazon.com either after logging out; so many downstream issues

  • JCM9 29 minutes ago

    US-East-1 is literally the Achilles Heel of the Internet.

    • sofixa 11 minutes ago

      You would think that after the previous big us-east-1 outages (to be fair there have been like 3 of them in the past decade, but still, that's plenty), companies would have started to move to other AWS regions and/or to spread workloads between them.

    • rvitorper 18 minutes ago

      Exactly

  • werdl an hour ago

    Looks like a DNS issue - dynamodb.us-east-1.amazonaws.com is failing to resolve.
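
    A quick way to check resolution yourself (plain Python, hostname hard-coded for this case):

      import socket

      host = "dynamodb.us-east-1.amazonaws.com"
      try:
          addrs = sorted({info[4][0] for info in socket.getaddrinfo(host, 443)})
          print(f"{host} resolves to {addrs}")
      except socket.gaierror as exc:
          print(f"{host} failed to resolve: {exc}")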

    • lgats an hour ago

      "Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1."

      it seems they found your comment

  • croemer an hour ago

    Coinbase down as well: https://status.coinbase.com/

    • littlecranky67 an hour ago

      Best option for a whale to manipulate the price again.

  • disposable2020 an hour ago

    I seem to recall other issues around this time in previous years. I wonder if this is some change getting shoe-horned in ahead of some reinvent release deadline...

  • ngruhn 2 hours ago

    Can't login to Jira/Confluence either.

    • Xenoamorphous 2 hours ago

      Seems to work fine for me. I'm in Europe so maybe connecting to some deployment over here.

      • danias 2 hours ago

        You are already logged in. If you try to access your account settings, for example, you will be disappointed...

  • lsllc 2 hours ago

    The Ring (Doorbell) app isn't working, nor are any of the MBTA (Transit) status pages/apps.

    • LostMyLogin 2 hours ago

      My apartment uses “SmartRent” for access controls and temps in our unit. It’s down…

  • qrush 2 hours ago

    AWS's own management console sign-in isn't even working. This is a huge one. :(

  • ryanmcdonough 10 minutes ago

    Now, I may well be naive - but isn't the point of these systems that you fail over gracefully to another data centre and no-one notices?

    • codeulike 6 minutes ago

      I get the impression that this has been thought about to some extent, but it's a constantly changing architecture with new layers and new ideas being added, so for every bit of progress there's the chance of new single points of failure being added. This time it seems to be a DNS problem with DynamoDB.

  • tomaytotomato an hour ago

    Slack, Jira and Zoom are all sluggish for me in the UK

    • kalleboo an hour ago

      I wonder if that's not due to dependencies on AWS but all-hands-on-deck causing far more traffic than usual

  • mslm 2 hours ago

    Happened to be updating a bunch of NPM dependencies and then saw `npm i` freeze and I'm like... ugh, what did I do? Then npm login wasn't working, so I started searching here for an outage, and voilà.

  • assimpleaspossi an hour ago

    As of 4:26am Central Time in the USA, it's back up for one of my services.

  • munchlax an hour ago

    Nowadays when this happens it's always something. "Something went wrong."

    Even the error message itself is wrong whenever that one appears.

    • urbandw311er an hour ago

      Displaying and propagating accurate error messages is an entire science unto itself... I can see why it's sometimes sensible to invest resources elsewhere and fall back to 'something'.

      • foobar1962 7 minutes ago

        I use the term “unexpected error” because if the code got to this alert it wasn’t caught by any traps I’d made for the “expected” errors.

      • munchlax 17 minutes ago

        IMHO if error handling is rocket science, the error is you

        • urbandw311er 9 minutes ago

          Perhaps you're not handling enough errors ;-)

    • zigzag312 28 minutes ago

      Reddit shows:

      "Too many requests. Your request has been rate limited, please take a break for a couple minutes and try again."

  • devttyeu an hour ago

    Can't update my selfhosted HomeAssistant because HAOS depends on dockerhub which seems to be still down.

  • countWSS an hour ago

    Reddit itself is breaking down and errors are appearing. Does Reddit itself depend on this?

  • antihero an hour ago

    My website on the cupboard laptop is fine.

  • fujigawa 2 hours ago

    Appears to have also disabled that bot on HN that would be frantically posting [dupe] in all the other AWS outage threads right about now.

  • world2vec an hour ago

    Slack and Zoom working intermittently for me

  • colesantiago 2 hours ago

    It seems that all the sites that ask about distributed systems in their interviews and have their website down right now wouldn't even pass their own interview.

    This is why distributed systems is an extremely important discipline.

    • mangamadaiyan 2 hours ago

      Maybe actually making the interviews less of a hazing ritual would help.

      Hell, maybe making today's tech workplace more about getting work done instead of the series of ritualistic performances that the average tech workday has degenerated to might help too.

      Ergo, your conclusion doesn't follow from your initial statements, because interviews and workplaces are both far more broken than most people, even people in the tech industry, would think.

      • colesantiago 2 hours ago

        Well, it looks like if companies and startups did their job and hired for proper distributed systems skills, rather than hazing for the wrong skills, we wouldn't be in this outage mess.

        Many companies on Vercel don't think to have a strategy to be resilient to these outages.

        I rarely see Google, Ably and others serious about distributed systems being down.

        • rester324 an hour ago

          There was a huuuge GCP outage just a few months back: https://news.ycombinator.com/item?id=44260810

        • the_mitsuhiko an hour ago

          > Many companies on Vercel don't think to have a strategy to be resilient to these outages.

          But that's the job of Vercel and it looks like they did a pretty good job. They rerouted away from the broken region.

    • dist-epoch an hour ago

      distributed systems != continuous uptime

  • klon 2 hours ago

    Statuspage.io seems to load (but is slow) but what is the point if you can't post an incident because Atlassian ID service is down.

  • mcintyre1994 2 hours ago

    Presumably the root cause of the major Vercel outage too: https://www.vercel-status.com/

    • hyruo 2 hours ago

      No wonder, when I opened Vercel it showed a 502 error.

  • ssehpriest an hour ago

    Airtable is down as well.

    A lot of businesses have all their workflows depending on their data in Airtable.

  • hipratham 2 hours ago

    Strangely, some of our services are scaling up in us-east-1, and there is a downtick on downdetector.com, so the issue might be resolving.

  • trusche 2 hours ago

    Both Intercom and Twilio are affected, too.

    - https://status.twilio.com/ - https://www.intercomstatus.com/us-hosting

    I want the web ca. 2001 back, please.

  • cpfleming 2 hours ago

    Seems to be upsetting Slack a fair bit, messages taking an age to send and OIDC login doesn't want to play.

  • codegladiator an hour ago

    They haven't listed SES there yet in the affected services on their status page

  • donmb 2 hours ago

    Asana down. Postman workspaces don't load. Slack affected. And the worst: Heroku Scheduler just refused to trigger our jobs.

  • mk89 an hour ago

    It's fun to see SRE jumping left and right when they can do basically nothing at all.

    "Do we enable DR? Yes/No". That's all you can do. If you do, it's a whole machinery starting, which might take longer than the outage itself.

    They can't even use Slack to communicate - messages are being dropped/not sent.

    And then we laugh at the South Koreans for not having backed up their hard drives (which got burnt by an actual fire, a statistically far rarer event than an AWS outage). OK, that's a huge screw-up, but hey, this is not insignificant either.

    What will happen now? Nothing, like nothing happened after Crowdstrike's bug last year.

    • XorNot an hour ago

      Signal seems to be dead too though, which is much more of a WTF?

      • GoblinSlayer 26 minutes ago

        A decentralized messenger is Tox.

  • magnio 2 hours ago

    npm and pnpm are badly affected as well. Many packages are returning 502 when fetched. Such a bad time...

    • samsepia 2 hours ago

      Yup, was releasing something to prod and can't even build a react app. I wonder if there is some sort of archive that isn't affected?

      • JCharante 2 hours ago

        AWS CodeArtifact can act as a proxy and fetch new packages from npm when needed. A bit late for that though but sharing if you want to future proof against the yearly us-east-1 outage

    • JCharante 2 hours ago

      Oh damn that ruins all our builds for regions I thought would be unaffected

  • thecopy an hour ago

    I did get 500 error from their public ECR too

  • alvis an hour ago

    Why would us-east-1 cause many UK banks and even UK gov web sites down too!? Shouldn't they operate in the UK region due to GDPR?

    • GoblinSlayer 35 minutes ago

      Integration with USA for your safety :)

    • Nextgrid an hour ago

      2 things:

      1) GDPR is never enforced other than token fines based on technicalities. The vast majority of the cookie banners you see around are not compliant, so if the regulation was actually enforced they'd be the first to go... and it would be much easier to go after those (they are visible) rather than audit every company's internal codebases to check if they're sending data to a US-based provider.

      2) you could technically build a service that relies on a US-based provider while not sending them any personal data or data that can be correlated with personal data.

  • goodegg an hour ago

    Terraform Cloud is having problems too.

  • BoredPositron 33 minutes ago

    There will be a lot of systems starting cold. I am really curious to see how many will manage it without hiccups.

  • drcongo 33 minutes ago

    Snow day!

  • sph 2 hours ago

    10:30 on a Monday morning and already slacking off. Life is good. Time to touch grass, everybody!

  • bstsb an hour ago

    glad all my services are either Hetzner servers or EU region of AWS!

  • codebolt 2 hours ago

    Atlassian cloud is having problems as well.

  • seanieb 2 hours ago

    Clearly this is all some sort of mass delusion event, the Amazon Ring status says everything is working.

    https://status.ring.com/

    (Useless service status pages are incredibly annoying)

    • storgaard 2 hours ago

      Atlassian is down as well so they probably can't access their Atlassian Statuspage admin panel to update it.

      • netdevphoenix 15 minutes ago

        When you know a service is down but the service says it's up: it's either your fault or the service is having a severe issue

  • circadian 2 hours ago

    BGP (again)?

  • kitd an hour ago

    O ffs. I can't even access the NYT puzzles in the meantime ... Seriously disrupted, man

  • grenran an hour ago

    seems like services are slowly recovering

  • AtNightWeCode 2 hours ago

    Considering the history of east-1 it is fascinating that it still causes so many single point of failure incidents for large enterprises.

  • goodegg an hour ago

    Happy Monday People

  • the_duke an hour ago

    Slack now also down: https://slack-status.com/

  • dude250711 2 hours ago

    They are amazing at LeetCode though.

    • tokioyoyo 2 hours ago

      Let's be nice. I'm sure devs and ops are on fire right now, trying to fix the problems. Given the audience of HN, most of us could have been (have already been?) in that position.

      • rirze an hour ago

        No we wouldn’t because there’s like a 50/50 chance of being a H1B/L1 at AWS. They should rethink their hiring and retention strategies.

      • dude250711 an hour ago

        They choose their hiring-retention practices and they choose to provide global infrastructure, when is the good time to criticise them?

        Granted, they are not as drunk on LLM as Google and Microsoft. So, at least we can say this outage had not been vibe-coded (yet).

      • fragmede an hour ago

        hugops ftw

  • nodesocket 2 hours ago

    Affecting Coinbase[1] as well, which is ridiculous. Can't access the web UI at all. At their scale and importance they should be multi-region if not multi-cloud.

    [1] https://status.coinbase.com

    • bradhe 2 hours ago

      Seems the underlying issue is with DynamoDB, according to the status page, which will have a big blast radius in other services. AWS' services form a really complicated graph and there's likely some dependency, potentially hidden, on us-east-1 in there.

    • Splizard 2 hours ago

      The issue appears to be cascading internationally due to internal dependencies on us-east-1

  • ArcHound 2 hours ago

    Good luck to all on-callers today.

    It might be an interesting exercise to map how many of our services depend on us-east-1 in one way or another. One can only hope that somebody would do something with the intel, even though it's not a feature that brings money in (at least from business perspective).

  • gadders an hour ago

    Substack seems to be lying about their status: https://substack.statuspage.io/

  • t0lo an hour ago

    It's weird that we're living in a time where this could be a taste of a prolonged future global internet blackout by adversarial nations. Get used to this feeling I guess :)

  • t0lo an hour ago

    Can't log into tidal for my music

    • antihero an hour ago

      Navidrome seems fine

  • JCharante 2 hours ago

    Ring is affected. Why doesn’t Ring have failover to another region?

  • roschdal 2 hours ago

    It's a reminder to never rely on something as flaky as the internet for your important things.

    • tietjens 2 hours ago

      This is such an HN response. Oh, no problem, I'll just avoid the internet for all of my important things!

      • jjcob 2 hours ago

        Door locks, heating and household appliances should probably not depend on Internet services being available.

      • tommit an hour ago

        Do you not have a self-hosted instance of every single service you use? :/

      • dukeyukey an hour ago

        They are probably being sarcastic.

    • rirze 2 hours ago

      Not very helpful. I wanted to make a very profitable trade but can’t login to my brokerage. I’m losing about ~100k right now.

      • fragmede an hour ago

        what's the trade?

        • muppetman an hour ago

          Probably AWS stock...

          • rkomorn an hour ago

            This reminds me of the twitter-based detector we had at Facebook that looked for spikes in "Facebook down" messages.

            When Facebook went public, the detector became useless because it fired anytime someone wrote about the Facebook stock being down and people retweeted or shared the article.

            I invested just enough time in it to decide it was better to turn it off.

        • rirze an hour ago

          Beyond Meat

  • zwnow an hour ago

    I love this to be honest. Validates my anti cloud stance.

    • flanked-evergl an hour ago

      No service that does not run on cloud has ever had outages.

      • Nextgrid an hour ago

        But at least a service that doesn't run on cloud doesn't pay the 1000% premium for its supposed "uptime".

      • zwnow 42 minutes ago

        At least its in my control :)

        • speedgoose 31 minutes ago

          Not having control, or not being responsible, are perhaps major selling points of cloud solutions. To each their own; I'd also rather have control than have to deal with a cloud provider's support as a tiny, insignificant customer. But in this case, we can take a break and come back once it's fixed without stressing.

          • zwnow 3 minutes ago

            Businesses not taking responsibility for their own business should not exist in the first place...