Amazon holds engineering meeting following AI-related outages

(ft.com)

44 points | by petethomas 3 hours ago

23 comments

palmotea an hour ago

> Amazon’s ecommerce business has summoned a large group of engineers to a meeting on Tuesday for a “deep dive” into a spate of outages, including incidents tied to the use of AI coding tools.

> The online retail giant said there had been a “trend of incidents” in recent months, characterised by a “high blast radius” and “Gen-AI assisted changes” among other factors, according to a briefing note for the meeting seen by the FT.

> Under “contributing factors” the note included “novel GenAI usage for which best practices and safeguards are not yet fully established”.

> “Folks, as you likely know, the availability of the site and related infrastructure has not been good recently,” Dave Treadwell, a senior vice-president at the group, told employees in an email, also seen by the FT.

VirusNewbie an hour ago

GenAI at fault, and nothing to do with amazon laying off 30k people and having an overall shitty culture where people mostly don’t want to stay?

applfanboysbgon an hour ago

> GenAI at fault, and nothing to do with amazon laying off 30k people

GenAI is literally the reason they gave for laying off 30k people.

> “As we roll out more Generative AI and agents, it should change the way our work is done. We will need fewer people doing some of the jobs that are being done today, and more people doing other types of jobs,” [Amazon CEO Andy Jassy] bluntly admitted.

nixass 18 minutes ago

Absolutely correct. Now let's drop another few billion to make AI better and avoid such mistakes in the future. And we might lay off some more folks to make room in the budget for more AI.

jiggawatts an hour ago

Also, managers are incentivised to force AI onto the remaining staff to “boost productivity” but of course they won’t accept any of the responsibility or blame for that decision.

zihotki 44 minutes ago

Just tell the employees to make AI fully adopted in SDLC and make it secure and reliable. Don't make mistakes.

If it works for models, why not humans? /s

aerhardt an hour ago

Maybe both, and possibly other causes too, but allow us a moment to revel in the schadenfreude of AI code slop at hyperscale, will you?

jqpabc123 an hour ago

Summary: AWS has volunteered to serve as a crash test dummy for vibe coding.

But don't tell anyone --- and if you do, don't blame AI because it's all the humans' fault for not shaping their questions in the "right way".

arjie an hour ago

For this particular experiment, regardless of phrasing, I think the guys with the most appetite for risk have to be Cloudflare. They're shipping at an astonishing pace, but I think there have been far more outages than there were in the jgc era. Perhaps Anthropic's application-side teams are faster and more cowboy[0], but they are super AI-native, so that makes sense.

0: I think this is the era where cowboys win, so they're (unsurprisingly) smart about doing this

bootsmann 31 minutes ago

This wouldn't happen if they used my CLAUDE.md of course!

rhubarbtree 32 minutes ago

Some engineers will point to this and say, hey, AI is not gonna work. It doesn’t reason very well and it leads to these problems.

But what they’re missing is all code quality is going to tank, and we are just going to accept that. Just as artisanal goods were replaced in the Industrial Revolution with mass produced inferior ones.

People will accept bad code if it is cheap enough.

We’ve gotten used to aiming for great, even if we often only hit functional. The new bar is going to be so much lower. Welcome to the era of cheap bad code. Lots more software, lots more value overall, but much worse reliability. Every day the apps I use get buggier.

gtsop 27 minutes ago

You are almost right. As I've said since the beginning of this AI circus, this is the equivalent of flipping McDonald's burgers (no insult intended to those workers). It is a thing, and people buy and eat them. But high-quality burgers made by talented chefs will always be out there. That's my analogy, and I don't intend to be on the side of flipping McDonald's burgers.

mediumsmart an hour ago

Is it only 45 dollars for the subscription? Does that cover the AI-related outages too, or just the engineering meeting?

jcgrillo 23 minutes ago

> Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.

Lol. Lmao. You have got to be joking. Seniors leaving in droves is how that plays out.

o10449366 2 hours ago

Paywalled

techterrier an hour ago

paste headline into google, click first link

kqr an hour ago

Huh, it has to be Google, specifically, too! There used to be a shortcut for this action on HN (a link under the submission saying "web" or something?), but it seems that has been removed.

andyjohnson0 34 minutes ago

https://archive.ph/wXvF3

kerim-ca an hour ago

Full Article

Amazon’s ecommerce business has summoned a large group of engineers to a meeting on Tuesday for a “deep dive” into a spate of outages, including incidents tied to the use of AI coding tools.

The online retail giant said there had been a “trend of incidents” in recent months, characterised by a “high blast radius” and “Gen-AI assisted changes” among other factors, according to a briefing note for the meeting seen by the FT.

Under “contributing factors” the note included “novel GenAI usage for which best practices and safeguards are not yet fully established”.

“Folks, as you likely know, the availability of the site and related infrastructure has not been good recently,” Dave Treadwell, a senior vice-president at the group, told employees in an email, also seen by the FT.

The note ahead of Tuesday’s meeting did not specify which particular incidents the group planned to discuss.

Amazon’s website and shopping app went down for nearly six hours this month in an incident the company said involved an erroneous “software code deployment”. The outage left customers unable to complete transactions or access functions such as checking account details and product prices.

Treadwell, a former Microsoft engineering executive, told employees that Amazon would focus its weekly “This Week in Stores Tech” (TWiST) meeting on a “deep dive into some of the issues that got us here as well as some short immediate term initiatives” the group hopes will limit future outages.

He asked staff to attend the meeting, which is normally optional.

Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.

Amazon said the review of website availability was “part of normal business” and it aims for continual improvement.

“TWiST is our regular weekly operations meeting with a specific group of retail technology leaders and teams where we review operational performance across our store,” the company said.

Separately, the company’s cloud computing arm — Amazon Web Services — has suffered at least two incidents linked to the use of AI coding assistants, which the company has been actively rolling out to its staff.

AWS suffered a 13-hour interruption to a cost calculator used by customers in mid-December after engineers allowed the group’s Kiro AI coding tool to make certain changes, and the AI tool opted to “delete and recreate the environment”, the FT previously reported.

Amazon previously said the incident in December was an “extremely limited event” affecting only a single service in parts of mainland China. Amazon added that the second incident did not have an impact on a “customer facing AWS service”.

The FT previously reported multiple Amazon engineers said their business units had to deal with a higher number of “Sev2s” — incidents requiring a rapid response to avoid product outages — each day as a result of job cuts.

Amazon has undertaken multiple rounds of lay-offs in recent years, most recently eliminating 16,000 corporate roles in January. The group has disputed the claim that headcount cuts were responsible for an increase in recent outages.

scuff3d an hour ago

Gonna see a lot more of this in the coming years. The real cost of LLM tools has a delay. Devs don't tend to notice it until they're neck deep in code they don't understand, swearing the next prompt will get them out. CEOs won't notice until it starts costing them money, and that of course assumes anyone will be willing to admit it. A lot of people have their careers on the line spending a metric shit ton of money on untested tools.

potetoooooo an hour ago

nice domain

wiseowise an hour ago

Hold a meeting?! No way! That’s newsworthy material!

Seriously, who even cares? It’s probably going to be “guys be careful but also continue to push slop kthx”.