I’d really, really like to know what microcontroller family this was found on. Assuming that this is a safety processor (lockstep, ECC, etc) it suggests that ECC was insufficient for the level of bit flips they’re seeing — and if the concern is data corruption, not unintended restart, it means it’s enough flips in one word to be undetectable. The environment they’re operating in isn’t that different from everyone else, so unless they ate some margin elsewhere (bad voltage corner or something), this can definitely be relevant to others. Also would be interesting to know if it’s NVM or SRAM that’s effected.
See my other comments in the other threads. This does not have EDAC. I was as surprised as you but it doesn't seems to be an MCU but a composition of several distinct chips. That flight computer was designed in the 90's and updated in 2002 with a new hw variant that does have edac. So yes, for this kind of thing, I can buy that a bit flip happened.
The recalled aircraft include the latest A320neo model, some of which are basically brand new. Why would they be using flight computers from before 2002? Why is an old report from 2008, relating to a completely different aircraft type (A330), relevant to the A320 issue today?
> Why would they be using flight computers from before 2002?
Why would you assume they're not? I don't know about aircraft specifically, but there's plenty of hardware that uses components older than that. Microchip still makes 8051 clones 45 years after the 8051 was released.
Because the problem isn't just this. It's that the flight controller did not properly decide what to do when the data spiked because of this issue as well.
In redundant systems like these, how do you avoid the voting circuit becoming a single point of failure?
Eg. I could understand if each subsystem had its own actuators and they were designed so any 3 could aerodynamically override the other 2, but I don't think that's how it works in practice.
Voting can be coordinated between the N cpus rather than an external arbiter (even making that redundant eventually required the CPUs to decide what to do if they disagree so may as well handle it internally).
I'm reminded of the Apollo moon landing where the computer was rapidly rebooting and being in an OK-ish state to continue to be useful almost immediately
It wasn't rebooting, it ran out of memory and started aborting lower priority tasks. It was a excellent example of robust programming in the face of unexpected usage scenarios.
Of topic for the thread, but on for the comment: I was working in an automotive project 3 years ago. It was all about safety, and one hypothesis was the processor could get overloaded. I was astonished no one in a grouo of 20 “senior sw architecs” had any idea about the concept of load shedding. The proposed solution was “in that case, reboot”.
Mind you whatever came out of that project is rolling on the street today.
software fixes are totally fine since the chance of two redundant pairs failing within the time it takes to correct these errors is more zero's than there are atoms in the universe. (each pilot has a redundant computer and because there's two pilots there's two redundant pairs)
("une supervision interne du composant à l’origine de la défaillance ;
- un mécanisme de redémarrage automatique de ce composant dès lors que la défaillance
est détectée)
Do they really need to ground the entire fleet for that? One incident for ten thousand planes in the air for years. I'd think that giving airlines two months to fix it would be sufficient.
I don’t believe it’s been years, only the latest firmware version for the ELAC is affected. The fix is to downgrade (or replace hardware with a unit running earlier firmware)
I wonder who eats the cost of this? I presume it's the airlines.
So the immediate cost to Airbus of grounding the fleet is quite low, whilst the downside of not grounding the fleet (risk of incident, lawsuits, reputation, etc.) could be substantial.
It sounds like the fix is fairly quick so probably not as expensive as the max multi month groundings
I doubt anyone is going to sue. Repairs etc are a part of life when owning aircraft. So as long as Airbus makes this happen fast and smooth they’re probably ok
I've noticed that some carriers seem to be suggesting that there might be no impact to flights, but isn't this an immediate grounding for each aircraft until the update is made?
How is it possible that this wouldn't impact upon flight schedules?
Curious what a sw change might have done in terms of resiliency. Maybe an incorrect memory setting or some code path that is not calculating things redundantly maybe?
There was a very large CME ten days ago. The NOAA scale had predicted a high likelihood of disruptions, and had specifically suggested that spacecraft and high altitude aircraft could be impacted.
FWIW the "industry sources say" line on the incident is that it occurred on 30 October[1], so further back than ten days ago but of course there may have been other CME incidents at that time.
The European Agency Aviation Safety Agency [2] instruction describes the characteristics of the incident but not the date.
I feel like the event was something that happened to a plane. That said, I wouldn't think sunlight would be penetrating to the chips running the plane.
> The grounding of Airbus A320neo aircraft around the world can be traced back to an incident on a JetBlue flight operating a Cancun to New Jersey service on 30 October.
> At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.
> The Thursday flight from Cancun was headed to Newark, New Jersey, when the altitude dropped, leading to the diversion to Tampa International Airport, the US Federal Aviation Administration said in a statement.
> Pilots reported “a flight control issue” and described injuries including a possible “laceration in the head,” according to air traffic audio recorded by LiveATC.net.
> Medical personnel met the passengers and crew on the ground at the airport. Between 15 and 20 people were taken to hospitals with non-life-threatening injuries, said Vivian Shedd, a spokesperson for Tampa Fire Rescue.
> Pablo Rojas, a Miami-based attorney who specialises in aviation law, said a “flight control issue” indicated that the aircraft wasn't responding to the pilots.
> At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.
I’m surprised passengers are allowed to unbuckle for so much of each flight. You can get injured while buckled it, but that seems less common.
The flight attendants/safety card will tell you to stay buckled whenever seated, even if the seat belt sign is off, but many (most?) people will ignore that guidance and stay unbuckled for as long as they are technically allowed.
Only aviation professionals or recovering flight phobics like me who have watched every episode of Air Crash Investigation will take proactive safety measure of their own accord. To normies it's all just a pointless hassle.
I stay buckled and I’m just a “normie” not afraid of flying that understands turbulence doesn’t always happen in a bell curve with some notice. Not sure if that makes you feel any better? :)
Following the Airbus A320 emergency airworthiness action, everyone will be talking about the ELAC (Elevator Aileron Computer) manufactured by Thales, which caused a sudden pitch-down without pilot input on JetBlue 1230 back in October.
I was traveling during this entire ordeal. My flight got delayed by 7 hours. Insane day, just now boarding my flight. American Airlines was in shambles today.
I’d really, really like to know what microcontroller family this was found on. Assuming that this is a safety processor (lockstep, ECC, etc) it suggests that ECC was insufficient for the level of bit flips they’re seeing — and if the concern is data corruption, not unintended restart, it means it’s enough flips in one word to be undetectable. The environment they’re operating in isn’t that different from everyone else, so unless they ate some margin elsewhere (bad voltage corner or something), this can definitely be relevant to others. Also would be interesting to know if it’s NVM or SRAM that’s effected.
See my other comments in the other threads. This does not have EDAC. I was as surprised as you but it doesn't seems to be an MCU but a composition of several distinct chips. That flight computer was designed in the 90's and updated in 2002 with a new hw variant that does have edac. So yes, for this kind of thing, I can buy that a bit flip happened.
You can see much more data in the report:
https://www.atsb.gov.au/sites/default/files/media/3532398/ao...
The recalled aircraft include the latest A320neo model, some of which are basically brand new. Why would they be using flight computers from before 2002? Why is an old report from 2008, relating to a completely different aircraft type (A330), relevant to the A320 issue today?
> Why would they be using flight computers from before 2002?
Why would you assume they're not? I don't know about aircraft specifically, but there's plenty of hardware that uses components older than that. Microchip still makes 8051 clones 45 years after the 8051 was released.
The neo is not brand new - it's an incremental update to the 320. neo refers to New Engine Option
They wrote "some of which are basically brand new", which is technically correct.
They didn't say the design was brand new.
Because the problem isn't just this. It's that the flight controller did not properly decide what to do when the data spiked because of this issue as well.
> Why would they be using flight computers from before 2002?
Guessing that using previously certified stuff is an advantage
An early revision of the Raspberry Pi 2 would crash if you hit it with a bright light like a camera flash. Specifically a xenon flash.
https://forums.raspberrypi.com/viewtopic.php?t=99167
https://forums.raspberrypi.com/viewtopic.php?f=28&t=99042
https://www.raspberrypi.com/news/xenon-death-flash-a-free-ph...
https://www.youtube.com/watch?v=wyptwlzRqaI
Completely unrelated and due to a design failure by the rpi folks.
Yah, but that's a case of the package not being opaque enough.
proper SEU mitigation goes far beyond ECC. Satellites fly higher than the A320, and they (at least the ones I know about) use Triple Modular Redundancy: https://en.wikipedia.org/wiki/Triple_modular_redundancy
https://en.wikipedia.org/wiki/Single-event_upset
For manned spaceflight, NASA ups N from 3 to 5.
Other mitigations include completely disabling all CPU caches (with a big performance hit), and continuously refreshing the ECC RAM in background.
There are also a bunch of hardware mitigations to prevent "latch up" of the digital circuits.
In redundant systems like these, how do you avoid the voting circuit becoming a single point of failure?
Eg. I could understand if each subsystem had its own actuators and they were designed so any 3 could aerodynamically override the other 2, but I don't think that's how it works in practice.
Voting can be coordinated between the N cpus rather than an external arbiter (even making that redundant eventually required the CPUs to decide what to do if they disagree so may as well handle it internally).
if the issue is radiation bit flipping, you could make that part overly shielded?
Define ‘overly’. You can submerge it in a sphere of water, but that’s going to be expensive to launch.
I am worried about a software fix for what looks like hardware problem.
It could be as simple as storing multiple copies of the relevant data and adding a checksum, something like that.
Hardware fix is the ultimate solution but it might be possible to paper over with software.
Gracefully handling hardware faults is a software problem. The Air France Flight 447 crash was the result of bad software and bad hardware.
And bad pilot training, if I recall correctly.
I'm reminded of the Apollo moon landing where the computer was rapidly rebooting and being in an OK-ish state to continue to be useful almost immediately
It wasn't rebooting, it ran out of memory and started aborting lower priority tasks. It was a excellent example of robust programming in the face of unexpected usage scenarios.
Of topic for the thread, but on for the comment: I was working in an automotive project 3 years ago. It was all about safety, and one hypothesis was the processor could get overloaded. I was astonished no one in a grouo of 20 “senior sw architecs” had any idea about the concept of load shedding. The proposed solution was “in that case, reboot”.
Mind you whatever came out of that project is rolling on the street today.
software fixes are totally fine since the chance of two redundant pairs failing within the time it takes to correct these errors is more zero's than there are atoms in the universe. (each pilot has a redundant computer and because there's two pilots there's two redundant pairs)
The Aviation Herald has more technical details:
https://avherald.com/h?article=52f1ffc3&opt=0
Has BoFesc vibes "It's friday, so I get into work early, before lunch even. The phone rings. Shit!
I turn the page on the excuse sheet. "SOLAR FLARES" stares out at me. I'd better read up on that..."
Solar Flares was always my favourite result on the BoFH Excuse Generator.
http://jefflane.org/bofh/bofh.pl
The aerospace industry has had countermeasures in place against bit-flips for a long time, oftentimes thanks to redudancy
Airbus/Thales's fix in this case appears to add more error checking, and to restart the misbehaving component. https://bea.aero/fileadmin/user_upload/BEA2024-0404-BEA2025-...
("une supervision interne du composant à l’origine de la défaillance ; - un mécanisme de redémarrage automatique de ce composant dès lors que la défaillance est détectée)
Do they really need to ground the entire fleet for that? One incident for ten thousand planes in the air for years. I'd think that giving airlines two months to fix it would be sufficient.
this is Airbus, not Boeing
I don’t believe it’s been years, only the latest firmware version for the ELAC is affected. The fix is to downgrade (or replace hardware with a unit running earlier firmware)
I wonder who eats the cost of this? I presume it's the airlines.
So the immediate cost to Airbus of grounding the fleet is quite low, whilst the downside of not grounding the fleet (risk of incident, lawsuits, reputation, etc.) could be substantial.
Yeah should be airlines
It sounds like the fix is fairly quick so probably not as expensive as the max multi month groundings
I doubt anyone is going to sue. Repairs etc are a part of life when owning aircraft. So as long as Airbus makes this happen fast and smooth they’re probably ok
I would personally not want to seat in those planes in those 2 months.
I imagine it could help with Airbus marketing.
"We take proactive measures, whereas our competitor only takes action after multiple fatal crashes!"
Imagine an airplane crashed in these 2 months. I bet you would join the chorus and blame them for gross negligence.
There's a great postmortem here about what might have been a similar SEU (single event upset--bitflip) here: https://www.atsb.gov.au/sites/default/files/media/3532398/ao...
We flew too close to the sun
I've noticed that some carriers seem to be suggesting that there might be no impact to flights, but isn't this an immediate grounding for each aircraft until the update is made?
How is it possible that this wouldn't impact upon flight schedules?
The grounding is for 6000 of 11000 A320 series. I believe it's some combination of software and hardware configuration that is at risk.
Thank you; that makes sense. I had the impression it was the entire fleet.
N of 1, but I’m stuck in phoenix overnight because our flight was delayed an hour and a half by airbus maintenance and we missed our connection.
Apparently the fix is reverting to a previous version of the SW (see https://avherald.com/h?article=52f1ffc3&opt=0 )
Curious what a sw change might have done in terms of resiliency. Maybe an incorrect memory setting or some code path that is not calculating things redundantly maybe?
More discussion: https://news.ycombinator.com/item?id=46082296
A friend works at Jetblue. They are scrambling hard to do the updates.
Solar radiation like solar wind, or sunlight? They don’t say.
“Analysis of a recent event”
I presume they mean a Coronal Mass Ejection.
There was a very large CME ten days ago. The NOAA scale had predicted a high likelihood of disruptions, and had specifically suggested that spacecraft and high altitude aircraft could be impacted.
https://www.swpc.noaa.gov/noaa-scales-explanation
https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/prediction/de...
FWIW the "industry sources say" line on the incident is that it occurred on 30 October[1], so further back than ten days ago but of course there may have been other CME incidents at that time.
The European Agency Aviation Safety Agency [2] instruction describes the characteristics of the incident but not the date.
[1] https://www.theguardian.com/business/2025/nov/28/airbus-issu...
[2] https://ad.easa.europa.eu/ad/2025-0268-E
I feel like the event was something that happened to a plane. That said, I wouldn't think sunlight would be penetrating to the chips running the plane.
Gamma rays penetrate everything and have definitely been known to disrupt computer circuits.
Yes, which is why the solar flare scenario makes more sense.
> The grounding of Airbus A320neo aircraft around the world can be traced back to an incident on a JetBlue flight operating a Cancun to New Jersey service on 30 October.
> At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.
> The Thursday flight from Cancun was headed to Newark, New Jersey, when the altitude dropped, leading to the diversion to Tampa International Airport, the US Federal Aviation Administration said in a statement.
> Pilots reported “a flight control issue” and described injuries including a possible “laceration in the head,” according to air traffic audio recorded by LiveATC.net.
> Medical personnel met the passengers and crew on the ground at the airport. Between 15 and 20 people were taken to hospitals with non-life-threatening injuries, said Vivian Shedd, a spokesperson for Tampa Fire Rescue.
> Pablo Rojas, a Miami-based attorney who specialises in aviation law, said a “flight control issue” indicated that the aircraft wasn't responding to the pilots.
https://www.stuff.co.nz/travel/360903363/what-happened-fligh...
> At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.
I’m surprised passengers are allowed to unbuckle for so much of each flight. You can get injured while buckled it, but that seems less common.
The flight attendants/safety card will tell you to stay buckled whenever seated, even if the seat belt sign is off, but many (most?) people will ignore that guidance and stay unbuckled for as long as they are technically allowed.
Only aviation professionals or recovering flight phobics like me who have watched every episode of Air Crash Investigation will take proactive safety measure of their own accord. To normies it's all just a pointless hassle.
People have different priors for bad things that can happen on a plane. If you’ve experienced turbulence you’ll probably buckle up.
I stay buckled and I’m just a “normie” not afraid of flying that understands turbulence doesn’t always happen in a bell curve with some notice. Not sure if that makes you feel any better? :)
This is one of the rare cases where, IMO, it makes sense to use a modified title as you've done here.
Following the Airbus A320 emergency airworthiness action, everyone will be talking about the ELAC (Elevator Aileron Computer) manufactured by Thales, which caused a sudden pitch-down without pilot input on JetBlue 1230 back in October.
So here’s everything you need to know about ELAC.
The ELAC System in the Airbus A320: The Brains Behind Pitch and Roll Control https://x.com/Turbinetraveler/status/1994498724513345637
Better not be "vibe-coded".
I was traveling during this entire ordeal. My flight got delayed by 7 hours. Insane day, just now boarding my flight. American Airlines was in shambles today.