61 comments

  • addaon 7 hours ago

    I’d really, really like to know what microcontroller family this was found on. Assuming that this is a safety processor (lockstep, ECC, etc) it suggests that ECC was insufficient for the level of bit flips they’re seeing, and if the concern is data corruption, not unintended restart, it means it’s enough flips in one word to be undetectable. The environment they’re operating in isn’t that different from everyone else, so unless they ate some margin elsewhere (bad voltage corner or something), this can definitely be relevant to others. Also would be interesting to know if it’s NVM or SRAM that’s affected.
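
    To make "enough flips in one word to be undetectable" concrete, here is a minimal sketch using a single even-parity bit. This is a deliberate simplification and an assumption on my part: real safety MCUs typically use stronger SECDED codes, and nothing here reflects the actual hardware in question. The principle is the same, though: any code has a limit on how many flips per word it can detect. All function names are hypothetical.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the number of set bits in word is odd, else 0."""
    return bin(word).count("1") & 1


def flip_bits(word: int, positions) -> int:
    """Return word with the bits at the given positions inverted."""
    for pos in positions:
        word ^= 1 << pos
    return word


def looks_intact(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit stored alongside it."""
    return parity(word) == stored_parity


original = 0b1011_0010
stored_parity = parity(original)

one_flip = flip_bits(original, [3])       # single upset
two_flips = flip_bits(original, [3, 6])   # two upsets in the same word

single_flip_detected = not looks_intact(one_flip, stored_parity)
double_flip_detected = not looks_intact(two_flips, stored_parity)
```

    With one flipped bit the parity check fails and the fault is caught; with two flipped bits the parity matches again and the corrupted word passes as valid.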

    • RealityVoid 3 hours ago

      See my other comments in the other threads. This does not have EDAC. I was as surprised as you, but it doesn't seem to be an MCU but a composition of several distinct chips. That flight computer was designed in the '90s and updated in 2002 with a new HW variant that does have EDAC. So yes, for this kind of thing, I can buy that a bit flip happened.

      You can see much more data in the report:

      https://www.atsb.gov.au/sites/default/files/media/3532398/ao...

      • Reason077 an hour ago

        The recalled aircraft include the latest A320neo model, some of which are basically brand new. Why would they be using flight computers from before 2002? Why is an old report from 2008, relating to a completely different aircraft type (A330), relevant to the A320 issue today?

        • LiamPowell 26 minutes ago

          > Why would they be using flight computers from before 2002?

          Why would you assume they're not? I don't know about aircraft specifically, but there's plenty of hardware that uses components older than that. Microchip still makes 8051 clones 45 years after the 8051 was released.

        • 4ndrewl 15 minutes ago

          The neo is not brand new - it's an incremental update to the 320. neo refers to New Engine Option

          • rkomorn 13 minutes ago

            They wrote "some of which are basically brand new", which is technically correct.

            They didn't say the design was brand new.

        • RealityVoid 36 minutes ago

          Because the problem isn't just this. It's that the flight controller did not properly decide what to do when the data spiked because of this issue as well.

        • Havoc 34 minutes ago

          > Why would they be using flight computers from before 2002?

          Guessing that using previously certified stuff is an advantage

    • TehCorwiz 6 hours ago
      • russdill 3 hours ago

        Completely unrelated and due to a design failure by the rpi folks.

      • mlyle 4 hours ago

        Yah, but that's a case of the package not being opaque enough.

    • anonymousiam 3 hours ago

      proper SEU mitigation goes far beyond ECC. Satellites fly higher than the A320, and they (at least the ones I know about) use Triple Modular Redundancy: https://en.wikipedia.org/wiki/Triple_modular_redundancy

      https://en.wikipedia.org/wiki/Single-event_upset

      For manned spaceflight, NASA ups N from 3 to 5.

      Other mitigations include completely disabling all CPU caches (with a big performance hit), and continuously refreshing the ECC RAM in background.

      There are also a bunch of hardware mitigations to prevent "latch up" of the digital circuits.
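
      A minimal sketch of the TMR idea, assuming three copies of a value held as plain integers. The bitwise majority vote is the standard technique; the `scrub` helper is a hypothetical software analogue of background refreshing, not how any particular satellite or ECC controller does it.

```python
def vote_bitwise(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(copies: list) -> int:
    """Vote across three copies and rewrite all of them with the voted value.

    Running this periodically keeps a single upset from lingering long
    enough for a second upset to hit the same bit in another copy.
    """
    voted = vote_bitwise(copies[0], copies[1], copies[2])
    for i in range(3):
        copies[i] = voted
    return voted


value = 0xCAFE
copies = [value ^ 0x0001, value ^ 0x0200, value]  # two copies hit, in different bits
voted = scrub(copies)
```

      Because the vote is per bit, upsets in different bit positions of different copies are all corrected. Two copies flipped in the same bit position would outvote the good one, which is why the refresh interval matters.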

      • rkagerer 2 hours ago

        In redundant systems like these, how do you avoid the voting circuit becoming a single point of failure?

        Eg. I could understand if each subsystem had its own actuators and they were designed so any 3 could aerodynamically override the other 2, but I don't think that's how it works in practice.

        • AlphaSite an hour ago

          Voting can be coordinated between the N CPUs rather than by an external arbiter (even making that arbiter redundant eventually requires the CPUs to decide what to do if they disagree, so you may as well handle it internally).

        • exe34 an hour ago

          if the issue is radiation bit flipping, you could make that part overly shielded?

          • baq 6 minutes ago

            Define ‘overly’. You can submerge it in a sphere of water, but that’s going to be expensive to launch.

    • jayanmn 6 hours ago

      I am worried about a software fix for what looks like hardware problem.

      • afavour 5 hours ago

        It could be as simple as storing multiple copies of the relevant data and adding a checksum, something like that.

        Hardware fix is the ultimate solution but it might be possible to paper over with software.
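
        A rough sketch of that idea, assuming the value is a single float and using CRC-32 from the standard library as the checksum. The record layout and the names `protect` and `recover` are made up for illustration; nothing here reflects the actual ELAC fix.

```python
import struct
import zlib


def protect(value: float, copies: int = 3) -> list:
    """Store `copies` independent records, each holding the value plus its CRC-32."""
    payload = struct.pack("<d", value)                      # 8-byte little-endian double
    record = payload + struct.pack("<I", zlib.crc32(payload))
    return [bytearray(record) for _ in range(copies)]


def recover(records: list):
    """Return the value from the first record whose checksum still matches, else None."""
    for record in records:
        payload = bytes(record[:8])
        stored_crc = struct.unpack("<I", bytes(record[8:12]))[0]
        if zlib.crc32(payload) == stored_crc:
            return struct.unpack("<d", payload)[0]
    return None  # every copy is corrupt: caller must fall back to a safe state


records = protect(2.5)
records[0][3] ^= 0x04          # simulate a bit flip in the first copy
recovered = recover(records)   # first copy is rejected, second copy is used
```

        The checksum tells you which copy to distrust; the extra copies give you something to fall back on. If every copy fails its check, the function returns None rather than guessing.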

      • themerone 5 hours ago

        Gracefully handling hardware faults is a software problem. The Air France Flight 447 crash was the result of bad software and bad hardware.

        • f1shy 2 hours ago

          And bad pilot training, if I recall correctly.

        • vel0city 5 hours ago

          I'm reminded of the Apollo moon landing where the computer was rapidly rebooting and being in an OK-ish state to continue to be useful almost immediately

          • CrossVR 3 hours ago

            It wasn't rebooting, it ran out of memory and started aborting lower priority tasks. It was an excellent example of robust programming in the face of unexpected usage scenarios.

            • f1shy 2 hours ago

              Off topic for the thread, but on topic for the comment: I was working on an automotive project 3 years ago. It was all about safety, and one hypothesis was that the processor could get overloaded. I was astonished that no one in a group of 20 “senior SW architects” had any idea about the concept of load shedding. The proposed solution was “in that case, reboot”.

              Mind you whatever came out of that project is rolling on the street today.
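
              For anyone unfamiliar with the term, load shedding can be sketched in a few lines. This is a toy model under my own assumptions (a fixed per-cycle time budget, static priorities, hypothetical task names), not what the Apollo computer or any automotive stack actually does.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int    # lower number = more important
    cost_ms: float   # estimated execution time per cycle


def shed_load(tasks, budget_ms: float):
    """Keep tasks in strict priority order until the budget is exhausted.

    Once one task no longer fits, it and everything less important is shed,
    so a low-priority task never runs in place of a higher-priority one.
    """
    kept, shed = [], []
    used = 0.0
    for task in sorted(tasks, key=lambda t: t.priority):
        if not shed and used + task.cost_ms <= budget_ms:
            kept.append(task)
            used += task.cost_ms
        else:
            shed.append(task)
    return kept, shed


tasks = [
    Task("logging", priority=3, cost_ms=5.0),
    Task("guidance", priority=1, cost_ms=8.0),
    Task("display", priority=2, cost_ms=6.0),
]
kept, shed = shed_load(tasks, budget_ms=15.0)
```

              Under overload the critical work keeps running and only the least important tasks are dropped, instead of everything being lost to a reboot.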

      • kachapopopow 5 hours ago

        Software fixes are totally fine, since the chance of two redundant pairs failing within the time it takes to correct these errors has more zeros than there are atoms in the universe. (Each pilot has a redundant computer, and because there are two pilots there are two redundant pairs.)

  • rene_d 24 minutes ago

    The Aviation Herald has more technical details:

    https://avherald.com/h?article=52f1ffc3&opt=0

  • qaq 7 hours ago

    Has BOFH-esque vibes: "It's Friday, so I get into work early, before lunch even. The phone rings. Shit!

    I turn the page on the excuse sheet. "SOLAR FLARES" stares out at me. I'd better read up on that..."

  • pyb 3 hours ago

    The aerospace industry has had countermeasures in place against bit-flips for a long time, oftentimes thanks to redundancy.

    Airbus/Thales's fix in this case appears to add more error checking, and to restart the misbehaving component. https://bea.aero/fileadmin/user_upload/BEA2024-0404-BEA2025-...

    ("internal monitoring of the component at the origin of the failure; - a mechanism that automatically restarts this component as soon as the failure is detected")
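
    The general pattern described there (supervise a component, restart it when a failure is detected) could look roughly like the sketch below. Everything in it is hypothetical: the `Component` class, the range check used as the failure detector, and the restart limit are my assumptions for illustration, not details from the BEA document or the Airbus/Thales implementation.

```python
class Component:
    """Stand-in for a unit that can get stuck in a faulty state."""

    def __init__(self):
        self.faulty = False
        self.restarts = 0

    def compute(self, x: float) -> float:
        # A faulty component returns garbage (NaN here); a healthy one doubles x.
        return float("nan") if self.faulty else 2.0 * x

    def restart(self) -> None:
        # Restarting clears the faulty state.
        self.faulty = False
        self.restarts += 1


def supervised_compute(component, x, lo, hi, max_restarts=1):
    """Run the component, check the output is plausible, restart and retry if not."""
    for attempt in range(max_restarts + 1):
        y = component.compute(x)
        if lo <= y <= hi:          # NaN fails this comparison, so it is rejected
            return y
        if attempt < max_restarts:
            component.restart()
    return None                    # still implausible: caller must use a fallback


unit = Component()
unit.faulty = True                 # simulate an upset that corrupts the component
result = supervised_compute(unit, 3.0, lo=-10.0, hi=10.0)
```

    Here the first output is rejected, the component is restarted once, and the retry succeeds. If the retries run out, the function returns None so the caller can fall back rather than act on a bad value.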

  • joelthelion 4 hours ago

    Do they really need to ground the entire fleet for that? One incident for ten thousand planes in the air for years. I'd think that giving airlines two months to fix it would be sufficient.

    • miyuru 7 minutes ago

      this is Airbus, not Boeing

    • mrpippy 3 hours ago

      I don’t believe it’s been years, only the latest firmware version for the ELAC is affected. The fix is to downgrade (or replace hardware with a unit running earlier firmware)

    • jfoster 2 hours ago

      I wonder who eats the cost of this? I presume it's the airlines.

      So the immediate cost to Airbus of grounding the fleet is quite low, whilst the downside of not grounding the fleet (risk of incident, lawsuits, reputation, etc.) could be substantial.

      • Havoc 29 minutes ago

        Yeah should be airlines

        It sounds like the fix is fairly quick so probably not as expensive as the MAX multi-month groundings

        I doubt anyone is going to sue. Repairs etc are a part of life when owning aircraft. So as long as Airbus makes this happen fast and smooth they’re probably ok

    • f1shy 2 hours ago

      I would personally not want to sit in those planes in those 2 months.

    • kijin 3 hours ago

      I imagine it could help with Airbus marketing.

      "We take proactive measures, whereas our competitor only takes action after multiple fatal crashes!"

      • brabel 28 minutes ago

        Imagine an airplane crashed in these 2 months. I bet you would join the chorus and blame them for gross negligence.

  • 65a 3 hours ago

    There's a great postmortem about what might have been a similar SEU (single event upset, i.e. a bit flip) here: https://www.atsb.gov.au/sites/default/files/media/3532398/ao...

  • minitoar 4 hours ago

    We flew too close to the sun

  • jfoster 5 hours ago

    I've noticed that some carriers seem to be suggesting that there might be no impact to flights, but isn't this an immediate grounding for each aircraft until the update is made?

    How is it possible that this wouldn't impact upon flight schedules?

    • icegreentea2 5 hours ago

      The grounding is for 6000 of 11000 A320 series. I believe it's some combination of software and hardware configuration that is at risk.

      • jfoster 3 hours ago

        Thank you; that makes sense. I had the impression it was the entire fleet.

    • arrel 5 hours ago

      N of 1, but I’m stuck in Phoenix overnight because our flight was delayed an hour and a half by Airbus maintenance and we missed our connection.

  • raverbashing 31 minutes ago

    Apparently the fix is reverting to a previous version of the SW (see https://avherald.com/h?article=52f1ffc3&opt=0 )

    Curious what a SW change might have done in terms of resiliency. Perhaps an incorrect memory setting, or some code path that is not calculating things redundantly?

  • ChrisArchitect 11 hours ago
  • owenthejumper 6 hours ago

    A friend works at Jetblue. They are scrambling hard to do the updates.

  • op00to 7 hours ago

    Solar radiation like solar wind, or sunlight? They don’t say.

    • mr_toad 7 hours ago

      “Analysis of a recent event”

      I presume they mean a Coronal Mass Ejection.

      • bparsons 7 hours ago

        There was a very large CME ten days ago. The NOAA scale had predicted a high likelihood of disruptions, and had specifically suggested that spacecraft and high altitude aircraft could be impacted.

        https://www.swpc.noaa.gov/noaa-scales-explanation

        https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/prediction/de...

      • fwip 7 hours ago

        I feel like the event was something that happened to a plane. That said, I wouldn't think sunlight would be penetrating to the chips running the plane.

        • dtagames 7 hours ago

          Gamma rays penetrate everything and have definitely been known to disrupt computer circuits.

          • fwip 3 hours ago

            Yes, which is why the solar flare scenario makes more sense.

        • awesome_dude 6 hours ago

          > The grounding of Airbus A320neo aircraft around the world can be traced back to an incident on a JetBlue flight operating a Cancun to New Jersey service on 30 October.

          > At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.

          > The Thursday flight from Cancun was headed to Newark, New Jersey, when the altitude dropped, leading to the diversion to Tampa International Airport, the US Federal Aviation Administration said in a statement.

          > Pilots reported “a flight control issue” and described injuries including a possible “laceration in the head,” according to air traffic audio recorded by LiveATC.net.

          > Medical personnel met the passengers and crew on the ground at the airport. Between 15 and 20 people were taken to hospitals with non-life-threatening injuries, said Vivian Shedd, a spokesperson for Tampa Fire Rescue.

          > Pablo Rojas, a Miami-based attorney who specialises in aviation law, said a “flight control issue” indicated that the aircraft wasn't responding to the pilots.

          https://www.stuff.co.nz/travel/360903363/what-happened-fligh...

          • lostlogin 6 hours ago

            > At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.

            I’m surprised passengers are allowed to unbuckle for so much of each flight. You can get injured while buckled in, but that seems less common.

            • MaxfordAndSons 5 hours ago

              The flight attendants/safety card will tell you to stay buckled whenever seated, even if the seat belt sign is off, but many (most?) people will ignore that guidance and stay unbuckled for as long as they are technically allowed.

              Only aviation professionals or recovering flight phobics like me who have watched every episode of Air Crash Investigation will take proactive safety measures of their own accord. To normies it's all just a pointless hassle.

              • baq a minute ago

                People have different priors for bad things that can happen on a plane. If you’ve experienced turbulence you’ll probably buckle up.

              • sailfast 3 hours ago

                I stay buckled and I’m just a “normie” not afraid of flying that understands turbulence doesn’t always happen in a bell curve with some notice. Not sure if that makes you feel any better? :)

  • jMyles 11 hours ago

    This is one of the rare cases where, IMO, it makes sense to use a modified title as you've done here.

  • kappi 5 hours ago

    Following the Airbus A320 emergency airworthiness action, everyone will be talking about the ELAC (Elevator Aileron Computer) manufactured by Thales, which caused a sudden pitch-down without pilot input on JetBlue 1230 back in October.

    So here’s everything you need to know about ELAC.

    The ELAC System in the Airbus A320: The Brains Behind Pitch and Roll Control https://x.com/Turbinetraveler/status/1994498724513345637

  • rvz 4 hours ago

    Better not be "vibe-coded".

  • viiralvx 5 hours ago

    I was traveling during this entire ordeal. My flight got delayed by 7 hours. Insane day, just now boarding my flight. American Airlines was in shambles today.