150 comments

  • crote 15 hours ago

    I strongly recommend watching/reading the entire report, or the summary by Sal Mercogliano of What's Going On In Shipping [0].

    Yes, the loose wire was the immediate cause, but there was far more going wrong here. For example:

    - The transformer switchover was set to manual rather than automatic, so it didn't automatically fail over to the backup transformer.

    - The crew did not routinely train transformer switchover procedures.

    - The two generators were both using a single non-redundant fuel pump (which was never intended to supply fuel to the generators!), which did not automatically restart after power was restored.

    - The main engine automatically shut down when the primary coolant pump lost power, rather than using an emergency water supply or letting it overheat.

    - The backup generator did not come online in time.

    It's a classic Swiss Cheese model. A lot of things had to go wrong for this accident to happen. Focusing on that one wire isn't going to solve all the other issues. Wires, just like all other parts, will occasionally fail. One wire failure should never have caused an incident of this magnitude. Sure, there should probably be slightly better procedures for checking the wiring, but next time it'll be a failed sensor, actuator, or controller board.

    If we don't focus on providing and ensuring a defense-in-depth, we will sooner or later see another incident like this.

    [0]: https://www.youtube.com/watch?v=znWl_TuUPp0

    • jacquesm 4 hours ago

      The problem is that there are a thousand merchant marine vessels operating right now that are all doing great - until the next loose wire. The problem is that nobody knows about that wire and it worked fine on the last trip. The other systems are all just as marginal as they were on the 'Dali' but that one shitty little wire is masking that.

      Running a 'tight ship' is great when you have a budget to burn on excellent quality crew. But shipping is so incredibly cut-throat that the crew members make very little money, are effectively modern slaves and tend to carry responsibilities way above their pay grade. They did what they could, and more than that, and for their efforts they were rewarded with what effectively amounted to house arrest while the authorities did their thing. The NTSB of course will focus on the 'hard' causes. But you can see a lot of frustration shine through towards the owners who even in light of the preliminary findings had changed absolutely nothing on the rest of their fleet.

      The recommendation to inspect the whole ship with an IR camera had me laughing out loud. We're talking about a couple of kilometers of poorly accessible duct work and cabinets. You can do that while in port, but while you're in port most systems are idle or near idle and so you won't ever find an issue like this until you are underway, when vibration goes up and power consumption shoots up compared to being in port.

      There is no shipping company that is effectively going to do a sea trial after every minor repair, usually there is a technician from some supplier that boards the vessel (often while it is underway), makes some fix and then goes off-board again. Vessels that are not moving are money sinks so the goal is to keep turnaround time in port to an absolute minimum.

      What should really amaze you is how few of these incidents there are. In spite of this being a regulated industry it is first and foremost an oversight failure; if the regulators had more budget and more manpower there would maybe be a stronger drive to get things technically in good order (resist temptation: 'shipshape').

      • gosub100 8 minutes ago

        It's a tangent but I don't understand why the dock workers can unionize and earn livable wages but the crew cannot.

        • padjo 4 minutes ago

          The dock can’t move to a jurisdiction that is less union friendly.

      • amelius 2 hours ago

        You can also look at the problem from the perspective of the bridge. Why was it possible that a ship took it down? Motors can fail ...

        • richardwhiuk 10 minutes ago

          It’s not realistically plausible to build bridges that won’t be brought down by that size of ship

    • Aurornis 15 hours ago

      Thanks for the summary for those of us who can't watch video right now.

      There are so many layers of failures that it makes you wonder how many other operations on those ships are only working because those fallbacks, automatic switchovers, emergency supplies, and backup systems save the day. We only see the results when all of them fail and the failure happens to result in some external problem that means we all notice.

      • arjie 15 hours ago

        It seems to just be standard "normalization of deviance" to use the language of safety engineering. You have 5 layers of fallbacks, so over time skipping any of the middle layers doesn't really have anything fail. So in time you end up with a true safety factor equal only to the last layer. Then that fails and looking back "everything had to go wrong".

        As Sidney Dekker (of Understanding Human Error fame) says: Murphy's Law is wrong - everything that can go wrong will go right. The problem arises from the operators all assuming that it will keep going right.

        I remember reading somewhere that part of Qantas's safety record came from the fact that at one time they had the highest number of minor issues. In some sense, you want your error detection curve to be smooth: as you get closer to catastrophe, your warnings should get more severe. On this ship, it appeared everything was A-OK till it bonked a bridge.
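
        A rough sketch of what a "smooth" detection curve could look like: the alarm gets louder each time a fallback layer is lost, instead of staying quiet until the last one goes. The function name, the layer count, and the thresholds are all hypothetical, not anything from the NTSB report.

```python
def alert_level(healthy_layers: int, total_layers: int) -> str:
    """Map the number of working fallback layers to an alarm severity.

    Hypothetical thresholds, purely for illustration.
    """
    if not 0 <= healthy_layers <= total_layers:
        raise ValueError("healthy_layers must be between 0 and total_layers")
    if healthy_layers == total_layers:
        return "ok"
    if healthy_layers == 0:
        return "failure"
    if healthy_layers == 1:
        return "critical"  # one layer left: the next fault is the accident
    return "warning"


# With 5 layers, every lost layer is visible long before the last one fails.
for healthy in range(5, -1, -1):
    print(healthy, alert_level(healthy, 5))
```

        The point of the sketch is only that "down to one layer" gets its own, louder state rather than looking the same as "all good".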

        • bombcar 14 hours ago

          This is the most pertinent thing to learn from these NTSB crash investigations - it's not what went wrong at the final disaster, but all the things that went wrong that didn't detect that they were down to one layer of defense.

          Your car engaging auto brake to prevent a collision should be less of a "whew, glad that didn't happen" and more of an "oh shit, I need to work on paying attention more."

          • aidenn0 7 hours ago

            I had to disable the auto-brake from RCT[1] sensors because of too many false-positives (like 3 a week) in my car.

            1: rear-cross-traffic i.e. when backing up and cars are coming from the side.

          • raverbashing 5 hours ago

            Yes and having 3 O-rings doesn't mean you can have one frozen solid "just this time"

          • dmurray 12 hours ago

            Why then does the NTSB point blame so much at the single wiring issue? Shouldn't they have the context to point to the 5 things that went wrong in the Swiss cheese and not pat themselves on the back with having found the almost-irrelevant detail of

            > Our investigators routinely accomplish the impossible, and this investigation is no different...Finding this single wire was like hunting for a loose rivet on the Eiffel Tower.

            In the software world, if I had an application that failed when a single DNS query failed, I wouldn't be pointing the blame at DNS and conducting a deep dive into why this particular query timed out. I'd be asking why a single failure was capable of taking down the app for hundreds or thousands of other users.
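
            A minimal sketch of the kind of thing I mean, assuming a retry plus a last-known-good cache is acceptable for the app in question. `resolve_with_fallback`, the `cache` dict, and the injectable `lookup` parameter are made up for illustration; the default lookup is the standard library's `socket.gethostbyname`.

```python
import socket
from typing import Callable, Dict


def resolve_with_fallback(
    host: str,
    cache: Dict[str, str],
    lookup: Callable[[str], str] = socket.gethostbyname,
    attempts: int = 3,
) -> str:
    """Resolve host, retrying and then falling back to the last good answer."""
    for _ in range(attempts):
        try:
            address = lookup(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        cache[host] = address  # remember the last known-good address
        return address
    if host in cache:
        return cache[host]  # stale, but better than taking the app down
    raise LookupError(f"could not resolve {host!r} and no cached address")
```

            One failed query now costs a retry, and a total resolver outage costs a stale answer, rather than either one taking down the app.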

            • plorg 12 hours ago

              That seems like a difference between the report and the press release. I'm sure it doesn't help that the current administration likes quick, pat answers.

              The YouTube animation they published notes that this also wasn't just one wire - they found many wires on the ship that were terminated and labeled in the same (incorrect) way, which points to an error at the ship builder and potentially a lack of adequate documentation or training materials from the equipment manufacturer, which is why WAGO received mention and notice.

              • da_chicken an hour ago

                > I'm sure it doesn't help that the current administration likes quick, pat answers.

                Oh, the wire was blue?

                In all seriousness, listing just the triggering event in the headline isn't that far out of line. Like the Titanic hit an iceberg, but it was also traveling faster than it should in spite of iceberg warnings, and it did so overloaded and without adequate lifeboats, and it turns out there were design flaws in the hull. But the iceberg still gets first billing.

              • bombcar 11 hours ago

                It’s also immediately actionable and other similar ships can investigate their wires

            • toast0 12 hours ago

              The faulty wire is the root cause. If it didn't trigger the sequence of events, all of the other things wouldn't have happened. And it's kind of a tricky thing to find, so that's an exciting find.

              The flushing pump not restarting when power resumed did also cause a blackout in port the day before the incident. But you know, looking into why you always have two blackouts when you have one is something anybody could do; open the main system breaker, let the crew restore it and that flushing pump will likely fail in the same way every time... but figuring out why and how the breaker opened is neat, when it's not something obvious.

              • nothercastle 10 hours ago

                Operators always like to just clear the fault and move on; they are under extremely high pressure to make schedule and have little incentive to work safely.

      • crote 15 hours ago

        Oh, it gets even worse!

        The NTSB also had some comments on the ship's equivalent of a black box. Turns out it was impossible to download the data while it was still inside the ship, the manufacturer's software was awful and the various agencies had a group chat to share 3rd party software(!), the software exported thousands of separate files, audio tracks were mixed to the point of being nearly unusable, and the black box stopped recording some metrics after power loss "because it wasn't required to" - despite the data still being available.

        At least they didn't have anything negative to say about the crew: they reacted timely and adequately - they just didn't stand a chance.

        • nothercastle 10 hours ago

          It’s pretty common for black boxes to be load shed during an emergency. Kind of funny how that was allowed for a long time.

        • MengerSponge 9 hours ago

          "they reacted timely and adequately" and yet: they're indefinitely restricted (detained isn't the right word, but you get it) to Baltimore, while the ship is free to resume service.

      • haddonist 8 hours ago

        One of the things Sal Mercogliano stressed is that the crew (and possibly other crews of the same line) modified systems in order to save time.

        Rather than doing the process of purging high-sulphur fuel that can't be used in USA waters, they had it set so that some of the generators were fed from USA-approved fuel, resulting in redundancy & automatic failover being compromised.

        It seems probable that the wire failure would not have caused catastrophic overall loss of power if the generators had been in the normal configuration.

      • dboreham 11 hours ago

        Also the zeroth failure mode: someone built a bridge that will collapse if any of the many many large ships that sail beneath it can't steer itself with high precision.

        • myself248 an hour ago

          Right? There's an artificial island in that very harbor, which could be rammed by similar ships all day and give nary a fuck. It's called Fort Carroll and it was built in the *1850s*.

          Why the bridge piers weren't set into artificial islands, I can't fathom. Sure. Let's build a bridge across a busy port but not make it ship-proof. The bridge was built in the 1970s, had they forgotten how to make artificial islands?

        • foobar1962 10 hours ago

          Ships were a lot smaller when the bridge was designed and built.

    • renhanxue 14 hours ago

      The fuel pump not automatically restarting on power loss may actually have been an intentional safety feature to prevent scenarios like pumping fuel into a fire in or around the generators. Still part of the Swiss cheese model, of course.

      • crote 14 hours ago

        It wasn't. They were feeding generators 1 & 2 with the pump intended for flushing the lines while switching between different fuel types.

        The regular fuel pumps were set up to automatically restart, which is why a set of them came online to feed generator 3 (which automatically spun up after 1 & 2 failed, and wasn't tied to the fuel-line-flushing pump) after the second blackout.

    • ChrisMarshallNY 14 hours ago

      I have found that 99% of all network problems are bad wires.

      I remember that the IT guys at my old company used to immediately throw out every ethernet cable and replace them with ones right out of the bag, first thing.

      But these ships tend to be houses of cards. They are not taken care of properly, and run on a shoestring budget. Many of them look like floating wrecks.

      • gerdesj 12 hours ago

        If I see a RJ45 plug with a broken locking thingie, or bare wires (not just bare copper - any internal wire), I chop the plug off.

        If I come across a CATx (solid core) cable being used as a really long patch lead then I lose my shit or perhaps get a backbox and face plate and modules out along with a POST tool.

        I don't look after floating fires.

      • leoedin 3 hours ago

        That's true for almost all electronics. I worked on robotic arms for a few years - if things broke it was always the wiring (well, to be precise - the connectors).

      • jmonty900 13 hours ago

        I recently had a home network outage. The last thing I tested was the in-wall wiring because I just didn't think that would be the cause. It was. Wiring fails!

        • nrhrjrjrjtntbt an hour ago

          Oh yeah had outages recently. Turned out to be corroded connector to box in the street. Not a wire per-se but close.

      • potato3732842 12 hours ago

        If I had a nickel for every time someone clobbered some critical connectivity with an ill-advised switch configuration I wouldn't have to work for a living.

        And the physical layer issues I do see are related to ham fisted people doing unrelated work in the cage.

        Actual failures are pretty damn rare.

    • kfarr 13 hours ago

      Another case study to add to the maritime chapter of this timeless classic: https://www.amazon.com/Normal-Accidents-Living-High-Risk-Tec...

      Like you said (and illustrated well in the book) it's never just 1 thing; these incidents happen when multiple systems interact and often reflect a disinvestment in comprehensive safety schemes.

    • rolph 10 hours ago

      I've been in an environment like that.

      "Nuisance" issues like that are deferred because they are not really causing a problem, so maintenance spends time on problems with things that make money, rather than on what some consider spit 'n' polish on things that have no prior failures.

    • FridayoLeary 12 hours ago

      Just insane how much criminal negligence went on. Even boeing hardly comes close. What needs to change is obviously a major review of how ships are allowed to operate near bridges and other infrastructure. And far stricter safety standards like aircraft face.

    • pstuart 15 hours ago

      Hopefully the lesson from this will be received by operators: it's way cheaper to invest in personnel, training, and maintenance than to let the shit hit the fan.

      • stackskipton 15 hours ago

        Why? It's cost them 100M (https://www.justice.gov/archives/opa/pr/us-reaches-settlemen...) but rebuilding the bridge is going to be 5.2 billion, so if gundecking all this maintenance for 20+ years has saved more than 100M, they will do it again.

        • xp84 14 hours ago

          From your article - this answered a question I had:

          > The settlement does not include any damages for the reconstruction of the Francis Scott Key Bridge. The State of Maryland built, owned, maintained, and operated the bridge, and attorneys on the state’s behalf filed their own claim for those damages. Pursuant to the governing regulation, funds recovered by the State of Maryland for reconstruction of the bridge will be used to reduce the project costs paid for in the first instance by federal tax dollars.

          • Barbing 11 hours ago

            So was the bridge self-insured?

        • stevenjgarner 13 hours ago

          Isn't there a big liability insurance payout on this towards the 5.2 Billion, and if so won't the insurer be more motivated to mandate compliance?

          • nothercastle 10 hours ago

            Yes the insurer will likely be able to charge more.

        • toast0 14 hours ago

          The vessel owner may possibly be able to recover some of that from the manufacturer, as the wiring was almost certainly a manufacturing error, and maybe some of the configurations that continued the blackout were manufacturer choices as well.

          • potato3732842 12 hours ago

            At the end of the day we all just pay for it in terms of insurance costs priced into our goods.

            • usefulcat 10 hours ago

              What would be a better solution?

              • potato3732842 2 hours ago

                Well the current way involves paying for a bunch of non-value producing busy work by insurers, lawyers and a ton of expert parties relevant to the litigation process.

                There's probably some combination of "everyone just posts up a bond into a fund to cover this stuff" plus a really high deductible on payout that basically deletes all those expensive man hours without causing any increased incentive for carnage.

                Events like these are a VERY rare exception compared to all the shipping activities that go on in an uneventful manner. Doesn't take a genius to do the napkin math here. Whatever the solution is probably ought to try to avoid expending resources in the base case where everything is fine.

              • mjevans 7 hours ago

                Regulations to require work is done correctly the first time. Also inspections.

                I like a government that pays workers to look out for my safety.

              • DANmode 4 hours ago

                Informed consumers who actually walk, ever.

              • cco 7 hours ago

                A punishment that was felt by decision makers but was unable to be offloaded as a cost to the public, except maybe in the form of rent. Prison :)

            • genter 11 hours ago

              But it's important to "punish" (via punitive fines) the right people, so that they will put some effort into not making that mistake again.

        • lazide 15 hours ago

          Actually, to be even more cynical….

          If everyone saved $100M by doing this and it only cost one shipper $100M, then of course everyone else would do it and just hope they aren’t the one who has bad enough luck to hit the bridge.

          And statistically, almost all of them will be okay!
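
          Napkin math for the above, as a sketch. The $100M figures are the ones from this thread; the number of operators cutting the same corners is a made-up assumption, as is the idea that exactly one of them gets caught.

```python
def expected_net_saving(saving: float, penalty: float, p_incident: float) -> float:
    """Expected gain per operator from skipping maintenance."""
    return saving - p_incident * penalty


SAVING = 100e6    # per-operator saving, figure used in this thread
PENALTY = 100e6   # settlement paid by the one unlucky operator
OPERATORS = 1000  # hypothetical number of operators cutting the same corners

# One incident across the whole group: each operator's chance is 1 / OPERATORS.
net = expected_net_saving(SAVING, PENALTY, 1 / OPERATORS)
print(f"expected net saving per operator: ${net:,.0f}")
```

          Under those assumptions the expected penalty is a rounding error next to the saving, which is the whole problem.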

      • dv_dt 4 hours ago

        I imagine every vessel has its own corporation that owns it which would declare insolvency if this kind of thing happens

      • nothercastle 10 hours ago

        It’s not, though. These situations are extremely rare. When they happen, they just close the company and shed liability.

        • dv_dt 4 hours ago

          Yup, nobody wants to admit that regulations and inspections are a reasonable solution

  • psunavy03 16 hours ago

    Although I was never named to a mishap board, my experience in my prior career in aviation is that the proper way to look at things like this is that while it is valuable to identify and try to fix the ultimate root cause of the mishap, it's also important to keep in mind what we called the "Swiss cheese model."

    Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up. Otherwise, something happens (planned or otherwise) that allows you to dodge the bullet this time.

    Meaning a) it's important to identify places where firebreaks and redundancies can be put in place to guard against failures further upstream, and b) it's important to recognize times when you had a near-miss, and still fix those root causes as well.

    Which is why the "retrospectives are useless" crowd spins me up so badly.

    • drivers99 16 hours ago

      > it's important to recognize times when you had a near-miss, and still fix those root causes as well.

      I mentioned this principle to the traffic engineer when someone almost crashed into me because of a large sign that blocked their view. The engineer looked into it and said the sight lines were within spec, but just barely, so they weren't going to do anything about it. Technically the person who almost hit me could have pulled up to where they had a good view, and looked both ways as they were supposed to, but that is relying on one layer of the cheese to fix a hole in another, to use your analogy.

      • kennethrc 16 hours ago

        Likewise with decorative hedges and other gardenwork; your post brought to mind this one hotel I stay at regularly, where a hedge is high enough and close enough to the exit that you have to nearly pull into the street to see if there are oncoming cars. I've mentioned to the FD that it's gonna get someone hurt one day, yet they've done nothing about it for years now.

        • avidiax 15 hours ago

          Send certified letters to the owner of the hedge and whatever government agency would enforce rules about road visibility. That puts them "on notice" legally, so that they can be held accountable for not enforcing their rules or taking precautions.

          • crote 15 hours ago

            The problem is that they are legally doing nothing wrong. Everything is done according to the rules, so they can't be held accountable for not following them. After all, they are taking all reasonable precautions, what more could be expected of them?

            The fact that the situation on the ground isn't safe in practice is irrelevant to the law. Legally the hedge is doing everything, so the blame falls on the driver. At best a "tragic accident" will result in a "recommendation" to whatever board is responsible for the rules to review them.

            • bombcar 14 hours ago

              All that applies for criminal cases, but if a civil lawsuit is started and evidence is presented to the jury that the parties being sued had been warned repeatedly that it would eventually occur, it can be quite spicy.

              Which is why if you want to be a bastard, you send it to the owners, the city, and both their insurance agencies.

              • ahmeneeroe-v2 13 hours ago

                This is stupid. Unless you happen to be the one that crashes it won't be a factor at all.

                • bombcar 11 hours ago

                  Discovery’s a bitch which is why they settle.

                • thaumasiotes 11 hours ago

                  Well, it could be; you can watch out for accidents at that intersection and offer to support a case arising from one.

                  If your goal is to get the intersection fixed, this is a reasonable thing to do.

            • mrandish 9 hours ago

              @Bombcar is correct. Once they've been legally notified of the potential issue, they have increased exposure to civil liability. Their lawyers and insurance company will strongly encourage them to just fix it (assuming it's not a huge cost to trim back the stupid hedge). A registered letter can create enough impetus to overcome organizational inertia. I've seen it happen.

              • purple_turtle an hour ago

                In my experience (European country) even email with magic words "clear risk to health and life" can jumpstart the process.

      • loeg 13 hours ago

        People love to rag on Software Engineers for not being "real" engineers, whatever that means, but American "Traffic Engineers" are by far the bigger joke of a profession. No interest in defense in depth, safety, or tradeoffs. Only "maximize vehicular traffic flow speed."

        • windows_hater_7 11 hours ago

          In this case, being a "traffic engineer" with the ability to sign engineering plans means graduating from an ABET-accredited engineering program, passing both the Fundamentals of Engineering exam and the Principles & Practice of Engineering exam, being licensed as a professional engineer, and passing the Professional Traffic Operations Engineer exam. I think they do a little more than "maximize vehicular traffic flow."

          • rocqua 5 hours ago

            Certifications prove that you studied, and are smart and/or diligent enough to pass an exam.

            If those certifications teach you bad approaches, then they don't help competence. In fact, they can get people stuck in bad approaches, because it's what they have been taught by the rigorous and unquestionable system. Especially when your job security comes from having those certifications, it becomes harder to say that the certifications teach wrong things.

            It seems quite likely from the outside that this is what happened to US traffic engineering. Specifically that they focus on making it safe to drive fast and with the extra point that safe only means safe for drivers.

            This isn't just based on judging their design outcomes to be bad. It's also in the data comparing the US to other countries. This is visible in vehicle deaths per capita, but mostly in pedestrian deaths per capita. Correcting for miles driven makes the vehicle deaths in the US merely high. But correcting for miles walked (not available data) likely pushes pedestrian deaths much higher. Which illustrates that a big part of the safety problem is prioritizing driving instead of encouraging and protecting other modes of transportation. (And then still doing below average on driving safety.)

          • loeg 10 hours ago

            > I think they do a little more than "maximize vehicular traffic flow."

            You would be mistaken. Traffic engineers are responsible for far, far more deaths than software engineers.

      • Mawr 11 hours ago

        To be fair, there is no way to fix this in the general case—large vehicles and other objects may obstruct your view also. Therefore, you have to learn to be cognisant of line-of-sight blockers and to deal with them anyway. So for a not-terrible driver, the only problem that this presents is that they have to slow down. Not ideal, but not a safety issue per se.

        That we allow terrible drivers to drive is another matter...

        • lmm 8 hours ago

          > there is no way to fix this in the general case—large vehicles and other objects may obstruct your view also

          Vehicles are generally temporary. It is actually possible to ensure decent visibility at almost all junctions, as I found when I moved to my current country - it just takes a certain level of effort.

    • Aurornis 15 hours ago

      > Which is why the "retrospectives are useless" crowd spins me up so badly.

      When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines. They're done every sprint (or week, if you're unlucky) and even if nothing happens the whole team might have to sit for an hour and come up with things to say to fill the air.

      In software, the analysis following a mishap is usually called a post-mortem. I haven't seen many complaints that those have no value. Those are usually highly appreciated. Though sometimes the "blameless post-mortem" people take the term a little too literally and try to avoid exploring useful failures if they might cause uncomfortable conversations about individuals making mistakes or even dropping the ball.

      • burnstek 13 hours ago

        Post mortems are absolutely key in creating process improvements. If you think about an organization's most effective processes, they are likely just representations of years of fixed errors.

        Regarding blamelessness, I think it was W. Edwards Deming who emphasized the importance of blaming process over people, which is always preferable, but it's critical for individuals to at least be aware of their role in the problem.

      • potato3732842 11 hours ago

        >When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines.

        You mean to tell me that this comment section where we spew buzzwords and reference the same tropes we do for every "disaster" isn't performative?

      • xp84 14 hours ago

        Agree. I am obligated to run those retrospectives and the SNR is very poor.

        It is nice though (as long as there isn't anyone in there that the team is afraid to be honest in front of), when people can vent about something that has been pissing them off, so that I as their manager know how they feel. But that happens only about 15-20% of the time. The rest is meaningless tripe like "Glad Project X is done" and "$TECHNOLOGY sucks" and "Good job to Bob and Susan for resolving the issue with the Acme account"

    • astrocat 16 hours ago

      this is essentially the gist of https://how.complexsystems.fail which has been circulating more with discussions of the recent AWS/Azure/Cloudflare outages.

    • robocat 6 hours ago

      > Swiss cheese model

      I always thought that before the "Swiss cheese model" was introduced in the 1990s, the term Swiss cheese was used to mean something that had oodles of security holes (flaws).

      Perhaps I find the metaphor weird because pre-sliced cheese was introduced later in my life (processed slices were in my childhood, but not packets of pre-sliced cheese which is much more recent).

    • pugworthy 14 hours ago

      > All the holes in the cheese line up...

      I absolutely heard that in Hoover's voice.

      Is there an equivalent to YouTube's Pilot Debrief or other similar channels but for ships?

      https://www.youtube.com/@pilot-debrief

    • stackskipton 16 hours ago

      >Which is why the "retrospectives are useless" crowd spins me up so badly.

      As an Ops person, I've said that before when talking about software, and it's mainly because most companies will refuse to listen to the lessons inside of them, so why am I wasting time doing this?

      To put it in aviation terms, I'll write up something like (numbers made up) "Hey, V1 for a Hornet loaded at 49000 pounds needs to be 160 knots, so it needs 10000 feet for takeoff." Well, the Sales team comes back and says NAS Norfolk is only 8700ft and the customer demands 49000+ loads, we are not losing revenue, so quiet, Ops nerd!

      Then 49000+ Hornet loses an engine, overruns the runway, the fireball I'd said would happen, happens and everyone is SHOCKED, SHOCKED I TELL YOU this is happening.

      Except it's software and not aircraft and loss was just some money, maybe, so no one really cares.

    • thaumasiotes 15 hours ago

      > Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up.

      The metaphor relies on you mixing and matching some different batches of presliced Swiss cheese. In a single block, the holes in the cheese are guaranteed to line up, because they are two-dimensional cross sections of three-dimensional gas bubbles. The odds of a hole in one slice of Swiss cheese lining up with another hole in the following slice are very similar to the odds of one step in a staircase being followed by another step.

      • imtringued 2 hours ago

        The three-dimensional gas bubbles aren't connected. An attacker has to punch through the thin walls to cross between the bubbles or wear and tear has to erode the walls over time. This doesn't fundamentally change anything.

      • jibal 7 hours ago

        No, it's a metaphor.

      • psunavy03 14 hours ago

        And there's the archetypal comment on technology-based social media that is simultaneously technically correct and utterly irrelevant to the topic at hand.

        • mrguyorama 13 hours ago

          Actually the pedantry is meaningful!

          You cannot create a swiss cheese safety model with correlated errors, same as how the metaphor fails if the slices all come from the same block of swiss cheese!

          You have to ensure your holes come from different processes and systems! You have to ensure your swiss cheese holes come from different blocks of cheese!
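
          To put rough numbers on the correlated-holes point, here is a minimal sketch. It is not from the report; the function names and the probabilities (0.1 per layer, 0.05 for the shared cause) are illustrative assumptions. It compares the chance that every layer fails when the layers fail independently against the case where one shared cause can defeat all layers at once, while keeping each layer's individual failure rate the same.

```python
def p_all_fail_independent(p_layer: float, layers: int) -> float:
    """Chance that every layer fails when failures are independent."""
    return p_layer ** layers


def p_all_fail_common_cause(p_layer: float, layers: int, p_common: float) -> float:
    """Chance that every layer fails when a shared cause can defeat all of them.

    With probability p_common the shared cause knocks out every layer at once.
    Otherwise layers fail independently, at a residual rate chosen so that each
    layer's overall failure probability is still p_layer.
    """
    if not 0.0 <= p_common <= p_layer < 1.0:
        raise ValueError("need 0 <= p_common <= p_layer < 1")
    residual = (p_layer - p_common) / (1.0 - p_common)
    return p_common + (1.0 - p_common) * residual ** layers


if __name__ == "__main__":
    # Hypothetical numbers: four layers, each failing 10% of the time.
    independent = p_all_fail_independent(0.1, 4)
    correlated = p_all_fail_common_cause(0.1, 4, p_common=0.05)
    print(f"independent layers: {independent:.6f}")
    print(f"shared cause:       {correlated:.6f}")
```

          Under these made-up numbers the shared-cause case lets a failure through hundreds of times more often, even though each individual slice looks exactly as reliable on its own.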

  • tialaramex 16 hours ago

    Note that "Don't make mistakes" is no more actionable for maintenance of a huge cargo ship than for your 10MLoC software project. A successful safety strategy must assume there will be mistakes and deliver safe outcomes nevertheless.

    • potato3732842 2 hours ago

      It kind of is though. There's a lot less opportunity for failures at the limit and unforeseen scale. Mechanical things also mostly don't keel over or go haywire with no warning.

    • andrewflnr 9 hours ago

      Obviously this is the standard line in any disaster prevention, and makes sense 99% of the time. But what's the standard line about where this whole protocols-to-catch-mistakes thing bottoms out? Obviously people executing the protocol can make mistakes, or fall victim to normalization of deviance. The same is true for the next level of safety protocol you layer on top of that. At some level, the only answer really is just "don't make mistakes", right? And you're mostly trying to make sure you can do that at a level where it's easier to not make mistakes, like simpler decisions not under time pressure.

      Am I missing something? I feel like one of us is crazy when people are talking about improving process instead of assigning blame without addressing the base case.

      • lmm 7 hours ago

        Normalization of deviance doesn't happen through people "making mistakes", at least not in the conventional sense. It's a deliberate choice, usually a response to bad incentives, or sometimes even a reasonable tradeoff.

        I mean ultimately establishing a good process requires making good choices and not making bad ones, sure. But the kind of bad decisions that you have to avoid are not really "mistakes" the same way that, like, switching on the wrong generator is a mistake.

        • andrewflnr 7 hours ago

          Quite, normalization is another failure mode, besides simple mistakes, that process has to account for.

  • airstrike 13 hours ago

    Only tangentially related, but the debate over whether the Francis Scott Key bridge is or was a bridge got so heated on Wikipedia that the page had to be protected, and I finally have a reason for bringing this up.

    Edit wars aside, it's a nice philosophical question.

    https://en.wikipedia.org/wiki/Francis_Scott_Key_Bridge_(Balt...

  • DamnInteresting 17 hours ago
    • bmelton 16 hours ago

      That was super helpful. I was assuming from skimming the text description that it was a failed crimp.

      A lot of people wildly under-crimp things, but marine vessels have not only nuanced wire requirements but also more stringent crimping requirements that the field at large frustratingly refuses to adhere to, despite ABYC and other codes insisting on it.

      • Aurornis 16 hours ago

        > A lot of people wildly under-crimp things

        The good tools will crimp to the proper pressure and make it obvious when it has happened.

        Unfortunately the good tools aren't cheap. Even when they are used, some techs will substitute their own ideas of how a crimp should be made when nobody is watching them.

        • potato3732842 2 hours ago

          This attitude wherein one thinks they can just spend money and offload responsibility is exactly the problem.

          Abdicating responsibility to those "good tools" is why shit never gets crimped right. People just crimp away without a care in the world. Don't get me wrong, they're great for speed and when all you're doing is working on brand new stuff that fits perfectly. But when you're working on something sketchy you really want the older styles of tool that give more direct feedback. They have a place, but you have to know what that place is.

          See also: "the low level alarm would go off if it was empty"

        • DannyBee 14 hours ago

          While the US is still very manual at panel building, Europe is not.

          So outside of waiting time, I can go from eplan to "send me precrimped and labeled wires that were cut, crimped, and labeled by machine and automatically tested to spec" because this now exists as a service accessible even to random folks.

          It is not even expensive.

          • phasetransition 20 minutes ago

            Can you give examples of companies that offer this service?

  • caminanteblanco 12 hours ago

    >The seven highway workers and inspector on the Key Bridge at the time were not notified of the Dali’s emergency situation before the bridge collapsed. We found that, had they been notified about the same time the MDTA Police officers were told to block vehicular traffic, the highway workers may have had sufficient time to drive to a portion of the bridge that did not collapse. Further, we found that effective and immediate communication to evacuate the bridge during an emergency is critical to ensuring the safety of bridge workers.

  • caminanteblanco 13 hours ago

    Here's the attached report, it has a lot of additional helpful information: https://www.ntsb.gov/investigations/Documents/Board%20Summar...

  • fabian2k 16 hours ago

    The big problem was that they didn't have the actual fuel pumps running but were using a different pump that was never intended to fulfill this role. And this pump stays off if the power fails for any reason.

    The bad contact with the wire was just the trigger, that should have been recoverable had the regular fuel pumps been running.

  • buildsjets 16 hours ago

    In a well engineered control system, any single failure will not result in a loss of control over the system.

    Was a FMECA (Failure Mode, Effects, and Criticality Analysis) performed on the design prior to implementation in order to find the single points of failure, and identify and mitigate their system level effects?

    Evidence at hand suggests "No."

    • CGMthrowaway 16 hours ago

      "Catastrophe requires multiple failures – single point failures are not enough. The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners."

      https://how.complexsystems.fail/#3

    • Aurornis 15 hours ago

      > In a well engineered control system, any single failure will not result in a loss of control over the system

      That's true in this case, as well. There was a long cascade of failures including an automatic switchover that had been disabled and set to manual mode.

      The headlines about a loose wire are the media's way of reducing it to an understandable headline.

    • jojobas 15 hours ago

      Most cargo ships have a single main engine with plenty of backup-less failure points. They are sort of engineered so these failures can't happen suddenly, but you can help yourself to a bunch of videos on how substandard fuel and parts shortages cause week-long poweroffs in the middle of the ocean.

      • LeifCarrotson 14 hours ago

        System designers and regulators are aware that the main engine is a single point of failure, but they generally consider loss of main engine power to not be an immediate emergency. There are redundant systems to retain electrical and hydraulic power, and losing motive power isn't generally an instant emergency. Losing power and steering together is an emergency, yes, and steering is degraded without power, but had they still been able to use the rudder they wouldn't have hit the bridge.

        • jojobas 13 hours ago

          Steering without power at 8 knots would be pretty inefficient (and was - they tried to steer as the power came back). Loss of power in ports, narrow straits etc is recognized as a major issue which is why an engineer and ETO must be in the engine control room during such passages.

  • comeonbro 16 hours ago

    A label placed half an inch wrong on a misleading affordance -> 200,000 ton bridge collapse, 6 deaths, tens of billions of dollars of economic damage

    Instant classic destined for the engineering-disasters-drilled-into-1st-year-engineers canon (or are the other swiss cheese holes too confounding)

    Where do you think it would fit on the list?

  • jtokoph 16 hours ago

    It’s been noted that automatic failover systems did not kick in due to shortcuts being taken by the company: https://youtu.be/znWl_TuUPp0

  • kylehotchkiss 14 hours ago

    When shipowners are willing to cut costs with sketchy moves like registering with a random landlocked African country, why should we believe they'll spend any time or effort reading/implementing NTSB guidelines? It isn't like there's some well respected international body like ITAO calling the shots

  • bell-cot an hour ago

    Worth noting: The MV Dali is a 1000-foot-long ship, weighing 50% more than a nuclear aircraft carrier, with a total crew of twenty-two.

    That's everybody - captain, bridge crew, deck crew, cook, etc.

    So - how many of those 22 will be your engineering crew? How many of those engineers would be on duty, when this incident happened? And once things start going wrong, and you're sending engineers off to "check why Pump #83, down on Deck H, shows as off-line" or whatever - how many people do you have left in the big, complex engineering control room - trying to figure out what's wrong and fix it, as multiple systems fail, in the maybe 3 1/2 minutes between the first failure and when collision becomes inevitable?

  • dopamean 15 hours ago

    I know a little about planes and nothing about ships so maybe this is crazy but it seems to me that if you're moving something that large there should be redundant systems for steering the thing.

    • gk1 15 hours ago

      There are.[1] Unfortunately they take longer to employ than the crew had time.

      [1] As it happens I open with an anecdote about steering redundancy on ships in this post: https://www.gkogan.co/simple-systems/

      • dopamean 15 hours ago

        Thanks for this comment!

    • cjensen 14 hours ago

      Shipping is a low-margin business. That business structure does not incentivize paying for careful analysis of failure modes.

      Seems to me the only effective and enforceable redundancy that can easily be imposed by regulation would be mandatory tug boats.

      • protocolture 10 hours ago

        >Seems to me the only effective and enforceable redundancy that can be easily be imposed by regulation would be mandatory tug boats.

        Way it worked in Sydney harbour 20+ years ago, when I briefly worked on the wharves/tugs, was that the big ships had to have both local tugs and a local pilot who would come aboard and run the ship. Which seemed to me to be quite an expensive operation, but I honestly can't recall any big nautical disasters in the harbour, so I guess it works.

      • dboreham 11 hours ago

        > mandatory tug boats

        Which there are in some places. Where I grew up I'd watch the ships sail into and out of the oil and gas terminals, always accompanied by tugs. More than one in case there's a tug failure.

  • ROOFLES 6 hours ago

    Non-redundant fuel pump that doesn't even restart on power failure. Main engine shutting off when water pressure drops, backup generator not even starting in time AND shoddy wiring that offlines the whole steering system. That's what I call GOATED engineering. Props to Hyundai HI

  • nacozarina 9 hours ago

    I predicted 10yr & $20B to replace it and stand by that forecast.

  • dboreham 11 hours ago

    My rule for a couple decades: any failover procedure that only gets run when there's a failure, will not work.

  • mberning 12 hours ago

    This is a great example of why “small details” matter. How many times do you think an apprentice has been corrected about this? What percentage of the time does the apprentice say “yeah but it’s just a label”. Lots of things went wrong in this case, but if the person that put the label on that wire did it correctly then this whole catastrophe could have been avoided.

  • 1970-01-01 16 hours ago

    So there were two big failures: Electrician not doing work to code; inspector just checking the box during the final inspection.

    • DannyBee 14 hours ago

      No, lots more: it's because they were abusing a non-redundant pump to supply fuel to the generators. Which then failed, which ....

      From the report:

      > The low-voltage bus powered the low-voltage switchboard, which supplied power to vessel lighting and other equipment, including steering gear pumps, the fuel oil flushing pump and the main engine cooling water pumps. We found that the loss of power to the low-voltage bus led to a loss of lighting and machinery (the initial underway blackout), including the main engine cooling water pump and the steering gear pumps, resulting in a loss of propulsion and steering.

      ...

      > The second safety concern was the operation of the flushing pump as a service pump for supplying fuel to online diesel generators. The online diesel generators running before the initial underway blackout (diesel generators 3 and 4) depended on the vessel’s flushing pump for pressurized fuel to keep running. The flushing pump, which relied on the low-voltage switchboard for power, was a pump designed for flushing fuel out of fuel piping for maintenance purposes; however, the pump was being utilized as the pump to supply pressurized fuel to diesel generators 3 and 4. Unlike the supply and booster pumps, which were designed for the purpose of supplying fuel to diesel generators, the flushing pump lacked redundancy. Essentially, there was no secondary pump to take over if the flushing pump turned off or failed. Furthermore, unlike the supply and booster pumps, the flushing pump was not designed to restart automatically after a loss of power. As a result, the flushing pump did not restart after the initial underway blackout and stopped supplying pressurized fuel to the diesel generators 3 and 4, thus causing the second underway blackout (low-voltage and high-voltage).

    • nightpool 16 hours ago

      No, there was a larger failure: whoever designed the control system such that a single loose wire on a single terminal block (!) could take down the entire steering system for a 91,000 ton ship.

      • DannyBee 14 hours ago

        They didn't.

        If you read the report, they were misusing this pump to do fuel supply when it wasn't for that. And it was non-redundant, whereas fuel supply pumps are redundant.

        It's like someone repurposing a Husky air compressor to power a pneumatic fire suppression system and then saying the issue is someone tripping over the cord and knocking it out.

      • bragr 15 hours ago

        There's a 3rd failure: the failure to install/upgrade dolphins that could deflect a modern containership, despite the identified need for such. That proposed project seems cheap in retrospect.

        • nightpool 15 hours ago

          Yes, 100%. Lots of failures across the board here. Especially with large ships and how many different nations they might be registered in, I can't imagine it's easy to have a lot of regulatory oversight into their construction, mechanical inspection or maintenance schedules. I'm curious how modern ports handle this problem, feels like it could cause a ton of issues beyond just catastrophic ones like this one.

    • IncreasePosts 15 hours ago

      The terminal blocks could also have been designed to aid visual inspection.

  • jojobas 16 hours ago

    "Contact" is a weird choice of words.

    • nocoiner 14 hours ago

      Yeah, when the word “allision” was right there!

    • crote 15 hours ago

      Not really, because that's where that part of the investigation ends.

      Pre-contact everything is about the ship and why it hit anything, post-contact everything is about the bridge and why it collapsed. The ship part of the investigation wouldn't look significantly different if the bridge had remained (mostly) intact, or if the ship had run aground inside the harbor instead.

    • analog31 13 hours ago

      Reminds me of "fetched up" describing what happened to the Exxon Valdez.

    • charles_f 15 hours ago

      Thought the same. The bridge has fallen along its entire length, so "contact" sounds like a way to undersell it. Passing up such an opportunity for clickbait is interesting in this day and age.

      • dhosek 15 hours ago

        I’m not sure that the NTSB is really in the clickbait business. But yes, contact does seem to really be underselling the event.

    • ErroneousBosh 13 hours ago

      Right? Like when I read that I thought we're talking a little paint-swapping.

      No, we are not talking a little paint-swapping.

  • ocdtrekkie 15 hours ago

    "and WAGO Corporation, the electrical component manufacturer"

    Sucks to be any of the YouTube influencers today telling everyone they should use WAGO connectors in all their walls.

    Seriously though, impressive to trace the issue down this closely. I am at best an amateur DIY electrician, but I am always super careful about the quality of each connection.

    • Polizeiposaune 14 hours ago

      The WAGO connectors typically used in home wiring have a transparent plastic shell which lets you see whether the wire made it all the way through the spring clip. The ones shown in the NTSB video had an opaque shell around the spring clip.

      • ocdtrekkie 13 hours ago

        I think my attempt at humor butthurt a lot of WAGO fans. I used "seriously though" after in my actual... serious comment.

    • rootusrootus 15 hours ago

      I don't see anything in the report that suggests the connector failed. It sounds like the installer failed. Trust me, they can screw up twist connections too :)

  • gishh 16 hours ago

    The date for bridge completion was bumped from 2028 to 2030 already. I assume it won't be done until 2038. It is absolutely murdering traffic in the Baltimore area, not having a bridge. I would be super interested in seeing where every single dollar goes for this project, I assume at least 1/3 of it will be skimmed off the top.

    • gishh 14 hours ago

      The consensus seems to be skimming won’t occur. I’d encourage people to research the corruption of elected officials in the Baltimore area.

      • tgv 33 minutes ago

        The consensus is that your comment is way off-topic.

  • tonymet 13 hours ago

    The older I get, the more I trust people over rules.

    • fghorow 2 hours ago

      Does this comment apply to the current crop of American politicians? (Just curious.)