TSMC Arizona outage saw fab halt, Apple wafers scrapped

(culpium.com)

181 points | by speckx 9 hours ago

72 comments

  • Animats 7 hours ago

    This was apparently a Linde installation custom built for TSMC in Arizona.[1] Nitrogen, oxygen, and argon are extracted from air on-site and purified. That's Linde's primary business: liquefying and distilling air. This isn't some little local company or a company operating outside their area of expertise.

    Those gases are storable, so it's surprising there wasn't enough tank capacity to deal with outages.

    The site plan [2] shows "Gas Plant 1", and future "Gas Plant 2" and "Gas Plant 3". The gas plants are across a small road from the fab and feed the plant directly. Once Gas Plants 2 and 3 were built, there would be redundancy, but at this stage, there isn't a backup. The plan doesn't show a large tank farm, so they can't store gases in bulk.

    [1] https://www.aztechcouncil.org/utility-company-makes-progress...

    [2] https://semiwiki.com/forum/threads/tsmc-phoenix-arizona-fab-...

    • sevensor 4 hours ago

      Unless things have changed a lot since I fled semiconductor manufacturing, you would still need silane tanks at least. I’m as surprised as you that they don’t have buffer tanks.

      • cosmic_quanta an hour ago

        Why did you use the term "fled"? Any interesting story?

        From the outside, I would love to participate in semiconductor manufacturing.

    • NoiseBert69 6 hours ago

      Linde is huge. They can produce and offer all gases in all available purity classes.

      • astrange 5 hours ago

        Is there some reason you'd want gasses in a lower purity class?

        (Well, it's cheaper.)

        • rpmisms 5 hours ago

          Fire suppression, welding gases, etc.

      • jack_tripper 6 hours ago

        Is this an ad?

        • RealityVoid 6 hours ago

          I don't think Linde™ needs an ad, everyone knows Linde™ is the most reliable partner in producing and storing gases in all available purity classes.

          (joke off, it's probably not an ad, but they were excited to share the reason you see Linde on all sorts of gas tanks all over the place. It's actually quite common and if you see it once you see it everywhere.)

          • wiredpancake 3 hours ago

            I've never seen Linde ever in my life (Australia).

            What is funny though, at least in the Australia and UK regions, they still use the BOC brand, which is a subsidiary under Linde.

            • nandomrumber 10 minutes ago

              Competitors in the space in Australia include Air Liquide and Supagas.

              Supagas tend to have better prices for smaller operators, and hobbyists.

        • nebula8804 5 hours ago

          Possibly, but it's also plausible that it's a deep joke that everyone is in on.

          When googling the company, the marketing slogan that comes up is "Linde is Everywhere" but that works on so many levels. They sell air, air is everywhere. Therefore Linde is everywhere.

          They are a company that sells air: that stuff that people breathe. Forget this AI nonsense. Jensen has to constantly pull something out of his rear to keep food on the table. These guys sell air. What a business. :)

    • ErroneousBosh 4 hours ago

      > Those gases are storable, so it's surprising there wasn't enough tank capacity to deal with outages.

      It probably depends on the duration of the outage. I'd expect they have some storage, and if they plan on having the compressor plant down for longer than that can manage they'll bring in tanks.

  • angelgonzales 8 hours ago

    This isn’t very big news. Issues occur during bring-up often. Linde’s processes are possibly so power intensive that failing over to generator power is not possible. TSMC is right to put Linde on notice since Linde should have a PFMEA and control plan to eliminate any root causes for downtime. I suspect in the long term TSMC has plans to insource this if the issue persists. Scrap happens sometimes during manufacturing. If the writer only has journalism experience and no manufacturing experience, then they may not have a conceptual understanding of acceptable first pass yield. After all, the TSMC logo features failing parts!

    • FaradayRotation 7 hours ago

      In many ways I agree with you, but the problem statement (constrained/exhausted gas supply from vendor) makes it seem like this was not just line down, but the whole factory stopped for a few hours. Line down is a miserable migraine but still manageable... while a whole factory stoppage makes a lobotomy seem like a good idea. It also sounds like there was not enough forewarning to park critical customer wafers in a "safe" stage of the process.

      Even so, I also would still call this another Monday at a semiconductor factory. Welcome! Here we play a nearly endless game of whack-a-mole. Here's your mallet and your towel. Now whack enough of the moles hard enough until they stop coming back (at least through the same holes). Beware the alpha moles.

      By any road, I am surprised to see even this high-level perspective on a quality event disclosed to the mainstream public; I thought this was not standard practice. I enjoyed the read.

    • throwaway2037 4 hours ago

      > This isn’t very big news.

      The opening paragraph feels a bit pearl clutching to me.

      > the company had to scrap thousands of wafers that were in production for clients at the site which include Apple, Nvidia, and AMD.

      Eh. So what? I am sure they scrap thousands of wafers for all kinds of other reasons. It would be better to know the cost per hour of a total plant shutdown. (Of course, I'm sure the author doesn't have this information.)

      > After all, the TSMC logo features failing parts!

      Final hat tip here. I never knew that.

    • nutjob2 2 hours ago

      > After all, the TSMC logo features failing parts!

      I'm not sure about that, I think the blank spaces are just parts that have been picked. The dies have been cut and the good ones are being removed.

  • bob1029 8 hours ago

    > forcing the facility to shut down for at least a few hours

    > As a result, the company had to scrap thousands of wafers

    Anything involving wet chemistry, photoresist, furnaces, etc. is very time-constrained. You can't let wafers sit around indefinitely. Certain process steps must be followed up very quickly to avoid scrap.

    This is why you don't see redundant power for manufacturing lines. A 3nm line needs hundreds of megawatts to operate. You can't clear queued lots without a fully functional line. There's not much you could save by keeping part of the line operational.

    • tantalor 8 hours ago

      Good idea for a Factorio mod?

      A new failure mode resets output progress back to zero if you lose power or some other input while crafting.

      You could design circuit networks to cut power to non-essential systems so the rest of the factory can keep producing.

      • tetha 8 hours ago

        Some mods in modded minecraft had that and it's a very punishing mechanic unless implemented well.

        It eats all of your power, and usually also very expensive items, very quickly. Assume you have like 600RF/tick generated, common with certain generator constellations. 1 tick - 1024 RF and one input consumed, crafting fails due to not enough power. 1 tick wait, 1 tick, 1024 RF and one input consumed, ... This can void 10+ items / second, which can hurt very badly. Even for common items in fact.

        It also tends to kick you while you're down, because it only kicks in if everything else is already failing. Then the only thing to continue functioning is the thing voiding your energy and your expensive items. Or even worse, if you made one miscalculation about your power grid, then all of your resources are gone, often before you can react.

        It can be interesting in the right packs, but it is Gregtech level hardness.
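
        The arithmetic above can be sketched roughly as follows. This is a toy model, not any specific mod's behavior: it assumes the machine starts a craft as soon as its buffer holds 1024 RF, eats that energy plus one input, and then fails. The function name `items_voided` is made up for illustration; the 20 ticks per second figure is Minecraft's nominal tick rate.

```python
TICKS_PER_SECOND = 20  # Minecraft's nominal game tick rate


def items_voided(generation_per_tick, cost_per_attempt, ticks):
    """Count inputs voided under an assumed start-then-fail model."""
    buffer = 0
    voided = 0
    for _ in range(ticks):
        buffer += generation_per_tick
        if buffer >= cost_per_attempt:
            # Assumption: the machine starts a craft, eating the energy
            # and one input, then fails because power cannot be sustained.
            buffer -= cost_per_attempt
            voided += 1
    return voided


# One second of game time with the numbers from the comment.
print(items_voided(600, 1024, TICKS_PER_SECOND))  # prints 11
```

        With 600 RF/tick against a 1024 RF cost, the leftover energy carries over between attempts, so the toy model lands at 11 voided items per second, in line with the "10+" estimate.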

        • bombcar 7 hours ago

          GT:NH has “easy mode” enabled in some regards - it won’t finish the craft but it WILL wait for power (actually keep trying) - so if you fix the power problems you can finish and not lose the mats.

          May or may not apply to multi blocks.

          • mjevans 2 hours ago

            I didn't finish GT:NH but if I ever set aside enough time to play a GregTech build again, GTNH is on the very short list.

          • DanHulton an hour ago

            Multiblocks power fail and void, but then your machine shuts down until you restart it. This is much better than suggested above, where you'd void over and over, but it can still utterly mess up a large craft being orchestrated thru AE2, which is still waiting forever for the failed craft to submit a part back into the system.

        • hofrogs 7 hours ago

          GregTech doesn't use RF though, at least it didn't. Machines pull packets of amps through the wires from the generators/batteries, the whole system is pretty interesting. Also high-level circuits have to be manufactured in cleanrooms with a pretty complex tech chain.

          • tetha 7 hours ago

            Oh GT's power is absolutely not RF. Back in the day, even GT's power could be cruel though. You could over-volt your machines and thus void machines you spent literal days on crafting. And the cables in the process too. And you could lose your entire infrastructure once it rained and you had no roof :)

          • squigz 7 hours ago

            I think GP just used RF as an example and was only referring to GT as a comparison in difficulty.

            GT's system of only pulling power on-demand is very nice though; no wasting fuel

        • squigz 7 hours ago

          Out of curiosity, which mods are this cruel? I've been playing GT (modern) lately and even it doesn't void your machine's items unless you break the machine itself.

          • tetha 7 hours ago

            Oh this was in the days of yore of modded 1.4 and early 1.7. I don't remember specific mods, I just remember the pain and frustration of this happening.

            I'm currently playing Stoneblock 4 and have been playing GT:NH and Nomifactory some time ago, and the more modern mods have learned a lot from those old janky things. Heck, back in the day every mod had a different power system and you needed a nonsense amount of conversion infrastructure, unless the modpack did a lot of work to combine all of this somehow, haha.

      • rtkwe 7 hours ago

        Power brownouts are pretty rare outside of the very early game. It's too easy and cheap to massively over produce power for that to really harm players outside the early game so I don't think there'd be much interest. Usually brownouts rapidly develop into full blown blackouts and black restarts as your miners reduce output during the brownout often leading to a reduction in incoming fuel leading to even less power being generated in a self consuming cycle.

        • tantalor 6 hours ago

          I would apply it to inputs as well.

          Suppose you can start production with only 1 of each input required for a recipe, but to keep it going you need to keep feeding all of the inputs to finish it. If any of them run out, then the recipe fails, you lose the inputs, and the machine stalls.

          This works better for high latency recipes (>10s) with lots of inputs, like low density structure, modules, and atomic bombs.
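
          A rough sketch of that rule, purely to pin down the proposed mechanic: it is not Factorio's modding API, and `run_craft`, the input names, and the one-unit-per-input-per-tick pacing are all assumptions made for illustration.

```python
def run_craft(recipe, stock):
    """Simulate one craft under the proposed failure rule.

    recipe: units of each input the craft needs in total.
    stock:  units of each input on hand (no refills in this toy model).
    Returns (result, units_lost).
    """
    stock = dict(stock)  # work on a copy so the caller's stock is untouched
    consumed = {name: 0 for name in recipe}

    # Starting only requires one unit of every input.
    if any(stock.get(name, 0) < 1 for name in recipe):
        return "not_started", 0

    # Assumed pacing: each tick pulls one unit of every input still owed.
    while any(consumed[name] < recipe[name] for name in recipe):
        for name, needed in recipe.items():
            if consumed[name] >= needed:
                continue
            if stock.get(name, 0) == 0:
                # An input ran dry mid-craft: everything fed so far is lost.
                return "failed", sum(consumed.values())
            stock[name] -= 1
            consumed[name] += 1
    return "done", 0


print(run_craft({"input_a": 3, "input_b": 2}, {"input_a": 3, "input_b": 2}))
print(run_craft({"input_a": 3, "input_b": 2}, {"input_a": 2, "input_b": 2}))
```

          The second call starts fine but runs out of `input_a` on the third tick, so the four units already fed are gone and the machine stalls.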

          • rtkwe an hour ago

            Usually the answer is to just slightly overproduce the inputs, only the new planet Gleba even slightly discourages letting items just sit on the conveyors with their freshness mechanic. What's the benefit?

      • sidewndr46 8 hours ago

        Isn't that what spoilage is?

        • helpfulclippy 6 hours ago

          I thought of spoilage as a mechanic that punishes overproduction.

          • blmarket 5 hours ago

            It's a constraint to process items within a limited time (regardless of overproduction or power outage), which matches the problem description.

            Surely the reality might be much more complex (like... yield/quality dropping as a function of time?)

      • dylan604 7 hours ago

        Mindustry has something similar with pumping various gasses/liquids through plumbing. If you accidentally mix them while building new lines, things stop working when your gases get mixed up forcing you to purge the line.

      • ActorNightly 8 hours ago

        Someone needs to make the whole chip manufacturing process into a Factorio-like game and let the gamers optimize it, then build the factories around that.

        • gnatman 5 hours ago

          Like Ender’s Game but instead of intergalactic shooting war it’s international chip war.

    • j_walter 6 hours ago

      TSMC has backup generators in their AZ fab. You actually have to have backup power or a few hundred millisecond blip could cause days or weeks of tool down time. You should see what happens when you lose the ability to keep a clean room at temp/humidity/airflow...it's weeks or months.

    • sevensor 7 hours ago

      It didn’t happen, but the facilities team at the fab where I worked was seriously considering installing a flywheel to cover power bumps. What I don’t get about this story is how this actually happened. All our process gasses were out in a tank farm and we knew how much pressure we had. We would have stopped the line if there wasn’t enough to proceed. Were they separating air onsite or something?

      • jaggederest 5 hours ago

        I was very impressed by the modest little fab I worked at having thousands of lead acid batteries for momentary takeover, and 8 five-megawatt locomotive engines for longer term redundancy. Apparently their steady state usage was 25MW, which allowed still having a hot spare and concurrent downtime for two of the locomotive generator units.
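
        The redundancy math works out like this, as a quick sketch using only the figures above (the function name `generator_headroom` is made up for illustration):

```python
import math


def generator_headroom(units, unit_mw, load_mw):
    """Return (units needed to carry the load, units left over)."""
    needed = math.ceil(load_mw / unit_mw)
    return needed, units - needed


needed, spare = generator_headroom(units=8, unit_mw=5, load_mw=25)
print(needed, spare)  # prints 5 3
```

        Five units carry the 25MW load, leaving three: two can be down for maintenance while one still sits as a hot spare.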

      • bobmcnamara 6 hours ago

        Yes, Linde has an onsite plant and is building two more.

        For some processes, stopping will botch the wafer. In the event of a gas shortage, do plants plan which lines to take down first, and which lines should complete a process step?

        • sevensor 6 hours ago

          The way this worked at the fab where I was, was that facilities would have paged everybody, and whoever needed to hold wafers would do so. You could mark your equipment down or unavailable for a particular step. I don’t know what we would have done if it was “hey, we lost dry nitrogen a minute ago.” I think at that point you lose a lot of wafers in wet cleans.

          In the case of a power interruption at the fab, consequences were highly dependent on the equipment and the unit process. A prolonged power interruption to diffusion was the worst case scenario. You’d have 150 wafers in the furnace, and any significant deviation from the nominal temperature profile meant they were all scrap. Worse, if the furnace cooled off, you had to scrap the quartz boat the wafers rode in, too. Other processes had a smaller blast radius but were even more of a headache to disposition. Implant, you’d lose beam and probably lose vacuum too. Then the wafer in the chamber would be dusted and in an indeterminate state, and the rest of the wafers you’d have to sleuth out whether they were implanted or not. Sometimes you’d have a lot sitting in the end station and it wouldn’t be clear whether or not it had been run at all. At least in photolithography you could tell whether or not a wafer was patterned by looking at it.

    • Kye 7 hours ago

      A video showing those steps, for the curious: https://www.youtube.com/watch?v=dX9CGRZwD-w

      It's probably not 100% identical to TSMC's process.

  • agentifysh 7 hours ago

    seems like what is often downplayed or left unsaid in American media is the cultural mismatch between TSMC's Taiwanese engineers and their American counterparts

    so it always comes as a bit of a surprise to those out of the loop, but from what I've read from individual Taiwanese workers and their feedback it's clear that there is significant regret from one side.

    and it doesn't seem to be limited to just TSMC; another large company recently received an icy reception for their large investment in American manufacturing.

    i think this is a big reason why a lot of these jobs simply wouldn't stay in America, as the consumer would not be able to foot the costs added by the "cultural premium" faster than innovation can reduce them.

    • itake 7 hours ago

      Perhaps if the US workers earned OT like TW workers do, the "culture" gap would shrink.

      • j_walter 6 hours ago

        TW workers have a majority of their compensation in bonuses, so the OT portion is quite small and many do not even bother to ask for it. The overall compensation gap between a TW and US engineer at TSMC is also significant. Not to mention the lowest paid hourly workers...where in TW they make 2-3X minimum wage, but in the US it's like 1.25X.

        • itake 4 hours ago

          That is not what I heard from my cousin at TSMC. The OT gives the workers a “living wage”. Most of his coworkers charge OT every week of the year.

          He admitted that, even with their OT and bonuses, he probably makes more than them on his W-2 salary.

          But my point still remains: if they want US (or TW) folks to work more hours, they need to pay for those hours.

          • j_walter 3 hours ago

            https://www.reddit.com/r/Semiconductors/comments/18x5vr5/sem...

            This reddit post captures what I've seen at TSMC in Taiwan. $120K is normal pay at the director level...engineers make $2500-5000 a month. TSMC AZ starting pay for a new college grad w/ BS is probably just under $100K/year with just salary, with the potential to make over $120K within a few years with full vested bonuses.

      • limagnolia 7 hours ago

        Is OT "overtime"? How is it legal not to pay overtime in any US factory? Unless they are salaried (exempt)?

        • itake 6 hours ago

          yeah, overtime. My cousin is an engineer at TSMC (who worked in Tainan and now in Arizona) and is W-2 exempt.

        • TimorousBestie 6 hours ago

          A lot of the workers there probably are exempt under American law.

          I’m not an expert on Taiwanese labor laws but their list of exempt labor categories in the LSA is much shorter than the one in the American FLSA.

      • lazide 6 hours ago

        Hahaha. The work culture between TW and the US is night and day - and it isn’t flattering for the US.

        • bnjms 5 hours ago

          How so? Are the Americans relatively lazy or just unwilling to put in tedious but necessary extended hours?

          • lazide 4 hours ago

            The entire approach is different. Especially with Taiwanese engineers, their entire focus is whatever work they are doing. Everything else (quite literally), their wives handle.

            Americans typically ask for things like work life balance, non abusive working hours, etc. They also don’t (anymore) have the type of family life setup that allows them to actually focus so much - being pulled into child care duties, or taking care of family members, or whatever their next vacation should be, etc.

            The general attitude is also more ‘yeah whatever’ to some extent.

            The amount of singular obsessive engineering you get out of one vs the other is hard to compare.

            • agentifysh 2 hours ago

              hmmm this is interesting, I was always under the impression Taiwanese wives were more progressive and men had to do a lot more lifting vs other regional cultures in East Asia

              my original thinking after reading some of the anecdotes from TSMC engineers is that they were obsessively dedicated, which means extreme hours by North American standards

              it's also the same in places like Samsung, where the company treats employees very well with perks and long career stability, but it's not free and always requires huge sacrifice, I'd imagine similar to Japanese conglomerates.

              I'm not sure which is better. In America it's definitely a transactional relationship, and it comes with stability issues compared to what these East Asian giants offer, but their stability comes at the cost of not being able to switch if and when you find yourself at odds.

              Not sure what it was like at Nokia but also another conglomerate that ultimately folded under competition and also a country with more stringent labor/life constraints that you would find less enforced in East Asia.

              Getting a bit distracted here but noting how much culture plays a role in these large companies and their management styles.

    • coredog64 7 hours ago

      There are like half a dozen semiconductor manufacturers in Phoenix that were here before TSMC arrived. There's a robust pipeline from ASU to these same manufacturers. Can we please just stop with the nonsensical notion that "Americans don't know how to fabricate semiconductors"?

      • barkingcat 6 hours ago

        American economics doesn't allow fabrication of semiconductors even if there is the know how.

        Think about how Intel, who pioneered the know how, can't build cutting edge nodes in the levels that they need to make it profitable.

        IBM had to sell their fabs to cater to the whims of "shareholders".

        It's the greed of stockholders that you need to blame.

        • astrange 5 hours ago

          TSMC is a publicly traded company just like the others. I'm not familiar with their governance but Google tells me the largest owner (a state development fund) has 6%.

          They have a special advantage because they don't compete with their customers, which leads to trust, which leads to customers paying for their R&D for them.

          Intel on the other hand just kind of sucks at their job. Skill issue basically. (But they aren't /that/ far behind.)

      • itake 6 hours ago

        It's not that the USA can't produce semiconductors. It's that semiconductor production at TSMC's scale (in terms of number of units, yield rates, and depth) currently requires highly skilled workers to work a lot of hours to "babysit" the wafer production.

        Maybe there is a world where TSMC can hire enough skilled workers and optimize processes enabling people to go home at 5p, but that is not currently the case.

        • fy20 3 hours ago

          We won't be seeing a TSMC plant in France anytime soon then.

        • lysace 6 hours ago

          Yes. This. So, yeah, essentially fundamentally incompatible with the US economy.

          The US is going to have to heavily subsidize the payroll of tens of thousands of very accomplished EEs/etc to make this work. By doing that they will also wreck the HW part of SV.

          • astrange 5 hours ago

            There isn't really a HW part of SV. Hardware engineers aren't paid well enough to live there in droves like programmers. There are some of course, but the ones I know are in San Diego or Bremerton or Israel.

            Also, it's completely normal to run a factory 24/7. I think people are just impressed because TSMC is the only one they've read about?

            (However, it's correct that a TSMC fab is the most advanced and complicated process on the planet.)

            • lysace 4 hours ago

              Nvidia, AMD, Intel and Applied Materials probably employ like 100k people in SV?

          • jwagenet 3 hours ago

            SV already wrecked HW engineering by paying far more for SW than market rate HW such that anyone with financial ambition made the switch long ago.

    • lysace 6 hours ago

      Spell it out. WTF.

  • perihelions 7 hours ago

    Some HN threads about the past incidents mentioned in OP,

    https://news.ycombinator.com/item?id=17686310 ("Computer Virus Cripples Several Taiwan Semiconductor Plants (bloomberg.com)"—2018, 100 comments)

    https://news.ycombinator.com/item?id=19214952 ("TSMC's Photoresist Material Incident: $550M Loss (anandtech.com)"—2019, 15 comments)

  • behnamoh 7 hours ago

    Does it further delay the launch of M5 Max/Ultra?

  • taurath 8 hours ago

    In September

    • joecool1029 8 hours ago

      But first being reported now: It was only speculation on the financial reports before this. How quickly do they normally report disruptions like this?

      I wouldn't think it would have to be too quickly since I've heard about fab disruptions from fires and such since the early 2000s. Probably just sometime after quarterly reporting to set the record straight? Why not in the report?

      • samus 7 hours ago

        I also had the impression from the report that shareholders were miffed about this Q3 snag, so they had to publish this even though they were about to treat this as business as usual.

  • caycep 7 hours ago

    the rebels have hit the tibanna gas supply I see

  • richisferezs 7 hours ago

    guys is it me or did sonnet 4.5 just become like 10x worse?