> We used Trillium TPUs to train the new Gemini 2.0,
Wow. I knew custom Google silicon was used for inference, but I didn't realize it was used for training too. Does this mean Google is free of dependence on Nvidia GPUs? That would be a huge advantage over AI competitors.
Google's TPUs have been used for training for at least 5 years, probably more (I think it's closer to 10). They don't depend on Nvidia GPUs for the majority of their projects. It took TPUs a while to catch up on some details, like sparsity.
This aligns with my knowledge. I don't know much about LLMs, but TPUs have been used for training deep prediction models in ads since at least 2018, though there were some gaps filled by CPUs/GPUs for a while. Nowadays, TPU capacity is probably greater than CPU and GPU capacity combined.
+1, almost all (if not all) Google training runs on TPU. They don't use NVIDIA GPUs at all.
At some point some researchers were begging for GPUs, mainly for sparse work. I think that's why SparseCore was added to TPUs (https://cloud.google.com/tpu/docs/system-architecture-tpu-vm...) in v4. I think at this point, given their turnaround time, they can catch up as competitors add new features and researchers want to use them.
Dumb question: what do you mean by sparse work? Is it embedding lookups?
(TPUs have had BarnaCore for efficient embedding lookups since TPU v3)
Mostly embedding, but IIRC DeepMind RL made use of sparsity- basically, huge matrices with only a few non-zero elements.
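For anyone else wondering what "sparse work" looks like in practice, here's a minimal JAX sketch (sizes and names purely illustrative): an embedding lookup is mathematically a multiply by an almost-all-zeros one-hot matrix, but it's implemented as a gather over a few rows of a big table. That access pattern is roughly what embedding-oriented hardware like SparseCore is built to speed up.

```python
# Minimal sketch of "sparse work" as an embedding lookup (illustrative sizes).
# Each example touches only a handful of rows of a huge table, i.e. a gather,
# rather than a dense matmul against a mostly-zero one-hot matrix.
import jax
import jax.numpy as jnp

vocab_size, embed_dim = 100_000, 64
key = jax.random.PRNGKey(0)
table = jax.random.normal(key, (vocab_size, embed_dim))  # the embedding table

ids = jnp.array([3, 17, 99_999])   # the few "non-zero" positions for this batch
embeddings = table[ids]            # gather: shape (3, 64)
print(embeddings.shape)
```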
BarnaCore existed and was used, but was tailored mostly for embeddings. BTW, IIRC they were called that because they were added "like a barnacle hanging off the side".
The evolution of the TPU has been interesting to watch; I came from the HPC and supercomputing space, and seeing Google stay mostly-CPU for the longest time, then finally learn how to build "supercomputers" over a decade-plus (gradually adding many features that classical supercomputers have long had), was a very interesting process. Some very expensive mistakes along the way. But now they've paid down almost all the expensive up-front costs and can ride on the margins, adding new bits and pieces while increasing the clocks and capacities on a cadence.
Do they have the equivalent of CUDA, and what is it called?
My understanding is that the Trillium TPU was primarily targeted at inference (so it’s surprising to see it was used to train Gemini 2.0) but other generations of TPUs have targeted training. For example the chip prior to this one is called TPU v5p and was targeted toward training.
Why is that a huge advantage over AI competitors? Just not having to fight for limited Nvidia supply?
That is one factor, but another is total cost of ownership. At large scale, something that's 1/2 the overall speed but 1/3 the total cost is still a net win by a large margin. This is one of the reasons why every major hyperscaler is, to some extent, developing their own hardware, e.g. Meta, who famously have an insane number of Nvidia GPUs.
Of course this does not mean their models will necessarily be proportionally better, nor does it mean Google won't buy GPUs for other reasons (like providing them to customers on Google Cloud).
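Spelling out the parent's arithmetic with made-up numbers, since the ratio is the whole argument:

```python
# Purely illustrative: half the speed at a third of the cost still wins on
# throughput per dollar.
relative_speed = 0.5    # vs. the faster chip
relative_cost = 1 / 3   # vs. the faster chip
print(relative_speed / relative_cost)  # 1.5 -> ~50% more work per dollar
```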
TPUs are cheaper and faster than GPUs. But it's custom silicon. Which means the barrier to entry is very, very high.
> Which means the barrier to entry is very, very high.
+1 on this. The tooling to use TPUs still needs more work. But we are betting on building this tooling and unlocking these ASIC chips (https://github.com/felafax/felafax).
Vertical integration.
Nvidia is making big bucks "selling shovels in a gold rush". Google has made their own shovel factory and they can avoid paying Nvidia's margins.
Yes. And cheaper operating cost per TFLOP.
Since TPUv2, announced in 2017: https://arstechnica.com/information-technology/2017/05/googl...
The hyperscalers are all working on this. https://aws.amazon.com/ai/machine-learning/trainium/
TPUs have been used for training for a long time.
(PS: we're a startup trying to make TPUs more accessible; if you want to fine-tune Llama 3 on TPUs, check out https://github.com/felafax/felafax)
Maybe only for their own models
Now any Google customer can use Trillium for training any model?
[Google employee] Yes, you can use TPUs in Compute Engine and GKE, among other places, for whatever you'd like. I just checked and the v6 are available.
Is there not going to be a v6p?
Can't speculate on futures, but here's the current version log ... https://cloud.google.com/tpu/docs/system-architecture-tpu-vm...
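If you do spin one up, a quick sanity check (assuming JAX with the TPU runtime, i.e. the jax[tpu] install, is present on the TPU VM):

```python
# List the accelerators JAX can see and run a trivial op on the default backend.
import jax
import jax.numpy as jnp

print(jax.devices())       # TpuDevice entries on a TPU VM, CPU devices otherwise
print(jax.device_count())  # number of cores/chips visible to this host

x = jnp.ones((1024, 1024))
print((x @ x).sum())       # executes on the TPU backend when one is present
```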
Google trained Llama-2-70B on Trillium chips
I thought Llama was trained by Meta.
> Google trained Llama
Source? This would make quite the splash in the market
It's in the article: "When training the Llama-2-70B model, our tests demonstrate that Trillium achieves near-linear scaling from a 4-slice Trillium-256 chip pod to a 36-slice Trillium-256 chip pod at a 99% scaling efficiency."
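For context on what that 99% figure means, here is the usual way scaling efficiency is computed; the throughput numbers below are made up, and Google's exact methodology may differ:

```python
# Rough sketch of a scaling-efficiency calculation (hypothetical throughputs).
def scaling_efficiency(tput_small, slices_small, tput_large, slices_large):
    ideal_speedup = slices_large / slices_small
    actual_speedup = tput_large / tput_small
    return actual_speedup / ideal_speedup

# e.g. a 4-slice pod at 1.0 (normalized tokens/sec) vs a 36-slice pod at 8.91:
print(scaling_efficiency(1.0, 4, 8.91, 36))  # 0.99 -> "99% scaling efficiency"
```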
I'm pretty sure they're doing fine-tune training, using Llama because it is a widely known and available sample. They used SDXL elsewhere for the same reason.
Llama 2 was released well over a year ago and was trained between Meta and Microsoft.
They can just train another one.
Llama 2 end weights are public. The data used to train it, or even the process used to train it, are not. Google can't just train another Llama 2 from scratch.
They could train something similar, but it'd be super weird if they called it Llama 2. They could call it something like "Gemini", or if it's open weights, "Gemma".
How good is Trillium/TPU compared to Nvidia? The stats seem to be: TPU v6e achieves ~900 TFLOPS per chip (fp16), while an Nvidia H100 achieves ~1800 TFLOPS per GPU (fp16)?
Would be neat if anyone has benchmarks!!
The 1800 on the H100s is with 2:4 sparsity; it's half of that without. Not sure if the TPU number assumes that too, but I don't think 2:4 sparsity is used that heavily, so I'd probably compare without it.
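Back-of-the-envelope with the numbers quoted in this thread (spec-sheet peaks only; real-world utilization, memory bandwidth, and interconnect matter at least as much):

```python
# Treat these as the approximate figures quoted above, not authoritative specs.
h100_fp16_sparse = 1800                  # TFLOPS with 2:4 structured sparsity
h100_fp16_dense = h100_fp16_sparse / 2   # ~900 TFLOPS dense
tpu_v6e_bf16 = 900                       # TFLOPS per chip, as quoted above

print(tpu_v6e_bf16 / h100_fp16_dense)    # ~1.0: roughly at parity, dense vs. dense
```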
So Google has Trillium, Amazon has Trainium, Apple is working on a custom chip with Broadcom, etc. Nvidia’s moat doesn’t seem that big.
Plus big tech companies have the data and customers and will probably be the only surviving big AI training companies. I doubt startups can survive this game - they can’t afford the chips, can’t build their own, don’t have existing products to leech data off of, and don’t have control over distribution channels like OS or app stores
It seems this way, but we've been saying this for years and years. And somehow nvidia keeps making more and more.
Isn't it telling that Google's release of an "AI" chip doesn't include a single reference to Nvidia or its products? They're releasing it for general availability, for people to build solutions on, so it's pretty weird that there aren't comparisons to H100s et al. All of their comparisons are to their own prior generations, which is what you do if you're the leader (e.g. Apple does it with their chips), but it's a notable gap when you're a contender.
Google posted TPUv6 results for a few things on MLCommons in August. You can compare them to H100 over there, at least for inference on stable diffusion xl.
Suspiciously there is a filter for "TPU-trillium" in the training results table, but no result using such an accelerator. Maybe results were there and later redacted, or have been embargoed.
The biggest barrier for any Nvidia competitor is that hackers can run the models on their desktops. You don't need a cloud-provider-specific model to do stuff locally.
This. I suspect consumer brands focusing on consumer hardware are going to make a bigger dent in this space than cloud vendors.
The future of AI is local, not remote.
> Nvidia’s moat doesn’t seem that big.
Well, look at it this way. Nvidia played their cards so well that their competitors had to invent entirely new product categories to supplant their demand for Nvidia hardware. This new hardware isn't even reprising the role of CUDA, just the subset of tensor operations that are used for training and AI inference. If demand for training and inference wanes, these hardware investments will be almost entirely wasted.
Nvidia's core competencies (scaling hardware up and down, providing good software interfaces, and selling direct to consumers) are not really assailed at all. The big lesson Nvidia is giving to the industry is that you should invest in complex GPU architectures and write the software to support them. Currently the industry is trying its hardest to reject that philosophy, and only time will tell if it's correct.
> CUDA, just the subset of tensor operations that are used for training and AI inference. If demand for training and inference wanes
Interesting take, but why would demand for training and inference wane? This seems like a very contrarian take.
Maybe it won't - I say "time will tell" because we really do not know how much LLMs will be demanded in 10 years. Nvidia's stock skyrocketed because they were incidentally prepared for an enormous increase in demand the moment it happened. Now that expectations are cooling down and Sam Altman is signalling that AGI is a long ways off, the math that justified designing NPU/TPU hardware in-house might not add up anymore. Even if you believe in the tech itself, the hype is cooling and the do-or-die moment is rapidly approaching.
My overall point is that I think Nvidia played smartly from the start. They could derive profit from any sufficiently large niche their competitors were too afraid to exploit, and general purpose GPU compute was the perfect investment. With AMD, Apple and the rest of the industry focusing on simpler GPUs, Nvidia was given an empty soapbox to market CUDA with. The big question is whether demand for CUDA can be supplanted with application-specific accelerators.
> The big question is whether demand for CUDA can be supplanted with application-specific accelerators.
At least for AI workloads, Google's XLA compiler and the JAX ML framework have reduced the need for something like CUDA.
There are two main ways to train ML models today:
1) Kernel-heavy approach: This is where frameworks like PyTorch are used, and developers write custom kernels (using Triton or CUDA) to speed up certain ops.
2) Compiler-heavy approach: This uses tools like XLA, which apply techniques like op fusion and compiler optimizations to automatically generate fast, low-level code.
NVIDIA's CUDA is a major strength in the first approach. But if the second approach gains more traction, NVIDIA’s advantage might not be as important.
And I think the second approach has a strong chance of succeeding, given that two massive companies—Google (TPUs) and Amazon (Trainium)—are heavily investing in it.
(PS: I'm also a bit biased towards approach 2; we built Llama 3 fine-tuning on TPUs: https://github.com/felafax/felafax)
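To make the "compiler-heavy" approach concrete, here's a tiny JAX sketch (names and shapes are just illustrative): you write plain array ops, and jax.jit hands the whole function to XLA, which compiles it for the backend and fuses the bias add and activation around the matmul, with no hand-written CUDA or Triton kernels. The same code runs on TPU or GPU backends.

```python
# Compiler-heavy approach in miniature: no custom kernels, XLA does the fusion.
import jax
import jax.numpy as jnp

def mlp_layer(x, w, b):
    return jax.nn.gelu(x @ w + b)   # matmul + bias + activation, written naively

mlp_layer_jit = jax.jit(mlp_layer)  # traced once, compiled by XLA for the backend

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 512))
w = jax.random.normal(key, (512, 512))
b = jnp.zeros((512,))

out = mlp_layer_jit(x, w, b)        # runs as an XLA-compiled executable
print(out.shape)                    # (128, 512)

# To peek at what XLA receives:
# print(jax.jit(mlp_layer).lower(x, w, b).as_text())
```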
It's weird to me that folks think NVDA is just sitting there, waiting for everyone to take their lunch. Yes, I'm totally sure NVDA is completely blind to competition and has chosen to sit on their cash rather than develop alternatives...</s>
> If demand for training and inference wanes, these hardware investments will be almost entirely wasted
Nvidia also needs to invent something then, as pumping up mining (or giving goods to gamers) again is not sexy. What's next? Will we finally compute for drug development and achieve just as great results as with chatbots?
> Nvidia also needs to invent something then, as pumping up mining (or giving goods to gamers) again is not sexy.
They do! Their research page is well worth checking out; they wrote a lot of the fundamental papers that people cite for machine learning today: https://research.nvidia.com/publications
> Will we finally compute for drug development and achieve just as great results as with chatbots?
Maybe - but they're not really analogous problem spaces. Fooling humans with text is easy - Markov chains have been doing it for decades. Automating the discovery of drugs and research of proteins is not quite so easy, rudimentary attempts like Folding@Home went on for years without any breakthrough discoveries. It's going to take a lot more research before we get to ChatGPT levels of success. But tools like CUDA certainly help with this by providing flexible compute that's easy to scale.
There was nothing rudimentary about Folding@Home (either in the MD engine or the MSM clustering method), and my paper on GPCRs that used Folding@Home regularly gets cites from pharma (we helped establish the idea that treating proteins as being a single structure at the global energy minimum was too simplistic to design drugs). But F@H was never really a serious attempt at drug discovery- it was intended to probe the underlying physics of protein folding, which is tangentially related.
In drug discovery, we'd love to be able to show that virtual screening really worked: if you could do docking against a protein to find good leads affordably, and also ensure that the resulting leads were likely to pass FDA review (i.e., effective and non-toxic), that could greatly increase the rate of discovery.
Are the Gemini models open?
Just the Gemma models are open.
Not even to Google customers most days, it seems.
"we constantly strive to enhance the performance and efficiency of our Mamba and Jamba language models."
... "The growing importance of multi-step reasoning at inference time necessitates accelerators that can efficiently handle the increased computational demands."
Unlike others, my main concern with AI is that any savings we got from converting petroleum generating plants to wind/solar were blasted away by AI power consumption months or even years ago. Maybe Microsoft is on to something with the TMI revival.
Energy availability at this point appears (to me at least) to be practically infinite. In the sense that it is technically finite but not for any definition of the word that is meaningful for Earth or humans at this stage.
I don't see our current energy production scaling to meet the demands of AI. I see a lot of signals that most AI players feel the same. From where I'm sitting, AI is already accelerating energy generation to meet demand.
If your goal is to convert the planet to clean energy, AI seems like one of the most effective engines for doing that. It's going to drive the development of new technologies (like small modular nuclear reactors), pushing down the cost of construction and ownership. I strongly suspect that, in 50 years, the new energy tech that AI drives the development of will have rendered most of our current energy infrastructure worthless.
We will abandon current forms of energy production not because they were "bad" but because they were rendered inefficient.
This has been a constant thought for me as well. Like, the plan, from what I can tell, is that we are going to start spinning all this stuff up every single time someone searches something on Google, or perhaps when someone would otherwise search something on there.
Just that alone feels like an absolutely massive load to bear! But it's only a drop in the bucket compared to everything else around this stuff.
But while I may be thirsty and hungry in the future, at least I will (maybe) be able to know how many rs are in "strawberry".
Can we please pop this insane Nvidia valuation bubble now?