Sophon PFG-1: a monolithic-3D AI ASIC with 330 GB of on-die DRAM and no HBM

(phantafield.com)

28 points | by minkowsky 7 hours ago

35 comments

vessenes 6 hours ago

Minkowsky, cool design! Question - the ASIC designers I've worked with over the years have been fairly adamant that integrating memory on package interspersed with logic is very difficult; the general statements run like "those designs always look great on paper, but never tape out properly".

Have you done any hardware tests of this plan? Is this still considered quality advice?

Second q, why start with 28nm? Is the idea that you want to stick with TSMC and be able to shrink? If this does in fact work well, I can imagine wanting to shoot for a smaller process node pretty quickly. Is there some sort of tech / design gap you'll need to figure out as you go?

minkowsky 6 hours ago

Due to the thermal budget, most silicon designs are constrained to a 2D layout, so memory competes with logic for area. We instead stack logic in the back end, between the metal layers.

We fabricated 2T0C DRAM arrays with a 3D monolithic structure. That's a must-do.

Why 28nm? Because it's cheap, widely available, and already gives us enough performance to beat Nvidia Vera Rubin. We have a roadmap for scaling it down: https://www.phantafield.com/whitepaper#6-scaling-roadmap

robocat an hour ago

> 2T0C

= 2 transistors, zero capacitors

See last paragraph (3.) of another comment for context: https://news.ycombinator.com/item?id=48713803

gfody 6 hours ago

isn't cerebras the pudding proof of this design? it seems like ai chips galore are appearing from the woodwork but cerebras is 10 years down this rabbit hole and poised to dominate

vessenes 6 hours ago

I believe cerebras is one wafer, not deeply stacked, each core is like half memory half compute by area.

addaon 7 hours ago

Since when are we doing 32-layer planar transistor logic on a single chip? Even ignoring the use of FETs for eDRAM… I didn’t realize we had decent logic density possible on BEOL.

minkowsky 5 hours ago

Because we can put FETs on any layer. Usually, BEOL doesn't need such high density. The density depends on which lithography tool and mask you pick.

codingpanic 7 hours ago

I've been wondering how long before RAM is fabbed on die to get around supply issues. This is one of the first designs I've read about so far. How long before Apple releases a CPU with RAM on die?

Rohansi 6 hours ago

They're typically manufactured with very different processes so one has to wonder what compromises are being made here to get both on the same die.

minkowsky 7 hours ago

Author here. The supply angle is exactly the motivation — HBM is the hardest part to get and ~26% of an AI rack's BOM.

First, separate three things people lump together. Apple already does memory on package (M-series unified memory = LPDDR5X dies next to the SoC). The near-term industry path is bonded stacking (AMD 3D V-cache, HBM4's logic base die). What we're doing is monolithic — growing the memory on top of finished logic. Three reasons that distinction matters:

1. Bonding only helps at the margin. A hybrid-bond interface is still µm-scale and carries a relatively large interconnect capacitance, so at memory bandwidth the I/O drivers crossing it dissipate most of the power and overheat. You move the memory closer without escaping the I/O energy. Monolithic inter-tier vias are nano-scale (we model ~1% of the interconnect energy of a bonded interface), and that's the only thing that actually moves the needle.

2. 2D-TMDs are the only functional CMOS you can build in the BEOL. Monolithic 3D means fabricating the upper tiers after the logic, at ≤450 °C, or you cook everything underneath. Silicon needs ~1000 °C; low-temp oxide semiconductors (IGZO) are n-type only, so no real CMOS. 2D-TMDs give both n- and p-type at BEOL temperature. Nothing else does.

3. ~6 orders of magnitude lower off-current (~1 fA/µm) finally makes a capacitor-free cell work. Conventional 1T1C DRAM needs a big storage capacitor — the deep-trench / high-aspect-ratio etch you can't do in the BEOL anyway. A 2T0C gain cell holds charge on a transistor gate with no capacitor; in silicon it leaked away in microseconds, so it was never usable. With 2D-TMD leakage you get ~1.8 s retention — refresh at ~1 Hz and drop the capacitor, and the trench, entirely.
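
The retention argument in point 3 can be sanity-checked with the usual charge-leakage estimate, t = C * dV / I_off. The sketch below is only a back-of-the-envelope illustration: the storage capacitance, tolerable voltage droop, and transistor width are hypothetical values picked so the result lands in the range quoted above, not figures from the whitepaper. Only the ~1 fA/µm off-current and the six-orders-of-magnitude gap come from the comment.

```python
# Back-of-the-envelope retention estimate for a capacitor-less gain cell.
# CAP_F, DROOP_V and WIDTH_UM are illustrative assumptions, not whitepaper data.

def retention_seconds(storage_cap_f, tolerable_droop_v, off_current_a_per_um, width_um):
    """Time for leakage to drain the tolerable charge: t = C * dV / I_off."""
    leak_a = off_current_a_per_um * width_um
    return storage_cap_f * tolerable_droop_v / leak_a

CAP_F = 0.2e-15    # assumed storage-node (gate) capacitance: 0.2 fF
DROOP_V = 0.45     # assumed voltage loss the read path can tolerate
WIDTH_UM = 0.05    # assumed write-transistor width: 50 nm

# ~1 fA/um off-current, as quoted in the comment above
low_leak = retention_seconds(CAP_F, DROOP_V, 1e-15, WIDTH_UM)
# a device that leaks six orders of magnitude more (1 nA/um)
high_leak = retention_seconds(CAP_F, DROOP_V, 1e-9, WIDTH_UM)

print(f"low-leakage retention:  {low_leak:.3g} s")
print(f"high-leakage retention: {high_leak:.3g} s")
print(f"ratio: {low_leak / high_leak:.3g}")
```

Under these assumed values the low-leakage cell holds its state on the order of seconds and the leakier one on the order of microseconds, which is the shape of the claim; real retention also depends on gate leakage, disturb, and temperature, none of which this models.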

robocat an hour ago

> TMDs

= Transition Metal Dichalcogenides

> BEOL

= Back End Of Line. The later stages of semiconductor manufacturing (after the standard CMOS logic transistors) e.g. adding the metal wiring and interconnect layers. Think end of a manufacturing line.

The core concept is to layer multiple non-standard non-silicon memory transistors above the metal layers.

That sounds like a stunning invention, since I think that alone implies better memory density than current SRAM (ignoring the extra complexity of stacking it above a logic layer).

wmf 7 hours ago

This design is absolutely wild. It probably won't work but I admire the dream.

minkowsky 7 hours ago

Author here. The economics are more realistic than those of Cerebras's wafer-scale ASIC.

JumpCrisscross 7 hours ago

Can you explain why?

minkowsky 6 hours ago

I have a detailed comparison with Cerebras in the economic analysis section: https://www.phantafield.com/whitepaper#7-economic-analysis

wmf 6 hours ago

I'm questioning technical risks such as BEOL transistors and 2T DRAM cell structure, not the economics. Cerebras has already retired their technical risk.

minkowsky 5 hours ago

It's risky, like landing a rocket, but not impossible.

binyu 7 hours ago

Hello, kudos for the tremendous work. Could you explain the difference between your design and Cerebras?

Bests

minkowsky 7 hours ago

Author here. Thanks! Short version: Cerebras and we are attacking the same memory wall from opposite axes — they scale out in 2D, we scale up in 3D.

Cerebras WSE-3 is a brilliant packaging play: one wafer-scale chip (~46,000 mm², ~900k cores) with ~44 GB of SRAM spread across the plane, so compute and memory sit side by side with enormous bandwidth. The catch is density — SRAM is a 6T cell, so even a whole wafer only holds ~44 GB. An 80B model doesn't fit on-wafer, so weights stream in from external MemoryX (off-wafer DRAM). It's fast, but it's a ~23 kW, multi-million-dollar system, and large models are still memory-streamed.

Sophon is a single ~750 mm² die. Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic — 64 monolithic 3D tiers of 2D-TMD compute-in-memory and capacitor-less gain-cell DRAM. The gain cell is denser than SRAM per layer, and we stack 32 memory tiers of it, so we get 330 GB on one normal-size die — enough that an 80B model is fully resident, no streaming, no off-chip memory at all. ~1 kW, not 23 kW.

So the real difference is SRAM-in-2D vs DRAM-in-3D: Cerebras maxes out planar SRAM area; we trade to denser DRAM and stack it vertically, which is what buys GB-scale on-die capacity.
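
For a rough sense of scale, the capacity and area figures quoted in this comment imply the areal densities computed below. This is a sketch only: it assumes 1 GB = 1e9 bytes and that each of the 32 memory tiers spans the full ~750 mm² die, which the comment does not state.

```python
# Areal memory density implied by the figures quoted in this comment.
# Assumptions: 1 GB = 1e9 bytes; every memory tier covers the whole die.

def gb_per_mm2(capacity_gb, area_mm2):
    """Capacity per unit of silicon footprint."""
    return capacity_gb / area_mm2

wse3 = gb_per_mm2(44, 46_000)     # planar SRAM on one wafer-scale chip
sophon = gb_per_mm2(330, 750)     # all memory tiers combined, per die footprint
sophon_per_tier = sophon / 32     # 32 memory tiers, as stated above

print(f"WSE-3:           {wse3 * 1000:.3f} MB/mm^2")
print(f"Sophon (total):  {sophon * 1000:.1f} MB/mm^2")
print(f"Sophon per tier: {sophon_per_tier * 1000:.2f} MB/mm^2")
print(f"total ratio:     {sophon / wse3:.0f}x")
print(f"per-tier ratio:  {sophon_per_tier / wse3:.1f}x")
```

The per-tier ratio is the number to compare against the "denser than SRAM per layer" claim; the total ratio folds in the stacking as well. The wafer-scale area also includes compute cores, so this is a footprint comparison, not a cell-to-cell one.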

Honest caveat: Cerebras ships real silicon today and is genuinely fast — they proved wafer-scale integration works. We're pre-silicon, betting on a harder materials path (2D-TMD monolithic 3D). The upside, if it yields, is capacity-per-watt and per-dollar that planar SRAM can't reach.

binyu 6 hours ago

> they scale out in 2D, we scale up in 3D.

This actually helps a lot, thanks.

> Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic

Is this done with current manufacturing technologies? Does it require a special process?

> no streaming, no off-chip memory at all. ~1 kW, not 23 kW

Is this for an individual compute unit? Compared to Cerebras, what's the ratio of power used vs compute output?

minkowsky 5 hours ago

I think you are asking about energy per token. Cerebras is 12.8J, Sophon is 25.8mJ. That is a difference of roughly 500x, nearly three orders of magnitude.
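
The size of that gap can be checked directly from the two quoted figures. The snippet below is a minimal sketch; the implied-throughput lines additionally assume that energy per token equals system power divided by token rate, using the ~23 kW and ~1 kW figures mentioned earlier in the thread, which is an assumption and not something the comment states.

```python
import math

# Energy per token as quoted in the comment (joules).
cerebras_j = 12.8
sophon_j = 25.8e-3   # 25.8 mJ, i.e. millijoules

ratio = cerebras_j / sophon_j
orders = math.log10(ratio)
print(f"ratio: {ratio:.0f}x ({orders:.2f} orders of magnitude)")

# Assumption: energy/token = system power / tokens per second.
# Power figures are the approximate ones quoted elsewhere in this thread.
CEREBRAS_POWER_W = 23_000
SOPHON_POWER_W = 1_000
cerebras_tok_s = CEREBRAS_POWER_W / cerebras_j
sophon_tok_s = SOPHON_POWER_W / sophon_j
print(f"implied throughput: {cerebras_tok_s:.0f} vs {sophon_tok_s:.0f} tokens/s")
```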

binyu 4 hours ago

so Sophon is less efficient than Cerebras?

Edit: is that joules vs. millijoules? I need better glasses

> Cerebras is 12.8J, Sophon is 25.8mJ

Are your figures hypothetical or do you have a working prototype?

matt123456789 6 hours ago

I suspect you are being downvoted because your answer is AI-generated, but I found it very clear and will upvote.

binyu 6 hours ago

What makes you think his reply was AI generated?

Edit: I can see a bunch of hints, most definitely. Still a good comment though.

minkowsky 6 hours ago

I do use AI for some of the answers. I now know the penalty. Thank you for the heads up.

throwaway89201 5 hours ago

The entire design looks very interesting, but from the outside and without domain expertise I find it very hard to assess if anything about this is actually real or just a large and well-executed product of an AI psychosis.

The signs that this project is real are hard to verify:

- Angel investment by FinFET inventor Chenming Hu [1] seems a big vote of confidence, but there is no independent confirmation of this anywhere, except for two photos in LinkedIn posts [2], which do look convincing.

- The NanoGalaxy PPMOCVD was presented at IEDM 2025 [3][4], but nobody seems to have written about it except the company itself. In this case, presenting means a poster presentation with a very vibrantly colored marketing picture.

- The NanoGalaxy PPMOCVD is built and in production, because you can "Witness a full 12-inch MoS₂ growth cycle on your own wafer lot" [5], but nobody has reported on this. A photo/video of the actual device would help a lot, but instead a very clean picture of what seems like a 3d-model is shown.

There are a few worrying signs:

- The submitter on HN presents themselves as the founder. They have previously submitted other projects under the Phanta or PhantaField names [6]. Notably these cover three hype cycle subjects: DAOs, NFTs and augmented reality, combined in a book that itself is rather 'out there' [7].

- The comments on HN by the founder are clearly AI generated with phrases like "honest caveat". The content seems to make sense (to a non-expert like me), but it's quite jarring.

- All the work except the NanoGalaxy seems to be theoretical for now, but written in very definitive language in extreme detail. For example "The die is built" with specific properties but then referencing three very experimental papers. This can of course be genuine (technical) marketing, but it's also very similar to AI psychosis work that I've encountered elsewhere. Although I must say in comparison this does look a lot more internally consistent and logical to me.

- I find it very hard to believe that the NanoGalaxy is actually existing and working hardware ready to "Witness a full 12-inch MoS₂ growth cycle on your own wafer lot". I would imagine you need a sizable team to produce such a new device, and that seems to be inconsistent with the way the company presents itself (a one man show of the founder). The absence of any verification or showcases of the device, or any evidence of a larger team make it suspect.

[1] https://www.phantafield.com/news/first-angel-investment-chen...

[2] https://www.linkedin.com/feed/update/urn:li:activity:7126002... and https://www.linkedin.com/posts/xuejunxie_its-a-great-honor-t...

[3] https://www.linkedin.com/feed/update/urn:li:share:7404253323...

[4] https://www.phantafield.com/news/12-inch-ppmocvd-iedm-2025

[5] https://www.phantafield.com/product/ppmocvd

[6] https://news.ycombinator.com/submitted?id=minkowsky

[7] https://xcancel.com/thepantheonai

minkowsky 5 hours ago

I also have another side project, thepantheon.ai. I think it's ok to have multiple talents. To make the world a much more interesting place.

We talk to fabs. I am not allowed to expose any conversation. We don't need to prove to average Joe what we have.

It's indeed teamwork to bring it into production, even with huge help from AI. It also takes expertise to make sure AI is correct. As a founder, it's more of a merit than a drawback to create with a minimum headcount.

throwaway89201 5 hours ago

> I think it's ok to have multiple talents.

Of course that's okay, but do recognize that most blockchain projects in general, and DAOs and NFTs specifically, have been considered frauds or at the very least pipe dreams by many on this site from the beginning, and more widely across tech from the moment the hype was gone.

> We don't need to prove to average Joe what we have.

Of course you don't need to prove anything to anyone, but it really can't hurt, and there doesn't seem to be much effort put into addressing the main issues. Not having anyone write about you also just seems like bad marketing: as you clearly are not in stealth mode, an extremely verbose public website sits oddly against the absence of external coverage.

> It's indeed teamwork to bring it into production

So is or isn't the NanoGalaxy an actually physical, working device ready for a demo? (which isn't necessarily production, but somewhat close)

minkowsky 4 hours ago

It's a valid prejudice. But smart contracts are a beautiful technology.

I didn’t spend time on PR. I have now changed my mind. It’s a noisy world and good things need broadcasting.

Yes. We offer a demo if you are a potential customer.

6 hours ago

[deleted]

brcmthrowaway 7 hours ago

What is this? AI generated company?

RobLach 6 hours ago

MoS2 lattice construction?

freakynit 3 hours ago

80B + INT4 + speculative (FP8 mode) => 72,188 tokens/s effective

..da fck!!

6 hours ago

[deleted]

minkowsky 7 hours ago

[flagged]