AMD Ryzen 7 9800X3D Linux Performance: Zen 5 With 3D V-Cache

(phoronix.com)

150 points | by mfiguiere 8 months ago

118 comments

nisten 8 months ago

I am surprised at how much this thing is just straight up crushing it with just 8 cores.

I think its topping the machine learning benchmarks has to do with only 8 cores sharing the 96MB of L3 cache. That works out to 1MB of L2 + 12MB of L3 per core, which is huge: even split across two SMT threads, EACH THREAD gets more cache than the entire NVIDIA RTX 3090 (6MB of L2 total), and this lets it take FULL advantage of the extra silicon spent on the various AVX extensions.
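A quick back-of-the-envelope check of the per-core and per-thread numbers, using only the figures in the comment (8 cores, 1MB of L2 per core, 96MB of shared L3, 6MB of L2 on the 3090). Splitting the L3 evenly is an idealization: it is a shared pool, so in practice one busy core can use far more than its "share".

```python
# Idealized cache budget for the 9800X3D, using the figures from the comment.
CORES = 8
THREADS_PER_CORE = 2       # SMT: two hardware threads per core
L2_PER_CORE_MB = 1.0       # private L2 per core
L3_TOTAL_MB = 96.0         # shared L3 (32MB on-die + 64MB stacked V-Cache)
RTX_3090_L2_MB = 6.0       # total L2 on the RTX 3090, as quoted in the comment


def cache_per_core_mb() -> float:
    """Private L2 plus an even slice of the shared L3."""
    return L2_PER_CORE_MB + L3_TOTAL_MB / CORES


def cache_per_thread_mb() -> float:
    """Per-core budget split evenly between the two SMT threads."""
    return cache_per_core_mb() / THREADS_PER_CORE


if __name__ == "__main__":
    print(f"per core:   {cache_per_core_mb():.1f} MB")    # 13.0 MB
    print(f"per thread: {cache_per_thread_mb():.1f} MB")  # 6.5 MB
    print("per thread exceeds 3090 L2:", cache_per_thread_mb() > RTX_3090_L2_MB)  # True
```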

BeefWellington 8 months ago

I'm curious to see if AMD will release a 9950X3D this time around. I can foresee that kind of CPU dominating everything else across most workloads, given how well this 8-core is holding up against CPUs with double the cores or more.

didgetmaster 8 months ago

I have a 5950X that is now a few years old, and I planned to upgrade to a 9950X.

I have never had one of the 3D V-Cache processors and am curious how it would improve the benchmarks for my multi-threaded data management system, which does many operations against a set of 4K blocks of data.

I heard rumors that a 9950X3D will be available in January. I am trying to figure out if I should wait.

grepfru_it 8 months ago

January is two months away. I would.

Tuna-Fish 8 months ago

Yes, it's supposedly coming early next year.

jsheard 8 months ago

I think the current rumor is that only one of the chiplets will have the extra cache though, so you'll have 8 cores with the big cache and 8 cores with the normal cache.

qzw 8 months ago

If they make one with extra cache on both CCDs, it would probably get some kind of AI branding and be at a significantly higher price point. Current games would hardly benefit from 16 cores all with that much cache.

scheeseman486 8 months ago

The main benefit is that it's a no-compromise product. High single thread performance for games and there's more of those cores for productivity, it'd be the best workstation CPU and the best gaming CPU in one package.

qzw 8 months ago

But you'd get 95+% of the same benefit with the v-cache on just one CCD, which is what they did with the 7950X3D.

scheeseman486 8 months ago

They've solved a lot of the manufacturing issues since then, and since it's already a premium product, why not go all in? There might be cases where the extra cache helps performance in heavily multithreaded workloads. There are also niche use cases, like running the chip in a headless gaming server: you could split the CPU in two for simultaneous game streaming without significantly compromising performance for either client.

That's the point of no compromise: sometimes it's nice to get 100% of the benefit (not to say that a single-CCD cache version doesn't make sense as a product).

snvzz 8 months ago

There was some point to doing that, due to the power restrictions imposed by having the cache on top.

But now the cache is underneath.

tiffanyh 8 months ago

While true, also keep in mind that the iPad Pro (M4), which has no active cooling and uses only a quarter of the power, is still faster (single- and multi-core) than this 9800X3D, and it has already been on the market for half a year.

JohnTHaller 8 months ago

In multi-threaded workloads, the M4 gets 13,380, the 9800X3D gets ~19,000 (varies by build), and the 9950X gets 22,000-24,000 depending on build.

The M4 Max you can pre-order gets around 26,000 multicore but is significantly more expensive than the 9950X ($569) or 9800X3D ($479). The M4 Max is a $1,200 premium over the M4 on the 14 inch MacBook Pro and a $1,100 premium over the M4 Pro on the 16 inch.

The M4 Max is only available in the MacBook Pro at present. The Mac Mini and iMac will only get the base M4. The Mac Studio is still based on the M2.

This is just a summary of performance and cost. Portability, efficiency, and compatibility factors will weigh everyone's choices.

JohnTHaller 8 months ago

UPDATE TO MY COMMENT: The new small Mac Mini does have an option for the M4 Pro but not the M4 Max. For the curious, the M4 Pro supposedly gets around 22,000 in Geekbench. It's an $800 premium over the base M4 Mac Mini while adding 8GB of RAM and 256GB of storage.

adrian_b 8 months ago

Single core yes, but multi core no.

The Geekbench scores cannot compare laptop CPUs with desktop CPUs, because the tasks that are executed are too short and they do not demonstrate the steady-state throughput of the CPUs. The desktop CPUs are much faster for multithreaded tasks in comparison with laptop/tablet CPUs than it appears in the GB results.

The Apple CPUs have a much better instructions-per-clock-cycle ratio than any other CPUs, and now in M4 they also have a relatively high clock frequency, of at least 4.5 GHz. This allows them to win most single-threaded benchmarks.

However the performance in multi-threaded benchmarks has a very weak dependence on the CPU microarchitecture and it is determined mostly by the manufacturing process used for the CPU.

If we were able to compare Intel, AMD and Apple CPUs with the same number of cores and made with the same TSMC process, their multithreaded performance would be very close at a given power consumption.

The reason is that executing a given benchmark requires a number of logic transitions that is about the same for different microarchitectures, unless some of the design teams have been incompetent. An Apple CPU does more logic transitions per clock cycle, so in single thread it finishes the task faster.

However in multithreaded execution, where the power consumption of the CPU reaches the power limit, the number of logic transitions per second in the same manufacturing process is determined by the power consumption. Therefore the benchmark will be completed in approximately the same number of seconds when the power limits are the same, regardless of the differences in the single-threaded performance.

At equal power, an M4 will have a slightly better MT performance than an Intel or AMD CPU, due to the better manufacturing process, but the difference is too small to make it competitive with a desktop CPU.
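A toy version of the power-limited argument above, written out so the assumptions are explicit. It encodes the commenter's model (same amount of switching work on any competent core, energy per unit of work fixed by the process node, chip pinned at its power limit); it is not an established law, and the numbers in the usage block are hypothetical.

```python
# Toy model of the argument above (the commenter's assumptions, not a law):
#   * a benchmark needs about the same amount of switching work on any
#     competently designed core,
#   * the energy per unit of that work is fixed by the manufacturing process,
#   * in multithreaded runs the chip sits at its power limit.


def mt_runtime_s(work_units: float, joules_per_unit: float, power_limit_w: float) -> float:
    """Power-limited runtime: work retires at power_limit / joules_per_unit per second."""
    return work_units * joules_per_unit / power_limit_w


def st_runtime_s(instructions: float, ipc: float, freq_hz: float) -> float:
    """Single-thread runtime when the core is not power-limited."""
    return instructions / (ipc * freq_hz)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the shape of the argument.
    for name, ipc, freq in (("wide core", 6.0, 4.5e9), ("narrower core", 4.0, 5.0e9)):
        st = st_runtime_s(1e10, ipc, freq)
        mt = mt_runtime_s(1e12, 50e-12, 100.0)  # same work, energy and power limit
        print(f"{name}: single-thread {st:.2f} s, power-limited MT {mt:.2f} s")
```

In this model the wider core wins single-threaded, but both finish the power-limited run in the same time; the reply that follows argues that real microarchitectures and chip-to-chip interconnects break that equivalence.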

wtallis 8 months ago

> The Geekbench scores cannot compare laptop CPUs with desktop CPUs, because the tasks that are executed are too short and they do not demonstrate the steady-state throughput of the CPUs. The desktop CPUs are much faster for multithreaded tasks in comparison with laptop/tablet CPUs than it appears in the GB results.

Bullshit. What you're talking about is the steady-state of the heatsink, not the steady state of the chip. Intel learned the hard way that a fast CPU core in a phone really does become a fast CPU core in a laptop or desktop when given a better cooling solution.

> However in multithreaded execution, where the power consumption of the CPU reaches the power limit, the number of logic transitions per second in the same manufacturing process is determined by the power consumption. Therefore the benchmark will be completed in approximately the same number of seconds when the power limits are the same, regardless of the differences in the single-threaded performance.

No, microarchitecture really does matter. And so does the macro-architecture of AMD's desktop chips, which burn a huge amount of power on an inefficient chip-to-chip interface.

kuschku 8 months ago

For an apples to apples comparison, you'll need to compare Zen 5 with M3, or whatever Zen 6 is going to be with M4.

Apple is paying for exclusive access to TSMC's next node. That improves their final products, but doesn't make their architecture inherently better.

ricketycricket 8 months ago

Do you though? M4 is what is on the market now and this chip is just coming out. Maybe they are on different processes, but you still have to compare things at a given point in time.

rowanG077 8 months ago

Why would a consumer care about what node something is on? You should only care about a set of processors that is available in the market at the same time. The M4 is available now and Zen 6 is not. Once Zen 6 is here, we'll probably have an M5.

OKRainbowKid 8 months ago

Where can I buy an M4? I don't care about the rest of Apple's products, but the chips are pretty sweet.

rowanG077 8 months ago

Is this a serious question? Apple.com and basically every computer part store.

OKRainbowKid 8 months ago

The question wasn't entirely serious, no. My point is: afaik the M processors aren't actually available on the market, they only come as one component of a much more expensive product.

I don't mean to take away from how impressive they are.

rowanG077 8 months ago

And why does that matter exactly for the discussion at hand?

OKRainbowKid 8 months ago

You wrote

>You should only care about a set of processors that is available in the market at the same time. The M4 is available now and Zen 6 is not.

I can't buy an M4, it's not available in the market.

rowanG077 8 months ago

I don't get it. You can literally go buy it right now. You have been able to buy it for months and months by getting an iPad. If you are saying you can't buy it because it's used inside a product, then the same goes for basically all mobile processors. I can't "buy" an AMD Ryzen HX 370. I can't buy an Intel Core Ultra 258V. And neither can I "buy" a Qualcomm Snapdragon X1E-80-100. This has never even been a factor.

OKRainbowKid 8 months ago

You are correct, the situation is similar for most mobile processors. They are unavailable on the market for consumers looking to build a system.

Apple goes one step further though: M processors aren't just unavailable to consumers, there's also no way for OEMs to build systems using these chips. In this point they differ significantly from the examples you mentioned. For people that do not want to buy into the Apple ecosystem, M chips are effectively not on the market, and benchmark comparisons to desktop or server CPUs are meaningless.

rowanG077 8 months ago

Why does it matter that they aren't available to OEMs? This is moving the goalpost from your original argument.

The second part of your argument is currently correct, but that will change shortly. There is no reason to lock into the Apple ecosystem. M4 support for Linux is underway, though not yet available. You can easily use a Mac mini as a server running Linux.

OKRainbowKid 8 months ago

>This is moving the goalpost from your original argument.

No, them only being available as part of Apple products and thus not on the CPU market was my original point. I should probably have been more explicit in my original comments. I don't believe I have been moving goalposts, but your interpretation of the point I was trying to make might have changed.

fragmede 8 months ago

M4 Mac Minis went on sale today :)

OKRainbowKid 8 months ago

Please forgive the weird analogy, but if I'm in the market for a radio, I really don't want to buy a car just for its radio. Especially if there's no way to use the radio without that specific car.

fragmede 8 months ago

My issue with that analogy is that the CPU is more like the engine of a car, and people certainly do buy cars just to have a vehicle with a given engine.

When it was only the iPad with the M4, it was easier to be sympathetic to your cause, since the iPad is totally locked down and isn't a general purpose computing device since it can't run arbitrary code. But now the Mac Mini is available. It is a general purpose computing device, and you can install Firefox and Linux or whatever you want.

It doesn't meet the level of hardware vendor purity you're asking for, sure, but that's more of an ideological objection now that there's a general-purpose computer with the M4 for sale. (Just wish it were cheaper.)

And since it just went on sale today, I was highlighting that, since other readers might want to know that they can now get a computer with an M4.

osti 8 months ago

Yup I just looked at the clang score in geekbench, for single threaded 9800x3d scored about 3200, whilst m4 had 4400... The m4 is so far above the rest it's ridiculous. Wish Apple made an x86 equivalent so that it can play Windows games lol.

nightski 8 months ago

Just supporting Linux would be adequate imho. Non-existent Linux support straight up makes the M4 a non-starter for me, as much as I can admire the hardware.

osti 8 months ago

For developers yes, but gamers seem to have the loudest voice in the desktop PC performance conversation, so I think it's important to cater to that market.

nieve 8 months ago

Gamers in general are not looking at Apple's chips.

hulitu 8 months ago

> for single threaded 9800x3d scored about 3200, whilst m4 had 4400... The m4 is so far above the rest it's ridiculous.

Except that your computer runs more than one thread. Pity that this "single core" performance cannot be utilized to its full potential.

heraldgeezer 8 months ago

And the OS is terrible, so it's practically useless for me.

ploxiln 8 months ago

Hehe ... yeah, single threaded, in some benchmarks. Very impressive chip, the M4. Multi-threaded loads that take more than 30 seconds, no way, come on. But to see the X3D chips really shine above their competitors, you need to slot in a high-end graphics card, and load up a ... uh well you can't compare to Apple Silicon at that point ...

rasz 8 months ago

... in geekbench. How about compiling? compressing?

jsheard 8 months ago

> I am surprised at how much this thing is just straight up crushing it with just 8 cores.

♫ Cache rules everything around me ♫

aurareturn 8 months ago

https://tpucdn.com/review/amd-ryzen-7-9800x3d/images/efficie...

Raw gaming performance increase is good but its gaming efficiency seems to have taken a dip compared to 7800X3D.

So AMD chose to decrease efficiency to get more performance this generation.

Source: https://www.techpowerup.com/review/amd-ryzen-7-9800x3d/23.ht...

Numerlor 8 months ago

The efficiency is only worse because the CPU can use the power without burning itself up, unlike last generation's X3D. And efficiency is always better at lower clocks. You can get this generation's efficiency uplift by limiting its power to the levels where last generation's CPU started throttling to stay under its 89°C Tjmax, but that will inevitably also limit the frequency, which is the main performance uplift for this CPU.

For a sense of how power-limited last gen's X3D was, Tom's Hardware measured it at 71W under an all-core AVX load, while my 7600X with 2 fewer cores consumes up to 130W.
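A rough first-order sketch of why efficiency drops as clocks rise. It assumes the textbook dynamic-power relation P ≈ C·V²·f and, as a deliberate simplification, that voltage has to rise in proportion to frequency near the top of the V/f curve; leakage and real V/f curve shapes are ignored, and the function names are illustrative.

```python
def dynamic_power_w(c_eff_farads: float, volts: float, freq_hz: float) -> float:
    """Textbook switching power, P ~ C * V^2 * f (leakage ignored)."""
    return c_eff_farads * volts**2 * freq_hz


def relative_perf_per_watt(freq_scale: float) -> float:
    """Perf/W at freq_scale * f0, relative to running at f0.

    Simplifying assumption: V scales linearly with f, so power ~ f^3 while
    performance ~ f, giving perf/W ~ 1 / f^2.
    """
    perf = freq_scale
    power = freq_scale**3
    return perf / power


if __name__ == "__main__":
    for scale in (0.8, 0.9, 1.0, 1.1):
        print(f"{scale:.1f}x clock -> {relative_perf_per_watt(scale):.2f}x perf/W")
```

Under these assumptions a 10% clock bump costs about 17% in perf/W, which is the direction described above; real chips deviate, since voltage is nearly flat at low clocks and climbs steeply near the top.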

aurareturn 8 months ago

If I can summarize what you wrote: Same IPC gain as normal Zen5 but more power can be drawn to increase performance due to moving the cache chiplet to the bottom.

wtallis 8 months ago

The previous 3D cache solutions were not just limited thermally, but also the cache chiplet could not tolerate the high voltages that AMD's CPU cores use at high frequencies. Even with excellent cooling, you weren't going to get a 7800X3D or 5800X3D to match frequencies with the non-3D parts. (This might have been less of a problem if AMD could put the extra cache on a different power rail, but that's hard to retrofit into an existing CPU socket.) This new cache chiplet still has a lower voltage limit than the CPU cores, but it's not as big a disparity.

shantara 8 months ago

The 9800X3D is supposed to have an Eco mode with a lower TDP cap, similar to other AMD processors. I don't see it included in the initial reviews, but it would be interesting to see follow-up data. If history is anything to go by, it would significantly decrease power consumption with only a marginal performance impact.

SushiHippie 8 months ago

I have the 7950X, and if I set it to 65W Eco mode, I still get basically the same Geekbench score

65W: https://browser.geekbench.com/v6/cpu/6126001

105W: https://browser.geekbench.com/v6/cpu/5821065

I actually haven't tested it at 170W (the default for the 7950X) for whatever reason, but the average 7950X score on Geekbench is basically the same as my scores with a lower-than-normal TDP.

https://browser.geekbench.com/processors/amd-ryzen-9-7950x

I wouldn't be surprised if the same is possible with the newer CPUs.

A nice added bonus is that my PC fans barely spin (not at audible speeds).

ahartmetz 8 months ago

Yeah, my 7950X is also limited to ~90W nominal (which is ~120W actual) - full power (170/230 or so) is very loud, rapidly wears out the weak-ass VRMs on my Asus B650 board (lesson: Asus can't be blindly trusted anymore), and buys 0-5% of performance.

xarope 8 months ago

my 2700 is due for a refresh next year, so that's what I plan to do, get one of these fancy X3D versions, then cap it to hopefully sub 100W

Hikikomori 8 months ago

Man Intel is so far behind on that list.

Already__Taken 8 months ago

Bad architecture decisions are punishing. AMD was absolutely dwarfed in the early Core iX days and never really came back until Ryzen. The whole Bulldozer lineage was DOA, to the point that Opteron just never factored in.

Hopefully Intel pulls something out again, but they look asleep at the wheel.

toast0 8 months ago

For a long time, x86 chips have been happy to give you a little more performance for a lot more watts at the top end of the performance chart.

Watts/fps @ max fps makes for an interesting graph, but not a very clear comparison. It would be better to compare watts used when locked at a given fps, or fps available when locked at a given wattage. Or watt-hours to complete a video encode (at max wattage, and at various watt limits).
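The metrics suggested above are simple to compute once you log average power: energy per job is power times time, and at a locked frame rate energy per frame is power divided by fps. A minimal sketch; the run figures in the usage block are hypothetical.

```python
def energy_per_job_wh(avg_power_w: float, runtime_s: float) -> float:
    """Energy to finish a fixed job (e.g. one video encode): average power x time."""
    return avg_power_w * runtime_s / 3600.0


def joules_per_frame(avg_power_w: float, fps: float) -> float:
    """Energy per rendered frame when running at a locked frame rate."""
    return avg_power_w / fps


if __name__ == "__main__":
    # Two hypothetical runs of the same encode. Whether the faster, hungrier
    # chip wins depends on how much sooner it finishes; here it does not.
    print(energy_per_job_wh(150.0, 600.0))  # 25.0 Wh
    print(energy_per_job_wh(90.0, 800.0))   # 20.0 Wh
    print(joules_per_frame(60.0, 60.0))     # 1.0 J per frame
```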

Night_Thastus 8 months ago

Nice to actually have a decent release this generation of CPUs.

The rest of Zen5 was maybe a 5% bump on average, and Intel's new series actually regressed in performance compared to 14th gen.

Seems like the Zen 5 X3Ds will be the only good parts this time around.

notanote 8 months ago

Hardware Unboxed has the interesting theory that the I/O die, which is unchanged between Zen4 and Zen5, is a significant bottleneck especially for the latter. The 3D v-cache would then ease the pressure there, and so see the cpu get an extra boost beyond that expected from increased cache.

13hunteo 8 months ago

To cut Intel some slack, this latest version overhauls their old architecture, and they were fairly upfront about the lack of performance gains in this generation.

The idea is the new platform will allow for better development in future, while improving efficiency fairly significantly.

Night_Thastus 8 months ago

From a consumer standpoint - this doesn't matter. You can't buy that future product that may exist. You can only choose whether to buy the current product or not. And right now, that product is bad.

I certainly hope the next generation is a massive bump for Intel, but we'll see if that's the case.

mmaniac 8 months ago

Adding onto that, the roadmap to Intel's next generation isn't exactly clear. Arrow Lake Refresh would have seemingly bumped core counts healthily, but that's cancelled now. I don't believe that it's cancelled because its successor is ahead of schedule.

qzw 8 months ago

Also nice to be able to boast a bigger uplift in the following gen due to regressing this one! But they definitely did need to get their efficiency under control since their parts were turning into fairly decent personal heating units.

fweimer 8 months ago

I think the new T-equivalent CPU could be very interesting if Intel releases one. Those variants are optimized for 35W TDP, and they can be used for building high-performance fanless systems that can sustain their performance for quite some time. The lower power requirements for Arrow Lake might be a really good match there.

5kg 8 months ago

it's scrapped, the new design: https://www.pcworld.com/article/2507953/lunar-lakes-integrat...

zeusk 8 months ago

Parent is quite possibly talking about arrow lake and not lunar lake which is a mobile only part.

heraldgeezer 8 months ago

So why buy this generation and not wait unless your computer broke and you NEED Intel?

duskwuff 8 months ago

> To cut Intel some slack, this latest version overhauls their old architecture...

... and their 13th/14th generation processors had serious problems with overvoltage-induced failures - they clearly needed to step back and focus on reliability over performance.

antisthenes 8 months ago

9800X3D looks like an all-around winner, so if you don't mind spending $500 on just the CPU, I don't see why anyone would get anything else.

ThatMedicIsASpy 8 months ago

All-around winner in what? For $500 you can get a lot more cores.

All-around winning, $500, 8 cores makes no sense.

This thing has a premium gaming price tag because there is nothing close to it other than their own 7800X3D.

sliken 8 months ago

In theory, yes. But in the real world, the bottleneck is the same 128-bit-wide memory interface that's been standard since the days of dual-core chips.

Fewer cache misses (on popular workloads) decrease power and increase performance enough that few things benefit from 12-16 cores.

Thus the M3 Max (with a 512-bit-wide memory system) has class-leading single-core and multi-core scores.

0xQSL 8 months ago

I'm not so sure about memory actually being the bottleneck for these 8 core parts. If memory bandwidth is the bottleneck this should show up in benchmarks with higher dram clocks. I can't find any good application benchmarks, but computerbase.de did it for gaming with 7800MHz vs 6000MHz and didn't find much of a difference [1]

The apple chips are APUs and need a lot of their memory bandwidth for the gpu. Are there any good resources on how much of this bandwidth is actually used in common cpu workloads? Can the CPU even max out half of the 512bit bus?

[1] https://www.computerbase.de/artikel/prozessoren/amd-ryzen-7-...

sliken 8 months ago

Well there's much more to memory performance than bandwidth. Generally applications are relatively cache friendly, thus the X3D helps a fair bit, especially with more intensive games (ones that barely hit 60 fps, not the silly game benchmarks that hit 500 fps).

Generally CPUs have relatively small reorder windows, so a cache miss hurts badly: 80ns of latency @ 5 GHz is 400 clock cycles, and at 4+ instructions per cycle that's something north of 1600 instructions that could have been executed. If one in 20 operations is a cache miss, that's a serious impediment to getting any decent fraction of peak performance. The pain of those cache misses is part of why the X3D does so well; even a few fewer cache misses can increase performance a fair bit.
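The arithmetic in that paragraph, as a small sketch. The 80 ns latency and 5 GHz clock are the comment's figures; the peak of 4 instructions per cycle is an assumption, and the stall model is deliberately pessimistic: it assumes every miss stalls the core for the full latency with no overlap, which is exactly what bigger reorder windows try to avoid.

```python
def miss_cost_cycles(latency_ns: float, freq_ghz: float) -> float:
    """Core cycles spent waiting on one DRAM access (ns * cycles/ns)."""
    return latency_ns * freq_ghz


def lost_instruction_slots(latency_ns: float, freq_ghz: float, peak_ipc: float) -> float:
    """Instructions the core could have retired during that wait."""
    return miss_cost_cycles(latency_ns, freq_ghz) * peak_ipc


def fraction_of_peak(misses_per_op: float, latency_ns: float, freq_ghz: float,
                     peak_ipc: float) -> float:
    """Pessimistic model: each miss stalls the core for the full latency, with
    no overlap between misses or with useful work."""
    base_cpi = 1.0 / peak_ipc
    cpi = base_cpi + misses_per_op * miss_cost_cycles(latency_ns, freq_ghz)
    return base_cpi / cpi


if __name__ == "__main__":
    print(miss_cost_cycles(80, 5))           # 400 cycles
    print(lost_instruction_slots(80, 5, 4))  # 1600 instruction slots
    print(f"{fraction_of_peak(1 / 20, 80, 5, 4):.1%} of peak")
```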

With 8c/16 threads, having only 2 (DDR4) or 4 (DDR5) cache misses pending with a 128 bit wide system means that in any given 80-100ns window only 2 or 4 cores can resume after a cache miss. DDR5-6000 vs DDR5-7800 doesn't change that much: you still wait the 80-100ns, you just get the cache line in 16 transfers (8 for DDR4) at 7800MT/s instead of at 6000MT/s. So the faster DDR5 means more bandwidth (good for GPUs), but not more cache transactions in flight (good for CPUs).
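To put numbers on "you still wait the 80-100ns": the burst itself is short compared to the access latency. A sketch assuming DDR5's 16-transfer burst on a 32-bit subchannel (one 64-byte line) and treating the comment's 80 ns latency as fixed across speed grades, which is a simplification.

```python
def burst_time_ns(transfers: int, mt_per_s: float) -> float:
    """Time to clock out one burst: transfers / (transfers per second), in ns."""
    return transfers / (mt_per_s * 1e6) * 1e9


def miss_time_ns(latency_ns: float, transfers: int, mt_per_s: float) -> float:
    """Fixed access latency plus the data burst itself."""
    return latency_ns + burst_time_ns(transfers, mt_per_s)


if __name__ == "__main__":
    # DDR5: 16 transfers on a 32-bit subchannel = one 64-byte line.
    for speed in (6000, 7800):
        print(f"DDR5-{speed}: burst {burst_time_ns(16, speed):.2f} ns, "
              f"total {miss_time_ns(80.0, 16, speed):.2f} ns")
```

That is roughly 2.67 ns vs 2.05 ns of burst on top of the same 80 ns, so under 1% difference per miss in this simplified view.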

With better memory systems (like the Apple M3 Max) you could have 32 cache misses per 80-100ns. I believe about half of those are reserved for the GPU, but even 16 would mean that all of the 9800X3D's 16 threads could resolve a cache miss per 80-100ns instead of just 2 or 4.
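Little's law is a handy way to sanity-check "N cache misses per 80-100ns" reasoning: sustained bandwidth equals misses in flight times line size, divided by latency. The sketch assumes 64-byte lines and the 80 ns figure from the comment; the result depends entirely on how many misses can really be outstanding at once, which is the point the replies go on to question.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines, as on current x86 cores


def sustained_bw_gbs(misses_in_flight: float, latency_ns: float,
                     line_bytes: int = LINE_BYTES) -> float:
    """Little's law: throughput = concurrency / latency. Bytes per ns == GB/s."""
    return misses_in_flight * line_bytes / latency_ns


def misses_needed(target_bw_gbs: float, latency_ns: float,
                  line_bytes: int = LINE_BYTES) -> float:
    """Misses that must be outstanding to sustain target_bw_gbs."""
    return target_bw_gbs * latency_ns / line_bytes


if __name__ == "__main__":
    peak = 128 / 8 * 6000 / 1000  # 128-bit bus at 6000 MT/s -> 96 GB/s theoretical peak
    for n in (2, 4, 16, 32):
        print(f"{n:>2} misses in flight @ 80 ns -> {sustained_bw_gbs(n, 80):.1f} GB/s")
    print(f"misses needed for {peak:.0f} GB/s: {misses_needed(peak, 80):.0f}")
```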

That's part of why the M4 Max does so well on multithreaded code. The M4 Max does better on Geekbench 6 multithread than not only the 9800X3D (with 8c/16 threads) but also the 9950X (with 16c/32 threads). Pretty impressive for a low-TDP chip that fits in a thin/light laptop with great battery life and competes well against Zen 5 chips with a 170 watt TDP that often use water cooling.

Dylan16807 8 months ago

> only 2 (DDR4) or 4 (DDR5) cache misses pending with a 128 bit wide system

Isn't that the purpose of banks and bank groups, letting a bunch of independent requests work in parallel on the same channel?

sliken 8 months ago

DIMMs are dumb. Not sure, but maybe Rambus helped improve this. DIMMs are synchronous and each memory channel can have a single request pending. So upon a cache miss in the last level cache (usually L3), you send a row, then a column, wait 60ns or so, then get a cache line back. Each memory channel can only have a single memory transaction (read or write) in flight. The memory controller (usually sitting between the L3 and RAM) can have numerous cache misses pending, each waiting for the right memory channel to free up.

There are minor tweaks; I believe you can send a row and column, then on future accesses send only the column. There are also slight differences in memory pages (a DIMM page != kernel page) that decrease latency with locality. But the differences are minor and don't really move the needle on main memory latency of 60 ns (not including the L1/L2/L3 latency, since those have to miss before the request gets to the memory controller).

There are of course smarter connections, like AMD's HyperTransport or more recently Infinity Fabric (IF), that are async and can have many memory transactions in flight. But sadly the DIMMs are not connected to HT/IF. IBM's OMI is similar: a fast async serial interface, with an OMI connection to each RAM stick.

wmf 8 months ago

For AMD I think Infinity Fabric is the bottleneck so increasing memory clock without increasing IF clock does nothing. And it's also possible that 8 cores with massive cache simply don't need more bandwidth.

sliken 8 months ago

My understanding is that the single-CCD chips (like the 9800X3D) have 2 IF links, while the dual-CCD chips (like the 9950X) have 1. Keep in mind these CCDs are shared with Turin (12 channel), Threadripper Pro (8 channel), Siena (6 channel), and Threadripper (4 channel).

The higher CCD configurations have 1 IF link per chip, the lower ones have 2 IF links per chip. Presumably AMD wouldn't bother with the 2-IF-link chips unless it helped.

CobaltFire 8 months ago

This was only true for Epyc, and only true for a small number of low CCD SKUs.

Consumer platforms do NOT do this; this has actually been discussed in depth in the Threadripper Pro space. The low CCD parts were hamstrung by the shortage of IF links, meaning that they got a far smaller bump from more than 4 channels of populated RAM than they could have.

sliken 8 months ago

Ah, interesting and disappointing. I've been looking for more memory bandwidth. The M4 max is tempting, even if only half the bandwidth is available to the CPUs. I was also looking at the low end epyc, like the Epyc Turin 9115 (12 channel) or Siena 8124P (6 channel). Both in the $650-$750 range, but it's frustratingly hard to figure out what they are actually capable of.

I do look forward to the AMD Strix Halo (256 bit x 8533 MHz).

Dylan16807 8 months ago

I can't find anything to back that up.

That said, each link gives a CCD 64GB/s of read speed and 32GB/s of write speed. 8000MHz memory at 128 bits would get up to 128GB/s. So being stuck with one link would bottleneck badly enough to hide the effects of memory speed.
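The same comparison as a small helper: peak DRAM bandwidth is bytes per transfer times transfers per second, and a CCD can read no faster than the smaller of its link capacity and the DRAM peak. The 64 GB/s per-link read figure is the one quoted in the comment; the helper names are illustrative.

```python
def peak_dram_gbs(bus_width_bits: int, mt_per_s: float) -> float:
    """Theoretical peak: bytes per transfer x transfers per second, in GB/s."""
    return bus_width_bits / 8 * mt_per_s / 1000.0


def ccd_read_ceiling_gbs(links: int, per_link_read_gbs: float, dram_gbs: float) -> float:
    """A CCD can read no faster than its fabric links or the DRAM allow."""
    return min(links * per_link_read_gbs, dram_gbs)


if __name__ == "__main__":
    dram = peak_dram_gbs(128, 8000)  # 128.0 GB/s for 128-bit DDR5-8000
    for links in (1, 2):
        print(f"{links} link(s): {ccd_read_ceiling_gbs(links, 64.0, dram):.0f} GB/s read ceiling")
```

With a single link the CCD tops out at half of what 128-bit DDR5-8000 could deliver, which would hide memory-clock scaling as described. The same formula puts a 256-bit bus at 8533 MT/s at about 273 GB/s.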

sliken 8 months ago

I've been paying close attention, found various hints at anandtech (RIP), chips and cheese, and STH.

It doesn't make much difference to most apps, but I believe the single-CCD parts (like the 9700X) have better bandwidth to the IOD than the dual-CCD chips, like the 9900X and 9950X.

Similarly, on the server chips you can get 2, 4, 8, or 16 CCDs. To get 16 cores you can use 2 CCDs or 16 CCDs! But the sweet spot (max bandwidth per CCD) is at 8 CCDs, where you get a decent number of cores and twice the bandwidth per CCD. Keep in mind the Genoa/Turin EPYC chips have 24 channels (32 bit x 24) for a 768-bit-wide memory interface. Not nearly as constrained as their desktops.

Wish I could paste in a diagram, but check out:

https://www.amd.com/content/dam/amd/en/documents/epyc-techni...

Page 7 has a diagram of a 96-core chip with one GMI (IF) port per CCD and a 32-core chip with two GMI ports per CCD.

That's a gen old, I believe; with Turin the max CCD count is now 16, not 12.

Dylan16807 8 months ago

So "GMI3-wide" and similar terms are the important things to search for.

some diagrams: https://www.servethehome.com/amd-epyc-genoa-gaps-intel-xeon-...

From another page: "The most noteworthy aspect is that there is a new GMI3-Wide format. With Client Zen 4 and previous generations of Zen chiplets, there was 1 GMI link between the IOD and CCD. With Genoa, in the lower core count, lower CCD SKUs, multiple GMI links can be connected to the CCD."

And it seems like all the chiplets have two links, but everything I can find says they just don't hook up both on consumer parts.

sliken 8 months ago

Didn't find anything clearly stating one way or another, but the CCD is the same between ryzen and epyc, so there's certainly the possibility.

I dug around a bit, and it seems Ryzen doesn't get it. I guess that makes sense if the IOD on Ryzen gets 2 GMI links. On the single-CCD parts there's no other CCD to talk to. On the dual-CCD parts there aren't enough GMI links to give both CCDs GMI-wide.

Maybe this will be different on the pending Zen 5 part (Strix Halo), which will be 256 bits wide (16 x 16 bit) @ 8533 MT/s = ~273 GB/sec, since there will be 2 CCDs and a significant bump to memory bandwidth.

wmf 8 months ago

I'm pretty sure that memory bandwidth is only for the GPU just like on Apple silicon.

sliken 8 months ago

Apple silicon manages around 50% (give or take) for the CPUs.

Dylan16807 8 months ago

Yeah, the most relevant diagram I can find shows 32 bytes wide per core cluster and 128 bytes to the GPU.

bhouston 8 months ago

What would you suggest instead?

It is pretty competitive on the Multi-Core rating: https://browser.geekbench.com/v6/cpu/8633320 compared to other CPUs: https://browser.geekbench.com/processor-benchmarks

jandrese 8 months ago

The benchmarks in the article suggest that more cores are largely wasted on real world applications.

ThatMedicIsASpy 8 months ago

Yes so buy according to your needs? 8 cores do not cost $500.

behringer 8 months ago

They do when those cores are 2 to 4 times faster than the rest.

Hikikomori 8 months ago

Cores or "cores"?

LorenDB 8 months ago

As a C++ programmer, I just bought a 9900X for my first PC build. Sure, it won't game as well, but I like fast compile times, and the 9900X is on sale for $380 right now. That's $100 cheaper than the 9800X3D launch price.

jeffbee 8 months ago

Yeah, these Zen 5 are killer for that kind of workload. I also replaced my workstation with a 9900-series CPU since my Intel 14900K fried itself, and I am very pleased with every aspect, except idle power consumption which is a minor drawback.

It looks like the X3D is no better than the 9900X for non-game single-threaded workloads like browsers, and it's much worse than the 12 or 16 core parts in terms of overall throughput, so for a non-gamer the plain X seems much better than the X3D.

mdre 8 months ago

What's your idle power consumption for AMD vs Intel if you don't mind me asking? I'm getting avg 125W for my 13900k build, measured at the wall and it mildly bugs me when I think of it, I thought it'd be closer to 80. And power is very expensive where I live now.

ThatMedicIsASpy 8 months ago

7950X3D, 96GB RAM, 4x 18TB drives, 2x 4TB NVMe; my GPUs are a GTX 1080, an RX 570 and the 7950X3D's iGPU; FSP 1000W ATX3 Platinum PSU.

I use Proxmox as my OS. I have a TrueNAS VM with passed-through storage. I have a few VMs and a couple of gaming VMs (Bazzite, Fedora, NixOS).

After boot, idle is around 180-200W because the GPUs don't sleep. Once the VMs with the GPUs are running, this goes down to 110W. My drives don't spin down, so that's around 20W.

jeffbee 8 months ago

If you are getting 125W at the wall on a PC at idle, your machine or operating system is extremely broken, or you are running atmosphere physics simulations all the time. The SoC on my Intel box typically drew < 1W as measured by RAPL. The 9950X draws about 18W measured the same way. Because of platform overhead, the difference in terms of ratio is not that large, but the Ryzen system draws about 40W at the wall when it's just sitting there.
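For anyone wanting to take a RAPL reading like the ones above, here is a minimal sketch assuming Linux exposes the package energy counter through the powercap sysfs interface at `/sys/class/powercap/intel-rapl:0` (an assumption: zone naming, AMD support and permissions vary by kernel, and reading `energy_uj` usually needs root). The helper names `average_watts`, `read_uj` and `sample_package_power` are hypothetical. The counter is cumulative microjoules, so average power is the energy delta divided by elapsed time, with one wraparound allowed.

```python
import time
from pathlib import Path

# Assumed location of the package-level RAPL zone; verify on your own system.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(e0_uj: int, e1_uj: int, seconds: float, max_range_uj: int) -> float:
    """Average power in watts from two cumulative energy readings (microjoules).

    The counter wraps at max_energy_range_uj; a smaller second reading is
    treated as exactly one wrap (assumes a short sampling interval).
    """
    delta = e1_uj - e0_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1_000_000 / seconds


def read_uj(name: str) -> int:
    """Read one integer attribute (e.g. energy_uj) from the assumed RAPL zone."""
    return int((RAPL_ZONE / name).read_text().strip())


def sample_package_power(seconds: float = 5.0) -> float:
    """Sample the package energy counter twice and return average watts."""
    max_range = read_uj("max_energy_range_uj")
    e0 = read_uj("energy_uj")
    t0 = time.monotonic()
    time.sleep(seconds)
    e1 = read_uj("energy_uj")
    return average_watts(e0, e1, time.monotonic() - t0, max_range)
```

Calling `sample_package_power()` on an otherwise idle machine gives the SoC-side number; as noted above, it will sit well below what a wall meter shows, because PSU losses and the rest of the platform are not included.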

zokier 8 months ago

A discrete GPU can easily add 20-40W of idle power draw, so that's something to keep in mind. I believe 60-ish watts is pretty typical idle consumption for a desktop system, with Ryzens typically drawing about 10W more at idle than Intel. Some random reviews with whole-system idle measurements:

https://hothardware.com/reviews/amd-ryzen-7-9800x3d-processo...

https://www.techpowerup.com/review/amd-ryzen-7-9800x3d/23.ht...

jeffbee 8 months ago

Those comparisons are using a water cooling rig which already blows out the idle power budget. 60W is in no way typical of PC idle power. Your basic Intel PC draws no more power than a laptop, low single digits of watts at the load, low tens of watts at the wall. My NUC12, which is no slouch, draws <5W at the wall when the display is off and when using Wi-Fi instead of Ethernet.

mdre 8 months ago

Hmm. I’m using an AIO cooler, a 3090 and a 1600W Platinum PSU, which might be a bit inefficient. I remember unplugging the PSU and 3090 and plugging in a 650W Gold PSU: the system drew 70W IIRC. That’s still a wild difference!

jeffbee 8 months ago

Yeah, oversized power supplies are also responsible for high idle power. "Gold" etc. ratings are for their efficiency at 50%-100% of rated power, not how well they scale down to zero, unfortunately. I have never owned a real GPU; I use the IGP or a fanless Quadro, so I don't have firsthand experience with how that impacts idle power.

zokier 8 months ago

Gold ratings are specified down to 20% load, Titanium down to 10%: https://en.wikipedia.org/wiki/80_Plus#Efficiency_level_certi...

ac29 8 months ago

Platinum is 90% efficient at 20%, but OP is using a 1600W power supply, so that 20% is 320W. Any load below that is going to be less efficient.
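A minimal sketch of the arithmetic in this subthread, using the 80 Plus minimums for 115V internal non-redundant supplies (Gold 87/90/87%, Platinum 90/92/89%, Titanium 90/92/94/90% at 10/20/50/100% load). The function names are hypothetical, and the 80% efficiency used in the usage line is a made-up placeholder: 80 Plus says nothing about efficiency below the lowest certified load point.

```python
# 80 Plus minimum efficiencies (115V internal, non-redundant):
# load fraction of rated output -> minimum efficiency at that load.
EIGHTY_PLUS = {
    "Gold": {0.20: 0.87, 0.50: 0.90, 1.00: 0.87},
    "Platinum": {0.20: 0.90, 0.50: 0.92, 1.00: 0.89},
    "Titanium": {0.10: 0.90, 0.20: 0.92, 0.50: 0.94, 1.00: 0.90},
}


def lowest_certified_load_w(rated_w: float, tier: str) -> float:
    """DC output (watts) at the lowest load point the tier actually certifies."""
    return rated_w * min(EIGHTY_PLUS[tier])


def below_certified_range(dc_load_w: float, rated_w: float, tier: str) -> bool:
    """True if the load is below every certified point, so efficiency is unspecified."""
    return dc_load_w < lowest_certified_load_w(rated_w, tier)


def wall_power_w(dc_load_w: float, efficiency: float) -> float:
    """AC power drawn at the wall for a given DC load and efficiency."""
    return dc_load_w / efficiency


# Hypothetical 60W idle load on the 1600W Platinum unit from this thread:
print(lowest_certified_load_w(1600, "Platinum"))   # lowest certified point
print(below_certified_range(60, 1600, "Platinum"))
print(wall_power_w(60, 0.80))                      # 0.80 is an assumed value
```

A 60W load is under 4% of a 1600W unit's rating, far below the 320W point where Platinum is certified, while the same load is about 9% of a 650W unit. That is consistent with the smaller PSU doing better at idle, though the real low-load efficiency depends on the individual unit.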

IAmGraydon 8 months ago

I'm about to build a new system and am planning on using the 9900X. It's primarily for coding, Adobe CC, and Ableton, with maybe a rare gaming session here and there. It seems that the 9900X is the best bang for the buck right now. It games just fine, BTW.

Wytwwww 8 months ago

Intel can still be kind of faster for "productivity" stuff? At least if you are willing to pay for the >8000 MHz CUDIMMs (which I don't think AMD even supports at full speed?), which can result in pretty impressive performance. Of course, the value/price is probably not great...

drumhead 8 months ago

Just saw the figures; it's ridiculously good. The gap over its competition is staggering. I hope the Intel hubris doesn't set in at AMD, especially with the ARM pack snapping at their heels.

whalesalad 8 months ago

The last Intel machine I will ever build was my 13900K, primarily because I liked the fact that I could use cheaper DDR4 memory.

Next rig and everything for the foreseeable future will be AMD. I've been a fanboy since the Athlon XP days - took a detour for a bit - but can't wait to get back.

moffkalast 8 months ago

Even if Intel wasn't chugging so badly right now, their recent handling of the overvoltage and oxidation fiasco where they only thought about covering their asses instead of working the problem would leave me with a pretty disgusting taste in my mouth if I bought anything Intel for the foreseeable future. Customer relations should mean something, just look at Noctua.

TacticalCoder 8 months ago

> I've been a fanboy since the Athlon XP days - took a detour for a bit - but can't wait to get back.

Same. But I've already built a 3700X and then a 7700X.

I've got a feeling the wife's gonna upgrade her 3700X to a 7700X soon, meaning I'll get to build a 9000-series AMD!

Decabytes 8 months ago

It’s frustrating how I can get a 7950X3D and 32GB of RAM for less than the price of an RTX 4080, yet it performs poorly compared to a graphics card.

TacticalCoder 8 months ago

The results for decompression, but not compression, are all surprisingly bad compared to other benchmarks. How come? For example, 7-Zip decompression performs worse than on my 7700X (84K MIPS vs 93K MIPS). Other decompression benchmarks are equally depressing. But compression performs as expected (as much as 30% faster than my 7700X).

What can explain those disappointing results but only on decompression?

kevingadd 8 months ago

Modern decompression is typically compute-bound (AFAIK), not memory-bound. In fact, it is common to use compression as a workaround for memory-bound workloads, turning them into compute-bound ones.
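A toy sketch of that trade-off, using Python's standard `zlib` as a stand-in for whatever codec a real system would use (the benchmarks in the article may use others). The helper name `compress_demo` and the CSV-like sample data are hypothetical. On redundant data, the compressed form is a small fraction of the raw size, so far fewer bytes have to come from memory or disk, and the CPU pays for it by rebuilding the original on decompression.

```python
import zlib


def compress_demo(data: bytes, level: int = 6) -> tuple[int, int]:
    """Return (raw_bytes, compressed_bytes) after checking a lossless round trip."""
    packed = zlib.compress(data, level)
    # Decompression is pure CPU work: it regenerates every raw byte from a much
    # smaller input, which is why it tends to be compute-bound.
    if zlib.decompress(packed) != data:
        raise ValueError("round trip failed")
    return len(data), len(packed)


# Hypothetical, highly redundant input; real-world ratios depend on the data.
raw, packed = compress_demo(b"sensor_id,reading\n" * 100_000)
print(raw, packed, raw / packed)
```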

pawelduda 8 months ago

Such a gap in these gaming benchmarks... AMD is killing it.

heraldgeezer 8 months ago

King CPU. Time to build a new desktop PC!

globnomulous 8 months ago

Sharing links from websites with intrusive video advertisements should be prohibited. The websites should be banned, and those who share links to them should receive a paddling.

sliken 8 months ago

Or maybe you should follow the recommendations of various government agencies (including the FBI) and install an ad blocker.

rjsw 8 months ago

The last time I viewed this particular website it detected the adblocker and complained that I was depriving the owner of income.

sliken 8 months ago

I do wish I could pay $25 a month for my web content to be ad-free, portioned out to the websites I actually spend time reading.

pizza234 8 months ago

This is precisely what Scroll (1) used to do. Unfortunately, it seems it didn't end well.

(1) https://en.wikipedia.org/wiki/Scroll_(web_service)

beeboop 8 months ago

uBlock Origin with annoyance filters works fine for me.

rjsw 8 months ago

I was using uBlock Origin.

I had also seen how he had editorialized some of my mailing list posts, and I felt that I would be guilty of Gell-Mann amnesia if I carried on reading the site.