I really love Oxide to an unhealthy degree (it's become a bit of a meme among my colleagues), but sometimes I do wonder whether they went about their go-to-market the right way. They really tried to do everything at once: custom servers, custom router, custom rack, everything. Their accomplishments are technologically impressive but, to somebody who is in a position to make purchasing decisions, not economically attractive. They're 3x more expensive than our existing hardware, two generations behind (I'm aware they're on track for a refresh) and don't have any GPUs. What I would have loved to see, for example, is just an after-market BMC/NIC/firmware solution using their stack. Plug it into a cheap Gigabyte system (their BMC is pluggable and their NIC is OCP) and just have the control plane manage it as a whole box. I'd have easily paid several thousand dollars per server just for that. All the rack-scale integration, virtualization, migration, network storage, etc. is cool, but not everyone needs it. Get your foot in the door at customers, build up some volume for better deals with AMD, and then start building the custom rack stuff... Of course, it's easy to be a critic from the sidelines. As I said, I really do love what the Oxide folks are doing, I just really hope it'll become possible for me to buy their gear at some point.
Oxide are doing great work. Hoping they can probe the market a bit more for us out on the sidelines preparing to drop in and compete with some similar tech.
I'd also wish I could get to play around with a cheaper version of their tech, but they probably have enough customers that really want a large-scale solution that is completely customizable.
> When we started Oxide, the DC bus bar stood as one of the most glaring differences between the rack-scale machines at the hyperscalers and the rack-and-stack servers that the rest of the market was stuck with. That a relatively simple piece of copper was unavailable to commercial buyers
It seems that Oxide was founded in 2019, and the Open Compute Project had been specifying DC bus bars for 6 years at that point. People could purchase racks if they wanted, but it seems like, by and large, people didn't care enough to go whole hog on it.
Wonder if the economics have changed or if it's still just neat but won't move the needle.
Part of the issue is that you simply can't buy OCP hardware, not new anyway. What you're going to find is "OCP-inspired" hardware that has some overlap with the full OCP specification but is almost always meant to run on 240VAC in 19in racks, because nobody wants to invest money in something that can't be bought from CDW.
I remember the one time I had OCP hardware in a data center, and how it was essentially rumoured that it was better not to ask too much about how it got there. Not at the level of "fell off a truck", but there was some possibility it was ex-(big tech) equipment acquired through favours, or through some really insistent negotiating with Quanta until "to be sold to (big tech)" racks ended up with us.
It's normally incredibly difficult for employees to disrupt at massive companies that would be the type which runs a data center. Disruption usually enters the corp in a sales deck, much like the one Oxide would have.
Yes. I think as an engineer at this level you need to also have the patience to deal with the bean counters.
But as I’ve grown in my career I’ve actually found that line of thinking refreshing. Can you quantify benefit? If it requires too many assumptions it’s probably not worth it.
But then again, there's always the VP or the SVP who wants to "showcase his tower's innovative spirit", and there goes money that could be used for better things. The innovative spirit of the day is random LLM apps.
Things like -48VDC bus bars in the 'telco' world significantly predate the OCP, all the way back to like 1952 in the Bell system.
In general, the telco world concept hasn't changed much. You have AC grid power coming from your local utility into some BIG ASS RECTIFIERS which create -48VDC (and are responsible for charging your BIG ASS BATTERY BANK to float voltage), then various DC fuses/breakers going to distribution of -48VDC bus bars powering the equipment in a CO.
Re: Open Compute, the general concept of what they did was go to a bunch of 1U/2U server power supply manufacturers and get them to make a series of 48VDC-to-12VDC power supplies (which can be 92%+ efficient), and cut out the need for legacy 5VDC feed from power supply into ATX-derived-design x86-64 motherboards.
OCP hardware is only really accessible to hyperscalers. You can't go out and just buy a rack or two, the Taiwanese OEMs don't do direct deals that small. Even if they did, no integration is done for you. You would have to integrate the compute hardware from one company, the network fabric from another company, and then the OS and everything else from yet another. That's a lot of risk, a lot of engineering resources, a lot of procurement overhead, and a lot of different vendors pointing fingers at each other when something doesn't work.
If you're Amazon or Google, you can do this stuff yourself. If you're a normal company, you probably won't have the inhouse expertise.
On the other hand, Oxide sells a turnkey IaaS platform that you can just roll off the pallet, plug in and start using immediately. You only need to pay one company, and you have one company to yell at if something goes wrong.
You can buy a rack of 1-2U machines from Dell, HPE or Cisco with VMware or some other HCI platform, but you don't get that power efficiency or the really nice control plane Oxide have on their platform.
But isn't it a little surprising (I'm not an expert) that Dell or Supermicro or some firm like that hadn't already started offering approachable access to either OCP gear or a proprietary knockoff of it? Presumably that may still happen if Oxide is seen to have proven the market.
Azure tried this, not with their hyperscaler stuff, but with Azure Operator Nexus.
Basically an "opinionated" combination of Dell, Arista, and Pure storage with a special Azure AKS running on top and a metric ton of management and orchestration smarts. The target customer base was telcos who needed local capabilities in their data centers and who might otherwise have gone to OCP.
As far as I can surmise, it's dead, but not EOLed. Microsoft nuked the operator business unit earlier in the year, and judging by recent job postings from contract shops, AT&T might be the only customer.
I believe the telcos did DC power for years, so I don't think this is anything new. Any old hands out there want to school us on how it was done in the old days?
Every old telco technician had a story about dropping a wrench on a busbar or other bare piece of high powered transmission equipment and having to shut that center down, get out the heavy equipment, and cut it off because the wrench had been welded to the bus bars.
Note that the rack doesn't accept DC input, like lots of (e.g., NEBS certified) telco equipment. There's a bus bar, but it's enclosed within the rack itself. The rack takes single- or three-phase AC inputs to power the rectifiers, which are then attached to the internal bus bar.
huge gauge copper cables going around a central office (google "telcoflex IV")
big DC breaker/fuse panels
specialized dc fuse panels for power distribution at the top of racks, using little tiny fuses
100% overhead steel ladder rack type cable trays, since your typical telco CO was never a raised floor type environment (UNLIKE legacy 1960s/1970s mainframe computer rooms), so all the power was kept accessible by a team of people working on stepladders.
The same general thing continues today in serious telco/ISP operations, with tech features to bring it into the modern era. The rectifiers are modular now, and there's also rectiverters. Monitoring is much better. People are moving rapidly away from wet cell 2V lead acid battery banks and AGM sealed lead acid stuff to LiFePO4 battery systems.
DC fuse panels can come with network-based monitoring, ability to turn on/off devices remotely.
equipment is a whole lot less power hungry now, a telco CO that has decommed a 5ESS will find itself with a ton of empty thermal and power budget.
when I say serious telco stuff is a lot less power hungry, it's by huge margins. A randomly chosen example: radio transport equipment. Back in the day, a powerful, very expensive point-to-point microwave radio system might be a full 42U rack, an 800W load, with waveguide going out to antennas on a roof. It would carry one, two or three DS3s' worth of capacity (45 Mbps each).
now, that same telco might have a radio on its CO roof in the same microwave bands that is 1.3 Gbps FDD capacity, pure ethernet with a SFP+ fiber interface built into it, and the whole radio is a 40W electrical load. The radio is mounted directly on the antenna with some UV/IR resistant weatherproof 16 gauge DC power cable running down into the CO and plugged into a fuse panel.
Their tech may be more than adequate today. Bigger businesses may not buy from a small startup company. They expect a lot more. Illumos is a less popular OS. It wouldn't be the first choice for the OS I'd rely on. Who writes the security mitigations for speculative execution bugs? Who patches CVEs in the shipped software which doesn't use Rust?
The answer to "who does X" is Oxide. That's the point. You're not going to Dell who's integrating multiple vendors in the same box in a way that "should" work. You're getting a rack where everything is designed to work together from top to bottom.
The goal is that you can email Oxide and they'll be able to fix it regardless of where it is in the stack, even down to the processor ROM.
If you want on prem infra in exactly the shape and form Oxide delivers*
I've read and understood from Joyent and SmartOS that they believe fault-tolerant block devices/filesystems are the wrong abstraction: your software should handle losing storage.
We do not put the onus on customers to tolerate data loss. Our storage is redundant and spread through the rack so that if you lose drives or even an entire computer, your data is still safe.
https://oxide.computer/product/storage
And a big enough customer will evaluate Oxide's resources and consider for themselves whether they think Oxide can provide a quick enough turnaround for everything. That's what GP is talking about.
> Who writes the security mitigations for speculative execution bugs? Who patches CVEs in the shipped software which doesn't use Rust?
Oxide.
This is all a pre-canned solution: just use the API like you would an off-prem cloud. Do you worry about AWS patching stuff? And how many people purchasing 'traditional' servers from Dell/HPE/Lenovo worry about patching things like the LOM?
Further, all of Oxide's stuff is on Github, so you're in better shape for old stuff, whereas if the traditional server vendors EO(S)L something firmware-wise you have no recourse.
How much did Shopify buy? Sounds like from what the CEO is saying they bought 1 unit.
>We learned that Oxide has so far shipped “under 20 racks,” which illustrates the selective markets its powerful systems are aimed at.
>B&F understands most of those systems were deployed as single units at customer sites. Therefore, Oxide hopes these and new customers will scale up their operations in response to positive outcomes.
Yikes. If they sold 20 racks in July, how many are they up to now?
We write the security mitigations. We patch the CVEs. Oxide employs many, perhaps most, of the currently active illumos maintainers --- although I don't work on the illumos kernel personally, I talk to those folks every day.
A big part of what we're offering our customers is the promise that there's one vendor who's responsible for everything in the rack. We want to be the responsible party for all the software we ship, whether it's firmware, the host operating system, the hypervisor, and everything else. Arguably, the promise that there's one vendor you can yell at for everything is a more important differentiator for us than any particular technical aspect of our hardware or software.
Because they use such esoteric software that you'll forever be reliant on Oxide.
I'd rather they use more standardized open source software like Linux, Talos, k8s, Ceph, KubeVirt. Instead of rolling it all themselves on an OS that has a very small niche ecosystem.
Oxide is providing an x86 platform to run VMs/containers on. That's a commoditized market.
The value they're offering is that the rack-level consumption and management is improved over the competition, but you should be able to run whatever you want on the actual compute, k8s or whatnot.
This also means you'd not be forever reliant on Oxide.
> > The power shelf distributes DC power up and down the rack via a bus bar. This eliminates the 70 total AC power supplies found in an equivalent legacy server rack with 32 servers, two top-of-rack switches, and one out-of-band switch, each with two AC power supplies
This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff. In general, the cost savings advertised by cloud infrastructure should be more holistic.
>This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff.
I'll happily take a single high quality power supply (which may have internal redundancy FWIW) over 70 much more cheaply made power supplies that stress other parts of my datacenter via sheer inefficiency, and also cost more in aggregate. Nobody drives down the highway with 10 spare tires for their SUV.
A DC busbar can propagate a short circuit across the rack, and DC circuit protection is harder than AC. So of course each server now needs its own current limiter, or a cheap fuse.
But I’m not debating the merits of this engineering tradeoff - which seems fine, and pretty widely adopted - just its advertisement. The healthcare industry understands the importance of assessing clinical endpoints (like mortality) rather than surrogate measures (like lab results). Whenever we replace “legacy” with “cloud”, it’d be nice to estimate the change in TCO.
Let's say your high-quality supply's yearly failure rate is 100 times less than the cheap ones'.

The probability of at least one failure among the 70 cheap supplies is 1-(1-r)^70, which is quite high even without considering the higher quality of the single supply.

The probability of all 70 going down is r^70, which is absurdly low.

Let's say r = 0.05, i.e. one failure per 20 supplies in a year:

1-(1-r)^70 ≈ 97%

r^70 < 1E-91

The high-quality supply has r = 0.0005, in between no failure and all failing. If your code can handle node failure, many cheaper supplies appear to be more robust.
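The arithmetic above is easy to check numerically; a quick sketch in plain Python, using the same assumed failure rates (r = 0.05 for a cheap supply, 100x lower for the high-quality one):

```python
# Yearly failure probabilities for the redundancy argument above.
r_cheap = 0.05           # assumed yearly failure rate of one cheap supply
r_good = r_cheap / 100   # high-quality supply, 100x lower rate
n = 70                   # cheap AC supplies in a legacy rack

p_any_cheap_fails = 1 - (1 - r_cheap) ** n   # at least one of 70 fails
p_all_cheap_fail = r_cheap ** n              # total loss of all 70

print(f"P(at least one cheap PSU fails) = {p_any_cheap_fails:.2%}")  # ~97%
print(f"P(all 70 cheap PSUs fail)       = {p_all_cheap_fail:.1e}")   # ~8e-92
print(f"P(single good PSU fails)        = {r_good}")                 # 0.0005
```

So with the assumed rates you will almost certainly swap a cheap supply every year, but a whole-rack outage from PSUs alone is astronomically unlikely, whereas the single good supply takes the rack down with probability 0.0005.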
Yeah, but the failure rate of an analog piece of copper is pretty low; it'll keep being copper unless you do stupid things. And you'll have multiple power supplies providing power on the same piece of copper.
The big piece of copper is fed by redundant rectifiers. Each power shelf has six independent rectifiers which are 5+1 redundant if the rack is fully loaded with compute sleds, or 3+3 redundant if the rack is half-populated. Customers who want more redundancy can also have a second power shelf with six more rectifiers.
The bus bar itself is an SPoF, but it's also just dumb copper. That doesn't mean that nothing can go wrong, but it's pretty far into the tail of the failure distribution.
The power shelf that keeps the busbar fed will have multiple rectifiers, often with at least N+1 redundancy so that you can have a rectifier fail and swap it without the rack itself failing. Similar things apply to the battery shelves.
It's also plausible to have multiple power supplies feeding the same bus bar in parallel (if they're designed to support this) e.g. one at each end of a row.
This is how our rack works (Oxide employee). In each power shelf, there are 6 power supplies and only 5 need to be functional to run at full load. If you want even more redundancy, you can use both power shelves with independent power feeds to each so even if you lose a feed, the rack still has 5+1 redundant power supplies.
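For a rough feel of what 5+1 buys you: if each rectifier fails independently with some yearly probability p (p here is a made-up illustrative number, not an Oxide figure), the shelf only loses power when two or more of the six fail:

```python
from math import comb

def p_shelf_down(p, n=6, need=5):
    # Shelf fails when fewer than `need` of `n` rectifiers work,
    # i.e. when more than n - need of them have failed (binomial tail).
    max_tolerable = n - need
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(max_tolerable + 1, n + 1))

p = 0.02  # assumed yearly failure probability per rectifier (illustrative)
print(f"P(one given rectifier fails) = {p:.2%}")
print(f"P(5+1 shelf loses power)     = {p_shelf_down(p):.2e}")
```

With independent failures, requiring two simultaneous faults drops the outage probability by roughly another factor of p, and a second power shelf on an independent feed pushes it further out still.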
This isn't even remotely close. Unless all 32 servers have redundant AC power feeds present, you've traded one single point of failure for another single point of failure.
In the event that all 32 servers had redundant AC power feeds, you could just install a pair of redundant DC power feeds.
It's highly dependent on the individual server model and quite often how you spec it too. Most 1U Dell machines I worked with in the past only had a single slot for a PSU, whereas the beefier 2U (and above) machines generally came with 2 PSUs.
Rack servers have two PSUs because enterprise buyers are gullible and will buy anything. Generally what happens in the case of a single PSU failure is that the other PSU also fails, or it asserts PROCHOT, which means that instead of a cleanly hard-down server you have a slow server derping along at 400MHz, which is worse in every possible way.
The whole thing with eliminating 70 discrete 1U server size AC-to-DC power supplies is nothing new. It's the same general concept as the power distribution unit in the center of an open compute platform rack design from 10+ years ago.
Everyone who's doing serious datacenter stuff at scale knows that one of the absolute least efficient, labor intensive and cabling intensive/annoying ways of powering stuff is to have something like a 42U cabinet with 36 servers in it, each of them with dual power supplies, with power leads going to a pair of 208V 30A vertical PDUs in the rear of the cabinet. It gets ugly fast in terms of efficiency.
The single point of failure isn't really a problem as long as the software is architected to be tolerant of the disappearance of an entire node (mapping to a single motherboard that is a single or dual cpu socket config with a ton of DDR4 on it).
They do have a good point here. If you do the total power budget on a typical 1U (discrete chassis, not blade) server, which is packed with a wall of 40mm fans pushing air, the highest-speed screaming 40mm 12VDC fans can be a 20W electrical load each. It's easy to "spend" at least 120W at maximum heat from the CPUs, in a dual-socket system, just on the fans pulling air from the front/cold side of the server through to the rear heat exhaust.
Just going up to 60mm or 80mm standard-size DC fans can be a huge efficiency increase in watt-hours spent per cubic meter of air moved per hour.
I am extremely skeptical of the "12x" but using larger fans is more efficient.
from the URL linked:
> Bigger fans = bigger efficiency gains
Oxide server sleds are designed to a custom form factor to accommodate larger fans than legacy servers typically use. These fans can move more air more efficiently, cooling the systems using 12x less energy than legacy servers, which each contain as many as 7 fans, which must work much harder to move air over system components.
FWIW, we had to have the idle speed of our fans lowered because the usual idle of around 5k RPM was WAY too much cooling. We generally run our fans at around 2.5kRPM (barely above idle). This is due to not only the larger fans, but also the fact that we optimized and prioritized as little restriction on airflow as possible. If you’ve taken apart a current gen 1U/2U server and then compare that to how little our airflow is restricted and how little our fans have to work, the 12X reduction becomes a bit clearer.
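The standard fan affinity laws give a feel for why halving RPM helps so much: airflow scales roughly linearly with speed, but fan power scales roughly with its cube. A back-of-the-envelope sketch using the RPM figures above (rule-of-thumb physics, not Oxide's measurements):

```python
def fan_power_ratio(rpm_new, rpm_old):
    # Fan affinity laws: flow ~ N, pressure ~ N^2, power ~ N^3
    return (rpm_new / rpm_old) ** 3

# Running at 2.5k RPM instead of a typical 5k RPM idle
ratio = fan_power_ratio(2500, 5000)
print(f"Fan power vs. 5k RPM: {ratio:.3f} ({1/ratio:.0f}x less)")
```

That cube law alone gets you roughly 8x; combine it with bigger, inherently more efficient fans and a low-restriction airflow path and a double-digit multiplier becomes plausible.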
Not Oxide or Bluesky, but firstly I'd suggest that asking the company about their customers is unlikely to get a response, most companies don't disclose their customers. Secondly, Bluesky have been growing quickly, I can only assume their hardware is too, and that means long lead time products like an Oxide rack aren't going to work, especially when you can have an off the shelf machine from Dell delivered in a few days.
Oxide is very open, we are happy to talk about customers that allow us to talk about them. Some don’t want to, others are very happy to be mentioned, just like any other company.
> we are happy to talk about customers that allow us to talk about them
This is what I meant by "don't disclose". I didn't mean that Oxide was in any way secretive, but that usually this stuff doesn't get agreed, and it would make more sense to ask the customer rather than the company selling, as Oxide won't want to disclose unless there's already an agreement in place (formal or otherwise).
In my head I'm imagining an average landing page. They slap their customers on there like stickers. I doubt Bluesky would stay secretive about using Oxide if they did.
Those customers listed on the front page of companies are there as part of an agreement. Usually something like a discount. Certainly they are not listed without permission. 10x that if it is a case study.
I think they often are listed without permission, unfortunately, and often literally based on the email addresses of people signing up for a trial. I see my company's logo on the landing page of many products that we don't use or may even have a policy preventing our use of.
What I don't get is why they tied themselves to such an ancient platform. AMD Milan is what I run in my home lab. The new 9004 Epycs are so much better on power efficiency. I'm sure they've done their market research, but the gains from the newer parts are significant. We had a few petabytes and tens of thousands of cores almost ten years ago, and it's crazy how much higher data and compute density you can get with modern 30 TiB disks and Epyc 9654s. 100 such nodes and you have 10k cores and really fast data. I can't see myself running a 7003-series datacenter anymore unless the Oxide gains are that big.
They've built this a while ago. A hardware refresh takes time. The good news is that they may be able to upgrade the existing equipment with newer sleds.
I’m rooting for solutions like this as an alternative to the public cloud.
I do see that an org would rely on one company that theoretically can do a ‘Broadcom VMware’ on them but I don’t get this vibe from 0x1d3 at all.
But they target large orgs, I wish a solution like this would be accessible for smaller companies.
I wish I could throw their stack on my second-hand COTS hardware, rent a few U's in two colos for geo-redundancy, and cry tears of happiness each month realizing how much money we save on public cloud costs while still having cloud capabilities/benefits.
I'm amazed Apple don't have a rack mount version of their M series chips yet.
Even for their own internal use in their data centers they'd have to save an absolute boat load on power and cooling given their performance per watt compared to legacy stuff.
Oxide is not touching DLC systems in their post even with a 100ft barge pole.
Lenovo's DLC systems use 45 degrees C water to directly cool the power supplies and the servers themselves (water goes through them) for > 97% heat transfer to water. In cooler climates, you can just pump this to your drycoolers, and in winter you can freecool them with just air convection.
Yes, the TDP doesn't go down, but cooling costs drop and efficiency shoots up considerably, reducing PUE to 1.03 levels. You can put a tremendous amount of compute or GPU power in one rack and cool it efficiently.
Every chassis handles its own power, but IIRC all the chassis electricity is DC, and the PSUs are extremely efficient.
I don't think they'd admit much about it even if they had one internally, both because Apple isn't known for their openness about many things, and because they already exited the dedicated server hardware business years ago, so I think they're likely averse to re-entering it without very strong evidence that it would be beneficial for more than a brief period.
In particular, while I'd enjoy such a device, Apple's whole thing is their whole-system integration and charging a premium because of it, and I'm not sure the markets that want to sell people access to Apple CPUs will pay a premium for a 1U over shoving multiple Mac Minis in the same 1U footprint, especially if they've already been doing that for years at this point...
...I might also speculate that if they did this, they'd have a serious problem, because if they're buying exclusive access to all TSMC's newest fab for extended intervals to meet demand on their existing products, they'd have issues finding sources to meet a potentially substantial demand in people wanting their machines for dense compute. (They could always opt to lag the server platforms behind on a previous fab that's not as competed with, of course, but that feels like self-sabotage if they're already competing with people shoving Mac Minis in a rack, and now the Mac Minis get to be a generation ahead, too?)
I will add that consumer macOS is a piss-poor server OS.
At one point, for many years, it would just sometimes fail to `exec()` a process. This would manifest as a random failure on our build farm about once/twice a month. (This would manifest as "/bin/sh: fail to exec binary file" because the error type from the kernel would have the libc fall back to trying to run the binary as a script, as normal for a Unix, but it isn't a script)
This is likely stemming from their exiting the server business years ago, and focusing on consumer appeal more than robustness (see various terrible releases, security- and stability-wise).
(I'll grant that macOS has many features that would make it a great server OS, but it's just not polished enough in that direction)
As I recall, Apple advertised macOS as a Unix without such certification, got sued, and then scrambled to implement the required features to get certification as a result. Here's the story as told by the lead engineer of the project:
This comes up rather often, and on the last significant post about it I saw on HN someone pointed out that the certification is kind of meaningless[1]. macOS poll(2) is not Unix-compliant, hasn't been since forever, yet every new version of macOS gets certified regardless.
That's designed for the broadcast market, where they rack mount everything in the studio environment. It's not really a server, it has no out of band management, redundant power etc.
There are third party rack mounts available for the Mac Mini and Mac Studio also.
Companies buying massive cloud scale server hardware want to be able to choose from a dozen different Taiwanese motherboard manufacturers. Apple is in no way motivated to release or sell the M3/M4 CPUs as a product that major east asia motherboard manufacturers can design their own platform for. Apple is highly invested in tightly integrated ecosystems where everything is soldered down together in one package as a consumer product (take a look at a macbook air or pro motherboard for instance).
Maybe it becomes a big enough profit center to matter. Maybe. At the risk of taking focus away, splitting attention from the mission they're on today: building end user systems.
Maybe they build them for themselves. For what upside? Somewhat better compute efficiency, maybe, but I think if you have big workloads the huge massive AMD Turin super-chips are going to be incredibly hard to beat.
It's hard to overstate just how efficient AMD is, with 192 very high performance cores on a 350-500W chip.
> Maybe they build them for themselves. For what upside?
They do build it for themselves. From their security blog:
"The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot. We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing."
This is such a narrow, narrow, tiny corner of computing needs. That has such serious need for ownership, no matter the cost. And has extremely fantastically chill as shit overall computing needs, as un-performance-sensitive as it gets.
I could not be less convinced by this information that this is a useful indicator for the other 99.999999999% of computing needs.
> How can organizations reduce power consumption and corresponding carbon emissions?
Stop running so much useless stuff.
Also maybe ARM over x86_64 and similar power-efficiency-oriented hardware.
Rack-level system design, or at least power & cooling design, is certainly also a reasonable thing to do. But standardization is probably important here, rather than some bespoke solution which only one provider/supplier offers.
> How can organizations keep pace with AI innovation as existing data centers run out of available power?
Current ARM servers actually generally offer "on par" (varies by workload) perf/Watt for generally worse absolute performance (varies by workload) i.e. require more other overhead to achieve the same total perf despite "on par" perf/Watt.
Need either Apple to get into the general market server business or someone to start designing CPUs as well as Apple (based on the comparison between different ARM cores I'm not sure it really matters if they do so using a specific architecture or not).
It's more a case of selection of optimization parameters and corresponding economy. It's not so much that apple towers over others in design (though they are absolutely no slouches and have wins there) but their design team is in position to coordinate with product directly and as such isn't as limited by "but will it sell in high enough numbers for the excel sheet at investor's desk?"
The real showstopper for years is that ARM servers are just not prepared to be a proper platform. U-Boot with grudgingly included FDT (after getting kicked out of the Linux kernel) does not make a proper platform, and often there's also no BMC, unique approaches to various parts making the server that one annoying weirdo in the data center, etc.
Cloud providers can spend the effort to backfill necessary features with custom parts, but doing so on your own on-prem is hard
Not sure what you mean wrt to Apple's uniqueness. AMD/Mediatek/Intel/Qualcomm/Samsung only make margin on how well they invest on their designs vs their competitors and they'd all love to be outshipping each other and Apple in any market. All, including Apple, also rely on the same manufacturer for their top products and the ones (Intel/Samsung) with alternatives have not been able to use that as an advantage for top performing products. Sure, Apple can work directly with their own product... but at the end of the day the goal and available customer pool to fight over is the same and they still ship fewer units than the others.
I'm not hands-on familiar with other serious ARM server market players but for several years now Ampere ARM server CPUs at least are nothing like you describe. Phoronix says it best in https://www.phoronix.com/review/linux-os-ampereone
> All the Linux distributions I attempted worked out effortlessly on this Supermicro AmpereOne server. Like with Ampere Altra and Ampere eMAG before that, it's a seamless AArch64 Linux experience. Thanks to supporting open standards like UEFI, Arm SBSA/SBBR and ACPI and not having to rely on DeviceTrees or other nuisances, installing an AArch64 Linux distribution on Ampere hardware is as easy as in the x86_64 space.
We don’t currently have GPUs in the product. The closed-ness of the GPU space is a bit of a cultural difference, but we’ll surely have something eventually. As a small company, we have to focus on our strengths, and there’s plenty of folks who don’t need GPUs right now.
For sure. It’s not just GPUs; given that we have one product with three SKUs, there’s a variety of workloads we won’t be appropriate for just yet. Just takes time to diversify the offering.
"If only they used DC from the wall socket, all those H100s would be green" is not, I think, the hill you want to die on.
But, yeah, my three 18MW/y racks agree that more power efficiency would be nice, it's just that Rewrite It In (Safe) Rust is unlikely to help with that...
It’s significantly more than that, but it’s also true that we include stuff in other languages where appropriate. CockroachDB is in Go, and illumos is in C, as two examples. But almost all new code we write is in Rust. That is the stuff you’re talking about, but also like, our control plane.
Pretty much everything Oxide publishes on github is either in rust or it's an sdk to a service in rust. Well, the web panel isn't in rust, so negative points for that; true evangelists would have used WASM.
But Oxide's reason to exist is to keep the memory of cool racks from Sun running Solaris alive forever.
(And for that matter, Oracle's proprietary Solaris seems better maintained than I ever expected, though in this context I think the open source fork is the relevant thing to look at.)
I really love Oxide to an unhealthy amount (it's become a bit of a meme among my colleagues), but sometimes I do wonder whether they went about their go-to-market the right way. They really tried to do everything at once - custom servers, custom router, custom rack, everything. Their accomplishments are technologically impressive, but, as somebody who is in a position to make purchasing decisions, not economically attractive. They're 3x more expensive than our existing hardware, two generations behind (I'm aware they're on track for a refresh) and don't have any GPUs. E.g. what I would have loved to see is just an after-market BMC/NIC/firmware solution using their stack. Plug it into a cheap Gigabyte system (their BMC is pluggable and NIC is OCP) and just have the control plane manage it as a whole box. I'd have easily paid several thousand $ per server just for that. All the rack scale integration, virtualization, migration, network storage, etc stuff is cool, but not everyone needs it. Get your foot in the door at customers, build up some volume for better deals with AMD, and then start building the custom rack stuff ... Of course it's easy to be a critic from the sidelines. As I said, I do really love what the Oxide folks are doing, I just really hope it'll become possible for me to buy their gear at some point.
Oxide are doing great work. Hoping they can probe the market a bit more for us out on the sidelines preparing to drop in and compete with some similar tech.
I'm curious what their burn rate is.
I'd also wish I could get to play around with a cheaper version of their tech, but they probably have enough customers that really want a large-scale solution that is completely customizable.
> When we started Oxide, the DC bus bar stood as one of the most glaring differences between the rack-scale machines at the hyperscalers and the rack-and-stack servers that the rest of the market was stuck with. That a relatively simple piece of copper was unavailable to commercial buyers
It seems that 0xide was founded in 2019, and Open Compute Project had been specifying DC bus bars for 6 years at that point. People could purchase racks if they wanted, but it seems like, by and large, people didn't care enough to go whole hog on it.
Wonder if the economics have changed or if it's still just neat but won't move the needle.
Part of the issue is that you simply can't buy OCP hardware, at least not new. What you're going to find is "OCP Inspired" hardware that has some overlap with the full OCP specification but is almost always meant to run on 240VAC in 19in racks, because nobody wants to invest the money in something that can't be bought from CDW.
I remember the one time I had OCP hardware in a data center, and how it was essentially rumoured that it was better not to ask too much about how it got there - not quite "fell off a truck" territory, but there was some possibility it was ex-(big tech) equipment acquired through favours, or some really insistent negotiating with Quanta until "to be sold to (big tech)" racks ended up with us.
It's normally incredibly difficult for employees to disrupt at massive companies that would be the type which runs a data center. Disruption usually enters the corp in a sales deck, much like the one Oxide would have.
It's stupid, but that's why we all have jobs.
I think engineers should be more forceful in leading their own visions instead of being led by accountants and lawyers.
After all, engineers have the power of implementation and de-implementation. They need to step into dirty politics and bend other people's views.
It's either theirs or ours. Win-win is a fallacy.
Being able to navigate this is what differentiates a very senior IC (principal, distinguished, etc) and random employees.
Yes. I think as an engineer at this level you need to also have the patience to deal with the bean counters.
But as I’ve grown in my career I’ve actually found that line of thinking refreshing. Can you quantify benefit? If it requires too many assumptions it’s probably not worth it.
But then again there's always the VP or the SVP who wants to "showcase his tower's innovative spirit" and then there goes money that could be used for better things. The innovative spirit of the day is random LLM apps.
Let me know how that works out for you!
Things like -48VDC bus bars in the 'telco' world significantly predate the OCP, all the way back to like 1952 in the Bell system.
In general, the telco world concept hasn't changed much. You have AC grid power coming from your local utility into some BIG ASS RECTIFIERS which create -48VDC (and are responsible for charging your BIG ASS BATTERY BANK to float voltage), then various DC fuses/breakers going to distribution of -48VDC bus bars powering the equipment in a CO.
Re: Open Compute, the general concept of what they did was go to a bunch of 1U/2U server power supply manufacturers and get them to make a series of 48VDC-to-12VDC power supplies (which can be 92%+ efficient), and cut out the need for legacy 5VDC feed from power supply into ATX-derived-design x86-64 motherboards.
OCP hardware is only really accessible to hyperscalers. You can't go out and just buy a rack or two, the Taiwanese OEMs don't do direct deals that small. Even if they did, no integration is done for you. You would have to integrate the compute hardware from one company, the network fabric from another company, and then the OS and everything else from yet another. That's a lot of risk, a lot of engineering resources, a lot of procurement overhead, and a lot of different vendors pointing fingers at each other when something doesn't work.
If you're Amazon or Google, you can do this stuff yourself. If you're a normal company, you probably won't have the inhouse expertise.
On the other hand, Oxide sells a turnkey IaaS platform that you can just roll off the pallet, plug in and start using immediately. You only need to pay one company, and you have one company to yell at if something goes wrong.
You can buy a rack of 1-2U machines from Dell, HPE or Cisco with VMware or some other HCI platform, but you don't get that power efficiency or the really nice control plane Oxide have on their platform.
But isn't it a little surprising (I'm not an expert) that Dell or Supermicro or some firm like that hadn't already started offering approachable access to either OCP gear or a proprietary knockoff of it? Presumably that may still happen if Oxide is seen to have proven the market.
Azure tried this, not with their hyperscaler stuff, but with Azure Operator Nexus.
Basically an "opinionated" combination of Dell, Arista, and Pure storage with a special Azure AKS running on top and a metric ton of management and orchestration smarts. The target customer base was telcos who needed local capabilities in their data centers and who might otherwise have gone to OCP.
As far as I can surmise, it's dead, but not EOLed. Microsoft nuked the operator business unit earlier in the year, and judging by recent job postings from contract shops, AT&T might be the only customer.
Supermicro does sell OCP racks.
https://www.supermicro.com/solutions/Solution-Brief-Supermic...
I recall them offering older versions of the specs but can't easily find a reference, so I might be wrong about how accessible they were.
One is the specs and the other is an actual implementation, what am I missing?
I really wish Oxide had homelab/prosumer grade stuff. I'd be sending them so much money.
I believe the telcos did DC power for years, so I don't think this is anything new. Any old hands out there want to school us on how it was done in the old days?
Every old telco technician had a story about dropping a wrench on a busbar or other bare piece of high powered transmission equipment and having to shut that center down, get out the heavy equipment, and cut it off because the wrench had been welded to the bus bars.
Note that the rack doesn't accept DC input, like lots of (e.g., NEBS certified) telco equipment. There's a bus bar, but it's enclosed within the rack itself. The rack takes single- or three-phase AC inputs to power the rectifiers, which are then attached to the internal bus bar.
big ass rectifiers
big ass solid copper busbars
huge gauge copper cables going around a central office (google "telcoflex IV")
big DC breaker/fuse panels
specialized dc fuse panels for power distribution at the top of racks, using little tiny fuses
100% overhead steel ladder rack type cable trays, since your typical telco CO was never a raised floor type environment (UNLIKE legacy 1960s/1970s mainframe computer rooms), so all the power was kept accessible by a team of people working on stepladders.
The same general thing continues today in serious telco/ISP operations, with tech features to bring it into the modern era. The rectifiers are modular now, and there's also rectiverters. Monitoring is much better. People are moving rapidly away from wet cell 2V lead acid battery banks and AGM sealed lead acid stuff to LiFePo4 battery systems.
DC fuse panels can come with network-based monitoring, ability to turn on/off devices remotely.
equipment is a whole lot less power hungry now, a telco CO that has decommed a 5ESS will find itself with a ton of empty thermal and power budget.
when I say serious telco stuff is a lot less power hungry, it's by huge margins. randomly chosen example of radio transport equipment. For instance back in the day a powerful, very expensive point to point microwave radio system might be a full 42U rack, 800W in load, with waveguide going out to antennas on a roof. It would carry one, two or three DS3 equivalent of capacity (45 Mbps each).
now, that same telco might have a radio on its CO roof in the same microwave bands that is 1.3 Gbps FDD capacity, pure ethernet with a SFP+ fiber interface built into it, and the whole radio is a 40W electrical load. The radio is mounted directly on the antenna with some UV/IR resistant weatherproof 16 gauge DC power cable running down into the CO and plugged into a fuse panel.
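The scale of that efficiency jump is easy to sanity-check with a bits-per-watt comparison, using only the figures quoted above (a rough sketch; the numbers are the comment's, rounded):

```python
# Rough bits-per-watt comparison of the two microwave radio generations
# described above. All figures come from the comment itself.
old_capacity_mbps = 3 * 45     # three DS3s at 45 Mbps each
old_power_w = 800              # full-rack legacy radio system
new_capacity_mbps = 1300       # modern FDD radio, SFP+ ethernet
new_power_w = 40               # whole radio, antenna-mounted

old_eff = old_capacity_mbps / old_power_w    # ~0.17 Mbps per watt
new_eff = new_capacity_mbps / new_power_w    # ~32.5 Mbps per watt
print(f"Improvement: ~{new_eff / old_eff:.0f}x more Mbps per watt")  # ~193x
```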
See perhaps "Oxide Cloud Computer Tour - Rear":
* https://www.youtube.com/watch?v=lJmw9OICH-4
Their tech may be more than adequate today. Bigger businesses may not buy from a small startup company. They expect a lot more. Illumos is a less popular OS. It wouldn't be the first choice for the OS I'd rely on. Who writes the security mitigations for speculative execution bugs? Who patches CVEs in the shipped software which doesn't use Rust?
The answer to "who does X" is Oxide. That's the point. You're not going to Dell who's integrating multiple vendors in the same box in a way that "should" work. You're getting a rack where everything is designed to work together from top to bottom.
The goal is that you can email Oxide and they'll be able to fix it regardless of where it is in the stack, even down to the processor ROM.
This. If you want on prem cloud infra without having to roll it yourself, Oxide is the solution.
(no affiliation, just a fan)
If you want on prem infra in exactly the shape and form Oxide delivers*
I've read and understood from Joyent and SmartOS that they believe fault-tolerant block devices / filesystems are the wrong abstraction; your software should handle losing storage.
We do not put the onus on customers to tolerate data loss. Our storage is redundant and spread through the rack so that if you lose drives or even an entire computer, your data is still safe. https://oxide.computer/product/storage
And a big enough customer will evaluate Oxide's resources and consider for themselves whether they think Oxide can provide a quick enough turnaround for everything. That's what GP is talking about.
> Bigger businesses may not buy from a small startup company.
What would you classify Shopify as?
> One existing Oxide user is e-commerce giant Shopify, which indicates the growth potential for the systems available.
* https://blocksandfiles.com/2024/07/04/oxide-ships-first-clou...
Their CEO has tweeted about it:
* https://twitter.com/tobi/status/1793798092212367669
> Who writes the security mitigations for speculative execution bugs? Who patches CVEs in the shipped software which doesn't use Rust?
Oxide.
This is all a pre-canned solution: just use the API like you would an off-prem cloud. Do you worry about AWS patching stuff? And how many people purchasing 'traditional' servers from Dell/HPe/Lenovo worry about patching things like the LOM?
Further, all of Oxide's stuff is on Github, so you're in better shape for old stuff, whereas if the traditional server vendors EO(S)L something firmware-wise you have no recourse.
How much did Shopify buy? Sounds like from what the CEO is saying they bought 1 unit.
>We learned that Oxide has so far shipped “under 20 racks,” which illustrates the selective markets its powerful systems are aimed at.
>B&F understands most of those systems were deployed as single units at customer sites. Therefore, Oxide hopes these and new customers will scale up their operations in response to positive outcomes.
Yikes. If they sold 20 racks in July, how many are they up to now?
Illumos is the OS for the hypervisor and core services, they don't expect their customers to run their code directly on that OS, but inside VMs.
> Bigger businesses may not buy from a small startup company.
Our early customers include government, finance, and places like Shopify.
You’re not wrong that some places may prefer older companies but that doesn’t mean they all do.
Illumos is not really directly relevant to the customer, it’s a non user facing implementation detail.
We provide security updates.
The illumos bare-metal OS is not directly visible to customers.
We write the security mitigations. We patch the CVEs. Oxide employs many, perhaps most, of the currently active illumos maintainers --- although I don't work on the illumos kernel personally, I talk to those folks every day.
A big part of what we're offering our customers is the promise that there's one vendor who's responsible for everything in the rack. We want to be the responsible party for all the software we ship, whether it's firmware, the host operating system, the hypervisor, and everything else. Arguably, the promise that there's one vendor you can yell at for everything is a more important differentiator for us than any particular technical aspect of our hardware or software.
How long before a VPS pops up running Oxide racks? Or, why wouldn't a VPS build on top of Oxide if they offer better efficiency and server management?
Someone could if they wanted to! We’ll see if anyone does.
Because they use such esoteric software that you'll forever be reliant on Oxide.
I'd rather they use more standardized open source software like Linux, Talos, k8s, Ceph, KubeVirt. Instead of rolling it all themselves on an OS that has a very small niche ecosystem.
Oxide is providing an x86 platform to run VMs/containers on. That's a commoditized market.
The value they're offering is that the rack-level consumption and management is improved over the competition, but you should be able to run whatever you want on the actual compute, k8s or whatnot.
This also means you'd not be forever reliant on Oxide.
> > The power shelf distributes DC power up and down the rack via a bus bar. This eliminates the 70 total AC power supplies found in an equivalent legacy server rack within 32 servers, two top-of-rack switches, and one out-of-band switch, each with two AC power supplies
This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff. In general, the cost savings advertised by cloud infrastructure should be more holistic.
>This creates a single point of failure, trading robustness for efficiency. There's nothing wrong with that, but software/ops might have to accommodate by making the opposite tradeoff.
I'll happily take a single high quality power supply (which may have internal redundancy FWIW) over 70 much more cheaply made power supplies that stress other parts of my datacenter via sheer inefficiency, and also cost more in aggregate. Nobody drives down the highway with 10 spare tires for their SUV.
A DC busbar can propagate a short circuit across the rack, and DC circuit protection is harder than AC. So of course each server now needs its own current limiter, or a cheap fuse.
But I’m not debating the merits of this engineering tradeoff - which seems fine, and pretty widely adopted - just its advertisement. The healthcare industry understands the importance of assessing clinical endpoints (like mortality) rather than surrogate measures (like lab results). Whenever we replace “legacy” with “cloud”, it’d be nice to estimate the change in TCO.
DC circuit protection is absolutely not harder than AC. DC has the advantage in current flowing in only one direction, not two
Which makes it much harder to break the circuit vs AC
At 48 volts arcing shorts aren't the concern.
No one drives down the highway with one tire either.
Careful, unicyclists are an unforgiving bunch.
Let's say your high-quality supply's yearly failure rate is 100 times lower than the cheap ones'.
The probability of at least a single failure among the 70 cheap supplies is 1-(1-r)^70. This is quite high even without considering the higher quality of the one supply.
The probability of all 70 going down is r^70, which is absurdly low.
Let's say r = 0.05, i.e. one failed supply in every 20 per year.
Then 1-(1-r)^70 = 97%, while r^70 < 1E-91.
The high-quality supply has r = 0.0005, which sits between those two extremes. If your code can handle node failure, very many cheaper supplies appear to be more robust.
(Assuming uncorrelated events. YMMV)
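The arithmetic above can be checked with a quick sketch (the failure rates are the comment's hypothetical numbers, not real field data):

```python
# Sanity-check of the failure-probability arithmetic above, assuming
# independent failures. r_cheap = 0.05/year is the comment's hypothetical rate.
r_cheap = 0.05   # assumed yearly failure rate of one cheap supply
n = 70           # supplies in the equivalent legacy rack

p_at_least_one = 1 - (1 - r_cheap) ** n   # some node loses power this year
p_all_fail = r_cheap ** n                 # every node loses power at once

print(f"P(at least one of 70 fails) = {p_at_least_one:.1%}")   # ~97%
print(f"P(all 70 fail together)     = {p_all_fail:.1e}")       # ~1e-91

r_good = r_cheap / 100   # the "100x better" single supply
print(f"P(single good supply fails) = {r_good:.2%}")
```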
Yeah, but the failure rate of an analog piece of copper is pretty low; it'll keep being copper unless you do stupid things. You'll have multiple power supplies providing power on the same piece of copper.
TL;DR: isn't there a single, shared DC supply that feeds said piece of copper? Presumably connected to mains?
Or are the running on SOFCs?
The big piece of copper is fed by redundant rectifiers. Each power shelf has six independent rectifiers which are 5+1 redundant if the rack is fully loaded with compute sleds, or 3+3 redundant if the rack is half-populated. Customers who want more redundancy can also have a second power shelf with six more rectifiers.
I'm going to assume this is on 3 phase power, but how is the ripple filtered?
Look very carefully at the picture of the rack at https://oxide.computer/ :) there are two power shelves in the middle, not one.
We're absolutely aware of the tradeoffs here and have made quite considered decisions!
The bus bar itself is an SPoF, but it's also just dumb copper. That doesn't mean that nothing can go wrong, but it's pretty far into the tail of the failure distribution.
The power shelf that keeps the busbar fed will have multiple rectifiers, often with at least N+1 redundancy so that you can have a rectifier fail and swap it without the rack itself failing. Similar things apply to the battery shelves.
It's also plausible to have multiple power supplies feeding the same bus bar in parallel (if they're designed to support this) e.g. one at each end of a row.
This is how our rack works (Oxide employee). In each power shelf, there are 6 power supplies and only 5 need to be functional to run at full load. If you want even more redundancy, you can use both power shelves with independent power feeds to each so even if you lose a feed, the rack still has 5+1 redundant power supplies.
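What N+1 redundancy buys can be sketched with a simple binomial calculation. The per-rectifier failure probability below is an illustrative assumption, not an Oxide figure; the 5-of-6 requirement is the one described above:

```python
# Rough sketch of 5+1 rectifier redundancy: at full load the shelf fails
# only if 2 or more of its 6 rectifiers fail in the same service window.
# p is an assumed per-rectifier failure probability, purely illustrative.
from math import comb

p = 0.01       # assumed per-rectifier failure probability per window
n = 6          # rectifiers per power shelf
k_needed = 5   # rectifiers required at full load

p_shelf_ok = sum(
    comb(n, k) * (1 - p) ** k * p ** (n - k)
    for k in range(k_needed, n + 1)
)
print(f"P(shelf keeps the busbar fed) = {p_shelf_ok:.6f}")  # ~0.9985

# With a second, independently fed shelf, both must fail at once:
p_rack_dark = (1 - p_shelf_ok) ** 2
print(f"P(rack loses power entirely)  = {p_rack_dark:.2e}")
```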
This isn't even remotely close. Unless all 32 servers have redundant AC power feeds present, you've traded one single point of failure for another single point of failure.
In the event that all 32 servers had redundant AC power feeds, you could just install a pair of redundant DC power feeds.
>Unless all 32 servers have redundant AC power feeds present, you've traded one single point of failure for another single point of failure.
Is this not standard? I vaguely remember that rack severs typically have two PSUs for this reason.
It's highly dependent on the individual server model and quite often how you spec it too. Most 1U Dell machines I worked with in the past only had a single slot for a PSU, whereas the beefier 2U (and above) machines generally came with 2 PSUs.
But 2 PSUs plugged into the same AC supply still have a single point of failure.
you could have 15 PSUs in a server. It doesn't mean they have redundant power feeds
Rack servers have two PSUs because enterprise buyers are gullible and will buy anything. Generally what happens in case of a single PSU failure is the other PSU also fails or it asserts PROCHOT which means instead of a clean hard down server you have a slow server derping along at 400MHz which is worse in every possible way.
The whole thing with eliminating 70 discrete 1U server size AC-to-DC power supplies is nothing new. It's the same general concept as the power distribution unit in the center of an open compute platform rack design from 10+ years ago.
Everyone who's doing serious datacenter stuff at scale knows that one of the absolute least efficient, labor intensive and cabling intensive/annoying ways of powering stuff is to have something like a 42U cabinet with 36 servers in it, each of them with dual power supplies, with power leads going to a pair of 208V 30A vertical PDUs in the rear of the cabinet. It gets ugly fast in terms of efficiency.
The single point of failure isn't really a problem as long as the software is architected to be tolerant of the disappearance of an entire node (mapping to a single motherboard that is a single or dual cpu socket config with a ton of DDR4 on it).
That’s one reason why 2U4N systems are kinda popular. 1/4 the cabling in legacy infrastructure.
PDUs are also very failure-prone and not worth the trouble.
> This creates a single point of failure,
Who told you there is only one PSU in the power shelf?
They do have a good point here. If you do the total power budget on a typical 1U server (discrete chassis, not blade) packed with a wall of 40mm fans pushing air, the highest-speed screaming 40mm 12VDC fans can be a 20W electrical load each. It's easy to "spend" at least 120W, at maximum heat from the CPUs in a dual-socket system, just on the fans pulling air from the front/cold side of the server through to the rear heat exhaust.
Just going up to 60mm or 80mm standard size DC fans can be a huge efficiency increase in watt-hours spent per cubic meters of air moved per hour.
I am extremely skeptical of the "12x" but using larger fans is more efficient.
from the URL linked:
> Bigger fans = bigger efficiency gains Oxide server sleds are designed to a custom form factor to accommodate larger fans than legacy servers typically use. These fans can move more air more efficiently, cooling the systems using 12x less energy than legacy servers, which each contain as many as 7 fans, which must work much harder to move air over system components.
FWIW, we had to have the idle speed of our fans lowered because the usual idle of around 5k RPM was WAY too much cooling. We generally run our fans at around 2.5kRPM (barely above idle). This is due to not only the larger fans, but also the fact that we optimized and prioritized as little restriction on airflow as possible. If you’ve taken apart a current gen 1U/2U server and then compare that to how little our airflow is restricted and how little our fans have to work, the 12X reduction becomes a bit clearer.
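The fan affinity laws make the "12x" figure less surprising: fan power scales roughly with the cube of rotational speed. A quick sketch using the RPM figures quoted above (the cube-law model is an approximation, not a measurement):

```python
# Back-of-the-envelope sketch of why larger, slower fans save so much power,
# using the fan affinity laws (power scales ~cubically with RPM).
# RPM figures are the ones quoted above; the model is an approximation.
typical_idle_rpm = 5000   # usual small-fan idle speed mentioned above
actual_rpm = 2500         # speed the larger fans actually run at

power_ratio = (typical_idle_rpm / actual_rpm) ** 3
print(f"Cube-law power ratio at half speed: {power_ratio:.0f}x")  # 8x
```

Combine that ~8x from speed alone with fewer, larger, inherently more efficient fans and an unrestricted airflow path, and a figure on the order of 12x becomes plausible.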
If any Oxide staff are here, I'm just curious, is BlueSky a customer? Seems like it would fit well with their on-prem setup.
Nope, but many of us (Oxide staff) are big fans of what Bluesky is doing!
One of the Bluesky team members posted about their requirements earlier this month, and why Oxide isn't a great fit for them at the moment:
https://bsky.app/profile/jaz.bsky.social/post/3laha2upw3k2z
> Also prices don't make sense for us.
Oof.
Not Oxide or Bluesky, but firstly I'd suggest that asking the company about their customers is unlikely to get a response, most companies don't disclose their customers. Secondly, Bluesky have been growing quickly, I can only assume their hardware is too, and that means long lead time products like an Oxide rack aren't going to work, especially when you can have an off the shelf machine from Dell delivered in a few days.
Oxide is very open, we are happy to talk about customers that allow us to talk about them. Some don’t want to, others are very happy to be mentioned, just like any other company.
> we are happy to talk about customers that allow us to talk about them
This is what I meant by "don't disclose", I didn't mean that Oxide was in any way secretive, but that usually this stuff doesn't get agreed, and that it would make more sense to ask the customer rather than the company selling as Oxide won't want to disclose unless there's already an agreement in place (formal or otherwise).
Gotcha. That totally makes sense, I wouldn't have thought about it that way.
> most companies don't disclose their customers
In my head I'm imagining an average landing page. They slap their customers on there like stickers. I doubt bluesky would stay secretive about using oxide if they did
Those customers listed on the front page of companies are there as part of an agreement. Usually something like a discount. Certainly they are not listed without permission. 10x that if it is a case study.
I think they often are listed without permission unfortunately, and often literally based on the email addresses of people signing up for a trial. I see my company's logo on the landing page of many products that we don't use or may even have a policy preventing our use of.
events.bsky appears to be hosted on OVH. Single-product SAAS companies less than a few years old are unlikely to be a major customer cohort for Oxide.
What I don't get is why tie to such an ancient platform. AMD Milan is my home lab. The new 9004 Epycs are so much better on power efficiency. I'm sure they've done their market research and the gains must be so significant. We used to have a few petabytes and tens of thousands of cores almost ten years ago and it's crazy how much higher data and compute density you can get with modern 30 TiB disks and Epyc 9654s. 100 such nodes and you have 10k cores and really fast data. I can't see myself running a 7003-series datacenter anymore unless the Oxide gains are that big.
They've built this a while ago. A hardware refresh takes time. The good news is that they may be able to upgrade the existing equipment with newer sleds.
Yes we're definitely building the next generation of equipment to fit into the existing racks!
I’m rooting for solutions like this as an alternative to the public cloud. I do see that an org would rely on one company that theoretically can do a ‘Broadcom VMware’ on them but I don’t get this vibe from 0x1d3 at all.
But they target large orgs; I wish a solution like this were accessible to smaller companies.
I wish I could throw their stack on my second-hand COTS hardware, rent a few U's in two colos for geo redundancy, and cry of happiness each month realizing how much money we save on public cloud cost, yet having cloud capabilities/benefits.
I'm amazed Apple don't have a rack mount version of their M series chips yet.
Even for their own internal use in their data centers they'd have to save an absolute boat load on power and cooling given their performance per watt compared to legacy stuff.
Oxide is not touching DLC systems in their post even with a 100ft barge pole.
Lenovo's DLC systems use 45 degrees C water to directly cool the power supplies and the servers themselves (water goes through them) for > 97% heat transfer to water. In cooler climates, you can just pump this to your drycoolers, and in winter you can freecool them with just air convection.
Yes, the TDP doesn't go down, but cooling costs drop and efficiency shoots up considerably, reducing PUE to 1.03 levels. You can put a tremendous amount of compute or GPU power in one rack and cool it efficiently.
Every chassis handles its own power, but IIRC all the chassis electricity is DC, and the PSUs are extremely efficient.
I don't think they'd admit much about it even if they had one internally, both because Apple isn't known for their openness about many things, and because they already exited the dedicated server hardware business years ago, so I think they're likely averse to re-entering it without very strong evidence that it would be beneficial for more than a brief period.
In particular, while I'd enjoy such a device, Apple's whole thing is their whole-system integration and charging a premium because of it, and I'm not sure the markets that want to sell people access to Apple CPUs will pay a premium for a 1U over shoving multiple Mac Minis in the same 1U footprint, especially if they've already been doing that for years at this point...
...I might also speculate that if they did this, they'd have a serious problem, because if they're buying exclusive access to all TSMC's newest fab for extended intervals to meet demand on their existing products, they'd have issues finding sources to meet a potentially substantial demand in people wanting their machines for dense compute. (They could always opt to lag the server platforms behind on a previous fab that's not as competed with, of course, but that feels like self-sabotage if they're already competing with people shoving Mac Minis in a rack, and now the Mac Minis get to be a generation ahead, too?)
I will add that consumer macOS is a piss-poor server OS.
At one point, for many years, it would just sometimes fail to `exec()` a process. This would manifest as a random failure on our build farm about once/twice a month. (This would manifest as "/bin/sh: fail to exec binary file" because the error type from the kernel would have the libc fall back to trying to run the binary as a script, as normal for a Unix, but it isn't a script)
This is likely stemming from their exiting the server business years ago, and focusing on consumer appeal more than robustness (see various terrible releases, security- and stability-wise).
(I'll grant that macOS has many features that would make it a great server OS, but it's just not polished enough in that direction)
> as normal for a Unix
veering offtopic, did you know macOS is a certified Unix?
https://www.opengroup.org/openbrand/register/brand3581.htm
As I recall, Apple advertised macOS as a Unix without such certification, got sued, and then scrambled to implement the required features to get certification as a result. Here's the story as told by the lead engineer of the project:
https://www.quora.com/What-goes-into-making-an-OS-to-be-Unix...
This comes up rather often, and on the last significant post about it I saw on HN someone pointed out that the certification is kind of meaningless[1]. macOS poll(2) is not Unix-compliant, hasn't been since forever, yet every new version of macOS gets certified regardless.
[1]: https://news.ycombinator.com/item?id=41823078
and Windows used to be certified for POSIX, but none of that matters these days if it's not bug-compatible with Linux
Did that ever get fixed? That...seems like a pretty critical problem.
Yes, it quietly stopped happening a few years ago, sometime since 2020.
> I will add that consumer macOS is a piss-poor server OS.
Windows is also abysmal but it hasn't stopped people from using it.
But yes, it is too much of a desktop OS.
There is a rack mount version of the Mac Pro you can buy
That's designed for the broadcast market, where they rack mount everything in the studio environment. It's not really a server; it has no out-of-band management, no redundant power, etc.
There are third party rack mounts available for the Mac Mini and Mac Studio also.
Rack mount models have LOM over MDM.
Companies buying massive cloud scale server hardware want to be able to choose from a dozen different Taiwanese motherboard manufacturers. Apple is in no way motivated to release or sell the M3/M4 CPUs as a product that major east asia motherboard manufacturers can design their own platform for. Apple is highly invested in tightly integrated ecosystems where everything is soldered down together in one package as a consumer product (take a look at a macbook air or pro motherboard for instance).
For who? How would this help their core mission?
Maybe it becomes a big enough profit center to matter. Maybe. At the risk of taking focus away, splitting attention from the mission they're on today: building end user systems.
Maybe they build them for themselves. For what upside? Somewhat better compute efficiency, maybe, but I think if you have big workloads the massive AMD Turin super-chips are going to be incredibly hard to beat.
It's hard to overstate just how efficient AMD is, with 192 very high performance cores on a 350-500W chip.
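Back-of-envelope, using the figures above:

```python
cores = 192
tdp_watts = (350, 500)  # socket TDP range quoted above
low, high = (w / cores for w in tdp_watts)
# Roughly 1.8-2.6 W per core at the socket level.
print(f"{low:.2f}-{high:.2f} W per core")
```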
> Maybe they build them for themselves. For what upside?
They do build it for themselves. From their security blog:
"The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot. We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing."
<https://security.apple.com/blog/private-cloud-compute/>
This is such a narrow, tiny corner of computing needs. One with a serious need for ownership, no matter the cost, and with extremely chill overall computing demands; it's about as un-performance-sensitive as it gets.
I could not be less convinced by this information that this is a useful indicator for the other 99.999999999% of computing needs.
(some of?) their servers do run apple silicon: https://security.apple.com/blog/private-cloud-compute/
> How can organizations reduce power consumption and corresponding carbon emissions?
Stop running so much useless stuff.
Also maybe ARM over x86_64 and similar power-efficiency-oriented hardware.
Rack-level system design, or at least power & cooling design, is certainly also a reasonable thing to do. But standardization is probably important here, rather than some bespoke solution which only one provider/supplier offers.
> How can organizations keep pace with AI innovation as existing data centers run out of available power?
Waste less energy on LLM chatbots?
Current ARM servers actually generally offer "on par" perf/Watt (varies by workload) but generally worse absolute performance (also varies by workload), i.e. they require more overhead elsewhere to achieve the same total perf despite "on par" perf/Watt.
Need either Apple to get into the general market server business or someone to start designing CPUs as well as Apple (based on the comparison between different ARM cores I'm not sure it really matters if they do so using a specific architecture or not).
It's more a case of selection of optimization parameters and corresponding economy. It's not so much that apple towers over others in design (though they are absolutely no slouches and have wins there) but their design team is in position to coordinate with product directly and as such isn't as limited by "but will it sell in high enough numbers for the excel sheet at investor's desk?"
The real showstopper for years is that ARM servers are just not prepared to be a proper platform. U-Boot with grudgingly included FDT (after device trees got kicked out of the Linux kernel tree) does not make a proper platform, and often there's also no BMC, unique approaches to various parts making the server that one annoying weirdo in the data center, etc.
Cloud providers can spend the effort to backfill necessary features with custom parts, but doing so on your own on-prem is hard
Not sure what you mean wrt Apple's uniqueness. AMD/Mediatek/Intel/Qualcomm/Samsung only make margin on how well they invest in their designs vs their competitors, and they'd all love to be outshipping each other and Apple in any market. All, including Apple, also rely on the same manufacturer for their top products, and the ones with alternatives (Intel/Samsung) have not been able to use that as an advantage for top-performing products. Sure, Apple can work directly with their own product... but at the end of the day the goal and the available customer pool to fight over are the same, and they still ship fewer units than the others.
I'm not hands-on familiar with other serious ARM server market players but for several years now Ampere ARM server CPUs at least are nothing like you describe. Phoronix says it best in https://www.phoronix.com/review/linux-os-ampereone
> All the Linux distributions I attempted worked out effortlessly on this Supermicro AmpereOne server. Like with Ampere Altra and Ampere eMAG before that, it's a seamless AArch64 Linux experience. Thanks to supporting open standards like UEFI, Arm SBSA/SBBR and ACPI and not having to rely on DeviceTrees or other nuisances, installing an AArch64 Linux distribution on Ampere hardware is as easy as in the x86_64 space.
Where is the GPU?
We don’t currently have GPUs in the product. The closed-ness of the GPU space is a bit of a cultural difference, but we’ll surely have something eventually. As a small company, we have to focus on our strengths, and there’s plenty of folks who don’t need GPUs right now.
That's fine, just awkward because the GS report shows the TAM (or the problem, depending on your perspective) being accelerated computing.
For sure. It’s not just GPUs; given that we have one product with three SKUs, there’s a variety of workloads we won’t be appropriate for just yet. Just takes time to diversify the offering.
maybe the real GPU was the friends we made along the way
"If only they used DC from the wall socket, all those H100s would be green" is, not, I think, the hill you want to die on.
But, yeah, my three 18MW/y racks agree that more power efficiency would be nice, it's just that Rewrite It In (Safe) Rust is unlikely to help with that...
> it's just that Rewrite It In (Safe) Rust is unlikely to help with that...
I didn't see any mention of Rust in the article?
It's pretty much the raison d'être of Oxide. But carry on...
They wrote their own BMC and various other bits and pieces in Rust. That's an extremely tiny part of the whole picture.
It’s significantly more than that, but it’s also true that we include stuff in other languages where appropriate. CockroachDB is in Go, and illumos is in C, as two examples. But almost all new code we write is in Rust. That is the stuff you’re talking about, but also like, our control plane.
Oh and we write a lot of Typescript too.
OSS Rust in Rack trenchcoat.
That's an interesting take. What's your reasoning? Whats your evidence?
Pretty much everything Oxide publishes on GitHub is either in Rust or is an SDK to a service written in Rust. Well, the web panel isn't in Rust, so negative points for that; true evangelists would have used WASM.
But Oxide's reason to exist is to keep the memory of cool Sun racks running Solaris alive forever.
The raison d'être of Oxide isn't Rust, it's continuing to pretend that the bloated corpse of Solaris still has some signs of life.
https://github.com/illumos/illumos-gate/commits/master/ looks alive to me.
(And for that matter, Oracle's proprietary Solaris seems better maintained than I ever expected, though in this context I think the open source fork is the relevant thing to look at.)