A lot of the time businesses just aren't that important. The number of places I've seen that stress over uptime when nothing they run is at all critical... Hell, you could drop the production environment in the middle of the day, and yes it would suck and you'd get a few phone calls, but life would go on.
These companies all ended up massively increasing their budgets by switching to cloud workloads when a simple server in the office was easily enough for their 250 users. Cloud is amazing for some uses and pure marketing BS for others, but it seems like a lot of engineers aim for a perfectly scalable solution instead of one that is good enough.
A thoroughly good article. It's probably worth also considering adding a CDN if you take this approach at scale. You get to use their WAF and DNS failover.
A big pain point for me is that this non-cloud approach normally means running my own database. It's worth considering a provider who also offers cloud databases.
If you go for an 'active/passive' setup, consider saving even more money by using a cloud VM with auto scaling for the 'passive' part.
In terms of pricing, the deals available these days on servers are amazing: you can get a 4GB RAM VPS with decent CPU and bandwidth for ~$6, or bare metal with 32GB RAM and a quad-core CPU for ~$90. It's worth using sites like serversearcher.com to compare.
Not PITR specifically, mostly just "it's hassle": for the application server I literally don't need backups, just automated provisioning, a Docker container, etc. Adding Postgres then means I need full backups including PITR, because I don't want to lose even an hour's data.
If you're running on a single machine then you'll get way more performance with something like sqlite (instead of postgres/MySQL) which also makes managing the database quite trivial.
SQLite has serious concurrency concerns which have to be evaluated. You should consider running postgres or mysql/mariadb even if it's on the same server.
SQLite uses one reader/writer lock over the whole database. When any thread is writing the database, no other thread is reading it. If one thread is waiting to write, new reads can't begin. Additionally, every read transaction starts by checking if the database has changed since last time, and then re-loading a bunch of caches.
This is suitable for SQLite's intended use case. It's most likely not suitable for a server with 256 hardware threads and a 50Gbps network card. You need proper transaction and concurrency control for heavy workloads.
Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
SQLite is lite. Use it for lite things, not hevy things.
Is any SQL database suitable for 50Gbps of network traffic hitting it?
Most if not all of your concerns with SQLite are simply a matter of not using the default configuration. Enable WAL mode, enable strict mode, etc. and it's a lot better.
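For what it's worth, the non-default configuration I mean looks roughly like this. A minimal Python sketch; the pragma values are common starting points, not gospel, and the table is just an example:

    import sqlite3

    conn = sqlite3.connect("app.db")

    # WAL: readers keep reading while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Fewer fsyncs; in WAL mode this still survives power loss without corruption.
    conn.execute("PRAGMA synchronous=NORMAL")
    # Wait up to 5s for the write lock instead of erroring out immediately.
    conn.execute("PRAGMA busy_timeout=5000")

    # STRICT tables (SQLite 3.37+) actually enforce the declared column types.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "  id INTEGER PRIMARY KEY,"
        "  email TEXT NOT NULL"
        ") STRICT"
    )
    conn.commit()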
Agree on many things here, but SQLite does support WAL mode, which supports 1 writer/N readers with snapshot isolation on reads. Writes are serialized but still quite fast.
SQLite (actually SQL-ite, like a mineral) may be light, but so are many workloads these days. Even 1000 queries per second is quite doable with SQLite and modest hardware, and I've worked at billion-dollar businesses handling fewer queries than that.
Regardless of the cost and capacity analysis, it's just hard to fight the industry trends. The benefits of "just don't think about hardware" are real. I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front). And above all, if an AWS region goes down, it doesn't seem like your org's fault, but if your bespoke private hosting arrangement goes down, then that kinda does seem like your org's fault.
> Can you expand on this claim, beyond what the article mentioned?
I run a Lambda behind a load balancer; hardware dies, it's redundant, it gets replaced. If a database server fails, while it re-provisions it doesn't saturate read IO on the SAN and cause noisy-neighbor issues.
I don't deal with any of it, I don't deal with depreciation, I don't deal with data center maintenance.
> I don't deal with depreciation, I don't deal with data center maintenance.
You don't deal with that either if you rent a dedicated server from a hosting provider. They handle the datacenter and maintenance for you for a flat monthly fee.
They do rely on you to tell them if hardware fails, however, and they'll still unplug your server and physically fix it. And there's a risk they'll replace the wrong drive in your RAID pair and you'll lose all your data - this happens sometimes - it's not a theoretical risk.
But the cloud premium needs reiteration: twenty-five times. For the price of the cloud server, you can have twenty-five-way redundancy.
> I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front).
Yes, there is.
Honestly, it looks to me that this school of thought is mostly adopted by people that can't do arithmetic or use a calculator. But it does absolutely exist.
That said, no, servers are not nearly expensive enough to move the needle on a company nowadays. The room that often goes around them is, and that's why way more people rent the room than the servers in it.
I ran the IT side of a media company once, and it all worked on a half-empty rack of hardware in a small closet... except for the servers that needed bandwidth. These were colocated. Until we realized that the hoster did not have enough bandwidth, at which point we migrated to two bare metal servers at Hetzner.
To be clear - this isn't an endorsement on my part, just observations of why cloud-only deployment seems common. I guess we shouldn't neglect the pressure towards resume-oriented development either, as it undoubtedly plays a part in infra folks' careers. It probably makes you sound obsolete to be someone who works in a physical data center.
I for one really miss being able to go see the servers that my code runs on. I thought data centers were really interesting places. But I don't see a lot of effort to decide things based on pure dollar cost analysis at this point. There's a lot of other industry forces besides the microeconomics that predetermine people's hosting choices.
I often wonder if my home NAS/server would be better off on a rented box or a cloud server somewhere, especially since I now have 1Gbit/s internet. Even now, 20TB of drive space and 6 cores with 32GB of RAM on a Hetzner dedicated server is about twice the price of buying the hardware over a 5-year period. I suspect the hardware will actually last longer than that, and it's the same level of redundancy (RAID) on a rented dedicated server, so the backup cost is the same between the two.
Using cloud plus box storage on Hetzner is more expensive than the dedicated server, about 4x owning the hardware and paying the power bill. AWS and Azure are just nuts, >100x the price, because they charge so much for storage even with hard drives. Neither Contabo nor Netcup can do this; it's too much storage for them.
Every time I look at this I come to the same basic conclusion: the overhead of renting someone else's machine is quite high compared to the hardware and power cost, and it would be a worse solution than having that performance on the local network for bandwidth and latency. The problem isn't so much the compute performance, which is relatively fairly priced; it's the storage costs and data transfer that bite.
Not really what the article was necessarily about, but cloud is supposedly good for low-end hardware, and it's actually kind of not: the storage costs are just too high, even on a Hetzner Storage Box.
It really depends on your power costs. In certain parts of Europe, power is so expensive that Hetzner actually works out cheaper (despite them providing you the entire machine and datacenter-grade internet connection).
I think I've settled on both being the answer - Hetzner is affordable enough that I can have a full backup of my NAS (using ZFS snapshots and incremental backups), and as a bonus can host some services there instead of at home. My home network still has much lower latency and so is preferable for e.g. my Lightroom library.
Microservices vs not is (almost) orthogonal to N servers vs one. You can make 10 microservices and rent a huge server and run all 10 services. It's more an organizational thing than a deployment thing. You can't do the opposite though, make a monolith and spread it out on 10 servers.
> You can't do the opposite though, make a monolith and spread it out on 10 servers.
Yes you can. It's called having multiple application servers. They all run the same application, just more of them. Maybe they connect to the same DB, maybe not, maybe you shard the DB.
This isn't even the end game for "one big server". AMD will give the most bang per rack, but there are other factors.
An IBM z17 is effectively one big server too, but provides levels of reliability that are simply not available in most IT environments. It won't outperform the AMD rack, but it will definitely keep up for most practical workloads.
If you sit down and really think honestly about the cost of engineering your systems to an equivalent level of reliability, you may find the cost of the IBM stack to be competitive in a surprising number of cases.
That’s because 75% (citation: wild-ass estimate) of tech workers are incapable of critical thinking, and blindly parrot whatever they’ve heard / read. The number of times I’ve seen something on HN, thought “that doesn’t sound right,” and then spent a day disproving it locally is too damn high. Of course, by then no one gives a shit, and they’ve all moved on patting each other on the back about how New Shiny is better.
A lot of these articles look at on-demand pricing for AWS. But you're rarely paying on-demand prices 24/7. If you have a stable workload, you probably buy reserved instances or a compute savings plan. At larger scales, you use third party services to get better deals with more flexibility.
A while back I looked into renting hardware, and found that we would save about 20% compared to what we actually paid AWS – partly because location and RAM requirements made the rental more expensive than anticipated, and partly because we were paying a lot less than the on-demand price for AWS.
20% is still significant, but it's a lot less than the ~80% that this and other articles suggest.
This is usually only true if you lift and shift your AWS setup exactly as-is, instead of looking at what hardware will run your setup most efficiently.
The biggest cost with AWS also isn't compute, but egress - for bandwidth-heavy setups you can sometimes finance the entirety of the servers from a fraction of the savings in egress.
I cost optimize setups with guaranteed caps at a proportion of savings a lot of the time, and I've yet to see a setup where we couldn't cut the cost far more than that.
These days we have more meta-software than software. Instead of Apache with virtualhosts, we have a VM running Docker containers, each with an nginx of its own, all fronted by a separate nginx container acting as a proxy.
How much waste is there from all this meta-software?
In reality, I host more on Raspberry Pis with USB SSDs than some people host on hundred-plus watt Dells.
At the same time, people constantly compare colo and hardware costs with the cost per month of cloud and say cloud is "cheaper". I don't even bother to point out the broken thinking that leads to that. In reality, we can ignore gatekeepers and run things out of our homes, using VPSes for public IPs when our home ISPs won't allow certain services, and we can still have excellent uptimes, often better than cloud uptimes.
Yes, we can consolidate many, many services in to one machine because most services aren't resource heavy constantly.
Two machines on two different home ISP networks backing each other up can offer greater aggregate uptime than a single "enterprise" (a misnomer, if you ask me, if you're talking about most x86 vendors) server in colo. A single five minute reboot of a Dell a year drops uptime from 100% to 99.999%.
Cloud is such bullshit that it's exhausting even just engaging with people who "but what if" everything, showing they've never even thought about it for more than a minute themselves.
Right now, my plan is to move from a bunch of separate VPSes, to one dedicated server from Hetzner and run a few VMs inside of it with separate public IPs assigned to them alongside some resource limits. You can get them for pretty affordable prices, if you don't need the latest hardware: https://www.hetzner.com/sb/
That way I can limit the blast range if I mess things up inside of a VM, but at the same time benefit from an otherwise pretty simple setup for hosting personal stuff, a CPU with 8 threads and 64 GB of RAM ought to be enough for most stuff I might want to do.
I did this (well, a larger VPS for $120/month) for my Rails-based sports streaming website. I had a significant amount of throughput too, especially at peak (6-10pm ET).
My biggest takeaway was to have my core database tables (user, subscription, etc) backed up every 10 minutes, and the rest every hour, and to test their restoration. (When I shut down the site it was 1.2TB.) I also had a script to quickly provision a new node—in case I ever needed it—which would have something up within 8 minutes of hitting enter.
When I compare this to the startups I’ve consulted for, who choose k8s because it’s what Google uses yet they only push out 1000s of database queries per day with a handful of background jobs and still try to optimize burn, I shake my head.
I’d do it again. Like many of us I don’t have the need for higher-complexity setups. When I did need to scale, I just added more vCPUs and RAM.
Is there somewhere I can read more about your setup/experience with your streaming site? I currently run a (legal :) streaming site but have it hosted on AWS and have been exploring moving everything over to a big server. At this point it just seems like more work to move it than to just pay the cloud tax.
The problem is sizing and consistency. When you're small, it's not cost-effective to overprovision 2-3 big servers (for HA).
And when you need to move fast (or things break), you can't wait a day for a dedicated server to come up, or worse, have your provider run out of capacity (or have to pick a differently specced server).
IME, having to go multi cloud/provider is a way worse problem to have.
There are a number of providers who provision dedicated servers via API in minutes these days. Given a dedicated server starts at around $90/month, it probably does make sense for a lot of people.
>Unfortunately, since all of your services run on servers (whether you like it or not), someone in that supply chain is charging you based on their peak load.
This seems fundamentally incorrect to me? If I need 100 units of peak compute during my 8 working hours, I get that from Big Cloud, and they have two other clients needing the same in offset timezones, then in theory the aggregate cost is 1/3rd of everyone buying their own peak capacity.
Whether big cloud passes on that saving is another matter, but it's there.
i.e. big cloud throws enough small customers together so that they don't have a "peak" per se, just a pretty noisy average load that is in aggregate mostly stable.
But they generally don't. Most people don't have large enough daily fluctuations for these demand curves to flatten out enough. And the providers also need enough capacity to handle unforeseen spikes. Which is also why none of them will let you scale however far you want - they still impose limits so they can plan the excess they need.
My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures. One server is the simplest and most reliable setup, and if you have backup and automated provisioning you can just re-deploy your entire environment in less than the time it takes to debug a complex multi-server setup.
I'm not saying everybody should do this. There are of course a lot of services that can't afford even a minute of downtime. But there are also a lot of companies that would benefit from a simpler setup.
Yep. I know people will say, "it's just a homelab," but hear me out: I've run positively ancient Dell R620s in a Proxmox cluster for years. At least five. Other than moving them from TX to NC, the cluster has had 100% uptime. When I've needed to do maintenance, I drop one at a time, and it maintains quorum, as expected. I'll reiterate that this is on circa-2012 hardware.
In all those years, I’ve had precisely one actual hardware failure: a PSU went out. They’re redundant, so nothing happened, and I replaced it.
Servers are remarkably resilient.
EDIT: 100% uptime modulo power failure. I have a rack UPS, and a generator, but once I discovered the hard way that the UPS batteries couldn’t hold a charge long enough to keep the rack up while I brought the generator online.
Seeing as I love minor disaster anecdotes where doing all the "right things" seems to make no difference :).
We had a rack in data center, and we wanted to put local UPS on critical machines in the rack.
But the data center went on and on about their awesome power grid (shared with a fire station, so no administrative power loss), on site generators, etc., and wouldn't let us.
Sure enough, one day the entire rack went dark.
It was the power strip in the data center's rack that failed. All the backup grids in the world can't get through a dead power strip.
(FYI, family member lost their home due to a power strip, so, again, anecdotally, if you have any older power strips (5-7+ years) sitting under your desk at home, you may want to consider swapping it out for a new one.)
> My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures.
I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if something happens to your nodes, whether it's a hardware failure, a power outage, someone stumbling on your power/network cable, or even a single service crashing, you have a major outage on your hands.
These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.
don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner?
and even if, you could just run a provisioned secondary server that jumps in if the first becomes unavailable and still be much cheaper.
> don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner?
You're not getting the point. The point is that if you use a single node to host your whole web app, you are creating a system where many failure modes, which otherwise could not even be an issue, can easily trigger high-severity outages.
> and even if, you could just run a provisioned secondary server (...)
Congratulations, you are no longer using "one big server", thus defeating the whole purpose behind this approach and learning the lesson that everyone doing cloud engineering work is already well aware of.
I don't know about Hetzner, but the failure case isn't usually tripping over power plugs. It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server.
Either way, stuff happens. Figuring out your actual requirements around uptime, time to response, and time to resolution is important before you build a nine-nines solution when eight eights is sufficient. :p
It's unlikely, but it happens. In the mid-2000s I had some servers at a colo. They were doing electrical work and took out power to a bunch of racks, including ours. Those environments are not static.
My single on-premise Exchange server is drastically more reliable than Microsoft's massive globally resilient whatever Exchange Online, and it costs me a couple hours of work on occasion. I probably have half their downtime, and most of mine is scheduled when nobody needs the server anyhow.
I'm not a better engineer, I just have drastically fewer failure modes.
I have also seen the opposite somewhat frequently: some team screws up the server, and unrelated stable services that have been running since forever (on the same server) are now affected due to the messed-up environment.
A lot of this attitude comes from the bad old days of 90s and early 2000s spinning disk. Those things failed a lot. It made everyone think you are going to have constant outages if you don’t cluster everything.
Today’s systems don’t fail nearly as often if you use high quality stuff and don’t beat the absolute hell out of SSD. Another trick is to overprovision SSD to allow wear leveling to work better and reduce overall write load.
Do that and a typical box will run years and years with no issues.
UPSes always seem to have strange failure modes. I've had a couple fail after a power failure. The batteries died and they wouldn't come back up automatically when the power came back. They didn't warn me about the dead battery until after...
This was written in 2022, but looks like it's mostly still relevant today. Would be interesting to see updated numbers on the expected costs of various hosting providers.
Those servers are mainly designed for enterprise use cases. For hobby projects, I can understand why someone would choose Hetzner over AWS.
For enterprise environments, however, there is much more to consider. One of the biggest costs you face is your operations team. If you go with Hetzner, you essentially have to rebuild a wide range of infrastructure components yourself (WAF, globally distributed CDN, EFS, RDS, EKS, Transit Gateways, Direct Connect and more).
Of course, you can create your own solutions for all of these (and for the 20+ other moving targets of infrastructure software and support systems around them). At my company, a mid-size enterprise, we once tried to do exactly that.
The result was hiring more than 10 freelancers in addition to 5 of our DevOps engineers to build it all, handle the complexity of such a setup, and keep everything up to date, spending hundreds of thousands of dollars. Meanwhile, our AWS team, consisting of only three people working with Terraform, proved far more cost-effective. Not in terms of dollars per CPU core, but in terms of average spending per project once staff costs and everything else were included.
I think many of the HN posts that say things like "I saved 90% of my infra bill by moving from AWS to a single Hetzner server" are a bit misleading.
Most of those things you listed are workarounds for having a slow server/system.
For example, if you serve your assets from the app server you can skip a CORS round trip. If you use an embedded database like SQLite you can shave off 50ms, and a dedicated CPU saves another 50ms; now you don't need to serve anything from the edge, because your global latency is already much better.
I’ve found that it’s hard to even hire engineers who aren’t all in on cloud and who even know how to build without it.
Even the ones who do know have been conditioned to tremble with fear at the thought of administrating things like a database or storage. These are people who can code cryptography kernels and network protocols and kernel modules, but the thought of running a K8S cluster or Postgres fills them with terror.
“But what if we have downtime!”
That would be a good argument if the cloud didn’t have downtime, but it does. Most of our downtime in previous years has been the cloud, not us.
“What if we have to scale!” If we are big enough to outgrow a 256 core database with terabytes of SSD, we can afford to hire a full time DBA or two and have them babysit a cluster. It’ll still be cheaper.
“What if we lose data?” Ever heard of backups? Streaming backups? Hot spares? Multiple concurrent backup systems? None of this is complex.
“But admin is hard!” So is administrating cloud. I’ve seen the horror of Terraform and Helm and all that shit. Cloud doesn’t make admin easy, just different. It promised simplicity and did not deliver.
… and so on.
So we pay about 1000X what we should pay for hosting.
Every time I look at the numbers I curse myself for letting the camel get its nose under the tent.
If I had it to do over again I’d forbid use of big cloud from day one, no exceptions, no argument, use it and you’re fired. Put it in the articles of incorporation and bylaws.
I have also found this happening. It's actually really funny because I think even I'm less inclined to run Postgres myself these days, when I used to run literally hundreds of instances with not much more than pg_dump, cron and two read-only replicas.
These days probably the best way of getting these 'cloudy' engineers on board is just to tell them it's Kubernetes and run all of your servers as K3s.
I’m convinced that cloud companies have been intentionally shaping dev culture. Microservices in particular seem like a pattern designed to push managed cloud lock in. It’s not that you have to have cloud to use them, but it creates a lot of opportunities to reach for managed services like event queues to replace what used to be a simple function call or queue.
Dev culture is totally fad driven and devs are sheep, so this works.
I'm in the process of breaking up a legacy deployment on "one big server" into something cloud native like Kubernetes.
The problem with one big server is that few customers have ONE (1) app that needs that much capacity. They have many small apps that add up to that much capacity, but that's a very different scenario with different problems and solutions.
For example, one of the big servers I'm in the process of teasing apart has about 100 distinct code bases deployed to it, written by dozens of developers over decades.
If any one of those apps gets hacked and this is escalated to a server takeover, the other 99 apps get hacked too. Some of those apps deal with PII or transfer money!
Because a single big server uses a single shared IP address for outbound comms[1] this means that the firewall rules for 100 apps end up looking like "ALLOW: ANY -> ANY" for two dozen protocols.
Because upgrading anything system-wide on the One Big Server is a massive Big Bang Change, nobody has had the bravery to put their hand up and volunteer for this task. Hence it has been kept alive running 13 year old platform components because 2 or 3 of the 100 apps might need some of those components... but nobody knows which two or three apps those are, because testing this is also big-bang and would need all 100 apps tested all at once.
It actually turned out that even Two Big (old) Servers in a HA pair aren't quite enough to run all of the apps so they're being migrated to newer and better Azure VMs.
During the interim migration phase, instead of Two Big Servers there are Four Big Servers... in PRD. And then four more in TST, etc... Each time a SysOps person deploys a new server somewhere, they have to go tell each of the dozens of developers where they need to deploy their apps today.
Don't think DevOps automation will rescue you from this problem! For example in Azure DevOps those 100 apps have 100 projects. Each project has 3 environments (=300 total) and each of those would need a DevOps Agent VM link to the 2x VMs = 600 VM registrations to keep up to date. These also expire every 6 months!
Kubernetes, Azure App Service, AWS App Runner, and GCP App Engine serve a purpose: They solve these problems.
They provide developers with a single stable "place" to dump their code even if the underlying compute is scaled, rebuilt, or upgraded.
They isolate tiny little apps but also allow the compute to be shared for efficient hosting.
They provide per-app networking and firewall rules.
Etc...
[1] It's easy to bind distinct ingress IP addresses on even a single NIC (or multiple), but it's weirdly difficult to split the outbound path. Maybe this is easier on Linux, but on Windows and IIS it is essentially impossible.
And now consider that 6th-gen EPYC will have 256 cores; you can also have 32 hot-swap SSDs with 10 million plus random write IOPS and 60 million plus random read IOPS in a single 2U box.
I work for a cloud provider and I'll tell you, one of the reasons for the cloud premium is that it is a total pain in the ass to run hardware. Last week I installed two servers and between them had four mysterious problems that had to be solved by reseating cards, messing with BIOS settings, etc. Last year we had to deal with a 7-site, 5-country RMA for 150 100Gb copper cables with incorrect coding in their EEPROMs.
I tell my colleagues: it's a good thing that hardware sucks: the harder it is to run bare metal, the happier our customers are that they choose the cloud. :)
(But also: this is an excellent article, full of excellent facts. Luckily, my customers choose differently.)
And then boom, all your services are gone due to a pesky capacitor on the motherboard. Also good luck trying to change even one software component of that monolith without disrupting and jeopardizing the whole operation.
While it is useful advice for some people in certain conditions, it should be taken with a grain of salt.
Capacitor problem or not, hardware does fail. Power supplies crap out. SSDs die in strange ways. A failure of a supposedly "redundant" SSD might cause your system to freeze up.
One thing that we ran into back in the day was ECC failure on reboot.
We had a few Dell servers that ran great for a year or two. We rebooted one for some reason or another and it refused to POST due to an ECC failure.
Hauled down to the colo at 3AM and ripped the fucking ram out of the box and hoped it would restart.
Hardware fails. The RAM was fine for years, but something happened to it. Even Dell had no idea and just shipped us another stick, which we stuck in at the next downtime window.
To top it off, we dropped the failing RAM into another box at the office and it worked fine. <shrug>.
> Part of the "cloud premium" for load balancers, serverless computing, and small VMs is based on how much extra capacity your cloud provider needs to build in order to handle their peak load. You're paying for someone's peak load anyway!
Eh, sort of. The difference is that the cloud can go find other workloads to fill the off-peak trough. They won't pay as much as peak load does, but it helps offset the cost of maintaining peak capacity. Your personal big server likely can't find paying workloads for your troughs.
I also have recently come to the opposite conclusion for my personal home setup. I run a number of services on my home network (media streaming, email, a few personal websites and games I have written, my frigate NVR, etc). I had been thinking about building out a big server for expansion, but after looking into the costs I bought 3 mini pcs instead. They are remarkably powerful for their cost and size, and I am able to spread them around my house to minimize footprint and heat. I just added them all to my home Kubernetes cluster, and now I have capacity and the ability to take nodes down for maintenance and updates. I don’t have to worry about hardware failures as much. I don’t have a giant server heating up one part of my house.
One of the more detrimental aspects of the Cloud Tax is that it constrains the types of solutions engineers even consider.
Picking an arbitrary price point of $200/mo, you can get 4(!) vCPUs and 16GB of RAM at AWS. Architectures are different etc., but this is roughly a mid-spec dev laptop of 5 or so years ago.
At Hetzner, you can rent a machine with 48 cores and 128GB of RAM for the same money. It's hard to overstate how far apart these machines are in raw computational capacity.
There are approaches to problems that make sense with 10x the capacity that don't make sense on the much smaller node. Critically, those approaches can sometimes save engineering time that would otherwise go into building a more complex system to manage around artificial constraints.
Yes, there are other factors like durability etc. that need to be designed for. But going the other way, dedicated boxes can deliver more consistent performance without worries of noisy neighbors.
It's more than that - it's all the latency that you can remove from the equation with your bare-metal server.
No network latency between nodes, less memory bandwidth contention than you get in VMs, and no separate caching tier needed when you can just tell e.g. Postgres to use gigs of RAM and let Linux's disk caching take care of the rest.
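To make that concrete: on a box with, say, 128GB of RAM, the whole "caching tier" collapses into a couple of postgresql.conf lines. The numbers below are illustrative, not a tuning guide:

    shared_buffers = 16GB          # Postgres' own cache; ~25% of RAM is the usual rule of thumb
    effective_cache_size = 96GB    # tells the planner the OS page cache will hold most of the data
    work_mem = 64MB                # per-sort / per-hash-join memory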
The difference between a fairly expensive ($300) RDS instance + EC2 in the same region vs a $90 dedicated server with an NVMe drive and Postgres in a container is absolutely insane.
A fair comparison would include the cost of the DBA who will be responsible for backups, updates, monitoring, security and access control. That’s what RDS is actually competing with.
Paying someone $2000 to set that up once should result in the costs being recovered in what, 18 months?
If you’re running Postgres locally you can turn off the TCP/IP part; nothing more to audit there.
SSH based copying of backups to a remote server is simple.
If not accessible via network, you can stay on whatever version of Postgres you want.
I’ve heard these arguments since AWS launched, and all that time I’ve been running Postgres (since 2004 actually) and have never encountered all these phantom issues that are claimed as being expensive or extremely difficult.
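For the record, the backup "infrastructure" I'm talking about is a cron job along these lines (database name, paths, and the backup host are placeholders; it assumes ~/.pgpass and ssh keys are already in place):

    import subprocess
    from datetime import date

    # Nightly cron job: dump in compressed custom format, then copy off-box over ssh.
    # Restore later with: pg_restore -d appdb <file>
    dump_path = f"/var/backups/appdb-{date.today().isoformat()}.dump"
    subprocess.run(["pg_dump", "--format=custom", "--file", dump_path, "appdb"], check=True)
    subprocess.run(["scp", dump_path, "backup@backup.example.com:/srv/pg-backups/"], check=True)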
$2k? That's a $100k project for a medium-size corp.
I do consulting in this space, and we consistently make more money from people who insist on using cloud services, because their setups tend to need far more work.
As long as you also include the Cloud Certified DevOps Engineer™[0] to set up that RDS instance.
[0] A normal sysadmin remains vaguely bemused at their job title and the way it changes every couple years.
You don’t need a DBA for any of those, you need someone who can read some docs. It’s not witchcraft.
Totally. My frustration isn't even price, though; RDS is literally just dog slow.
Yeah, but AWS SREs are the ones making the big bucks! So what can you do? It is nice to see many people here on HN supporting open networks and platforms and making very drastic comments encouraging Google engineers to quit their jobs.
I also totally understand why some people with a family to support and a mortgage to pay can't just walk away from a job at a FAANG or MAMAA type place.
Looking at your comparison, at this point it just seems like a scam.
Right now the big bucks are in managing massive bare metal GPU clusters.
This. Clustering and managing Nvidia at scale is the new hotness demanding half-million dollar salaries.
100% this. Add an embedded database like SQLite and optimise writes into batches and you can go really, really far with Hetzner. It's also why I find the "what about overprovisioning" argument silly (once you look outside of AWS you can get an insane cost/perf ratio).
Also in my experience more complex systems tend to have much less reliability/resilience than simple single node systems. Things rarely fail in isolation.
I think it’s the other way around. I’m a huge fan of Hetzner for small sites with a few users. However, for bigger projects, the cloud seems to offer a complete lack of constraints. For projects that can pay for my time, $200/m or $2000/m in hosting costs is a negligible difference. What’s the development cost difference between AWS CDK / Terraform + GitHub Actions vs. Docker / K8s / Ansible + any CI pipeline? I don’t know; in my experience, I don’t see how “bare metal” saves much engineering time. I also don’t see anything complicated about using an IaC Fargate + RDS template.
Now, if you actually need to decouple your file storage and make it durable and scalable, or need to dynamically create subdomains, or any number of other things… The effort of learning and integrating different dedicated services at the infrastructure level to run all this seems much more constraining.
I’ve been doing this since before the “Cloud,” and in my view, if you have a project that makes money, cloud costs are a worthwhile investment that will be the last thing that constrains your project. If cloud costs feel too constraining for your project, then perhaps it’s more of a hobby than a business—at least in my experience.
Just thinking about maintaining multiple cluster filesystems and disk arrays—it’s just not what I would want to be doing with most companies’ resources or my time. Maybe it’s like the difference between folks who prefer Arch and setting up Emacs just right, versus those happy with a MacBook. If I felt like changing my kernel scheduler was a constraint, I might recommend Arch; but otherwise, I recommend a MacBook. :)
On the flip side, I’ve also tried to turn a startup idea into a profitable project with no budget, where raw throughput was integral to the idea. In that situation, a dedicated server was absolutely the right choice, saving us thousands of dollars. But the idea did not pan out. If we had gotten more traction, I suspect we would have just vertically scaled for a while. But it’s unusual.
> I really don't see how "bare metal" saves any engineering time
This is because you are looking only at provisioning/deployment. And you are right -- node size does not impact DevOps all that much.
I am looking at the solution space available to the engineers who write the software that ultimately gets deployed on the nodes. And that solution space is different when the nodes have 10x the capability. Yes, cloud providers have tons of aggregate capability. But designing software to run on a fleet of small machines is very different from accomplishing the same tasks on a single large machine.
It would not be controversial to suggest that targeting code at an Apple Watch or Raspberry Pi imposes constraints on developers that do not exist when targeting desktops. I am saying the same dynamic now applies to targeting cloud providers.
This isn't to say there's a single best solution for everything. But there are tradeoffs that are not always apparent. The art is knowing when it makes sense to pay the Cloud Tax, and whether to go 100% Cloud vs some proportion of dedicated.
Overall, I agree that most people underestimate the runway that the modern dedicated server can give you.
It really depends on what you are doing. But when you factor in the network features, the ability to scale the solution, etc., you get a lot of stuff inside that $200/mo EC2 instance. The product is more than the VM.
That said, with a defined workload without a ton of variation or segmentation needs there are lots of ways to deliver a cheaper solution.
> you get a lot of stuff inside that $200/mo EC2 instance. The product is more than the VM.
What are you getting, and do you need it?
Probably not for $200/mo EC2, but AWS/GCP in general
* Centralized logging, log search, log based alerting
* Secrets manager
* Managed kubernetes
* Object store
* Managed load balancers
* Database HA
* Cache solutions
... Can I run all these by myself? Sure. But I'm not in this business. I just want to write software and run that.
And yes, I have needed most of this from day 1 for my startup.
For a personal toy project, or when you reach a certain scale, it may make sense to go the other way.
I agree that AWS EC2 is probably too expensive on the whole. It also doesn't really provide any of the greater benefits of the cloud that come from "someone else's server".
However, to the point of microservices as the article mentions, you probably should look at lambda (or fargate, or a mix) unless you can really saturate the capacity of multiple servers.
When we swapped from ECS+EC2 running microservices over to Lambda, our costs dropped sharply. Even serving millions of requests a day, we spend a lot of time idle in between, especially spread across the services.
Additionally, we have had 0 outages from hardware in the last 5 years. As an engineer, this has made my QoL significantly better.
On AWS if you want raw computational capacity you use Lambda and not EC2. EC2 is for legacy type workloads and doesn't have nearly the same scaling power and speed that Lambda does.
I have several workloads that just invoke Lambda in parallel. Now I effectively have a 1000-core machine and can blast through large workloads without even thinking about it. I have no VM to maintain or OS image to worry about. (A sketch of what I mean is below.)
Which highlights the other difference that you failed to mention. Hetzner charges a "one time setup" fee to create that VM. That puts a lot of back pressure on infrastructure decisions and removes any scalability you could otherwise enjoy in the cloud.
If you want to just rent a server then Hetzner is great. If you actually want to run "in the cloud" then Hetzner is a non-starter.
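Re the parallel invocation above, the fan-out really is that small. A minimal boto3 sketch; the function name and payload shape are placeholders of mine:

    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    lam = boto3.client("lambda")

    def run_chunk(chunk_id):
        # Synchronous invoke; each call gets its own execution environment,
        # so 1000 concurrent calls behave like a 1000-core machine here.
        resp = lam.invoke(
            FunctionName="process-chunk",              # placeholder function name
            Payload=json.dumps({"chunk": chunk_id}),
        )
        return json.loads(resp["Payload"].read())

    with ThreadPoolExecutor(max_workers=100) as pool:
        results = list(pool.map(run_chunk, range(1000)))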
Strong disagree here. Lambda is significantly more expensive per vCPU hour and introduces tight restrictions on your workflow and architecture, one of the most significant being maximum runtime duration.
Lambda is a decent choice when you need fast, spiky scaling for a lot of simple self-contained tasks. It is a bad choice for heavy tasks like transcoding long videos, training a model, data analysis, and other compute-heavy tasks.
> significantly more expensive per vCPU hour
It's almost exactly the same price as EC2. What you don't get to control is the mix of vCPU and RAM. Lambda ties those two together. For equivalent EC2 instances the cost difference is astronomically small, on the order of pennies per month.
> like transcoding long videos, [...] data analysis, and other compute-heavy tasks
If you aren't breaking these up into multiple smaller independent segments then I would suggest that you're doing this wrong in the first place.
> training a model
You're going to want more than what a basic EC2 instance affords you in this case. The scaling factors and velocity are far less of a factor.
That's fine, except for all of Lambda's weird limitations: request and response sizes, deployment .zip sizes, max execution time, etc. For anything complicated you'll eventually run into all this stuff. Plus you'll be locked into AWS.
> request and response sizes
If either of these exceeds the limitations of the call, which are 6MB or 256KB depending on the invocation type, then you can just use S3 (sketch below). For large distributed task coordination you're going to be doing this anyways.
> deployment .zip sizes
Lambda layers exist and are powerful.
> max execution time
If your workload depends on long uninterrupted runs of time on single CPUs then you have other problems.
> Plus you'll be locked into AWS.
In the world of serverless your interface to the endpoints and semantics of Lambda are minimal and easily changed.
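Re the request/response size point above: you stage the big payload in S3 and pass a pointer instead. A rough sketch (bucket, key, and function names are placeholders):

    import json

    import boto3

    s3 = boto3.client("s3")
    lam = boto3.client("lambda")

    # Stage the oversized input in S3, then hand the function a reference to it.
    s3.upload_file("input.bin", "my-work-bucket", "jobs/123/input.bin")
    lam.invoke(
        FunctionName="process-job",     # placeholder function name
        InvocationType="Event",         # async invoke; this is where the 256KB limit applies
        Payload=json.dumps({"bucket": "my-work-bucket", "key": "jobs/123/input.bin"}),
    )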
Very few providers charge setup; some will provision a server within 90s of an API call.
Hetzner does on the server the OP was referencing:
https://www.hetzner.com/dedicated-rootserver/ax162-s/
I don't think that negates the point I was making. Most don't, for example none of the providers on https://www.serversearcher.com/ seem to charge setup.
HN uses two—one live and one backup, so we can fail over if there's a hardware issue or we need to upgrade something.
It's a nice pattern. Just don't make them clones of each other, or they might go BLAM at the same time!
https://news.ycombinator.com/item?id=32049205
https://news.ycombinator.com/item?id=32032235
https://news.ycombinator.com/item?id=32028511 (<-- this is where it got figured out)
---
Edit: both these points are mentioned in the OP.
Whilst not as fatal as a failing SSD, AMD also had a fun erratum where a CPU core would hang in CC6 after ~1044 days.
https://www.servethehome.com/amd-epyc-7002-rome-cpus-hang-af...
The one-big-box approach assumes that you know how to configure everything for high performance. I suspect that skill has been lost, for the most part.
You really need to tweak the TCP/IP stack, buffer sizes, and various other things to get everything to work really well under heavy load. I'm not sure if the various sites that used to talk about this have been updated in the last decade or so, because I don't do that anymore.
I mean, with default limits you'll run out of file descriptors pretty quickly once you're handling a lot of simultaneous connections. Doesn't matter how big your box is at that point.
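The per-process part of that is at least easy to check and raise; a minimal sketch with Python's resource module (system-wide limits still live in limits.conf or your service manager's config):

    import resource

    # Inspect the current per-process file descriptor limits.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-files limit: soft={soft} hard={hard}")

    # Raise the soft limit up to the hard limit for this process.
    if soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))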
I helped bootstrap a company that made an enterprise automation engine. The team wanted to make the service available as SaaS for boosting sales.
They could have got the job done by hosting the service on a VPS with a multi-tenant database schema. Instead, they went about learning Kubernetes and drilling deep into the "cloud-native" stack, and spent a year trying to set up the perfect devops pipeline.
Not surprisingly the company went out of business within the next few years.
> Not surprisingly the company went out of business within the next few years.
But the engineers could find new jobs thanks to their acquired k8s experience.
This is my experience too—there’s too much time wasted trying to solve a problem that might exist 5 years down the road. So many projects and early-stage companies would be just fine either with a PaaS or nginx in front of a docker container. You’ll know when you hit your pain point.
Yep, this is why I'm a proponent of PaaS until the bill actually hurts. Just pay the Heroku/Render/Fly tax and focus on product-market fit. Or, play with servers and K8s, burning your investors' money, then move on to the next gig and repeat...
> Or, play with servers and K8s, burning your investors' money, then move on to the next gig and repeat...
I mean, of the two, the PaaS route certainly burns more money, the exception being the rare shop that is so incompetent they can't even get their own infrastructure configured correctly, like in GP's situation.
There are certainly more shops that would be better off self-hosting and saving on their current massive cloud bills than there are rare one-offs where cloud services save so much time that it's the difference between bankruptcy and staying functional.
> the PaaS route certainly burns more money,
Does it? Vercel is $20/month and Neon starts at $5/month. That obviously goes up as you scale up, but $25/month seems like a fairly cheap place to start to me.
(I don't work for Vercel or Neon, just a happy customer)
Yeah, also a happy neon customer - but they can get pricy. Still prefer them over AWS. For compute, Fly is pretty competitive.
I’m using Neon too and upgraded to the scale-up version today. Curious, what do you mean that they can get pricey?
Yeah, same. Vercel + Neon and then if you actually have customers and actually end up paying them enough money that it becomes significant, then you can refactor and move platforms, but until you do, there are bigger fish to fry.
100%. Making it a docker container and deploying it is literally a few hours at most.
A lot of the time businesses just aren't that important. The number of places I've seen stress over uptime when nothing they run is at all critical is remarkable. Hell, you could drop the production environment in the middle of the day and yes, it would suck and you'd get a few phone calls, but life would go on.
These companies all ended up massively increasing their budgets switching to cloud workloads when a simple server in the office was easily enough for their 250 users. Cloud is amazing for some uses and pure marketing BS for others but it seems like a lot of engineers aim for a perfect scalable solution instead of one that is good enough.
A thoroughly good article. It's probably worth also considering adding a CDN if you take this approach at scale. You get to use their WAF and DNS failover.
A big pain point that I personally don't love is that this non-cloud approach normally means running my own database. It's worth considering a provider who also provides cloud databases.
If you go for an 'active/passive' setup, consider saving even more money by using a cloud VM with auto scaling for the 'passive' part.
In terms of pricing, the deals available these days on servers are amazing: you can get a 4GB RAM VPS with decent CPU and bandwidth for ~$6, or bare metal with 32GB RAM and a quad-core CPU for ~$90. It's worth using sites like serversearcher.com to compare.
What’s the issue with running Postgres inside a docker container + regular backups? Never had problem and relatively easy to manage.
Not point-in-time recovery specifically, mostly just "it's a hassle": for the application server I literally don't need backups, just automated provisioning, a Docker container, etc. Adding Postgres then means I need full backups, including point-in-time recovery, because I don't even want to lose an hour's data.
If you're running on a single machine then you'll get way more performance with something like sqlite (instead of postgres/MySQL) which also makes managing the database quite trivial.
Only if you handle a single request at a time and need few integrity checks.
SQLite has serious concurrency concerns which have to be evaluated. You should consider running postgres or mysql/mariadb even if it's on the same server.
SQLite uses one reader/writer lock over the whole database. When any thread is writing the database, no other thread is reading it. If one thread is waiting to write, new reads can't begin. Additionally, every read transaction starts by checking if the database has changed since last time, and then re-loading a bunch of caches.
This is suitable for SQLite's intended use case. It's most likely not suitable for a server with 256 hardware threads and a 50Gbps network card. You need proper transaction and concurrency control for heavy workloads.
Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
SQLite is lite. Use it for lite things, not heavy things.
Is any SQL database suitable for 50Gbps of network traffic hitting it?
Most if not all of your concerns with SQLite are simply a matter of not using the default configuration. Enable WAL mode, enable strict mode, etc. and it's a lot better.
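For example, a minimal sketch with Python's built-in sqlite3 module (STRICT tables need SQLite 3.37+; the table is illustrative):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL;")    # readers no longer block the single writer
    conn.execute("PRAGMA synchronous=NORMAL;")  # common pairing with WAL
    conn.execute("PRAGMA foreign_keys=ON;")     # FK enforcement is opt-in in SQLite
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            at   INTEGER NOT NULL
        ) STRICT
    """)
    conn.commit()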
Agree on many things here, but SQLite does support WAL mode, which allows 1 writer/N readers with snapshot isolation on reads. Writes are serialized but still quite fast.
SQLite (actually SQL-ite, like a mineral) may be light, but so are many workloads these days. Even 1000 queries per second is quite doable with SQLite and modest hardware, and I've worked at billion-dollar businesses handling fewer queries than that.
Regardless of the cost and capacity analysis, it's just hard to fight the industry trends. The benefits of "just don't think about hardware" are real. I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front). And above all, if an AWS region goes down, it doesn't seem like your org's fault, but if your bespoke private hosting arrangement goes down, then that kinda does seem like your org's fault.
> and server hardware is expensive up front
You don't need to buy server hardware(!), the article specifically mentions renting from eg Hetzner.
> The benefits of "just don't think about hardware" are real
Can you expand on this claim, beyond what the article mentioned?
> Can you expand on this claim, beyond what the article mentioned?
I run a Lambda behind a load balancer; hardware dies, it's redundant, it gets replaced. If a database server fails, its reprovisioning doesn't saturate read IO on the SAN and cause noisy-neighbor issues.
I don't deal with any of it, I don't deal with depreciation, I don't deal with data center maintenance.
> I don't deal with depreciation, I don't deal with data center maintenance.
You don't deal with that either if you rent a dedicated server from a hosting provider. They handle the datacenter and maintenance for you for a flat monthly fee.
They do rely on you to tell them if hardware fails, however, and they'll still unplug your server and physically fix it. And there's a risk they'll replace the wrong drive in your RAID pair and you'll lose all your data - this happens sometimes - it's not a theoretical risk.
But the cloud premium needs reiteration: twenty-five times. For the price of the cloud server, you can have twenty-five-way redundancy.
> I think there is a school of thought that capex should be avoided at all costs (and server hardware is expensive up front).
Yes, there is.
Honestly, it looks to me that this school of thought is mostly adopted by people that can't do arithmetic or use a calculator. But it does absolutely exist.
That said, no, servers are not nearly expensive enough to move the needle on a company nowadays. The room that often goes around them is, and that's why way more people rent the room than the servers in it.
Connectivity is a problem, not the room.
I ran the IT side of a media company once, and it all worked on a half-empty rack of hardware in a small closet... except for the servers that needed bandwidth. These were colocated. Until we realized that the hoster did not have enough bandwidth, at which point we migrated to two bare metal servers at Hetzner.
It's connectivity, reliable power, reliable cooling, and security.
The actual space isn't a big deal, but the entire environment has large fixed costs.
If you rent dedicated servers, then you're not worrying about any of the capex or maintenance stuff.
The benefits of "don't write a distributed system unless you really have to" are also very real.
For anything up to about 128GB RAM you can still easily avoid capex by just renting servers. Above that it gets a bit trickier.
It's not like it's a huge capex for that level of server anyway. Probably less than the cost of one employee's laptop.
Renting (hosted) servers above 128GB RAM is still pretty easy, but I agree pricing levels out. 128GB RAM server ~$200/Month, 384 GB ~$580, 1024 GB ~$940/Month
To be clear - this isn't an endorsement on my part, just observations of why cloud-only deployment seems common. I guess we shouldn't neglect the pressure towards resume-oriented development either, as it undoubtedly plays a part in infra folks' careers. It probably makes you sound obsolete to be someone who works in a physical data center.
I for one really miss being able to go see the servers that my code runs on. I thought data centers were really interesting places. But I don't see a lot of effort to decide things based on pure dollar cost analysis at this point. There's a lot of other industry forces besides the microeconomics that predetermine people's hosting choices.
I often wonder if my home NAS/server would be better off on a rented box or a cloud server somewhere, especially since I now have 1gbit/s internet. Even now, 20TB of drive space and 6 cores with 32GB of RAM on a Hetzner dedicated server is about twice the price of buying the hardware over a 5-year period. I suspect the hardware will actually last longer than that, and it's the same level of redundancy (RAID) on a rented dedicated server, so the backup is the same cost between the two.
Using cloud and box storage on Hetzner is more expensive than the dedicated server, 4x owning the hardware and paying the power bill. AWS and Azure are just nuts, >100x the price, because they charge so much for storage even with hard drives. Neither Contabo nor Netcup can do this; it's too much storage for them.
Every time I look at this I come to the same basic conclusion: the overhead of renting someone else's machine is quite high compared to the hardware and power cost, and it would be a worse solution than having that performance on the local network for bandwidth and latency. The problem isn't so much the compute performance, which is relatively fairly priced; it's the storage costs and data transfer that bite.
Not really what the article was about, but cloud is sort of meant to be good for low-end hardware, and it's actually kind of not: the storage costs are just too high, even on a Hetzner Storage Box.
It really depends on your power costs. In certain parts of Europe, power is so expensive that Hetzner actually works out cheaper (despite them providing you the entire machine and datacenter-grade internet connection).
I think I’ve settled on both being the answer - Hetzner is affordable enough that I can have a full backup of my NAS (using ZFS snapshots and incremental backups), and as a bonus can host some services there instead of at home. My home network still has much lower latency and so is preferable for things like my Lightroom library.
Related ongoing thread:
How many HTTP requests/second can a single machine handle? (2024) - https://news.ycombinator.com/item?id=45085446 - Aug 2025 (32 comments)
Microservices vs not is (almost) orthogonal to N servers vs one. You can make 10 microservices and rent a huge server and run all 10 services. It's more an organizational thing than a deployment thing. You can't do the opposite though, make a monolith and spread it out on 10 servers.
> You can't do the opposite though, make a monolith and spread it out on 10 servers.
You absolutely can, and it has been the most common practice for scaling them for decades.
> You can't do the opposite though, make a monolith and spread it out on 10 servers.
Yes you can. It's called having multiple application servers. They all run the same application, just more of them. Maybe they connect to the same DB, maybe not; maybe you shard the DB.
This isn't even the end game for "one big server". AMD will give the most bang per rack, but there are other factors.
An IBM z17 is effectively one big server too, but provides levels of reliability that are simply not available in most IT environments. It won't outperform the AMD rack, but it will definitely keep up for most practical workloads.
If you sit down and really think honestly about the cost of engineering your systems to an equivalent level of reliability, you may find the cost of the IBM stack to be competitive in a surprising number of cases.
At what cost politically? I would expect political battles to be far more intense than any of the technical ones.
That’s because 75% (citation: wild-ass estimate) of tech workers are incapable of critical thinking, and blindly parrot whatever they’ve heard / read. The number of times I’ve seen something on HN, thought “that doesn’t sound right,” and then spent a day disproving it locally is too damn high. Of course, by then no one gives a shit, and they’ve all moved on patting each other on the back about how New Shiny is better.
A lot of these articles look at on-demand pricing for AWS. But you're rarely paying on-demand prices 24/7. If you have a stable workload, you probably buy reserved instances or a compute savings plan. At larger scales, you use third party services to get better deals with more flexibility.
A while back I looked into renting hardware, and found that we would save about 20% compared to what we actually paid AWS – partially because location and RAM requirements made the rental more expensive than anticipated, and partially because we were paying a lot less than on-demand prices for AWS.
20% is still significant, but it's a lot less than the ~80% that this and other articles suggest.
This is usually only true if you lift and shift your AWS setup exactly as-is, instead of looking at what hardware will run your setup most efficiently.
The biggest cost with AWS also isn't compute, but egress - for bandwidth heavy setups you can sometimes finance the entirety of the servers from a fraction of the savings in egress.
I cost optimize setups with guaranteed caps at a proportion of savings a lot of the time, and I've yet to see a setup where we couldn't cut the cost far more than that.
These days we have more meta-software than software. Instead of Apache with virtual hosts, we have a VM running Docker containers, each with an nginx of its own, all fronted by a separate nginx container acting as a proxy.
How much waste is there from all this meta-software?
In reality, I host more on Raspberry Pis with USB SSDs than some people host on hundred-plus watt Dells.
At the same time, people constantly compare colo and hardware costs with the cost per month of cloud and say cloud is "cheaper". I don't even bother to point out the broken thinking that leads to that. In reality, we can ignore gatekeepers and run things out of our homes, using VPSes for public IPs when our home ISPs won't allow certain services, and we can still have excellent uptimes, often better than cloud uptimes.
Yes, we can consolidate many, many services in to one machine because most services aren't resource heavy constantly.
Two machines on two different home ISP networks backing each other up can offer greater aggregate uptime than a single "enterprise" (a misnomer, if you ask me, if you're talking about most x86 vendors) server in colo. A single five minute reboot of a Dell a year drops uptime from 100% to 99.999%.
Cloud is such bullshit that it's exhausting even just engaging with people who "but what if" everything, showing they've never even thought about it for more than a minute themselves.
Just today I wasted some time due to an unexpected Tailscale key expiry and some other issues related to running a container cluster: https://blog.kronis.dev/blog/the-great-container-crashout
Right now, my plan is to move from a bunch of separate VPSes, to one dedicated server from Hetzner and run a few VMs inside of it with separate public IPs assigned to them alongside some resource limits. You can get them for pretty affordable prices, if you don't need the latest hardware: https://www.hetzner.com/sb/
That way I can limit the blast range if I mess things up inside of a VM, but at the same time benefit from an otherwise pretty simple setup for hosting personal stuff, a CPU with 8 threads and 64 GB of RAM ought to be enough for most stuff I might want to do.
Previously: https://news.ycombinator.com/item?id=32319147
Thanks! Macroexpanded:
Use one big server - https://news.ycombinator.com/item?id=32319147 - Aug 2022 (585 comments)
I did this (well, a large-r VPS for $120/month) for my Rails-based sports streaming website. I had a significant amount of throughput too, especially at peak (6-10pm ET).
My biggest takeaway was to have my core database tables (user, subscription, etc) backed up every 10 minutes, and the rest every hour, and test their restoration. (When I shut down the site it was 1.2TB.) Having a script to quickly provision a new node—in case I ever needed it—would have something up within 8 minutes from hitting enter.
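For anyone curious what the periodic-dump part of that looks like, here's a sketch (assuming pg_dump on the PATH and credentials via ~/.pgpass or the environment; the database name, paths, and remote target are placeholders):

    import subprocess
    from datetime import datetime, timezone
    from pathlib import Path

    BACKUP_DIR = Path("/var/backups/postgres")
    REMOTE = "backup@backup-host:/srv/pg-backups/"  # placeholder destination

    def dump_and_ship(dbname: str) -> Path:
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        out = BACKUP_DIR / f"{dbname}-{stamp}.dump"
        # Custom format (-Fc) so pg_restore can do selective or parallel restores.
        subprocess.run(["pg_dump", "-Fc", "-f", str(out), dbname], check=True)
        subprocess.run(["scp", str(out), REMOTE], check=True)
        return out

    if __name__ == "__main__":
        dump_and_ship("app_core")  # e.g. the database holding the user/subscription tables

Run something like this from cron every 10 minutes for the core database and hourly for the rest, and, as noted above, actually test the restores.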
When I compare this to the startups I’ve consulted for, who choose k8s because it’s what Google uses yet they only push out 1000s of database queries per day with a handful of background jobs and still try to optimize burn, I shake my head.
I’d do it again. Like many of us I don’t have the need for higher-complexity setups. When I did need to scale, I just added more vCPUs and RAM.
Is there somewhere I can read more about your setup/experience with your streaming site? I currently run a (legal :) streaming site but have it hosted on AWS and have been exploring moving everything over to a big server. At this point it just seems like more work to move it than to just pay the cloud tax.
Do a search for HeheStreams on your favorite search engine.
The technical bits aren’t all there, though, and there’s a plethora of noise and misinformation. Happy to talk via email though.
Will do, thank you!
The problem is sizing and consistency. When you're small, it's not cost effective to overprovision 2-3 big servers (for HA).
And when you need to move fast (or things break), you can't wait a day for a dedicated server to come up, or worse, have your provider run out of capacity (or have to pick a differently specced server).
IME, having to go multi cloud/provider is a way worse problem to have.
Most industries are not bursty. Overprovisioning is not expensive for most businesses. You can handle 30000+ updates a second on a $15 VPS.
A multi-node system tends to be less reliable and have more failure points than a single-box system. Failures rarely happen in isolation.
You can do zero downtime deployment with a single machine if you need to.
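One way, sketched very roughly below: blue/green containers behind a local reverse proxy. Docker and nginx are assumed; the image name, ports, health endpoint, and config path are placeholders.

    import subprocess
    import time
    import urllib.request

    NEW_PORT = 8082
    UPSTREAM_CONF = "/etc/nginx/conf.d/app_upstream.conf"  # placeholder path

    def sh(*cmd: str) -> None:
        subprocess.run(cmd, check=True)

    # 1. Start the new version alongside the old one.
    sh("docker", "run", "-d", "--name", "app-green",
       "-p", f"{NEW_PORT}:8080", "myapp:new")

    # 2. Wait until it answers its health endpoint.
    for _ in range(30):
        try:
            urllib.request.urlopen(f"http://127.0.0.1:{NEW_PORT}/healthz", timeout=1)
            break
        except OSError:
            time.sleep(1)
    else:
        raise SystemExit("new version never became healthy; aborting")

    # 3. Point nginx at the new port; a reload drains old connections gracefully.
    with open(UPSTREAM_CONF, "w") as f:
        f.write(f"upstream app {{ server 127.0.0.1:{NEW_PORT}; }}\n")
    sh("nginx", "-s", "reload")

    # 4. Retire the old container.
    sh("docker", "rm", "-f", "app-blue")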
There are a number of providers who provision dedicated servers via API in minutes these days. Given a dedicated server starts at around $90/month, it probably does make sense for a lot of people.
>Unfortunately, since all of your services run on servers (whether you like it or not), someone in that supply chain is charging you based on their peak load.
This seems fundamentally incorrect to me? If I need 100 units of peak compute during 8 hours of work hours, I get that from Big Cloud, and they have two other clients needing same in offset timezones then in theory the aggregate cost of that is 1/3rd of everyone buying their own peak needs.
Whether big cloud passes on that saving is another matter, but it's there.
i.e. big cloud throws enough small customers together so that they don't have "peak" per se just a pretty noisy average load that is in aggregate mostly stable
But they generally don't. Most people don't have large enough daily fluctuations for these demand curves to flatten out enough. And the providers also need enough capacity to handle unforeseen spikes. Which is also why none of them will let you scale however far you want - they still impose limits so they can plan the excess they need.
Don't forget the cost of managing your one big server and the risk of having such single point of failure.
My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures. One server is the simplest and most reliable setup, and if you have backup and automated provisioning you can just re-deploy your entire environment in less than the time it takes to debug a complex multi-server setup.
I'm not saying everybody should do this. There are of course a lot of services that can't afford even a minute of downtime. But there are also a lot of companies that would benefit from a simpler setup.
Yep. I know people will say, “it’s just a homelab,” but hear me out: I’ve run positively ancient Dell R620s in a Proxmox cluster for years. At least five. Other than moving them from TX to NC, the cluster has had 100% uptime. When I’ve needed to do maintenance, I drop one at a time, and it maintains quorum, as expected. I’ll reiterate that this is on circa-2012 hardware.
In all those years, I’ve had precisely one actual hardware failure: a PSU went out. They’re redundant, so nothing happened, and I replaced it.
Servers are remarkably resilient.
EDIT: 100% uptime modulo power failure. I have a rack UPS, and a generator, but once I discovered the hard way that the UPS batteries couldn’t hold a charge long enough to keep the rack up while I brought the generator online.
Being as I love minor disaster anecdotes where doing all the "right things" seems to not make any difference :).
We had a rack in data center, and we wanted to put local UPS on critical machines in the rack.
But the data center went on and on about their awesome power grid (shared with a fire station, so no administrative power loss), on site generators, etc., and wouldn't let us.
Sure enough, one day the entire rack went dark.
It was the power strip on the data center's rack that failed. All the backup grids in the world can't get through a dead power strip.
(FYI, family member lost their home due to a power strip, so, again, anecdotally, if you have any older power strips (5-7+ years) sitting under your desk at home, you may want to consider swapping it out for a new one.)
> My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures.
I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if something happens to your nodes whether it's hardware failure or power outage or someone stumbling on your power/network cable, or even having a single service crashing, this means you have a major outage on your hands.
These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.
don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner? and even if, you could just run a provisioned secondary server that jumps in if the first becomes unavailable and still be much cheaper.
> don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner?
You're not getting the point. The point is that if you use a single node to host your whole web app, you are creating a system where many failure modes, which otherwise could not even be an issue, can easily trigger high-severity outages.
> and even if, you could just run a provisioned secondary server (...)
Congratulations, you are no longer using "one big server", thus defeating the whole purpose behind this approach and learning the lesson that everyone doing cloud engineering work is already well aware.
I don't know about Hetzner, but the failure case isn't usually tripping over power plugs. It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server.
Either way, stuff happens, figuring out what your actual requirements around uptime, time to response, and time to resolution is important before you build a nine nines solution when eight eights is sufficient. :p
It's unlikely, but it happens. In the mid 2000's I had some servers at a colo. They were doing electrical work and took out power to a bunch of racks, including ours. Those environments are not static.
My single on-premise Exchange server is drastically more reliable than Microsoft's massive globally resilient whatever Exchange Online, and it costs me a couple hours of work on occasion. I probably have half their downtime, and most of mine is scheduled when nobody needs the server anyhow.
I'm not a better engineer, I just have drastically fewer failure modes.
Do you develop and manage the server alone? It's a quite a different reality when you have a big team.
Mostly myself but I am able to grab a few additional resources when needed. (Server migration is still, in fact, not fun!)
I have also seen the opposite somewhat frequently: some team screws up the server, and unrelated stable services that have been running forever (on the same server) are now affected due to the messed-up environment.
A lot of this attitude comes from the bad old days of 90s and early 2000s spinning disk. Those things failed a lot. It made everyone think you are going to have constant outages if you don’t cluster everything.
Today’s systems don’t fail nearly as often if you use high quality stuff and don’t beat the absolute hell out of SSD. Another trick is to overprovision SSD to allow wear leveling to work better and reduce overall write load.
Do that and a typical box will run years and years with no issues.
Not to mention the other leading cause of outages: UPSes.
Sigh.
UPSes always seem to have strange failure modes. I've had a couple fail after a power failure. The batteries died and they wouldn't come back up automatically when the power came back. They didn't warn me about the dead battery until after...
That’s why they have self-tests. Learned that one the hard way myself.
Related: https://brooker.co.za/blog/2024/06/04/scale.html
The last 4-5 years taught me that the single point of failure I most often can't do anything about is Cloudflare, not my on-premise servers.
Don't forget to read the article.
I'll take a (lone) single point of failure over (multiple) single points of failure.
AWS has also been a single point of failure multiple times in history, and there's no reason to believe this will never happen again.
This was written in 2022, but it looks like it's mostly still relevant today. Would be interesting to see updated numbers on the expected costs of various hosting providers.
Those servers are mainly designed for enterprise use cases. For hobby projects, I can understand why someone would choose Hetzner over AWS.
For enterprise environments, however, there is much more to consider. One of the biggest costs you face is your operations team. If you go with Hetzner, you essentially have to rebuild a wide range of infrastructure components yourself (WAF, globally distributed CDN, EFS, RDS, EKS, Transit Gateways, Direct Connect and more).
Of course, you can create your own solutions for all of these. At my company, a mid-size enterprise, we once tried to do exactly that.
WAF: https://github.com/TecharoHQ/anubis
CDN: Hetzner nodes with cache in Finland, the USA and Germany
RDS: Self-hosted MySQL from Bitnami
EFS: https://github.com/rook/rook
EKS: https://github.com/vitobotta/hetzner-k3s
and 20+ more moving targets of infra software stack and support systems
The result was hiring more than 10 freelancers in addition to 5 of our DevOps engineers to build it all, handle the complexity of such a setup, and keep everything up to date, spending hundreds of thousands of dollars. Meanwhile, our AWS team, consisting of only three people working with Terraform, proved far more cost-effective. Not in terms of dollars per CPU core, but in terms of average per-project spend once staff costs and everything else were included.
I think many of the HN posts that say things like "I saved 90% of my infra bill by moving from AWS to a single Hetzner server" are a bit misleading.
Most of those things you listed are workarounds for having a slow server/system.
For example, if you serve your assets from the server you can skip a CORS round trip. If you use an embedded database like SQLite you can shave off 50ms; use a dedicated CPU (another 50ms), and now you don't need to serve anything from the edge, because your global latency is much better.
Managing a single VPS is trivial compared to AWS.
I’ve found that it’s hard to even hire engineers who aren’t all in on cloud and who even know how to build without it.
Even the ones who do know have been conditioned to tremble with fear at the thought of administrating things like a database or storage. These are people who can code cryptography kernels and network protocols and kernel modules, but the thought of running a K8S cluster or Postgres fills them with terror.
“But what if we have downtime!” That would be a good argument if the cloud didn’t have downtime, but it does. Most of our downtime in previous years has been the cloud, not us.
“What if we have to scale!” If we are big enough to outgrow a 256 core database with terabytes of SSD, we can afford to hire a full time DBA or two and have them babysit a cluster. It’ll still be cheaper.
“What if we lose data?” Ever heard of backups? Streaming backups? Hot spares? Multiple concurrent backup systems? None of this is complex.
“But admin is hard!” So is administrating cloud. I’ve seen the horror of Terraform and Helm and all that shit. Cloud doesn’t make admin easy, just different. It promised simplicity and did not deliver.
… and so on.
So we pay about 1000X what we should pay for hosting.
Every time I look at the numbers I curse myself for letting the camel get its nose under the tent.
If I had it to do over again I’d forbid use of big cloud from day one, no exceptions, no argument, use it and you’re fired. Put it in the articles of incorporation and bylaws.
I have also found this happening. It's actually really funny, because I think even I'm less inclined to run Postgres myself these days, when I used to run literally hundreds of instances with not much more than pg_dump, cron and two read-only replicas.
These days probably the best way of getting these 'cloudy' engineers on board is just to tell them its Kubernetes and run all of your servers as K3s.
I’m convinced that cloud companies have been intentionally shaping dev culture. Microservices in particular seem like a pattern designed to push managed cloud lock in. It’s not that you have to have cloud to use them, but it creates a lot of opportunities to reach for managed services like event queues to replace what used to be a simple function call or queue.
Dev culture is totally fad driven and devs are sheep, so this works.
Yeah, I think that's fair. I'm very pro containers though; that's a genuine step forward from deploy scripts or VM images.
I'm in the process of breaking up a legacy deployment on "one big server" into something cloud native like Kubernetes.
The problem with one big server is that few customers have ONE (1) app that needs that much capacity. They have many small apps that add up to that much capacity, but that's a very different scenario with different problems and solutions.
For example, one of the big servers I'm in the process of teasing apart has about 100 distinct code bases deployed to it, written by dozens of developers over decades.
If any one of those apps gets hacked and this is escalated to a server takeover, the other 99 apps get hacked too. Some of those apps deal with PII or transfer money!
Because a single big server uses a single shared IP address for outbound comms[1] this means that the firewall rules for 100 apps end up looking like "ALLOW: ANY -> ANY" for two dozen protocols.
Because upgrading anything system-wide on the One Big Server is a massive Big Bang Change, nobody has had the bravery to put their hand up and volunteer for this task. Hence it has been kept alive running 13 year old platform components because 2 or 3 of the 100 apps might need some of those components... but nobody knows which two or three apps those are, because testing this is also big-bang and would need all 100 apps tested all at once.
It actually turned out that even Two Big (old) Servers in a HA pair aren't quite enough to run all of the apps so they're being migrated to newer and better Azure VMs.
During the interim migration phase, instead of Two Big Servers there are Four Big Servers... in PRD. And then four more in TST, etc... Each time a SysOps person deploys a new server somewhere, they have to go tell each of the dozens of developers where they need to deploy their apps today.
Don't think DevOps automation will rescue you from this problem! For example in Azure DevOps those 100 apps have 100 projects. Each project has 3 environments (=300 total) and each of those would need a DevOps Agent VM link to the 2x VMs = 600 VM registrations to keep up to date. These also expire every 6 months!
Kubernetes, Azure App Service, AWS App Runner, and GCP App Engine serve a purpose: They solve these problems.
They provide developers with a single stable "place" to dump their code even if the underlying compute is scaled, rebuilt, or upgraded.
They isolate tiny little apps but also allow the compute to be shared for efficient hosting.
They provide per-app networking and firewall rules.
Etc...
[1] It's easy to bind distinct ingress IP addresses on even a single NIC (or multiple), but it's weirdly difficult to split the outbound path. Maybe this is easier on Linux, but on Windows and IIS it is essentially impossible.
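For what it's worth, on Linux the outbound side can be split per app by binding each app's sockets to a specific source address before connecting; a minimal Python sketch (the addresses are placeholders):

    import socket

    SOURCE_IP = "203.0.113.7"  # one of the host's secondary IPs (placeholder)

    def connect_from(source_ip: str, host: str, port: int) -> socket.socket:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((source_ip, 0))   # choose the source IP; port 0 = any ephemeral port
        s.connect((host, port))
        return s

    conn = connect_from(SOURCE_IP, "example.org", 443)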
And now consider that 6th Gen EPYC will have 256 cores; you can also have 32 hot-swap SSDs with 10M+ random write IOPS and 60M+ random read IOPS in a single 2U box.
I work for a cloud provider and I'll tell you, one of the reasons for the cloud premium is that it is a total pain in the ass to run hardware. Last week I installed two servers and between them had four mysterious problems that had to be solved by reseating cards, messing with BIOS settings, etc. Last year we had to deal with a 7-site, 5-country RMA for 150 100Gb copper cables with incorrect coding in their EEPROMs.
I tell my colleagues: it's a good thing that hardware sucks: the harder it is to run bare metal, the happier our customers are that they choose the cloud. :)
(But also: this is an excellent article, full of excellent facts. Luckily, my customers choose differently.)
Fortunately, companies like Hetzner/OVH/etc will handle all this bullshit for you for a flat monthly fee.
And then boom, all your services are gone due to a pesky capacitor on the motherboard. Also good luck trying to change even one software component of that monolith without disrupting and jeopardizing the whole operation.
While it is useful advice for some people in certain conditions, it should be taken with a grain of salt.
That capacitor thing hasn't been true since the 90's.
Capacitor problem or not, hardware does fail. Power supplies crap out. SSDs die in strange ways. A failure of a supposedly "redundant" SSD might cause your system to freeze up.
One thing that we ran into back in the day was ECC failure on reboot.
We had a few Dell servers that ran great for a year or two. We rebooted one for some reason or another and it refused to POST due to an ECC failure.
Hauled down to the colo at 3AM and ripped the fucking ram out of the box and hoped it would restart.
Hardware fails. The RAM was fine for years, but something happened to it. Even Dell had no idea and just shipped us another stick, which we stuck in at the next downtime window.
To top it off, we dropped the failing RAM into another box at the office and it worked fine. <shrug>.
Hardware still fails. It isn't a question of "if", it's a question of "when". Nothing lasts forever, the naivety lasts only so long too.
> Part of the "cloud premium" for load balancers, serverless computing, and small VMs is based on how much extra capacity your cloud provider needs to build in order to handle their peak load. You're paying for someone's peak load anyway!
Eh, sort of. The difference is that the cloud can go find other workloads to fill the trough from off peak load. They won’t pay as much as peak load does, but it helps offset the cost of maintaining peak capacity. Your personal big server likely can’t find paying workloads for your troughs.
I also have recently come to the opposite conclusion for my personal home setup. I run a number of services on my home network (media streaming, email, a few personal websites and games I have written, my frigate NVR, etc). I had been thinking about building out a big server for expansion, but after looking into the costs I bought 3 mini pcs instead. They are remarkably powerful for their cost and size, and I am able to spread them around my house to minimize footprint and heat. I just added them all to my home Kubernetes cluster, and now I have capacity and the ability to take nodes down for maintenance and updates. I don’t have to worry about hardware failures as much. I don’t have a giant server heating up one part of my house.
It has been great.