I don't know that 37Signals counts as a "major enterprise". Their Cloud exodus can't have been more than a few dozen servers, right?
Meanwhile AWS is growing at 20%/year, Azure at 33% and GCP at 35%. That doesn't seem compatible with any kind of major cloud repatriation trend.
You can have multiple trends at once. Veteran cloud users leaving, international business onboarding.
How much of that is what technologists would consider "cloud" (IaaS, PaaS) versus what someone on the business side of things would consider "cloud" - Office 365, Google GSuite, etc.?
I’d suspect there is significant growth of businesses acting as intermediaries for cloud storage. I think that other software providers have also realized that ransoming users' data is a great way to extract predictable, hedge-fund-owner-pleasing revenue without performing useful work.
AEC software providers all do this. ProjectWise is worse than owning or renting a plain file server in every way I can imagine, yet every consultant in transportation dutifully cuts Bentley a five-figure check or larger every year so they can hold your project files hostage and pretend to develop software.
I pray for a merciful asteroid to end it all.
Given that AWS is doing $100B in annual revenue and still growing at 17% YoY ... and they do NOT have a collaboration suite (Office/GSuite) - I'd say at least for AWS it's nearly all IaaS/PaaS.
https://www.theregister.com/2024/05/01/amazon_q1_2024/
Not to naysay, but any idea if that includes their own website? Just curious. I don't think Amazon itself is the largest AWS customer anymore.
I'd agree on IaaS/PaaS being the main driver. I'd guess that everyone is running away from serverless offerings from all the main cloud providers. It's just day-one lock-in to a platform with no shared standards. It's very uncompetitive and kind of slow to innovate.
Amazon loves it when you run idle EC2 instances ($$$) rather than using Lambda.
Most real workloads I've seen (at 3 startups, and several teams at Amazon) have utilization under 10%.
That's really where you see that no answer is right across the board.
I worked at a very small startup years ago that leaned heavily on EC2. Our usage was pretty bipolar; the service was along the lines of a real-time game, so we either had a very heavy workload or nothing. We stood up EC2 instances when games were live and wound them down after.
We did use Lambda for a few things, mainly APIs that were rarely used or for processing jobs in an event queue.
Serverless has its place for sure, but in my experience it has been heavily overused over the last 3-5 years.
We’re migrating over a hundred apps to Azure App Service.
One has an issue with the platform-enforced HTTP timeout maximum values.
I migrated that app back to a VM in an hour.
It turns out that the “integration” for something like App Service (or CloudRun or whatever) is mostly just best practices for any kind of hosting: parameters read from environment variables, immutable binaries with external config, stateless servers, read only web app folders, monitoring with APMs, etc…
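A minimal sketch of the environment-variable part of that pattern in Python (the parameter names and defaults are made up for illustration): everything comes from the environment at startup, the artifact stays immutable, and missing required config fails fast.

```python
import os

def require_env(name: str) -> str:
    """Read a required parameter from the environment; fail fast at startup if it is missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value

# All configuration comes from the environment, so the same image/binary runs
# unchanged on App Service, Cloud Run, a plain VM, or bare metal.
DATABASE_URL = require_env("DATABASE_URL")            # hypothetical parameter name
LISTEN_PORT = int(os.environ.get("PORT", "8080"))     # many platforms inject PORT
FEATURE_FLAGS = os.environ.get("FEATURE_FLAGS", "")   # optional, with a safe default
```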
Sure, you’ll experience lockin if you use Durable Functions or the similar Lambda features… but no worse than any other workflow or business rules platform.
Ask people how easy it is to get off BizTalk or MuleSoft…
I’ve worked with a few organisations that I’d call “late adopters” to the cloud, and it’s rare for them to use IAAS or even PAAS. It’s all SAAS and serverless, and while they all say they’re doing devops it’s almost always clickops.
For Azure, all of it. Microsoft clumps Azure together with their server software (e.g. Windows Server, SQL Server) licensing when reporting the revenue, but gives more fine-grained information on growth rates. This is the latter. (We also know the Azure business was already massive at $34 billion in 2022, since it got revealed during one of Microsoft's ongoing antitrust cases.)
For Google, I'm not aware of a reliable way of estimating the GCP vs. Workspace numbers. But they get asked it during earnings calls, and the answer has always been that the GCP growth is substantially faster than the Workspace growth.
Afaik, MSFT shows growth in Azure and Office as separate things during earning reports, so the % mentioned before is just Azure, and 31% is huge.
"In parallel, GEICO, one of the largest automotive insurers in the United States, is actively repatriating many workloads from the cloud as part of a comprehensive architectural overhaul."
Is GEICO a major enterprise?
> That doesn't seem compatible with any kind of major cloud repatriation trend.
Agreed. I don't think this is a real trend, at least not right now.
Also, fwiw, I'm really not a fan of these types of articles that identify like a small handful of people or organizations doing something different and calling it a "trend".
Submarine-like articles trying to create a trend, I suppose.
AWS and other hyperscalers will keep growing, no doubt. Public cloud adoption is at around 20%, so the new companies that migrate into the cloud will keep the growth going. That doesn't deny the fact that some might be repatriating though, especially ones that couldn't get the benefits out of the cloud.
One thing I've seen in every startup I've been in over the last decade is that cloud asset management is relatively poor. Now I'm not certain that enterprise is better or worse, but ultimately when I think back 10+ years ago resources were finite. With that limitation came self-imposed policing of utilization.
Looking at cloud infrastructure today, it is very easy for organizations to lose sight of production vs. frivolous workloads. I happen to work for an automation company that has cloud infrastructure monitoring deployed such that we get notified about the resources we've deployed and can terminate workloads via ChatOps. Even though I know that everyone in the org is continuously nagged about these workloads, I still see tons of resources deployed that I know are doing nothing or could be commingled on an individual instance. But since the cloud makes it easy to deploy, we seem to gravitate towards creating a separation of work efforts by just deploying more.
This is/was rampant in every organization I've been a part of for the last decade with respect to cloud. The percentage of actually required, production workloads in a lot of these types of accounts is, I'd gather, less than 50% in many cases. And so I really do wonder how many organizations are just paying the bill. I would gather the big cloud providers know this based on utilization metrics, and I wonder how much cloud growth is actually stagnant workloads piling up.
It's a short simple post that comes down to this:
> Weekly explains that “just running legacy applications in the cloud is prohibitively expensive,” highlighting how lift-and-shift approaches often fail to deliver expected benefits.
Yes, if you have a mature business without active development, at a scale where compute/storage costs are a substantial accounting line item, then it makes sense to run on hardware that gives up the flexibility of the cloud along with its cost.
There is an in-between that makes much more sense for most though. Running on provisioned bare metal. Lots of providers offer this as a better performance/price option where you don't have to deal with provisioning hardware but do everything else from the OS+maintenance and up.
At one company we used large bare-metal machine instances provisioned for stable parts of the application architecture (e.g. database and webapp instances) and the cloud for new development where it made sense to leverage capabilities, e.g. DynamoDB with cross-region replication.
I can't tell you how often I've run into cloud deployments that were lift-and-shifts, pushed on by bean counters wanting OPEX instead of CAPEX. They then run into actual cashflow expenses, less stability, more complex security (now you get IAM on top of basic networking!), and the ability for one underpaid person to easily do a lot of damage - because you're certainly not going to hire top-tier cloud talent - these are bean counters running things after all.
It makes it really clear why you see so many data leaks via badly configured S3 buckets or Dynamo tables...
Very large mature businesses that don’t see IT as a core function have probably outsourced management to a third party. There’s not much daylight between that third party’s margin and just paying a hyperscaler.
What I was surprised to find in some big orgs is that the processes have not evolved to be cloud-first. There is a lack of maturity: still a chain of committees, approvals, and manual processes; risk management still treats the services as a giant intranet; deployments are not scripted; designs are ad hoc. Resources are placed in vnets so that they resemble a system they already know, and that comes with all the associated risks.
This is the reality IME. I'm currently in an org that has been "in the cloud" for over ten years but is only now architecting (some) new projects in a cloud-first way. Meanwhile there is big pressure to get out of our rented cages, so there is even more lift-and-shift migration happening. My guess is that we eat roughly 5x as much compute as we would need with proper scaling, while paying cloud prices for almost all of it.
Kjell's Law: the cost of a platform eventually exceeds the cost of the one it replaced. But each cost is in a different budget.
We seem to have replaced cooling and power and a grumpy sysadmin with storage and architects and unhappy developers.
I've never worked in a data center that did cooling and power correctly. Everyone thinks they're doing it right, and then street power gets cut - there's significant impact, ops teams scramble to contain, and finally there's the finger-pointing.
The first time we tested cutting the power back in the day, the backup generator didn't fire! Turns out someone had pushed the big red stop button, which remains pushed in until reset.
That would have been a major problem if we'd had a nighttime power outage.
After that we ran regular switchover testing :)
The other time we ran into trouble was after someone drove a car into the local power substation. Our systems all ran fine for the immediate outage, but the power company's short term fix was to re-route power, which caused our voltage to be low enough for our UPS batteries to slowly drain without tripping over to the generator.
That was a week or two of manually pumping diesel into the generator tank so we could keep the UPS batteries topped up.
> then street power gets cut
Or the electrician doing maintenance on the backup generator doesn't properly connect the bypass and no one notices until he disconnects the generator and the entire DC instantly goes quiet.
Or your DC provisions rack space without knowing which servers are redundant with which other servers, and suddenly when two services go from 10% CPU use to 100% CPU across ten servers the breaker for that circuit gives up entirely and takes down your entire business.
The colo I’m used to has survived multiple switch overs to backup and then to diesel generators without a blip that I could detect.
I say “I’m used to” because having things there has spanned more than one job.
One power outage was days to a week. Don’t recall exactly.
It’s possible to do it right.
Yes, it's possible. But it's not cheap. If you buy a bunch of UPSes and a few generators (you need more than one, in case one doesn't start) but don't maintain and test them regularly, that's when you get some bad surprises.
I mean, it's impossible to plan for everything, and I'd argue that if you actually did plan for everything, it would be so extraordinarily overbuilt that it couldn't be considered 'correct'.
We had happy developers before? Amazing.
They are grumpy because now they are doing sysadmin stuff.
Without the grumpy sysadmin they jump out more.
You can have a 100Gb uplink on dedicated fibre for less than $1000/month now, which is insanely cheaper than cloud bandwidth. Of course there are tons of other costs, but that alone can suffice to justify moving out of the cloud for a bandwidth-intensive app.
We went to cloud because 1) we only need 3 infra guys to run our entire platform and 2) we can trivially scale up or down as needed. The first saves us hundreds of thousands in skilled labor and the second lets us take on new customers with thousands of agents in a matter of days without having to provision in advance.
1) You may more than pay for that labor in cloud costs, but you can also pretty easily operate rented dedicated hardware with a 3-man team if they know how to do it; the tools to scale are there, they're just different.
2) I don't know what your setup looks like, but renting a dedicated server off of Hetzner takes a few minutes, maybe hours at most.
My personal opinion is that most workloads that have a load balancer anyways would be best suited to a mix of dedicated/owned infrastructure for baseline operation and dynamic scaling to a cloud for burst. The downsides to that approach are it requires all of skillset A (systems administration, devops) and some amount of skillset B (public cloud), and the networking constraints can be challenging depending on how state is managed.
With 3 people it’s basically impossible to build an HA storage solution that can scale beyond a certain size - it’s also impossible to keep it maintained.
Can you give a ballpark figure of what scale you have in mind?
Just curious, where do you get 100Gb with internet transit and dedicated fiber for $1000/month? I'm in a small town in eastern Germany and looked for simple gigabit fiber access for our office without any bandwidth guarantees, and it's €1000/month for 1Gb here with the most budget provider, with only some nebulous bandwidth guarantees. I'm not talking about residential fiber, which is also very expensive after a certain threshold. I know there is Init7 in Switzerland, but it seems to be the exception to the rule in Europe. Getting fast fiber and good transit is still expensive?
I'm in Switzerland, so maybe I am biased; I have 10Gbit/s dedicated on a 100Gbit/s link for about $600/month. In practice I have 25Gbit/s with most datacenters in Europe, 100Gbit/s with some that are close (OVH, Hetzner), and 10Gbit/s with the rest of the world.
Running a service takes more than a fat pipe. You need to handle power outages, need redundant internet connections, etc., etc.
Yes, but for example a 10Gbit/s pipe is about 3PB of transfer capacity per month, which is about $150,000/month in S3 traffic. A 40kW UPS, which can handle about 2 racks (2x42U) of high-density servers, costs about $50k with a generator. A redundant link with your own AS so you can do BGP should cost about $5k per month (at least here in Switzerland).
Of course it really depends on the application, but if you host something like a streaming video service where bandwidth is the main factor, you can quickly reach a point where self hosting is cheaper.
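Rough back-of-the-envelope math behind that comparison (the $0.05/GB figure below is an assumed blended egress rate; real AWS pricing is tiered and region-dependent):

```python
# What a saturated 10 Gbit/s pipe would cost if the same bytes left the cloud as egress.
SECONDS_PER_MONTH = 30 * 24 * 3600                      # ~2.59 million seconds
link_gbit_per_s = 10
bytes_per_month = link_gbit_per_s / 8 * 1e9 * SECONDS_PER_MONTH
petabytes = bytes_per_month / 1e15                       # ~3.2 PB if the link runs flat out

assumed_egress_usd_per_gb = 0.05                         # assumed blended rate, for illustration only
egress_cost = bytes_per_month / 1e9 * assumed_egress_usd_per_gb

print(f"{petabytes:.1f} PB/month ~= ${egress_cost:,.0f}/month in egress")
# -> roughly 3.2 PB/month ~= $162,000/month, the same ballpark as the figure above
```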
Yea, I call BS on 100Gb uplink for $1000. I have racked a lot of servers at different data centers. No way.
I am in Switzerland, where you can have 25Gbit/s for about $70/month, so I understand that this might be an exception. But even if it were $10,000/month, it is still vastly cheaper than cloud bandwidth.
https://www.init7.net/en/
There are certain workloads that have never been really economical to run in the cloud. Cloud economics is based on multi-tenancy: e.g. if you have a lot of hardware that would sit idle a lot of the time, then cloud may be economical for you, as the cloud provider can share it between you and others.
Cloud is also good for episodic use of expensive exotic systems like HPC and GPU fleets, if you don’t need them all the time- I call this serial multi-tenancy.
Cloud is not economical for massive storage, especially if you're not willing to accept backup-grade solutions and reduced availability. For example, AWS S3 by default keeps multiple copies of uploaded data; this is not comparable to typical on-premises RAID 1 or RAID 3. You can save money with reduced-redundancy storage, but then you have to take on more of the reliability burden. Likewise, compute is cheap if you're buying multi-tenant instances, but if you want dedicated instances or bare metal, then the economics aren't nearly as attractive.
Cloud is also good for experimentation and rapid development - it’s so much faster to click a few buttons than to go through the hardware acquisition processes at many enterprises.
The companies that regret cloud due to financial concerns usually make two mistakes.
First, as noted above, they pay for premium services that are not directly comparable to on-prem, or they run workloads in the cloud that are not cloud-economical, or both.
Second, they don't constrain random usage enough. It is super easy for a developer doing some testing to spin up thousands of dollars of billing. And it's even worse if they leave it running at the end of the day and go home - it's still racking up hourly usage. And it's downright ugly if they forget it and move on to something else. You have to be super disciplined to not spin up more than you need, and to turn it off as soon as you're done with it.
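A sketch of the kind of guardrail that helps with the discipline part (boto3; the `env=dev` tag is a hypothetical convention, adapt the filter to however your org marks throwaway resources):

```python
import boto3

# Stop any running instance tagged env=dev, e.g. from a nightly scheduled job,
# so forgotten test machines stop racking up hourly charges overnight.
ec2 = boto3.client("ec2")

reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:env", "Values": ["dev"]},                 # hypothetical tagging convention
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print(f"Stopped {len(instance_ids)} dev instances: {instance_ids}")
```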
> but if you want dedicated instances or bare metal
Multitenant instances on AWS statically partition the hardware (CPU, RAM, network), so tenants don't really share all that much. Memory bandwidth is probably the only really affected resource.
> Second, they don’t constrain random usage enough.
AWS now has billing alerts with per-hour resolution and automatic anomaly detection. There are third-party tools that do the same.
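For example, the native Budgets API can email when spend crosses a threshold; a minimal sketch (account ID, budget amount, and address are placeholders):

```python
import boto3

budgets = boto3.client("budgets")

# Alert by email when actual monthly spend crosses 80% of a $1,000 budget.
budgets.create_budget(
    AccountId="111122223333",  # placeholder account ID
    Budget={
        "BudgetName": "monthly-cost-guardrail",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ops@example.com"}],
        }
    ],
)
```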
> Multitenant instances on AWS statically partition the hardware (CPU, RAM, network), so tenants don't really share all that much.
You are missing several points:
First, density. Cloud providers have huge machines that can run lots of VMs, and AWS in particular uses hardware ("Nitro") for hypervisor functionality, so they have very low overhead.
Cloud providers also don't do "hardware" partitioning for many instance types. AWS sells "vCPUs" as the capacity unit; this is not necessarily a core, it may be time on a core.
Cloud providers can also over-provision; like airlines can sell more seats than exist on a plane, cloud providers can sell more vCPUs than cores on a machine, assuming (correctly) that the vast majority of instances will be idle most of the time, and they can manage noisy neighbors via live migration.
And lots of other more esoteric stuff.
S3 has two more cost saving dimensions: How long will you commit to storing these exact bytes and how long are you willing to wait to get them. Either of those will allow you to reduce S3 costs without having to chance data loss due to AZ failure.
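Both dimensions map to storage classes and lifecycle rules; a minimal boto3 sketch (bucket name, prefix, and ages are placeholders) that trades retrieval time for price without giving up multi-AZ durability:

```python
import boto3

s3 = boto3.client("s3")

# Move cold objects to cheaper storage classes: slower retrieval, same durability.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # placeholder prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"},        # minutes-to-hours retrieval
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # hours retrieval, cheapest
                ],
            }
        ]
    },
)
```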
Most enterprises on prem already run VMware for virtualisation; it is the antiquated provisioning process that makes it so slow to spin something up on prem. And frequently these antiquated practices are carried to the cloud, negating any benefit.
> And frequently these antiquated practices are carried to the cloud, negating any benefit.
I should have brought that up too. Airlifting your stuff to the cloud and expecting the cloud to run like your data center is a way to set yourself up for disappointment and expense. The cloud is something very different from your on-premises datacenter, and many things that make sense on prem do not make sense in the cloud.
Chat, feeds and moderation run on AWS for us. Video on the other hand is bandwidth intensive. So we run the coordinator infra on AWS, but the SFU edge network on many different providers.
I think the cloud is good for some things, and not so great for others. S3 is fairly cost effective. RDS is expensive, bandwidth is crazy etc.
(5M a year spend on AWS atm.)
It’s the same old MBA cycle we had with onshoring / offshoring. Everyone wants to build their resume so they have to change things.
In this cycle a new MBA comes in, wants to make an impact, so does a cloud transition. Then they move on and the next guy comes in, wants to make an impact, so moves things back in house. Repeat until some new fad comes along.
Recently, I've come to realize one real use of those clouds was to provide a good US-EU network connection. If you want to give users on both continents decent bandwidth to your service, you have no choice but to have them connect to a datacenter on their own continent. Public data transit across the Atlantic is simply miserable.
Then, because they probably have private transatlantic cables, you can replicate at good, reliable speed.
It doesn't seem to say in the article, and it's not really discussed in these "LEAVING THE CLOUDS!!" articles, but what are these orgs doing for on-prem? Given the Broadcom acquisition of VMware, rebuilding massive vSphere clusters like it's 2010 doesn't seem like a good long-term play. Are they moving to Kubernetes? Some other hypervisor?
At least in the case of 37signals, they went with colocated servers, some type of KVM, and their own tool, Kamal, for containerized deployments without the complexity of Kubernetes.
You can find one post here with many links at the bottom:
https://basecamp.com/cloud-exit
Possibly some amount of Triton and Oxide
I prefer this more nuanced article: https://blogs.idc.com/2024/10/28/storm-clouds-ahead-missed-e...
I can see how AI workloads make clouds look expensive.
This is partially the result of cloud providers and partially of business leadership. They, for whatever reason, insufficiently educated their clients on migration requirements. Lift & shift from on-premises to cloud only works in an emergency; the shifted resources must be converted to the cloud stack, or the cost will be multiples of on-prem costs. Business leadership was (is?) ignoring IT teams screaming about the problems with lift & shift.
Now, businesses are shifting back to on-prem because they are still uneducated on how to make the cloud useful. They will just shift all non-core activities to XaaS vendors, reducing their own cloud-managed solutions.
Source: dealing with multiple non-software tech firms that are doing just that, shifting their own things back to on-prem and non-core resources to XaaS.
Seems like CIOs are finally listening to the greybeards.
I would guess that all of these companies that are moving back are throwing in the towel on their cloud migration/modernization plans under the guise of "repatriation" when it's really poor execution without any responsibility.
It was easy when everyone was spending cheap money for marketing and other vanity around moving to the cloud. But now that money costs something, and everyone has to control costs, repatriation is the new hotness when you want to save opex with capex. Cloud margins are org savings.
The trick is to not care, and be proficient as a technologist; you make money either way riding the hype cycle wave. Shades of Three Envelopes for the CIO and whomever these decisions and budgets roll up to.
https://kevinkruse.com/the-ceo-and-the-three-envelopes/
(If you genuinely get value out of premium compute and storage at a cloud provider, you're likely going to keep doing that of course, startups, unpredictable workloads, etc)
Or poor strategic planning for cases where migration wasn’t necessary/feasible in the first place.
> “Ten years into that journey, GEICO still hadn’t migrated everything to the cloud, their bills went up 2.5x, and their reliability challenges went up quite a lot too.”
yes this would make cloud cost a lot without any of the benefits lol
The article is incredibly thin on details.
In my experience, it comes down to two factors:
1. Egress cost. Cloud hosting providers have absolutely insane egress pricing. It's beyond stupid at this point if you want to host anything bandwidth-intensive.
2. Storage pricing.
"Storage is cheap, but moving it ain't" is a quote a former co-worker frequently liked to remind people. The quote applied at the low level (eg between CPUs and their caches) all the way up to networking.
Anyways, cloud provider egress costs can be ridiculous. Amazon charges for egress transfer out of AWS, then quite a bit for NAT gateway transfer, and AWS network firewall on top of that (we dropped the firewall and moved our bulk traffic to a specific outer subnet because of that). Oh, and you can't give many serverless products (eg lambda) elastic IPs, so out the NAT gateway it goes...
So. Frustrating.
I think non-cloud is the new monolith, which is fantastic.
They will want cloud-like APIs on-premises and most will implement OpenStack. The second wave of migrations to the cloud will be even quicker for these companies making their way back to on premises.
Meanwhile, from Q3 Amazon earnings:
* AWS segment sales increased 19% year-over-year to $27.5 billion.
That means AWS brought in $4.3 BILLION more dollars in Q3 2024 vs 2023.
That's a huge amount of incremental revenue growth. If the net movement of workloads were out of the cloud, then it would have to show up in the results of Intel / TSMC / Equinix et al.
I just took a look, and Equinix quarterly revenue is $2.1B.
Here people like arguing their opinions as if they're facts instead of using evidence (public proof) to support their argument.
Oh, correlation.
Almost any story about cloud repatriation is a story about a failure of the market to act competitively, rather than about someone actually being able to do it for less money than the cloud providers can. The big providers' margins are crazy - over 50%, which is normal for a software/service business, but they are essentially hardware businesses.
"Major organizations like 37signals and GEICO". Sorry, what? Citing two companies? And how does a $37bn company compare to 37signals?
Such an odd pair of companies to choose. Is DHH friends with the author?
I'd be more interested in statistics about total cloud vs. on-prem spend across all companies, over time, to support the assertion that "companies are ditching the cloud".
A very poor article
The statistics can be found in the public earnings of AWS vs the companies that would get paid for on-prem workloads (Equinix, Dell/HP/IBM, Intel etc).
I don't think the article concludes that companies are ditching the cloud.. :)
It is not a reputable article. Clickbait.
Serious businesses are not doing this.