This one sentence throws the entire article's objectivity into question: "This is why today we’re announcing Earthly Lunar."
On the other hand, it's also a strong signal that someone believes it's enough of a problem that they're willing to take time and effort to do something about it.
Yep. Generally speaking, I think that blog posts under company domains are just advertisements.
Someone's investing money to solve this problem, hoping there's real value for customers here. It's not objective, but it's committed.
This seems like a weird way to say that speed is a problem.
I think there's miscommunication. Maybe I'm misunderstanding. But when I say speed is a problem I mean that the world is complex and to go fast you have to cut corners. There's a fundamental truth that I think many people forget: as civilization advances, complexity increases. Think about it in a simple way. When you are in a new domain you can get away with low-order approximations. They're better than... well... nothing. Which is what you had before. But as we get better you have to take higher-order approximations to improve your results, right? And those higher-order approximations almost always increase complexity. So that's the problem.
The need for speed makes you overlook small details. But as we get better those details matter more and more. The two work against each other. There are strategies that are good but they have nuance too, and it's like clichés where we forget the second half. Move fast and break things is great. You learn fast when tearing things apart. But you left a giant mess behind you. If you don't clean it up then the mess just builds. It's way faster and cheaper to clean up now than later. Just like cleaning a dish is way easier right after using it than when it's had time for the grime to set in and harden. And of course this all compounds complexity, which the author says is the problem. Tech debt has interest but we want to pretend the debt doesn't exist.
I'm convinced this is what's leading to so much shitware. There's little pressure to actually improve because of centralization but everyone feels the need to move fast, so we move fast into nowhere. We don't want to fix problems, or even acknowledge them, because "new features" is more rewarding.
It's just a waste of a lot of money and time
The SRE team at our company decided to adopt k8s and Helm chart and ArgoCD and whatever. That's cool. Do whatever you want. A few months later they told us, "since we are extremely busy and undermanned, we ask app teams to take care of those helm charts and TF codes from now on." Wait what?
As an Ops person, I feel for both you and them.
With Helm, yeah, that's awful to inflict on developers. Use Kustomize with Templates instead (https://kustomize.io/).
However, like at most companies, Ops is an easy place to cut, and they commonly reduce us below what the outstanding workload requires. And when they do, they still push the dev teams to deliver, so here we are.
Except that kustomize is very limited in annoying ways. It is very badly documented. And it does not behave the same between `kubectl apply` and `kustomize`. In the end you will pipe through `sed` anyway.
That limitation makes it easier for the devs to understand and use. I've never seen kubectl kustomize be any different locally than what the GitOps/build system will apply to the cluster. Helm is all over the place because the templating engine can connect to the Kubernetes API, so what you see locally is not what you will get on the cluster.
Helm is a great idea and awesome for software that is not exactly sure what the cluster environment will look like. However, for internal software, almost all companies have clearly defined Kubernetes environments where you don't have to worry about any of that, and kustomize works great.
Use the least amount of power instead of the most. If kustomize no longer works for you, Helm is always there if you feel like torturing everyone. FluxCD/Argo support both happily.
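For anyone who hasn't used either tool, the "least amount of power" point can be illustrated with a rough Python sketch of the overlay idea: a base manifest plus a small patch that contains only the fields that differ. This is a simplification under stated assumptions, not Kustomize's actual implementation (its strategic merge also handles lists by merge key, which this ignores). The names `overlay`, `base_deployment` and `prod_patch` are made up for illustration.

```python
from copy import deepcopy


def overlay(base: dict, patch: dict) -> dict:
    """Return a copy of base with patch merged on top.

    Nested dicts are merged key by key; any other value in the patch
    simply replaces the value in the base. (Assumption: no list merging.)
    """
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base manifest shared by every environment.
base_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {"replicas": 1},
}

# Hypothetical production overlay: only the fields that differ.
prod_patch = {
    "metadata": {"labels": {"env": "prod"}},
    "spec": {"replicas": 3},
}

prod_deployment = overlay(base_deployment, prod_patch)
```

The result depends only on the two inputs, with no loops, conditionals or cluster lookups in the manifests themselves, which is roughly why what you render locally matches what gets applied.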
To be clear, I did not mean to imply that I find Helm to be good at all. I am with you on all the negatives here.
I think the general idea behind kustomize is great. Sadly the execution is a mess of complicated code with a billion useless abstractions, culminating in a tool that doesn't do what it promises. Including infinite loops. Yes, I have had it get stuck in infinite loops during text matching.
And before anybody tells me to contribute to kustomize instead of complaining: 1) pay me for it. 2) actually don't, I don't want to touch this codebase. Reading it was traumatizing enough.
Both rub me the wrong way. Isn't there something better?
cdk8s+ ... you get type checking with TypeScript, you can do whatever because you have a normal runtime (Node/Deno/Bun) no need to live with crude tools
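A rough sketch of what "manifests from a normal runtime" can look like. The comment above is about TypeScript and cdk8s; this is the same idea in plain Python and is not the cdk8s API. The `Deployment` class, its fields, the image name and the environment names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deployment:
    """Hypothetical typed description of a Deployment (not a cdk8s class)."""

    name: str
    image: str
    replicas: int = 1
    labels: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail at construction time instead of at apply time.
        if self.replicas < 1:
            raise ValueError("replicas must be at least 1")

    def to_manifest(self) -> dict:
        """Render a plain dict that could be serialized to YAML or JSON."""
        labels = {"app": self.name, **self.labels}
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": self.name, "labels": labels},
            "spec": {
                "replicas": self.replicas,
                "selector": {"matchLabels": {"app": self.name}},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {
                        "containers": [{"name": self.name, "image": self.image}]
                    },
                },
            },
        }


# Ordinary loops and variables replace template directives.
replicas_per_env = {"staging": 1, "prod": 3}  # made-up environments
manifests = {
    env: Deployment(
        name="web", image="example/web:1.0", replicas=n, labels={"env": env}
    ).to_manifest()
    for env, n in replicas_per_env.items()
}
```

A type checker or the constructor catches a bad value before anything is rendered, which is the main thing text templating can't give you.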
Would you consider that OPS or DevOps? It's DevOps that's insane with all that bullshit, but I always felt like regular OPS teams just kinda keep it pretty classic/standard. The under-reported story on HN is that devops has been just as bad as other more popular punching bags for churn. We went wrong somewhere in the mid 2010s with everything, that's just my feeling.
Companies thought devops was about headcount reduction.
There are now a lot of unpatched systems running out there that need their applications replatformed. Won't happen.
Welcome to the travelling SRE Circus! We travel to the smallest of villages! Our attractions are plenty, our maintainers are few, so please sign the waiver presented to you!
/ Greetings disillusioned dev once excited by the potential simplifications the “cloud” could bring, but all I got was a Fear and Loathing in Las Vegas level binge in bloat and this T-shirt.
My life is currently dominated by firefighting. I'd like to be at the point where I am worried about bad code changes and not infrastructure fires, but we will need to work our way through Maslow's Hierarchy first.
Have you tried positive affirmations for SREs? A classic by Krazam.
You will never get to that point, because attaining zen doesn't work that way.
Accountability is what you are looking for.
This is not a technical problem. Try to find out who is accountable. Who gets fired when the software doesn't perform the way it should. Notice nobody gets fired. Because the whole chain of command is afraid of accountability for themselves.
Accountability is definitely a problem. Now we blame the “process”. The solution to every problem created by the faulty “process” is another bureaucracy, another validation, another tool, or another person in the decision making meeting.
And so, everything from Hollywood blockbusters to software solutions is created by committee. The end products are all box-checking, lowest common denominator throwaway garbage. No risks are taken. And very little value is delivered.
Note that if you ask the CEO whether you, the owner of a project, can be accountable for it, they might say yes quite happily. But when you ask for the power to be able to enforce it, they will refuse.
You need power to exercise accountability. Otherwise anybody can come shit in your codebase, and you are now accountable for it. Because you do not have the power to refuse the turd.
And the flip side of that is gatekeeping, wherein nothing gets done. As usual, there is a balance to be struck.
IME a little judicious gatekeeping can actually be the thing that makes anything get done at all
Who do you hold accountable when all the product teams are choosing their own tech stack and there's no consistency?
It doesn't matter what they choose. As long as they are accountable for the consequences of their choice.
And the company probably pays a platform team to offer a limited but supported tech stack.
The platform team is accountable for balancing the deprecation of outdated dependencies and the productivity of their users.
Who do these teams report to? Either someone at a higher level instituted guidelines which allowed for inconsistent stacks, or someone at a lower level made the decision to disregard these guidelines and upper management took no action at the appropriate time to address the situation.
Either decision could easily be justifiable. The higher-ups may have deliberately wanted to ensure the teams were not stifled, or a team may have had a goal which demanded an exception to the rule. Still, somebody had to have made the call and should therefore be able to justify it.
If an organization does not know who is responsible for something, then that is a failing at an even higher level.
That's good actually. Groupthink and forced consistency are way worse.
Speed is chaos.
Slow is smooth, smooth is fast.
It’s just not the only form of chaos.
To go fast, you must first be able to go slow......
People who don't know better often make the mistake of thinking there's a tunable tradeoff between quality and velocity. This is false. To go fast you need to execute competently. The result is both high quality and high velocity. If you try to trade one for the other you get neither.
Vim Startify recently showed me a quip it attributed to Alan Kay. It went something like this:
Modern software is similar to the pyramids: it consists of millions of blocks held together by nothing but sheer weight and the labor of thousands of slaves.
*The problem with modern software engineering
Boeing's Starliner team would like a word.
Actually a lot of teams at Boeing would like a word.
The claimed principle, if true, might apply to other types of projects besides software.
The article (a glorified ad piece), however, is entirely about software engineering.
Is Earthly going all in on Lunar? The Lunar link is redirecting to the home page: https://earthly.dev/earthly-lunar
The problem is incentives and authority. It used to be that Ops owned uptime, and therefore it was their right to tell Engineering to go fuck themselves when they asked to put some spacecamp graph database or whatever in production. Now everyone is "devops" and nobody can tell anyone anything.
I've never actually worked in an org where infra could overrule product in such a way. If some director wants a whizbang new database your job is to facilitate the resulting tire fire.
And you bet the director will never be held accountable.
Incentives: you lose a nine of uptime you (ops director) lose a finger.
Authority: you, ops director, decide what goes in production and what doesn't.
I've seen it but it's pretty rare. It got worse when Amazon/Google promotion systems started leaking into the rest of the business borg groupthink.
Yep. It's all about the codebase. Even if you know the codebase very well, its complexity can still constrain you significantly in terms of speed.
For projects which I've built from scratch, I move blazingly fast. I can get more done working one weekend day per week on a complex project than I can get done at my full-time day job 5 days per week on a simpler project.
The difference is that in my side project, there is 0 unnecessary complexity. All the difficulties I face are intrinsic, unavoidable barriers to solving the problem.
In my day job, 90% of the complexity is not intrinsic to the problem but is created by the code/architecture itself.
I've been saying this over and over for years but it just doesn't seem to register in people's heads just how important the foundations are. The term "technical debt" is a highly accurate way to describe the issue because the costs compound, just like real debt which is not serviced regularly.
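To put rough numbers on "the costs compound", here is a toy model. Everything in it is an assumption for illustration: a feature costs 5 days on a clean codebase, and every feature shipped without cleanup makes the next one 5% more expensive. The function names and the figures are made up, not measurements.

```python
def feature_cost(base_days: float, interest_rate: float, features_shipped: int) -> float:
    """Days needed for the next feature after `features_shipped` uncleaned features."""
    return base_days * (1 + interest_rate) ** features_shipped


def total_cost(base_days: float, interest_rate: float, n_features: int) -> float:
    """Total days to ship n_features when the debt is never paid down."""
    return sum(feature_cost(base_days, interest_rate, i) for i in range(n_features))


# Assumed numbers: 5 days per feature, 50 features.
clean = total_cost(5.0, 0.0, 50)      # debt serviced as you go
indebted = total_cost(5.0, 0.05, 50)  # 5% "interest" per shipped feature

print(f"clean: {clean:.0f} days, indebted: {indebted:.0f} days")
print(f"slowdown: {indebted / clean:.1f}x")
```

Even a small per-feature penalty makes the same 50 features take roughly four times as long in this model, and the ratio keeps growing with every feature added.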
If you join a project which has too much technical debt, you can easily end up in a situation where you have 100 engineers moving at the same pace as 1 engineer could working on their own project.
So the 10x or even 100x engineer is real but you can only discover them if they can work on their own codebase from scratch.
Several times I've seen people who were very average developers at a company later become 10x devs working on their own startup. I've also seen people who were the same speed everywhere; those aren't 10x devs. At best they're 2x devs, because maybe they can churn out buggy, tech-debt-ridden code at twice the rate of a normal dev. I've worked with devs who could code at twice the speed (and sustain it), producing large amounts of code, but it had bugs and eventually everyone on the team ended up working at half the speed... It takes twice the amount of code to implement the same feature at lower quality.
Also, sometimes a company may have one person on the team who comes across as highly knowledgeable and a good communicator, but maybe a bit slow, and they don't seem like a 10x dev... But remove that person from the team and after a year or two the whole team is much slower and nobody can figure out why. 10x devs are team productivity multipliers. They make everyone around them approach 10x speed.
I wish I had that problem, instead I have a speed problem.
But if I solved the speed problem I might bitch about chaos. And if I solved the chaos problem I might bitch about freedom.
My definition of a solution is to trade a problem you have for a problem you'd rather have.
It's falling up all the way down!
So here is another dashboard, because the only thing preventing the chaos from getting under control was the lack of a dashboard. As if 200 Grafana dashboards were not enough.
What's preventing the chaos from getting under control is the lack of political desire to do so. Google ships 5 times a day? Why don't we? /s
Link is a glorified ad piece.