I’ve been in this industry a long time. I’ve read How to Lie with Statistics, and a bunch of Tufte. I don’t think it would be too much hyperbole to say I’ve spent almost half a year of cumulative professional time (2-3 hours a month) arguing with people about bad graphs. And it’s always about the same half dozen things or variants on them.
The line in your carefulness graph starts out with zero slope, which means you’re basically telling X that we can turn carefulness to 6 with no real change in delivery date. Are you sure that’s the message you’re trying to send?
Managers go through the five stages of grief every time they ask for a pony and you counteroffer with a donkey. And charts like this often offer them a pony instead of a donkey. Doing the denial, anger, and bargaining in a room full of people becomes toxic over time. It’s an own goal, except you’re bouncing the ball off the other team’s head. Don’t do that.
> The line in your carefulness graph starts out with zero slope, which means you’re basically telling X that we can turn carefulness to 6 with no real change in delivery date.
This strikes me as a pedantic argument, since the graph was clearly drawn by hand and is meant to illustrate an upward curving line. Now, maybe there's essentially no clear difference between 5 and 5.1, but when you extrapolate out to where 6 would be (about 65 pixels to the right of 5, if I can be pedantic for a moment), there actually is a difference.
This is a conversation about human behavior, not pixels.
A flat line will lead to bargaining, as I said. Don’t paint yourself into uncomfortable conversations.
If you don’t want the wolf in the barn don’t open the door.
Doesn't the flat line in this context mean that you're at a local minimum, which is where you want to stay? Where being less careful would take more time due to increased number of incidents.
Maybe we need to start with Zeno. Doing anything at all is relatively impossible.
And I've done high school calculus. It's a picture of a parabola, the derivative is very small near the minimum, and it can look kinda flat if the picture isn't perfectly drawn.
The Principles of Product Development Flow makes the point that a lot of real-world tradeoffs in your development process are U-shaped curves, which implies that you will have very small costs for missing the optimum by a little. A single decision that you get wrong by a lot is likely to dominate those small misses.
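To put a number on "very small costs": near a smooth minimum the first derivative vanishes, so a Taylor expansion gives T(c) ≈ T(c*) + ½·T″(c*)·(c − c*)². Missing the optimum by ε costs only O(ε²), which is exactly why the curve looks nearly flat right around 5.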
A more sensible way to present the idea is to put the turning point of the parabola at the origin of the graph and then show that 5 is somewhere on the line of super-linearly increasing schedule risk.
The article stipulates that 5 is the value that minimizes execution time.
The graph could have put that on the y-axis, and labeled the left side “extremely rushed” and the right side “extremely careful”. Maybe that would’ve been clearer, though I really think it’s clear if you are charitable and don’t assume the author has made a mistake.
It’s a picture of a parabola only if you put the y-axis at the dotted line instead of the origin. If you want to bargain with people - and this fictional conversation is a negotiation - then don’t anchor them on bad values.
The article stipulates that 5 is the value that minimizes execution time. So the curve between 0 and 5 would’ve been higher. It doesn’t intersect the x-axis because you can’t finish the task in zero time.
See the quote below.
That said, while I didn’t think it was a confusing drawing, I now wish he’d drawn the rest of the parabola, because it would’ve prevented this whole conversation.
> EM: Woah! That’s no good. Wait, if we turn the carefulness knob down, does that mean that we can go even faster?
> TL: If we did that, we’d just be YOLO’ing our changes, not doing validation. Which means we’d increase the probability of incidents significantly, which end up taking a lot of time to deal with. I don’t think we’d actually end up delivering any faster if we chose to be less careful than we normally are.
In general, I've found that when I tell people to be careful on a code path (because it has bitten me before), I don't get the sense that it's a welcome warning.
It's almost as if I'm questioning their skill as an engineer.
I don't know about you, but when I'm driving a road and there's black ice around the corner, a warning from a fellow driver is welcome.
I like the idea of having an actual 'carefulness knob' prop and making the manager asking for faster delivery/more checks actually turn the knob themselves, to emphasise that they're the one responsible for the decision.
Yep. The best way to push back is to ask your manager, "We'll do it fast since you are asking for it. What is the plan for contingencies in case things break?"
I did a lot of the work in my 40-year software career as an individual, which meant it was on me to estimate the time of the task. My first estimate was almost always an "If nothing goes wrong" estimate. I would attempt to make a more accurate estimate by asking myself "is there a 50% chance I could finish early?". I considered that a 'true' estimate, and could rarely bring myself to offer that estimate 'up the chain' (I'm a wimp ...). When I hear "it's going to be tight for Q2", in the contexts I worked in, that meant "there's no hope". None of this invalidates the notion of a carefulness knob, but I do kinda laugh at the tenor of the imagined conversations that attribute a lot more accuracy to the original estimate than I ever found in reality in my career. Retired 5 years now, maybe some magic has happened while I wasn't looking.
More than once I've used the xkcd method (Pull a gut number out of thin air, then double the numerator and increment the unit e.g. 1 hour -> 2 days, 3 weeks -> 6 months). When dealing with certain customers this has proven disappointingly realistic.
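For what it's worth, the heuristic is mechanical enough to write down. A toy Python sketch (the unit ladder and the function name are my own, purely for illustration):

```python
# Toy sketch of the "xkcd method" described above: double the number
# and bump the unit one step up. The unit ladder is my own assumption.
UNITS = ["minutes", "hours", "days", "weeks", "months", "years"]

def xkcd_estimate(amount: float, unit: str) -> str:
    """Double the numerator and increment the unit, e.g. 1 hour -> 2 days."""
    bumped = UNITS[min(UNITS.index(unit) + 1, len(UNITS) - 1)]
    return f"{amount * 2:g} {bumped}"

print(xkcd_estimate(1, "hours"))  # -> "2 days"
print(xkcd_estimate(3, "weeks"))  # -> "6 months"
```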
FWIW, in a real-world scenario it'd be more helpful to hear "the timeline has risks" alongside a statement of a concrete process you might not be doing given that timeline. Everyone already knows about diminishing returns; we don't need a lesson on that.
My favorite tool when defining project timelines: What are we not doing?
There's an infinite number of nice-to-haves. A good deadline makes it super easy to clarify what you actually need vs what you only want.
And you'd be amazed, when you start really having these discussions with the client, how often stuff ends up not just unneeded but going right past 'nice to have' straight to 'let's not'. The problem is often the initial problem being WAY overspecified in ways that don't ACTUALLY matter but generate tons of extra work.
Yeah, to me this kind of thing is much better than the carefulness knob.
Delaying or just not doing certain features that have low ROI can drastically shorten the development time without really affecting quality.
This is something that as an industry we seem to have unlearned. Sure, it still exists in the startup space, with MVPs, but elsewhere it's very difficult. In the last 20 years I feel like engineers have been pushed more and more away from the client, and very often you just get "overspecified everything" from non-technical Product Managers and have to sacrifice in quality instead.
I had one today where one email took us from "I want this whole new report over our main entity type with 3 user-specified parameters" to "actually, just add this one existing (in the db) column to this one existing report and that totally solves my actual problem". My time went from something like 2 days to 15 minutes, plus 10 minutes to write the email.
If everyone actually knew this stuff, this entire class of problem would cease to exist. Given that it has not...
Lorin is always on point, and I appreciate the academic backing he brings to the subject. But for how many years do we need to tell MBAs that "running with scissors is bad" before it becomes common knowledge? (Too damn many.)
It’s not the right approach. Structural engineers shouldn’t let management fiddle with their safety standards to increase speed. They will still blame you when things fail. In software, you can’t just throw in YOLO projects with much lower “carefulness” than the rest of the product; everything has maintenance. The TL in this case needs to establish a certain set of standards and practices. That’s not a choice you give away to another team on a per-feature basis.
It’s also a ridiculously low bar for engineering managers to not even understand the most fundamental of tradeoffs in software. Of course they want things done faster, but then they can escalate to the common boss/director and argue about prioritization against other things on the agenda. Not just “work faster”. Then they can go manage people whose work output is proportional to stress, which programmers are not.
Management decides whether they build a cheap wooden building, a brick one, or a steel skyscraper. These all have different risk profiles.
Safety is a business/management decision, even in structural engineering. A pedestrian bridge could be constructed to support tanks and withstand nuclear explosions, but why would you? Many engineered structures are actually extremely dangerous - for example, mountain climbing trails.
Also yes, you have many opportunities to just YOLO without significant consequences in software. A hackathon is a good example - I love them, it's always great to see the incredible projects at the end. The last one I visited was sponsored by a corporation, and they straight up incorporated a startup the next day with the winning team.
Expected use and desired tolerance are management decisions. Safety is still on engineers.
Isn't the point what level of safety?
If management intends expected use to be a low-load-quick-and-dirty-temporary-use prototype to be delivered in days, it seems the engineers are not doing their job if they calibrate their safety process to a heavy-duty-life-critical application. And vice versa.
Making the decision about the levels of use, durability, reusability, scalability, AND RISK is all management. Implementing those decisions as decided by management is on engineering. It is not on engineering to fix a bad management trade-off beyond what is reasonably possible (if you can, great, but go look for work someplace less exploitative).
Which is why liability is needed.
LT is a member of the leadership team.
LT: Get it done quick, and don't break anything either, or else we're all out of a job.
EM: Got it, yes sir, good idea!
[EM surreptitiously turns the 'panic' dial to 10, which reduces a corresponding 'illusion of agency' dial down to 'normal']
A personal anecdote:
One of my guys made a mistake while deploying some config changes to Production and caused a short outage for a Client.
There's a post-incident meeting and the client asks "what are we going to do to prevent this from happening in the future?" - probably wanting to tick some meeting boxes.
My response: "Nothing. We're not going to do anything."
The entire room (incl. my side) looks at me. What do I mean, "Nothing?!?".
I said something like "Look, people make mistakes. This is the first time this kind of mistake has happened. I could tell people to double-check everything, but then everything would be done twice as slowly. Inventing new policies based on a one-off like this feels like an overreaction to me. For now I'd prefer to close this one as human error - wontfix. If we see a pattern of mistakes being made, then we can talk about taking steps to prevent them."
In the end they conceded that yeah, the outage wasn't so bad and what I said made sense. Felt a bit proud for pushing back :)
If you want to go full corporate, and avoid those nervous laughs and frowns from people who can't tell if you're being serious or not, I recommend dressing it up a little.
You basically took the ROAM approach, apparently without knowing it. This is a good thing. https://blog.planview.com/managing-risks-with-roam-in-agile/
Correct.
A corollary is that Risk Management is a specialist field. The least risky thing to do is always to close down the business (you can't cause an incident if you have no customers).
Engineers and product folk in particular, I find, struggle to understand Risk Management.
When juniors ask me what technical skill I think they should learn next, my answer is always: Risk Management.
(Heavily recommended reading: "Risk, the science and politics of fear")
[preface that this response is obviously operating on very limited context]
"Wanting to tick some meeting boxes" feels a bit ungenerous. Ideally, a production outage shouldn't be a single mistake away, and it seems reasonable to suggest adding additional safeguards to prevent that from happening again[1]. Generally, I don't think you need to wait until after multiple incidents to identify and address potential classes of problems.
While it is good and admirable to stand up for your team, I think that creating a safety net that allows your team to make mistakes is just as important.
[1] https://en.wikipedia.org/wiki/Swiss_cheese_model
I agree.
I didn't want to add a wall of text for context :) And that was the only time I've said something like that to a client. I was not being confrontational, just telling them how it is.
I suppose my point was that there's a cost associated with increasing reliability, and sometimes it's just not worth paying. And that people will usually appreciate candor rather than vague promises or hand-wavy explanations.
Good, but I would have preferred a comment about 'process gates' somewhere in there [0]. I.e. rather than say "it's probably nothing let's not do anything" only to avoid the extreme "let's double check everything from now on for all eternity", I would have preferred a "Let's add this temporary process to check if something is actually wrong, but make sure it has a clear review time and a clear path to being removed, so that the double-checking doesn't become eternal without obvious benefit".
[0] https://news.ycombinator.com/item?id=33229338
> If we see a pattern of mistakes being made then we can talk about taking steps to prevent them.
...but that's not really nothing? You're acknowledging the error, and saying the action is going to be to watch for a repeat, and if there is one in a short-ish amount of time, then you'll move to mitigation. From a human standpoint alone, I know if I were the client in that situation, I'd be a lot happier hearing someone say this instead of a blanket 'nothing'.
Don't get me wrong; I agree with your assessment. But don't sell non-technical actions short!
Good manager, have a cookie.
Yeah. Policies, procedures, and controls have costs. They can save costs, but they also have their own costs. Some pay for themselves; some don't. For the ones that don't, don't create them in the first place.
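Back-of-the-envelope, with invented numbers: if double-checking adds 10 minutes to each of 500 deploys a year, that's roughly 83 hours spent to prevent, say, one 8-hour outage, about ten times the cost of the thing it prevents. Flip the numbers (cheap, rare checks; frequent, expensive incidents) and the same control pays for itself.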
> TL: If we did that, we’d just be YOLO’ing our changes, not doing validation. Which means we’d increase the probability of incidents significantly, which end up taking a lot of time to deal with. I don’t think we’d actually end up delivering any faster if we chose to be less careful than we normally are.
This is a really critical property that doesn't get highlighted nearly often enough, and I'm glad to see it reinforced here. Slow is smooth, smooth is fast. And predictable.
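One way to sanity-check "slow is smooth" arithmetically is a toy model (every number below is invented, not taken from the article) where skimping on validation raises incident probability and incidents dominate the schedule:

```python
# Toy model of the TL's claim above; all numbers are invented
# for illustration, not taken from the article.

def expected_days(carefulness: int) -> float:
    build_days = 10 + 2 * carefulness                  # more validation, more hands-on time
    p_incident = max(0.05, 0.9 - 0.17 * carefulness)   # less validation, more incidents
    cleanup_days = 20                                  # incidents eat a lot of time
    return build_days + p_incident * cleanup_days

for c in range(7):
    print(c, round(expected_days(c), 1))
# prints 28.0, 26.6, 25.2, 23.8, 22.4, 21.0, 23.0: expected time
# bottoms out around carefulness = 5. Turning the knob below that
# trades a little build time for a lot of expected cleanup.
```

The exact numbers don't matter; the shape does. The minimum sits at a moderate carefulness, and going below it is slower in expectation.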
In real life you both can't afford the reputational risk and need to ship anyway. If you have an incident, guess who's going to be liable – the manager or the name on the commit?
Stop negotiating quality; negotiate scope and set a realistic time. Shipping a lot of crap faster is actually slower. 99% of the companies out there can't focus on doing _one_ thing _well_; that's how you beat the odds.
I got the sack the last time I did this, which was better than the alternative – to keep trudging along until the inevitable incident.
It generally comes down to the age-old question: pick two out of Quality, Speed, or Cost.
I am probably missing an essential point here, but my first reaction was "this is literally the quality part of the scope/cost/time/quality trade-off triangle?"
Has that become forgotten lore? (It might well be. It's old, and our profession doesn't do well with knowledge transmission.)