My team does Wednesday to Wednesday for many of the same reasons mentioned in the article, and it works great. We switch at 11am and hold a hand-off meeting at that time, and invite the whole team.
Hand-off meetings with the whole team work really well (in my opinion!) when you have a relatively small team--we have 9 FT teammates. Often someone else may have been delegated the page or bug that arose and can discuss how they handled it, or someone who wasn't involved may have insight for how to handle a situation better the next time. Since we're all going to be on rotation at least once during a quarter, it's great to know what happened in case a similar page pops up later.
Finally, we also fill out a running Doc before/during the meeting with links to the pages/bugs, along with short descriptions of how they were handled. This forms a great living memory of how to deal with incidents, and is also often the birthplace of new playbooks for handling new types of incidents.
Same here. Except we do a two week rotation, and it aligns with our sprints. The active on-call engineer doesn’t have any assigned sprint work and focuses their effort on fixing bugs or cleaning up the backlog when they’re not actively triaging an incident.
This is a strong positive imhe:
"- Step 1: handling it
- Step 2: making sure it doesn’t happen again
So when a major issue happens over the weekend only Step 1 happens during the weekend. Step 2 involves following up with other teams, creating new alarms and updating the runbook. And all that usually happen during the week. The oncall is going to spend at minimum their Monday doing that so it’s better if the schedule reflects that."
We do Thursday to Thursday and then you get Friday off after completing on-call. Being on-call gives you no extra pay by itself, but if you get paged off hours and need to work you get paid 150 to 200% of your normal hourly wage depending on what time of day you need to work.
Best on-call I’ve had.
We do daily shifts with a follow-the-sun rotation, which makes it easier to handle persistent commitments and ensures a bad week doesn't all land on the same person.
This! Whenever someone talks about on-call, this aspect of the rotation gets swept under the carpet. Whenever I interview I always ask whether they have an on-call system (they must if there are servers and apps involved) and, if they do, whether they have follow the sun.
Most don’t even like the question. For them such questions are red flags, or a sign the candidate is not “motivated enough”. Few even have a follow-the-sun policy. They might have one in their HQ, true for a lot of US/EU firms, but for their offices in a developing country like India it’s always something along the lines of “oh, engineers here take full ownership; they are the owners”.
Also, from what I have seen, a 2-3 day rotation with follow the sun is best, and week-long or longer is worst.
Then there are companies where it could be forever on-call with no follow the sun - e.g. Amazon, Uber (in India at least). That’s another world altogether.
Not only that, but if you have a multi-continent team then no one needs to be woken by an emergency meow. (My PagerDuty is set to a meow sound. So we practice meow driven development: I don't want to hear my phone meowing piteously.) Say you have someone on the US west coast: they can do 10am-10pm, while someone else in continental Europe, being nine hours ahead, can do 7am-7pm.
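For anyone who wants to sanity-check a split like that, here is a rough Python sketch. It assumes fixed offsets of UTC-8 for the west coast and UTC+1 for continental Europe (standard time; the gap is briefly eight hours when the two regions change clocks on different dates), and `utc_hours` is just a made-up helper name.

```python
def utc_hours(start_local: int, end_local: int, utc_offset: int) -> set[int]:
    """Return the set of UTC hours (0-23) covered by a local shift.

    start_local/end_local are local clock hours; utc_offset is hours ahead of UTC.
    """
    length = (end_local - start_local) % 24
    start_utc = (start_local - utc_offset) % 24
    return {(start_utc + h) % 24 for h in range(length)}


# Assumed offsets, not from the thread: UTC-8 and UTC+1.
west_coast = utc_hours(10, 22, -8)  # 10am-10pm local
europe = utc_hours(7, 19, +1)       # 7am-7pm local

covered = west_coast | europe
gaps = sorted(set(range(24)) - covered)   # UTC hours nobody covers
overlap = sorted(west_coast & europe)     # UTC hours both cover

print("gaps:", gaps)
print("overlap:", overlap)
```

Under those assumptions the two 12-hour shifts tile the day with no gap and no overlap, which also means there is no built-in handoff window unless one side stretches its shift a little.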
Daily might be okay for more ops/SRE types, but it is a hell for a primarily dev team. Can't focus on building shit.
In some cases it might help, because then on-call becomes a “natural” thing. It’s not something someone dreads as in “god, that week is coming”. Also, it spreads the fuck-ups and peaceful times better.
> But websites need to be up 24/7, cron jobs need to run on the weekend and backend servers need to be up to support both
Tech entrepreneurs should give more weight to choosing markets that don’t require this
I’ve always been partial to Friday night through Friday night.
You start off over the weekend, when you have energy and can survive the two days alone. Ideally no Friday releases so the transition is calm, but as the writer says the batches might fail.
You spend the week fixing whatever breaks. You’re cleanly off the Monday to Monday sprint, just doing on-call/ops.
You finish Friday evening and immediately get Friday night and the weekend to recover when you need it most.
This post was discussed somewhere else and I saw someone say that their work does Friday noon to Friday noon and gives the outgoing on call the rest of Friday off. I feel like that's even better because 1) it recognizes the hard work that the outgoing on call put in and 2) it gives the incoming on call a few hours to get up to speed while they still have the support of the other engineers on the team.
Maybe I'm taking you too literally, but I wouldn't want to have a handoff sync-up (or any meeting, really) on a Friday night, nor push that earlier so significant things can happen between sync-up and the actual shift in responsibility from person to person. Friday-to-Friday does sound good.
One thing I really liked in a previous job was a split daytime-vs-nighttime rotation. It was well worth a little annoyance to set up in our tools. One week you'd be the 'daytime' oncall for business hours (something like 9-5 Mon-Fri, though we might have tweaked those hours a bit; it might have been 10-6 or something). The next you'd be on call for the complementary time (5-9, weekends). You were on call for the same total amount of time, just smeared over two different weeks. It ended up being less of a burden to optimize your schedule for a reasonable response time, but operational work still got done. And in practice awareness of operational issues was not too hard to maintain between the two members of the split.
(I think the best thing, if you can swing it, is probably a follow-the-sun rotation where there are three teams distributed 8 hours apart around the globe, and they trade off 8-hour workday shifts. But a lot of uncommon things probably have to be true of your organization for that idea to even be on the radar.)
That was exactly my reasoning too when I set up our on call roster as Friday to Friday, though for us Saturday is the busiest day in terms of customer activity, so it was a no-brainer.
Is anyone getting compensated for being on-call? If you are paged and work outside of business hours, do you receive additional compensation?
Yes. And not only for responding to a page, but also for being on standby outside working hours.
Yes, 350€ (before taxes of course) per week. No additional compensation for responding/ working on incidents.
I would be interested in how response times are?
Mine is 15mins. So I have to respond and be in an incident call within 15mins.
At Google we used to get paid an oncall bonus which was calculated at something like 1/3 your prorated salary for the non-working hours you were oncall (IIRC), up to some limit per quarter. For my team a week of oncall per quarter would max it out and net you a few thousand dollars bonus.
On my teams, if someone got paged off-hours they would just work less the day after the event. imo it should just be part of the regular salary/work expectations, incentivizing keeping oncall low
Gross, no. This just allows management to ignore problems and push development teams to do feature work, even when everything is on fire and the oncall person is getting paged multiple times per day.
Oncall should be compensated, always. The oncall person should get a flat rate just for being on standby, and should also receive a per-page payout, and that amount should be larger if the page happens outside regular business hours.
Then management will actually realize there's a cost to pushing features and pulling in deadlines at the expense of robust engineering practices. Or they can decide they are fine with that, and paying the oncall person is a cost of doing business the way they want to.
I've seen too many instances where issues that come up during oncall never get fixed, and just page and page and page.
I will never again work at a company where oncall is "just a part of the job". I value my own time too much.
No, it should be compensated, so Management prioritises fixing issues, instead of adding new bugs
I've wished for a tech workers union for this reason. I don't care about pay, let the union say nothing about pay.
But let's align incentives. Any time spent fixing issues on-call is compensated 4-to-1. Workers may accrue compensation time, and any compensation time in excess of 20 hours is paid 10-to-1 when the employee leaves. The idea here isn't for workers to accrue and cash out comp time, but instead to give an incentive to the organization to ensure workers use their comp time.
Let's align incentives, what's hard on the worker should be hard on the owners and management.
Where I work, this would have no impact on the amount of tasks shoved into the pipeline by product and leadership.
Perhaps not, but at least the oncall person will be compensated for the crap they have to put up with.
In France compensation is mandatory, either as salary or as rest. In addition the labor code stipulates mandatory daily rest of 11 contiguous hours even during the weekend and an extra 24 contiguous hours of rest during the weekend. Hours of intervention are considered work.
In my company we get approximately 800€ for every week of on-call and each hour of intervention is also compensated with salary.
From my point of view this should be high enough for the company to be willing to focus on on-call issues. After years of being on-call I must admit the salary is comfortable but it doesn't cover the pain and constraints of being on-call: being kinda "stuck" at home basically, lots of consequences on private life etc.
On a past team I set up on-call to be:
- Mon/Tue
- Wed/Thu
- Fri
- Sat/Sun
Original reason for this schedule was that on-call was paid by days per quarter in a tiered system so this guaranteed that all members got the 5% on-call for 10 days/quarter rather than one person hitting 9 days and dropping to 3%, but I stand by this as a better on-call rotation.
The number of people does need to not divide evenly into the number of shifts so that the days rotate; if you run into this you can combine Fri into Sat/Sun or break Sat/Sun apart. It’s a bit complex to set up, but the mental impact of on-call is greatly reduced, and if you need a week for vacation you can much more easily find someone to cover your shift for a couple of days in a nearby week rather than ending up with 2 weeks back to back 6 weeks from now. And if you pull a weekend you get the week off rather than losing your weekend to on-call and going into a work week still on-call.
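If I'm reading the "not wholly divisible" point right, it can be seen with a plain round-robin. The Python sketch below is only an illustration under that assumption: the names are placeholders and `blocks_seen` is a hypothetical helper, not whatever tool the team actually used.

```python
from itertools import cycle


def blocks_seen(team, blocks, weeks):
    """Round-robin the team over the weekly blocks and
    return which blocks each person ends up landing on."""
    seen = {person: set() for person in team}
    turn = cycle(team)
    for _ in range(weeks):
        for block in blocks:
            seen[next(turn)].add(block)
    return seen


FOUR = ["Mon/Tue", "Wed/Thu", "Fri", "Sat/Sun"]
THREE = ["Mon/Tue", "Wed/Thu", "Fri/Sat/Sun"]  # Fri folded into the weekend

five_people = blocks_seen(["A", "B", "C", "D", "E"], FOUR, weeks=5)
four_people = blocks_seen(["A", "B", "C", "D"], FOUR, weeks=4)
four_people_merged = blocks_seen(["A", "B", "C", "D"], THREE, weeks=4)

for label, result in [("5 people, 4 blocks", five_people),
                      ("4 people, 4 blocks", four_people),
                      ("4 people, 3 blocks", four_people_merged)]:
    print(label, {p: sorted(b) for p, b in result.items()})
```

With five people and four blocks everyone cycles through every block. With four people and four blocks each person is stuck on the same block forever, and folding Fri into the weekend gets the rotation moving again.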
Any company that makes it an employee's responsibility to find "someone to cover" their on call time while they're on vacation is a company worth quitting.
I'm pretty sure that'd be illegal here in .au
On call coverage while an employee is on vacation is a management problem, not an employee problem.
Could not agree more. Any company I've worked at with an on-call rotation has always ensured that staff are not scheduled when they have holiday booked. The only time an employee needed to find their own cover is if something unexpected came up during their on-call period and they needed a few hours out (like an emergency visit to the doctor with a child etc).
At my current job we have an automated scheduler which uses our gcal to ensure that it never schedules people who have an AFK entry. It also schedules fairly based on how long since the person was last on-call, not putting them on a weekend if they were on last weekend etc (we do 24hr shifts).
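A scheduler along those lines could look roughly like the Python sketch below. This is a guess at the rules as described, not the real tool: the AFK entries are passed in as plain sets of dates rather than read from a calendar, and every name here (`pick_oncall`, `worked_previous_weekend`, the people) is made up for illustration.

```python
from datetime import date, timedelta


def worked_previous_weekend(day, shifts):
    """True if any past shift fell on the Sat/Sun before the weekend containing `day`."""
    saturday = day - timedelta(days=day.weekday() - 5)  # only called for Sat/Sun
    previous = {saturday - timedelta(days=7), saturday - timedelta(days=6)}
    return any(d in previous for d in shifts)


def pick_oncall(day, people, history, afk):
    """Pick who takes the 24h shift on `day`.

    history: name -> list of dates already worked
    afk:     name -> set of dates the person is away
    """
    is_weekend = day.weekday() >= 5
    eligible = [
        p for p in people
        if day not in afk.get(p, set())
        and not (is_weekend and worked_previous_weekend(day, history.get(p, [])))
    ]
    if not eligible:
        return None
    # Fairness rule: whoever has gone longest since their last shift.
    return min(eligible, key=lambda p: max(history.get(p, []), default=date.min))


people = ["ana", "bo", "cy"]
history = {
    "ana": [date(2024, 6, 1)],   # a Saturday
    "bo": [date(2024, 6, 8)],    # the following Saturday
    "cy": [date(2024, 6, 14)],   # a Friday
}
afk = {"ana": {date(2024, 6, 15)}}

saturday_pick = pick_oncall(date(2024, 6, 15), people, history, afk)
monday_pick = pick_oncall(date(2024, 6, 17), people, history, afk)
print(saturday_pick, monday_pick)
```

In the sample data, the Saturday shift skips the person who is away and the person who worked the previous weekend, while the Monday shift simply goes to whoever has been off rotation the longest.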
In my 20ish years I’ve done every possible day for oncall schedules. I would say each has pros/cons but overall I found it to be a minor difference.
Mon-Mon is nice because it’s a logical time to start something fresh at the start of the week. Tuesday is good for the reasons in the post, Wednesday is similar. Thursday is nice because after you’re done you can relax on Friday. Friday-Friday is less common but can be nice because you get the satisfaction of being done on the last day of the week.
I have occasionally convinced teams to adopt both oncall and sprint cycles aligned with Tuesday [1] - the dev teams all loved it. Management was a harder sell, but by and large were happier with the extra days to communicate results/get metrics before their own Friday deadlines.
[1] also Wednesday/Thursdays. Wednesdays were my favorite in good working environments, it felt like running a successful marathon, but it was more prone to falling apart due to short-term thinking.
I'm curious why Tues to Tues for sprints was a hard sell to management?
I never understood why companies didn't simply leverage 24x7 internet MSPs.
They are able to staff 24x7 by spreading the cost over multiple customers and working through the process of making your application manageable by a 3rd party is super beneficial.
Most of these companies will also do performance monitoring and analysis as well.
They see issues and optimization opportunities across multiple applications and know more than a single team who's only built one.
That works well for generic IT systems and running the desktop/laptop fleets, but doesn’t work at all for running the software a company builds.
We typically split our teams, so we have ~16 split across two time zones so that our shifts are just 12 hours during the day. It works well, but it is expensive, so we support a lot of services (or a small number of very high priority services) as a result.
I hadn't heard of Managed Service Providers before, but you make a good case for them.
I'm finding surprisingly little discussion on HN regarding the costs/benefits of MSPs. Or rather, under which conditions (such as company size) they make sense.
Any big players or companies you would recommend?
Are you speaking from personal experience having worked with one? What was the feedback between application management back to engineering like?
We are on-call for 48hrs at a time, about once every 12 days or so, one day as backup, and one as primary. It's nice because it doesn't interrupt your week too much. The downside being that complex issues might require extra work while not on-call
Ours starts at 5 PM on Tuesday and I think it's great.
> Most places take after hours paging pretty seriously.
LOL i wish
My team has a meeting to hand off the on-call to the next person, and we discuss all pages we got during the week. Primarily two things: whether the page was for a good reason or not (good: our on-call person had something actionable to fix. bad: non-actionable pages, pages because someone else's system was broken, false alarms, etc), and also whether there is something we can do so we never get paged for this again. I find it very effective at reducing pages.
Lol yeah. My old team had oncall pages in the middle of the night pretty often where nothing was actually the matter. My manager was only nominally on call. In the handoff meetings every week he was basically just like “that sucks”.
My company does this
He makes a sound argument.