Who watches the watchers? How we page ourselves if incident.io goes down

(incident.io)

5 points | by rorymalcolm 8 hours ago

3 comments

(Full disclosure, I work at incident.io!)

We recently released our On-call product, and as part of that we had to think a lot about redundancy and 'failing safely'.

Here's how we achieve it - and how we're thinking about it. I'm interested in whether other examples of this exist in the wild - I'd love to know more about how, e.g., Datadog achieves this.

lawrjone 7 hours ago

Author here!

It’s a fun problem to solve, and one I’ve come across before when trying to alert on a monitoring tool being down, though it’s slightly different when the tool is your own product.

Hopefully interesting if you’ve hit similar puzzles before.

8 hours ago

[deleted]