I was going crazy thinking that there was something wrong with my SSH keys all of a sudden. Thanks $DEITY it's just GitHub.
Same. I reflexively replaced mine thinking it needed to be. Glad it's working now though.
Anyone using GitLab have any insight on how well their operations are running these days?
We originally left GitLab for GitHub after being bitten by a major outage that resulted in data loss. Our code was saved, but we lost everything else.
But that was almost 10 years ago at this point.
We use GitLab on the daily. Roughly 200 repos, with pushes to ~20 of them on any given day. There have been a few small, unpublished outages that we determined were server-side (we have a geo-distributed team), but as a platform it seems far more stable than it was 5-6 years ago.
My only real current complaint is that the webhooks that are supposed to fire on repo activity have been a little flaky for us over the past 6-8 months. We have a pretty robust chatops system in play, so these things are highly noticeable to our team. It's generally consistent, but we've had hooks fail to post to our systems on a few different occasions, which forced us to chase up threads until we determined our operator ingestion service never even received the hooks.
That aside, we’re relatively happy customers.
FWIW, GitHub is also unreliable with webhooks. Many recent GH outages have affected webhooks.
They are pretty good, in my experience, at *eventually* delivering all updates. The outages take the form of a "pause" in delivery, every so often... maybe once every 5 weeks?
Usually the outages are pretty brief but sometimes it can be up to a few hours. Basically I'm unaware of any provider whose webhooks are as reliable as their primary API. If you're obsessive about maintaining SLAs around timely state, you can't really get around maintaining some sort of fall-back poll.
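If it helps, here's a minimal sketch of what such a fallback poll can look like, assuming GitHub's repo events endpoint and an idempotent event handler; the repo, interval, and names are illustrative, not anyone's actual setup:

    import time
    import requests  # any HTTP client works

    # Hypothetical repo; swap in your own. The handler must be idempotent,
    # since most events will already have arrived via webhook.
    EVENTS_URL = "https://api.github.com/repos/example-org/example-repo/events"
    seen_ids = set()

    def handle(event):
        # Same code path the webhook receiver feeds into.
        print("processing", event["id"], event.get("type"))

    def poll_once():
        resp = requests.get(EVENTS_URL, timeout=10)
        resp.raise_for_status()
        for event in resp.json():
            if event["id"] not in seen_ids:
                seen_ids.add(event["id"])
                handle(event)

    while True:
        poll_once()       # catches anything a delayed or dropped webhook missed
        time.sleep(300)   # keep the interval inside your SLA window

In practice you'd persist the seen IDs and paginate, but the point is just that the poll and the webhook feed the same handler, so a missed delivery only costs you latency, not state.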
Completely agree on all points. We've had dual remotes running on a few high traffic repos pushing to both GitLab and GitHub simultaneously as a debug mechanism and our experiences mirror yours.
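For reference, one way to wire up that kind of dual remote is git's multiple push URLs; here's a rough sketch (the repo path and URLs are made up):

    import subprocess

    REPO = "/path/to/repo"                       # hypothetical local checkout
    GITLAB = "git@gitlab.com:example/repo.git"   # hypothetical remote URLs
    GITHUB = "git@github.com:example/repo.git"

    def git(*args):
        subprocess.run(["git", "-C", REPO, *args], check=True)

    # Fetch from the primary host...
    git("remote", "set-url", "origin", GITLAB)
    # ...and register both hosts as push URLs, so a single `git push origin`
    # sends the same refs to GitLab and GitHub.
    git("remote", "set-url", "--add", "--push", "origin", GITLAB)
    git("remote", "set-url", "--add", "--push", "origin", GITHUB)

Same effect as adding two remote.origin.pushurl entries in .git/config by hand.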
Not sure what specific operational services are of interest - but here's a link to their historical service status [0]
[0] https://status.gitlab.com/pages/history/5b36dc6502d06804c083...
No issues on GitLab.
Haven't seen any outage from GitLab in like, ever.
https://status.gitlab.com/pages/history/5b36dc6502d06804c083...
Never had any problems really.
GitHub on the other hand has outages more frequently.
Must be a day ending in Y.
Github is owned by Microsoft, so this is a pretty small-time indie operation; you need to give them a break.
Not replacing the CEO suggests they aren't focusing on it as much as they were.
I bet Microsoft is sad not because people can’t push, but because the training data for Copilot has slowed down.
PS: None of our 40+ engineers felt anything, our self hosted Forgejo is as snappy as ever.
Just your casual $3.8T company.
There were so many severe GitHub Actions outages (10+?) in the past year. Cause: migration to the disaster zone also known as Azure, I assume. Most of them happened during (morning) CET working hours, so as not to inconvenience the Americans and/or make headlines.
Money doesn't buy competency. It's a long-term culture thing. You can never let up on maintaining competency in your organization. It rots if you do. I guess Microsoft did let go.
“guess Microsoft did let go” - are we thinking of the same Microsoft here?
I am thinking of the atrophying one. Not MikeRoweSoft.
Your weekly reminder to take a break
This sure does seem to happen a lot
I’m old enough to remember when GitHub was on the main page due to a cool feature they added; now they just end up here when it stops working.
Ah that was why. Oh well, I just needed to get the code to the server, so I didn't really need Github anyway.
Why does the main page show all green when there is an ongoing incident? All green here -> https://www.githubstatus.com/
This is normal for Microsoft. It's as though status is owned and controlled by either marketing or accounting, not engineering.
It's marked as resolved for some reason
because then some mid-level manager gets a telling off
and/or has to pay the SLA out of their budget
ahh, you are right. I am blind.
Coincidentally, Azure DevOps was also missing the SSH keys earlier today, both in the web UI and for SSH login.
Related to the recent announcement they are moving to Azure?
https://news.ycombinator.com/item?id=45517173
Oh no. I look forward to watching my browser redirect 40 times on every attempted page load.
Doubt it. I'm an Ops person on Azure; while they just had a terrible outage recently, they tend to be as stable as any other cloud provider, and I haven't had many issues with Azure itself compared to whatever slop the devs are chucking into production.
Wow. It wasn't already running on Azure? What was it (or is it) running on?
IIRC (it's been a while) they were on Rackspace when Microsoft bought them out - there was an article a few months ago saying they were moving to Azure and freezing new features while they do the move[1].
[1] https://thenewstack.io/github-will-prioritize-migrating-to-a...
Honestly, I don't know half the features they have added, because the surface is huge at this point; everyone seems to be using a (different) subset of them anyway.
So a feature freeze isn't likely to have much impact on me.
EDIT: went and checked - https://github.blog/news-insights/github-is-moving-to-racksp... not sure if they moved again before the MS acquisition though.
A team of us moved it off Rackspace in 2013; it’s been mostly in a set of GitHub-operated colos since then. There used to be some workloads on AWS and a bit of DirectConnect. Now it’s some workloads on Azure.
To the best of my knowledge there’s been no Rackspace in the picture since about 2013, the details behind that are fuzzy as it’s been 10+ years since I worked on infrastructure at GitHub.
In the Pragmatic Engineer podcast episode with the former CEO of GitHub, he mentioned that they had their own infra for everything. If I remember correctly, this was because GitHub is quite old, and at the time GitHub Actions became a thing, cloud providers were not really offering the kind of infra necessary to support the feature.
I can't read the entirety of this article[1] because it's paywalled, but it looks like they ran their own servers:
> GitHub is currently hosted on the company’s own hardware, centrally located in Virginia
I imagine this predates their acquisition by Microsoft. Honestly, given how often GitHub seems to be down compared to the level of dependency people have on it, this might be one of the few cases where I would have understood if Microsoft had embraced and extended a bit harder.
[1]: https://www.theverge.com/tech/796119/microsoft-github-azure-...
Well… https://www.reuters.com/technology/microsoft-azure-down-thou...
Fair enough, my Azure experience is minimal enough that maybe I shouldn't make assumptions about whether this would improve things. That being said, I do think there's merit in the idea that if Microsoft is going to be able to solve this problem, they probably should try to solve it just once, and in a general way, rather than just for Github?
>Microsoft
>solve it just once, and in a general way
Not Sharepoint? What a bummer.
I thought my SSH keys were revoked, whew.
Just started to replace mine when I saw someone post a message about GitHub
Yep. Was using GitHub for OAuth on a pet project of mine. Got the unicorn, and was considering taking a break, or just setting up something else. Seems to be running again for me now though.
thought i was going crazy
Looking forward to the postmortem.
Are they using AI agents this time to resolve the outage? Probably not.
But this time, there is no CEO of GitHub to contact and good luck contacting Satya to solve the outage.
The postmortem will be simple, since GitHub goes down so consistently every week that you can almost use it as an alternative timekeeping system.
It's possible that Microsoft buying GitHub was a large-scale psyop intended to reduce the productivity of the competition.
Any time their startup competitors are making too much progress they can just push the "GitHub incident" button and slow everyone down.
We used to obsessively care about 500s. Like I would make a change that caused a 0.1% spike in 500s and I would silently say I'm sorry to the folks who got the unicorn page.
I'm not sure the new school cares nearly as much. But then again, this is how companies change as they mature. I saw this with StubHub as well. The people who care the most are the initial employees; employee #7291 usually dgaf.