GitHub Actions is slowly killing engineering teams

(iankduncan.com)

144 points | by codesuki 4 hours ago

59 comments

  • danpalmer 2 hours ago

    I've used many of the CI systems the author covers here, and I've done a lot of CircleCI and GitHub Actions, but I don't come to quite the same conclusions. One caveat, though: I haven't used Buildkite, which the author seems to recommend.

    Over the years CI tools have gone from specialist to generalist. Jenkins was originally very good at building Java projects and not much else; Travis had explicit steps for Rails projects; CircleCI was similar back in the day.

    This was a dead end. CI is not special. We realised as a community that in fact CI jobs were varied, that encoding knowledge of the web framework or even language into the CI system was a bad idea, and CI systems became _general workflow orchestrators_, with some logging and pass/fail UI slapped on top. This was a good thing!

    I orchestrated a move from CircleCI 2 to GitHub Actions, precisely because CircleCI botched the migration from the specialist to the generalist model, and at the time we were unable to express a performant and correct CI system in their model. We could express it with GHA.

    GHA is not without its faults by any stretch, but... the log browser? So what, just download the file, at least the CI works. The YAML? So it's not-quite-yaml, they weren't the first or last to put additional semantics on a config format, all CI systems have idiosyncrasies. Plugins being Docker images? Maybe heavyweight, but honestly this isn't a bad UX.

    What does matter? Owning your compute? Yeah! This is an important one, but you can do that on all the major CI systems, it's not a differentiator. Dynamic pipelines? That's really neat, and a good reason to pick Buildkite.

    My takeaway from my experience with these platforms is that Actions is _pretty good_ in the ways that truly matter, and not a problem in most other ways. If I were starting a company I'd probably choose Buildkite, sure, but for my open source projects, Actions is good.

    • SOLAR_FIELDS 2 hours ago

      Actions is many things. It’s an event dispatcher, an orchestrator, an execution engine and runtime, an artifact registry and caching system, a workflow modeler, a marketplace, and a secrets manager. And I didn’t even list all of the things Actions is. It’s better at some of those things and not others.

      The systems I like to design that use GHA usually only use the good parts. GitHub is a fine events dispatcher, for instance, but a very bad workflow orchestrator. So delegate that to a system that is good at it instead.
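A toy sketch of what "general workflow orchestrator" means in this subthread: run jobs in dependency order, record pass/fail, and skip anything downstream of a failure. It uses only the standard library's `graphlib`; the job graph and the jobs are hypothetical, and no real CI system is this simple.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# A job is any callable that returns True (pass) or False (fail).
Job = Callable[[], bool]


def run_workflow(deps: Dict[str, Set[str]], jobs: Dict[str, Job]) -> Dict[str, str]:
    """Run jobs in dependency order and return each job's status.

    deps maps a job name to the set of job names it depends on.
    A job is skipped if any of its dependencies did not pass.
    """
    status: Dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        if any(status.get(dep) != "passed" for dep in deps.get(name, set())):
            status[name] = "skipped"
            continue
        status[name] = "passed" if jobs[name]() else "failed"
    return status


if __name__ == "__main__":
    # Hypothetical pipeline: deploy waits on lint and test; test waits on build.
    deps = {"lint": set(), "build": set(), "test": {"build"}, "deploy": {"lint", "test"}}
    jobs = {
        "lint": lambda: True,
        "build": lambda: True,
        "test": lambda: False,  # simulate a failing test suite
        "deploy": lambda: True,
    }
    for job, result in run_workflow(deps, jobs).items():
        print(f"{job}: {result}")
```

Everything language- or framework-specific lives inside the jobs; the orchestrator only knows about the graph, which is the generalist model in miniature.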

  • burnJS 2 hours ago

    Killing engineering teams? Hyperbolic thread titles need to be killed. I find GitHub Actions to be just fine. I prefer it to Bitbucket and GitLab.

    • altmanaltman 17 minutes ago

      Yeah I was wondering how Microsoft is okay with Github murdering people but then was let down by the article.

    • noident an hour ago

      I clicked the article thinking it was about GitLab. Much of the criticism held true for GitLab anyway, particularly the insanely slow feedback loops these CI/CD systems create.

      • dbtablesorrows 19 minutes ago

        Can't blame gitlab for team not having a local dev setup.

  • habosa 2 hours ago

    Dead on. GitHub Actions is the worst CI tool I’ve ever used (maybe tied with Jenkins) and Buildkite is the best. Buildkite’s dynamic pipelines (the last item in the post) are so amazingly useful you’ll wonder how you ever did without them. You can do super cool things like have your unit test step spawn a test de-flaking step only if a test fails. Or control test parallelism based on the code changes you’re testing.

    All of that on top of a rock-solid system for bringing your own runner pools which lets you use totally different machine types and configurations for each type of CI job.

    Highly, highly recommend.
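In Buildkite, a dynamic pipeline is typically a script that prints pipeline steps and pipes them into `buildkite-agent pipeline upload`. A minimal generator sketch follows; the parallelism heuristic, the `./scripts/run-tests.sh` and `./scripts/deflake.sh` scripts, and the `FAILED_TESTS` environment variable are all hypothetical, chosen only to mirror the two examples in the comment above (de-flaking only on failure, parallelism driven by the change set).

```python
import json
import os
import shlex
import sys
from typing import List


def build_pipeline(changed_files: List[str], failed_tests: List[str]) -> dict:
    """Return a Buildkite-style pipeline dict: {"steps": [...]}."""
    # Hypothetical heuristic: one parallel test job per 5 changed Python files, capped at 16.
    changed_py = [f for f in changed_files if f.endswith(".py")]
    parallelism = min(max(1, len(changed_py) // 5), 16)

    steps = [{
        "label": "unit tests",
        "command": "./scripts/run-tests.sh",
        "parallelism": parallelism,
    }]

    # Only add a de-flaking step when something actually failed.
    if failed_tests:
        steps.append({
            "label": "de-flake failed tests",
            "command": shlex.join(["./scripts/deflake.sh", *failed_tests]),
        })
    return {"steps": steps}


if __name__ == "__main__":
    # Changed files as arguments; failed test IDs from a hypothetical env var.
    failed = os.environ.get("FAILED_TESTS", "").split()
    json.dump(build_pipeline(sys.argv[1:], failed), sys.stdout, indent=2)
```

Usage would look something like `python gen_pipeline.py $(git diff --name-only origin/main...HEAD) | buildkite-agent pipeline upload`. The point is that the branching logic lives in a real language rather than in YAML conditionals.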

    • tcoff91 an hour ago

      Jenkins had a lot of issues and I’m glad to not be using it overall, but I did like defining pipelines in Groovy and I’ll take Groovy over YAML all day.

      • bigstrat2003 18 minutes ago

        Jenkins, like many complex tools, is as good or bad as you make it. My last two employers had rock solid Jenkins environments because they were set up as close to vanilla as possible.

        But yes, Groovy is a much better language for defining pipelines than YAML. Honestly pretty much any programming language at all is better than YAML. YAML is fine for config files, but not for something as complex as defining a CI pipeline.

        • tcoff91 12 minutes ago

          What kills me is when these things add like control flow constructs to YAML.

          Like just use an actual programming language!

  • anttiharju 2 hours ago

    Github being less and less reliable nowadays just makes this more true.

    In the past week I have seen:

    - actions/checkout inexplicably failing, sometimes succeeding on 3rd retry (of the built-in retry logic)

    - release CI jobs scheduling _twice_, causing failures, because ofc the release already exists

    - jobs just not scheduling. Sometimes for 40m.

    I have been using it actively for a few years and putting aside everything the author is saying, just the base reliability is going downhill.

    I guess Zig was right. Too bad they missed Buildkite; Codeberg hasn't been that reliable or fast in my experience.

  • zdw 2 hours ago

    I tend to disagree with this as it seems like an ad for Nix/Buildkite...

    If your CI invocations are anything more than running a script or a target on a build tool (make, etc.) where the real build/test steps exist and can be run locally on a dev workstation, you're making the CI system much more complex than it needs to be.

    CI jobs should at most provide an environment and configuration (credentials, endpoints, etc.), as a dev would do locally.

    This also makes your code CI agnostic - going between systems is fairly trivial as they contain minimal logic, just command invocations.
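One way to keep CI config down to "provide an environment and call one command" is a single entry-point script that developers and every CI system invoke identically. A minimal sketch, assuming hypothetical task names and tools (ruff, pytest); the CI config would only run something like `python ci.py test`.

```python
import shlex
import subprocess
import sys
from typing import Dict, List

# Hypothetical tasks: each is a list of commands run in order; edit to match your project.
TASKS: Dict[str, List[List[str]]] = {
    "lint": [[sys.executable, "-m", "ruff", "check", "."]],
    "test": [[sys.executable, "-m", "pytest", "-q"]],
}


def run(task: str, dry_run: bool = False) -> int:
    """Run every command for `task`; stop at the first non-zero exit code."""
    for cmd in TASKS[task]:
        print("+", shlex.join(cmd), flush=True)  # echo commands, like `set -x`
        if dry_run:
            continue
        code = subprocess.call(cmd)
        if code != 0:
            return code
    return 0


def main(argv: List[str]) -> int:
    if not argv:
        # No arguments: print usage and exit cleanly.
        print("usage: ci.py <task> [--dry-run]; tasks: " + ", ".join(sorted(TASKS)))
        return 0
    if argv[0] not in TASKS:
        print(f"unknown task: {argv[0]}", file=sys.stderr)
        return 2
    return run(argv[0], dry_run="--dry-run" in argv[1:])


if __name__ == "__main__":
    code = main(sys.argv[1:])
    if code:
        sys.exit(code)
```

Because the script only shells out to tools, switching CI vendors means changing a one-line invocation, not porting logic.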

    • jamesfinlayson 19 minutes ago

      This so much - I remember migrating from one CI system to another a few years ago - I had built all of our pipelines to pull in some secrets and call a .sh file that did all the heavy lifting. The migration had a few pain points but was fairly easy. Meanwhile, the teams who had created their pipelines with the UI and broken them up into multiple steps were not happy at all.

    • mitchjj an hour ago

      Can 100% confirm this is not an ad (at least not for Buildkite) and was a lovely surprise to read for the team.

  • simianwords 13 minutes ago

    What I find hardest about CI offerings is that each one has a unique DSL that inevitably has edge cases that you may only find out once you’ve tried it.

    You might face that many times using GitLab CI. Random things don’t work the way you think they should, and the worst part is you must learn their stupid custom DSL.

    Not only that, there’s no way to debug the maze of CI pipelines, but I imagine it’s a hard thing to achieve. How would I be able to run CI locally when it also interacts with other projects’ CI, like calling downstream pipelines?

  • fmjrey 35 minutes ago

    Nice write up, but wondering now what nix proposes in that space.

    I've never used Nix or NixOS, but a quick search led me to NixOps, and then I realized v4 is being rewritten entirely in Rust.

    I'm surprised they chose Rust for glue code, and not a more dynamic and expressive language that could make things less rigid and easier to amend.

    In the Clojure world, BigConfig [0], which I've never used, would be my next stop in the build/integrate/deploy story, regardless of tech stack. It integrates workflow and templating with the full power of a dynamic language to compose various setups, from dot/yaml/tf/etc. files to ops control planes (see their blog).

    [0] https://bigconfig.it/

  • tagraves 2 hours ago

    I hope the author will check out RWX -- they say they've checked out most CI systems, but I don't think they've tried us out yet. We have everything they praise Buildkite for, except for managing your own compute (and that's coming, soon!). But we also built our own container execution model with CI specifically in mind. We've seen one too many Buildkite pipelines that have a 10-minute Docker build up front (!) and then have to pull a huge Docker container across 40 parallel steps, and the overhead is enormous.

    • ses1984 2 hours ago

      Can you explain how your product solves this problem? I clicked around your site and couldn't figure it out.

      • fourteenminutes an hour ago

        As a (very happy) RWX customer:

        - Intermediate tasks are cached in a docker-like manner (content-addressed by filesystem and environment). Tasks in a CI pipeline build on previous ones by applying the filesystem of dependent tasks (AFAIU via overlayfs), so you don't execute the same task twice. The most prominent example of this is a feature branch that is up-to-date with main passes CI on main as soon as it's merged, as every task on main is a cache-hit with the CI execution on the feature branch.

        - Failures: the UI surfaces failures to the top, and because of the caching semantics, you can re-run just the failed tasks without having to re-run their dependencies.

        - Debugging: they expose a breakpoint (https://www.rwx.com/docs/rwx/remote-debugging) command that stops execution during a task and allows you to shell into the remote container for debugging, so you can debug interactively rather than pushing `env` and other debugging tasks again and again. And when you do need to push to test a fix, the caching semantics again mean you skip all the setup.

        There's a whole lot of other stuff. You can generate tasks to execute in a CI pipeline via any programming language of your choice, the concurrency control supports multiple modes, no need for `actions/cache` because of the caching semantics and the incremental caching feature (https://www.rwx.com/docs/rwx/tool-caches).

        And I've never had a problem with the logs.
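The caching described in the first bullet can be sketched as content addressing: hash everything that can affect a task's result (command, environment, input files, and the keys of the tasks it builds on) and reuse the stored result when that hash has been seen before. This is a toy model under those assumptions, not RWX's implementation, which per the comment above layers filesystems (reportedly via overlayfs) rather than storing strings.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    command: str
    env: Dict[str, str]
    files: Dict[str, bytes]  # input files the task reads: path -> contents
    parents: List[str] = field(default_factory=list)  # cache keys of upstream tasks


def cache_key(task: Task) -> str:
    """Hash every input that can change the task's result."""
    h = hashlib.sha256()

    def feed(*parts: bytes) -> None:
        # Length-prefix each part so different inputs can't concatenate to the same bytes.
        for part in parts:
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)

    feed(b"cmd", task.command.encode())
    for name, value in sorted(task.env.items()):
        feed(b"env", name.encode(), value.encode())
    for path, data in sorted(task.files.items()):
        feed(b"file", path.encode(), data)
    for parent in task.parents:
        feed(b"parent", parent.encode())
    return h.hexdigest()


class Cache:
    def __init__(self) -> None:
        self.results: Dict[str, str] = {}
        self.executions = 0  # how many times something actually ran

    def run(self, task: Task, execute: Callable[[Task], str]) -> Tuple[str, str]:
        """Return (key, result), executing the task only on a cache miss."""
        key = cache_key(task)
        if key not in self.results:
            self.executions += 1
            self.results[key] = execute(task)
        return key, self.results[key]
```

A feature branch that is up to date with main produces exactly the same keys after the merge, so the run on main is all cache hits; change one input file and only that task and the tasks that list its key as a parent get new keys.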

  • pmontra 2 hours ago

    > But Everyone Uses It!

    All of my customers are on bitbucket.

    One of them does not even use a CI. We run tests locally and we deploy from a self-hosted TeamCity instance. It's a Django app with server-side HTML generation, so the deploy is copying files to the server and a restart. We implemented a Capistrano-like system in bash and it's been working since before Covid. No problems.

    The other one uses Bitbucket Pipelines to run tests after git pushes on the branches for preproduction and production and to deploy to those systems. They use Capistrano because it's a Rails app (with a Vue frontend). For some reason the integration tests don't run reliably on either the CI instances or Macs, so we run them only on my Linux laptop. It's been in production since 2021.

    A customer I'm not working with anymore did use Travis and another one I don't remember. That setup also ran a build there, because they were using Elixir with Phoenix, so we were creating a release and deploying it. No mere file copying. That was the most unpleasant deploy system of the bunch. A lot of wasted time from a push to a deploy.

    In all of those cases logs are inevitably long but they don't crash the browser.

  • harikb 2 hours ago

    Ian Duncan, I was imagining you on a stage delivering this as a standup comedy show on Netflix.

    My pet peeve with GitHub Actions was that if I want to do simple things like make a "release", I have to Google for and install packages from internet randos. Yes, it is possible this rando1234 is a founding GitHub employee and it is all safe. But why does something so basic need external JS packages?

    • computerfriend 2 hours ago

      Yeah, their "standard library" so to speak (basically everything under the actions org) is lacking. But for this specifically, you can use the gh CLI.
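With the `gh` CLI, the release step can be a few lines of your own code instead of a third-party action. A minimal sketch in Python that shells out to `gh release view` and `gh release create --generate-notes`, skipping creation when the release already exists so a re-run or duplicated job doesn't fail. It assumes `gh` is installed and authenticated (in Actions, typically via a `GH_TOKEN` environment variable); `ensure_release` is a hypothetical helper name.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit code; injectable for testing.
Runner = Callable[[Sequence[str]], int]


def default_runner(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd)).returncode


def ensure_release(tag: str, run: Runner = default_runner) -> str:
    """Create a GitHub release for `tag` unless one already exists.

    Returns "exists" or "created"; raises RuntimeError if creation fails.
    """
    # `gh release view` exits non-zero when the release can't be found
    # (it also fails on auth or network errors, which this sketch doesn't distinguish).
    if run(["gh", "release", "view", tag]) == 0:
        return "exists"
    if run(["gh", "release", "create", tag, "--generate-notes"]) != 0:
        raise RuntimeError(f"gh release create failed for {tag}")
    return "created"
```

The view-then-create check is not atomic: two jobs running at the same moment can both see the release as missing, so this complements rather than replaces a concurrency limit on the release job.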

  • peterldowns 2 hours ago

    Agreed with absolutely all of this. Really well written. Right now at work we're getting along fine with Actions + WarpBuild but if/when things start getting annoying I'm going to switch us over to Buildkite, which I've used before and greatly enjoyed.

  • 0xbadcafebee 35 minutes ago

    Personally I like Drone more than Buildkite. It's as close to a perfect CI system as I've seen; just complex enough to do everything I need, with a design so stripped-down it can't be simpler. I occasionally check on WoodpeckerCI to see if it's reached parity with Drone. Now that AI coding is a thing, hopefully that'll happen soon.

  • WatchDog 2 hours ago

    I agree with all the points made about GH actions.

    I haven't used as many CI systems as the author, but I've used GH Actions, GitLab CI, and CodeBuild, and spent a lot of time with Jenkins.

    I've only touched Buildkite briefly 6 years ago, at the time it seemed a little underwhelming.

    The CI system I enjoyed the most was TeamCity, sadly I've only used it at one job for about a year, but it felt like something built by a competent team.

    I'm curious what people who have used it over a longer time period think of it.

    I feel like it should be more popular.

    • jamesfinlayson 14 minutes ago

      I used TeamCity for a while and it was decent - I'm sure defining pipelines in code must be possible but the company I worked at seemed to have made this impossible with some in-house integration with their version control and release management software.

    • dreamteam1 an hour ago

      TC is probably the best console runner there is and I agree, it made CI not suck. It is also possible to make it very fast, with a bit of engineering and by hosting it on your own hardware. Unfortunately it’s as legacy as Jenkins today. And in contrast to Jenkins, it’s not open source or free, and many parts of it, like the scheduler/orchestrator, are not pluggable.

      But I don’t know about competent people, reading their release notes always got me thinking ”how can anyone write code where these bugs are even possible?”. But I guess that’s why many companies just write nonsense release notes today, to hide their incompetence ;)

  • kdazzle an hour ago

    Pretty sure someone at MS told me that Actions was rewritten by the team who wrote Azure DevOps. So bureaucracy would be a feature.

    That aside, GH Actions doesn’t seem any worse than GitLab. I forget why I stopped using CircleCI. Price maybe? I do remember liking the feature where you could enter the console of the CI job and run commands. That was awesome.

    I agree though that yaml is not ideal.

  • dec0dedab0de an hour ago

    I just can't stand using a build system tied to the code host. And that is really because I have an aversion to vendor lock-in.

    webhooks to an external system was such a better way to do it, and somehow we got away from that, because they don't want us to leave.

    webhooks are to podcasts as github actions are to the things that spotify calls podcasts.
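For the "webhooks to an external system" setup, the receiving side mostly needs to authenticate each delivery before triggering a build. GitHub signs webhook payloads with HMAC-SHA256 using the webhook's secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. A minimal standard-library sketch; the `WEBHOOK_SECRET` variable name, the port, and what happens after verification are assumptions.

```python
import hashlib
import hmac
import json
import os
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def verify_github_signature(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), header.encode())  # constant-time


class WebhookHandler(BaseHTTPRequestHandler):
    secret = os.environ.get("WEBHOOK_SECRET", "").encode()  # hypothetical variable name

    def do_POST(self) -> None:
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if not verify_github_signature(self.secret, body, self.headers.get("X-Hub-Signature-256")):
            self.send_response(401)
            self.end_headers()
            return
        event = self.headers.get("X-GitHub-Event", "")
        payload = json.loads(body or b"{}")
        # Hand off to your own build system here; this sketch only logs the event.
        print(f"{event}: {payload.get('ref')} @ {payload.get('after')}", flush=True)
        self.send_response(202)
        self.end_headers()


if __name__ == "__main__":
    if not WebhookHandler.secret:
        print("set WEBHOOK_SECRET first", file=sys.stderr)  # refuse to run unauthenticated
    else:
        HTTPServer(("", 8080), WebhookHandler).serve_forever()
```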

  • ed_mercer an hour ago

    nods. nods again. Yep, this is exactly why we left GitHub for GitLab two years ago. Not one moment of regret.

    Still, I wonder who is still looking manually at CI build logs. You can use an agent to look for you, and immediately let it come up with a fix.

    • riffraff 15 minutes ago

      GitHub has an integrated "let copilot look at the logs and figure out the issue" and I swear it has never worked once for me.

  • burnto 30 minutes ago

    Is it great? No. Is it usually good enough? Yes. CI shouldn’t be a main quest for most engineers. Just get it rolling early and adjust as needed.

  • apothegm 3 hours ago

    This is roughly how I feel about CloudFormation. May we please have Terraform back? Ansible, even?

    • bigstrat2003 17 minutes ago

      Why not just use Terraform, if you prefer that?

    • anttiharju 2 hours ago

      I think CDK is the one to use nowadays. Infrastructure as real code.

      • staticassertion 2 hours ago

        The worst part about CDK is, by far, that it's still backed by Cloudformation.

        • anttiharju 2 hours ago

          What pains are you experiencing? CDK has far exceeded Ansible and Terraform in my experience.

          • kortex an hour ago

            Hooo boy where do I begin? Dependency deadlocks are the big one - you try to share resource attributes (eg ARN) from one stack to another. You remove the consumer and go to deploy again. The producer sees no more dependency so it prunes the export. But it can't delete the export, cause the consumer still needs it. You can't deploy the consumer, because the producer has to deploy first sequentially. And if you can't delete the consumer (eg your company mandates a CI pipeline deploy for everything) you gotta go bug Ops on slack, wait for someone who has the right perms to delete it, then redeploy.

            You can't actually read real values from Parameters/exports (you get a token placeholder) so you can't store JSON then read it back and decode (unless in same stack, which is almost pointless). You can do some hacks with Fn:: though.

            Deploying certain resources that have names specified (vs generated) often breaks because it has to create the new resource before destroying the old one, which it can't, because the name conflicts (it's the same name...cause it's the same construct).

            It's wildly powerful though, which is great. But we have basically had to create our own internal library to solve what should be non-problems in an IaC system.

            Would be hilarious if my coworker stumbled upon this. I know he reads hn and this has been my absolute crusade this quarter.
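The export deadlock described above can be reproduced without AWS at all. The toy model below encodes only two rules, both simplifications: the producer stack deploys before the consumer, and an export can't be removed while a deployed stack still imports it (CloudFormation refuses to delete an in-use export). Class and method names are hypothetical; this is not the CDK or CloudFormation API.

```python
from dataclasses import dataclass, field
from typing import Set


class DeployError(Exception):
    pass


@dataclass
class Account:
    exports: Set[str] = field(default_factory=set)  # exports the producer has deployed
    imports: Set[str] = field(default_factory=set)  # imports the consumer has deployed

    def deploy_producer(self, new_exports: Set[str]) -> None:
        in_use = (self.exports - new_exports) & self.imports
        if in_use:
            raise DeployError(f"cannot delete export(s) in use: {sorted(in_use)}")
        self.exports = set(new_exports)

    def deploy_consumer(self, new_imports: Set[str]) -> None:
        missing = new_imports - self.exports
        if missing:
            raise DeployError(f"no such export(s): {sorted(missing)}")
        self.imports = set(new_imports)


def deploy_app(acct: Account, producer_exports: Set[str], consumer_imports: Set[str]) -> None:
    # Dependency order: the producer stack always deploys before the consumer.
    acct.deploy_producer(producer_exports)
    acct.deploy_consumer(consumer_imports)


if __name__ == "__main__":
    acct = Account(exports={"BucketArn"}, imports={"BucketArn"})
    # One-shot removal: the now-unused export is pruned, so the producer update fails.
    try:
        deploy_app(acct, producer_exports=set(), consumer_imports=set())
    except DeployError as err:
        print("one-shot:", err)
    # Two-phase fix: keep the export for one deploy, then drop it.
    deploy_app(acct, producer_exports={"BucketArn"}, consumer_imports=set())
    deploy_app(acct, producer_exports=set(), consumer_imports=set())
    print("two-phase: ok", acct)
```

No circular dependency is needed; deploy ordering plus the in-use check is enough. The usual way out is the two-phase deploy shown: keep the export alive explicitly for one deploy (CDK's `Stack.exportValue` exists for this), then remove it.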

            • otterley 24 minutes ago

              > Dependency deadlocks are the big one - you try to share resource attributes (eg ARN) from one stack to another. You remove the consumer and go to deploy again. The producer sees no more dependency so it prunes the export.

              I’m a little puzzled. How are you getting dependency deadlocks if you’re not creating circular dependencies?

              Also, exports in CloudFormation are explicit. I don’t see how this automatic pruning would occur.

              > Deploying certain resources that have names specified (vs generated) often breaks

              CDK tries to prevent this antipattern from happening by default. You have to explicitly make it name something. The best practice is to use tags to name things, not resource names.

          • staticassertion 39 minutes ago

            I'll just echo the other poster with "deadlocks". It's obscene how slow CF is, and the fact that its failure modes often leave you in a state that feels extremely dangerous. I've had to contact AWS Support before due to CF locking up in an irrecoverable way due to cycles.

  • N_Lens an hour ago

    I matured as an Engineer using various CI tools and discovering hands-on that these tools are so unreliable (pipes often failing inconsistently). I am surprised to find that there are better systems, and I'd like to learn more.

  • dcchuck 2 hours ago

    I was excited for actions because it was “next to” my source code.

    I (tend to) complain about actions because I use them.

    Open to someone telling me there is a perfect solution out there. But today my actions fixes were not actions related. Just maintenance.

  • october8140 2 hours ago

    I have not had this experience. It sounds like a bad process rather than GitHub’s fault. I’ve always had GitHub Actions double-checking the same checks I run locally before pushing.

  • ZeWaka an hour ago

    I think this author would benefit from using the Refined GitHub browser extension, which fixes a lot of these problems.

  • esafak 2 hours ago

    Declarative (a la bazel and garnix) is obviously the way to go, but we're still living in the s̶t̶o̶n̶e̶ YAML age.

  • verdverm 2 hours ago

    I agree with the gripes, but buildkite is not the answer

    If I cannot fully self host an open source project, it is not a contender for my next ci system

  • rvz 2 hours ago

    > If you’re a small team with a simple app and straightforward tests, it’s probably fine. I’m not going to tell you to rip it out.

    > But if you’re running a real production system, if you have a monorepo, if your builds take more than five minutes, if you care about supply chain security, if you want to actually own your CI: look at Buildkite.

    This is exactly in line with what I said in 2020 [0] about GitHub vs self-hosting. Not a big deal for individuals, but for large businesses it's a problem if you can't push that critical change because your CI is down every week.

    [0] https://news.ycombinator.com/item?id=22867803

    • BoorishBears 2 hours ago

      I know this is off topic, but that homepage is a piece of work: https://buildkite.com

      I get it's quirky, but I'm at a low energy state and just wanted to know what it does...

      Right before I churned out, I happened to click "[E] Exit to classic Buildkite" and get sent to their original homepage: https://buildkite.com/platform/

      It just tells you what Buildkite does! Sure, it looks like default B2B SaaS, but more importantly it's clear. "The fastest CI platform" instead of some LinkedIn-slop manifesto.

      If I want to know why it's fast, I scroll down and learn it scales to lots of build agents and has unlimited parallelism!

      And if I wonder if it plays nice with my stack, I scroll and there's logos for a bunch of well known testing frameworks!

      And if I want to know if this isn't v0.0001 pre-alpha software by a pre-seed company spending runway on science-fair home pages, this one has social proof that isn't buried in a pseudo-intellectual rant!

      -

      I went down the rabbit hole of what led to this and it's... interesting, to say the least.

      https://medium.com/design-bootcamp/nothing-works-until-you-m...

      https://www.reddit.com/r/branding/comments/1pi6b8g/nothing_w...

      https://www.reddit.com/r/devops/comments/1petsis/comment/nsm...

      • mitchjj an hour ago

        Hello mate, Head of Brand and Design at BK here. Thanks for the feedback, genuinely; the homepage experiment has been divisive, in a great way. Some folk love it, some folk hate it, some just can't be bothered with it. All fair.

        Glad that the classic site hit the mark, but there's a lot of work to do to make that clearer than it is; we're working on the next iteration, which will sunset the CLI homepage into an easter egg.

        Happy to take more critique, either on the execution or the rabbit hole.

        • BoorishBears 33 minutes ago

          Great of you to accept critiques, but I don't think there's anything more I can add.

          You brought up Planetscale's markdown homepage rework in one of those posts and I actually think it's great... but it's also clear, direct, and has no hidden information.

          I'd love to see what happens to conversions once you retire this to an Easter Egg.

  • gchamonlive 2 hours ago

    > You’ve upgraded the engine but you’re still driving the car that catches fire when you turn on the radio.

    And fixing the pyro-radio bug will bring other issues, for sure, so they won't, because someone's workflow will rely on the fact that turning on the radio sets the car on fire: https://xkcd.com/1172/

  • tayo42 2 hours ago

    The internet makes me feel like the only person that doesn't mind Jenkins. Idk it just gets the job done ime.

    • jamesfinlayson 9 minutes ago

      I used Jenkins for years at a previous job - for the longest time it was a confusing mess of pipelines coupled with being a fairly outdated version.

      Once it was updated to latest and all the bad old manually created jobs were removed it was decent.

    • bigstrat2003 16 minutes ago

      Nah I don't mind Jenkins either. I think it's unpopular because you can definitely turn it into a monstrosity, and I think a lot of people have only seen it in that state.

  • cratermoon an hour ago

    “Microsoft is where ambitious developer tools go to become enterprise SKUs”

    It’s hard to remember, sometimes, that Microsoft was one of the little gadflies that buzzed around annoying the Big Guys.

  • CSSer an hour ago

    I hate to say this. I can't even believe I am saying it, but this article feels like it was written in a different universe where LLMs don't exist. I understand they don't magically solve all of these problems, and I'm not suggesting that it's as simple as "make the robot do it for you" either.

    However, there are very real things LLMs can do that greatly reduce the pain here. Understanding 800 lines of bash is simply not the bogeyman it used to be a few years ago. It completely fits in context. LLMs are excellent at bash. With a bit of critical thinking when it hits a wall, LLM agents are even great at GitHub Actions.

    The scariest thing about this article is the number of things it's right about. Yet my uncharacteristic response to that is one big shrug, because frankly I'm not afraid of it anymore. This stuff has never been hard, or maybe it has. Maybe it still is for people/companies who have super complex needs. I guess we're not them. LLMs are not solving my most complex problems, but they're killing the pain of glue left and right.

    • otterley 17 minutes ago

      The flip side of your argument is that it no longer matters how obtuse, complicated, baroque, brittle, underspecified, or poorly documented software is anymore. If we can slap an LLM on top of it to paper over those aspects, it’s fine. Maybe efficiency still counts, but only when it meaningfully impacts individual spend.

  • xyst 2 hours ago

    > this is a product made by one of the richest companies on earth.

    nit: no, it was made by a group of engineers that loved git and wanted to make a distributed remote git repository. But it was acquired/bought out then subsequently enshittified by the richest/worst company on earth.

    Otherwise the rest of this piece vibes with me.

  • slackfan 3 hours ago

    All CI is just various levels of bullshit over a bash script anyway.