Monorepo – Our Experience

(ente.io)

41 points | by vishnumohandas 5 hours ago

50 comments

  • CharlieDigital 2 hours ago

        > Moving to a monorepo didn't change much, and what minor changes it made have been positive.
    
    I'm not sure that this statement in the summary jibes with this statement from the next section:

        > In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
        > 
        > Now, it is a single one. Easy to review, easy to merge, easy to revert.
    
    IMO, this is a huge quality-of-life improvement and prevents a lot of mistakes caused by not having the right revision synced down across different repos. This alone is a HUGE win: a dev can no longer accidentally end up with one repo on this branch while forgetting to pull another repo at the same branch, and then hit weird issues because of that basic hassle.

    When I've encountered this, we've had to keep yet another repo just for the scripts that managed the others. But this was also sometimes problematic, because each developer's setup had to be identical on their local file system (for the scripts to work), or we each had to create a config file pointing to where each repo lived.

    This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.

    • audunw 39 minutes ago

      There’s nothing preventing you from having a single pull request for merging branches across multiple repos. There’s nothing preventing you from having a parent repo with a lock file that gives you a single linear set of commits tracking the state of multiple repos.

      That is, if you’re not tied to using just Github of course.

      Big monorepos and multiple repo solutions require some tooling to deal with scaling issues.

      What surprises me is the attitude that monorepos are the right solution to these challenges. For some projects it makes sense yes, but it’s clear to me that we should have a solution that allows repositories to be composed/combined in elegant ways. Multi-repository pull requests should be a first class feature of any serious source code management system. If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.

      • CharlieDigital 26 minutes ago

            > Multi-repository pull requests should be a first class feature of any serious source code management system. 
        
        But it's currently not?

            > If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.
        
        It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.
      • pelletier 31 minutes ago

        > Multi-repository pull requests should be a first class feature of any serious source code management system.

        Do you have examples of source code management systems that provide this feature, and do you have experience with them? The repo-centric approach of GitHub often feels limiting.

    • taeric 43 minutes ago

      My only counterargument here is when those 4 things deploy independently. Sometimes people get tricked into thinking a code change is atomic because it is in one commit, when it will actually lead to a mixed fleet because of deployment realities. In that world, having them separate is easier to work with, as you may have to revert one of the deployments separately from the others.

    • notwhereyouare 2 hours ago

      Ironically, I was gonna come and comment on that same second block of text.

      We went from monorepo to multi-repo at work because it's what our contractors recommended, and it's been a huge setback and disappointment for the devs.

      I've asked for a code deploy and everything, and it's failed in prod due to a missing check-in.

      • CharlieDigital 2 hours ago

            > ...because it's what our contractors recommended
        
        It's sad when this happens instead of taking input from the team on how to actually improve productivity/quality.

        A startup I joined started with a multi-repo setup because the senior team came from a FAANG where it was common practice to have multiple services and a repo for each service.

        Problem was that it was a startup with one team of 6 devs and each of the pieces was connected by REST APIs. So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.

        I refactored the whole thing in one weekend into a monorepo, collapsed the handful of services into one service, and we never looked back.

        That refactoring and a later paper out of Google actually inspired me to write this article as a practical guide to building a "modular monolith": https://chrlschn.dev/blog/2024/01/a-practical-guide-to-modul...

        • eddd-ddde an hour ago

          At least Google and Meta are heavy into monorepos; I'm really curious what company is using a _repo per service_. That's insane.

          • pc86 an hour ago

            It can make sense when you have a huge number of devs and different teams responsible for everything, where you may be on multiple teams and nobody else is responsible for exactly the same set of services you are. Depending on the security/access provisioning culture of the org, "taking half a day to manually grant access to the repos so-and-so needs access to" may actually be an easier sell than "give everyone access to all our code."

            If you just have 20-30 devs and everyone is pretty silo'd (e.g. frontend or backend, data or API, etc.), having 75 repos for your stuff is just silly.

          • jgtrosh an hour ago

            My team implemented (and reimplemented!) a project using one repo per module. I think the main benefit was ensuring enough separation of concerns, due to the burden of changing multiple parts together. I managed to reduce something like 10 repos down to 3... Work in progress.

          • bobnamob an hour ago

            Amazon uses "repo per service" and it is semi insane, but Brazil (the big ol' internal build system) and Coral (the internal service framework) make it "workable".

            As someone who worked in the dev tooling org, getting teams to keep their deps up to date was a nightmare.

            • bluGill an hour ago

              Monorepos and multi-repos both come with their own need for teams to work on dev tooling when the project gets large.

          • dewey an hour ago

            It's almost never a good idea to take inspiration from what Google / Meta / Huge Company is doing, as most of the time you don't have their problems, and they have custom tooling and teams making everything work at that scale.

            • CharlieDigital an hour ago

              In this case, I'd say it's the opposite: the monorepo approach works amazingly well for small teams all the way up to huge orgs (with the right tooling to support it).

              The difference is that past a certain level of complexity, the org will most certainly need specialized tooling to support massive codebases to make CI/CD (build, test, deploy, etc.) times sane.

              On the other hand, multi-repos may work for massive orgs, but they are always going to add friction for small orgs.

              • dewey 43 minutes ago

                In this case I wasn't even referring to monorepos or not, but more to the idea of taking inspiration from very large companies for your own not-large-company problems.

      • jayd16 31 minutes ago

        If prod went down because of a missing check in, there are other problems.

  • xyzzy_plugh 2 hours ago

    Without indicating my personal feelings on monorepo vs polyrepo, or expressing any thoughts about the experience shared here, I would like to point out that open-source projects have different and sometimes conflicting needs compared to proprietary closed-source projects. The best solution for one is sometimes the extreme opposite for the other.

    In particular, many build pipelines involving private sources or artifacts become drastically more complicated than those of their publicly available counterparts.

  • memsom 2 hours ago

    Monorepos are appropriate for a single project with many sub-parts but only one or two artifacts in any given release build. But they fall apart when you have multiple products in the monorepo, each with a different release schedule.

    As soon as you add a second, separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end-user products", it becomes even more imperative to consider breaking things up.

    Having worked on monorepos with 30+ artifacts and multiple ongoing projects that each pull the monorepo into different, incompatible versions, all of which have their own lifetime and their own release cycle - a monorepo is the antithesis of a good idea.

    • munksbeer an hour ago

      No offense, but I think you're doing monorepos wrong. We have more than 100 applications living in our monorepo. They share common core code, some common signals, common utility libs, and all of them share the same build.

      We release everything weekly, and some things much more frequently.

      If your testing is good enough, I don't see what the issue is?

      • bluGill 29 minutes ago

        > If your testing is good enough, I don't see what the issue is?

        Your testing isn't good enough. I don't know who you are, what you are working on, or how much testing you do, but I will state with confidence it isn't good enough.

        It might be acceptable for your current needs, but you will have bugs that escape testing - often intentionally, since you can't stop forever to fix every known bug. In turn, that means that if anything changes in your current needs, you will run into issues.

        > We release everything weekly, and some things much more frequently.

        This is a negative for users. When you think you will release again soon anyway - so who cares about bugs - your users end up seeing more bugs. Sure, it is nice that you don't have to break open years-old code anymore, but if the new stuff doesn't have anything the user wants, is this really a good thing?

  • gregmac 2 hours ago

    To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.

    If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.

    The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).

    Consider what happens with a monorepo with parts of it being deployed individually. You can't checkout any specific commit and mirror what's in production. You could make multiple copies of the repo, checkout a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.

    TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.

    • CharlieDigital an hour ago

      It doesn't have to be that way.

      You can have a mono-repo and deploy different parts of the repo as different services.

      You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?

      • Falimonda 37 minutes ago

        This is spot on. A monorepo can still include granular and standardized CI configuration across code paths. Nothing about a monorepo forces you to perform a single deployment.

        The gains provided by moving from polyrepo to monorepo are immense.

        Developer access control is the only thing I can think of that justifies polyrepo.

        I'm curious if and how others who see the advantages of monorepo have justified polyrepo in spite of that.

      • oneplane 43 minutes ago

        You wouldn't, but making a repo collection into a mono-repo means your mono-deploy needs to be split into a multi-maybe-deploy.

        As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.

        It is something that can be made better, of course: having your CI and CD be a bit smarter and more modular means you can do selective builds based on what actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).

        But all of that needs to be constructed too, just merging some repos into one doesn't do that.

      • bryanlarsen 39 minutes ago

        If you don't deploy in tandem, you need to test forwards & backwards compatibility. That's tough with either a monorepo or separate repos, but arguably it's simpler with separate repos.

        • CharlieDigital 31 minutes ago

          It doesn't have to be that complicated.

          All you need to know is "does changing this code affect that code".

          In the example I've given -- a React SPA and Go backend -- let's assume that there's a gRPC binding originating from the backend. How do we know that we also need to deploy the SPA? Updating the schema would cause generation of a new client + model in the SPA. Now you know that you need to deploy both, and this can be done simply by detecting the roots of the modified files.

          You can scale this. If that gRPC change affected some other web extension project, apply the same basic principle: detect that a file changed under this root -> trigger the workflow that rebuilds, tests, and deploys from this root.
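
          Roughly what that looks like with GitHub Actions path filters (a sketch -- the workflow name and paths here are made up for illustration):

              # .github/workflows/spa-deploy.yml
              name: "SPA - Build and Deploy"

              on:
                push:
                  branches: [main]
                  paths:
                    - "web/spa/**"   # includes the generated gRPC client/models

              jobs:
                deploy:
                  runs-on: ubuntu-latest
                  steps:
                    - uses: actions/checkout@v4
                    # build, test, and deploy only the SPA from here

          The Go backend gets a sibling workflow keyed on its own root, so a schema change that regenerates the SPA client touches both roots and triggers both deploys.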

    • aswerty an hour ago

      This mirrors my own experience in the SaaS world. Anytime things move towards multiple artifacts/pipelines in one repo, trying to understand what change existed where and when always seems to become very difficult.

      Of course the multirepo approach means you do this dance a lot more:

      - Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used)

      - Update upstream systems to the new change

      - Remove backwards compatibility and pray you don't have a low frequency upstream service interaction you didn't know about

      While the dance can be a pain, it does follow a more iterative approach with reduced blast radii (albeit many more of them). But, all in all, an acceptable tradeoff.

      Maybe if I had more familiarity with mature tooling around monorepos I might be more interested in them. But alas, not a bridge I have crossed, or am pushed to cross, just at the moment.

  • siva7 2 hours ago

    Ok, but the more interesting part - how did you solve the CI/CD part and how does it compare to a multirepo?

    • devjab 2 hours ago

      I don’t think CI/CD should really be a big worry as far as mono-repositories go, as you can set up different pipelines and different flows with different configurations - something you’re probably already doing if you have multiple repos.

      In my experience the article is right when it tells you there isn’t that big of a difference. We have all sorts of repositories, some of which are basically mono-repositories for their business domain. We tend to separate where it “makes sense”, which for us means when what we put into a repository is completely separate from everything else. We used to have a lot of micro-repositories, and it wasn’t that different to be honest. We grouped more of them together to make it easier for us to stay DORA compliant, in terms of the bureaucracy it adds to the documentation burden. Technically, I hardly notice the difference.

      • JamesSwift an hour ago

        In my limited-but-not-nothing experience working with mono vs multi repo for the same projects, CI/CD was definitely one of the harder pieces to solve. It's highly dependent on your frameworks and CI provider just how straightforward it is going to be, and most of them are "not very straightforward".

        The basic way most of them work is to run full CI on every change. This quickly becomes a huge speed bump for deployment velocity until a solution for "only run what is affected" is found.

        • devjab 31 minutes ago

          Which CI/CD pipelines have you had issues with? Because that isn’t my experience at all. With both GitHub (also Azure DevOps) and GitLab you can separate your pipelines with configurations like .gitlab-ci.yml. I guess it can be non-trivial to set up proper parallelisation when you have a lot of build stages, if this isn’t something you’re familiar with. With a lot of other, more self-hosted tools like Gradle, RushJS and many others, you can set up configurations which do X if Y and make sure to only run the things that are necessary.
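
          For GitLab specifically, the path-scoped part is just `rules: changes:` (a sketch; the job names and paths are made up):

              # .gitlab-ci.yml (sketch)
              build-backend:
                stage: build
                script:
                  - make -C services/backend build
                rules:
                  - changes:
                      - services/backend/**/*

              build-frontend:
                stage: build
                script:
                  - npm --prefix web/frontend run build
                rules:
                  - changes:
                      - web/frontend/**/*

          Each job then only runs when something under its own directory changed; the rest of the pipeline is skipped.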

          I don’t want to be rude, but a lot of these tools have rather accessible documentation on how to get up and running, as well as extensive documentation for more complex challenges in their official docs - which is probably the only place you’ll find good ways of working with them, because a lot of the search engine and LLM “solutions” will range from horrible to outdated.

          In my experience it can be both slower and faster than micro-repositories; however, you’re right that it can indeed be a Cthulhu-level speed bump if you do it wrong.

        • bluGill 39 minutes ago

          The problem with "only run what is affected" is that it is really easy to have something that is affected but doesn't seem like it should be (that is, whatever tools you have for detecting whether it is affected say it isn't). So if you have such a system, you must also have regular rebuild-everything jobs to verify you didn't break something unexpected.
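
          For example, a scheduled "build the world" job that ignores the path filters entirely (a sketch using GitHub Actions syntax; the names are made up):

              # Nightly safety net: rebuild and retest everything,
              # regardless of what the per-path workflows skipped.
              name: "Nightly - Full Build"

              on:
                schedule:
                  - cron: "0 3 * * *"   # every night at 03:00 UTC
                workflow_dispatch:      # allow manual runs too

              jobs:
                full-build:
                  runs-on: ubuntu-latest
                  steps:
                    - uses: actions/checkout@v4
                    # build and run the full test suite for every root here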

          I'm not against "only run what is affected"; it is a good answer. It just has failings that you need to be aware of.

    • CharlieDigital 2 hours ago

      Most CI/CD platforms will allow specification of targeted triggers.

      For example, in GitHub[0]:

          name: ".NET - PR Unit Test"
          
          on:
            ## Only execute these unit tests when a file in this directory changes.
            pull_request:
              branches: [main]
              paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
      
      So we set up different workflows that kick off based on the sets of files that change.

      [0] https://docs.github.com/en/actions/writing-workflows/workflo...

      • victorNicollet 2 hours ago

        I'm not familiar with GitHub Actions, but we reverted our migration to Bitbucket Pipelines because of a nasty side-effect of conditional execution: if a commit triggers test suite T1 but not T2, and T1 is successful, Bitbucket displays that commit with a green "everything is fine" check mark, regardless of the status of T2 on any ancestors of that commit.

        That is, the green check mark means "the changes in this commit did not break anything that was not already broken", as opposed to the more useful "the repository, as of this commit, passes all tests".

        • plorkyeran 44 minutes ago

          I would find it extremely confusing and unhelpful if tests that failed in the parent commit, and weren't rerun for a PR because nothing relevant was touched, marked the PR as red. Why would you even want that? That's not something which is relevant to evaluating the PR, and it would make you get in the habit of ignoring failures.

          If you split something into multiple repositories then surely you wouldn't mark PRs on one of them as red just because tests are failing in a different one?

        • ants_everywhere an hour ago

          Isn't that generally what you want? The check mark tells you the commit didn't break anything. If something was already broken, it should have either blocked the commit that broke it, or there's a flake somewhere that you can only locate by periodically running tests independent of any PR activity.

        • daelon an hour ago

          Is it a side effect if it's also the primary effect?

      • hk1337 an hour ago

        Even AWS CodeBuild (or CodePipeline) allows you to do this now. It didn't before but it's a fairly recent update.

    • victorNicollet an hour ago

      Wouldn't CI be easier with a monorepo? Testing integration across multiple repositories (triggered by changes in any of them) seems more complex than just adding another test suite to a single repo.

      • bluGill 38 minutes ago

        Pros and cons. Both can be used successfully, but each comes with different problems. If you have a large project, you will end up with a tooling team to deal with the problems of whichever solution you picked.

  • h1fra 2 hours ago

    I think the big issue with monorepos is when a company puts completely different projects together inside a single repo.

    In this article almost everything makes sense to me (because that's what I have been doing for most of my career), but they put their OTP app inside, which suddenly makes no sense. And you can see the problem in the CI: they have dedicated files just for this app, and probably very little common code with the rest.

    IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.

    • fragmede 2 hours ago

      > you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project)

      that's not a monorepo!

      Unless the singular "project" is "stuff our company ships", the problem you have is an impedance mismatch between the projects, which is the problem that an actual monorepo solves. For SWEs on individual projects who will never have the problem of having to ship a commit to all the repos at the "same" time, yeah, that seems fine, and for them it is. The problem comes as a distributed systems engineer where, for whatever reason, many or all of the repos need to be shipped at the ~same time. Or worse - A needs to ship before B, which needs to ship before C, but that needs to ship before A, and you have to unwind that before actually being able to ship the change.

      • hk1337 an hour ago

        > that's not a monorepo!

        Sure it is! It's just not the ideal use case for a monorepo which is why people say they don't like monorepos.

  • magicalhippo 2 hours ago

    We're transitioning from an SVN monorepo to Git. We've considered doing a kind of best-of-both-worlds approach.

    Some core stuff goes into separate libraries, consumed as NuGet packages by other projects. Those libraries and other standalone projects live in separate repos.

    Then a "monorepo" for our main product, where individual projects for integrations etc. will reference the non-NuGet libraries directly.

    That is, tightly coupled code goes into the monorepo, the rest in separate repos.

    Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.

  • syndicatedjelly an hour ago

    Some thoughts:

    1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses a monorepo doesn't mean you should too.

    2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?

    3) Can these issues with sync'ing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple of times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project.
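
    For context, the workflow I have in mind is roughly this (a sketch with made-up repo names):

        # The parent repo pins each child repo to an exact commit
        git submodule add https://github.com/some-org/shared-lib libs/shared-lib
        git commit -m "Pin shared-lib"

        # Collaborators get exactly the pinned revisions
        git clone --recurse-submodules https://github.com/some-org/app
        # ...or, in an existing checkout:
        git submodule update --init --recursive

        # Bumping the pin is an explicit, reviewable commit
        git -C libs/shared-lib pull origin main
        git add libs/shared-lib
        git commit -m "Bump shared-lib"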

    • datadrivenangel 42 minutes ago

      Monorepo is just a single repo. Yup.

      Git submodules have some places where you can surprisingly lose branches/stashed changes.

      • syndicatedjelly 37 minutes ago

        One of my repos has a dependency on another repo (that I also own). I initialized it as a git submodule (e.g. my_org/repo1 has a submodule of my_org/repo2).

            > Git submodules have some places where you can surprisingly lose branches/stashed changes.
        
        This concerns me, as git generally behaves as a leak-proof abstraction in my experience. Can you elaborate or share where I can learn more about this issue?
    • klooney 36 minutes ago

      > Does it just mean a single version-controlled repository of code?

      Yeah - the idea is that all of your projects share a common repo. This has advantages and drawbacks. Google is most famous for this approach, although I think they technically have three now - one for Google, one for Android, and one for Chrome.

      > They seem like a great solution to me

      They don't work in a team context because they're extra steps that people don't do, basically. And for some reason a lot of people find them confusing.

      • nonameiguess a minute ago

        https://github.com/google/ contains 2700+ repositories. I don't necessarily know how many of these are read-only clones from an internal monorepo versus how many are separate projects that have actually been open-sourced, but the latter is more than zero.