Monorepo – Our Experience

(ente.io)

41 points | by vishnumohandas 5 hours ago

50 comments

  • CharlieDigital 2 hours ago

        > Moving to a monorepo didn't change much, and what minor changes it made have been positive.
    
    I'm not sure that this statement in the summary jibes with this statement from the next section:

        > In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
        > 
        > Now, it is a single one. Easy to review, easy to merge, easy to revert.
    
    IMO, this is a huge quality-of-life improvement and prevents a lot of mistakes caused by not having the right revision synced down across different repos. This alone is a HUGE win: a dev can no longer accidentally end up with one repo on this branch while forgetting to pull another repo at the same branch, and then hit weird issues because of that basic hassle.

    When I've encountered this, we've had to keep yet another repo just for the scripts that managed the others. But this was also sometimes problematic, because each developer's setup had to be identical on their local file system (for the scripts to work), or we each had to create a config file pointing to where each repo lived.

    This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.

    • audunw 39 minutes ago

      There’s nothing preventing you from having a single pull request for merging branches across multiple repos. There’s nothing preventing you from having a parent repo with a lock file that gives you a single linear set of commits tracking the state of multiple repos.

      That is, if you’re not tied to using just Github of course.

      Big monorepos and multiple repo solutions require some tooling to deal with scaling issues.

      What surprises me is the attitude that monorepos are the right solution to these challenges. For some projects it makes sense yes, but it’s clear to me that we should have a solution that allows repositories to be composed/combined in elegant ways. Multi-repository pull requests should be a first class feature of any serious source code management system. If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.

      • CharlieDigital 26 minutes ago

            > Multi-repository pull requests should be a first class feature of any serious source code management system. 
        
        But it's currently not?

            > If you start two projects separately and then later find out you need to combine their history and work with them as if they were one repository, you shouldn’t be forced to restructure the repositories.
        
        It's called a directory copy. Cut + paste. I'd add a tag with a comment pointing to the old repo (if needed). But probably after a few weeks, no one is going to look at the old repo.
      • pelletier 31 minutes ago

        > Multi-repository pull requests should be a first class feature of any serious source code management system.

        Do you have examples of source code management systems that provide this feature, and do you have experience with them? The repo-centric approach of GitHub often feels limiting.

    • taeric 43 minutes ago

      My only counterargument here is when those 4 things deploy independently. Sometimes people get tricked into thinking a code change is atomic because it is in one commit, when it will actually lead to a mixed fleet because of deployment realities. In that world, having them separate is easier to work with, as you may have to revert one of the deployments separately from the others.

    • notwhereyouare 2 hours ago

      Ironically, I was gonna come and comment on that same second block of text.

      We went from monorepo to multi-repo at work because it's what our contractors recommended, and it's been a huge setback and disappointment for the devs.

      I've asked for a code deploy and everything, and it's failed in prod due to a missing check-in.

      • CharlieDigital 2 hours ago

            > ...because it's what our contractors recommended
        
        It's sad when this happens instead of taking input from the team on how to actually improve productivity/quality.

        A startup I joined started with a multi-repo setup because the senior team came from a FAANG where it was common practice to have multiple services and a repo for each service.

        Problem was that it was a startup with one team of 6 devs and each of the pieces was connected by REST APIs. So now any change to one service required deploying that service and pulling down the OpenAPI spec to regenerate client bindings. It was so clumsy and easy to make simple mistakes.

        I refactored the whole thing in one weekend into a monorepo, collapsed the handful of services into one service, and we never looked back.

        That refactoring and a later paper out of Google actually inspired me to write this article as a practical guide to building a "modular monolith": https://chrlschn.dev/blog/2024/01/a-practical-guide-to-modul...

        • eddd-ddde an hour ago

          At least Google and Meta are heavy into monorepos; I'm really curious what company is using a _repo per service_. That's insane.

          • pc86 an hour ago

            It can make sense when you have a huge number of devs and different teams responsible for everything, where you may be on multiple teams and nobody else is responsible for exactly the same set of services you are. Depending on the security/access provisioning culture of the org, "taking half a day to manually grant access to the repos so-and-so needs access to" may actually be an easier sell than "give everyone access to all our code."

            If you just have 20-30 devs and everyone is pretty silo'd (e.g. frontend or backend, data or API, etc.), having 75 repos for your stuff is just silly.

          • jgtrosh an hour ago

            My team implemented (and reimplemented!) a project using one repo per module. I think the main benefit was ensuring enough separation of concerns, due to the burden of changing multiple parts together. I managed to reduce something like 10 repos down to 3... Work in progress.

          • bobnamob an hour ago

            Amazon uses "repo per service" and it is semi insane, but Brazil (the big ol' internal build system) and Coral (the internal service framework) make it "workable".

            As someone who worked in the dev tooling org, getting teams to keep their deps up to date was a nightmare.

            • bluGill an hour ago

              Monorepos and multi-repos both come with their own need for teams to work on dev tooling when the project gets large.

          • dewey an hour ago

            It's almost never a good idea to take inspiration from what Google / Meta / Huge Company is doing, as most of the time you don't have their problems, and they have custom tooling and teams making everything work at that scale.

            • CharlieDigital an hour ago

              In this case, I'd say it's the opposite: the monorepo approach works amazingly well for small teams all the way up to huge orgs (with the right tooling to support it).

              The difference is that past a certain level of complexity, the org will most certainly need specialized tooling to support massive codebases to make CI/CD (build, test, deploy, etc.) times sane.

              On the other hand, multi-repos may work for massive orgs, but they are always going to add friction for small orgs.

              • dewey 43 minutes ago

                In this case I wasn't even referring to monorepos or not, but more to the idea of taking inspiration from very large companies for your own not-large-company problems.

      • jayd16 31 minutes ago

        If prod went down because of a missing check in, there are other problems.

  • xyzzy_plugh 2 hours ago

    Without indicating my personal feelings on monorepo vs polyrepo, or expressing any thoughts about the experience shared here, I would like to point out that open-source projects have different and sometimes conflicting needs compared to proprietary closed-source projects. The best solution for one is sometimes the extreme opposite for the other.

    In particular, many build pipelines involving private sources or artifacts become drastically more complicated than those of their publicly available counterparts.

  • memsom 2 hours ago

    Monorepos are appropriate for a single project with many sub-parts but only one or two artifacts in any given release build. But they fall apart when you have multiple products in the monorepo, each with a different release schedule.

    As soon as you add a second, separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end-user products", it becomes even more imperative to consider breaking things up.

    Having worked on monorepos with 30+ artifacts and multiple ongoing projects that each pull the monorepo into different, incompatible versions, all of which have their own lifetime and their own release cycle - a monorepo is the antithesis of a good idea.

    • munksbeer an hour ago

      No offense, but I think you're doing monorepos wrong. We have more than 100 applications living in our monorepo. They share common core code, some common signals, common utility libs, and all of them share the same build.

      We release everything weekly, and some things much more frequently.

      If your testing is good enough, I don't see what the issue is?

      • bluGill 29 minutes ago

        > If your testing is good enough, I don't see what the issue is?

        Your testing isn't good enough. I don't know who you are, what you are working on, or how much testing you do, but I will state with confidence it isn't good enough.

        It might be acceptable for your current needs, but you will have bugs that escape testing - often intentionally, since you can't stop forever to fix every known bug. In turn, that means that if anything changes in your current needs, you will run into issues.

        > We release everything weekly, and some things much more frequently.

        This is a negative for users. When you think you will release again soon anyway - so who cares about bugs - your users end up seeing more bugs. Sure, it is nice that you don't have to break open years-old code anymore, but if the new stuff doesn't have anything the user wants, is this really a good thing?

  • gregmac 2 hours ago

    To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.

    If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.

    The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).

    Consider what happens with a monorepo with parts of it being deployed individually. You can't checkout any specific commit and mirror what's in production. You could make multiple copies of the repo, checkout a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.

    TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.

    • CharlieDigital an hour ago

      It doesn't have to be that way.

      You can have a mono-repo and deploy different parts of the repo as different services.

      You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?

      • Falimonda 37 minutes ago

        This is spot on. A monorepo can still include granular and standardized CI configuration across code paths. Nothing about a monorepo forces you to perform a single deployment.

        The gains provided by moving from polyrepo to monorepo are immense.

        Developer access control is the only thing I can think of that justifies polyrepo.

        I'm curious if and how others who see the advantages of monorepo have justified polyrepo in spite of that.

      • oneplane 43 minutes ago

        You wouldn't, but making a repo collection into a mono-repo means your mono-deploy needs to be split into a multi-maybe-deploy.

        As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.

        It is something that can be made better, of course: having your CI and CD be a bit smarter and more modular means you can do selective builds based on what actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).

        But all of that needs to be constructed too, just merging some repos into one doesn't do that.

      • bryanlarsen 39 minutes ago

        If you don't deploy in tandem, you need to test forwards & backwards compatibility. That's tough with either a monorepo or separate repos, but arguably it's simpler with separate repos.

        • CharlieDigital 31 minutes ago

          It doesn't have to be that complicated.

          All you need to know is "does changing this code affect that code".

          In the example I've given -- a React SPA and Go backend -- let's assume that there's a gRPC binding originating from the backend. How do we know that we also need to deploy the SPA? Updating the schema would cause generation of a new client + model in the SPA. Now you know that you need to deploy both, and this can be done simply by detecting the roots of the modified files.

          You can scale this. If that gRPC change affected some other web extension project, apply the same basic principle: detect that a file changed under this root -> trigger the workflow that rebuilds, tests, and deploys from this root.
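
          Roughly what that looks like with GitHub Actions path filters (a sketch -- the workflow name and paths here are made up for illustration):

              # .github/workflows/spa-deploy.yml
              name: "SPA - Build and Deploy"

              on:
                push:
                  branches: [main]
                  paths:
                    - "web/spa/**"   # includes the generated gRPC client/models

              jobs:
                deploy:
                  runs-on: ubuntu-latest
                  steps:
                    - uses: actions/checkout@v4
                    # build, test, and deploy only the SPA from here

          The Go backend gets a sibling workflow keyed on its own root, so a schema change that regenerates the SPA client touches both roots and triggers both deploys.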

    • aswerty an hour ago

      This mirrors my own experience in the SaaS world. Anytime things move towards multiple artifacts/pipelines in one repo, trying to understand what change existed where and when always seems to become very difficult.

      Of course the multirepo approach means you do this dance a lot more:

      - Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used)

      - Update upstream systems to the new change

      - Remove backwards compatibility and pray you don't have a low frequency upstream service interaction you didn't know about

      While the dance can be a pain, it does follow a more iterative approach with reduced blast radii (albeit many more of them). But, all in all, an acceptable tradeoff.

      Maybe if I had more familiarity with mature tooling around monorepos I might be more interested in them. But alas, not a bridge I have crossed, or am pushed to cross, just at the moment.

  • siva7 2 hours ago

    Ok, but the more interesting part - how did you solve the CI/CD part and how does it compare to a multirepo?

    • devjab 2 hours ago

      I don’t think CI/CD should really be a big worry as far as mono-repositories go, as you can set up different pipelines and different flows with different configurations - something you’re probably already doing if you have multiple repos.

      In my experience the article is right when it tells you there isn’t that big of a difference. We have all sorts of repositories, some of which are basically mono-repositories for their business domain. We tend to separate where it “makes sense”, which for us means when what we put into a repository is completely separate from everything else. We used to have a lot of micro-repositories, and it wasn’t that different to be honest. We grouped more of them together to make it easier for us to stay DORA compliant, in terms of the bureaucracy it adds to the documentation burden. Technically, I hardly notice the difference.

      • JamesSwift an hour ago

        In my limited-but-not-nothing experience working with mono vs multi repo for the same projects, CI/CD was definitely one of the harder pieces to solve. It's highly dependent on your frameworks and CI provider just how straightforward it is going to be, and most of them are "not very straightforward".

        The basic way most of them work is to run full CI on every change. This quickly becomes a huge speed bump for deployment velocity until a solution for "only run what is affected" is found.

        • devjab 31 minutes ago

          Which CI/CD pipelines have you had issues with? Because that isn’t my experience at all. With both GitHub (also Azure DevOps) and GitLab you can separate your pipelines with configurations like .gitlab-ci.yml. I guess it can be non-trivial to set up proper parallelisation when you have a lot of build stages, if this isn’t something you’re familiar with. With a lot of other, more self-hosted tools like Gradle, RushJS and many others, you can set up configurations which do X if Y and make sure to only run the things that are necessary.
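
          For GitLab specifically, the path-scoped part is just `rules: changes:` (a sketch; the job names and paths are made up):

              # .gitlab-ci.yml (sketch)
              build-backend:
                stage: build
                script:
                  - make -C services/backend build
                rules:
                  - changes:
                      - services/backend/**/*

              build-frontend:
                stage: build
                script:
                  - npm --prefix web/frontend run build
                rules:
                  - changes:
                      - web/frontend/**/*

          Each job then only runs when something under its own directory changed; the rest of the pipeline is skipped.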

          I don’t want to be rude, but a lot of these tools have rather accessible documentation on how to get up and running, as well as extensive documentation for more complex challenges in their official docs - which is probably the only place you’ll find good ways of working with them, because a lot of the search engine and LLM “solutions” will range from horrible to outdated.

          In my experience it can be both slower and faster than micro-repositories; however, you’re right that it can indeed be a Cthulhu-level speed bump if you do it wrong.

        • bluGill 39 minutes ago

          The problem with "only run what is affected" is that it is really easy to have something that is affected but doesn't seem like it should be (that is, whatever tools you have for detecting whether it is affected say it isn't). So if you have such a system, you must also have regular rebuild-everything jobs to verify you didn't break something unexpected.
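
          For example, a scheduled "build the world" job that ignores the path filters entirely (a sketch using GitHub Actions syntax; the names are made up):

              # Nightly safety net: rebuild and retest everything,
              # regardless of what the per-path workflows skipped.
              name: "Nightly - Full Build"

              on:
                schedule:
                  - cron: "0 3 * * *"   # every night at 03:00 UTC
                workflow_dispatch:      # allow manual runs too

              jobs:
                full-build:
                  runs-on: ubuntu-latest
                  steps:
                    - uses: actions/checkout@v4
                    # build and run the full test suite for every root here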

          I'm not against "only run what is affected"; it is a good answer. It just has failings that you need to be aware of.

    • CharlieDigital 2 hours ago

      Most CI/CD platforms will allow specification of targeted triggers.

      For example, in GitHub[0]:

          name: ".NET - PR Unit Test"
          
          on:
            ## Only execute these unit tests when a file in this directory changes.
            pull_request:
              branches: [main]
              paths: [src/services/publishing/**.cs, src/tests/unit/**.cs]
      
      So we set up different workflows that kick off based on the sets of files that change.

      [0] https://docs.github.com/en/actions/writing-workflows/workflo...

      • victorNicollet 2 hours ago

        I'm not familiar with GitHub Actions, but we reverted our migration to Bitbucket Pipelines because of a nasty side-effect of conditional execution: if a commit triggers test suite T1 but not T2, and T1 is successful, Bitbucket displays that commit with a green "everything is fine" check mark, regardless of the status of T2 on any ancestors of that commit.

        That is, the green check mark means "the changes in this commit did not break anything that was not already broken", as opposed to the more useful "the repository, as of this commit, passes all tests".

        • plorkyeran 44 minutes ago

          I would find it extremely confusing and unhelpful if tests that failed in the parent commit, and weren't rerun for a PR because nothing relevant was touched, marked the PR as red. Why would you even want that? That's not something which is relevant to evaluating the PR, and it would make you get in the habit of ignoring failures.

          If you split something into multiple repositories then surely you wouldn't mark PRs on one of them as red just because tests are failing in a different one?

        • ants_everywhere an hour ago

          Isn't that generally what you want? The check mark tells you the commit didn't break anything. If something was already broken, it should have either blocked the commit that broke it, or there's a flake somewhere that you can only locate by periodically running tests independent of any PR activity.

        • daelon an hour ago

          Is it a side effect if it's also the primary effect?

      • hk1337 an hour ago

        Even AWS CodeBuild (or CodePipeline) allows you to do this now. It didn't before but it's a fairly recent update.

    • victorNicollet an hour ago

      Wouldn't CI be easier with a monorepo? Testing integration across multiple repositories (triggered by changes in any of them) seems more complex than just adding another test suite to a single repo.

      • bluGill 38 minutes ago

        Pros and cons. Both can be used successfully, but each comes with different problems. If you have a large project, you will end up with a tooling team to deal with the problems of whichever solution you picked.

  • h1fra 2 hours ago

    I think the big issue with monorepos is when a company puts completely different projects together inside a single repo.

    In this article almost everything makes sense to me (because that's what I have been doing for most of my career), but they put their OTP app inside, which suddenly makes no sense. And you can see the problem in the CI: they have dedicated files just for this app, and probably very little common code with the rest.

    IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.

    • fragmede 2 hours ago

      > you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project)

      that's not a monorepo!

      Unless the singular "project" is "stuff our company ships", the problem you have is an impedance mismatch between the projects, which is the problem that an actual monorepo solves. For SWEs on individual projects who will never have the problem of having to ship a commit to all the repos at the "same" time, yeah, that seems fine, and for them it is. The problem comes as a distributed systems engineer where, for whatever reason, many or all of the repos need to be shipped at the ~same time. Or worse - A needs to ship before B, which needs to ship before C, but that needs to ship before A, and you have to unwind that before actually being able to ship the change.

      • hk1337 an hour ago

        > that's not a monorepo!

        Sure it is! It's just not the ideal use case for a monorepo which is why people say they don't like monorepos.

  • magicalhippo 2 hours ago

    We're transitioning from an SVN monorepo to Git. We've considered doing a kind of best-of-both-worlds approach.

    Some core stuff goes into separate libraries, consumed as NuGet packages by other projects. Those libraries and other standalone projects live in separate repos.

    Then a "monorepo" for our main product, where individual projects for integrations etc. will reference the non-NuGet libraries directly.

    That is, tightly coupled code goes into the monorepo, the rest in separate repos.

    Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.

  • syndicatedjelly an hour ago

    Some thoughts:

    1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses a monorepo doesn't mean you should too.

    2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?

    3) Can these issues with sync'ing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple of times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project.
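
    For context, the workflow I have in mind is roughly this (a sketch with made-up repo names):

        # The parent repo pins each child repo to an exact commit
        git submodule add https://github.com/some-org/shared-lib libs/shared-lib
        git commit -m "Pin shared-lib"

        # Collaborators get exactly the pinned revisions
        git clone --recurse-submodules https://github.com/some-org/app
        # ...or, in an existing checkout:
        git submodule update --init --recursive

        # Bumping the pin is an explicit, reviewable commit
        git -C libs/shared-lib pull origin main
        git add libs/shared-lib
        git commit -m "Bump shared-lib"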

    • datadrivenangel 42 minutes ago

      Monorepo is just a single repo. Yup.

      Git submodules have some places where you can surprisingly lose branches/stashed changes.

      • syndicatedjelly 37 minutes ago

        One of my repos has a dependency on another repo (that I also own). I initialized it as a git submodule (e.g. my_org/repo1 has a submodule of my_org/repo2).

            > Git submodules have some places where you can surprisingly lose branches/stashed changes.
        
        This concerns me, as git generally behaves as a leak-proof abstraction in my experience. Can you elaborate or share where I can learn more about this issue?
    • klooney 36 minutes ago

      > Does it just mean a single version-controlled repository of code?

      Yeah - the idea is that all of your projects share a common repo. This has advantages and drawbacks. Google is most famous for this approach, although I think they technically have three now - one for Google, one for Android, and one for Chrome.

      > They seem like a great solution to me

      They don't work in a team context because they're extra steps that people don't do, basically. And for some reason a lot of people find them confusing.

      • nonameiguess a minute ago

        https://github.com/google/ contains 2700+ repositories. I don't necessarily know how many of these are read-only clones from an internal monorepo versus how many are separate projects that have actually been open-sourced, but the latter is more than zero.