Monorepos for true CI
On the 15th and 16th of April CITCON Europe took place in Cluj-Napoca (Transylvania). CITCON is an open spaces conference. The agenda is made up by the people attending the conference. Because of this there are always a couple of nice takeaways.
This year Ivan Moore (@ivanrmoore) made the claim that you cannot do CI without using monorepos. A monorepo is simply one repository where all source code ends up. This is contrary to a setup (with Git, for example) where you have a separate repository per module. The idea of having a single (big) repository with all code seems strange at first. After some thorough discussion the merits of monorepos were clear to me.
Consider a scenario with 2 services which depend on a module with some common/shared code. Each service has to manage the version of the common module itself, so the latest changes made to the common module are not integrated with a service until someone bumps the version of the common module in that service.
Not to mention the bookkeeping you’ll have to do when you need to:
- Develop some generic functionality in service 1
- Once it works for service 1, move the code to the common repo
- Build and release the common repo
- Update the version of the common module in service 1
- Eventually, update the version of the common module in service 2
If the last step goes bad, we’ll have to fix the common module, leading to another release. Well, you get the drift. Service 2 is integrating too late.
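To make the late integration concrete, here is a minimal sketch of the bookkeeping in a per-module-repo setup. The service names, version numbers and the helper lagging_services are made up for illustration; the only assumption is that versions are plain dotted numbers.

```python
def parse_version(text):
    """Turn a dotted version such as '1.4.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def lagging_services(pins, latest):
    """Return the services whose pinned common-module version is older than `latest`.

    `pins` maps a (hypothetical) service name to the version it currently pins.
    """
    latest_key = parse_version(latest)
    return sorted(
        name for name, pinned in pins.items()
        if parse_version(pinned) < latest_key
    )


# Service 1 already moved to the new release, service 2 has not integrated it yet.
pins = {"service-1": "1.4.0", "service-2": "1.3.0"}
print(lagging_services(pins, "1.4.0"))
```

Every service that shows up in that list is running against code that is no longer the latest, and nobody finds out whether it still works until its version is bumped.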
If service 1 and 2 and the common module were in the same repo, life becomes simpler: there’s only one version of the common module to take into account: the one on your current branch. This implies that the build will break instantly when an incompatible change has been checked in ( == fast feedback, yay!). Furthermore, it will avoid code duplication: a disjoint common module is everybody’s responsibility => no one will take full ownership. At some point it becomes easier to create functionality and not move it to the common module, but replicate it in service 2.
Having everything in one place makes it easier to refactor code, find duplicated code and even perform structural changes over all modules in your repo.
One very important feature monorepos offer is repeatability: when you rely on snapshot releases, the results may vary over time depending on the current snapshot version. Tagging and releasing every build would fix that, but it eats up lots of resources, since the majority of the resulting binaries are never used.
With monorepos, the version (state) of the application is determined by a commit hash/revision number.
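As a small sketch of what that means for build output: every module can be labelled with the same commit hash, so there is no per-module version to keep in sync. The function artifact_names, the module names and the naming scheme are assumptions for illustration, not something a particular build tool prescribes.

```python
def artifact_names(commit_hash, modules):
    """Label every module's build output with the same (shortened) commit hash."""
    short = commit_hash[:12]
    return {module: f"{module}-{short}" for module in modules}


# Hypothetical commit hash; in a real build you would ask your VCS for it.
names = artifact_names(
    "3f2a9c1d7b6e4f0a8c5d2e1b9a7f6c4d3e2b1a0f",
    ["common", "service1", "service2"],
)
print(names["service1"])
```

Given the hash, anyone can check out exactly the sources that produced all three artifacts.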
It is worth noting that companies like Google and Facebook take this to the extreme: all source code for the whole company ends up in one repo including third party code. I would assume it’s a good start to put all code for your current project in one repo (start small), and leave external dependency resolution as is. This would promote bigger refactorings and improve code reuse while saving you the version upgrade hassle.
So what about the downsides to this approach:
- The repository can become quite large. This mainly slows down the initial checkout. It would probably take more time to check out a bunch of separate repos, though.
- Will the repository get slow? Well, the Linux kernel is maintained in one repo. By the time your code base is that many lines of code (currently about 17M), you can decide if it’s useful to split up the code base or invest in tooling.
- Everybody can change all code. Of course! That’s why you have source control to start with. The important thing is that you can track the code changes and see who made them.
- Everybody is pushing all the code to one repo, hence you’ll have to update/pull/rebase all the time. In practice this does not prove to be a real problem: if you’re using pull requests, the number of commits on the master branch is not that big. If you’re doing trunk based development (👍) you’re integrating everything all the time anyway. Code committed for service 1 rarely overlaps with code committed for service 2, so merge conflicts between them are unlikely.
- You’ll have to set up your CI builds in a smart way, so that not everything is built on every commit. Modern build tools can deal with that. If not, you’ll have to write a script to perform that task, which is not a problem, since it will be committed in the same repository and therefore is always up to date with the rest of the code.
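For the last point, such a script could look roughly like the sketch below. It assumes that every top-level directory is a module and that the dependencies between modules are listed by hand in DEPENDS_ON; both the layout and the names are hypothetical. A module is rebuilt when one of its files changed or when something it depends on (directly or indirectly) changed.

```python
from collections import deque

# Hypothetical module graph: each key is a top-level directory in the repo,
# each value is the set of modules it depends on.
DEPENDS_ON = {
    "common": set(),
    "service1": {"common"},
    "service2": {"common"},
}


def modules_to_build(changed_files, depends_on):
    """Return the modules touched by `changed_files` plus everything depending on them."""
    # Modules that contain at least one changed file.
    touched = set()
    for path in changed_files:
        top_level = path.split("/", 1)[0]
        if top_level in depends_on:
            touched.add(top_level)

    # Invert the graph: for each module, who depends on it?
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk from the touched modules to all their dependents.
    result = set(touched)
    queue = deque(touched)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return sorted(result)


print(modules_to_build(["common/util.py"], DEPENDS_ON))
print(modules_to_build(["service1/main.py"], DEPENDS_ON))
```

A change in common triggers all three builds, a change in service 1 only its own. The list of changed files could come from something like git diff --name-only between the previous build and the current commit.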
All in all I am pretty convinced that the advantages outweigh the downsides. In retrospect, considering the projects I did over the last couple of years, every single one of them would have benefited from using a monorepo instead of a repo per module.
Aside: Modern IDEs like IntelliJ and Eclipse can sort out these types of source code dependencies: with all modules open in one workspace, a change in the common module immediately shows up as an error in the service module(s). Although this solves the dependency problem to some extent for one developer, it's not a solution a complete team can rely on.