About monorepo

Sun, Dec 25, 2022

I’ve been CTO at Tessan for about a year and a half. When I joined, I found the same problems that plague most engineering teams:

lack of delivery practices
shipping branches instead of versions (using a kind of gitflow, which was soooo scary)
no testing except manual testing
no metrics in production
no cadence for delivering features
no way to enable a feature in production once it is mature
manual deployments

Like most companies, Tessan uses a multi-repository strategy. As in the vast majority of companies, this is simply where things ended up without much thought: start with a project, start a second one, discover that builds get complicated, split into a second repository… rinse and repeat. That’s where most companies are: no strategy, no thinking about what could be done to improve things or improve delivery.

I must confess that we are still using multiple repositories. But things have been greatly improved to prepare for a mono-repository.

What we have implemented is a GitOps strategy:

each repository has a dedicated build and generates its own artifacts: Docker images or zip archives (part of our infrastructure runs on-premise, in the field). Artifacts are archived and tracked using a git tag or branch name
everything is deployed using Terraform with strict versioning. Targets are Kubernetes and on-premise infrastructure
all deployments happen through GitHub Actions within protected environments, hence we pay for the super expensive GitHub Enterprise just to use protection rules (this hurts badly)
we have more deployable pieces, since splitting the big server-side monolith was required to move faster across teams (yes, we went for micro-services using fbus)
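Tracking artifacts by git tag or branch name means turning a git ref into a valid artifact tag. Here is a minimal sketch, assuming the ref comes from something like GitHub Actions' `GITHUB_REF` (for example `refs/heads/feature/login`) and that the artifact is a Docker image, whose tags may only contain letters, digits, `_`, `.` and `-`, may not start with `.` or `-`, and are limited to 128 characters. The `artifact_tag` helper is hypothetical, not our actual pipeline.

```python
import re

# Anything outside the Docker tag alphabet gets replaced.
_INVALID = re.compile(r"[^A-Za-z0-9_.-]")


def artifact_tag(ref: str) -> str:
    """Turn a git ref (tag or branch) into a Docker-compatible image tag.

    Hypothetical helper: "refs/tags/v1.2.3" maps to "v1.2.3" and
    "refs/heads/feature/login" maps to "feature-login".
    """
    for prefix in ("refs/tags/", "refs/heads/"):
        if ref.startswith(prefix):
            ref = ref[len(prefix):]
            break
    tag = _INVALID.sub("-", ref)  # "/" and other invalid chars become "-"
    tag = tag.lstrip(".-")        # a tag may not start with "." or "-"
    if not tag:
        raise ValueError(f"cannot derive a tag from ref {ref!r}")
    return tag[:128]              # Docker limits tags to 128 characters
```

Release tags stay readable as versions, while branch builds remain traceable back to the branch they came from.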

Basically, this means we have more repositories than before 🤪. But we are in good shape to switch to a mono-repository now. Everything is running quite smoothly. So why move away from that model?

Well, feature development is a pain in the neck. Most of our development implies modifying several applications, from front-end to back-end, while changing database models from time to time. This makes several things difficult for most devs:

isolating work from the main branch while the feature stabilizes (especially when several repositories are impacted)
understanding the impacts on testing
understanding the impacts of a release to ensure smooth communication with the support team
not missing something when deploying to the test environment (we have micro-services, again)
picking the right time to merge to limit impacts

From an operational point of view, several things are also hard to track:

reviews are complicated: a change usually spans several repositories and it’s hell to understand what’s going on
testing effort is underestimated (due to incremental impacts)
communication is impaired, as several repositories have to be tracked

And from a dev perspective, it’s not great either:

no motivation for changes/refactoring across repositories
lack of understanding of how things keep working while partially released (nullables, feature flags…)
no opportunity to learn by reading more code

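For readers less familiar with the "nullables, feature flags" pattern, here is a minimal sketch of how code keeps working while a feature is only partially released. The `Order` model, its `gift_message` column and the plain dict of flags are all hypothetical; in practice the flags would come from a configuration service or environment variables.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    id: str
    # New column shipped ahead of the feature: nullable, so services that are
    # not upgraded yet keep reading and writing rows without it.
    gift_message: Optional[str] = None


def render(order: Order, flags: Mapping[str, bool]) -> str:
    text = f"Order {order.id}"
    # New behavior only when the flag is on AND the new data is present;
    # otherwise fall back to the old output.
    if flags.get("gift_message") and order.gift_message:
        text += f" (gift message: {order.gift_message})"
    return text
```

The database change, the producer and the consumer can then ship in any order, and the feature is turned on only once every piece is deployed.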
For one thing at least, I’m a strong advocate of mono-repositories: atomic feature implementation.

But if you think going mono-repository is easy, you are totally wrong. One app per repository (aka multiple repositories) is easy to do: set up sources, build and generate an artifact on changes. Done.

With a mono-repository, you will hit a wall for sure: build time and noise:

Most of the time, there is no need to rebuild everything. We only need to rebuild and release what has changed. Shipping the whole platform is nonsense.
Noise is a clear problem, as looking at the history is not much fun. Fortunately, Meta has released a tool to understand what’s going on: Sapling. I haven’t tested it, but maybe it can help in the future. To be investigated.

So what to do? Well, your mono-repo must have tooling to keep it fast, optimize the build and provide auditing features:

identify what has changed (a new commit, a new branch)
build only what has changed, taking changes in libraries into account all the way up to deployment
deliver what has changed, and generate release notes for those changes
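The first two points can be sketched as follows: map changed files to the project owning them, propagate the change to every project that depends on a modified library, then sort the result so dependencies build first. This is a minimal sketch under assumptions: the `PROJECTS` table is a hypothetical, explicitly declared workspace, and the list of changed files would come from something like `git diff --name-only origin/main...HEAD`.

```python
from graphlib import TopologicalSorter
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical workspace: project name -> (directory, direct dependencies).
PROJECTS = {
    "common-lib": ("libs/common", []),
    "api": ("apps/api", ["common-lib"]),
    "worker": ("apps/worker", ["common-lib"]),
    "gateway": ("apps/gateway", ["api"]),
    "web": ("apps/web", []),
}


def owner(path: str) -> Optional[str]:
    """Return the project whose directory contains the changed file, if any."""
    p = PurePosixPath(path)
    for name, (directory, _) in PROJECTS.items():
        if p.is_relative_to(directory):
            return name
    return None


def affected(changed_files: list[str]) -> list[str]:
    """Projects to rebuild and redeploy, dependencies first."""
    dirty = {o for f in changed_files if (o := owner(f)) is not None}
    # Propagate to consumers until nothing new is added: a change in a
    # library dirties every project that (transitively) depends on it.
    grew = True
    while grew:
        grew = False
        for name, (_, deps) in PROJECTS.items():
            if name not in dirty and dirty.intersection(deps):
                dirty.add(name)
                grew = True
    # Restrict the graph to dirty projects and sort it topologically.
    graph = {n: [d for d in PROJECTS[n][1] if d in dirty] for n in dirty}
    return list(TopologicalSorter(graph).static_order())
```

A change under `libs/common` rebuilds `common-lib`, `api`, `worker` and `gateway`, while a change under `apps/web` rebuilds only `web`. The same file-to-project mapping could also group commits per project when generating release notes.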

A mono-repository requires much more work to set up than multiple repositories. But the benefits are tremendous.

I learned about mono-repositories (or at least a unified view of multiple repositories) at Criteo: that was MOAB (Mother Of All Builds). When I left, I created full-build to help D-Edge move faster on the engineering side. full-build is not really maintained publicly, but it lives on today under various names (no longer public). I’ve considered doing a public v2, as I’m not really satisfied with the current state of affairs.

Anyway, there are several tools on the market:

Bazel (Google) / Buck (Meta): probably the top options to consider, but they require dedicated teams and do not really fit startups or mid-size companies. They also lack reuse of existing project metadata
Turborepo: JavaScript only
Nx: nice plugin support, but JavaScript builds are so messy. It also does not provide proper isolation; things leak everywhere. It works OK with lots of sweat

But as I said, I’m not really satisfied. What I’m looking for:

be explicit with projects declaration: no magic
use project metadata as much as possible (npm, .NET projects, Maven…) to get dependencies
leverage ecosystems instead of relying on plugins
ensure strong projects isolation - all paths must be relative to project, not workspace
support for explicit tasks
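To make this wish list a bit more concrete, here is a minimal sketch of an explicit project declaration: dependencies read from existing npm metadata rather than redeclared, explicit tasks, and an isolation check that rejects any output path that is absolute or escapes the project directory. The `Project` type, its fields and `check_isolation` are hypothetical and do not describe full-build's actual format; only the `dependencies` and `devDependencies` keys are real `package.json` fields.

```python
import json
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Project:
    """Hypothetical explicit project declaration: no discovery magic."""
    name: str
    directory: str  # relative to the workspace root
    outputs: list[str] = field(default_factory=list)     # relative to the project
    tasks: dict[str, str] = field(default_factory=dict)  # task name -> command


def npm_dependencies(package_json: str) -> list[str]:
    """Reuse the ecosystem's own metadata instead of redeclaring dependencies."""
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    return sorted(deps)


def check_isolation(project: Project) -> list[str]:
    """Return violations: outputs that are absolute or leave the project."""
    errors = []
    for out in project.outputs:
        p = PurePosixPath(out)
        if p.is_absolute() or ".." in p.parts:
            errors.append(f"{project.name}: '{out}' must stay inside the project")
    return errors
```

For example, a project declaring `outputs=["bin/release", "../shared/out"]` gets one violation for `../shared/out`, which is exactly the kind of workspace-relative leak that makes builds hard to reason about.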

All in all, I think I will go for full-build v2 😃. I just need a tool that is a no-brainer and definitely open-source.