A monorepo is a solution to the structure and maintenance problems of a project, usually when its size and complexity suggest splitting it into smaller projects. It can also simply mean dividing the project into smaller parts, called packages. Babel, for example, is one of the popular libraries structured as a monorepo.
This article explains how a project works in this format. If monorepo still sounds confusing, stay with me: my goal is to make the subject simple enough to understand without knowing a lot about JavaScript, and to show how this is solved with Lerna and Yarn Workspaces.
The difference between a common repository and a monorepo
In a monorepo, functionality is split into separate packages rather than just imported from one file to another. How this encapsulation is achieved varies from language to language.
In JavaScript it can be accomplished like this: make one file import another as if it were a published npm dependency, even though it actually lives inside the repository itself.
To explain monorepos in JavaScript, let's use an example. Consider this repository with multiply and power functions; we want to convert it to a monorepo:
// index.js
function multiply(a, b) {
  return a * b;
}

function power(base, exponent) {
  var result = base;
  for (var i = 1; i < exponent; i++) {
    result = multiply(base, result);
  }
  return result;
}

multiply(2, 3) // 6
power(2, 3) // 8
// package.json
{
  "name": "monorepo"
}
$ tree
.
├── index.js
└── package.json
But first, let's understand an important concept: node_modules.
What does a package in node_modules look like?
The node_modules directory in any JavaScript project contains the dependency packages. They are managed by npm or yarn (we don't normally make manual changes to this directory).
A package in node_modules needs at least these two files:
- index.js, with the JavaScript code
- package.json, with a main field pointing to index.js (the file can have another name, but by default it is usually index.js)
Creating a package in node_modules
To use your own package, it is not necessary to publish it to npm and install it with the npm install command. If we manually create a directory inside node_modules, importing that dependency works just the same:
// index.js
var multiply = require('multiply');

function power(base, exponent) {
  var result = base;
  for (var i = 1; i < exponent; i++) {
    result = multiply(base, result);
  }
  return result;
}

multiply(2, 3) // 6
power(2, 3) // 8

// node_modules/multiply/index.js
function multiply(a, b) {
  return a * b;
}

module.exports = multiply;
// node_modules/multiply/package.json
{
  "main": "index.js"
}

// package.json
{
  "name": "monorepo"
}
$ tree
.
├── index.js
├── node_modules
│   └── multiply
│       ├── index.js
│       └── package.json
└── package.json
Although it works, it is not feasible to do this in a project because:
- We normally ignore the node_modules directory, so it is not tracked in git;
- It is not very visible that part of the project lives there.
There is a way to keep the package's code out of node_modules while still behaving as if it were there: using a symbolic link.
Symbolic link
A symbolic link is interpreted by the operating system. It points one directory to another (or one file to another, that works too), so both end up exposing the same content.
Starting from the result above, we can move the package out of node_modules and create a symbolic link:
$ mv node_modules/multiply ./multiply
$ ln -s ../multiply node_modules/multiply
$ tree
.
├── index.js
├── multiply
│   ├── index.js
│   └── package.json
├── node_modules
│   └── multiply -> ../multiply
└── package.json
There are still problems with this approach:
- The link must be created again in each new clone of the repository;
- It is not visible that there is a package available to import, unlike dependencies registered in package.json.
Installing a local package with npm
It is possible to achieve the same result with npm itself. However, we need to add some fields to the package's package.json: name and version. This way, the local package looks much closer to a package published on npm.
// multiply/package.json
{
  "name": "multiply",
  "main": "index.js",
  "version": "1.0.0"
}

// package.json
{
  "name": "monorepo",
  "dependencies": {
    "multiply": "file:multiply"
  }
}
$ npm install multiply
npm WARN monorepo@ No description
npm WARN monorepo@ No repository field.
npm WARN monorepo@ No license field.
+ multiply@1.0.0
added 1 package and audited 1 package in 0.494s
found 0 vulnerabilities
$ tree
.
├── index.js
├── multiply
│   ├── index.js
│   └── package.json
├── node_modules
│   └── multiply -> ../multiply
├── package-lock.json
└── package.json
Note that the package is now listed as a dependency in package.json.
Why isn't a public package installed instead of the local one? Because when the npm command runs, it sees that a directory with the same name already exists, and the package.json inside it identifies it as a package. npm then adds this dependency as a symlink.
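With a modern npm (version 5 or later), you can also let npm write the dependency entry for you by pointing the install command at the directory. A quick sketch, using the multiply directory from the example above:

$ npm install ./multiply
# adds a "file:" entry for multiply to the dependencies in package.json
# and creates the node_modules/multiply symlink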
This only covers npm, but yarn has an equivalent. With yarn link we get a similar result, with a few differences: the directory becomes available to be "linked" from any repository, and package.json is not updated because nothing is installed in this case. For our purpose here, yarn link is not a good option.
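For reference, this is roughly how the yarn link flow looks with classic Yarn (v1), using the multiply package from the example (a sketch, not something we will need here):

$ cd multiply
$ yarn link          # registers "multiply" globally on this machine
$ cd ..
$ yarn link multiply # points node_modules/multiply at the registered package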
Installing local packages with npm solves the problems described before. So why use Lerna or Yarn Workspaces?
Lerna
In the example above we only have one package in the repository. What if we had several, how would we keep track of them all? If they are published to npm, how are the dependencies between them handled? And if I modify a package, how do I know which other packages depend on it and also need updating?
These and other difficulties arise in larger projects. The goal of Lerna is to streamline the workflow of managing a monorepo. For example:
- Explicit configuration of which packages exist and where they are located (see the sketch after this list);
- Versioning packages independently or keeping them all on the same version;
- Running commands across all packages, such as installing dependencies or building;
- Publishing packages to npm and creating git releases for just the modified packages (with a single command!).
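As a concrete illustration of the first item, the package locations are declared in a lerna.json file at the root of the repository. A minimal sketch, using the multiply package from the example (a glob such as "packages/*" is the more common convention):

// lerna.json
{
  "version": "independent",
  "packages": ["multiply"]
}

With "version": "independent" each package keeps its own version number; a fixed version string (for example "1.0.0") keeps all packages on the same version.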
In practice, Lerna uses exactly the same symbolic-link solution. The difference is that these features, among others, make managing a monorepo easier.
Why not use Lerna?
In my experience, I have found only one reason not to use Lerna: a limitation when publishing packages through CI.
In addition to other functionality, the lerna publish command is responsible for:
- Identifying the packages that need to be updated;
- Letting you select which packages will be published;
- Synchronizing their dependencies, which are also affected.
However, this is an interactive command (it does not run in an automated deploy), and there is no way to pass parameters to choose packages. As a result, every changed package is selected to receive the same kind of version bump: major, minor or patch (see Semantic Versioning). This is not a problem if:
- The change is a patch;
- All packages follow the same version;
- Or the publish command doesn't run in CI (you always use it in interactive mode).
A case where this would be a problem: I applied a fix to one package (a patch), but haven't published it yet. Later, I changed the API of another package (a major change). When publishing, both end up receiving a major version bump.
To avoid this, an alternative is to implement your own publish script, mapping dependencies just like Lerna does. See more about this issue on Lerna's GitHub.
Yarn Workspaces
Yarn Workspaces is another monorepo solution, but very different from what Lerna proposes. Its goal is not to make management easier. Besides symlinking packages, Yarn Workspaces organizes dependencies better; in other words, the benefit is faster dependency installation.
It is normal for many packages to share dependencies such as ESLint, TypeScript and Jest. Yarn Workspaces installs these common dependencies in the node_modules at the project's root (instead of in every package), alongside the project's own local packages, which are symlinked there, resulting in a monorepo.
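Enabling Yarn Workspaces is done in the root package.json. A minimal sketch for this article's layout (the "workspaces" globs are up to you; "private": true is required so the root is never published):

// package.json (root)
{
  "name": "monorepo",
  "private": true,
  "workspaces": ["multiply"]
}

Running yarn install at the root then hoists the shared dependencies and symlinks multiply into the root node_modules.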
This works because of how Node resolves modules. When a file imports a package, Node doesn't look for it only in the nearest node_modules. If it can't find it there, it goes up to the parent directory and looks in that directory's node_modules; if it still can't find it, it keeps going up until it does. In a monorepo, going up a couple of levels from a package reaches the node_modules at the root of the project, which in Yarn Workspaces holds the symlinks and the shared dependencies (see here for a post explaining more about how it works).
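You can see this lookup order from inside any file with Node's module.paths. A small sketch, with illustrative paths that assume the repository lives at /repo:

// multiply/index.js
console.log(module.paths);
// Example output:
// [ '/repo/multiply/node_modules', '/repo/node_modules', '/node_modules' ]
// Yarn Workspaces hoists the shared dependencies into /repo/node_modules.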
Why not use Yarn Workspaces?
There are some downsides. The first and most obvious is that the project needs to use yarn instead of npm. Another is the risk of breaking the project without realizing it.
For example, any dependency in the root node_modules can be imported by a package's code even if it is not listed in that package's package.json. This leads to situations where everything works in your local clone of the repository, but breaks in CI or in another clone of the project.
The problem is even more subtle if the dependency is listed in package.json, but at a different version, which can lead to complicated debugging sessions.
Lerna + Yarn Workspaces
Since they have different purposes, it is possible to use both solutions at the same time. The result is a monorepo that is better organized and faster to work with. In day-to-day usage, the difference only appears at the initial installation of dependencies.
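Wiring them together is mostly a matter of telling Lerna to delegate to yarn and to read the package locations from the "workspaces" field of the root package.json. A minimal sketch, assuming a Lerna version that supports the useWorkspaces flag:

// lerna.json
{
  "version": "independent",
  "npmClient": "yarn",
  "useWorkspaces": true
}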
To see more about using Yarn Workspaces, read this documentation.
Conclusion
A monorepo in JavaScript projects boils down to symbolic links between packages. This allows them to be imported as dependencies, just as if they were packages published on npm.
It is very common to use Lerna in a monorepo because it makes managing the repository much more agile.
I don't see Yarn Workspaces being used as much as Lerna. Its biggest benefit is the speed of installing dependencies.
If you are more curious about how a monorepo works, I recommend checking the Lerna website: https://lerna.js.org/.