Facebook takes a DevX-first approach to growing and evolving their monolithic repo. The repo is responsible for web pages and web servers, leveraging signed, checked, and highly-usable generated code to fulfill all integration and deployment requirements. Generated code offers daily benefits to Facebook’s developers, like auto-resolution of merge conflicts, script verification checkpoints, and time-saving incremental code generation only for files that were changed. Yet as code generation scaled to accommodate more developers, their pain points–like slower processes and broken code sneaking in–also grew in scale. This led the DevX team to take a step back. Learn how their renewed focus on DevX led to improving their existing frameworks and internal language used, as well as creating smart build caches to speed things up.
Get an overview of how Facebook (Meta) leveraged code generation to its maximum potential before uncovering its DevX limitations. Learn about the steps the Developer Experience team took to support code gen as a scalable benefit for developers, how it eventually evolved to expose some problematic experiences, and what the Meta team has done recently to find alternatives to code generation.
Affan Hussain works as a Software Engineer on the Developer Experience team at Facebook (Meta). He loves to help other developers move faster and achieve their goals.

Gradle Enterprise customers use the Gradle Enterprise Performance and Trends dashboards to identify and monitor performance bottlenecks in the build/test process, including code generation and annotation processors. You can learn more about these features in the free instructor-led Build Cache deep-dive training.
Check out these resources on keeping builds fast with Gradle Enterprise:
- How to leverage Build Scan™ to keep builds fast.
- Sign up for our free training class, Build Cache Deep Dive, to learn more about how you can monitor the impact of code generation on build performance.
- Watch a 5-minute video on how to Detect and Fix Performance Regressions using Gradle Enterprise.
Affan: Good afternoon, everyone. I hope you had an enjoyable lunch.
You're nice and full and have no more distractions. And you can just listen to
me speak about generated code for however long. Just a quick note up front: I
thought I was going to be giving a lightning talk today, so I planned for about
an 8 to 10 minute talk. But now I have 30 minutes, so you'll get a little bit of
extra detail. So bear with me; some of it might be a little bit more
polished than other parts. We're just here to have fun, though. A little bit
about me. I'm on the developer experience team at Meta. I've been here for about
three years and have worked on a variety of systems from continuous integration
to this code generation stuff that I'm about to talk to you about today. But
before we get into this story, I want to give you a little bit of background
about how we do development at Meta. I'm going to be
focusing on the monorepo that hosts our website and web server deployments. This
repository is mainly JavaScript and Hack code, Hack being a variant of PHP that
we've developed in house. In this repository, we really like to take a developer
experience first approach to how we grow and evolve the repository. So we do a
few interesting things. One, we don't have a build. Hack is an interpreted
language, JS is an interpreted language.
We don't like to have to compile code to be able to run things. We believe you
check out the repo, you start using it, nothing else. We don't have any explicit
imports or explicit dependencies in our Hack code. If you want to use some piece
of code, you can just go ahead and use it. You don't need to manage this giant
list of dependencies at the top of all your files. We also have a monolithic
release. No individual teams, products or anything else needs to configure their
continuous integration or their deployments. You want to write a new product,
you hop into the repo, you start writing code, and it'll get out to prod whenever
you merge code into the repo. I also want to clarify what I mean by generated
code, since that's really what this story is about. So when I talk about generated code, I
mean generated source code. Code that you can imagine writing yourself except in
this case it was generated by some script or a command that you ran in the
terminal. We also do a couple of special things. One, we sign the
generated code: we take the hash of the content and we save that, so we know
when users try to modify the generated code themselves. We don't want them doing
this. If it's generated, we assume it's been generated correctly. Don't go and
touch the generated code; that leaves room for error. And the last interesting
thing I think we do is that we check the generated code into the repository. So when
you check out the repo, you don't need to rerun the code generation command.
It's already been generated for you and checked in. You just hit the ground
running.
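
The signing step described above can be sketched roughly as follows. This is a minimal illustration, not Meta's actual tooling: the header format, the choice of SHA-256, and the names `sign` and `is_untouched` are all assumptions made for the example.

```python
import hashlib

# Hypothetical header marker; the talk does not describe the real format.
HEADER_PREFIX = "# @generated signature="


def sign(body: str) -> str:
    """Return the generated file text with a content-hash header prepended."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{HEADER_PREFIX}{digest}\n{body}"


def is_untouched(file_text: str) -> bool:
    """True if the body still matches the hash recorded in the header."""
    header, sep, body = file_text.partition("\n")
    if not sep or not header.startswith(HEADER_PREFIX):
        return False
    recorded = header[len(HEADER_PREFIX):]
    actual = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return recorded == actual


signed = sign("ROUTES = {'/home': 'HomeController'}\n")
# Simulate a developer editing the generated body by hand.
tampered = signed.replace("HomeController", "EvilController")
```

A check like `is_untouched` could run in continuous integration to reject hand-edited generated files.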
So generated code is typically a solution used by framework developers to
improve the developer experience for their framework. I think this is best
explained with an actual example, so I'll go with something simple like
request routing: you get a web request and you want to figure out which class
runs the code for that request. Pretty common problem that we've all run into,
and different frameworks out there handle it differently. At Meta, we have a
framework where you define a single class, you define the route for that class
as well as some other things about parameters, etc. You run the command, and
this code generation command does a few things. One, it'll add your route to the
URI map. This is a giant generated file that holds all the routes that people
have set up and maps them to the classes that need to run the code for each
request. It also generates some type classes for other people around the code
base to interact with your route in a type-safe way. And so we had this solution,
generated code, that we discovered years and years ago, and we started to use it
for a lot of things. We started generating types between languages, for
example with Thrift or Protobuf, which, you know, is a very common thing: you
want to share some schema between two languages. We started doing other
interesting things like generating tests. You integrate with some framework, you
run a command, and now you have a baseline set of tests to guarantee correctness
for your system. That's pretty awesome. We do things like this kind of
pseudo-cache precomputation with our generated code. Take the URI map, for
example: you could imagine calculating it on the spot for each request, going
through the entire code base to find the route. Instead, we store it as code. You could do things
like find all subclasses of this class, instantiate them, call this function,
and then store the results as code. That way you don't have to do it at request
time. And so what we really noticed here was people started really, really
loving this generated code solution. They started using it for everything. In
particular, we have big frameworks and small frameworks. The big frameworks,
like the URI map, an ORM framework, and a logging framework, all use
generated code. But the smaller, more product-specific things
also began to copy these solutions, because that's what the big frameworks do.
That's what's approachable. That's what people know. "We should do that same
thing." And so people began using generated code further, expanding on it,
thinking, how do I architect my code to use generated code? And we, the
developer experience team, saw this and said: okay, people love this, and the
experience is not great. We need to support it. We need to scale it. And so we
really worked on two things: how the average developer interacts with generated
code, and how people create new generated code. So when it comes to interacting with
generated source code, there's a few things people need to keep in mind.
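
To make the URI map idea from earlier in the talk concrete, here is a small sketch of precomputing a route table and storing the result as source code. The `Route` type, the controller names, and the output format are hypothetical; the real framework works on Hack classes and is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path: str        # URI the controller answers to
    controller: str  # name of the class that handles the request


def render_uri_map(routes: list[Route]) -> str:
    """Render all routes as Python source, sorted so diffs stay stable."""
    lines = ["# @generated: do not edit by hand", "URI_MAP = {"]
    for route in sorted(routes, key=lambda r: r.path):
        lines.append(f"    {route.path!r}: {route.controller!r},")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical routes that a generator would normally discover by scanning.
routes = [Route("/profile", "ProfileController"), Route("/home", "HomeController")]
generated_source = render_uri_map(routes)

# At request time the generated module is simply loaded; nothing is scanned.
# exec() stands in for importing the checked-in generated file.
namespace: dict = {}
exec(generated_source, namespace)
uri_map = namespace["URI_MAP"]
```

The expensive discovery step runs once at generation time, and every request pays only for a dictionary lookup.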
One is that when you have certain files that are really big and hot, they
get modified very often and you run into lots of merge conflicts. People were
having merge conflicts in generated code, and no one likes merge conflicts. We
can auto-resolve those for you now. You run into a merge conflict, you run the
script, and it happens in the background. Awesome. Cool. Sometimes people were
forgetting to run the code generation scripts, so we started adding checks in our
continuous integration: did you run the correct command?
As more and more commands were being added, people were like, "There's too many
commands. I don't know which one to run." So we created one command. It looks
at what files changed and figures out the correct sub-command to run. And
we also started making it easier to write and create new generated code. So for
these framework developers, how do you make sure they're creating a good
experience by default? Believe it or not, creating generated code and writing
the scripts and sub-commands is really hard. I mentioned before how we have no
explicit imports or dependencies. Well, that comes with a pretty big trade-off.
When someone, let's say, deletes a symbol or deletes a class or depends on
something in some weird way, how do you know when to regenerate your code? We
noticed a lot of people were introducing bugs around when to regenerate their
code. And so we built a framework that simplified this for them. We also did
things like implementing incremental code generation. A lot of these
sub-commands would look at this entire giant repo, find all classes or
subclasses of some type, do some kind of computation, and those are kind of
slow. And we were like, okay, what if you looked at just the files that were
changed and generated the code just for these small bits? That helped speed up
the code generation process. And so
we kept going and going and going. Until... I hit the back button, my bad. Yes.
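
The single entry-point command and the incremental approach just described might look something like this sketch. The file patterns and sub-command names are invented for illustration; the talk does not say how the real command maps changed files to generators.

```python
from fnmatch import fnmatch

# Hypothetical mapping from file patterns to generator sub-commands.
GENERATORS = {
    "routes/*.php": "gen-uri-map",
    "schemas/*.thrift": "gen-thrift-types",
    "models/*.php": "gen-orm",
}


def subcommands_for(changed_files: list[str]) -> dict[str, list[str]]:
    """Group changed files by the sub-command that must reprocess them."""
    plan: dict[str, list[str]] = {}
    for path in changed_files:
        for pattern, command in GENERATORS.items():
            if fnmatch(path, pattern):
                plan.setdefault(command, []).append(path)
    return plan


# Only generators whose inputs changed are scheduled, and each one receives
# just the changed files rather than the whole repository.
plan = subcommands_for(["routes/home.php", "models/user.php", "README.md"])
```

The hard part the talk points to is not this dispatch but knowing the true set of inputs when the language has no explicit imports, which a path-based sketch like this one does not solve.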
Until we reached where we were a couple of years ago, which is that generated
code has become a daily part of our developers' lives. Every single day, they
need to run code generation multiple times, and it is still the biggest pain
point for our developers. We get common complaints. It takes so long to
regenerate the code: we have this incremental code generation, and we've made it
relatively bug free, but people are still complaining. Broken code is getting
into the main line. And I talked before about merge conflicts and how we
auto-resolve them; that ended up showing up as rebasing your diffs being really
slow. And so we took a step back. We looked at the problem and we were like, okay,
generated code isn't scaling. This is not the correct solution for a lot of our
problems. Can we do a better job? And so we figured out a few things.
One, a lot of people were using generated code because there were some
deficiencies in our other frameworks. I mentioned before how we have
generated tests. Well, why are people generating tests that follow some weird
pattern? We can improve the testing framework. You can write one test, define
how this test will work for all of these different cases, and then not have to
generate each individual case. And then the infrastructure can handle checking
individual classes and cases of that test. I mentioned before how we have these
kind of pseudo-caches, with people running some kind of precomputation and
then saving it as code. Well, why can't there be an actual cache? And so we
built this really smart caching system that watches the file system. And as
you're changing your code, it's regenerating the data and storing it in memory.
And so we no longer need to materialize this data as code in the repo. It's just
there by default in the background. No commands to run. And perhaps the most
interesting thing we did: because we own Hack, the language, we were able to
improve the language to make it more expressive. Generated code, we noticed, was
actually a sign that the language itself was not expressive enough. We saw
people were doing things where dependent types would be really useful, for
example, where the output type of a function would change based on the input
type of the function in a way that's not handled by normal generics. So we
created this thing called enum classes. You can find out about it
later, or I can talk to you about it afterwards. We worked with the Hack team to
implement this feature and roll it out to the repo, and it helped us eliminate a
lot of generated code. This is still an ongoing project for us. We've been
rolling back or deleting generated code for about 2 to 3 years now. We've
made a lot of progress, and we have a lot of progress still to be made. There are
a few key takeaways that we learned through this multi-year process
that I think are particularly useful. One, small frameworks and new frameworks
are going to copy what the big guys are doing, right? You kind of have this
novelty budget, I think, compiler and language people think about, which is if
you do things that are too new, people aren't going to adopt your framework
because it's too different. They don't want to spend that extra time learning.
So people will do what's familiar, which in this case was generated code. And
as the second point says, it wasn't the best solution for a lot of people.
Generated code is hard to write and difficult to maintain, but it's a really
powerful tool, and people were using it when they had alternatives, such as
building new features into their frameworks or architecting their code in a
different way. And so because we made generated code so approachable, people
opted for the more difficult, complex solution, even though it wasn't the
correct one. And lastly, once generated code, which was at first actually a
pretty big win for developer experience, grew into a problem, we had to step
back, reevaluate, and ask, for these individual problems that people were using
this kind of silver bullet approach for: can we create a more optimized or
optimal solution for these specific problems and create a better developer
experience that way? Thank you. I will be taking some questions.
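
As an illustration of the testing-framework alternative mentioned in the talk, where one test is written once and applied to many cases instead of generating a test per case, here is a minimal sketch using Python's `unittest`. The `Logger` classes are hypothetical stand-ins; Meta's actual testing framework for Hack is not shown in the talk.

```python
import unittest


class Logger:
    name = "base"

    def format(self, message: str) -> str:
        return f"[{self.name}] {message}"


class AuditLogger(Logger):
    name = "audit"


class DebugLogger(Logger):
    name = "debug"


class LoggerContractTest(unittest.TestCase):
    def test_every_logger_prefixes_its_name(self):
        # One test body covers each subclass; no per-class test is generated.
        for cls in Logger.__subclasses__():
            with self.subTest(logger=cls.__name__):
                self.assertTrue(cls().format("hi").startswith(f"[{cls.name}]"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoggerContractTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a new `Logger` subclass extends the coverage automatically, with no generation command to run and no generated file to merge.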