What’s inside?
The practice of DPE prioritizes speeding up builds and tests. One of the techniques used to accomplish this is called build caching. In this talk by Gradle experts, we review strategies and tooling to make builds cacheable, which avoids re-running components of builds and tests whose inputs have not changed. We also discuss how to keep builds cacheable by catching regressions before they impact performance.
Summit Producer’s Highlight
Avoiding unnecessary work with build caching is one of the first steps towards accelerating build and test speed, but it’s not a silver bullet. Development teams need data to measure and compare inputs and outputs across builds to maximize the benefits. Watch a Gradle demo on how to optimize build caching, prevent regressions, and avoid “build cache misses,” which occur when the build cache cannot be used due to differences in factors like version numbers, OS, or timestamps. Gradle shares how to implement build caching in local and CI builds with Gradle Build Tool and Maven. We also show how Spring Boot saved 50% of build time with Gradle Enterprise.
About Etienne
Etienne Studer joined Gradle eight years ago and has been working on Gradle Enterprise since its inception in 2015.
About Jim
Jim Hurne is a senior software engineer with 18+ years of experience who thrives on hard problems. He is a strong team leader who mentors others, and a pragmatic learner who stays abreast of current trends and technologies in order to achieve greater technical and professional excellence. An expert in cloud technologies, he also possesses broad experience in a variety of other technologies and platforms.
More information related to this topic
Gradle Enterprise Solutions for Developer Productivity Engineering

Gradle Enterprise customers use the Gradle Enterprise Build Scans and Performance dashboards to implement and tune their build cache. These tools help identify and manage build cache misses due to volatile inputs, and ensure Performance Continuity by catching regressions in cacheability before they threaten the codebase.

To keep builds fast with Gradle Enterprise, check out these resources:

Etienne Studer: All right, so good afternoon everyone, and thank you for your attendance. In Formula One, whenever a car leaves the pits and goes on the track, there are more than 300 sensors that capture more than a million data points per second. And all that data that is captured is used to optimize the performance of the car, the stability of the car, the reliability of the car. It's also used to optimize the wings, the distribution of the weight and so on. If it's a race, that data is sent back to the engineering team while the car is driving, and the engineers are making changes to the car while the car is driving. So they're trying to get the peak performance under all conditions while the car is driving. And those decisions are made by the engineering team, not by the driver. The driver in his seat, who could only react based on gut feelings, is not making those calls. It's the engineers that have all the data at hand. And in developer productivity engineering, we're in a similar situation. We need to capture build metrics that we can act upon. We need the data so that we can understand how our toolchain is behaving and so that we can improve the behavior of the toolchain: we can make it more reliable, we can make it more stable, and we can make it more performant. We also need the data to find and remove performance bottlenecks.

And once we understand where they are, we can apply acceleration technologies. And so we're in this constant cycle of capture data, interpret data, make changes, run the toolchain again, and again capture data and so on. It's a never-ending cycle. If we look a bit more into accelerating the feedback cycle, which is what this is about today, we have several approaches for how we can go about this. We can avoid work, conceptually speaking, by reusing work that has been done before. We can also avoid work by skipping work that is not related to the changes we made. And whenever we do need to do work, we can parallelize that work, and we can of course also make the work itself faster. To give you one example of each: if you use a calculator, it's not gonna calculate pi every time you press the pi button. They did it once, stored it, and now it's always reused. If you have some linting rules and you make a change to a CSS file, we don't need to rerun the linting rules that are about Java.

If you have slow Scala compilation, well, you could distribute that compilation and get faster feedback cycles. And maybe Kotlin released a new compiler version that is faster; that is also a way to improve the speed of your toolchain. One example below, which is not super visible with the resolution of this screen, but basically, if we take something from the testing domain: if we have a test task that executes tests, and those tests haven't changed since the last run, the classpath hasn't changed, and other inputs haven't changed, we can just reuse the test report and we can skip the work totally. If we have a set of tests we need to run, instead of running all the tests, we can choose to run only those tests that are affected by the changes that were made. And then we end up with just running a subset of tests, and whatever tests are left to run, we can distribute that work across multiple agents. So just to make that a bit more concrete now, if we look more into avoiding work by reusing outputs that have been created by previous work, the question is: what is your unit of work?

Typically in Gradle or Maven, we use a task, and the artifacts are stored in a so-called build cache, but we could also choose a different type of unit of work. It could be, for example, the Gradle configuration phase, which has the task graph as its artifact, and we could store that in a so-called configuration cache. But no matter what we choose as a unit of work, we have to persist it somewhere so we can reuse it in later builds. And ideally we have a local cache, but also a remote cache, so we can also share those artifacts between different developers or different CI agents, of course. Now, what typically doesn't work well is to tie that unit of work, or the scope of that unit of work, to a commit ID or to a branch name, 'cause it's very coarse-grained. And the more coarse-grained it is, the more sensitive it is to cache invalidations. So what we experience, or I would say most people experience, including our customers and everybody else: if you turn on build caching, you get some savings, but typically you don't get the maximum of savings.
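In Gradle, the local-plus-remote setup described here is configured in the settings script. A minimal sketch, assuming a hypothetical cache server URL and a `CI` environment variable:

```kotlin
// settings.gradle.kts -- illustrative sketch; the cache URL is a placeholder
buildCache {
    local {
        isEnabled = true                      // reuse outputs across builds on this machine
    }
    remote<HttpBuildCache> {
        url = uri("https://build-cache.example.com/cache/")
        isPush = System.getenv("CI") != null  // typically only CI populates the shared cache
    }
}
```

Developers then read from both caches, while only CI agents write to the shared one, which keeps the remote cache trustworthy.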

Sometimes people have the impression or the expectation that it's this magic bullet: I turn it on and my build times are fantastic from one moment to the next. Oftentimes you already get very significant savings, but you can go further, and there's a little bit of investment involved. And we'll go into that in a few minutes. So we need to invest a little bit to take our build, which might already be somewhat cacheable, to fully cacheable. And once it is fully cacheable, we want to keep it fully cacheable. We don't want to regress. So we need to keep investing to keep it there, but it pays off. It's worth the investment; we'll have some numbers later on. To give you one concrete example.

Barely readable, but you will see the numbers up here. Spring Boot, a very active open source project, but still modest in terms of number of builds per day. There are about 100 builds a day. There are customers, users, that have 20,000 builds a day. But even those 100 builds add up. So for every 24 hours that pass, Spring Boot is saving around 50 to 55 hours in task execution time per day, which equals about six and a half full-time employees. So since you had lunch yesterday and today, 53 hours were saved in task execution time, which also means savings in CPU time, savings in CI infrastructure costs and so on, besides developer time. And now imagine you're doing 100 or 1,000 times more builds; it explodes in terms of savings. So as I said, when you turn on build caching, typically you experience some hits, but you don't get a hit on every task where you expect it to be there.

And what are the reasons? There are multiple reasons, but I wanna mention two of them. One is you might have... You wanna consume an artifact from the cache, but it's not there anymore. It got evicted. That is one. So it got put there, but by the time you want to use it, other things that were put into the cache have already evicted it. That's something you see, but it's typically not the main reason. The main reason is volatile inputs. What are volatile inputs? Inputs that change between different executions of the build, where you don't really expect those changes to have an impact on the output of that task or of that goal. So for example, timestamps: you might have some timestamps as input, and every time you run the build the timestamp is different. So you will never get a cache hit on that goal or on that task. Or you might have some absolute path; if you run it from a different location, you will get cache misses. Different operating systems: you check out the project, you have different line endings, and that might already cause some cache misses. Or the build adds some user and host information that, again, creates some volatile inputs in some of the tasks. Or, very interestingly, and happening quite a lot: when you use code generators, quite a few of those don't generate deterministic output.

Some of them create random method names, other ones create random ordering of the methods. And so whoever consumes that source code (by which I mean the next task that consumes that source code) will deal with volatile inputs. Version numbers: maybe you bump up the version number on CI whenever you build, because you use the pipeline or the job ID or something; then you will have volatile inputs.

So before we look at how we can tame those inputs, one thing to keep in mind is: if you look at a task or a goal and you don't have a hit, oftentimes that task is not the culprit. The culprit is upstream. So if you do some compilation based on source code that has been generated and you see you don't have hits for the compilation, it might be because your code generator creates non-deterministic output. So the root of the problem is not necessarily where you don't get the cache hit; it might also be upstream. And also very interesting with those volatile outputs, in the case of the code generation where it would be an output, is that they can be masked if you use build caching. Because if the inputs are the same the second time you run it, it will not generate that volatile output; it will just be taken from the cache, meaning the next task will also take this artifact as an input.

And because it's already cached, it will take its output from the cache. But the moment you make a change, even if it doesn't affect the output, you will see cache invalidation across all these tasks. So really what we also should do, and it's something we want to do in the future, is task output tracking. But even without that, just with task input tracking, we get quite far, as you will see. So how can we tame build cache misses? There is training available for Gradle and Maven, it's for free, so I'm not gonna go into detail. But conceptually, we can remove volatile inputs, and quite often that's possible. Why can't we just take things away from the build? Well, there are usually a lot of things in the build that don't even need to be there.

Every plugin, or many plugins anyway, adds a timestamp file. And there are other things that, if you look at them more closely, well, you don't really need them. So why even have them there when they create cache invalidation? Or you have volatile inputs and you make them stable for development. It might not matter for development whether the version number is always changing or fixed, so you might change it to be fixed just for development. Or you can normalize them: if you say it doesn't matter what the absolute path is, you just care about the relative path, well, then let's normalize the absolute path into a relative path.
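In Gradle, both strategies, ignoring a volatile file and normalizing paths, can be declared in the build script. A sketch, where the file name and the task class are invented for illustration:

```kotlin
// build.gradle.kts -- sketch of two taming strategies; names are made up for illustration
normalization {
    runtimeClasspath {
        ignore("build-info.properties")  // a volatile timestamp file no longer affects the cache key
    }
}

// Relative-path normalization: the absolute checkout location stops causing misses
abstract class ProcessTemplates : DefaultTask() {
    @get:InputFiles
    @get:PathSensitive(PathSensitivity.RELATIVE)
    abstract val templates: ConfigurableFileCollection
}
```

With `PathSensitivity.RELATIVE`, only the files' paths relative to their root, plus their contents, go into the cache key.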

Now, before we look at how we can go about finding build cache misses and fixing them, two things to keep in mind. In my opinion, the process of optimizing developer productivity should itself be a productive process. And we'll see what that means in a second. And the second one: for most of you, there's probably more you could optimize than you have time and resources available for. So what do you optimize? Where do you invest your time and your resources? You need to make an informed decision, and you need the data to do so. And that's something we in developer productivity engineering always try to do: surface the data that allows you to make these decisions.

So if we want to go and attain this fully cacheable build, which you usually don't start with, we propose you run some experiments. And those experiments should be measurable: we should be able to measure what happens while we run the experiment, as well as what comes out of the experiment. They should happen in a controlled environment. So if I run the experiment while somebody else is building, or CI is building, that should not affect my experiment. They should be reproducible. So if I wanna run the same experiment tomorrow, or I wanna ask my colleague, "Please, can you investigate why we have this build cache miss?", he should be able to do exactly what I did and get the same results. And they should be automated. There are so many human errors you can make, especially when it's something that sounds very simple: I run the build twice, clear the cache and so on; errors all over the place. So if we can automate that, that's part of running this experiment reliably.

So to fulfill these requirements, we've created so-called build validation scripts, as we call them, and they're available for free. They belong, you could say, to Gradle Enterprise, but they're not part of the core product; they're available on GitHub. And in combination with Build Scans and the build comparison feature, we can efficiently detect what is taken from the cache, what is not, and why it is not taken from the cache, address it, and rerun the experiment, with very little effort, right? And that's what it should be: very little effort, a very productive exercise. And there are three types of experiments, where we can kind of mimic what the developer does. They build locally, they change branches, they build in the IDE, they build on the command line. Then we have the CI environment where we run on CI as part of a job, right?

So we run... Or, as part of a pipeline, we run multiple jobs and we wanna use the cache. Or we make a change, and later we make a change again, and then we wanna leverage the cache between those two builds. And the third scenario is: we're building on CI, we're populating the cache, and we're then building locally later as a developer, and we wanna benefit from whatever has been put into the cache, right? And for all these three scenarios, we have scripts available that allow you to verify very efficiently how cacheable your build already is.

So conceptually, before we see a demo: it starts by clearing the cache, remote or local, it doesn't really matter conceptually. We clear that cache. We run one build, it populates the cache with all the entries, and then we run the build again, right? And that second build will try to consume from the cache: get some hits, get some misses, get some more hits. And then when we're done, it gets interesting, right? This is just mechanics; this is not where we want to spend time, and we wanna do something else while that happens. But then it gets interesting, because now we can interpret the data. We can look at the Build Scans and other places and determine how cacheable our build already is, and where it is not. How much time does it cost us that it's not cacheable, right? So we can really assess: what do we wanna do now? Do we want to fully optimize? Are we happy with how it is? That is then an informed decision that we can make. Okay?
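For the local Gradle case, the mechanics the scripts automate can be sketched manually, assuming the default local cache location:

```shell
# Manual sketch of the experiment (local Gradle builds, default cache location)
rm -rf ~/.gradle/caches/build-cache-1        # 1. clear the local build cache
./gradlew clean build --build-cache --scan   # 2. first build populates the cache
./gradlew clean build --build-cache --scan   # 3. second build should hit the cache; compare the two Build Scans
```

The validation scripts add the controlled, reproducible parts around this: a clean checkout, fetching the data, and summarizing the comparison.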

So we're gonna do a little demo here and show you how we can use these experiments to automate the process of running such an experiment, identify and investigate those build cache misses, and then quickly fix the root cause.

Jim Hurne: All right, so for this demo, we're going to run an experiment on one of Etienne's personal projects, and we're gonna find out how cacheable his project is, and if there are any areas for improvement. So first, we're going to run our build validation script, which will execute the experiment for us. What it does is it checks out the project, so you get a completely clean checkout, and then it will run those two builds that Etienne was talking about. So here we can see the two builds just ran. Now it's fetching the data from Gradle Enterprise, and now we can investigate. Now that the experiment has executed, we can look at that data and we can try to understand the cacheability of the build.

All right, so here we can see that of 15 tasks in total, three were executed the second time. And of those three, one was cacheable. And so that's a task that we wish had not run on the second build. So we can dig into this a little bit. We can look at the task and kind of understand which one it was. So it was the test task. If we go back to the output from our experiment, we can click on another link, which will load up a comparison of the inputs for that task between the two different builds. So we can see here that there was something on the test task's classpath that changed. And when we look at it, it's a file that changed. So that's interesting. I wonder what changed in that file between the two builds. So we can see why it was a miss. All right, so we'll go back. So one of the nice things about the build validation scripts is that they save the state of the project for each of the builds. So that way we can ask: what did that file look like at the end of build one, and what did that file look like at the end of build two? And we can just do a basic diff on them. So that's what I'm going to do here.

We'll go into the data directory. You can see there's a folder for the first build and the second build. And so we'll just do a diff across those two folders. And we can see here what changed is a timestamp. So this file is one of those files that contains a timestamp in it. So if this were real life and we felt it was really important to optimize this, we would use one of those strategies that Etienne introduced earlier to optimize this and get that cacheable task so that it's always being pulled from the cache on the second build.

Etienne Studer: So all the experiments run similarly to this one, including the ones on CI. We can go further than that, though. We can even say: we have a project that is not connected to Gradle Enterprise, it knows nothing about Gradle Enterprise. But we still wanna see what we would get out of the box, and what we would get if we invested a little bit more into the caching and got to a fully cacheable build. We wanna assess this with as little investment as possible. Maybe your company already uses Gradle Enterprise, some projects are connected, but you're working on a project that is not yet connected, and you still wanna see how much you would get out of it. So you have data and you can go to your boss and say, "Hey, I also want to connect to it." And so we're gonna show you that as well in the demo.

All right, for this one, we thought it would be kind of fun to try connecting the Apache Maven project to Gradle Enterprise, Maven itself. So we're gonna run the experiment, and we're gonna connect it to Gradle Enterprise, and we'll see how it does. So just like before, we will run an experiment. So here you can see we're enabling Gradle Enterprise on it, and we're also pointing it to a particular Gradle Enterprise server to publish the Build Scans to. Next, once again, we do a clean checkout of the project. We run two builds, which, because of demo magic, we can do almost instantly. And now we've got our output.

Jim Hurne: So again, let's take a look at the Build Scan for this, and we can see that 509 goals make up the project as a whole, and 110 of those goals were avoided. And by the way, this doesn't happen automatically. Because we connected the build to Gradle Enterprise, it enabled build caching on the build. Normally, the Maven build does not have build caching on it at all, right? And so just by doing that, we avoided 110 goals on that second build. One of the goals we did not avoid even though it was cacheable. So just like before, we could dig into this and take a look and understand which of the goals it was that missed, and then we could further optimize it if we wanted to.

Etienne Studer: Thank you. And maybe it sounds funny that we use Apache Maven to connect and see how cacheable it is. But we are in touch with the Apache Software Foundation, and it's very likely that very soon all the Apache projects, whether they use Gradle or Maven, are going to be able to connect to Gradle Enterprise as part of a free open source offering. And then I think it's just a matter of time until Maven is probably built with Gradle Enterprise as well. We can go yet one more step. That is: well, maybe you don't even have Gradle Enterprise installed in your company at all. But you still wanna get some idea: what if I had it, for Christmas maybe? How much savings would I get out of the box without investing? And if I invested a bit more, what would I get? All right? So we also have a mode for that, which we wanna quickly show you with a demo from Jim.

Jim Hurne: All right, for this one, we will try the experiment on Apache Beam. And this time we're not going to connect it to a Gradle Enterprise server, but we are going to enable the Gradle Enterprise Gradle plugin. That's partly so that we can get the caching to work. All right, so here we go again, we're gonna run our experiment. This time you can see we're disabling Build Scan publishing, so it won't actually connect to... Or won't publish Build Scans to a Gradle Enterprise server. We do the checkout and then we run our two builds, once again taking advantage of demo magic to do it very quickly. And we have our outcome here.

So because Build Scan publishing was disabled, we don't have any of the links to a Build Scan, of course, but we were still able to gather some information from the build, and the build validation scripts show it to us here. So just by enabling Gradle Enterprise on Apache Beam, we avoided about two minutes of build time on that second build, because a lot of those tasks were taken from the cache. And we can also see that there was another two seconds or so of build time that we could potentially optimize away if we did the same process that we showed you before, where we connect this to Gradle Enterprise and publish a Build Scan. We can dig into that and understand why the cacheable task was executed when we didn't want it to be.

Etienne Studer: All right. So if you're interested and you wanna know what you could get out of it, but you have nothing to connect to, you can still get that data very cheaply, very quickly, and make an assessment, build a use case, and also determine whether it's worth it, right? We saw that there were two seconds left that you could optimize. You can debate whether saving two seconds is worth doing or not. But the two minutes and 53 seconds that you get out of the box, I think everybody would take them, per build. And there's also one area we didn't touch on at all, and we will not touch on it in this presentation, but there are also those tasks that are never cacheable by default. You might have written your own task; it will not be taken from the cache unless you do something with it, and we're not going there. But you could also make those cacheable and transform something that would never be taken from the cache into something that is.
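For Gradle, opting a custom task into caching is a matter of annotating it and declaring its inputs and outputs. A sketch, where the task class and its properties are invented:

```kotlin
// build.gradle.kts -- sketch: opting a custom task into caching; the task and its properties are invented
@CacheableTask  // without this annotation, a custom task is never taken from the build cache
abstract class GenerateDocs : DefaultTask() {
    @get:InputFiles
    @get:PathSensitive(PathSensitivity.RELATIVE)  // declare inputs with sensible normalization
    abstract val sources: ConfigurableFileCollection

    @get:OutputDirectory
    abstract val outputDir: DirectoryProperty

    @TaskAction
    fun generate() {
        // ... produce deterministic output into outputDir ...
    }
}
```

The annotation only helps if the task's output is deterministic for the same inputs; otherwise it falls into the volatile-output trap described earlier.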

Okay. We can do a little bit of interesting extrapolation using the data that we captured. So we're just looking at a single build, not at a stream of data. Looking at a single build: how much work can we avoid? It's basically the savings we already realized, that's the number you saw, plus the potential of what else we could save. That is basically the total savings we can get. But so far this has all been sequential; we've just been adding up task execution times or goal execution times. So we need to normalize this, to stay with this wording, for parallelization. And if we normalize for parallelization, we get the number that determines the maximum build-time savings we would get for a single build, which is basically when everything is taken from the cache. That's how much we can save, taking parallelization into account. And then we can take a little bit of experience into account, and that's why it gets a bit more fuzzy. But what we see from experience is that if we have a maximum build-time saving of x, the average build-time savings you get over many builds are somewhere between 35 and 65% of that. They can of course be lower or higher, but on average, the savings you get are somewhere in there. So just from this single build, you can already kind of do the math: I'm doing this many builds per day, per week; this is probably the amount of savings I get per week, and how much is that worth?
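This back-of-the-envelope math can be written down directly; all the figures below are illustrative, and the function name is ours:

```kotlin
// Back-of-the-envelope version of the extrapolation above; all numbers are illustrative
fun estimatedWeeklySavingsMinutes(
    realizedMin: Double,        // serial task-execution time already avoided in the measured build
    potentialMin: Double,       // serial time that could additionally be avoided
    parallelism: Double,        // effective parallel workers, to convert task time to wall-clock time
    buildsPerWeek: Int,
    realizationRate: Double     // experience: average savings land at roughly 0.35-0.65 of the maximum
): Double {
    val maxWallClockSavingsMin = (realizedMin + potentialMin) / parallelism
    return maxWallClockSavingsMin * realizationRate * buildsPerWeek
}
```

For example, 40 minutes realized plus 10 minutes potential, at a parallelism of 5, a realization rate of 0.5, and 500 builds a week, works out to 2,500 minutes of wall-clock time saved per week.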

Okay. So far, it has all been about making your build cacheable, but once you're there, you want to keep it cacheable. You did all this investment; you don't wanna regress. And we had customers where we helped them optimize their builds. They got their build times down significantly, and half a year later they call us again and say, "Hey, we're back to where we were." Why did this happen? It's not the fault of caching, but you add more people, you add more projects, you add more build logic, and all these things can lead to regressions in your cacheability. It's like you have code and you make changes to your code: well, you don't just test your code when you write it, you also test it over many iterations. As we heard from Checkpoints yesterday, they still run tests from 20 years ago. So it's the same situation here, in that we wanna make sure we don't regress.

And so, how can we do this? First of all, we need some automation, ideally running on CI, unless you wanna do this yourself every time you go to work in the morning to catch these regressions. You wanna run that automation either based on some trigger, like a timing trigger, or based on some changes. And when you run that automation, it should fail if your build is not fully cacheable anymore. All right? So you instantly spot it, just like when you run tests and they fail, you instantly know you broke something. Unless it's a flaky test. So we can run experiments to catch these regressions. We can also look at historic build data. If we have a whole set of data and we don't even know where to start, we can look at historic build data and try to determine from it: should we have had a cache hit here, but we had a miss? And the third approach is we can also look at build cache-related failures. I'll explain later why this is important. And then, once we know, that's the key: we need to know. Fixing is usually easy; the knowing is the hard part, and it's the part where we need to do a little bit of investment. So we're gonna give you a demo for the first case, which is: we wanna run something on CI that catches these regressions.

Jim Hurne: All right, so here you can see we're in TeamCity, our CI system, and we've set up a couple of jobs to make sure that the Spring projects remain fully cacheable, fully optimized. And here you can see this job that's using the build validation scripts; that's how we've automated it. We can see that for a while it was pretty good, everything remained cacheable, and then just recently it stopped being cacheable. So we can dig in right here and see what's going on. If we go to the build log and then all the way down to the bottom, we see something very familiar by now: the same experiment summary that we saw when we were running the experiments directly from the command line. And from here we can click on one of these links and load up the Build Scan. And once again, we can go through the same process: we can take a look at which tasks were avoided and which ones were not, and then start an investigation into why this regression slipped into the build.

Etienne Studer: Cool. So it's basically reusing the experiments we already had. We're just running them and making the experiment fail if something's not taken from the cache. So... That's not part of the presentation.

No. Oh, it's coming back. There it is. I must have stepped on it. All right. So, the second approach we can take is to look at historic data. So we have a ton of builds, we have multiple projects. How do we know if something regressed? I think this can be done with different degrees of sophistication. One is: you take all the builds from the same project, you take the builds with the same commit ID, and you take non-dirty builds, so they don't have any local modifications. Whether it's a CI build or a local build doesn't matter; if they're in that same state, you can expect to have build cache hits. There are some nuances to it, but it gives you a list that is already quite a bit narrowed down, and you can go from there. And then you can start comparing builds that meet these criteria. And you can check: well, did my second build take everything from the cache that was already put into the cache by the first build? That's the idea here. And you could go even further. In my opinion, you could even apply something like machine learning and try to predict which cache misses were due to actual volatility in an input. So we did this on the Spring project. Spring is always a good example: it's an open-source project, and it uses Gradle Enterprise openly.

And they're in a good state, so this shouldn't look overwhelming. But what we see is, when we did this run, we looked at these builds and compared them, and there was one that popped up, and it's the Asciidoctor task. If we had followed through on what Jim showed before, in that experiment we ran on CI, the Asciidoctor task was the one showing up as well. So the same task showed up again with this approach. And once we know that Asciidoctor should have taken things from the cache in this second build here, but didn't, then we know what to do: we need to investigate and fix that cache miss. And this is an approach that would work even if you have 20,000 builds a day.

Just find those pairs where you expect cache hits, and then investigate when you don't have the cache hits. The third approach, and they're not exclusive, ideally they're used in combination with each other, is around build cache failures. What we often see is people optimize their projects, they benefit highly from the cache, and then suddenly, a month later: well, we don't get any... The build times have gone up, we don't see any savings anymore, what's going on? And then oftentimes it turns out either there was a network issue, suddenly the TeamCity or the Jenkins server cannot reach Gradle Enterprise anymore, so things are not stored in the cache anymore. Or access keys have been switched but the CI has not been updated, and so on. Not really things related to caching itself, but to the whole setup and infrastructure, and build caching is no longer working.

And this can easily be missed. By going through all the builds and trying to find these build cache errors, which is very easy because this is all captured data in Build Scans, we can see: did we have any remote build cache errors? And we see that Spring Boot, Spring Security and Spring had some. It's very low; if you had a serious problem, this number would be much higher, especially when you have a lot of builds. But just as an example, I wanted to show that we can expose these errors and then do something about them. Or there could be a scenario where you have one cache entry that is really big, too big for what the cache accepts. And if that entry is put into the cache very early in the build, it'll be refused by the cache node.

And as a consequence, Gradle as well as Maven will turn off caching for the rest of that build. Nothing else in that build will benefit from caching, and nothing will be pushed to the cache either, so any later build will also not benefit. It can be pretty severe, and it can be hard to detect unless you put up something like this. Just a few final observations around build caching. You can see negative avoidance savings, meaning it takes longer to fetch an entry from the cache than to build it yourself or build it on CI. That can be the case when you have a really slow network connection; maybe you're on a VPN and the fetch just takes too long, and then the benefit is gone. But there are things you can do about this besides improving the network.
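One of those things is to make the remote cache conditional, so developers on a slow link can opt out without touching the build logic. A minimal sketch in Gradle's Kotlin DSL; the environment variable name is an assumption for illustration, not an established convention:

```kotlin
// settings.gradle.kts: hypothetical opt-out for developers on slow networks.
val onSlowNetwork = System.getenv("SLOW_NETWORK") != null

buildCache {
    remote<HttpBuildCache> {
        url = uri("https://build-cache.example.com/cache/")
        // On a slow VPN, fetching an entry can cost more than rebuilding it,
        // so let developers switch the remote cache off entirely.
        isEnabled = !onSlowNetwork
    }
}
```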

It can also happen that your cache entries on CI live longer in the local cache than in the remote cache, because entries in the remote cache might get evicted faster. As a consequence, you can end up in a situation where CI finds entries in its local cache, so it never touches the remote cache, while those same entries have long since been evicted from the remote cache. If I then build locally and try to fetch from the remote cache, the entry isn't there, even though for CI it's still in the local cache. That's something to keep in mind, and one strategy is to simply turn off local caching on CI and rely only on the remote cache. A third thing we see around build caching is pipelines that fan out, and fan out very quickly, so quickly that they end up with jobs that each do a lot of common work before they do their very specific work.
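Turning off local caching on CI, as just described, is a small configuration change. A sketch in Gradle's Kotlin DSL, assuming a `CI` environment variable is set on build agents (the variable name and URL are placeholders):

```kotlin
// settings.gradle.kts: rely on the remote cache only when running on CI.
val isCi = System.getenv("CI") != null

buildCache {
    local {
        // Disable the local cache on CI so local and remote eviction
        // lifetimes cannot diverge; developer machines keep using it.
        isEnabled = !isCi
    }
    remote<HttpBuildCache> {
        url = uri("https://build-cache.example.com/cache/")
        isPush = isCi // only CI writes entries; developers only read
    }
}
```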

And one way to tackle that is to introduce seed jobs that run before the fan-out. They do the common work, so that when the fanned-out jobs get to their specific work, they can take the common work from the cache. Just a few things you may want to take a look at in your own setup. To round this off: some of you surely have a lot of projects. I know some big companies have one project, but there are also those that have thousands. And then the question becomes, where do I even start? You cannot optimize 10,000 projects.

It doesn't even make sense; they're not all worth optimizing. So where do you get the most return on your investment, and what does that mean? You first need some data. But maybe you're not in a position to connect all 10,000 projects to Gradle Enterprise first just to get that data. How can you still get the data to make the decision about where to invest? Because once you have it, you can prioritize. What we offer is plugins for different CI servers that instrument your builds with Gradle Enterprise, capture data, and send it as Build Scans to Gradle Enterprise, without modifying the projects. From one minute to the next, when you turn this on, you start capturing Build Scan data for every build you run. And that gives you the data to reason about where you should invest your time in optimizing projects. Or maybe you see that a very popular project that builds a lot is already very well optimized; okay, let's move on to something else. We'll do a very short demo here as we come towards the end.

Jim Hurne: All right. Once again, we're going to use Apache Beam for our example. What we have is Apache Beam in TeamCity, and we've applied to TeamCity a Build Scan plugin. That plugin enables Gradle Enterprise on the Apache Beam project without modifying the project in any way. We haven't changed any source files; all we've done is apply this plugin.

If we go into the build, we can see there's a Build Scan tab, and it allows us to load up a Build Scan. What's really interesting, however, is that the plugin did not just allow us to publish Build Scans. That in and of itself would be useful, but applying the plugin enabled build caching on this project as well, and that's what we can see here in the Build Scan. So just by using a plugin in CI, we got Build Scan publishing for this project and we got build caching. We're already getting a lot of benefit on CI without having to make any project changes whatsoever.

Etienne Studer: Just last week we did this with an organization. They were already using Gradle Enterprise on some projects, but not on all of them. They enabled that plugin and quadrupled the number of Build Scans on CI, literally over the weekend: they enabled it, ran some builds, and had four times the number of Build Scans they had before. That gives them a huge amount of data they can now use to decide where to invest their time to improve. And it's not just about improving; it's also about finding instabilities, finding failures, and so on, all available without even modifying the projects. And now they can start prioritizing.

I want to give you one example to come to an end here. We again looked at the Spring instance, and it doesn't matter if you cannot read the exact numbers. Because we now have all this data from the Build Scans, we can aggregate build counts, build time, serial task execution time, how much was avoided serially, how much could have been avoided, and so on.

And we can put that in relation to the total task execution time, or goal execution time, and get an idea of how good a shape a project is already in, and then help optimize those projects. The numbers here are really good: we see that 50% of all task execution time in the Spring projects, across many months of data, is avoided simply because they use the cache. But then there are projects, and I'll read out the numbers because you probably cannot read them, where only about 11% is already being avoided, yet there is a big potential of 63 or so percent that could come from the cache. Projects with low numbers for what is already avoided and high numbers for what could be avoided are the candidates: you want to invest in the projects where you get the most return for your investment.

And imagine you have 10,000 projects, like I said before; you cannot determine this manually. But you also need to take into account not just where you save the least today and could save the most, but also how much time is actually spent building these projects. If you see that a project only builds for 25 minutes over 30 days, you probably don't want to invest in it, even if you could get a high percentage of task cache hits. All of that has to be taken into account, but you have the data to do so, and you didn't even have to modify your projects. And that really gets us to the end of what we wanted to show you. It's basically some inspiration for how you can go about making your projects cacheable, keeping them cacheable, and getting the data to make informed decisions about where to invest your time. I would say avoiding work is work; it doesn't come for free, but it pays off very quickly. Thank you and have a good rest of the...