What’s inside?

Imagine how productive developers would be if they could harness the full compute power of the cloud directly from their modest laptops. In this talk, LinkedIn’s Developer Productivity and Happiness team describes how they launched a new remote development experience that makes build times faster, more consistent and predictable, all while reducing manual efforts to set up the environment and start coding. The results? Initial setup and build times dropped from 10-30 minutes to just 10 seconds for most of their products.

Summit Producer’s Highlight

“The build times improved a lot” is exactly the feedback LinkedIn’s Developer Productivity and Happiness team loves to hear from engineers. Faced with increasingly long build times and inconsistencies between local and CI development, they embarked on a mission using DPE practices to improve productivity with a remote developer acceleration platform called “RDev”. RDev provides developers access to a powerful, pre-built cloud environment that eliminates delays due to manual setup and local workstation performance inconsistencies. Learn how this continuing effort became an instant success across user teams by reducing build times from 10 minutes to just 10 seconds–freeing up countless hours of wasted time to do productive work.

About Shivani

Shivani Pai Kasturi is a Staff Software Engineer at LinkedIn. She is part of the Build Platforms team that mainly focuses on boosting Developer productivity by improving the build speed for LinkedIn products while keeping it reliable and stable. She has been working on the Remote Development project for the past 2 years and was involved from its ideation phase. She leads the Remote Development solution for Webapps and the AI/ML Persona. Prior to this, she was the key contributor to AI-based implementation for surfacing knowledge-base articles to enable LinkedIn developers to help themselves.

About Swati

Swati Gambhir is a Staff Software Engineer at LinkedIn. For her last 4 years at LinkedIn, she has been a part of various key projects to improve the developer experience and productivity. Recently, she led the Container strategy and Image Infrastructure for containerizing builds, development, and deployments. Prior to this, she received her master’s in Computer and Information science from the University of Florida and worked at Intel where she was a lead developer on Code Signing infrastructure.


SHIVANI PAI KASTURI: Thank you. Hi everyone, thank you for joining us today. Imagine if I said all developers could have access to a development environment with the computing power of the cloud. What if I said no more setup issues, no more performance inconsistencies, and you’re all good? A new joinee at LinkedIn can be productive in less than 30 seconds. Would you believe it? Yes, this is a reality, and remote development makes this happen. I’m Shivani Pai Kasturi, one of the leads of the remote development project. I’ve been part of this project from its ideation phase and design, and I also led the implementation of a couple of key components. Today, I’m responsible for the remote development solutions for the web apps, AI/ML and back-end personas.

SWATI GAMBHIR: Hi everyone. My name is Swati Gambhir, and I was also an initial developer on remote development. And I worked on creating an automated image pipeline that powers the remote development systems. And today I work on container strategy at LinkedIn, which also covers a few more use cases like GitHub actions, deployment containers and CI containers.

SHIVANI PAI KASTURI: And we are part of the developer productivity and happiness team. Being part of this team, we often strive to minimize the issues which our developers face and help improve their development lifecycle. Some of them are the setup hassle which they go through, the ‘n’ number of steps which they need to follow to set up their dev environment, be it for one stack or multiple stacks depending on their team. The slow builds, this is pretty common, all of you go through it. And then the most famous one is “it works on my box, but fails in CI”, or “works for me, but does not work for my colleague”. When the pandemic hit and we all moved to working remotely, these problems kept on increasing. It’s mainly because of the limitation of the resources you have in the development environment, or because of the network bandwidth that you’re using.

So that’s when we put our foot down and said, we can totally do better than this and we need to solve this for our developers. With the hybrid world that we are in today, a better solution is needed more than before. Today we’ll walk you through how we started on this remote development journey: starting with our vision, introducing the concepts, sharing our architecture overview, and then also how we extended it for our use cases and what we are excited for next. So let’s start with our vision. We had three key considerations while defining our vision: we needed faster builds, we wanted the setup to be super easy, and we wanted these builds to be consistent. Faster builds meant we had to move our development environment from our local to remote, in our private cloud, closer to services which require network operations like cloning or downloading dependencies. Ease of setup meant no more ‘n’ number of steps, and it had to be fully automated. And consistency meant we had to look at containerizing these environments so that we know exactly what we have in there, like only the necessary tools and packages, and not influenced by whatever you run on your local.

So that’s when we introduced RDev, which is a remote development environment that is fully containerized and has only the necessary tools and packages required to build your product. As you can see in this graph, this shows the time it takes to download a dependency over multiple iterations. When we tried this in an RDev, it’s not only faster, it’s also consistent. This compares with your local environment, where you are either just building or running multiple other stuff. Maybe you’re on a BlueJeans or a Zoom call, you’re attending meetings, or you’re running a long-running script which is encroaching on all your resources. All these influence your build speeds. So now let’s look at a short demo of how easy and quick it is to get started using a remote development environment.

So for this demo we’ll use a dummy product. This is a web app product and it uses Volta. Volta is also open sourced by LinkedIn. Basically it’s used to manage your Node and Yarn versions. So, for the sake of compare and contrast, we’ll look at how the local experience looks and how the remote development experience looks. On your right-hand side is what we are doing on my local machine, and on the left-hand side we are trying to spin up a remote development environment. Since this is a really small app, the clone could be really quick, but remember that on your local, you still need to get started with setting up the environment. So you need to work with Volta here to install Node, install Ember, use Yarn to install all your dependencies, and then start the server.

But while we keep doing that, you see the remote development environment is already up and running. RDev is the CLI which we use to manage these environments. So RDev SSH gives you access to the SSH terminal in this RDev. What this does is it basically connects you to the Tmux session, which has the pre-configured init command already run, in this case the same commands which we showed on your local. So it has already run your Volta install commands, it has already run Yarn to install all the dependencies, and the server is already up and running. On the local side, you still see we have not yet even finished setting up the environment. You still need to start the server.

Before that, you need to install the dependencies. So you need to remember all the steps you need to do, but we remove all that in the RDev environment. We can also connect using an IDE to start coding. For this demo, we’ll use VSCode. RDev code gives you that option, and the experience is kind of similar to what you saw in the SSH terminal. In the VSCode terminal, you would also see that it automatically attaches to the Tmux session, which has the initialization command already run. We use a LinkedIn custom VSCode extension to determine the presence of it. Now let’s access this server from the browser. This is a dummy LinkedIn homepage, and let’s try to make a code change. So we’ll go to the respective file, look for the line we want to change, and then it’s as simple as making your code changes. Same as your local: however you use your IDE, you would use this in a similar way. So I’ll go ahead and make a small code change, change this one word. As the server is already up and running, as soon as I save this file, the server starts rebuilding and your browser is automatically refreshed.

So it’s as simple as this. Any user, for any app they want to work on, can start by using the RDev CLI, spin up an environment, and start coding. We also have good integration with VSCode extensions. It’s already pre-configured in your RDev environment, so you can create Jira tickets if needed, raise PRs, everything from within this environment. Now let’s look at our journey and how we reached this point. This is sometime in 2019 and the beginning of 2020. That’s when we started ideating about how we can solve this slow builds issue. How can we improve our developers’ productivity? We started looking at why don’t we run remote builds instead of running local builds. And then we also decided to take it a step forward and not just run builds, but also have these environments fully remote instead of having them locally.

That’s when we had to define a container strategy at LinkedIn because this meant different products, from different personas are going to use this. And we needed an automated container pipeline. So we started with a POC. So we used a CLI server and a homegrown agent, which was responsible for basically deciding which host has availability and spinning up a docker container there. And we used this to manage the whole end-to-end system. This initially created the required excitement that we wanted, and a lot of people saw the obvious benefits of using these environments. That’s when we decided we need to take this a step forward and leverage the full potential of the container system. So we not only provide these environments, but we decided to pre-build them so that way, any user who requests for an environment always gets a fully pre-built environment and all their builds are warm builds and no more cold builds. The next step, obviously was to scale. So we looked at instead of reinventing the whole wheel, we decided to go ahead with using Kubernetes, and then we scaled our infrastructure. And after that we started demoing this and increasing the adoption across multiple personas.

SWATI GAMBHIR: Let’s look at a typical workflow that a developer used to follow before RDev came into the picture. We all know the drill. Clone the code, which needs a remote network operation with the data centers. Then run the build, which also needs network operations to download dependencies, and then use your IDE to develop. With remote development we change things a bit. Instead of users cloning the code, they just run the RDev create command and we create a remote development environment, which is a container on the remote machine, which is a beefier machine closer to our network services. So all the network operations like pulling an image are very fast, and building and downloading dependencies are faster too because they’re closer to the network. Now our users will connect to these remote development environments using their IDEs that have remote SSH capabilities, or they can directly use SSH.

With the pre-built remote development workflow, we took this performance improvement a notch higher. Now instead of users creating the remote development environments, RDev infrastructure already pre-builds these remote containers and has them set up for them. So they will already have the code cloned and built. And like Shivani showed in her demo, they can also customize what a pre-built environment means by setting up the RDev init command, which could be either a short-running command that has an exit code or, if you want to have a server up and running or initiate a database, a long-running command can also be put as a setup instruction. And when users want to use these environments, they just send an RDev create command and we just assign them one of the pre-built RDevs. If there is no RDev in the pre-built pool, it goes through the RDev create workflow, but usually they will all find a pre-built RDev.
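The assign-or-fall-back behavior described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the `RDev` and `PrebuiltPool` classes, the `create_rdev` function, and the naming scheme are hypothetical and are not taken from the talk or from LinkedIn’s implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RDev:
    name: str
    product: str
    prebuilt: bool
    owner: Optional[str] = None


@dataclass
class PrebuiltPool:
    """Hypothetical in-memory stand-in for a pool of ready environments."""
    product: str
    ready: deque = field(default_factory=deque)

    def take(self) -> Optional[RDev]:
        # Hand out the oldest ready environment, or None if the pool is empty.
        return self.ready.popleft() if self.ready else None


def create_rdev(user: str, pool: PrebuiltPool) -> RDev:
    """Assign a pre-built environment if one exists, else create a cold one."""
    rdev = pool.take()
    if rdev is None:
        # Empty pool: fall back to the slower create-from-scratch workflow.
        rdev = RDev(name=f"{pool.product}-cold-{user}",
                    product=pool.product, prebuilt=False)
    rdev.owner = user
    return rdev
```

In this sketch the first caller gets the warm environment and a later caller hitting an empty pool gets a cold one, which mirrors the fallback described in the talk.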

So this makes the performance very fast and they can have an environment set up and ready to code in a matter of seconds. So here is a product of considerable size at LinkedIn, and this is the performance improvement we achieved. So it used to take about five to seven minutes to clone this product and another 25 minutes to build this product. And with pre-built RDev, they can just get assigned this RDev environment in a matter of seconds. And imagine if this was a web app, they would also have to do additional setup before they can be productive by having a server up and running. And they can also put these setup commands to be included in the pre-built environments.

Now, let’s look at the overall architecture for RDev. On the leftmost side we have our container image registry. We have multiple products using different technologies, so we provide a set of base images, like Python, JS, and Java based images. Users can derive from these images and have their custom images created. They can also customize their RDev environments by adding a Dockerfile that we will build for them. They can put the Dockerfile in the RDev configuration and we’ll build that to create the RDev containers, or they can just put the name of the image they want to use. We have semantic versioning available for these images, so they can always depend on the latest version of these images. The RDev architecture contains three components: RDev CLI, RDev server, and RDev operator. We’ll discuss these in detail in the upcoming slides.

So let’s go over the container image infrastructure that we built. Like I mentioned, we have a set of base images that we created for our developers, and usually they can just use them and get onboarded, but if they have any customization, they can create their own image-type product and derive from one of these base images. We set up a CI pipeline for the images, and we do a few extra steps there. I’ll just describe a couple of the important ones here. We add semantic versioning tags that help users to always depend on the latest version. We do that before publishing the images. We also add a dependency graph while we build these images, so any package, CLI, or parent image dependencies are added in this graph and uploaded during the CI build pipeline.

And using these two features, we can enable compatibility testing on these images. So whenever there is a change in the image, it goes through the products that are the consumers of these images. We build those products during the compatibility testing and we also enable automatic image updates, which means there is a service that runs daily jobs. So whenever there is any dependency package, or a parent image dependency update, we run these updates daily and we build all these images starting from the root OS image. Not necessarily root OS image, but wherever the package dependency has changed up to the leaf image. So this image updater keeps our images up to date. If there is a security update, we make sure, all the images get them as fast as possible. We can also trigger these updates manually.
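As a rough illustration of rebuilding from the changed image down to the leaf images, the sketch below walks a small parent-image graph and returns the affected images with parents ordered before children. The image names and the `rebuild_order` helper are made up for the example; the talk does not describe how the real updater is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical image graph: each image maps to the parent it derives from.
PARENTS = {
    "base-os": None,
    "python-base": "base-os",
    "java-base": "base-os",
    "ml-team-image": "python-base",
}


def rebuild_order(parents, changed):
    """Return the changed images and all their descendants, parents first."""
    # Invert the parent map so we can walk from an image to its children.
    children = {}
    for image, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(image)

    # Collect every image reachable from a changed image.
    affected = set()
    stack = list(changed)
    while stack:
        image = stack.pop()
        if image not in affected:
            affected.add(image)
            stack.extend(children.get(image, []))

    # Order the affected images so each parent is rebuilt before its children.
    graph = {
        image: ({parents[image]} if parents[image] in affected else set())
        for image in affected
    }
    return list(TopologicalSorter(graph).static_order())
```

With this graph, a change to `python-base` only touches `python-base` and `ml-team-image`, while a change to `base-os` touches all four images.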

But our jobs do run daily, and once we have the CI pipeline, we publish our images. Instances of these images are used to run RDev containers and builds in CI. So with that, we get rid of the problem that this was building on my machine, but it’s not building on CI. This gives us complete portability and consistency. These are some options to customize your RDev setup. This is inspired by VSCode’s remote SSH, and we have similar fields: you can add your image, your Dockerfile, ports, and environment variables, but there are also some LinkedIn-specific fields that we have added. I’ll describe a couple of them here. There are exposed ports, which are ports that are exposed outside of the Kubernetes cluster. This creates a shareable link using a free port on the remote host, and developers can share that with each other to share their in-progress work and get early feedback. There is a build lifecycle field that our pre-built environments use. Previously we mentioned there is an RDev init command. You can give your setup commands there, and we’ll use them to make sure that pre-built environments are set up. If that’s a long-running command, like running a server or initializing a database, we hit this URL to make sure the setup is complete.

So RDev CLI is a Python-based CLI, which is distributed to all users’ systems. We have multiple commands to manage these RDevs, like RDev create, list the RDevs, delete, and other management commands. RDev server is a RESTful service, which acts like a bridge between the CLI and the Kubernetes operator. So once RDev CLI gives an RDev create command, RDev server will create a YAML configuration that is understandable by the Kubernetes operator and get the results back. It also interacts with the database that has user configurations like dotfiles. Not only that, it also acts like a notifier, asking users to use their resources wisely.

And it also has a Kafka consumer that listens to publish events and keeps our pre-built pools up to date. It also has some scheduled jobs that make sure our pre-built pools are not broken due to a broken commit.
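To make the server’s “create request in, Kubernetes configuration out” role concrete, here is a minimal sketch. Everything specific in it is an assumption for illustration: the `RDevConfig` field names, the `rdev.example.com/v1alpha1` API group, and the manifest layout are hypothetical and not LinkedIn’s schema. The manifest is printed as JSON rather than YAML only to stay within the standard library; Kubernetes accepts either format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RDevConfig:
    """Hypothetical subset of the per-product RDev configuration fields."""
    image: str
    ports: list = field(default_factory=list)
    exposed_ports: list = field(default_factory=list)
    env: dict = field(default_factory=dict)
    init_command: str = ""
    build_lifecycle_url: str = ""


def to_custom_resource(name: str, owner: str, config: RDevConfig) -> dict:
    """Translate a create request into a manifest an operator could watch."""
    return {
        "apiVersion": "rdev.example.com/v1alpha1",  # hypothetical API group
        "kind": "RDev",
        "metadata": {"name": name, "labels": {"owner": owner}},
        "spec": asdict(config),
    }


config = RDevConfig(
    image="python-base:1.4.2",
    ports=[4200],
    exposed_ports=[4200],
    init_command="yarn install && yarn start",
    build_lifecycle_url="http://localhost:4200/",
)
manifest = to_custom_resource("demo-web-alice", "alice", config)
print(json.dumps(manifest, indent=2))
```

Keeping the translation in one pure function means the CLI-facing request format and the operator-facing resource format can evolve separately, which is one plausible reason to put a server between the two.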

SHIVANI PAI KASTURI: Now let’s look into how the operator works. We extend the Kubernetes API by leveraging the Kubernetes operator pattern, and we also define a few custom resource definitions, which we call CRDs. We have two CRDs: the RDev CRD and the RDev Pool CRD. So let’s look into them a bit. Each RDev CRD basically is a single-instance stateful application, so it has all the necessary details to spin up a new environment from scratch if needed. We have a service which is used, like Swati mentioned, to expose the ports outside the Kubernetes cluster. We use NodePort for it. Then we use a persistent volume claim to reserve a persistent volume, which is a non-volatile volume. So for the home directory, we have this guarantee that anything which the user installs or checks out within the home directory will be persisted in case there are restarts, or in case the pod moves from one host to the other. Then each RDev is backed by a Kubernetes pod. We have multiple immutable containers in this pod. We have the rDev-init-workspace container, for example, which is responsible for initializing your workspace. It is responsible for cloning your repo, setting up any environment variables needed, etcetera. Then we have the rDev-sshd container.

So as the name suggests, this provides the login service to the users. So we have the SSHD server running in this container. This container is spun up using the image mentioned in the dev container configuration, which Swati showed previously. And we have the sidecar container. Sidecar container does a bunch of stuff. So it’s basically responsible for also sending some lifecycle metrics, which we track. It’s also responsible for tracking user logins by monitoring the SSHD logs. It’s also responsible for checking whether the RDev pod is ready or not by running the startup probe and keeps probing to look whether the initialization command is finished or not, we have a bunch of volumes. So the home volume, pretty obvious it’s the home directory. So everything like the code, the environment variables, everything is set up here.

We have the RDev info volume. This basically has some details necessary for us, like, for example, the pod details, the host name, et cetera. These are populated by using the downward API concept of Kubernetes. So we leverage the labels and annotations, and use the downward API to automatically populate these files. Then we have a couple of other volumes as well. And then we have the certs and the secrets volume, which is mounted in the necessary containers to share the certs and access the secrets. Then we have the RDev Pool CRD. The RDev Pool CRD uses the CloneSet CRD. CloneSet is an OpenKruise CRD. This is open sourced by Alibaba, so we leverage it. This is mainly responsible for maintaining a bunch of pods, so it maintains ‘n’ number of replicas depending on your spec. Say you have a CloneSet spec defined: based on the replicas, we spin up these pods and keep them ready. Now, how we keep them ready is the point which Swati shared previously in the configuration. We leverage the build lifecycle URL, which we configure in the devcontainer.json. If that is configured, that means we know that there is a long-running process running in this pod, and we need to ensure that that process has finished starting up.

So we have the server up and running. We poll and keep probing till this URL is up and running. If it’s something definite, like for example a Gradle build command, then what we do is we just look for the presence of a file in which we capture the exit code of the command. Based on that, we determine this pod is ready to be assigned to some developer. Then we have the RDev operator. The RDev operator leverages the Operator SDK Kubebuilder framework. This basically acts as a controller to determine the current state of the system and take it to the desired state. We have the RDev controller and the RDev pool controller. Whenever there’s a new request for creation, the RDev controller looks for an existing, ready pod in the RDev pool and steals it and assigns it to the developer. The RDev pool controller now notices that the system is not in the desired state because one of its pods is missing, or it’s been stolen. So then it tries to spin up one more in its place to maintain the state of the system.
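The two readiness checks described here, polling a URL for long-running init commands and looking at a captured exit code for finite ones, might look roughly like the sketch below. The function names, the exit-code file, and the rule that only exit code `0` counts as ready are assumptions made for the example; the talk only says that the file’s presence is checked.

```python
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def url_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def command_succeeded(exit_code_file: Path) -> bool:
    """Return True once the init command has written exit code 0.

    Assumption: the init wrapper writes the command's exit code to this file.
    """
    try:
        return exit_code_file.read_text().strip() == "0"
    except FileNotFoundError:
        # The command has not finished, so no exit code was captured yet.
        return False


def pod_is_ready(
    build_lifecycle_url: Optional[str],
    exit_code_file: Path,
    probe: Callable[[str], bool] = url_is_up,
) -> bool:
    """Pick the readiness check based on whether a lifecycle URL is set."""
    if build_lifecycle_url:
        # Long-running init command (for example a dev server): probe the URL.
        return probe(build_lifecycle_url)
    # Finite init command (for example a build): check the captured exit code.
    return command_succeeded(exit_code_file)
```

A caller would invoke `pod_is_ready` repeatedly with a delay between attempts until it returns `True`, and only then mark the pod as assignable. The `probe` parameter exists so the URL check can be swapped out in tests.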

SWATI GAMBHIR: Now that we have looked at the RDev infrastructure in detail, let’s look at some of the use cases that we also support. We had a few users who wanted to leverage the remote development environments, but still have their source code locally. Rexec is where they can do that. They need to have their source code checked out locally, create an RDev, and run an Rexec command. This basically uses Rsync to sync the code changes first, and then users can give the command to run the long, computationally heavy commands on remote environments, which also uses the pre-built RDev workflow. That means the builds are incremental and not cold, so they are pretty fast. The artifacts are synced back to the local, if required. Another use case is containerized CI builds. On the left-hand side, you see just the RDev workflow, where the user creates the RDev environment and does their development. Once they push their code, it goes through our CI pipeline, where we have pre-merge validation jobs that are leveraging the same container that they configured for their RDevs. That means they don’t see any inconsistency in the CI builds. The pre-merge validation job also runs a few other static validations, project-specific validation, linting, et cetera.

Then the code is merged and we run the post-merge validation job. That also creates the same containers that RDev leveraged, using the same image, and it shares the results with other tasks of the job, like compatibility testing. Once that validation is complete, the artifacts are published back to the Artifactory, and the RDev server that’s listening to the event refreshes its pre-built pool depending on the new changes and keeps the pre-built pool fresh. So here’s one of the biggest products that we have, and these are the average build time gains that we saw when they use RDev environments versus the previous development environments that they used. Here are also some of the key metrics that we have: about 28% of builds are happening on RDev today, and about 39% of our developers are using RDev for some product or the other. There’s also more than a 70% improvement in P90 build duration. So currently RDev is the recommended platform for our web apps. We also support other personas like backend, Android, and AI/ML personas, and we have received overwhelmingly positive feedback so far.

We are planning to extend it to other personas as well. RDev is their favorite part of LinkedIn [chuckle]. We often get asked this question: why not GitHub Codespaces? The answer is that RDev predates GitHub Codespaces, but we are working with our partners at GitHub Codespaces to understand and evaluate the performance and cost effectiveness of moving to Codespaces. We would also like to increase the reach of RDev by leveraging AKS instead of the on-prem Kubernetes clusters that we are using today. Please do check out our blog post and acknowledgements on remote development. There are also some amazing developer productivity blog posts available on the LinkedIn Engineering blog.

SHIVANI PAI KASTURI: With that, we are ready for Q&A.