Migrate to Cloud Run from a virtual machine #5

In this article, I outline the basic steps for migrating from virtual machine (VM) based hosting to Google Cloud Run. First I'll cover the general steps, and then how our team approached the migration and rollout for an application with an active user base.



Containerize the application

How easy it is to containerize a project depends on the underlying technology stack, but getting a basic working container should not be excessively time-consuming. The main challenges come in fine-tuning the container: optimizing its size and making sure all dependencies are properly configured. In our experience, certain dependencies caused issues on an Alpine base image, so we switched to Ubuntu, which increased the image size. Ultimately, the container image needs to be stored in Google Cloud Artifact Registry so that Cloud Run can use it.
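One detail that is easy to miss when containerizing for Cloud Run is that the platform tells the container which port to listen on through the PORT environment variable (8080 by default). The sketch below is a minimal illustration using only the Python standard library; the handler and function names are hypothetical and not taken from our project.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(env=os.environ, default=8080):
    """Return the port to listen on; Cloud Run passes it via the PORT variable."""
    return int(env.get("PORT", default))


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a plain-text 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Bind to 0.0.0.0 so the server is reachable from outside the container.
    server = HTTPServer(("0.0.0.0", resolve_port()), HealthHandler)
    server.serve_forever()

# The container entrypoint would call main(); it is not called here so the
# module can be imported without starting a server.
```

A real application would use its own framework, but the same two rules apply: read the port from PORT and listen on all interfaces.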

Centralized secrets

This is something of a mindset change. All secret values from the .env file are now stored in Google Cloud Secret Manager, whereas before they lived in a file on disk. This caused some frustration for developers who are not into cloud or DevOps, so it is something to look out for beforehand and to communicate early.
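One way to soften the change for developers is to keep a single settings loader that works in both worlds. Cloud Run can expose Secret Manager secrets to the container as environment variables, so the sketch below reads the environment first and falls back to a local .env file for development. This is an assumption about how a team might structure it, not a description of our exact setup; the function names and the keys in the usage example are hypothetical.

```python
import os


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(required, env=None, dotenv_text=""):
    """Resolve each required key from the environment, then from the .env text."""
    env = os.environ if env is None else env
    fallback = parse_dotenv(dotenv_text)
    settings = {}
    missing = []
    for key in required:
        if key in env:
            settings[key] = env[key]
        elif key in fallback:
            settings[key] = fallback[key]
        else:
            missing.append(key)
    if missing:
        # Fail at startup rather than at first use of the missing secret.
        raise KeyError("missing settings: " + ", ".join(missing))
    return settings
```

For example, load_settings(["DATABASE_URL", "API_KEY"], dotenv_text=open(".env").read()) would work on a laptop, while on Cloud Run the same call with no .env text picks the values up from the environment.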

Cloud Build

Cloud Build is Google's CI/CD service. There is not much to say here: it builds container images and can trigger a deployment to Cloud Run. This is what we went with.

Deploy to Cloud Run

Deploying to Cloud Run can be straightforward for those with prior experience; configuring container instances, however, can be a challenge. The main concern is balancing the number of instances between stability and cost. Too few instances may leave the service unable to keep up with traffic, while too many result in wasted resources and increased costs. Cloud Run scales instances horizontally on its own, aiming to keep the CPU utilization of running instances at around 60%. Note, however, that Cloud Run currently has limitations when it comes to performance monitoring and tracking. To gain a deeper understanding of how a service is performing, it's recommended to monitor the p95 and p99 of key metrics. That said, Google is actively working to improve the Cloud Run experience.
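As a starting point for instance limits, a back-of-the-envelope estimate based on Little's law can help: the number of requests in flight is roughly the arrival rate multiplied by the average latency. The sketch below is only a rough sizing aid under that assumption. The headroom factor (target_utilization) is a hypothetical planning parameter, not a Cloud Run setting, and real numbers should come from load testing.

```python
import math


def estimate_instances(requests_per_second, avg_latency_seconds,
                       concurrency, target_utilization=0.7):
    """Rough instance count from Little's law.

    In-flight requests = arrival rate * average latency. Each instance is
    assumed to serve `concurrency` requests at once, and only
    `target_utilization` of that capacity is planned for, to leave headroom.
    """
    if concurrency <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("concurrency must be positive and utilization in (0, 1]")
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight / (concurrency * target_utilization)))
```

With 200 requests per second, 250 ms average latency and 20 concurrent requests per instance, about 50 requests are in flight at once, so the estimate comes out at 4 instances with the default headroom.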

Monitoring

Monitoring your application on Google Cloud Run is crucial for both infrastructure and application performance. It is important to ensure that each container does not consume more resources than necessary, as determined during the testing phase. Additionally, log-based metrics can help monitor instance availability on Cloud Run, so that you can track and alert when no instance is available. If your service is unable to scale quickly enough to handle incoming traffic, requests may fail. If your service processes queues or Pub/Sub messages, retries on failure can be configured. For user-facing applications served from a browser, the following measures can be taken:

  • Setting the application to retry requests based on status codes.
  • Configuring the Load Balancer to retry requests. The caveat is that only GET requests can be retried (at least for now).
  • Reducing the start-up time for containers to improve their ability to handle requests quickly.
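The first measure, retrying based on status codes, can be sketched as follows. This is a minimal illustration with exponential backoff, not the code from our application: the send callable and the set of retryable status codes are assumptions, and retries are generally only safe for idempotent requests.

```python
import time

# Status codes treated as transient here; adjust to your service's behaviour.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retries(send, max_attempts=4, base_delay=0.2, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    # All attempts were used; return the last response so the caller can decide.
    return status, body
```

The sleep parameter is injectable so that the backoff can be tested without actually waiting.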

Monitoring infrastructure and application performance on Google Cloud Run is important to ensure an application is performing well.
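For the tail latencies (p95 and p99) mentioned in the deployment section, the calculation itself is simple once request durations are exported from logs or metrics. The sketch below assumes a plain list of latency samples in milliseconds and uses the standard library's statistics.quantiles; how the samples are collected is left out.

```python
import statistics


def tail_latencies(samples_ms):
    """Return (p50, p95, p99) for a list of request latencies in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94], cuts[98]
```

Comparing p50 against p95 and p99 over time shows whether a minority of requests, for example those hitting cold starts, are much slower than the typical one.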

How to move to Google Cloud Run when an application has an active user base

One important piece of advice is to take a gradual approach. If your current setup does not use a load balancer, I recommend introducing one before moving to Cloud Run. This brings several benefits, such as:

  • Balancing traffic between computing resources by weight.
  • Balancing traffic between computing resources by application route. For example:
    /api/v1/users would send requests to the VM, while /api/v2/users would send requests to the Google Cloud Run service.
  • An additional layer of monitoring: the ability to inspect how traffic is spread across various services by region.
  • It can (and arguably should) act as a single point of entry into the network. Other Google Cloud services, such as Cloud Armor (WAF), can be used in tandem to improve app security.
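The weight-based and route-based balancing described above can be modelled in a few lines. The sketch below is purely illustrative: in practice this logic lives in the load balancer's configuration, not in application code, and the routing table, backend names and weights are hypothetical.

```python
import random

# Hypothetical routing table: path prefix -> list of (backend, weight).
ROUTES = [
    ("/api/v2/", [("cloud-run", 100)]),
    ("/api/v1/", [("vm", 90), ("cloud-run", 10)]),
]
DEFAULT_BACKENDS = [("vm", 100)]


def pick_backend(path, rng=random, routes=ROUTES, default=DEFAULT_BACKENDS):
    """Choose a backend by the first matching path prefix, then by weight."""
    backends = default
    for prefix, candidates in routes:
        if path.startswith(prefix):
            backends = candidates
            break
    names = [name for name, _ in backends]
    weights = [weight for _, weight in backends]
    return rng.choices(names, weights=weights, k=1)[0]
```

In this model, rolling back means setting the "cloud-run" weight to 0 for the affected route, and continuing the rollout means raising it step by step.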

The load balancer is the key to a smooth migration to Cloud Run. Whatever goes wrong, traffic can be routed back to the original compute resource while you make adjustments and then try again.

Interacting with other services in your VPC

Those services might be databases and caching services. Google Cloud Run doesn't support direct access to the container instance (at least not at the time of writing). There are other ways to get inside, but they would require using a third-party tool. We solved it by leaving one Compute Engine instance running where we can pull Docker images, but this introduces another problem: syncing secrets with Secret Manager.

In conclusion

Migrating to Google Cloud Run can be a great way to improve the scalability, cost-efficiency, and performance of your application. While the process may seem daunting, containerizing your application, managing secrets, using Cloud Build, deploying to Cloud Run, and monitoring application and infrastructure performance can all be tackled with a clear plan and execution. I have provided some tips on how to approach the migration when the application has an active user base, such as taking a gradual approach and introducing a load balancer before moving to Cloud Run. The migration may be a challenge, but with the proper steps you can achieve a smooth transition and enjoy the benefits (and suffer the pains) that Cloud Run provides. This was a technical article, focused on the code and infrastructure side, but this type of change also requires buy-in from the team you work in and from the organization as a whole.