Cloud Run worker pools - Pull-Based Background Work · innFactory

What is Cloud Run worker pools?

Cloud Run worker pools is a Cloud Run resource type for non-HTTP, pull-based background workloads. Instances are long-lived and continuously pull work from sources such as Pub/Sub pull subscriptions, Kafka topics and Redis task queues; they can also host self-hosted GitHub Actions runners. Unlike Cloud Run services, worker pools have no load-balanced endpoint and no URL, and they do not scale in response to incoming HTTP requests.

Worker pools address a gap in request-driven serverless models, which fit poorly with continuously running processing. Teams that consume messages from queues, run distributed AI/ML jobs, or operate CI/CD runners previously needed either self-managed VMs or Kubernetes. With Cloud Run worker pools, this background work runs on the serverless Cloud Run platform, with resource-based billing and GPU support, and without operating your own cluster.

Core features

Pull-based background processing: Long-lived instances continuously pull work from queues such as Pub/Sub, Kafka and Redis, without a load-balanced endpoint or URL.
GPU support for AI/ML: NVIDIA L4 (24 GB VRAM) and NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), with a limit of one GPU per instance, for distributed inference and batch jobs.
Manual scaling and large instances: Instance count is configured manually; each instance can be configured with up to 44 vCPU and 176 GB RAM, and with up to 10 containers (one main container plus up to nine sidecars).
Full Cloud Run integration: Environment variables, secrets, health checks, VPC egress and ingress, NFS and Cloud Storage volumes, and immutable revisions per deployment.
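
The pull-based model described above can be illustrated with a minimal Python sketch. It is not Google's implementation or any client library's API: the standard-library queue.Queue stands in for a real source such as a Pub/Sub pull subscription, and run_worker and install_stop_handlers are hypothetical helper names. The sketch also assumes the platform signals shutdown with SIGTERM, so that the loop can finish the current item before exiting.

```python
import queue
import signal
import threading


def run_worker(source, handle, stop, poll_timeout=1.0):
    """Pull items from `source` until `stop` is set; return how many were handled.

    `source` is any object with a blocking get(timeout=...) that raises
    queue.Empty when nothing arrives in time (here: queue.Queue as a stand-in).
    """
    handled = 0
    while not stop.is_set():
        try:
            item = source.get(timeout=poll_timeout)
        except queue.Empty:
            continue  # nothing to do yet; keep polling
        handle(item)
        handled += 1
    return handled


def install_stop_handlers(stop):
    """Set `stop` on SIGTERM or SIGINT. Must be called from the main thread."""
    def _request_stop(signum, frame):
        stop.set()

    signal.signal(signal.SIGTERM, _request_stop)
    signal.signal(signal.SIGINT, _request_stop)
```

In a real worker, the process would create a threading.Event, call install_stop_handlers on it, and then call run_worker with a handler that processes and acknowledges each message. There is no HTTP server anywhere in the loop, which is the point of the pull-based model.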

Typical use cases

Queue consumers: Worker pools continuously process messages from Pub/Sub pull subscriptions, Kafka topics or Redis task queues, replacing self-managed workers on VMs or in Kubernetes.

Distributed AI/ML jobs: With GPU support, inference and batch processing for AI models run serverlessly, without provisioning and operating a GPU cluster.

Self-hosted CI/CD runners: Worker pools operate self-hosted GitHub Actions runners that continuously wait for new jobs, with the instance count adjusted as needed.
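
Because the instance count is configured manually, something outside the worker pool has to decide how many instances a given backlog calls for. The following Python sketch shows one possible sizing rule. The function name, its parameters and the default bounds are hypothetical and not part of Cloud Run; the throughput figure per instance is something you would have to measure yourself.

```python
import math


def desired_instances(backlog, per_instance_rate, target_seconds,
                      min_instances=1, max_instances=10):
    """Instances needed to drain `backlog` items within `target_seconds`.

    per_instance_rate: items one instance handles per second (measured, assumed).
    The result is clamped to [min_instances, max_instances].
    """
    if per_instance_rate <= 0 or target_seconds <= 0:
        raise ValueError("per_instance_rate and target_seconds must be positive")
    needed = math.ceil(backlog / (per_instance_rate * target_seconds))
    return max(min_instances, min(max_instances, needed))
```

For example, a backlog of 1,200 messages, 5 messages per second per instance and a 60-second target give 1200 / (5 * 60) = 4 instances. The resulting number would then be applied to the worker pool through whatever deployment tooling you use.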

Benefits

Serverless model for continuously running background work without your own VMs or Kubernetes clusters
Resource-based billing, which Google states is around 40 percent cheaper than request-driven services or jobs for long-running work
GPU support for AI/ML workloads directly on the Cloud Run platform
EU regions available (europe-west1, europe-west4) for privacy-compliant processing

Integration with innFactory

As a certified Google Cloud Partner, innFactory supports you with the adoption and operation of this service.

Frequently Asked Questions

What is Cloud Run worker pools?

Cloud Run worker pools is a Cloud Run resource type for non-HTTP, pull-based background workloads. Instances are long-lived and continuously pull work from queues such as Pub/Sub pull subscriptions, Kafka or Redis. Unlike Cloud Run services, worker pools have no load-balanced endpoint and no URL, and they do not scale in response to incoming requests.

When should I use Cloud Run worker pools?

Worker pools are a fit when you run continuous consumers for message queues (Pub/Sub, Kafka, Redis), execute distributed AI/ML inference or batch jobs with GPU, or operate self-hosted CI/CD runners such as GitHub Actions runners. For request-driven HTTP endpoints, Cloud Run services are the right choice.

How much does Cloud Run worker pools cost?

Billing is resource-based and pay-per-use. vCPU and memory are billed for the full lifetime of the instance, and GPU is billed per second including idle uptime. Regional Tier 1 and Tier 2 rates apply. For long-running background work, Google states this billing is around 40 percent cheaper than request-driven services or jobs. Current pricing is listed on the official Cloud Run pricing page.
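
The billing model above can be expressed as simple arithmetic: cost depends on the configured resources and the instance lifetime, not on how busy the instance is. The Python sketch below illustrates this. The rates in PLACEHOLDER_RATES are invented for the example and are not real Cloud Run prices; the class and function names are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-second unit prices. Placeholder values only, not real prices."""
    vcpu_second: float
    gb_second: float
    gpu_second: float


def lifetime_cost(rates, vcpu, memory_gb, gpus, lifetime_seconds, instances=1):
    """Cost of running `instances` identical instances for `lifetime_seconds`.

    Idle time is included by construction: only the lifetime matters.
    """
    per_second = (vcpu * rates.vcpu_second
                  + memory_gb * rates.gb_second
                  + gpus * rates.gpu_second)
    return per_second * lifetime_seconds * instances


# Invented numbers for illustration; look up current rates for your region tier.
PLACEHOLDER_RATES = Rates(vcpu_second=0.00002, gb_second=0.000002, gpu_second=0.0002)
```

With these placeholder rates, one instance with 2 vCPU, 4 GB RAM and one GPU costs 3600 * (0.00004 + 0.000008 + 0.0002) = 0.8928 currency units per hour, whether or not it processed any work during that hour.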

Which GPUs and limits apply to worker pools?

GPU support is generally available with NVIDIA L4 (24 GB VRAM) and NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), with a limit of one GPU per instance. GPU worker pools cannot be autoscaled. Instances can be configured with up to 44 vCPU and 176 GB RAM, with up to 10 containers per instance (one main container plus up to nine sidecars).
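
The limits listed above can be checked before a deployment with a small validation helper. The Python sketch below encodes only the figures stated in this answer (44 vCPU, 176 GB RAM, 10 containers, one GPU); InstanceSpec and violations are hypothetical names, and the actual platform may enforce further rules that are not covered here.

```python
from dataclasses import dataclass

# Per-instance limits as stated in the text above.
MAX_VCPU = 44
MAX_MEMORY_GB = 176
MAX_CONTAINERS = 10  # one main container plus up to nine sidecars
MAX_GPUS = 1


@dataclass(frozen=True)
class InstanceSpec:
    vcpu: float
    memory_gb: float
    containers: int = 1
    gpus: int = 0


def violations(spec):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if spec.vcpu > MAX_VCPU:
        problems.append(f"vcpu {spec.vcpu} exceeds {MAX_VCPU}")
    if spec.memory_gb > MAX_MEMORY_GB:
        problems.append(f"memory {spec.memory_gb} GB exceeds {MAX_MEMORY_GB} GB")
    if not 1 <= spec.containers <= MAX_CONTAINERS:
        problems.append(f"containers {spec.containers} not in 1..{MAX_CONTAINERS}")
    if spec.gpus > MAX_GPUS:
        problems.append(f"gpus {spec.gpus} exceeds {MAX_GPUS}")
    return problems
```

A spec such as InstanceSpec(vcpu=8, memory_gb=32, containers=2, gpus=1) passes with an empty list, while a spec that requests two GPUs is reported as a violation.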

Similar Products from Other Clouds

Amazon EC2 - Virtual Servers

Amazon EC2 Auto Scaling - Automatic Capacity Adjustment

Amazon Lightsail - Simple Cloud Hosting

Amazon Linux 2023 - Optimized Linux Distribution for AWS

AWS App Runner - Container Hosting Without Infrastructure

AWS Batch - Batch Computing in the Cloud
