What is Cloud Run worker pools?
Cloud Run worker pools is a Cloud Run resource type for non-HTTP, pull-based background workloads. Instances are long-lived and continuously pull work from sources such as Pub/Sub pull subscriptions, Kafka and Redis task queues, and self-hosted GitHub Actions runners. Unlike Cloud Run services, worker pools have no load-balanced endpoint and no URL, and they do not scale in response to incoming HTTP requests.
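The pull model can be illustrated with a minimal Python sketch. It uses a standard-library `queue.Queue` as a stand-in for a real Pub/Sub pull subscription or Kafka/Redis client. `Message`, `run_worker` and `handle` are hypothetical names. A production worker would use the broker's own client library and acknowledgement API instead.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    """Hypothetical stand-in for a message pulled from Pub/Sub, Kafka or Redis."""
    id: str
    data: bytes


def run_worker(
    source: "queue.Queue[Message]",
    handle: Callable[[Message], None],
    stop: threading.Event,
    poll_timeout: float = 0.1,
) -> tuple[list[str], list[str]]:
    """Long-lived pull loop: fetch work, process it, acknowledge on success.

    Returns (acked_ids, failed_ids) so the behavior is easy to inspect.
    """
    acked: list[str] = []
    failed: list[str] = []
    while not stop.is_set():
        try:
            # Pull model: the worker asks the source for work; nothing calls it.
            msg = source.get(timeout=poll_timeout)
        except queue.Empty:
            continue  # idle: keep polling until asked to stop
        try:
            handle(msg)
        except Exception:
            # Leave the message unacknowledged; a real broker would redeliver it.
            failed.append(msg.id)
        else:
            acked.append(msg.id)  # acknowledge only after successful processing
        finally:
            source.task_done()
    return acked, failed
```

The loop has no request handler and no port, which mirrors why a worker pool needs no URL. The `stop` event stands in for whatever shutdown signal the real worker receives.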
Worker pools address a gap: request-driven serverless models are a poor fit for continuously running processing. Teams that consume messages from queues, run distributed AI/ML jobs, or operate CI/CD runners previously had to manage their own VMs or Kubernetes clusters. With Cloud Run worker pools, this background work runs on the serverless Cloud Run platform with resource-based billing and GPU support, and there is no cluster to operate.
Core features
- Pull-based background processing: Long-lived instances continuously pull work from queues such as Pub/Sub, Kafka and Redis, without a load-balanced endpoint or URL.
- GPU support for AI/ML: NVIDIA L4 (24 GB VRAM) and NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), with a limit of one GPU per instance, for distributed inference and batch jobs.
- Manual scaling and large instances: The instance count is set manually. Each instance can be sized up to 44 vCPU and 176 GB RAM and can run up to 10 containers (one main container plus up to nine sidecars).
- Full Cloud Run integration: Environment variables, secrets, health checks, VPC egress and ingress, NFS and Cloud Storage volumes, and immutable revisions per deployment.
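A worker receives its configuration through the container's environment. The following is a minimal Python sketch of reading that configuration. It assumes secrets reach the container either as an environment variable or as a mounted file, which is how Cloud Run exposes secrets to services. The variable names `SUBSCRIPTION`, `BATCH_SIZE` and `API_TOKEN`, and the `API_TOKEN_FILE` path convention, are hypothetical choices for this example, not Cloud Run conventions.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerSettings:
    subscription: str
    batch_size: int
    api_token: str


def read_secret(env: Mapping[str, str], name: str) -> str:
    """Read a secret from an env var or from a mounted file.

    Hypothetical convention: NAME holds the value directly, or NAME_FILE
    holds the path of a file mounted into the container.
    """
    if name in env:
        return env[name]
    path = env.get(f"{name}_FILE")
    if path:
        return Path(path).read_text().strip()
    raise KeyError(f"secret {name} not configured")


def load_settings(env: Optional[Mapping[str, str]] = None) -> WorkerSettings:
    """Build settings from the process environment (or an injected mapping)."""
    env = os.environ if env is None else env
    return WorkerSettings(
        subscription=env["SUBSCRIPTION"],             # required
        batch_size=int(env.get("BATCH_SIZE", "10")),  # optional, with a default
        api_token=read_secret(env, "API_TOKEN"),
    )
```

Accepting an injected mapping keeps the loader testable without changing the real process environment.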
Typical use cases
Queue consumers: Worker pools continuously process messages from Pub/Sub pull subscriptions, Kafka topics or Redis task queues, replacing self-managed workers on VMs or in Kubernetes.
Distributed AI/ML jobs: With GPU support, inference and batch processing for AI models run serverlessly, without provisioning and operating a GPU cluster.
Self-hosted CI/CD runners: Worker pools host self-hosted GitHub Actions runners that wait continuously for new jobs, and the instance count can be adjusted to match the job backlog.
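Worker pools are scaled manually, so matching runners to demand means that a component outside the pool decides how many instances to run. Here is a minimal sketch of that decision. It assumes a hypothetical controller that already knows how many GitHub Actions jobs are queued. The function only computes a target count. Applying the count to the worker pool, for example through the Cloud Run Admin API or gcloud, is deliberately left out.

```python
import math


def target_instances(queued_jobs: int, jobs_per_instance: int = 1,
                     min_instances: int = 0, max_instances: int = 10) -> int:
    """Compute how many worker pool instances to run for the current backlog.

    All parameters are hypothetical tuning knobs, not Cloud Run settings.
    """
    if queued_jobs < 0:
        raise ValueError("queued_jobs must be >= 0")
    if jobs_per_instance < 1:
        raise ValueError("jobs_per_instance must be >= 1")
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    needed = math.ceil(queued_jobs / jobs_per_instance)
    # Clamp between a floor (runners kept warm) and a ceiling (cost or quota cap).
    return max(min_instances, min(needed, max_instances))
```

A single runner process handles one job at a time, which is why `jobs_per_instance` defaults to 1. A higher value only makes sense if each instance runs several runner processes.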
Benefits
- Serverless model for continuously running background work without your own VMs or Kubernetes clusters
- Resource-based billing, which Google states is around 40 percent cheaper than request-driven services or jobs for long-running work
- GPU support for AI/ML workloads directly on the Cloud Run platform
- Availability in EU regions (europe-west1, europe-west4) for processing that must meet data-residency and privacy requirements
Integration with innFactory
As a certified Google Cloud Partner, innFactory supports you with the adoption and operation of this service.
Frequently Asked Questions
What is Cloud Run worker pools?
Cloud Run worker pools is a Cloud Run resource type for non-HTTP, pull-based background workloads. Instances are long-lived and continuously pull work from queues such as Pub/Sub pull subscriptions, Kafka or Redis. Unlike Cloud Run services, worker pools have no load-balanced endpoint and no URL, and they do not scale in response to incoming requests.
When should I use Cloud Run worker pools?
Worker pools are a fit when you run continuous consumers for message queues (Pub/Sub, Kafka, Redis), execute distributed AI/ML inference or batch jobs with GPU, or operate self-hosted CI/CD runners such as GitHub Actions runners. For request-driven HTTP endpoints, Cloud Run services are the right choice.
How much does Cloud Run worker pools cost?
Billing is resource-based and pay-per-use. vCPU and memory are billed for the full lifetime of the instance, and GPU is billed per second including idle uptime. Regional Tier 1 and Tier 2 rates apply. For long-running background work, Google states this billing is around 40 percent cheaper than request-driven services or jobs. Current pricing is listed on the official Cloud Run pricing page.
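This billing model can be turned into a rough cost estimate: every instance is charged for its whole uptime, busy or idle. Below is a minimal sketch. The `Rates` values are placeholders that you supply, not real prices. Take the actual region-specific (Tier 1 or Tier 2) unit prices from the Cloud Run pricing page. Memory is expressed in GiB, which is assumed to be the pricing unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-second unit prices. Placeholders only; not actual Cloud Run prices."""
    vcpu_second: float
    gib_second: float
    gpu_second: float


def instance_cost(rates: Rates, vcpu: float, memory_gib: float, gpus: int,
                  uptime_seconds: float) -> float:
    """Cost of one instance billed for its full lifetime, including idle time."""
    per_second = (vcpu * rates.vcpu_second
                  + memory_gib * rates.gib_second
                  + gpus * rates.gpu_second)
    return per_second * uptime_seconds


def pool_cost(rates: Rates, instances: int, vcpu: float, memory_gib: float,
              gpus: int, uptime_seconds: float) -> float:
    """Cost of a manually scaled pool of identical instances."""
    return instances * instance_cost(rates, vcpu, memory_gib, gpus, uptime_seconds)
```

Because idle time is billed, `uptime_seconds` is the instance's full lifetime and not just the time spent processing work. Lowering the instance count when queues are empty is what reduces this figure.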
Which GPUs and limits apply to worker pools?
GPU support is generally available with NVIDIA L4 (24 GB VRAM) and NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), with a limit of one GPU per instance. GPU worker pools cannot be autoscaled. Instances can be configured with up to 44 vCPU and 176 GB RAM, with up to 10 containers per instance (one main container plus up to nine sidecars).
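These limits lend themselves to a pre-deployment check. What follows is a minimal Python sketch that validates a proposed instance shape against the numbers stated in this answer. `InstanceShape` and `validate` are hypothetical names. The GPU names are display labels, not API identifiers. Verify the limits against the current Cloud Run documentation before relying on them.

```python
from dataclasses import dataclass
from typing import Optional

# Limits as stated in this FAQ; verify against current Cloud Run documentation.
MAX_VCPU = 44
MAX_MEMORY_GB = 176
MAX_CONTAINERS = 10            # one main container plus up to nine sidecars
MAX_GPUS_PER_INSTANCE = 1
GPU_VRAM_GB = {                # display names, not API identifiers
    "NVIDIA L4": 24,
    "NVIDIA RTX PRO 6000 Blackwell": 96,
}


@dataclass
class InstanceShape:
    """Hypothetical description of one worker pool instance."""
    vcpu: float
    memory_gb: float
    sidecars: int = 0
    gpu_model: Optional[str] = None
    gpu_count: int = 0


def validate(shape: InstanceShape) -> list[str]:
    """Return the limits a proposed shape violates (an empty list means none)."""
    errors: list[str] = []
    if shape.vcpu > MAX_VCPU:
        errors.append(f"vcpu {shape.vcpu} exceeds {MAX_VCPU}")
    if shape.memory_gb > MAX_MEMORY_GB:
        errors.append(f"memory {shape.memory_gb} GB exceeds {MAX_MEMORY_GB} GB")
    containers = 1 + shape.sidecars  # the main container plus its sidecars
    if containers > MAX_CONTAINERS:
        errors.append(f"{containers} containers exceed {MAX_CONTAINERS}")
    if shape.gpu_count > MAX_GPUS_PER_INSTANCE:
        errors.append(f"{shape.gpu_count} GPUs exceed {MAX_GPUS_PER_INSTANCE} per instance")
    if shape.gpu_count > 0 and shape.gpu_model not in GPU_VRAM_GB:
        errors.append(f"unsupported GPU model {shape.gpu_model!r}")
    return errors
```

Returning all violations at once, rather than raising on the first, makes it easier to fix a configuration in one pass.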
