Cloud Run worker pools - Pull-Based Background Work

Cloud Run worker pools: serverless resource for non-HTTP, pull-based workloads from queues like Pub/Sub and Kafka, with GPU support for AI/ML jobs.

Serverless
Pricing Model: Pay-per-use, resource-based (vCPU plus memory over instance lifetime; GPU per second, including idle time)
Availability: Multiple regions, including the EU (e.g. europe-west1 Belgium, europe-west4 Netherlands)
Data Sovereignty: EU regions available (europe-west1, europe-west4)
Reliability: 99.95% (Cloud Run SLA)

What is Cloud Run worker pools?

Cloud Run worker pools is a Cloud Run resource type for non-HTTP, pull-based background workloads. Instances are long-lived and continuously pull work from sources such as Pub/Sub pull subscriptions, Kafka topics and Redis task queues; self-hosted GitHub Actions runners, which poll for new jobs, follow the same pattern. Unlike Cloud Run services, worker pools have no load-balanced endpoint and no URL, and they do not scale in response to incoming HTTP requests.
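
To illustrate the pull-based pattern, here is a minimal Python sketch of a worker loop. It is not taken from the Cloud Run documentation: the in-memory `queue.Queue` stands in for a real source such as a Pub/Sub pull subscription, the names `run_worker` and `handle` are hypothetical, and it assumes the platform signals shutdown with SIGTERM.

```python
import queue
import signal
import threading
from typing import Callable


def run_worker(source: "queue.Queue[str]",
               handle: Callable[[str], None],
               stop: threading.Event,
               poll_timeout: float = 1.0) -> int:
    """Pull items from the source until stop is set; return the number handled."""
    processed = 0
    while not stop.is_set():
        try:
            item = source.get(timeout=poll_timeout)
        except queue.Empty:
            continue  # nothing to do yet, poll again
        handle(item)
        source.task_done()
        processed += 1
    return processed


if __name__ == "__main__":
    stop_event = threading.Event()
    # Assumption: the instance receives SIGTERM before it is shut down.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop_event.set())

    demo_source: "queue.Queue[str]" = queue.Queue()  # stand-in for a real queue
    for message in ("a", "b", "c"):
        demo_source.put(message)

    # The demo stops itself after half a second; a real worker runs until SIGTERM.
    threading.Timer(0.5, stop_event.set).start()
    run_worker(demo_source, lambda m: print("handled", m), stop_event, poll_timeout=0.1)
```

The loop finishes the item it is currently handling before it checks the stop flag again, so a shutdown request does not interrupt work halfway through.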

Worker pools solve the problem that request-driven serverless models fit poorly with continuously running processing. Teams that consume messages from queues, run distributed AI/ML jobs, or operate CI/CD runners previously needed either self-managed VMs or Kubernetes. With Cloud Run worker pools, this background work runs on the serverless Cloud Run platform, with resource-based billing and GPU support, and without operating your own cluster.

Core features

  • Pull-based background processing: Long-lived instances continuously pull work from queues such as Pub/Sub, Kafka and Redis, without a load-balanced endpoint or URL.
  • GPU support for AI/ML: NVIDIA L4 (24 GB VRAM) and NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), with a limit of one GPU per instance, for distributed inference and batch jobs.
  • Manual scaling and large instances: Instance count is configured manually; each instance can be configured with up to 44 vCPU and 176 GB RAM, and with up to 10 containers (one main container plus up to nine sidecars).
  • Full Cloud Run integration: Environment variables, secrets, health checks, VPC egress and ingress, NFS and Cloud Storage volumes, and immutable revisions per deployment.
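
The per-instance limits listed above can be checked before a deployment. The following Python sketch uses only the figures stated on this page; `WorkerPoolSpec` and `validate` are hypothetical names for illustration and are not part of any Google Cloud SDK. Check the current Cloud Run documentation before relying on the numbers.

```python
from dataclasses import dataclass

# Limits as stated on this page (assumption: they may change over time).
MAX_VCPU = 44
MAX_MEMORY_GB = 176
MAX_CONTAINERS = 10  # one main container plus up to nine sidecars
MAX_GPUS = 1


@dataclass(frozen=True)
class WorkerPoolSpec:
    """Hypothetical description of one worker pool instance."""
    vcpu: float
    memory_gb: float
    containers: int = 1
    gpus: int = 0


def validate(spec: WorkerPoolSpec) -> list[str]:
    """Return a list of limit violations; an empty list means the spec fits."""
    problems = []
    if not 0 < spec.vcpu <= MAX_VCPU:
        problems.append(f"vcpu must be in (0, {MAX_VCPU}], got {spec.vcpu}")
    if not 0 < spec.memory_gb <= MAX_MEMORY_GB:
        problems.append(f"memory_gb must be in (0, {MAX_MEMORY_GB}], got {spec.memory_gb}")
    if not 1 <= spec.containers <= MAX_CONTAINERS:
        problems.append(f"containers must be in [1, {MAX_CONTAINERS}], got {spec.containers}")
    if not 0 <= spec.gpus <= MAX_GPUS:
        problems.append(f"gpus must be in [0, {MAX_GPUS}], got {spec.gpus}")
    return problems


if __name__ == "__main__":
    print(validate(WorkerPoolSpec(vcpu=8, memory_gb=32, containers=2, gpus=1)))
```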

Typical use cases

Queue consumers: Worker pools continuously process messages from Pub/Sub pull subscriptions, Kafka topics or Redis task queues, replacing self-managed workers on VMs or in Kubernetes.

Distributed AI/ML jobs: With GPU support, inference and batch processing for AI models run serverlessly, without provisioning and operating a GPU cluster.

Self-hosted CI/CD runners: Worker pools operate self-hosted GitHub Actions runners that continuously wait for new jobs, with the instance count adjusted as needed.

Benefits

  • Serverless model for continuously running background work without your own VMs or Kubernetes clusters
  • Resource-based billing, which Google states is around 40 percent cheaper than request-driven services or jobs for long-running work
  • GPU support for AI/ML workloads directly on the Cloud Run platform
  • EU regions available (europe-west1, europe-west4) for privacy-compliant processing

Integration with innFactory

As a certified Google Cloud Partner, innFactory supports you with the adoption and operation of this service.

Frequently Asked Questions

What is Cloud Run worker pools?

Cloud Run worker pools is a Cloud Run resource type for non-HTTP, pull-based background workloads. Instances are long-lived and continuously pull work from queues such as Pub/Sub pull subscriptions, Kafka or Redis. Unlike Cloud Run services, worker pools have no load-balanced endpoint and no URL, and they do not scale in response to incoming requests.

When should I use Cloud Run worker pools?

Worker pools are a fit when you run continuous consumers for message queues (Pub/Sub, Kafka, Redis), execute distributed AI/ML inference or batch jobs with GPU, or operate self-hosted CI/CD runners such as GitHub Actions runners. For request-driven HTTP endpoints, Cloud Run services are the right choice.

How much does Cloud Run worker pools cost?

Billing is resource-based and pay-per-use. vCPU and memory are billed for the full lifetime of the instance, and GPU is billed per second including idle uptime. Regional Tier 1 and Tier 2 rates apply. For long-running background work, Google states this billing is around 40 percent cheaper than request-driven services or jobs. Current pricing is listed on the official Cloud Run pricing page.
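
As a rough illustration of resource-based billing, the Python sketch below multiplies instance lifetime by the configured resources. The function name `estimate_cost` is hypothetical, and the rates in the example are placeholders, not real prices; take actual per-second rates for your region and tier from the official Cloud Run pricing page.

```python
def estimate_cost(instances: int,
                  hours: float,
                  vcpu: float,
                  memory_gb: float,
                  vcpu_rate: float,
                  memory_rate: float,
                  gpus: int = 0,
                  gpu_rate: float = 0.0) -> float:
    """Estimate cost for instances that run for the given number of hours.

    Rates are per second: per vCPU, per GB of memory, and per GPU.
    All resources are billed for the full instance lifetime, including idle time.
    """
    seconds = hours * 3600
    per_second = vcpu * vcpu_rate + memory_gb * memory_rate + gpus * gpu_rate
    return instances * seconds * per_second


if __name__ == "__main__":
    # Placeholder rates for illustration only; they are NOT actual Cloud Run prices.
    monthly = estimate_cost(instances=2, hours=730, vcpu=4, memory_gb=16,
                            vcpu_rate=0.00001, memory_rate=0.000001)
    print(f"estimated monthly cost: {monthly:.2f}")
```

Because billing covers the whole instance lifetime, the instance count and uptime drive the result, not the number of messages processed.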

Which GPUs and limits apply to worker pools?

GPU support is generally available with NVIDIA L4 (24 GB VRAM) and NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM), with a limit of one GPU per instance. GPU worker pools cannot be autoscaled. Instances can be configured with up to 44 vCPU and 176 GB RAM, with up to 10 containers per instance (one main container plus up to nine sidecars).

Google Cloud Partner

innFactory is a certified Google Cloud Partner. We provide expert consulting, implementation, and managed services.

Ready to start with Cloud Run worker pools?

Our certified Google Cloud experts help you with architecture, integration, and optimization.