Platform Design

Scaling & Orchestration

How the platform scales across customers, regions, and versions: one single-tenant Kubernetes cluster per customer, provisioned with Pulumi and orchestrating the LiveKit rooms and sim-engine workers inside it.

Status: accepted design

The decisions on this page are tracked formally as ADR-0020 through ADR-0027 on the Architecture Decisions page, and grounded in LiveKit's official Kubernetes and distributed multi-region documentation.

The model in one picture

Global control plane: org router · geo DNS · session orchestration · billing

Customer clusters:

  • Mercy Health: AWS · us-east-1 · v1.4.5
  • Charité Berlin: GCP · europe-west3 · v1.4.9
  • NHS Trust: Azure · uk-south · v1.5.0

Each cluster runs the same four components:

  • Ingress / LB: TLS termination
  • LiveKit node pool: host networking, 1 pod per node
  • Sim + agent workers: autoscaled pods
  • Redis + observability: per cluster

One single-tenant cluster per customer — independent cloud, region, and version. The control plane routes each org to its own cluster.

Pulumi and Kubernetes are different layers, not a choice

The recurring question “Pulumi or Kubernetes?” is a category error — they stack:

Pulumi — IaC layer

Provisions the cloud account, network, managed cluster, node pools, Redis, DNS, and secrets — and deploys the in-cluster workloads (including Helm releases) via its Kubernetes provider. All in TypeScript.

Kubernetes — workload layer

Runs and autoscales LiveKit nodes, sim-engine workers, and agent workers on the cluster Pulumi created. Managed control planes (AKS/GKE/EKS) mean we never run the control plane ourselves.

Result

One AI-writable IaC tool provisions many small, homogeneous per-customer clusters; Kubernetes orchestrates each one.

One cluster per customer (the cell is the customer)

Because each customer is an entire organization — a hospital or medical school with hundreds or thousands of users — operating in one region on their own pinned version, the deployment unit collapses to a single-tenant cluster per customer. This dissolves several hard problems at once:

  • Version independence — Org A on 1.4.5 and Org B on 1.4.9 are just different clusters with different Helm release versions. No mixed-version cluster, which is exactly what LiveKit's “homogeneous instances” model wants.
  • Data residency — the cluster lives in the customer's region, full stop.
  • Isolation — a single-tenant cluster is the strongest tenant boundary there is, ideal for HIPAA/PHI, and one customer's blast radius can't reach another's.
  • Trivial routing — no cross-region selector needed; the control plane is a simple org → cluster lookup.
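
As a rough illustration of the "simple org → cluster lookup" in the last bullet, the control-plane routing step could be sketched as below. The registry shape, field names, org slugs, and endpoint URLs are all hypothetical and only mirror the example clusters shown earlier on this page.

```typescript
// Hypothetical registry entry; the field names are illustrative assumptions.
interface ClusterRecord {
  cloud: "aws" | "gcp" | "azure";
  region: string;
  version: string;
  endpoint: string; // placeholder URL, not a real address
}

// One entry per customer org: the whole routing table is a flat lookup.
const clusterRegistry: Record<string, ClusterRecord> = {
  "mercy-health": {
    cloud: "aws",
    region: "us-east-1",
    version: "1.4.5",
    endpoint: "https://mercy-health.example.invalid",
  },
  "charite-berlin": {
    cloud: "gcp",
    region: "europe-west3",
    version: "1.4.9",
    endpoint: "https://charite-berlin.example.invalid",
  },
};

// Resolve an org to its single cluster; no region or version selection is needed.
function routeOrg(orgId: string): ClusterRecord {
  if (!Object.prototype.hasOwnProperty.call(clusterRegistry, orgId)) {
    throw new Error(`no cluster registered for org "${orgId}"`);
  }
  return clusterRegistry[orgId];
}
```

Because each org maps to exactly one cluster, the router makes no placement decisions: an unknown org is simply an error.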

The cost trade-off is deliberate

A full cluster per customer has a higher baseline cost than packing tenants together, but a customer is a whole organization paying a ~$30–40K annual license and demanding security and version control. Namespace-per-tenant was the cheaper alternative, consciously traded away in exchange for independent versions and stronger isolation. See the Cost Model page to tune the numbers.

How LiveKit expects to run on Kubernetes

Grounded in LiveKit's own Kubernetes and distributed docs, the media plane has specific requirements:

  • Host networking, one pod per node. LiveKit pods use host networking so that the server handles the node's rtc.udp/tcp ports directly, which limits each node to one LiveKit pod (other workloads may co-reside).
  • TLS at the ingress. The official Helm chart terminates signal-connection TLS at the cloud LB (GKE LB, AWS ALB via the Load Balancer Controller, or nginx-ingress + cert-manager). A separate SSL cert is still needed for the embedded TURN/TLS server.
  • Redis coordinates the cluster. Nodes report load to Redis; new rooms are placed on the least-loaded node; a room is pinned to a single node.
  • Sim & agent workers run as ordinary autoscaled pods alongside the media node pool.
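
The Redis-coordinated placement rule above (least-loaded node for a new room, then the room stays pinned) can be modeled in a few lines. This is a minimal in-memory sketch, not LiveKit's implementation: the `RoomPlacer` class and its method names are hypothetical, and a plain object stands in for the load figures that nodes would report to Redis.

```typescript
// In-memory stand-in for Redis-held state; all names here are hypothetical.
class RoomPlacer {
  private readonly loadByNode: Record<string, number> = {};
  private readonly nodeByRoom: Record<string, string> = {};

  // A node reports its current load (any comparable number, e.g. 0..1).
  reportLoad(nodeId: string, load: number): void {
    this.loadByNode[nodeId] = load;
  }

  // Return the node hosting the room, choosing the least-loaded node on first use.
  nodeFor(room: string): string {
    const pinned = this.nodeByRoom[room];
    if (pinned !== undefined) return pinned; // a room never moves once placed

    const nodeIds = Object.keys(this.loadByNode);
    if (nodeIds.length === 0) throw new Error("no nodes have reported load");

    let best = nodeIds[0];
    for (const id of nodeIds) {
      if (this.loadByNode[id] < this.loadByNode[best]) best = id;
    }
    this.nodeByRoom[room] = best;
    return best;
  }
}
```

Note that later load changes affect only new rooms: an existing room keeps its node even if that node becomes the busiest.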

Versioning & zero-disruption upgrades

Each customer upgrades on their own schedule by rolling their cluster's Helm release to a new image version. LiveKit's native connection draining makes this safe: on SIGTERM a node keeps its active rooms running, accepts new participants into those rooms, rejects brand-new rooms, and shuts down only once empty. The Helm chart sets terminationGracePeriodSeconds to 5 hours, so a live simulation session is never cut off by an upgrade.
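
The draining behavior described above amounts to a small state machine. The sketch below models it under stated assumptions: the `MediaNode` class, its states, and its method names are hypothetical and are not LiveKit's API; only the rules (keep active rooms, admit participants to them, reject new rooms, stop when empty) come from the paragraph above.

```typescript
type NodeState = "serving" | "draining" | "stopped";

// Hypothetical model of one media node; not LiveKit code.
class MediaNode {
  private state: NodeState = "serving";
  private readonly rooms: Record<string, number> = {}; // room name -> participant count

  // Returns true if the participant was admitted.
  join(room: string): boolean {
    if (this.state === "stopped") return false;
    const isNewRoom = this.rooms[room] === undefined;
    if (isNewRoom && this.state === "draining") return false; // reject brand-new rooms
    this.rooms[room] = (this.rooms[room] ?? 0) + 1;
    return true;
  }

  leave(room: string): void {
    const count = this.rooms[room];
    if (count === undefined) return;
    if (count <= 1) delete this.rooms[room];
    else this.rooms[room] = count - 1;
    this.stopIfEmpty();
  }

  // Models receiving SIGTERM: start draining, and stop at once if already empty.
  sigterm(): void {
    if (this.state === "serving") this.state = "draining";
    this.stopIfEmpty();
  }

  getState(): NodeState {
    return this.state;
  }

  private stopIfEmpty(): void {
    if (this.state === "draining" && Object.keys(this.rooms).length === 0) {
      this.state = "stopped";
    }
  }
}
```

The 5-hour grace period (18,000 seconds) is the upper bound on how long a node may remain in the draining state before Kubernetes terminates it.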

Two deployment tiers: dedicated vs shared

A full cluster per customer is the right unit for high-security customers, but it duplicates a fixed per-cluster baseline (control plane, Redis, ingress, observability, and HA-floor nodes) for every org — which dominates cost for smaller customers. So the same control plane and Pulumi templates offer two tiers (ADR-0027). Crucially, version independence and logical isolation are preserved in both; the difference is physical isolation and cost.

Tier A — Dedicated cluster

One single-tenant cluster per customer (ADR-0021): full physical isolation, any version, region-pinned. For military, strict-HIPAA, and data-residency customers — the higher baseline is a feature they pay for.

Tier B — Shared, namespace-isolated

Generic infrastructure is shared (LiveKit media pool, managed control plane, Redis with per-tenant logical isolation, ingress, observability); each customer gets its own namespace running its own pinned sim-engine/agent version. The fixed baseline amortizes across tenants.

Versioning lives at the app layer

Customers pin our product (sim engine, agents) via per-namespace Helm releases — so different versions coexist in a shared cluster. LiveKit server is shared infrastructure, kept homogeneous across co-located tenants.

Why the shared tier matters

For a customer on a $30–50K license, a dedicated cluster's ~$600/mo fixed baseline is the largest avoidable cost — egress and real compute scale with usage and aren't reducible by sharing. Amortizing that baseline across N tenants (≈ baseline ÷ N) is what makes the unit economics work for the majority who don't need physical isolation. Compare the two side by side on the Cost Model page.
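
To make the "baseline ÷ N" arithmetic concrete, here is a small worked sketch. The ~$600/mo baseline and the $30K license come from the paragraph above; the tenant count of 10 in the usage note is purely an illustrative assumption, and `baselineShare` is a hypothetical helper, not part of the Cost Model page.

```typescript
interface BaselineShare {
  perTenantMonthlyUsd: number;
  perTenantAnnualUsd: number;
  shareOfLicense: number; // fraction of the annual license consumed by the baseline
}

// Split a fixed monthly cluster baseline evenly across the tenants sharing it.
function baselineShare(
  monthlyBaselineUsd: number,
  tenants: number,
  annualLicenseUsd: number,
): BaselineShare {
  if (tenants < 1) throw new Error("tenants must be at least 1");
  const perTenantMonthlyUsd = monthlyBaselineUsd / tenants;
  const perTenantAnnualUsd = perTenantMonthlyUsd * 12;
  return {
    perTenantMonthlyUsd,
    perTenantAnnualUsd,
    shareOfLicense: perTenantAnnualUsd / annualLicenseUsd,
  };
}

// Dedicated cluster: one tenant carries the whole baseline.
const dedicated = baselineShare(600, 1, 30000);
// Shared cluster with an assumed 10 tenants.
const shared = baselineShare(600, 10, 30000);
```

With these inputs the dedicated tenant carries $7,200 per year, about 24% of a $30K license, while each of 10 shared tenants carries $720 per year, about 2.4%.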

Onboarding is a Pulumi stack

This is the cleanest justification for infrastructure-as-code: adding a new organization is provisioning a new parameterized stack.

Stack per customer

Each org is a Pulumi stack parameterized by (cloud, region, version, sizing).

Automated at signup

Onboarding a new org triggers provisioning of its stack; teardown is equally codified.

Special requirements = parameters

“Must be on AWS,” “must be in Frankfurt,” “pin to 1.4.5” become stack inputs an AI agent can set.
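
The (cloud, region, version, sizing) tuple could be captured as a typed parameter object that is validated before any provisioning runs. This is a sketch under assumptions: the `StackParams` type, the sizing classes, the validation rules, and the stack-naming scheme are all hypothetical, and it deliberately avoids the Pulumi SDK so that it runs on its own. In an actual Pulumi program these values would more likely be read from stack configuration.

```typescript
type Cloud = "aws" | "gcp" | "azure";

// Hypothetical per-customer stack inputs.
interface StackParams {
  org: string; // lowercase slug identifying the customer
  cloud: Cloud;
  region: string; // provider-specific region name
  version: string; // pinned product version, e.g. "1.4.5"
  sizing: "small" | "medium" | "large"; // assumed sizing classes
}

// Return a list of problems; an empty list means the inputs are usable.
function validateStackParams(p: StackParams): string[] {
  const errors: string[] = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(p.org)) {
    errors.push("org must be a lowercase slug");
  }
  if (p.region.trim() === "") {
    errors.push("region is required");
  }
  if (!/^\d+\.\d+\.\d+$/.test(p.version)) {
    errors.push("version must look like 1.4.5");
  }
  return errors;
}

// Derive a stack name from the inputs (naming scheme is an assumption).
function stackName(p: StackParams): string {
  return `${p.org}-${p.cloud}-${p.region}`;
}

// "Must be on AWS", "must be in Frankfurt", "pin to 1.4.5" expressed as inputs.
const frankfurtOrg: StackParams = {
  org: "example-hospital",
  cloud: "aws",
  region: "eu-central-1", // the AWS region located in Frankfurt
  version: "1.4.5",
  sizing: "small",
};
```

Keeping the requirements as plain data means that a person or an AI agent changes only the inputs, never the provisioning code.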

Observability & security

Dashboards

LiveKit exports Prometheus metrics natively; the sim engine, agents, and control plane add OpenTelemetry. Grafana shows live sessions, per-customer and per-region load, node/room utilization, and worker queue depth.

Security by default

Default-deny network policies, namespace isolation per plane, KMS-backed secret stores for BYOK, private clusters, RBAC, encryption in transit and at rest, and per-region BAAs.

Per-customer state

On Tier A each cluster owns its Redis, secrets, and PHI/runtime plane. On Tier B state is logically isolated per namespace (Redis keyspace/ACL, scoped secrets, default-deny NetworkPolicies).