Platform Design
Scaling & Orchestration
How the platform scales across customers, regions, and versions: one single-tenant Kubernetes cluster per customer, provisioned with Pulumi and orchestrating the LiveKit rooms and sim-engine workers inside it.
Status: accepted design
The model in one picture
Control plane: org router · geo DNS · session orchestration · billing
- Mercy Health: AWS · us-east-1
- Charité Berlin: GCP · europe-west3
- NHS Trust: Azure · uk-south
One single-tenant cluster per customer — independent cloud, region, and version. The control plane routes each org to its own cluster.
Pulumi and Kubernetes are different layers, not a choice
The recurring question “Pulumi or Kubernetes?” is a category error — they stack:
Pulumi — IaC layer
Provisions the cloud account, network, managed cluster, node pools, Redis, DNS, and secrets — and deploys the in-cluster workloads (including Helm releases) via its Kubernetes provider. All in TypeScript.
Kubernetes — workload layer
Runs and autoscales LiveKit nodes, sim-engine workers, and agent workers on the cluster Pulumi created. Managed control planes (AKS/GKE/EKS) mean we never run the control plane ourselves.
Result
One AI-writable IaC tool provisions many small, homogeneous per-customer clusters; Kubernetes orchestrates each one.
One cluster per customer (the cell is the customer)
Because each customer is an entire organization — a hospital or medical school with hundreds or thousands of users — operating in one region on their own pinned version, the deployment unit collapses to a single-tenant cluster per customer. This dissolves several hard problems at once:
- Version independence: Org A on 1.4.5 and Org B on 1.4.9 are just different clusters with different Helm release versions. No mixed-version cluster, which is exactly what LiveKit's “homogeneous instances” model wants.
- Data residency: the cluster lives in the customer's region, full stop.
- Isolation: a single-tenant cluster is the strongest tenant boundary there is, ideal for HIPAA/PHI, and one customer's blast radius can't reach another's.
- Trivial routing: no cross-region selector needed; the control plane is a simple org → cluster lookup.
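The org → cluster lookup can be pictured as a plain keyed registry. The following is a minimal sketch, not the control plane's actual data model: the `ClusterRecord` fields, the org IDs, the pinned versions, and the `.invalid` endpoints are all assumptions made for illustration. Only the customers, clouds, and regions come from the diagram on this page.

```typescript
// Hypothetical registry entry; every field name here is an assumption.
interface ClusterRecord {
  cloud: "aws" | "gcp" | "azure";
  region: string;
  version: string;  // pinned product version (illustrative values below)
  endpoint: string; // placeholder hostname under the reserved .invalid TLD
}

// Customers, clouds, and regions mirror the diagram on this page; the org IDs,
// versions, and endpoints are invented for the example.
const registry: Record<string, ClusterRecord> = {
  "mercy-health": { cloud: "aws", region: "us-east-1", version: "1.4.5", endpoint: "mercy-health.example.invalid" },
  "charite-berlin": { cloud: "gcp", region: "europe-west3", version: "1.4.9", endpoint: "charite-berlin.example.invalid" },
  "nhs-trust": { cloud: "azure", region: "uk-south", version: "1.4.9", endpoint: "nhs-trust.example.invalid" },
};

// Resolve an org to its single-tenant cluster, or fail loudly if it is unknown.
function resolveCluster(orgId: string): ClusterRecord {
  // hasOwnProperty guards against inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(registry, orgId)) {
    throw new Error('no cluster registered for org "' + orgId + '"');
  }
  return registry[orgId];
}
```

Because each org maps to exactly one cluster, the lookup needs no region selection or load-based routing; an unknown org is an error rather than a fallback to a shared pool.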
The cost trade-off is deliberate
How LiveKit expects to run on Kubernetes
Grounded in LiveKit's own Kubernetes and distributed docs, the media plane has specific requirements:
- Host networking, one pod per node. LiveKit pods need direct network access so the node's rtc.udp/tcp ports are handled by the server, which limits scheduling to one LiveKit pod per node (other workloads may co-reside).
- TLS at the ingress. The official Helm chart terminates signal-connection TLS at the cloud LB (GKE LB, AWS ALB via the Load Balancer Controller, or nginx-ingress + cert-manager). A separate SSL cert is still needed for the embedded TURN/TLS server.
- Redis coordinates the cluster. Nodes report load to Redis; new rooms are placed on the least-loaded node; a room is pinned to a single node.
- Sim & agent workers run as ordinary autoscaled pods alongside the media node pool.
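The Redis-coordinated placement rule in the list above (nodes report load, new rooms go to the least-loaded node, a room stays pinned) can be sketched in a few lines. This is an illustrative model only, not LiveKit's implementation: `RoomPlacer` and its methods are hypothetical names, plain in-memory objects stand in for Redis, and load is treated as an abstract number.

```typescript
// In-memory stand-in for the shared state the page says lives in Redis.
class RoomPlacer {
  // Object.create(null) avoids collisions with inherited keys like "toString".
  private nodeLoad: Record<string, number> = Object.create(null);
  private roomToNode: Record<string, string> = Object.create(null);

  // A node reports its current load (an abstract figure in this sketch).
  reportLoad(nodeId: string, load: number): void {
    this.nodeLoad[nodeId] = load;
  }

  // Return the node hosting the room, choosing the least-loaded node the
  // first time the room is seen and keeping that choice afterwards.
  placeRoom(roomName: string): string {
    const pinned = this.roomToNode[roomName];
    if (pinned !== undefined) {
      return pinned;
    }
    let best: string | undefined;
    for (const nodeId of Object.keys(this.nodeLoad)) {
      if (best === undefined || this.nodeLoad[nodeId] < this.nodeLoad[best]) {
        best = nodeId;
      }
    }
    if (best === undefined) {
      throw new Error("no media nodes have reported load");
    }
    this.roomToNode[roomName] = best;
    return best;
  }
}
```

Pinning means a room never migrates when loads shift later, which is why capacity decisions happen at room-creation time in this model.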
Versioning & zero-disruption upgrades
Each customer upgrades on their own schedule by rolling their cluster's Helm release to a new image version. LiveKit's native connection draining makes this safe: on SIGTERM a node keeps its active rooms running, accepts new participants into those rooms, rejects brand-new rooms, and shuts down only once empty. The Helm chart sets terminationGracePeriodSeconds to 5 hours (18,000 seconds), so an upgrade does not cut off a live simulation session unless it runs longer than that.
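The draining behaviour described above can be modelled as a small state machine. The sketch below follows this page's description of draining, not LiveKit's source code: `MediaNode`, `join`, `leave`, `sigterm`, and `canShutDown` are hypothetical names, and the grace period constant is simply 5 hours expressed in seconds.

```typescript
// 5 hours in seconds, the grace period the page attributes to the Helm chart.
const GRACE_PERIOD_SECONDS = 5 * 60 * 60;

type JoinDecision = "accept" | "reject";

class MediaNode {
  private draining = false;
  // room name -> participant count; null prototype avoids inherited keys.
  private rooms: Record<string, number> = Object.create(null);

  // Model of receiving SIGTERM: the node starts draining.
  sigterm(): void {
    this.draining = true;
  }

  join(room: string): JoinDecision {
    const count = this.rooms[room];
    if (count === undefined) {
      // A brand-new room is rejected once draining has started.
      if (this.draining) {
        return "reject";
      }
      this.rooms[room] = 1;
    } else {
      // Existing rooms keep accepting participants, even while draining.
      this.rooms[room] = count + 1;
    }
    return "accept";
  }

  leave(room: string): void {
    const count = this.rooms[room];
    if (count === undefined) {
      return;
    }
    if (count <= 1) {
      delete this.rooms[room]; // the room closes when its last participant leaves
    } else {
      this.rooms[room] = count - 1;
    }
  }

  // The node may exit only after SIGTERM and once every room has emptied.
  canShutDown(): boolean {
    return this.draining && Object.keys(this.rooms).length === 0;
  }
}
```

In a real rollout the grace period is the upper bound on how long Kubernetes waits for `canShutDown` to become true before killing the pod, so sessions longer than `GRACE_PERIOD_SECONDS` would still be interrupted.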
Two deployment tiers: dedicated vs shared
A full cluster per customer is the right unit for high-security customers, but it duplicates a fixed per-cluster baseline (control plane, Redis, ingress, observability, and HA-floor nodes) for every org — which dominates cost for smaller customers. So the same control plane and Pulumi templates offer two tiers (ADR-0027). Crucially, version independence and logical isolation are preserved in both; the difference is physical isolation and cost.
Tier A — Dedicated cluster
One single-tenant cluster per customer (ADR-0021): full physical isolation, any version, region-pinned. For military, strict-HIPAA, and data-residency customers — the higher baseline is a feature they pay for.
Tier B — Shared, namespace-isolated
Generic infrastructure is shared (LiveKit media pool, managed control plane, Redis with per-tenant logical isolation, ingress, observability); each customer gets its own namespace running its own pinned sim-engine/agent version. The fixed baseline amortizes across tenants.
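To make the amortization concrete, the sketch below compares per-tenant cost on the two tiers. It is a simplified model under stated assumptions: cost is split into a fixed per-cluster baseline and a per-tenant variable part, the shared baseline divides evenly across tenants, and the function names are hypothetical. Any numbers fed into it are placeholders, not real prices.

```typescript
// Tier A: each customer carries the whole per-cluster baseline alone.
function dedicatedCostPerTenant(baseline: number, variable: number): number {
  return baseline + variable;
}

// Tier B: the baseline is split evenly across the tenants sharing the cluster.
function sharedCostPerTenant(baseline: number, variable: number, tenants: number): number {
  if (tenants < 1 || Math.floor(tenants) !== tenants) {
    throw new RangeError("tenants must be a positive integer");
  }
  return baseline / tenants + variable;
}
```

With a placeholder baseline of 1000 and a per-tenant variable cost of 200 (arbitrary units), a dedicated customer carries 1200, while each of 10 shared tenants carries 1000 ÷ 10 + 200 = 300. The variable part is unchanged; only the baseline shrinks with N.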
Versioning lives at the app layer
Customers pin our product (sim engine, agents) via per-namespace Helm releases — so different versions coexist in a shared cluster. LiveKit server is shared infrastructure, kept homogeneous across co-located tenants.
Why the shared tier matters
Amortizing the fixed baseline across N tenants (≈ baseline ÷ N each) is what makes the unit economics work for the majority who don't need physical isolation. Compare the two side by side on the Cost Model page.
Onboarding is a Pulumi stack
This is the cleanest justification for infrastructure-as-code: adding a new organization is provisioning a new parameterized stack.
Stack per customer
Each org is a Pulumi stack parameterized by (cloud, region, version, sizing).
Automated at signup
Onboarding a new org triggers provisioning of its stack; teardown is equally codified.
Special requirements = parameters
“Must be on AWS,” “must be in Frankfurt,” “pin to 1.4.5” become stack inputs an AI agent can set.
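One way to picture "requirements as parameters" is a typed input record that is validated and then flattened into stack configuration. This is a sketch under assumptions: the `StackInputs` fields follow the (cloud, region, version, sizing) tuple above, but the sizing tiers, the stack naming scheme, the `platform:` config namespace, and the org ID are all invented for illustration. The `{ value }` wrapper is meant to resemble the config-map shape used by Pulumi's Automation API, which should be checked against the Pulumi docs before relying on it.

```typescript
// Hypothetical per-customer inputs; sizing tiers are an assumption.
interface StackInputs {
  org: string;
  cloud: "aws" | "gcp" | "azure";
  region: string;
  version: string;
  sizing: "small" | "medium" | "large";
}

// Collect human-readable problems instead of failing on the first one.
function validateInputs(inputs: StackInputs): string[] {
  const errors: string[] = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(inputs.org)) {
    errors.push("org must be a lowercase slug");
  }
  if (!/^\d+\.\d+\.\d+$/.test(inputs.version)) {
    errors.push("version must look like 1.4.5");
  }
  if (inputs.region.length === 0) {
    errors.push("region is required");
  }
  return errors;
}

// Assumed naming scheme: one stack per org, cloud, and region.
function stackName(inputs: StackInputs): string {
  return inputs.org + "-" + inputs.cloud + "-" + inputs.region;
}

// Flatten the inputs into namespaced config keys (namespace is hypothetical).
function toStackConfig(inputs: StackInputs): Record<string, { value: string }> {
  return {
    "platform:cloud": { value: inputs.cloud },
    "platform:region": { value: inputs.region },
    "platform:version": { value: inputs.version },
    "platform:sizing": { value: inputs.sizing },
  };
}

// "Must be on AWS", "must be in Frankfurt", "pin to 1.4.5" as stack inputs.
// eu-central-1 is the AWS region located in Frankfurt.
const exampleInputs: StackInputs = {
  org: "example-clinic",
  cloud: "aws",
  region: "eu-central-1",
  version: "1.4.5",
  sizing: "small",
};
```

Keeping the inputs a closed, typed record is what makes them safe for an automated agent to set: anything outside the schema fails type checking or validation before provisioning starts.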
Observability & security
Dashboards
LiveKit exports Prometheus metrics natively; the sim engine, agents, and control plane add OpenTelemetry. Grafana shows live sessions, per-customer and per-region load, node/room utilization, and worker queue depth.
Security by default
Default-deny network policies, namespace isolation per plane, KMS-backed secret stores for BYOK, private clusters, RBAC, encryption in transit and at rest, and per-region BAAs.
Per-customer state
On Tier A each cluster owns its Redis, secrets, and PHI/runtime plane. On Tier B state is logically isolated per namespace (Redis keyspace/ACL, scoped secrets, default-deny NetworkPolicies).
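For Tier B, the Redis keyspace isolation mentioned above could be approached by prefixing every key with the tenant's namespace and restricting each tenant's Redis user to that prefix. The sketch below only builds strings and is an assumption about one possible scheme: the `tenant:<namespace>:` prefix and the helper names are hypothetical. The `~pattern` form is the key-pattern syntax used in Redis ACL rules, and the validation follows the Kubernetes rule that namespace names are DNS labels.

```typescript
// Kubernetes namespaces are DNS labels: lowercase alphanumerics and "-",
// starting and ending with an alphanumeric, at most 63 characters.
const NAMESPACE_PATTERN = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// Build the per-tenant key prefix, rejecting names that could smuggle in
// glob characters or extra separators.
function tenantKeyPrefix(namespace: string): string {
  if (namespace.length > 63 || !NAMESPACE_PATTERN.test(namespace)) {
    throw new Error('invalid namespace "' + namespace + '"');
  }
  return "tenant:" + namespace + ":";
}

// Every key a tenant's workload writes goes through this helper.
function tenantKey(namespace: string, key: string): string {
  return tenantKeyPrefix(namespace) + key;
}

// Key pattern for a Redis ACL rule limiting a user to the tenant's keys.
function aclKeyPattern(namespace: string): string {
  return "~" + tenantKeyPrefix(namespace) + "*";
}
```

Validating the namespace before building the pattern matters: a name containing `*` would otherwise widen the ACL pattern beyond a single tenant.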