Cost Model

An estimate of what the per-customer-cluster architecture costs to run. The assumptions are placeholders — edit them live as we get real numbers from Azure, GCP, and AWS.

These numbers are assumptions, not quotes

Every input below is an editable placeholder meant to be refined with real cloud pricing and observed usage. Treat the outputs as order-of-magnitude guidance for the one-cluster-per-customer model (ADR-0021), not a committed budget.

Interactive estimate

Adjust any assumption to see the per-customer and fleet-wide impact update immediately. Org size and peak concurrency drive the LiveKit and sim-engine node counts; managed services are monthly estimates and egress is derived from concurrency.

Deployment model
Per customer / mo: Dedicated $1,037 · Shared $655 (−37%)

Assumptions

  • Org usage & demand
  • Capacity per node (from load tests)
  • Compute pricing
  • Managed services (monthly)
  • Network & commercial

Derived demand & node counts

  • Peak concurrent users: 100
  • Participants / session (users + agents): 6
  • Concurrent sessions: 25
  • Concurrent participants: 150
  • LiveKit nodes: 1
  • Sim / agent nodes: 1
  • System nodes: 1
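The derived figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the 1,000-user org, the 10% peak concurrency, and both per-node capacities are hypothetical placeholders chosen to match the displayed outputs, and the 4 users plus 2 agents per session are inferred from the 100 users, 25 sessions, and 6 participants per session shown above. The function names are also made up for this example.

```python
import math


def derive_demand(total_users, peak_concurrency_pct, users_per_session, agents_per_session):
    """Return (peak users, concurrent sessions, concurrent participants)."""
    peak_users = total_users * peak_concurrency_pct / 100
    sessions = math.ceil(peak_users / users_per_session)
    # Participants include the AI agents in each session, not only the users.
    participants = sessions * (users_per_session + agents_per_session)
    return peak_users, sessions, participants


def nodes_needed(load, capacity_per_node, floor=1):
    """Capacity is added in whole nodes, never below the pool's floor."""
    return max(floor, math.ceil(load / capacity_per_node))


# Hypothetical inputs: 1,000 users at 10% peak concurrency.
peak_users, sessions, participants = derive_demand(1000, 10, 4, 2)

# Hypothetical capacities; real values would come from load tests.
livekit_nodes = nodes_needed(participants, capacity_per_node=200)
sim_nodes = nodes_needed(sessions, capacity_per_node=50)

print(peak_users, sessions, participants, livekit_nodes, sim_nodes)
```

With these placeholder inputs the sketch yields 25 sessions, 150 participants, and one node in each pool, matching the figures above. The real capacities should replace the placeholders as soon as load-test data is available.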

  • Infra / customer / mo: $1,037
  • Infra / customer / yr: $12,439
  • Gross margin / customer: $22,561 (64% of license)
  • Annual license: $35,000

Monthly breakdown / customer

  • LiveKit nodes (1 × $0.24/hr): $175
  • Sim / agent nodes (1 × $0.20/hr): $146
  • System nodes (1 × $0.10/hr): $73
  • Managed K8s control plane: $75
  • Redis: $50
  • Load balancer / ingress: $20
  • Storage & logs: $30
  • Observability: $30
  • Network egress (~4,860 GB): $437
  • Total / mo: $1,037
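As a cross-check, the monthly total can be rebuilt from the line items. This is a minimal sketch under two assumptions that are not stated on the page: nodes are billed for 730 hours per month, and egress is priced at $0.09 per GB (back-solved from $437 for roughly 4,860 GB). The constant and function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12

# (node count, $/hr) for each pool, taken from the breakdown above.
NODE_POOLS = {
    "LiveKit": (1, 0.24),
    "Sim / agent": (1, 0.20),
    "System": (1, 0.10),
}

# Flat monthly estimates for managed services, from the breakdown above.
MANAGED_MONTHLY = {
    "Managed K8s control plane": 75,
    "Redis": 50,
    "Load balancer / ingress": 20,
    "Storage & logs": 30,
    "Observability": 30,
}

EGRESS_GB = 4860
EGRESS_PRICE_PER_GB = 0.09  # assumption: back-solved, not a quoted price


def monthly_infra_cost():
    """Sum compute, managed services, and egress for one customer."""
    compute = sum(n * rate * HOURS_PER_MONTH for n, rate in NODE_POOLS.values())
    managed = sum(MANAGED_MONTHLY.values())
    egress = EGRESS_GB * EGRESS_PRICE_PER_GB
    return compute + managed + egress


total = monthly_infra_cost()
print(f"Total / mo: ${total:,.0f}")
print(f"Total / yr: ${total * 12:,.0f}")
```

Under these assumptions the unrounded total is about $1,036.60 per month, which rounds to the $1,037 shown and to $12,439 per year.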

Fleet projection (25 customers)

  • Annual infra cost: $310,980
  • Annual revenue: $875,000
  • Annual gross margin: $564,020
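The margin and fleet figures follow directly from the per-customer numbers. The sketch below assumes an unrounded monthly infra cost of $1,036.60, a value back-solved so that the fleet total matches the $310,980 shown (using the rounded $1,037 gives a slightly different total). The function name and dictionary keys are hypothetical.

```python
def project(monthly_infra, annual_license, customers):
    """Per-customer margin and fleet-wide totals, all annual, in dollars."""
    annual_infra = monthly_infra * 12
    margin = annual_license - annual_infra
    return {
        "infra_per_customer": annual_infra,
        "margin_per_customer": margin,
        "margin_share": margin / annual_license,
        "fleet_infra": annual_infra * customers,
        "fleet_revenue": annual_license * customers,
        "fleet_margin": margin * customers,
    }


# Assumption: $1,036.60 is the unrounded monthly cost behind the $1,037 shown.
result = project(monthly_infra=1036.60, annual_license=35_000, customers=25)

for name, value in result.items():
    if name == "margin_share":
        print(f"{name}: {value:.0%}")
    else:
        print(f"{name}: ${value:,.0f}")
```

Because every fleet figure is the per-customer figure multiplied by the customer count, the projection assumes all 25 customers match the single modelled profile. A fleet with mixed org sizes would need one `project` call per customer profile.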

How the model is built

The model is demand-driven: org size and peak concurrency determine how many concurrent sessions and participants a customer generates, and those two figures in turn size two independently scaled node pools (LiveKit and sim/agent).

  • Demand: peak concurrent users = total users × peak-concurrency %. Concurrent sessions = peak concurrent users ÷ average users per session. Concurrent participants = concurrent sessions × participants per session, which counts the AI agents in each session as well as the users.
  • LiveKit nodes: concurrent participants ÷ participants-per-node capacity, rounded up. Media (bandwidth and CPU) is the bottleneck, and host networking means one LiveKit pod per node, so capacity is added in whole nodes.
  • Sim / agent nodes: concurrent sessions ÷ sessions-per-node capacity, rounded up. The sim engine is light per session, so these nodes pack far more densely than LiveKit nodes and scale independently.
  • Managed services: the Kubernetes control plane, Redis, load balancer/ingress, storage and logs, and the observability stack, each as a monthly figure.
  • Network egress: monthly volume = concurrent participants × media bitrate × monthly usage hours, converted to GB; cost = volume × the cloud's per-GB price. This is the most usage-sensitive line and the one to watch as concurrency grows.
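The egress line involves a unit conversion that is easy to get wrong, so a worked sketch may help. The 1.5 Mbps bitrate and 48 usage hours per month below are hypothetical values picked because they reproduce the roughly 4,860 GB shown in the breakdown; other combinations give the same volume, so they should not be read as the model's actual inputs. The $0.09 per GB price is likewise back-solved, and the function name is made up for this example.

```python
def monthly_egress_gb(participants, bitrate_mbps, usage_hours):
    """Egress volume in decimal GB for a month of media traffic.

    Simplification: each concurrent participant is counted as one
    outbound stream at the given bitrate for every usage hour.
    """
    megabits = participants * bitrate_mbps * usage_hours * 3600  # Mbps x seconds
    megabytes = megabits / 8
    return megabytes / 1000


# Hypothetical inputs: 150 participants, 1.5 Mbps, 48 hours per month.
volume_gb = monthly_egress_gb(participants=150, bitrate_mbps=1.5, usage_hours=48)
cost = volume_gb * 0.09  # assumption: $0.09 per GB

print(f"{volume_gb:,.0f} GB -> ${cost:,.0f} / mo")
```

Volume is linear in all three inputs, so doubling concurrency, bitrate, or usage hours doubles the egress bill. That is why this line grows faster than the node lines, which only step up when a capacity threshold is crossed.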

Gross margin per customer is the annual license minus annual infra cost; the fleet projection multiplies by the customer count. As real telemetry arrives, the per-node capacities, media bitrate, and peak concurrency are the inputs most worth revisiting, because they drive the node counts and egress volume that dominate cost.

What drives cost up or down

  • Org size & concurrency: a small medical school with low peak concurrency may need only the HA-floor nodes, while a large hospital's peaks add LiveKit and sim/agent nodes. This is why the model is keyed to users and concurrency, not fixed node counts.
  • Idle baseline: a dedicated cluster costs money even with no active sessions. Node pools that scale down to a minimum between sessions and right-sized control planes keep that floor low.
  • Concurrency peaks: more simultaneous sessions add LiveKit and sim/agent nodes; with autoscaling you pay for the extra nodes only during peaks rather than running the maximum 24/7.
  • Egress: audio/video and high-frequency transforms scale with participants and session hours; this is the line most likely to surprise at scale.
  • Cloud choice: node, egress, and managed-service prices differ across Azure, GCP, and AWS; the model is per-customer, so a customer's required cloud sets their cost basis.
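The difference between paying for peaks and paying for a 24/7 maximum can be sketched for a single node pool. Everything here is illustrative: the three-node peak, the 48 peak hours per month, and the 730-hour month are hypothetical, and only the $0.24/hr LiveKit rate comes from the breakdown above.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def pool_cost(price_per_hour, floor_nodes, peak_nodes, peak_hours):
    """Monthly cost of a pool that runs floor nodes always and extra nodes only at peak."""
    extra_nodes = max(0, peak_nodes - floor_nodes)
    floor_cost = floor_nodes * price_per_hour * HOURS_PER_MONTH
    burst_cost = extra_nodes * price_per_hour * peak_hours
    return floor_cost + burst_cost


# Hypothetical scenario: one floor node, three nodes at peak for 48 hours a month.
autoscaled = pool_cost(0.24, floor_nodes=1, peak_nodes=3, peak_hours=48)

# Same peak capacity provisioned around the clock.
always_on = pool_cost(0.24, floor_nodes=3, peak_nodes=3, peak_hours=0)

print(f"autoscaled: ${autoscaled:,.2f} / mo, always on: ${always_on:,.2f} / mo")
```

In this scenario the floor node accounts for most of the autoscaled cost, which is the idle-baseline effect described above: autoscaling trims the peaks, but only a lower floor reduces what a quiet customer costs.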