Platform Design

Real-Time Sim Engine

The authoritative server that owns the simulation: patient physiology, 3D positions, and timed events. Clients are thin renderers; the engine is the single source of truth and runs alongside the LiveKit rooms.

Status: living draft (in progress)

A working capture of the sim-engine design discussion. The decisions here are tracked formally on the Architecture Decisions page (ADR-0011 through ADR-0019). Expect this to evolve.

What it is, and why it exists

Today this logic lives inside the Unreal Engine application: the game server holds the state and replicates it, and clients receive updates. A core goal of this project is to lift that logic out of the game engine and into a real-time API service.

Authoritative

The server owns world state. Clients send inputs and render; they never hold the source of truth.

Engine-agnostic

Thin clients on Unreal, Unity, React Three Fiber, Expo + WebGPU, or even chat-only or voice-only front ends all consume the same state.

AI-buildable

Once the logic lives outside the game engine, the system is far easier to build with AI agents, and it ships without app-store friction.

How LiveKit scaling actually works

A common misconception is “one Docker container per room.” That is not how LiveKit works.

A single livekit-server process hosts many rooms at once — a room is a lightweight in-memory construct, not a container.
Nodes form a cluster coordinated through Redis; each node publishes its load and the rooms it owns.
A room is pinned to one node (chosen by the cluster's node selector, for example by lowest load); all participants in that room connect there. One room does not span nodes in open-source LiveKit.
You scale by adding more nodes, not more containers per room.
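The placement idea above can be sketched in a few lines. This is an illustration of least-load pinning only, not LiveKit's actual selector code; the `NodeStats` shape, the `RoomPlacement` class, and the 0.7 load limit are all assumptions made for the example.

```typescript
// Hypothetical shape of what each node publishes to the shared store.
interface NodeStats {
  nodeId: string;
  load: number; // normalized 0..1 load figure (assumed metric)
  roomCount: number;
}

// Pick the node with the lowest load; ties fall back to fewest rooms.
// Returns undefined when every node is at or over the limit, which is the
// signal for the orchestrator (not LiveKit) to add capacity.
function pickNodeForRoom(nodes: NodeStats[], loadLimit = 0.7): NodeStats | undefined {
  let best: NodeStats | undefined;
  for (const candidate of nodes) {
    if (candidate.load >= loadLimit) continue;
    if (
      best === undefined ||
      candidate.load < best.load ||
      (candidate.load === best.load && candidate.roomCount < best.roomCount)
    ) {
      best = candidate;
    }
  }
  return best;
}

// Room-to-node pinning: once placed, a room stays on its node.
class RoomPlacement {
  private readonly roomToNode = new Map<string, string>();

  place(roomName: string, nodes: NodeStats[]): string | undefined {
    const existing = this.roomToNode.get(roomName);
    if (existing !== undefined) return existing; // already pinned
    const chosen = pickNodeForRoom(nodes);
    if (chosen === undefined) return undefined;
    this.roomToNode.set(roomName, chosen.nodeId);
    return chosen.nodeId;
  }
}
```

Note that the room-to-node map is per room name, so a room that already exists keeps its node even if that node later becomes the busiest one.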

LiveKit does not provision infrastructure

LiveKit places rooms onto nodes that already exist, but it does not spawn VMs or containers itself. Growing and shrinking the node pool is the orchestrator's job (Kubernetes HPA + cluster-autoscaler, or VM autoscaling groups) — an open decision tracked in ADR-0016.

It is a game server, not a physics engine

The instinct that the engine “must be blazing fast because it ticks many times per second” is only half right. Breaking down what the server is authoritative for changes the picture:

Workload	Frequency / cost	Who drives it
Physiology (vitals, event scheduling)	Low frequency, low compute (1–10 Hz)	Server
Human avatar transforms	High frequency, but I/O-bound fan-out	Client-originated, server relays
AI-driven entities (nurse walking)	A few entities, light steering	Server

So the engine is a modest-frequency state machine, plus a relay, plus a few lightly simulated entities; it is not a rigid-body physics solver. That is what makes TypeScript a sound choice for the entity counts a medical sim actually has.

Core architecture: ECS + a fixed-timestep tick

Entity-Component-System

World state is entities (patient, avatars, equipment) composed of data components (transform, vitals, animation-state). Because state is plain data, it is AI-legible, and ECS is a standard pattern for this kind of simulation.
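A minimal sketch of the ECS idea, assuming one map per component type keyed by entity id. The names `SimWorld`, `Transform`, `Vitals`, and `physiologySystem`, and the toy heart-rate rule, are illustrative assumptions and not the project's schema.

```typescript
type EntityId = number;

interface Transform { x: number; y: number; z: number; }
interface Vitals { heartRate: number; spo2: number; }

// One store per component type, keyed by entity id.
class SimWorld {
  private nextId: EntityId = 1;
  readonly transforms = new Map<EntityId, Transform>();
  readonly vitals = new Map<EntityId, Vitals>();

  // An entity is only an id; what it "is" depends on its components.
  createEntity(): EntityId {
    return this.nextId++;
  }
}

// A system is plain logic that runs over every entity with a given component.
function physiologySystem(world: SimWorld, dtSeconds: number): void {
  world.vitals.forEach((v) => {
    // Toy rule: heart rate climbs 1 bpm per second, capped at 120 bpm.
    v.heartRate = Math.min(120, v.heartRate + dtSeconds);
  });
}
```

An avatar would get only a `Transform`, a patient both a `Transform` and `Vitals`; the physiology system never sees the avatar because it has no vitals component.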

Authoritative clock

A drift-corrected, fixed-timestep loop derives sim time from a monotonic source. Sim time is decoupled from wall-clock time to allow pause/resume and time-scaled debrief replay.
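One common way to build such a loop is an accumulator: real elapsed time is banked, and the sim advances in whole fixed steps, so timer jitter never accumulates into sim time. The sketch below assumes an injected monotonic time source in milliseconds (for example `() => performance.now()`); the `SimClock` name and its fields are hypothetical.

```typescript
class SimClock {
  paused = false;
  timeScale = 1; // 2 would replay at double speed
  private readonly now: () => number;
  private readonly stepMs: number;
  private lastRealMs: number;
  private accumulatorMs = 0;
  private tickCount = 0;

  constructor(now: () => number, stepMs: number) {
    this.now = now;
    this.stepMs = stepMs;
    this.lastRealMs = now();
  }

  // Sim time is derived from the tick count, so it cannot drift.
  get simTimeMs(): number {
    return this.tickCount * this.stepMs;
  }

  // Call whenever the host timer fires. Runs zero or more fixed steps to
  // catch up with (scaled) real time and returns how many steps ran.
  advance(step: (simTimeMs: number, dtMs: number) => void): number {
    const realNow = this.now();
    const elapsedMs = realNow - this.lastRealMs;
    this.lastRealMs = realNow;
    if (this.paused) return 0; // real time passes, sim time does not
    this.accumulatorMs += elapsedMs * this.timeScale;
    let steps = 0;
    while (this.accumulatorMs >= this.stepMs) {
      this.accumulatorMs -= this.stepMs;
      this.tickCount += 1;
      step(this.simTimeMs, this.stepMs);
      steps += 1;
    }
    return steps;
  }
}
```

A production loop would likely also cap the number of catch-up steps per call so a long stall cannot trigger a burst of ticks; that is left out here for brevity.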

Event scheduler

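Events fire against sim time (“the patient codes at t=95s”). An event fires on the first tick at or after its scheduled time, so timing error is bounded by one tick interval, which is far tighter than medical sim needs.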
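A minimal scheduler sketch under the assumption that events are kept sorted by sim time and drained once per tick. `ScheduledEvent`, `EventScheduler`, and the event names are hypothetical.

```typescript
interface ScheduledEvent {
  atSimMs: number; // when to fire, in sim time
  kind: string;    // e.g. "patient.codes"
}

class EventScheduler {
  private pending: ScheduledEvent[] = [];

  schedule(ev: ScheduledEvent): void {
    this.pending.push(ev);
    // Keep earliest first. Array.prototype.sort is stable, so events with
    // the same time keep their insertion order.
    this.pending.sort((a, b) => a.atSimMs - b.atSimMs);
  }

  // Remove and return every event due at or before the current sim time.
  drainDue(simNowMs: number): ScheduledEvent[] {
    let count = 0;
    while (count < this.pending.length && this.pending[count].atSimMs <= simNowMs) {
      count += 1;
    }
    return this.pending.splice(0, count);
  }
}
```

Calling `drainDue` with the sim time of each tick means pausing the clock also pauses every pending event, with no extra bookkeeping. A sorted array is adequate for the handful of events a scenario schedules; a heap would be the usual swap-in if that count grew large.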

Logging every state-changing event against sim time (event sourcing, ADR-0014) gives debrief replay for free, and the same stream projects into the xAPI tracking spine from the content model.
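To illustrate why an event log yields replay, the sketch below rebuilds state at any sim time by folding the log from the start. The event kinds, the `DebriefState` shape, and the function names are assumptions for the example; the real event vocabulary is defined elsewhere (ADR-0014).

```typescript
// Every state change is recorded as an immutable event stamped with sim time.
type SimLogEvent =
  | { simMs: number; kind: "heartRate.set"; bpm: number }
  | { simMs: number; kind: "med.delivered"; drug: string };

interface DebriefState {
  heartRate: number;
  medsGiven: string[];
}

// Pure reducer: previous state + one event = next state.
function applyLogEvent(state: DebriefState, ev: SimLogEvent): DebriefState {
  switch (ev.kind) {
    case "heartRate.set":
      return { ...state, heartRate: ev.bpm };
    case "med.delivered":
      return { ...state, medsGiven: [...state.medsGiven, ev.drug] };
  }
}

// Replay is a fold over the log up to the requested sim time.
function replayUntil(log: SimLogEvent[], initial: DebriefState, untilSimMs: number): DebriefState {
  return log.filter((ev) => ev.simMs <= untilSimMs).reduce(applyLogEvent, initial);
}
```

Scrubbing a debrief timeline is then a call to `replayUntil` with a different `untilSimMs`. Projecting the same log into another format, such as xAPI statements, would be a second fold over the same events.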

State sync over LiveKit data channels

The engine joins each room as a server-side participant and publishes state over the same WebRTC connection clients already hold — no parallel transport, one auth model, one connectivity story.

State	Channel	Why
3D transforms (positions, poses)	Lossy / unordered	High-frequency; newest wins, like UDP game netcode
Discrete events (code, med delivered, attach)	Reliable	Must not be dropped or reordered
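The table's split can be sketched as a per-message choice of delivery mode. The `DataPublisher` interface below is a stand-in modeled loosely on the `publishData(payload, { reliable, topic })` shape of LiveKit's SDKs; the exact signature should be checked against the SDK in use. The topic names, the message shapes, and the `seq` field are assumptions.

```typescript
interface PublishOptions { reliable: boolean; topic: string; }

// Stand-in for the room's local participant (hypothetical interface).
interface DataPublisher {
  publishData(payload: Uint8Array, options: PublishOptions): Promise<void>;
}

type SimMessage =
  | { type: "transform"; entityId: number; position: [number, number, number]; seq: number }
  | { type: "event"; name: string; simMs: number };

// Transforms go lossy (newest wins); discrete events go reliable.
function optionsFor(msg: SimMessage): PublishOptions {
  return msg.type === "transform"
    ? { reliable: false, topic: "sim.transform" }
    : { reliable: true, topic: "sim.event" };
}

async function publishSimMessage(publisher: DataPublisher, msg: SimMessage): Promise<void> {
  const payload = new TextEncoder().encode(JSON.stringify(msg));
  await publisher.publishData(payload, optionsFor(msg));
}

// Receiver side for the lossy channel: packets may arrive out of order, so
// drop any transform older than the last one applied for that entity.
class LatestTransformFilter {
  private readonly lastSeq = new Map<number, number>();

  accept(entityId: number, seq: number): boolean {
    const previous = this.lastSeq.get(entityId);
    if (previous !== undefined && seq <= previous) return false;
    this.lastSeq.set(entityId, seq);
    return true;
  }
}
```

JSON is used here only for readability; a compact binary encoding would be the more likely choice for high-frequency transforms.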

The same room hosts media, agents, and sim

Human participants, AI agents (nurse / patient / evaluator), and the sim engine are all participants in one LiveKit room. Audio/video flows as media; world state flows as data.

One ECS graph — authored and ticked

The most important concept: authoring and running are the same operation on the same data. There is no separate editor format and runtime format — there is one entity/component graph. Designing a scenario, spawning a defib in VR, and the live simulation are all mutations of that graph.

World / Environment         the ICU room: bounds, spawn points, ambient state
  └─ Entities (instances)   patients, staff, family, equipment, fixtures
      └─ Components (data)  transform · vitals · physiology · animation-state ·
                            interactable · attachment · authority · render-ref (prefab id)
  + Systems (logic)         physiology tick · steering · scheduler · rules · state-sync
  + Scenario definition     declarative initial graph + scheduled events + rules

Three properties make it customer-authorable with AI agents:

Data-driven — Entities are instances of prefabs. 'Spawn a defib and set it on the cart' = instantiate the defib prefab, then write its transform (and an attachment to the cart). No code — just graph edits.
Composable — An ICU with 2 patients, 3 nurses, a doctor, and a family member is just an initial set of entities. Adding a family member or ordering a medication mid-session is the same kind of edit.
Scriptable, safely — Common logic is declarative rules / event graphs; anything needing real code runs in a sandbox with a capability-scoped API (ADR-0019), since customer- and AI-generated logic can't run unsandboxed in a multi-tenant healthcare service.
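The "spawn a defib and set it on the cart" example can be sketched as three graph edits. `SceneGraph`, `PrefabDef`, the component names, and the coordinates are hypothetical; the point is only that spawning and placing are data writes, with no scenario-specific code.

```typescript
type ComponentBag = { [componentName: string]: unknown };

interface PrefabDef { id: string; defaults: ComponentBag; }
interface GraphEntity { id: number; components: ComponentBag; }

class SceneGraph {
  private nextId = 1;
  private readonly entities = new Map<number, GraphEntity>();

  // Edit 1: instantiate a prefab. Defaults are deep-copied so instances
  // never share state with the prefab or with each other.
  spawn(prefab: PrefabDef): GraphEntity {
    const components = JSON.parse(JSON.stringify(prefab.defaults)) as ComponentBag;
    components["render-ref"] = prefab.id;
    const entity: GraphEntity = { id: this.nextId++, components };
    this.entities.set(entity.id, entity);
    return entity;
  }

  // Edit 2: write (or overwrite) one component on one entity.
  setComponent(entityId: number, componentName: string, value: unknown): void {
    const entity = this.entities.get(entityId);
    if (entity === undefined) throw new Error(`unknown entity ${entityId}`);
    entity.components[componentName] = value;
  }
}

const defibPrefab: PrefabDef = {
  id: "defib",
  defaults: { transform: { x: 0, y: 0, z: 0 }, interactable: true },
};
const cartPrefab: PrefabDef = {
  id: "crash-cart",
  defaults: { transform: { x: 2, y: 0, z: 1 } },
};

const graph = new SceneGraph();
const cart = graph.spawn(cartPrefab);
const defib = graph.spawn(defibPrefab);
graph.setComponent(defib.id, "transform", { x: 2, y: 0.9, z: 1 });
graph.setComponent(defib.id, "attachment", { parentId: cart.id });
```

The same two operations would serve an authoring tool, a VR interaction, or a mid-session order, which is what makes the graph a single target for AI-generated edits.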

How it snaps onto the content model

The authoring model reuses machinery already decided in the Content & Versioning Model:

A scenario is structured content (the sim activity inside a lesson), so it inherits Wikipedia-style versioning, forking, and semantic diffs.
Prefabs are versioned content packages with declared capability requirements, so a scenario only loads prefabs its runtime and clients can support — “spawn defib” degrades gracefully instead of breaking.
Thin clients map a prefab id → local render asset and render whatever the components say; the authoritative graph stays server-side.
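One possible reading of "degrades gracefully" is a fallback chain: if a client cannot satisfy a prefab's declared capabilities, try a simpler prefab instead of failing. The source does not specify the mechanism, so the `fallbackId` field, the capability names, and `resolvePrefab` are all assumptions in this sketch.

```typescript
interface PrefabPackage {
  id: string;
  version: string;
  requires: string[];  // capabilities the runtime and client must support
  fallbackId?: string; // hypothetical: simpler prefab to try instead
}

// Walk the fallback chain until a prefab's requirements are all supported.
// Returns undefined when nothing in the chain can be loaded.
function resolvePrefab(
  requestedId: string,
  catalog: Map<string, PrefabPackage>,
  supported: string[],
): PrefabPackage | undefined {
  const visited: string[] = []; // guards against fallback cycles
  let currentId: string | undefined = requestedId;
  while (currentId !== undefined && visited.indexOf(currentId) === -1) {
    visited.push(currentId);
    const pkg: PrefabPackage | undefined = catalog.get(currentId);
    if (pkg === undefined) return undefined;
    const satisfied = pkg.requires.every((cap) => supported.indexOf(cap) !== -1);
    if (satisfied) return pkg;
    currentId = pkg.fallbackId;
  }
  return undefined;
}
```

A voice-only client with no render capabilities would resolve to `undefined` here, and the caller could then describe the object verbally rather than render it.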

So a customer's AI agent authors a scenario by generating and editing the entity/component graph plus declarative rules — the exact artifacts the engine ticks at runtime.

Tech stack: TypeScript, with a native escape hatch

The engine is written in TypeScript/Node (ADR-0015) to:

Unify the stack — web portal, R3F 3D client, Expo mobile, LiveKit Node Agents, and Pulumi infra are all TypeScript.
Share types end to end — content model, capability contracts, and world/session schemas are one shared package consumed by server and clients.
Maximize AI-codegen velocity — the most training data and tooling, and the engine is meant to be built and run by AI agents.

The escape hatch

The core is kept transport-agnostic and ECS-structured, so any future CPU hot path (for example a real physiology ODE solver or crowd simulation) can be extracted into a Go sidecar or Rust/WASM module without rewriting the engine. We commit to TypeScript now for velocity, without painting ourselves into a corner.
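What keeps extraction cheap is a narrow seam between the tick loop and the expensive computation. The sketch below assumes a hypothetical `PhysiologySolver` interface with a toy in-process implementation (explicit Euler relaxation toward a target heart rate); a WASM-backed class could implement the same interface later without the caller changing.

```typescript
interface PhysiologyState { heartRate: number; }

// The seam: the engine depends on this interface, not on an implementation.
interface PhysiologySolver {
  step(state: PhysiologyState, dtSeconds: number): PhysiologyState;
}

// Default pure-TypeScript implementation. Toy model only:
// d(heartRate)/dt = (target - heartRate) * ratePerSecond, one Euler step.
class EulerRelaxationSolver implements PhysiologySolver {
  private readonly targetHeartRate: number;
  private readonly ratePerSecond: number;

  constructor(targetHeartRate: number, ratePerSecond: number) {
    this.targetHeartRate = targetHeartRate;
    this.ratePerSecond = ratePerSecond;
  }

  step(state: PhysiologyState, dtSeconds: number): PhysiologyState {
    const delta = (this.targetHeartRate - state.heartRate) * this.ratePerSecond * dtSeconds;
    return { heartRate: state.heartRate + delta };
  }
}

// The tick loop only ever sees the interface.
function physiologyTick(solver: PhysiologySolver, state: PhysiologyState, dtSeconds: number): PhysiologyState {
  return solver.step(state, dtSeconds);
}
```

A synchronous interface suits an in-process WASM module; a Go sidecar reached over a socket would likely need an asynchronous variant of the same seam.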

Execution & orchestration

Like LiveKit's own nodes and agent workers, the engine runs as an autoscaled worker pool (ADR-0016): each worker hosts several sessions, sessions shard across workers, and the pool scales with demand. This is cheaper than a container per session, and sessions start faster because a worker is already running.
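A rough sketch of the two decisions a worker pool makes: which worker takes a new session, and how many workers the pool should have. The per-worker capacity of 8 sessions and the 25% headroom are placeholder assumptions, as are the `WorkerInfo` shape and function names; real values would come from load testing and ADR-0016.

```typescript
interface WorkerInfo {
  workerId: string;
  sessions: string[]; // ids of the sessions this worker is ticking
}

const SESSIONS_PER_WORKER = 8; // assumed capacity, not a measured figure

// Shard a new session onto the worker with the fewest sessions that still
// has room. Returns undefined when the pool is full.
function assignSession(sessionId: string, workers: WorkerInfo[]): WorkerInfo | undefined {
  let target: WorkerInfo | undefined;
  for (const worker of workers) {
    if (worker.sessions.length >= SESSIONS_PER_WORKER) continue;
    if (target === undefined || worker.sessions.length < target.sessions.length) {
      target = worker;
    }
  }
  if (target !== undefined) target.sessions.push(sessionId);
  return target;
}

// Pool size the autoscaler should aim for: enough workers for the current
// sessions plus headroom, and never fewer than one warm worker.
function desiredWorkerCount(activeSessions: number, headroom = 1.25): number {
  return Math.max(1, Math.ceil((activeSessions * headroom) / SESSIONS_PER_WORKER));
}
```

With these placeholder numbers, 16 active sessions give a target of 3 workers (16 × 1.25 / 8 = 2.5, rounded up).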

Control plane

On session start, an orchestrator ensures a LiveKit room exists, assigns the sim session to a worker, dispatches the needed AI agents, and mints scoped tokens.
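The four steps can be expressed as a pure planning function whose output the orchestrator then executes against LiveKit and the worker pool. Everything here is a hypothetical shape: the `sim-` room naming, the identity formats, and the `ScopedGrant` fields (loosely inspired by LiveKit's video grants, but not the SDK's actual type). The exact choreography is still an open question.

```typescript
type ParticipantRole = "learner" | "sim-engine" | "agent";

interface ScopedGrant {
  room: string;
  identity: string;
  canPublishMedia: boolean;
  canPublishData: boolean;
  canSubscribe: boolean;
}

interface SessionRequest {
  sessionId: string;
  learnerIds: string[];
  agentKinds: string[]; // e.g. ["nurse", "evaluator"]
}

interface SessionPlan {
  roomName: string;          // step 1: room to ensure
  workerId: string;          // step 2: sim worker assignment
  agentDispatches: string[]; // step 3: agents to dispatch
  grants: ScopedGrant[];     // step 4: tokens to mint
}

function grantFor(room: string, identity: string, role: ParticipantRole): ScopedGrant {
  return {
    room, // every grant is scoped to exactly one room
    identity,
    canPublishMedia: role !== "sim-engine", // the engine publishes data only
    canPublishData: true,
    canSubscribe: true,
  };
}

function planSessionStart(
  req: SessionRequest,
  pickWorker: (sessionId: string) => string | undefined,
): SessionPlan {
  const roomName = `sim-${req.sessionId}`;
  const workerId = pickWorker(req.sessionId);
  if (workerId === undefined) throw new Error("no sim worker has capacity");
  const grants: ScopedGrant[] = [grantFor(roomName, `engine:${req.sessionId}`, "sim-engine")];
  for (const learnerId of req.learnerIds) grants.push(grantFor(roomName, `user:${learnerId}`, "learner"));
  for (const kind of req.agentKinds) grants.push(grantFor(roomName, `agent:${kind}`, "agent"));
  return { roomName, workerId, agentDispatches: req.agentKinds.slice(), grants };
}
```

Separating the plan from its execution keeps the ordering and scoping rules testable without a running LiveKit deployment.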

Agents as participants

The AI nurse, patient, and evaluator are LiveKit Agents that join the room and subscribe to sim state and audio.

Sim workers

A pool of TypeScript workers ticking authoritative worlds, sharded by session and autoscaled.

Open questions & next decisions

Orchestration platform — Kubernetes (AKS/GKE) vs Pulumi + VM autoscaling for the node pool and worker pools.
Session lifecycle — the exact choreography from a Sessions record to a running room + sim + agents, including open vs invite-only join.
Tick & transform rates — concrete Hz targets for sim vs transform streams, and interpolation strategy on clients.
Scripting surface — the shape of the sandbox API and rules format authors and AI agents target.