Platform Design

Content & Versioning Model

A lesson is the core, portable unit of the platform. This page captures how lessons are authored, versioned, distributed to tenants, and kept compatible across web, VR, and game clients — and across evolving AI services.

Status: living draft (in progress)

This page is a working capture of an ongoing design discussion, not a finished spec. It exists to ground future architecture work and AI context. Expect it to evolve; open questions are listed at the bottom.

The core reframe: independently versioned axes

Today everything ships as one blob, so any change is a forced change. The “forced upgrade” pain is a coupling problem, not an isolation problem. The fix is to split “the product” into independently versioned axes, each with its own cadence and its own pinning:

Platform / runtime version — The APIs, sim engine, LiveKit, and the web / VR / game clients.
Content schema version — The format and contract a lesson is authored against.
Content package version — The actual lesson bytes — both our upstream library and the tenant's customized fork.
Tenant pin / channel — What each customer has chosen to take (pinned version or release channel).

Once these are separate, “the customer takes an upgrade” becomes an explicit, per-axis version bump — never a fleet-wide forced rollout. The reference experience is a self-updating desktop app: it ships often, you choose when to take it, and it doesn't break your existing work because the new runtime stays backward-compatible within a major version.
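As a rough illustration of a per-axis version bump, the sketch below models a tenant's pinned state as one field per axis. The names `TenantPin` and `take_upgrade` and the example version strings are assumptions made for illustration, not part of the design.

```python
from dataclasses import dataclass, replace

# Hypothetical record: one field per independently versioned axis.
@dataclass(frozen=True)
class TenantPin:
    runtime: str   # platform / runtime version
    schema: str    # content schema version
    package: str   # content package version (upstream or tenant fork)

def take_upgrade(pin: TenantPin, axis: str, version: str) -> TenantPin:
    """Return a new pin with exactly one axis bumped; the others stay put."""
    if axis not in ("runtime", "schema", "package"):
        raise ValueError(f"unknown axis: {axis}")
    return replace(pin, **{axis: version})

before = TenantPin(runtime="4.2.0", schema="3.1.0",
                   package="scenario-sepsis@2.4.1")
after = take_upgrade(before, "package", "scenario-sepsis@2.5.0")
```

Bumping the package axis leaves the runtime and schema pins exactly where the tenant set them.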

Two planes, joined at a seam

Every change is one of two kinds, and they have very different rules.

Editorial / content plane

Text, objectives, assessment checklists, lab images, vital sets, patient prompts. Changes here are reviewable and mergeable, and they never break the runtime. Content is largely immutable once published and carries low / no PHI.

Capability / runtime plane

New features, bug fixes, and breaking changes to the contract between content and clients / services. Changes here are governed by capability negotiation and, because they can be breaking, are version-gated.

The seam

A lesson revision declares the capabilities it depends on. An edit that starts using a new capability is both a content revision and a bump in the lesson's capability requirement.

That declaration is what lets the runtime answer “yes / no, I can run this” instead of breaking silently in front of a learner.
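One way to picture the seam is a check that classifies an edit by comparing two revisions. This is a minimal sketch; `LessonRevision`, `classify_edit`, and the field layout are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical shape of a published lesson revision.
@dataclass(frozen=True)
class LessonRevision:
    revision: int
    fields: dict              # editorial content (text, vitals, prompts, ...)
    capabilities: frozenset   # capabilities this revision depends on

def classify_edit(old: LessonRevision, new: LessonRevision) -> dict:
    """Report whether an edit is content-only or also raises the capability requirement."""
    added = new.capabilities - old.capabilities
    return {
        "content_changed": old.fields != new.fields,
        "new_capabilities": sorted(added),
        "crosses_seam": bool(added),
    }

r1 = LessonRevision(1, {"title": "Sepsis"}, frozenset({"sim.vitals.v2"}))
r2 = LessonRevision(2, {"title": "Sepsis", "branches": 2},
                    frozenset({"sim.vitals.v2", "branching.v1"}))
result = classify_edit(r1, r2)
```

Here the second revision both changes content and starts requiring `branching.v1`, so it crosses the seam.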

Distribution: the package-manager model

Treat the default library exactly like an upstream open-source package, and the tenant's customization like a fork with a lockfile.

Publish — We publish versioned content packages to a registry with semantic versions, e.g. scenario-sepsis@2.4.1.
Fork — Tenants fork to customize; they now own a downstream branch of that scenario.
Sync from upstream — When we ship 2.5.0, the tenant gets a “sync from upstream” prompt and chooses whether to merge, with conflict resolution against their customizations.
Pin & channel — Tenants either pin an exact version or subscribe to a release channel (stable / beta); a pinned tenant moves only when it chooses to. Our release cadence is decoupled from their adoption.

This single pattern solves both stated pains at once: customer-controlled upgrades and content versioning that preserves customization.
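The pin-or-channel choice could resolve to a concrete version roughly as follows. This is a sketch under assumptions: the `published` mapping, the function names, and the plain `MAJOR.MINOR.PATCH` version strings are all hypothetical, and pre-release tags are not handled.

```python
from typing import Optional

def parse(version: str) -> tuple:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def resolve_version(published: dict, pin: Optional[str], channel: str) -> str:
    """An explicit pin always wins; otherwise take the newest version on the channel."""
    if pin is not None:
        return pin
    return max(published[channel], key=parse)

# Hypothetical registry state for one package.
published = {
    "stable": ["2.4.1", "2.5.0", "2.10.0"],
    "beta": ["2.11.0"],
}
on_channel = resolve_version(published, None, "stable")
pinned = resolve_version(published, "2.4.1", "stable")
```

Publishing a new version changes what channel subscribers are offered, but never what a pinned tenant runs.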

The compatibility contract

Borrowed from the package.json engines field and from capability negotiation, this is what stops a lesson from breaking when services change:

Each content package declares what it requires: requires: { schema: "^3", capabilities: ["sim.vitals.v2", "branching.v1"] }.
Each runtime / client advertises the capabilities it provides.
A lesson launches only if the contract is satisfiable; otherwise the tenant sees a clear “this lesson needs runtime ≥ X” message instead of a silent failure.
Breaking change = major version bump. Tenants on the old major keep working; upgrading is an explicit, migration-gated action.
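The contract check above can be sketched in a few lines. This assumes a simplified caret range (same major, at least the stated version, no special handling of 0.x or pre-releases) and hypothetical dictionary shapes for the `requires` declaration and the runtime advertisement.

```python
def parse(version: str) -> tuple:
    """'3' -> (3, 0, 0); '3.4.0' -> (3, 4, 0)."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def satisfies_caret(version: str, caret: str) -> bool:
    """Simplified '^X.Y.Z' check: same major version and not older than the base."""
    base = parse(caret.lstrip("^"))
    got = parse(version)
    return got[0] == base[0] and got >= base

def can_launch(requires: dict, runtime: dict) -> tuple:
    """Return (ok, problems) so the caller can show a clear message instead of failing silently."""
    problems = []
    if not satisfies_caret(runtime["schema"], requires["schema"]):
        problems.append(
            f"schema {requires['schema']} required, runtime provides {runtime['schema']}"
        )
    for capability in requires["capabilities"]:
        if capability not in runtime["capabilities"]:
            problems.append(f"missing capability: {capability}")
    return (not problems, problems)

requires = {"schema": "^3", "capabilities": ["sim.vitals.v2", "branching.v1"]}
web = {"schema": "3.4.0", "capabilities": {"sim.vitals.v2", "branching.v1"}}
vr = {"schema": "3.1.0", "capabilities": {"sim.vitals.v2"}}

web_ok, web_problems = can_launch(requires, web)
vr_ok, vr_problems = can_launch(requires, vr)
```

The older VR client is refused with a named reason rather than launching a lesson it cannot run.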

Why this matters doubly for VR / game clients

VR and game-engine clients update on app-store / Steam cadences we don't fully control. Capability negotiation — not “everyone is on the same version” — is the only sane way to keep heterogeneous clients interoperating.

Authoring UX: Wikipedia over git

The versioning must feel familiar to clinician-authors, not developers. The key is to separate storage semantics from presentation.

Underneath: git-like

Immutable revisions, content-addressing, branch / merge, full lineage. Never exposed to the user.
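A minimal sketch of content-addressing, assuming revisions are JSON-serializable field dictionaries and that the identifier is a SHA-256 hash over the fields plus the parent revision's identifier. The function name and payload layout are hypothetical.

```python
import hashlib
import json
from typing import Optional

def revision_id(fields: dict, parent: Optional[str]) -> str:
    """Derive a stable ID from content and lineage; the same input always yields the same ID."""
    payload = json.dumps(
        {"parent": parent, "fields": fields},
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

root = revision_id({"title": "Sepsis", "hr": 92}, None)
same = revision_id({"hr": 92, "title": "Sepsis"}, None)
child = revision_id({"title": "Sepsis", "hr": 110}, root)
```

Because the parent ID is part of the hashed payload, each revision ID also commits to its full lineage.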

On top: Wikipedia-like

Page history, “who changed what and when,” side-by-side diffs, one-click revert, “an update is available.” Zero version-control literacy required.

Structured, not blobs

Because a lesson is stored as structured fields rather than an opaque blob, diffs are semantic (“Vitals: HR 92 → 110”), giving granularity and friendliness at once.
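A field-level diff of this kind can be sketched as a recursive walk over two nested dictionaries. The function name and the dotted-path label format are assumptions; a field that is absent on one side shows up as `None`.

```python
def semantic_diff(old: dict, new: dict, path: str = "") -> list:
    """List human-readable field changes between two structured revisions."""
    changes = []
    for key in sorted(set(old) | set(new)):
        label = f"{path}.{key}" if path else key
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse so nested fields are reported individually.
            changes += semantic_diff(a, b, label)
        elif a != b:
            changes.append(f"{label}: {a} → {b}")
    return changes

old = {"title": "Sepsis", "vitals": {"HR": 92, "SpO2": 95}}
new = {"title": "Sepsis", "vitals": {"HR": 110, "SpO2": 95}}
changes = semantic_diff(old, new)
```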

The upstream merge becomes a field-level three-way merge, presented gently: “We updated 3 sections of the library scenario. You've customized 1 of them — review that one; we'll auto-apply the other 2.”
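The field-level three-way merge can be sketched as below, assuming flat dictionaries of sections and treating an absent section as `None`. The function and variable names are hypothetical; conflicting sections keep the tenant's value until someone reviews them.

```python
def three_way_merge(base: dict, tenant: dict, upstream: dict) -> tuple:
    """Merge per field; return (merged, auto_applied, needs_review)."""
    merged, auto_applied, needs_review = {}, [], []
    for key in sorted(set(base) | set(tenant) | set(upstream)):
        b, t, u = base.get(key), tenant.get(key), upstream.get(key)
        if t == u:
            value = t                    # both sides agree (or nothing changed)
        elif t == b:
            value = u                    # only upstream changed: auto-apply
            auto_applied.append(key)
        elif u == b:
            value = t                    # only the tenant changed: keep customization
        else:
            value = t                    # both changed differently: flag for review
            needs_review.append(key)
        if value is not None:
            merged[key] = value
    return merged, auto_applied, needs_review

base = {"objectives": "v1", "vitals": "HR 92", "prompt": "base", "debrief": "v1"}
tenant = {"objectives": "v1", "vitals": "HR 92", "prompt": "tenant", "debrief": "v1"}
upstream = {"objectives": "v2", "vitals": "HR 110", "prompt": "upstream", "debrief": "v1"}

merged, auto_applied, needs_review = three_way_merge(base, tenant, upstream)
```

This mirrors the message in the text: three sections changed upstream, two are applied automatically, and the one the tenant customized is held for review.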

One store, three consumers

Structured + versioned content is also the best substrate for AI grounding: it's curated, it evolves, and it's diffable. The same store serves authors (who edit it), runtime clients (who render it), and AI agents (who ground on it).

Standards posture: clean core, standards at the edges

The internal content model stays clean; interoperability standards live at the boundaries as adapters.

Standard	Role	Where it lives
xAPI	Universal tracking spine (actor–verb–object → LRS) across every modality	Core
cmi5	Launch & packaging contract; Assignable Units launchable in any client (web, VR, game)	Core
SCORM	Export / import interop with customers' existing LMSs	Edge adapter
LTI 1.3 / Advantage	Connect into a customer LMS: deep linking, roster, grade passback	Edge adapter

SCORM's tracking model is dated and tightly coupled to a browser + LMS runtime, so it must not shape the core — cmi5 & xAPI fit the multi-client world far better.
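For orientation, an xAPI statement in the actor–verb–object shape might be built like this. The `completed` verb IRI is the standard ADL one; the home page, activity URL, and helper name are placeholders, and cmi5 requires additional context fields that are not shown here.

```python
import json
import uuid

def build_statement(learner_id: str, activity_id: str, registration: str) -> dict:
    """Assemble a minimal xAPI 'completed' statement (cmi5-specific context omitted)."""
    return {
        "id": str(uuid.uuid4()),
        "actor": {
            "objectType": "Agent",
            # Placeholder home page; the account form avoids putting an email in the statement.
            "account": {"homePage": "https://tenant.example", "name": learner_id},
        },
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/completed",
            "display": {"en-US": "completed"},
        },
        "object": {"objectType": "Activity", "id": activity_id},
        "context": {"registration": registration},
    }

statement = build_statement(
    learner_id="learner-123",
    activity_id="https://content.example/activities/scenario-sepsis",
    registration=str(uuid.uuid4()),
)
wire_format = json.dumps(statement)
```

The same statement shape works whether the activity ran in the web, VR, or game client, which is what makes it usable as the tracking spine.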

Composition hierarchy

A multi-modal “lab” — videos → reading → quiz → sim scenario → debrief → quiz — is just a sequenced lesson of mixed activities. It maps onto cmi5's hierarchy almost 1:1:

Course
  └─ Lesson / Module  (cmi5 "Block")
      ├─ Activity: video  (cmi5 "AU")
      ├─ Activity: reading
      ├─ Activity: quiz
      ├─ Activity: sim scenario
      └─ Activity: debrief

Each Activity is independently versioned, capability-declaring, and launchable on the appropriate client. Sequencing / gating rules (must-pass-quiz-before-sim) live at the lesson level. No new primitive is needed for a “lab” — a good sign the model is right.
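Lesson-level gating could be expressed as plain data plus a small check, as in this sketch. The `lesson` layout, the activity names, and the `launchable` helper are hypothetical.

```python
# Hypothetical lesson definition: ordered activities plus gating rules.
lesson = {
    "activities": ["video", "reading", "quiz-1", "sim", "debrief", "quiz-2"],
    "gates": {
        "sim": ["quiz-1"],       # must pass the quiz before the sim
        "debrief": ["sim"],
        "quiz-2": ["debrief"],
    },
}

def launchable(lesson: dict, passed: set) -> list:
    """Activities not yet passed whose prerequisites have all been passed."""
    return [
        activity
        for activity in lesson["activities"]
        if activity not in passed
        and all(gate in passed for gate in lesson["gates"].get(activity, []))
    ]

at_start = launchable(lesson, set())
after_quiz = launchable(lesson, {"video", "reading", "quiz-1"})
```

The rules live on the lesson, so each activity stays independently versioned and unaware of its neighbors.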

AI provider abstraction (per-lesson, per-agent)

AI services change constantly, so a lesson must depend on capabilities, not model strings. The config tree is Lesson → Scenario → Agent (patient) → { llm, tts, stt }, each independently selectable.

The canonical case: ElevenLabs v2 → v3

v3 adds emotion tags like [whispers] but removes the old character-escaping behavior: an additive capability and a breaking input-contract change in one release. A lesson pinned to v2 keeps working unchanged; opting into emotion tags means opting into a capability (tts.emotion-tags.v1), which implies a v3-class provider and a new escaping contract. The change in escaping rules lives in a versioned provider adapter, not scattered through content.
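To make the adapter idea concrete, here is a sketch with two versioned adapters. The escaping rule shown (backslash-escaping square brackets) is a placeholder invented for illustration and is not ElevenLabs' actual behavior; the class and function names are hypothetical as well.

```python
class TtsAdapterV2:
    """Older input contract: no emotion tags, so brackets are escaped (placeholder rule)."""
    provides = frozenset()

    def prepare(self, text: str) -> str:
        return text.replace("[", "\\[").replace("]", "\\]")

class TtsAdapterV3:
    """Newer input contract: emotion tags pass through untouched."""
    provides = frozenset({"tts.emotion-tags.v1"})

    def prepare(self, text: str) -> str:
        return text

ADAPTERS = [TtsAdapterV2(), TtsAdapterV3()]

def pick_adapter(required: frozenset):
    """Choose the first adapter whose capabilities cover what the lesson declares."""
    for adapter in ADAPTERS:
        if required <= adapter.provides:
            return adapter
    raise LookupError(f"no adapter provides {sorted(required)}")

line = "[whispers] Can you hear me?"
legacy = pick_adapter(frozenset()).prepare(line)
tagged = pick_adapter(frozenset({"tts.emotion-tags.v1"})).prepare(line)
```

Content only declares the capability; which escaping rule applies is decided entirely inside the adapter.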

Selection should be constraint-driven to avoid runtime footguns (e.g. Deepgram strong at STT for one language, ElevenLabs better at TTS for another):

Author picks language + desired features for the agent.
System shows only compatible providers / models (capability profiles filtered by that constraint).
Author pins one or accepts the default.
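The three steps above can be sketched as a filter over capability profiles. The profile fields, provider names, and feature names below are made up for illustration and do not describe any real provider's coverage.

```python
from dataclasses import dataclass

# Hypothetical capability profile; the same shape could serve LLM, TTS, and STT.
@dataclass(frozen=True)
class CapabilityProfile:
    provider: str
    kind: str               # "llm", "tts", or "stt"
    languages: frozenset
    features: frozenset

def compatible(profiles: list, kind: str, language: str, features: set) -> list:
    """Keep only profiles that cover the language and every requested feature."""
    return [
        p for p in profiles
        if p.kind == kind and language in p.languages and features <= p.features
    ]

profiles = [
    CapabilityProfile("provider-a", "tts", frozenset({"en", "fr"}),
                      frozenset({"streaming", "emotion-tags"})),
    CapabilityProfile("provider-b", "tts", frozenset({"en"}),
                      frozenset({"streaming"})),
    CapabilityProfile("provider-c", "stt", frozenset({"en", "fr"}),
                      frozenset({"streaming"})),
]

options = compatible(profiles, "tts", "fr", {"streaming"})
default = options[0].provider if options else None
```

The author only ever sees `options`, so an incompatible provider cannot be pinned by accident.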

The LLM side can ride a provider-neutral SDK (e.g. Vercel AI SDK); TTS / STT use a thin equivalent with the same capability-profile shape (languages, streaming, latency class, features).

BYOK: declare, supply, resolve

The rule that keeps lessons portable and credentials secure:

Lessons declare

A lesson references providers / models abstractly (provider: elevenlabs, capability: tts.emotion-tags.v1). It never contains keys, endpoints, or tenant specifics, so it stays shareable, exportable, and groundable.

Tenants supply

BYOK credentials (API tokens, custom endpoint URLs) live in a tenant-scoped, encrypted, strongly isolated secret store in the runtime / results plane, where PHI also lives, and never in content packages.

Runtime resolves

At launch, the runtime binds abstract provider refs to concrete tenant credentials. Default to vendor-managed keys, with BYOK as an override; the two paths differ in billing, quota, and the HIPAA BAA chain.
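A minimal sketch of the resolve step, assuming in-memory dictionaries stand in for the tenant secret store and the vendor-managed keys. All names and values here are placeholders; a real secret store would be encrypted and access-controlled.

```python
def resolve_credentials(provider_ref: dict, tenant_secrets: dict, vendor_keys: dict) -> dict:
    """Bind an abstract provider reference to credentials: BYOK override first, vendor default second."""
    provider = provider_ref["provider"]
    if provider in tenant_secrets:
        return {"source": "byok", **tenant_secrets[provider]}
    if provider in vendor_keys:
        return {"source": "vendor", **vendor_keys[provider]}
    raise LookupError(f"no credentials available for provider: {provider}")

# What the lesson declares: no keys, endpoints, or tenant specifics.
lesson_ref = {"provider": "elevenlabs", "capability": "tts.emotion-tags.v1"}

# Placeholder stores, stand-ins for the real secret infrastructure.
vendor_keys = {"elevenlabs": {"api_key": "vendor-key-placeholder"}}
tenant_with_byok = {"elevenlabs": {"api_key": "tenant-key-placeholder"}}
tenant_without_byok = {}

byok = resolve_credentials(lesson_ref, tenant_with_byok, vendor_keys)
fallback = resolve_credentials(lesson_ref, tenant_without_byok, vendor_keys)
```

Recording the `source` alongside the credential keeps the billing and BAA path visible at launch time.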

Localization is a first-class axis

Localization runs through three distinct layers — don't collapse them:

UI chrome — Portal labels; classic i18n (ICU message catalogs), per-user locale.
Content — Per-lesson locale variants within the structured model; versioning works per locale, so “the English changed” can flag “the French is now stale.”
AI capability + prompts — Language constrains which LLM / TTS / STT providers are valid (not every provider supports every language), and prompts often need per-language tuning, not literal translation.

Language is therefore a first-class axis woven through content variants, provider selection, and prompt authoring — retrofitting locale-aware versioning later is brutal, so it's designed in from the start.
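Per-locale staleness can be detected if each translated variant records which source revision it was translated from. The `variants` layout and the `translated_from` field are assumptions made for this sketch.

```python
def stale_locales(variants: dict, source_locale: str = "en") -> list:
    """Locales whose translation was made from an older revision of the source locale."""
    source_revision = variants[source_locale]["revision"]
    return sorted(
        locale
        for locale, variant in variants.items()
        if locale != source_locale and variant["translated_from"] < source_revision
    )

# Hypothetical per-locale state for one lesson.
variants = {
    "en": {"revision": 7},
    "fr": {"revision": 3, "translated_from": 5},   # English has moved on since
    "es": {"revision": 4, "translated_from": 7},   # up to date
}
stale = stale_locales(variants)
```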

The isolation payoff

This reframing answers the tenancy question by splitting data into two planes with different isolation needs:

Plane	Contents	Isolation strategy
Content / catalog	Vendor upstream + tenant forks; versioned, largely immutable, low / no PHI	Pooled, CDN-distributable
Runtime / results	Learner identity, attempts, xAPI statements, sim recordings, BYOK secrets — PHI lives here	Strong (silo / bridge) isolation

So we likely don't need one isolation model — we need a different one per plane.

Open questions

Forking semantics — when a tenant customizes, is it a copy, an override layer, or a true branch? This drives how painful upstream-merge becomes.
Standalone-ness — must a lesson be a fully offline-runnable bundle (important for VR / game clients), or may it call home at runtime?
Authoring scope — do tenants author net-new content, only customize ours, or is there a co-author / marketplace ambition?
Merge UX — exactly how “take the library update” looks to a non-technical author.
Launch-resolution flow — capability check → provider / key binding → xAPI emission, end to end.