Spaces:
Running
User Story 009: Web Worker Architecture (Main-thread Safe Web Library)
Story
As a user building robotics UIs that also render live camera previews and interactive controls
I want @lerobot/web to run heavy control/recording work off the main thread
So that my UI stays smooth (no flicker/jank) even when teleoperation and recording are active
Background
The current browser implementation runs teleoperation control loops, dataset assembly, and export logic on the main thread. When activating keyboard teleoperation while previewing a camera stream, the preview can flicker due to main-thread contention. This is a UX blocker for real-world apps that combine live video, UI interactions, and hardware control.
A worker-based architecture lets us move CPU-intensive, frequent, or bursty work off the main thread. The main thread remains responsible for DOM, video rendering and user interactions. The library must preserve the existing API (calibrate(), teleoperate(), record()) while transparently using workers when available, and cleanly falling back to the current approach otherwise.
Goals
- Identical public API to todayβs
@lerobot/web(no breaking changes) - Main-thread safe by default: heavy or frequent work executes in a Web Worker
- Graceful fallback when workers or specific APIs arenβt available
- Type-safe, minimal-copy message protocol using Transferables when possible
- Strict library/demo separation: UI and storage remain in demos
- Maintain Python lerobot UX parity and behavior
Non-Goals (for this story)
- Changing dataset formats or camera acquisition approach
- Rewriting Web Serial API usage into worker (browser support is limited in workers)
- Introducing new external dependencies
Acceptance Criteria
- Smooth UI under load:
- With at least one active camera preview and keyboard teleoperation at 60β120 Hz, the preview does not flicker and UI remains responsive at ~60 FPS
- API compatibility:
calibrate(),teleoperate(),record()signatures and return shapes are unchanged- Feature-detect workers; automatically use worker-backed runtime when available, otherwise use current main-thread runtime
- Clear separation of responsibilities:
- Worker executes control loops, interpolation, dataset assembly, export packaging, and CPU-heavy transforms
- Main thread owns DOM/UI and browser-only APIs that are unavailable in workers (e.g., Web Serial write calls)
- Type-safe protocol:
- Strongly typed request/response messages with versioned
typefields; Transferable payloads used for large data
- Strongly typed request/response messages with versioned
- Reliability & fallback:
- If the worker crashes or becomes unavailable, operations fail gracefully with descriptive errors and suggest retry
- Fallback path (main-thread) is automatically used when worker creation fails
- Tests & docs:
- Unit tests cover protocol routing and basic round-trips
- Planning docs updated; README notes main-thread-safe architecture
Architecture Overview
Worker Boundaries
- Execute in Worker:
- Control loop scheduling and target computation for teleoperation (keyboard/direct and future teleoperators)
- Episode/frame buffering and interpolation (regularization) for recording
- Dataset assembly (tables/metadata), packaging (ZIP writer), and background export streaming
- Lightweight telemetry aggregation for UI
- Execute on Main Thread:
- DOM, UI, and camera previews (
<video>elements) - Web Serial API read/write bridge (if browser does not permit worker access)
- MediaRecorder handling (browser-optimized implementation already off main CPU in many engines)
- DOM, UI, and camera previews (
Threading Model
- Main thread spawns one worker per βprocessβ instance as needed:
- TeleoperationProcess β TeleopWorker
- RecordProcess β RecordWorker (can be shared or composed with teleop worker depending on lifecycle)
- The public process objects returned from
teleoperate()/record()are proxies. Method calls post messages to the worker and return promises where appropriate. - SerialBridge (main-thread): worker requests motor write/read; main thread performs Web Serial operations and returns results. This preserves worker advantages while respecting browser API constraints.
Message Protocol (Typed)
All messages include a discriminant type and a requestId when a response is expected.
- Teleoperation (examples):
teleop/start,teleop/stopteleop/update_key_state{ key, pressed }teleop/move_motor{ motorName, position }teleop/state_update{ motorConfigs, keyStates, lastUpdate } (worker β main)serial/write_position{ id, position } (worker β main) βserial/ack
- Recording (examples):
record/start,record/stop,record/next_episoderecord/frame_append{ payload transferable }record/export_zip{ options } β streaming progress events
- Error & lifecycle:
worker/error,worker/ready,worker/teardown
Use Transferables (ArrayBuffer/MessagePort) for large payloads to avoid copies.
File Structure (web package)
packages/web/src/
βββ workers/
β βββ teleop.worker.ts # Teleoperation control loop
β βββ record.worker.ts # Recording assembly/export
β βββ protocol.ts # Message types & guards
β βββ utils.worker.ts # Worker-side helpers (interpolation, zip)
βββ bridges/
β βββ serial-bridge.ts # Main-thread serial proxy for workers
βββ teleoperate.ts # Spawns worker, returns proxy process
βββ record.ts # Spawns worker, returns proxy process
βββ types/
βββ worker.ts # Public worker-related types (narrow)
Lifecycle & Fallback
- On
teleoperate()/record()call:- Try to instantiate corresponding worker via
new Worker(new URL(...), { type: 'module' }) - If success: wire protocol channels and return proxy-backed process
- If fail: fall back to current main-thread implementation (no behavioral changes)
- Try to instantiate corresponding worker via
- On
process.stop()or page unload: sendworker/teardownand terminate the worker
Performance Notes
- Control loop cadence generated inside worker to avoid main-thread timers
- Batch serial commands from worker to main-thread bridge to minimize postMessage overhead
- Use coarse-to-fine update: high-rate calculations in worker; lower-rate UI state updates to main thread (e.g., 10β20 Hz) for rendering
- For export, stream chunks from worker; main thread triggers download or HF upload
Error Handling
- All request/response messages enforce timeouts with descriptive errors
- Worker initialization guarded with feature detection and clear fallback
- Protocol version field enables future evolution without breaking older callers
Phased Implementation Plan
Phase 1: Dataset & Export Offload (Low Risk)
- Move episode interpolation, dataset assembly, and ZIP packaging to
record.worker.ts - Main thread keeps MediaRecorder and camera preview as-is
- Public API unchanged; verify ZIP download and HF upload via streamed messages
Phase 2: Teleoperation Offload with SerialBridge
- Move control loop scheduling and target computation to
teleop.worker.ts - Implement SerialBridge on main thread for Web Serial commands
- Worker posts motor write requests; main thread executes and responds
- Throttle state updates to UI while maintaining high-rate control internally
Phase 3: Fine-Grained Optimizations
- Introduce Transferables for large buffers
- Optional OffscreenCanvas pipelines for future video transforms (not required for current scope)
- Tune batching and message cadence under hardware testing
Phase 4: Reliability & Observability
- Heartbeat messages and auto-restart policy for worker failures
- Dev diagnostics toggles; production minimal logging
Risks & Mitigations
- Web Serial availability in workers: use main-thread SerialBridge (design accounts for this)
- Message overhead at high Hz: batch commands and reduce UI state update frequency
- Browser differences: feature-detect and test on Chromium, Firefox (where supported), Safari Technology Preview
Definition of Done
- UI remains smooth with active camera preview and keyboard teleoperation; no flicker observed in manual tests
- Worker-backed runtime enabled by default when available; fallback path verified
calibrate(),teleoperate(),record()maintain identical signatures and behavior- Typed protocol implemented with Transferables where applicable
- Unit tests for protocol routing and error timeouts
- Documentation updated (this user story + README note)