NERDDISCO committed on
Commit 41bd3fd · 1 Parent(s): 9ab266c

docs(story): try out act to control the arm

Files changed (1):
  docs/planning/008_experiment_act.md +179 -0

docs/planning/008_experiment_act.md ADDED

# User Story: Validate ACT Inference with LeRobot (Python) on Real Robot

## Summary

As the maintainer of **LeRobot.js**, I want to **prove that an ACT policy controls my robot end‑to‑end using LeRobot (Python)** before I invest in the ONNX Runtime Web port. This experiment must run a well‑documented ACT checkpoint, drive the robot in a simple task, and capture latency/fps so we know the approach is viable.

## Goals

- Use a **well‑documented ACT checkpoint** (ALOHA Transfer‑Cube) as the baseline.
- Run **LeRobot’s policy server + robot client** for ACT inference on my robot.
- Achieve **stable control at ≥ 15 fps** with safe motions.
- Record **metrics (fps, latency)** and a short **video**.

## Non‑Goals

- No training or data collection.
- No browser/ONNX work yet.
- No multi‑robot orchestration.

## Environment & Dependencies

- Python environment managed with `uv`
- Packages (minimum):
  - `lerobot` (from GitHub/Hub per official install instructions)
  - `torch`, `torchvision` (matching CUDA/CPU build)
  - `opencv-python`, `pyserial`, `numpy`, `tqdm`, `pyyaml`
- Hardware:
  - **SO‑100** robot (or your target robot) connected via USB
  - External camera (USB/webcam) aimed at the workspace

## Acceptance Criteria

1. **Model loads** in the policy server without errors.
2. **Robot client connects** and streams observations to the server.
3. **Inference loop runs ≥ 15 fps**, reporting average **model latency < 60 ms** and **end‑to‑end loop < 100 ms** on a laptop.
4. The robot performs **smooth, safe motions** for at least **30 seconds** without stalls.
5. Metrics (`metrics.json`) and a **short video** (10–20 s) are produced and saved in the experiment folder.
6. If the robot is disconnected, the system **falls back to simulation** without crashing.

## Experiment Folder Layout

```
experiments/act-inference-python/
  scripts/
    setup_env.sh
    start_policy_server.sh
    start_robot_client.sh
  configs/
    policy_server.yaml   # model path, host/port, normalization, chunk size
    robot_client.yaml    # robot type/port, camera device, fps target
  logs/
    policy_server.log
    robot_client.log
  artifacts/
    metrics.json
    demo.mp4
  README.md
```

## Procedure

### 1) Setup

- Create and activate the environment with uv:
```bash
uv venv act-py --python 3.10
source act-py/bin/activate  # On Windows: act-py\Scripts\activate
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121  # or cpu wheels
uv pip install opencv-python pyserial numpy tqdm pyyaml
# Install LeRobot (per upstream instructions)
uv pip install "git+https://github.com/huggingface/lerobot.git"
```
- Verify camera and serial access:
```bash
python - << 'PY'
import cv2, serial.tools.list_ports
cap = cv2.VideoCapture(0)  # adjust the index if you have several cameras
print("camera ok?", cap.isOpened()); cap.release()
print("serial ports:", [p.device for p in serial.tools.list_ports.comports()])
PY
```

### 2) Calibrate robot

- Use the LeRobot calibration utility for your robot (SO‑100 or your target) and **save calibration data**.
- Confirm you can **tele‑operate** the robot (keyboard/joystick) for a quick smoke test.

### 3) Test basic robot control first

- Calibrate the robot using LeRobot Python:

```bash
python -m lerobot.calibrate \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0  # or your actual port
```

- Test teleoperation to ensure robot works:
```bash
python -m lerobot.teleoperate \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0 \
  --teleop.type=keyboard  # or gamepad if available
```

### 4) Run ACT policy evaluation

- Use the LeRobot evaluation script with an ACT model:
```bash
python -m lerobot.scripts.eval \
  --policy.path=lerobot/act_aloha_sim_transfer_cube_human \
  --env.type=aloha_sim_transfer_cube \
  --eval.batch_size=1 \
  --eval.n_episodes=5 \
  --device=cuda  # or cpu
```
- For real robot evaluation (if supported):
```bash
python -m lerobot.scripts.eval \
  --policy.path=lerobot/act_aloha_sim_transfer_cube_human \
  --env.type=real_world \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM0 \
  --eval.batch_size=1 \
  --eval.n_episodes=3 \
  --device=cuda
```

### 5) Create custom inference script

- Create a simplified inference script based on the evaluation example:

```python
# custom_act_inference.py - simplified version for testing
import torch
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Load the pretrained policy from the Hub
policy = ACTPolicy.from_pretrained("lerobot/act_aloha_sim_transfer_cube_human")
policy.eval()
policy.to("cuda" if torch.cuda.is_available() else "cpu")

# Run inference loop with robot
# ... (see examples/2_evaluate_pretrained_policy.py)
```
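
- Before wiring in the real robot, one dummy forward pass confirms the checkpoint loads and produces actions of the expected shape. A minimal sketch, assuming the ALOHA input keys `observation.state` (14‑dim) and `observation.images.top` (3×480×640, float in [0, 1]); confirm the actual keys/shapes against the checkpoint’s config before relying on them:

```python
# dry_run_act.py - one dummy inference step (no robot required)
# Assumes ALOHA keys/shapes; check the policy config if they differ.
import torch
from lerobot.common.policies.act.modeling_act import ACTPolicy

device = "cuda" if torch.cuda.is_available() else "cpu"
policy = ACTPolicy.from_pretrained("lerobot/act_aloha_sim_transfer_cube_human")
policy.to(device)
policy.eval()
policy.reset()  # clear any queued action chunk

# Dummy observation; replace with real camera/state tensors later.
observation = {
    "observation.state": torch.zeros(1, 14, device=device),
    "observation.images.top": torch.zeros(1, 3, 480, 640, device=device),
}
with torch.inference_mode():
    action = policy.select_action(observation)
print("action shape:", tuple(action.shape))  # expect (1, 14)
```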

### 6) Observe & record

- Record a **10–20 s video** of the behavior (screen + robot) and save to `artifacts/demo.mp4`.
- Collect metrics from the evaluation output (the LeRobot eval script provides detailed metrics).
- Check the generated `eval_info.json` for performance data.
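
If you drive the robot from the custom script instead of the eval script, a small timing harness can produce the `metrics.json` required by the acceptance criteria. A minimal sketch, assuming a `step()` callable (hypothetical) that runs one observe → infer → act cycle:

```python
# measure_loop.py - record loop fps and per-step latency, write artifacts/metrics.json
import json
import statistics
import time

def measure(step, n_steps: int = 300) -> dict:
    """Time n_steps calls of step() and summarize fps/latency."""
    latencies_ms = []
    start = time.perf_counter()
    for _ in range(n_steps):
        t0 = time.perf_counter()
        step()  # one observe -> infer -> act cycle (user-supplied)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    metrics = {
        "fps": n_steps / elapsed,
        "latency_ms_mean": statistics.mean(latencies_ms),
        "latency_ms_p95": sorted(latencies_ms)[int(0.95 * len(latencies_ms))],
    }
    with open("artifacts/metrics.json", "w") as f:
        json.dump(metrics, f, indent=2)
    return metrics
```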

### 7) Tuning if needed

- If fps < 15:
  - Reduce the image size (e.g., 224×224) and shorten the action‑chunk horizon.
  - Lower the camera fps (e.g., 30 → 20).
  - Ensure USB bandwidth is not saturated.
- If motions are jerky:
  - Smooth actions (EMA) in the client and clamp deltas per step (see the sketch after this list).
  - Verify calibration and units match the policy’s action space.
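
For the jerky‑motion case, a minimal sketch of client‑side smoothing; `alpha` and `max_delta` are illustrative values that must be tuned to your robot’s joint units:

```python
# smooth_actions.py - EMA smoothing plus per-step delta clamping (hypothetical helper)
import numpy as np

class ActionSmoother:
    def __init__(self, alpha: float = 0.3, max_delta: float = 0.05):
        self.alpha = alpha          # EMA weight on the newest action
        self.max_delta = max_delta  # max change per joint per step
        self._prev = None

    def __call__(self, action: np.ndarray) -> np.ndarray:
        if self._prev is None:
            self._prev = action.astype(np.float64)
            return self._prev
        # Exponential moving average, then clamp the per-step delta.
        smoothed = self.alpha * action + (1 - self.alpha) * self._prev
        delta = np.clip(smoothed - self._prev, -self.max_delta, self.max_delta)
        self._prev = self._prev + delta
        return self._prev
```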

### 8) Exit & cleanup

- Stop any running processes; ensure the robot is safely positioned and torque is disabled.

## Deliverables

- `artifacts/eval_info.json` with evaluation metrics from LeRobot
- `artifacts/demo.mp4` short clip of robot behavior
- `artifacts/RESULTS.md` summary of findings and next steps for LeRobot.js

## Risks & Fallbacks

- **I/O schema mismatch** (obs/action names or shapes): add a small adapter layer in the client to map to the policy’s expected schema (see the sketch after this list).
- **Camera latency**: prefer MJPEG or raw; set a fixed resolution; check exposure.
- **Serial jitter**: set a consistent baud rate; use non‑blocking writes; cap action deltas.
- **Model not compatible with robot**: switch to a simpler behavior checkpoint, or run in simulation to validate the server–client link first.
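
For the schema‑mismatch risk, a minimal adapter sketch; the key names in `KEY_MAP` are assumptions for illustration, so read the expected keys from the policy’s config rather than hard‑coding them:

```python
# obs_adapter.py - map client observation names/shapes to the policy's schema
import torch

KEY_MAP = {  # client key -> policy key (hypothetical example mapping)
    "joints": "observation.state",
    "cam_top": "observation.images.top",
}

def adapt(raw: dict) -> dict:
    obs = {}
    for src, dst in KEY_MAP.items():
        t = raw[src]
        if t.dim() in (1, 3):  # add a batch dimension if missing
            t = t.unsqueeze(0)
        if dst.startswith("observation.images") and t.dtype == torch.uint8:
            t = t.float() / 255.0  # policies expect float images in [0, 1]
        obs[dst] = t
    return obs
```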

## Definition of Done

- The ACT checkpoint controls the robot (or the simulator) via LeRobot Python with stable fps and safe motion, producing metrics and a demo video. This de‑risks the next step: **export to ONNX and port to ORT Web**.