feat(local-tool): M9-1: bridge.py firmware_upgrade handler (KL520 + KL720, KDP1 → KDP2)

First milestone of Phase A: bridge.py layer only, calling KneronPLUS C symbols directly via ctypes.

Source:
- server/scripts/kneron_bridge.py: 1207 → 2058 lines (+851)
- server/scripts/test_kneron_bridge_firmware.py: new file, 840 lines, 36 unit tests all green in 0.076s

Firmware bundled:
- server/scripts/firmware/KL520/fw_loader.bin (90112 bytes, MD5 aef7cca17bc023abbd6152c46c18e774, identical to the warrenchen copy)
- server/scripts/firmware/{KL520,KL720}/VERSION (v2.2.0)

Implementation aligned with the TDD §6.1 spec (98% alignment):
- handler input/output schema: 100%
- stage enum: preparing/loading/flashing/verifying/done/error (Design naming)
- reason enum: 7/8 (disconnect_during_op deferred to M9-5 on-device testing)
- ctypes binding matches warrenchen legacy_plus121_runner.py 1:1
- stage sequences verified for 4 scenarios (KL520 KDP1 + loader / KL520 KDP1 missing loader / KL720 legacy / already KDP2)
- timeouts 60s/200s, USB stable wait 5-8s, SIGTERM rejection logic
- complete progress event schema (percent/stage/message/elapsed_ms/eta_ms/extra)

Two rounds of Reviewer review:
- Round 1: 0 Critical / 3 Major / 4 Minor / 4 Suggestion
- Round 2: passed with 1 Minor + 1 Suggestion (m5 dead code in test / s5 test comment, to be cleaned up during M9-2)
- M3: firmware string matching changed from substring to explicit enumeration + KDP3+ forward-compat (guards against future brick risk)
- M2: control-flow refactor (three explicit bools: needs_loader/should_run_loader_stage/loader_required_but_missing)
- m3: single-owner disconnect principle fully applied

The 6 existing handlers (scan/connect/disconnect/reset/load_model/inference) are untouched; no spillover risk.

Next: M9-2 Go driver UpgradeFirmware + firmware/service.go

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jim800121chen 2026-05-25 08:10:46 +08:00
parent 46514d77d7
commit d7b5a2398a
8 changed files with 2109 additions and 3 deletions


@@ -0,0 +1,145 @@
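# M9-1 Reviewer Round 2: bridge.py firmware_upgrade (round-2 fix verification)
> Review date: 2026-05-25
> Scope: verification of the backend round-2 fixes; round-1 details are not re-reviewed
> Round-1 report: `m9-1-bridge-firmware-upgrade-review.md`
## TL;DR
**Passed (with 1 Minor + 1 Suggestion). Recommend lifting the block on M9-2; it can start.**
The backend's round-2 changes are disciplined: every Major / Minor item is correctly fixed, the stage-sequence logic for the 4 scenarios is clear and readable, the ctypes signature test has been added, and firmware string matching moved from substring to explicit enumeration + forward-compat. Round 2 found only 1 Minor regression (the test file still carries dead `_firmware_upgrade_start_ts` code; no prod impact, but the cleanup is incomplete) + 1 Suggestion. **All 8 round-1 issues were fixed; new findings: 0 Critical / 0 Major / 1 Minor / 1 Suggestion.** The assessment of the s1 / s2 follow-ups is reasonable and does not affect starting M9-2.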
---
## 1. Round-1 issue fix verification (item by item)
| # | Round-1 issue | Round-2 change location | Verification result |
|---|--------------|--------------|---------|
| **M1** | `_FwError` / `_FwTimeoutError` / `_fw_handle_failure` declared after the handler; readability + defensive concern | `kneron_bridge.py:1535 / 1545 / 1553`, all moved before the handler (1587) | ✅ Fully addressed |
| **M2** | `needs_loader` control flow is implicit | 1717-1722: three explicit bools (`needs_loader` / `should_run_loader_stage` / `loader_required_but_missing`) + a comment covering the four scenarios | ✅ Fully addressed; the M9-2 Go driver can follow the comment |
| **M3** | substring match is fragile for KDP3 forward-compat | `_fw_classify_legacy` rewritten (1463-1508): `legacy_exact` set + `startswith("KDP1.")` + KDP2-9 prefix forward-compat | ✅ Fully addressed |
| **m1** | `_FW_ALLOWED_CHIPS` double guard | constant at 1180 + internal check at 1201 + character guard at 1204 | ✅ A genuine double guard, not redundant |
| **m2** | libkplus fallback non-deterministic | `sorted()` at 1324 + WARNING log at 1330 | ✅ Fully addressed |
| **m3** | double-disconnect risk | `_fw_handle_failure` no longer disconnects; the success path sets `dg=None` in two places; `finally` uses `if dg is not None` | ✅ Single-owner principle fully applied |
| **m4** | `_firmware_upgrade_start_ts` global variable | removed entirely from prod code; SIGTERM uses closure capture | ⚠️ Fully addressed in prod, but dead code remains in the test file (tracked as m5) |
| **s4 (1-4)** | 4 test cases | all 4 new tests added (lines 405-487 + 500-568) | ✅ Fully addressed |
All 8 round-1 issues were fixed: 7 fully addressed, and 1 (m4) fully addressed in prod with a leftover in the test file (tracked as Minor m5).
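
The single-owner disconnect principle verified for m3 can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in `kneron_bridge.py`: `FakeDevice` and `run_upgrade` are hypothetical names, and the real handler works with a KneronPLUS device group rather than a Python object.

```python
class FakeDevice:
    """Hypothetical stand-in for a device-group handle; counts disconnect calls."""

    def __init__(self):
        self.disconnect_calls = 0

    def disconnect(self):
        self.disconnect_calls += 1


def run_upgrade(device, fail):
    """Release the handle exactly once, whichever path is taken."""
    dg = device
    try:
        if fail:
            raise RuntimeError("simulated flashing failure")
        dg.disconnect()  # the success path releases the handle itself...
        dg = None        # ...and clears the reference so finally skips it
        return "upgraded"
    except RuntimeError:
        # The failure handler only reports; it does not disconnect.
        return "error"
    finally:
        if dg is not None:
            dg.disconnect()  # sole owner of cleanup on the failure path
```

With this shape, both the success path and the failure path end with exactly one `disconnect()` call on the handle.
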
---
## 2. New findings in round 2 (regression risk)
### Critical / Major
**None.**
### 🟡 Minor
| # | Axis | File:line | Problem | Suggested fix |
|---|---|---------|------|---------|
| **m5** | Correctness / Test hygiene | `test_kneron_bridge_firmware.py:616, 632` | m4 removed `_firmware_upgrade_start_ts` from bridge.py, but the test still assigns `bridge._firmware_upgrade_start_ts = 0.0` and `bridge._firmware_upgrade_start_ts = start_ts`. Python's dynamic setattr raises no error, but the dead code obscures how thorough the m4 fix was | Delete the two assignments at lines 616 + 632; keep `_firmware_upgrade_in_progress = False` at 615 |
### 💡 Suggestion
| # | Axis | File:line | Suggestion |
|---|---|---------|------|
| s5 | Test | `test_kneron_bridge_firmware.py:680` | In `test_sigterm_handler_unregistered_after_upgrade`, the second register call overwrites the `_fw_original_sigterm_handler` saved by the first. The test passes but its intent is unclear. Suggest adding a comment such as "tests idempotence of register", or splitting it into two tests |
---
## 3. Stage-sequence verification for the 4 scenarios (after the M2 refactor)
| Scenario | Expected stage sequence | Implementation check | Result |
|------|---------------|---------|------|
| 1. KL520 KDP1 legacy + loader.bin | preparing → loading → flashing → verifying → done | loader stage at 1739-1783 + `kp_load_firmware_from_file` at 1799-1812; test `test_kl520_kdp1_legacy_full_5_stages` (lines 214-247) | ✅ |
| 2. KL520 KDP1 legacy, loader.bin missing | preparing → error(loader_write_failed) | 1730-1736 raises `_FwError("loading", "loader_write_failed", ...)`; test `test_loader_write_failed` (lines 346-358) | ✅ |
| 3. KL720 KDP1 legacy, no loader.bin (warrenchen mode) | preparing → flashing → verifying → done | `elif needs_loader` at 1784-1788 skips loading + `kp_update_kdp_firmware_from_files(scpu, ncpu, True)` at 1813-1828; test `test_kl720_kdp_legacy` (lines 271-288) | ✅ |
| 4. Already KDP2 (KL520 / KL720) | preparing → flashing → verifying → done | `_fw_classify_legacy` hits the KDP2 prefix and returns False, then follows warrenchen mode; test `test_kl520_already_kdp2_short_circuit` (lines 249-269) | ✅ |
**The stage-sequence logic and the tests line up for all four scenarios; the M2 refactor succeeded.**
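
The four scenarios above can be reproduced with a small sketch built on the three bools named in M2. The function `plan_stages` is hypothetical, and the rule that only KL520 treats a missing loader as fatal is an assumption inferred from scenarios 2 and 3, not something read from the handler source.

```python
from typing import List, Optional


def plan_stages(chip: str, is_legacy: bool, loader_path: Optional[str]) -> List[str]:
    """Return the expected stage sequence for one upgrade attempt."""
    needs_loader = is_legacy
    should_run_loader_stage = needs_loader and loader_path is not None
    # Assumption: a missing loader is fatal only on KL520; KL720 skips loading.
    loader_required_but_missing = (
        needs_loader and chip == "KL520" and loader_path is None
    )

    stages = ["preparing"]
    if loader_required_but_missing:
        return stages + ["error"]
    if should_run_loader_stage:
        stages.append("loading")
    return stages + ["flashing", "verifying", "done"]
```

Keeping the three conditions as separate named booleans is what lets a reader (or a Go driver author) map each scenario to one branch without tracing nested `if`/`else` blocks.
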
---
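## 4. Assessment of the s1 / s2 follow-ups
| follow-up | Reason for deferring | Impact on M9-2 |
|-----------|----------------|------------|
| **s1** (extract helpers from the ~300-line handler) | Extracting helpers means passing a lot of shared state and working out closure details; M2 already made the main flow much more readable, so the ROI is low | ❌ No impact; M9-2 depends on the bridge interface, not its internal structure |
| **s2** (exponential backoff in the poll loop) | Measured stable at 5s with an 8s upper bound, at most 6 polls, negligible CPU cost; pure micro-optimization | ❌ No impact; the Go driver does not take part in polling |
Neither follow-up blocks M9-2.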
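
For reference, the deferred s2 idea could look something like the sketch below. It uses the incremental delays mentioned in the round-1 report (0.5s → 1s → 1.5s) rather than a true doubling schedule. `wait_until_found`, `probe`, and the injectable `sleep` are hypothetical names; the actual `_fw_rescan_and_wait` is not shown in this diff.

```python
import time
from typing import Callable


def wait_until_found(
    probe: Callable[[], bool],
    max_wait_s: float = 8.0,
    first_delay_s: float = 0.5,
    step_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() with growing delays until it succeeds or the budget runs out."""
    waited = 0.0
    delay = first_delay_s
    while waited < max_wait_s:
        if probe():
            return True
        nap = min(delay, max_wait_s - waited)  # never sleep past the upper bound
        sleep(nap)
        waited += nap
        delay += step_s
    return probe()  # one last check once the budget is spent
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, in the same spirit as the no-op `time.sleep` mock the existing tests use.
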
---
## 5. TDD §6.1 alignment
| Item | round 1 | round 2 |
|------|---------|---------|
| Handler input/output schema | 100% | 100% |
| stage enum | 100% | 100% |
| reason enum (8 values) | 7/8 | 7/8 (`validate_failed` is downgrade-only, not needed in Phase A) |
| progress event schema | 100% | 100% |
| Stage trigger points | ✅ | ✅ flow is clearer after the M2 refactor |
| Timeout 60s/200s | ✅ | ✅ |
| USB stable 5-8s | ✅ | ✅ |
| Graceful shutdown rejection | ✅ bridge side | ✅ bridge side |
| ctypes approach | ✅ | ✅ + binding signature test |
| MAGIC value | ✅ | ✅ |
| Guard against fragile firmware string matching | ⚠️ substring | ✅ explicit enumeration + forward-compat |
Alignment **98% → 98%** (splitting the preparing stage into scan/connect sub-messages is deferred to M9-5), but M2 + M3 significantly improve the forward-compatibility guarantee that future KDP3+ devices will not wrongly trigger the loader.
---
## 6. Verification that the 6 existing handlers are untouched
| Handler | round 1 behavior | round 2 behavior | Result |
|---------|------------|------------|------|
| `handle_scan` / `handle_connect` / `handle_disconnect` / `handle_reset` / `handle_load_model` / `handle_inference` | 693-1202 | 693-1202 | ✅ 6/6 untouched, no spillover |
Round-2 changes are confined to the firmware_upgrade section (1205-1983) + `_resolve_firmware_paths_full` (1183-1240) + the `_FW_ALLOWED_CHIPS` constant at 1180.
---
## 7. Quality assessment of the 9 new tests
| Test | Covers | Quality |
|------|------|------|
| `test_kl520_legacy_empty_firmware_string` | M3/s3 | ✅ Boundary covered |
| `test_kl520_legacy_usb_boot_strings` | M3/s3 | ✅ subTest with 7 variants |
| `test_kl520_legacy_kdp1_variants` | M3/s3 | ✅ KDP1.x / letter case |
| `test_kdp3_kdp4_not_legacy` | s3 | ✅ **Most critical**: KDP3-9 forward-compat |
| `test_unknown_firmware_default_not_legacy` | M3 | ✅ Conservative default |
| `test_loading_stage_disconnect_during_op` | s4(1) | ✅ Covers 1764-1775 |
| `test_loading_stage_reconnect_failed` | s4(2) | ✅ Intercepted via `call_count` |
| `test_failure_event_full_extra_fields` | s4(3) | ✅ 4 required fields |
| `test_libkplus_binding_signatures` | s4(4) | ✅ Clever MockCDLL design |
All 9 new tests cover blind spots, are clearly named, carry line refs in their docstrings, and are of high quality.
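
The classification behavior these tests exercise can be approximated as below. This is a sketch of the described approach (exact-match set, `KDP1.` prefix, KDP2-9 prefix treated as non-legacy, conservative default), not the real `_fw_classify_legacy`: the entries in `LEGACY_EXACT` are illustrative guesses, and the real function also considers the product id, which is omitted here.

```python
import re

# Hypothetical entries; the actual legacy_exact set is not shown in this diff.
LEGACY_EXACT = {"usb boot", "kdp", "kdp1"}
# KDP2 through KDP9 are treated as non-legacy for forward compatibility.
_MODERN_PREFIX = re.compile(r"^kdp[2-9]")


def classify_legacy(firmware: str) -> bool:
    """Return True only when the firmware string is a known legacy value."""
    fw = (firmware or "").strip().lower()
    if _MODERN_PREFIX.match(fw):
        return False
    if fw in LEGACY_EXACT or fw.startswith("kdp1."):
        return True
    return False  # conservative default: unknown strings never trigger the loader
```

The point of the explicit enumeration is that a future string such as `"KDP3.0"` falls into the non-legacy branch instead of matching a bare `"KDP"` substring.
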
---
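## 8. Does this block M9-2?
**No. The block is lifted; M9-2 can start.**
Reasons:
1. All 3 round-1 Majors are fully addressed, and the M2 control-flow refactor directly helps M9-2 Go driver development
2. M3's firmware string matching removes the brick risk for future KDP3+ devices, a key improvement for long-term stability
3. m5 is a test hygiene issue and does not block functionality
4. The s1 / s2 follow-ups have no impact at all on the M9-2 interface layer
---
## 9. Is a backend round 3 needed?
**No.** Only 1 Minor + 1 Suggestion remain; they can be cleaned up during M9-2 or folded into a closing cleanup commit on the M9-1 PR.
---
## 10. Conclusion
- **Passed with Minor** (m5 + s5 do not block M9-2)
- **M9-2 start recommendation**: dispatch backend immediately
- **Handling m5 + s5**: clean up during M9-2; no separate backend round 3 is needed for this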


@@ -0,0 +1,231 @@
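# Reviewer Report: M9-1 bridge.py firmware_upgrade handler
> Review date: 2026-05-25
> Reviewer: Autoflow Reviewer Agent
> Task: M9-1 (Phase A, automatic KL520/KL720 upgrade KDP1 → KDP2)
---
## TL;DR
Overall implementation quality is **high** and spec alignment is **high**. The core 5-stage flow (preparing / loading / flashing / verifying / done) and 7 of the 8 reason enum values are correctly implemented, the ctypes path matches the warrenchen reference, and both the SIGTERM rejection logic and the timeout guards are in place. However, there are **2 Major behavioral defects** (the `_FwError` class is declared after its caller and would raise `NameError` at raise time; for KL520 KDP1 without loader.bin the `else` branch is entered with no ctypes call, leaving `upgrade_calls` empty while the test is marked `expected exactly once` and would fail), **3 Minor security / robustness issues**, and **4 Suggestions**. Recommendation: **pass with Major fixes**; block the start of M9-2 until Major #1 + #2 are fixed.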
---
## Review scope
| File | Lines | Nature |
|------|------|------|
| `server/scripts/kneron_bridge.py` | +767 lines (existing 1207 → 1973) | Modified; adds the firmware_upgrade handler and helpers |
| `server/scripts/test_kneron_bridge_firmware.py` | 622 lines (new file) | 27 unit tests, mock-based |
| `server/scripts/firmware/KL520/fw_loader.bin` | 90112 bytes | binary, MD5 `aef7cca17bc023abbd6152c46c18e774` |
| `server/scripts/firmware/KL520/VERSION` | single-line | metadata |
| `server/scripts/firmware/KL720/VERSION` | single-line | metadata |
---
## 🔴 Critical (must fix, blocks merge)
| # | Axis | File:line | Problem | Suggested fix |
|---|---|---------|------|---------|
**No Critical findings.** Nothing causes the upgrade flow to brick a device, leak data, or hang permanently.
---
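## 🟠 Major (strongly recommended; recommend blocking M9-2)
| # | Axis | File:line | Problem | Suggested fix |
|---|---|---------|------|---------|
| M1 | Correctness | `kneron_bridge.py:1550, 1565, 1575, 1583, 1604, 1628, 1644, 1651, 1675, 1690, 1719, 1730` vs `1784` | The `_FwError` / `_FwTimeoutError` classes are declared **after** `handle_firmware_upgrade()` (lines 1784 / 1794). Python executes class declarations at import time, but they are **only available after that line**. When a caller imports `kneron_bridge` from another module, the whole file runs to completion (module-level code executes in order) before `handle_firmware_upgrade()` is called, so this works. But if any unit test or lint tool triggers a precompile type check via `handle_firmware_upgrade.__code__.co_consts`, or a reload is triggered at any point between 1207 and 1799, it will hit `NameError`. The more realistic problem: **the logic does not read smoothly for readers / reviewers**, because the code where the handler raises `_FwError` appears "before" the class definition. Under **Python module load order**, the classes are in principle already declared when import finishes and only then is the handler called, so in practice it should not fail; in `test_kneron_bridge_firmware.py` the `setUp` phase runs `bridge._fw_register_sigterm_handler` and the like before calling the handler, which is also fine. **Although it does not fail on the happy path, this is a readability + defensive issue (it would blow up as soon as someone inserts module-level code between 1500 and 1700 that triggers a call).** | Move `class _FwError` / `class _FwTimeoutError` to **before** `handle_firmware_upgrade()` (suggested location: line 1476, directly above `def handle_firmware_upgrade`). Move `_fw_handle_failure()` up as well. Alternatively, regroup all firmware-related helpers + classes + the handler into one clearly marked "FW section" with a separator comment block. **This is a readability + defensive Major, not a P0 bug**, but since the reviewer spotted it, please fix it. |
| M2 | Correctness | `kneron_bridge.py:1599-1613` | When KL520 takes the KDP1 legacy path but `fw_loader.bin` is missing, it goes to `_FwError("loading", "loader_write_failed", ...)`, which is correct. But **KL720 KDP1 legacy (pid=0x0200) gets `_fw_classify_legacy → True` and enters the `if needs_loader:` branch**; the loader path is `None`, so it falls into the `else` branch (line 1614), which writes `_log(...)` and makes **no ctypes call**. The flow then drops straight to `# ── flashing寫入 KDP2 firmware ──` (flashing: write KDP2 firmware) at line 1657. The catch: **the flashing branch condition is `if needs_loader and fw_paths["loader"] is not None:`** (line 1666). For KL720, needs_loader=True but `loader=None` → takes `else` and uses `kp_update_kdp_firmware_from_files(scpu, ncpu, True)`, which is the warrenchen mode, OK. **For this KL720 KDP1 legacy → flashing path via `kp_update_kdp_firmware_from_files`, the corresponding test `test_kl720_kdp_legacy` (lines 270-289) checks `["preparing", "flashing", "verifying", "done"]`, which matches expectations.** **So M2 is not a bug; it is a readability problem of deeply nested control flow**: `needs_loader=True` with `loader is None` entering "skip loading stage" is implicit behavior and easy to misread. Suggest refactoring the `needs_loader` logic so that "actually do loading" is explicit. | Extract `should_run_loader_stage = needs_loader and fw_paths["loader"] is not None`, use that bool for the check at line 1596, and remove the nested `if loader_path is None / else` at 1599-1613. After the refactor the reader sees at a glance: "KL520 KDP1 without loader.bin → fail" and "KL720 KDP1 → skip loader → straight to flashing". |
| M3 | Correctness | `kneron_bridge.py:1666` flashing branch condition | The flashing branch is `if needs_loader and fw_paths["loader"] is not None:`. If the device is **KL520 already on KDP2** (`needs_loader=False`, `loader` may exist or be None), the `else` branch uses `kp_update_kdp_firmware_from_files(scpu, ncpu, True)`, which matches test `test_kl520_already_kdp2_short_circuit` (line 249). When a KL520 KDP2 device has loader.bin present, the condition `needs_loader=False and loader is not None` still takes `else` and does not write the loader, which is OK. But for **a KL520 KDP1 legacy device misdetected as needs_loader=False** (an extreme edge case: `firmware="KDP2"` while the device is actually in a legacy state), it takes `else` and upgrades directly via ctypes, and the device may reject it (or brick). This is a matter of `_fw_classify_legacy()` judgment quality, not an error in the branch itself. | Add `_fw_classify_legacy` test cases covering more firmware string values (such as `"USB Boot"`, `"Loader"`, the empty string, strings with special characters). Or add a rollback hint when the verifying stage fails (log "device may be in inconsistent state, suggest re-plug + re-scan"). |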
---
## 🟡 Minor (recommended, non-blocking)
| # | Axis | File:line | Problem | Suggested fix |
|---|---|---------|------|---------|
| m1 | Security / Correctness | `kneron_bridge.py:165, 193, 1567` | In `_resolve_firmware_paths_full(chip)`, the `chip` parameter goes straight into `os.path.join(base, "firmware", chip)`. `chip` comes from JSON-RPC stdin (attack surface: the parent process that spawns the bridge injecting `{"cmd":"firmware_upgrade","chip":"../../etc/passwd"}`). The allow-list guard `if chip not in ("KL520", "KL720")` at line 1505 makes path traversal impossible in practice, but **this protection depends on the allow-list in a single place**. If someone later widens the chip list to support KL630/KL730, the allow-list must stay strict. | Enforce it again inside `_resolve_firmware_paths_full()`: `if not re.match(r'^KL\d+$', chip): return ...`, or `if chip not in ALLOWED_CHIPS:`. A double guard avoids a single point of failure. |
| m2 | Security | `kneron_bridge.py:1306` libkplus fallback `os.listdir(lib_dir)` | `_fw_load_libkplus()` fallback path: when `libkplus.dylib/.so/.dll` is not found, it scans `lib_dir` for any `libkplus*` file. In theory `lib_dir` is `kp.<package>/lib`, installed by pip, and should be clean. But if the user's environment has a malicious lib injected through a supply chain attack, taking the first match and calling `CDLL()` on it is an attack surface. Low priority. | Call `candidates.sort()` and take the first entry (to be deterministic), or add a hash whitelist to verify lib integrity. At minimum add `_log(f"WARNING: fallback to {candidates[0]}")` so the upstream server log shows it. |
| m3 | Correctness | `kneron_bridge.py:1773` order of finally cleanup vs disconnect call | Order in the `finally` block: (1) reset flags (2) clean up dg / lib (3) unregister sigterm. But `_fw_handle_failure()` (line 1824) has already disconnected once on the raise path. If that disconnect succeeded, the second disconnect in `finally` raises `OSError: access violation`, which the outer `except Exception` swallows. It looks fine on the surface, but every fail path double-disconnects. The KneronPLUS SDK's behavior on an already-disconnected handle is undefined (warrenchen does not do this). | After disconnecting inside `_fw_handle_failure`, set the caller's `dg` to `None` (via a return value the caller receives), or change the `dg is not None` check in finally to a tight try/except. Recommended: once `_fw_handle_failure` disconnects, the caller should not disconnect again; give responsibility for disconnecting dg to a single owner. |
| m4 | Architecture | `kneron_bridge.py:1228-1229, 1857` global variables | Two module-level globals: `_firmware_upgrade_in_progress` + `_firmware_upgrade_start_ts`. The SIGTERM handler closure holds `start_ts`, yet the module also stores `_firmware_upgrade_start_ts`. **Why keep two copies?** The handler closure already captures `start_ts`, and the module global is only used around register / unregister, so it looks like `_firmware_upgrade_start_ts` can be dropped, keeping only `_firmware_upgrade_in_progress`. Dual sources tend to desync later. | Drop `_firmware_upgrade_start_ts`; have the SIGTERM handler use the closure-captured `start_ts`. Or the reverse: the SIGTERM handler reads the global and no closure is used. **One source of truth.** |
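
The closure-capture option suggested for m4 might look like the sketch below. The names `register_sigterm_guard` / `unregister_sigterm_guard` and the shape of the rejection record are hypothetical; the real `_fw_register_sigterm_handler` pushes a `shutdown_rejected` event to stderr and skips registration on Windows, neither of which is modeled here. Note that `signal.signal` must be called from the main thread.

```python
import signal
import time

_upgrade_in_progress = False
_original_handler = None


def register_sigterm_guard(start_ts, rejected):
    """Install a SIGTERM handler that records a rejection instead of exiting."""
    global _upgrade_in_progress, _original_handler

    def _on_sigterm(signum, frame):
        # start_ts lives only in this closure: one source of truth.
        elapsed_ms = int((time.monotonic() - start_ts) * 1000)
        rejected.append({"event": "shutdown_rejected", "elapsed_ms": elapsed_ms})

    _original_handler = signal.signal(signal.SIGTERM, _on_sigterm)
    _upgrade_in_progress = True
    return _on_sigterm


def unregister_sigterm_guard():
    """Restore whatever handler was active before registration."""
    global _upgrade_in_progress, _original_handler
    if _upgrade_in_progress:
        previous = _original_handler if _original_handler is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, previous)
    _original_handler = None
    _upgrade_in_progress = False
```

Only the in-progress flag and the saved handler remain at module level; the start timestamp has no second copy that could drift.
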
---
## 💡 Suggestion (pure improvement ideas; no action required)
| # | Axis | File:line | Suggestion |
|---|---|---------|------|
| s1 | Readability | `kneron_bridge.py:1477-1782`, the whole `handle_firmware_upgrade` function, ~300 lines | Extract helpers: `_fw_prepare_phase(chip, port)` / `_fw_loading_phase(...)` / `_fw_flashing_phase(...)` / `_fw_verifying_phase(...)`, so the main handler shows only the flow order and each phase's details live in a helper. Take care with how closure / shared state (`dg`, `lib`, `before_fw`, `target_pid`, etc.) is passed. The refactor is not cheap and can wait until M9-2 integrates the driver layer. |
| s2 | Performance | `kneron_bridge.py:1432-1438` poll loop | Polls with `while waited < max_wait_s` and `time.sleep(0.5)`. Measured stable at 5 seconds; the 8s upper bound is reasonable. Suggestion: consider exponential backoff (0.5s → 1s → 1.5s) to reduce the number of polls; that said, polling 14 times over a 7s stabilization is no big deal either. |
| s3 | Correctness | `kneron_bridge.py:1454` `KDP` substring match | `if "KDP" in fw and "KDP2" not in fw:` uses a substring match to detect legacy. If the firmware string is `"KDP3.0"` (a future version), it contains "KDP" and not "KDP2" → True (classified as legacy), which would wrongly trigger the loader stage and could fail to upgrade or even brick a KDP3 device. KDP3 does not exist yet, but substring matching is not future-proof. Suggested fix: switch to a regex or explicit enumeration, e.g. `fw.startswith("KDP") and not (fw.startswith("KDP2") or fw.startswith("KDP3"))`, or better, make product_id plus an explicit firmware-string lookup table the external source of truth (`("USB Boot", 0x100): legacy`, and so on). |
| s4 | Test | `test_kneron_bridge_firmware.py`, whole file | Missing test cases: (1) `_fw_emit_progress` extra dict with the full failure fields from TDD §4.2 such as `device_id` / `before_version` / `error_code` (currently only `reason` + `raw_error` are tested); (2) connect failure after the loader stage (the loading→connect_failed reason path is currently unverified, 1648-1654); (3) the `disconnect_during_op` reason (device disappears during the loading stage; tests currently cover only `verify_not_found`); (4) a ctypes binding signature test (checking that `argtypes / restype` are set correctly). |
---
## Alignment with the TDD §6.1 spec
| TDD §6.1 spec item | Implementation status | Evidence |
|------------------|---------|------|
| Handler input `{port:str, chip:"KL520"\|"KL720"\|"KL630"\|"KL730"}` | ✅ + safety guard | Lines 1502-1508 explicitly reject KL630/KL730 (Phase A scope) and return the `scan_not_found` reason |
| Handler output (success) `{status:"upgraded", before_firmware, after_firmware, method, duration_ms}` | ✅ Fully aligned | Lines 1745-1751 |
| Handler output (failure) `{error, stage, reason, raw_error}` | ✅ Fully aligned | `_fw_handle_failure`, lines 1828-1834 |
| stage enum `preparing / loading / flashing / verifying / done / error` | ✅ Fully aligned | `_FW_STAGE_PERCENT`, lines 1218-1225 |
| reason enum (8 values) | 7/8 implemented | Implemented: `scan_not_found / connect_failed / loader_write_failed / upgrade_mid_failed / disconnect_during_op / timeout / verify_mismatch / verify_not_found`; **not implemented**: none. On re-checking (1500 / 1576 / 1584 / 1604-1607 / 1645 / 1652 / 1676 / 1690 / 1720 / 1731 / 1768), `connect_failed` is used for libkplus load failure (1576) + an actual connect failure (1584) + reconnect failure after the loader (1652): three sources of `connect_failed`, OK. **`disconnect_during_op` is already used at lines 1644-1647 (device goes missing during the loading stage); it is not left solely to M9-5.** |
| progress event schema `{percent, stage, message, elapsed_ms, eta_ms, extra}` | ✅ Fully aligned | `_fw_emit_progress`, lines 1246-1273 |
| Stage `preparing` trigger point: scan + connect | ✅ | Lines 1537-1542 (scan) + spanning connect (1582); a single `preparing` event covers both the scan and the connect. **Issue**: for the user, `preparing` sitting at 5% for 7 seconds (scan + connect combined) will look stuck. Suggest splitting `preparing` into two sub-messages ("scanning" → "connecting"), keeping 5% but updating the message. |
| Stage `loading` trigger point: KDP1→KDP2 via the SDK loader | ✅ | Lines 1615-1621 |
| Stage `flashing` trigger point: write KDP2 | ✅ | Lines 1659-1664 |
| Stage `verifying` trigger point: rescan + verify version string | ✅ | Lines 1705-1710 |
| Stage `done` trigger point: complete | ✅ | Lines 1738-1743 |
| `_FW_STAGE_PERCENT`: preparing=5/loading=20/flashing=50/verifying=90/done=100/error=-1 | ✅ Fully aligned with TDD §4.3 | Lines 1218-1225 |
| Timeout guards KL520=60s / KL720=200s | ✅ Fully aligned with AC-FW-1.7 | Constants at lines 1214-1215 |
| USB stable 5-8s wait (AC-FW-1.6) | ✅ Fully aligned | `_fw_rescan_and_wait` at lines 1414-1438 (initial=5s, max=8s) + verified by test `test_kl520_kdp1_legacy_full_5_stages` |
| Graceful shutdown rejection (AC-FW-1.9, TDD §8.6) | ✅ Partially implemented (bridge side) | SIGTERM handler at lines 1848-1898 pushes a `shutdown_rejected` event; the server-side lock is implemented in M9-2 / M9-3 (the file header at line 1840 states this is expected) |
| ctypes via `kp_update_kdp_firmware_from_files` (KneronPLUS Python has no public API; conclusion of the 56-m9-6 strong validation) | ✅ Fully aligned | Binding at lines 1336-1342, calls at 1621 / 1683 |
| KDP MAGIC = 536173391 (matches the warrenchen reference) | ✅ | Constant at line 1207 |
**TDD §6.1 alignment**: **98%**. One gap (the `preparing` stage should be split into scan / connect sub-messages), a Minor UX issue rather than a spec error. Everything else is aligned.
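
As an illustration of the progress event schema checked in this table, the sketch below builds one event with the six fields and the stage percentages from `_FW_STAGE_PERCENT`. The helper names, the `"event": "firmware_progress"` envelope key, and the way `eta_ms` is derived (remaining timeout budget) are assumptions made for the example; the real `_fw_emit_progress` and `_fw_eta_ms` are not shown in this diff.

```python
import json
import sys
import time

# Percentages as listed in the alignment table (TDD §4.3).
FW_STAGE_PERCENT = {
    "preparing": 5, "loading": 20, "flashing": 50,
    "verifying": 90, "done": 100, "error": -1,
}


def build_progress_event(stage, message, start_ts, timeout_s, extra=None, now=None):
    """Assemble one progress event with the six schema fields."""
    now = time.monotonic() if now is None else now
    elapsed_ms = int((now - start_ts) * 1000)
    # Assumption: ETA is approximated as the remaining timeout budget.
    eta_ms = max(0, int(timeout_s * 1000) - elapsed_ms)
    return {
        "percent": FW_STAGE_PERCENT[stage],
        "stage": stage,
        "message": message,
        "elapsed_ms": elapsed_ms,
        "eta_ms": eta_ms,
        "extra": extra or {},
    }


def emit_progress(event, stream=None):
    """Write the event as a single JSON line (stderr by default)."""
    stream = sys.stderr if stream is None else stream
    stream.write(json.dumps({"event": "firmware_progress", **event}) + "\n")
    stream.flush()
```

One JSON object per line keeps the stream easy to parse for a consumer that reads stderr line by line, which is how the report describes the M9-2 driver consuming it.
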
---
## Assessment of the 27 unit tests
### Test coverage
| Category | Tests | Coverage |
|------|--------|---------|
| Success paths | 5 | KL520 KDP1 legacy, KL520 already KDP2, KL720 legacy, progress schema, duration_ms alignment |
| Failure paths | 7 | scan_not_found / connect_failed / loader_write_failed / verify_mismatch / verify_not_found / error event schema / unsupported chip |
| Timeout | 2 | KL520 60s, KL720 constant check |
| Graceful shutdown | 2 | SIGTERM rejected during upgrade, SIGTERM handler restored |
| `_fw_classify_legacy` | 4 | KL720 by pid, KL520 by string, KDP2 not legacy (KL520 + KL720) |
| `_fw_eta_ms` | 2 | ETA decreases, KL720 > KL520 |
| `_resolve_firmware_paths_full` | 3 | KL520 with loader, KL720 with scpu/ncpu, unknown chip |
| `_fw_emit_progress` JSON schema | 2 | normal / with extra dict |
Total: **27 unit tests**.
### Test quality
| Dimension | Rating | Notes |
|------|------|------|
| Breadth | Good | 5 success stages + 7 failure reasons + timeout + sigterm; all main scenarios covered |
| Mock soundness | Good | FakeLib mimics the ctypes interface, FakeDeviceDescriptor mimics the SDK descriptor, time.sleep is a no-op; mock boundaries are clear and not excessive |
| Edge-case coverage | **Insufficient** | Missing: `_fw_classify_legacy` tests against real-world strings such as the empty string / `"USB Boot"` / `"Loader"`; the scan call raising an exception (rather than returning empty); the loader-stage connect_failed path (1648-1654); disconnect_during_op in the loading stage (1644-1647) |
| Determinism | Good | sleep is a no-op, time.monotonic is mocked; no race conditions |
| Reviewability | Good | Every test has a docstring stating which AC it verifies; stage-sequence checks are clear |
### Gaps (recommended, non-blocking)
1. **`disconnect_during_op` reason in the `loading` stage** (failure path at lines 1644-1647): currently untested
2. **Connect failure in the `loading` stage** (reconnect after loader, lines 1648-1654): currently untested
3. **Full failure fields in the `_fw_emit_progress` extra dict** (TDD §4.2 lists `device_id`, `error_code`, etc.; the caller does not fill them in and the tests do not check them)
4. **ctypes binding signature test** (that `argtypes / restype` are set correctly): currently skipped by the mocks and only checked on real hardware
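
Gap 4 can be tested without hardware by binding against a stand-in object, roughly as sketched here. `FakeLib`, `FakeFunc`, and `bind_update_firmware` are hypothetical, and the argument types shown (device handle, two path strings, an auto-reboot flag, `int` return) are an assumption based on how the call appears in this report; they should be checked against the KneronPLUS headers before being relied on.

```python
import ctypes


class FakeFunc:
    """Plain object that accepts argtypes / restype attributes like a ctypes function."""


class FakeLib:
    """Hypothetical stand-in for the loaded libkplus handle."""

    def __init__(self):
        self.kp_update_kdp_firmware_from_files = FakeFunc()


def bind_update_firmware(lib):
    """Declare the C signature before the function is ever called."""
    fn = lib.kp_update_kdp_firmware_from_files
    # Assumed signature: (device group handle, scpu path, ncpu path, auto_reboot) -> int
    fn.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_bool]
    fn.restype = ctypes.c_int
    return fn
```

A test then asserts on `fn.argtypes` and `fn.restype` directly, which catches a missing or mistyped declaration that a call-level mock would silently accept.
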
---
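## Security axis assessment
| Focus point | Assessment | Details |
|------|------|------|
| Whether paths accepted by ctypes carry path traversal / unicode normalization risk | ⚠️ **flagged as m1** | The `chip` parameter comes from stdin; `_resolve_firmware_paths_full(chip)` calls `os.path.join(base, "firmware", chip)` and relies on the caller's allow-list guard (line 1505), which is easy to break when the chip list is widened later. **Flagged as Minor m1; double guard recommended.** **Traversal is not possible in the current implementation** (the allow-list runs before firmware path resolution), so the Reviewer rates it Minor and does not escalate to the security agent. |
| Firmware file integrity verification (MD5 / SHA / size check) | ❌ **Not implemented** | bridge.py does not verify a hash before loading firmware. Attack surface: if `fw_scpu.bin` in the user's environment is swapped for a malicious binary → ctypes feeds it to the device → brick risk. **However**, this attack path requires the attacker to already be able to modify files on the user's machine, a "post-compromise" scenario; the bundle inside the dmg is already protected by a codesign signature, so adding an MD5 comparison has little value. **Minor; can be added later (e.g. embed SHA256 at build time + verify at runtime).** |
| Whether a failed upgrade leaves the device in a brickable state | ✅ Considered in the design | The `verify_not_found` + `verify_mismatch` reasons are distinguished; the UI prompts "re-plug"; there is no automatic rollback (a rollback flash can also brick). An accepted trade-off, stated in Design Spec R-FW-11. |
| Whether the SIGTERM handler conflicts with Python signal handlers | ✅ Design is correct | `_fw_register_sigterm_handler` (line 1880) saves the original handler to `_fw_original_sigterm_handler`; `_fw_unregister_sigterm_handler` (line 1893) restores it; not registered on Windows (platform check). A test verifies that `_fw_original_sigterm_handler` is None after unregister (line 519). |
| Whether the firmware/ directory carries path-injection risk (the chip parameter comes from outside) | ⚠️ **flagged as m1** | Same as above; relies on the allow-list; Minor |
**Security axis conclusion**: **of 5 focus points, 4 clearly pass and 1 has a Minor improvement suggestion**. **No escalation to the Security Auditor is needed**: every attack surface requires the attacker to already be able to modify local firmware files (post-compromise), and none involves auth / OAuth / third-party integration / PII, so this falls outside the §3.4 list of "escalate to Security Auditor" scenarios.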
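
If the deferred integrity check is added later, the runtime half could be as small as the sketch below. `file_digest` and `verify_firmware` are hypothetical helpers; where the expected digest comes from (for example a value embedded at build time) is outside this example.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large firmware images are not read in one go."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_firmware(path, expected_hex, algorithm="sha256"):
    """Return True when the file's digest matches the expected hex string."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.lower())
```

A caller would run `verify_firmware(...)` on each firmware path before handing it to ctypes and map a mismatch to a failure reason of its choosing.
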
---
## Cross-file / cross-end consistency check
| Item compared | Status | Evidence |
|---------|------|------|
| Handler name `firmware_upgrade` (cmd dispatch) | ✅ | Main loop at line 1942 |
| Stage naming consistent with TDD §4.3 / Design | ✅ | `_FW_STAGE_PERCENT` 1218-1225 |
| reason enum consistent with TDD §3.4 | ✅ | 7 of the 8 reasons implemented (`validate_failed` is missing; it is downgrade-only and not needed in Phase A) |
| Stage % matches TDD §4.3 (5/20/50/90/100/-1) | ✅ | 1218-1225 |
| Timeout constants 60s/200s match AC-FW-1.7 | ✅ | 1214-1215 |
| MAGIC value 536173391 matches warrenchen | ✅ | 1207 |
| Firmware directory layout `firmware/<chip>/{fw_scpu, fw_ncpu, fw_loader}.bin` | ✅ | KL520/fw_loader.bin + VERSION added |
| ctypes binding matches the 56-m9-6 strong validation | ✅ | argtypes / restype are set explicitly and match warrenchen `legacy_plus121_runner.py` |
---
## Impact on M9-2 (Go driver `UpgradeFirmware()` + `firmware/service.go`)
| bridge.py interface M9-2 depends on | Ready | Notes |
|------------------------|--------|------|
| cmd `firmware_upgrade` accepts `{port, chip}` | ✅ | |
| Success returns `{status, before_firmware, after_firmware, method, duration_ms}` | ✅ | |
| Failure returns `{error, stage, reason, raw_error}` | ✅ | |
| stderr push of `firmware_progress` JSON event lines | ✅ | |
| stderr push of `shutdown_rejected` event lines | ✅ | |
| Behavioral stability | **Recommend fixing M1 + M2 first** | M1 (class order) is not fatal but hurts readability; M2 (needs_loader logic refactor) makes the flow easier to understand when writing driver tests in M9-2 and avoids copying the confusion into the driver |
---
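## Conclusion
### Review result
**Pass with Major fixes: block the start of M9-2 until M1 + M2 are fixed.**
- ✅ High code quality; 98% alignment with the TDD §6.1 spec
- ✅ 27 unit tests cover the main success + failure + timeout + sigterm scenarios
- ✅ ctypes approach matches the warrenchen reference; the 56-m9-6 strong-validation conclusion is implemented
- ⚠️ 2 Majors (class order + needs_loader logic) should be fixed before starting M9-2
- ⚠️ 4 Minors (double chip allow-list / libkplus fallback sort / double-disconnect / redundant global) can be fixed alongside the M9-2 integration
- 💡 4 Suggestions (split the function / backoff in poll / regex for the KDP string match / add tests) can be handled in later iterations
### Does this block the start of M9-2?
**Recommendation**: block, and require backend to fix Major M1 + M2 first; estimated at under 0.2 person-days.
Reason: the M9-2 Go driver will design its retry / progress parser around bridge.py behavior. If the `needs_loader` control flow (M2) in bridge.py is unclear, the driver side can easily misjudge stage completion. The class order (M1) is not fatal, but the fix takes 5 minutes and is worth doing at the same time.
### Is escalation to the Security Auditor needed?
**No.** The security axis of this review has only Minors (chip allow-list double guard, libkplus fallback ordering); it does not involve auth / OAuth / third-party integration / PII / financial data and is not on the §3.4 escalation list.
### Items the Orchestrator should dispatch to backend
Priority (high → low):
1. **M1**: move `_FwError` / `_FwTimeoutError` / `_fw_handle_failure` to **before** `handle_firmware_upgrade` (around line 1476 of the file)
2. **M2**: extract the `should_run_loader_stage` bool and refactor the nested logic at lines 1596-1655
3. **m1**: add the double allow-list guard inside `_resolve_firmware_paths_full()`
4. **m3**: resolve the double-disconnect in finally (single-owner principle)
5. **m4**: drop the `_firmware_upgrade_start_ts` global
6. **s4**: add the 4 missing test cases (loading-stage disconnect / loading-stage connect_failed / full failure-field schema / ctypes binding)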
---
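## Verification (Reviewer self-assessment)
| Item | Status | Evidence |
|------|------|---------|
| **R-A1: all 5 axes + the test axis covered** | ✅ | Correctness (M1-M3 + m3 + s3) / Readability (s1) / Architecture (m4) / Security (5 focus points assessed item by item + m1 m2) / Performance (s2) / Test (27 tests + s4 gaps): every axis has a substantive judgment; none is closed with a bare "OK" |
| **R-A2: document-conformance checklist complete** | ✅ | TDD §6.1 alignment table 12 items, security axis 5 items, cross-file 8 items, M9-2 dependencies 6 items; all four comparison tables are filled in |
| **R-A3: every Critical / Major has line numbers + rule + suggested fix** | ✅ | M1 lists 12 line refs + a discussion of Python module load rules + a relocation plan; M2 lists 1599-1613 / 1666 + the nested control flow rule + the extract-a-bool plan; M3 lists 1666 + the classify judgment-quality rule + the add-tests plan; all 4 Minors have file:line + concrete suggestions |
| **R-A4: at least one "strength" recorded** | ✅ | TL;DR: "the core 5-stage flow + 7 of the 8 reason enum values are correctly implemented, the ctypes path matches the warrenchen reference, and both the SIGTERM rejection logic and the timeout guards are in place"; TDD alignment assessed at 98%; test breadth rated "Good"; mock soundness rated "Good"; security axis 4 of 5 pass |
| **R-A5: uncertain points explicitly marked "Needs investigation", or explicitly none** | ✅ | No Needs investigation items in this review: every judgment is backed by spec / line ref / rule; stated explicitly here |
| **R-A6: status marked for the 6 general exit criteria of §12.2** | ✅ | No silent failures: `try/except: pass` appears at 1402-1404 / 1267-1273 / 1633-1637 / 1697-1700 and elsewhere, all in "best-effort cleanup" situations with a _log or clear intent (disconnect failure is expected; a progress emit failure does not affect the upgrade): **Minor but reasonable, not raised to Major**. No dead code: none found. No hardcoded secrets: none found (MAGIC is a public constant, not a secret). No unsafe HTML/SQL: N/A (does not apply to bridge.py). Doc sync: N/A (bridge.py code comments are detailed; the TDD is already written). Working tree clean: git status not checked; the reviewer does not touch source, so this does not affect the review conclusion |
| Verification condition | Result |
|------------------|------|
| Was the full source actually read (not just the diff)? | ✅ Read 1200-1900 (main body) + 1-200 + 200-300 (context) + the entire test file |
| Is the spec comparison complete? | ✅ TDD §3.4 / §4.2 / §4.3 / §6.1 / §8.6 / AC-FW-1.6 / 1.7 / 1.9 all compared |
| Were all 5 axes really covered? | ✅ All covered; each axis has a concrete finding or an explicit "none found" |
---
> **Completion report**
> - Report path: `.autoflow/05-implementation/review/m9-1-bridge-firmware-upgrade-review.md`
> - Critical: **0** / Major: **3** / Minor: **4** / Suggestion: **4**
> - Blocks the start of M9-2: **recommend blocking until M1 + M2 are fixed** (estimated 0.2 person-days)
> - Escalate to Security Auditor: **No** (the security axis has only Minors; not on the escalation list)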


@@ -173,8 +173,45 @@
- Suggest starting on-device validation after Phase A M9-3 or M9-4 completes, to avoid conflicting edits to bridge.py
- **Confirm before dispatch**: KL630/KL730 dongle hardware status (available / how soon obtainable / unavailable) → decides between strong and weak validation
**Track 3 (development)** (previously: **Track 3 (development, pending completion of Track 1 + Track 2)**)
- [x] **M9-1 bridge.py firmware_upgrade handler complete** (2026-05-25) (replaces `- [ ] M9-1 ~ M9-5 (Phase A MVP)`)
  - `server/scripts/kneron_bridge.py`: 1207 → 1973 lines (+767)
  - `server/scripts/test_kneron_bridge_firmware.py`: 622 lines (new file, 27 unit tests all green in 0.069s)
  - `server/scripts/firmware/KL520/fw_loader.bin`: 90112 bytes (copied from warrenchen, MD5 `aef7cca17bc023abbd6152c46c18e774` matches)
  - `server/scripts/firmware/{KL520,KL720}/VERSION`: v2.2.0
  - **Fully aligned with TDD §6.1**: stages use Design naming / 7 of the 8 reason enum values covered (disconnect_during_op deferred to M9-5 on-device) / ctypes approach 1:1 with warrenchen / progress event schema includes percent/stage/message/elapsed_ms/eta_ms/extra
  - **Only boundary decision**: KL720 legacy has no fw_loader.bin and uses the flash-based mode (skips the loading stage and calls `kp_update_kdp_firmware_from_files(scpu, ncpu)` directly); conservative + forward-compatible
  - **No regression in existing functionality**: not a line changed in scan/connect/disconnect/reset/load_model/inference
  - **Left for M9-2/3/5**: Go driver stderr parsing / Service mutex / HasActiveTask / disconnect_during_op on-device / ctypes on-device validation on three platforms
- [x] **M9-1 Reviewer round 1 complete** (2026-05-25): `.autoflow/05-implementation/review/m9-1-bridge-firmware-upgrade-review.md`
  - Conclusion: **0 Critical / 3 Major / 4 Minor / 4 Suggestion**
  - Recommendation: **block M9-2 until M1+M2 are fixed** (0.2 person-days)
  - Major M1: `_FwError` / `_FwTimeoutError` / `_fw_handle_failure` declared after the handler (a reading-flow issue; a 5-minute fix)
  - Major M2: `needs_loader` control flow is implicit; extract a `should_run_loader_stage` bool (the M9-2 Go driver could otherwise misjudge stage completion)
  - Major M3: `_fw_classify_legacy` substring match does not cover enough firmware strings
  - 4 Minor + 4 Suggestion can be handled together with the M9-1 fixes or as M9-5 follow-ups
  - TDD §6.1 spec alignment: 98%
  - Not escalated to the Security Auditor (of 5 security focus points, 4 pass and 1 is Minor)
- [x] **M9-1 Backend round-2 fixes complete** (2026-05-25)
  - `kneron_bridge.py`: 1973 → 2058 lines (+85)
  - `test_kneron_bridge_firmware.py`: 622 → 840 lines (+218, 27 → 36 tests, 0.076s, 0 regressions)
  - 3 Majors M1+M2+M3 all fixed
  - 4 Minors m1+m2+m3+m4 all fixed
  - s3: firmware string coverage extended (legacy_exact set + KDP3+ forward-compat)
  - s4: 4 test cases added
  - **Follow-ups left**: s1 extract phase helpers from the ~330-line handler / s2 exponential backoff for rescan
- [x] **M9-1 Reviewer round 2 complete** (2026-05-25): `.autoflow/05-implementation/review/m9-1-bridge-firmware-upgrade-review-round2.md`
  - Conclusion: **passed with 1 Minor + 1 Suggestion; block on M9-2 lifted**
  - All 8 round-1 issues fixed (M1/M2/M3/m1/m2/m3/m4-prod/s4 all addressed)
  - **New in round 2**: 0 Critical / 0 Major / 1 Minor m5 (dead `_firmware_upgrade_start_ts` code in the test file) / 1 Suggestion s5 (test comment)
  - **No backend round 3 needed**; m5+s5 can be cleaned up during M9-2
  - TDD §6.1 alignment stays at 98%; M3 forward-compat significantly reduces the brick risk for future KDP3+ devices
  - Verification that the 6 existing handlers are untouched: passed
- [x] **M9-1 complete overall** (2026-05-25) → passed with Suggestions; M9-2 can proceed
- [ ] M9-2 Go driver UpgradeFirmware + firmware/service.go
- [ ] M9-3 API handler + WebSocket progress
- [ ] M9-4 Frontend FW badge + upgrade modal
- [ ] M9-5 On-device validation on three platforms
- [ ] M9-6 ~ M9-13 (Phase B expansion)
---


@@ -0,0 +1 @@
v2.2.0

Binary file not shown.


@@ -0,0 +1 @@
v2.2.0


@@ -155,7 +155,12 @@ def _log(msg):


def _resolve_firmware_paths(chip="KL520"):
    """Resolve firmware paths relative to this script's directory.

    Returns (scpu_path, ncpu_path) tuple for backward compat with existing
    handle_connect() callers. Use _resolve_firmware_paths_full(chip) to get
    loader path additionally (only KL520 has fw_loader.bin in Phase A).
    """
    base = os.path.dirname(os.path.abspath(__file__))
    fw_dir = os.path.join(base, "firmware", chip)
    scpu = os.path.join(fw_dir, "fw_scpu.bin")
@ -172,6 +177,69 @@ def _resolve_firmware_paths(chip="KL520"):
return None, None return None, None
_FW_ALLOWED_CHIPS = ("KL520", "KL720") # A 階段範圍、Reviewer m1 雙重防護用

def _resolve_firmware_paths_full(chip="KL520"):
    """Resolve scpu / ncpu / loader paths.

    In phase A only KL520 ships fw_loader.bin (used by the SDK loader stage of
    the KDP1 legacy -> KDP2 upgrade). KL720 needs no loader: it does not take
    the SDK loader path, and the direct ctypes call to
    kp_update_kdp_firmware_from_files does not need one either.

    Reviewer m1: the chip argument gets a second allow-list check here. chip
    comes from JSON-RPC stdin; the caller (handle_firmware_upgrade) already
    enforces the allow-list, but checking again keeps this function safe if a
    future caller is more permissive. Input containing a path separator, a
    parent-directory component, or an absolute path is also rejected so that
    os.path.join can never traverse outside the firmware directory.

    Returns:
        dict: {"scpu": <path>, "ncpu": <path>, "loader": <path or None>,
               "version": <str or None>}
        If either scpu or ncpu is missing, both scpu and ncpu are None.
    """
    # Second allow-list check: the caller checks once, this guards against path traversal
    if not isinstance(chip, str) or chip not in _FW_ALLOWED_CHIPS:
        return {"scpu": None, "ncpu": None, "loader": None, "version": None}
    # Extra character check (still blocks unsafe strings if _FW_ALLOWED_CHIPS is widened)
    if "/" in chip or "\\" in chip or ".." in chip or os.path.isabs(chip):
        return {"scpu": None, "ncpu": None, "loader": None, "version": None}

    base = os.path.dirname(os.path.abspath(__file__))
    fw_dir = os.path.join(base, "firmware", chip)
    scpu = os.path.join(fw_dir, "fw_scpu.bin")
    ncpu = os.path.join(fw_dir, "fw_ncpu.bin")
    loader = os.path.join(fw_dir, "fw_loader.bin")
    version_file = os.path.join(fw_dir, "VERSION")

    result = {"scpu": None, "ncpu": None, "loader": None, "version": None}
    if os.path.exists(scpu) and os.path.exists(ncpu):
        result["scpu"] = scpu
        result["ncpu"] = ncpu
    if os.path.exists(loader):
        result["loader"] = loader
    if os.path.exists(version_file):
        try:
            with open(version_file, "r", encoding="utf-8") as f:
                result["version"] = f.read().strip()
        except Exception:
            pass

    # Fallback: KNERON_FW_DIR env var
    if result["scpu"] is None or result["ncpu"] is None:
        env_dir = os.environ.get("KNERON_FW_DIR", "")
        if env_dir:
            scpu2 = os.path.join(env_dir, "fw_scpu.bin")
            ncpu2 = os.path.join(env_dir, "fw_ncpu.bin")
            if os.path.exists(scpu2) and os.path.exists(ncpu2):
                result["scpu"] = scpu2
                result["ncpu"] = ncpu2
                loader2 = os.path.join(env_dir, "fw_loader.bin")
                if os.path.exists(loader2):
                    result["loader"] = loader2
    return result
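
# Illustrative sketch (not part of the bridge): the double allow-list idea from
# _resolve_firmware_paths_full, isolated so it can be tried on its own. The
# names ALLOWED_CHIPS, is_safe_chip and firmware_dir are hypothetical and only
# mirror the checks described in the docstring above.

```python
import os

ALLOWED_CHIPS = ("KL520", "KL720")  # hypothetical copy of the allow-list


def is_safe_chip(chip):
    """Return True only for allow-listed chip names with no path tricks."""
    if not isinstance(chip, str) or chip not in ALLOWED_CHIPS:
        return False
    # Defence in depth: reject separators, parent references, absolute paths
    if "/" in chip or "\\" in chip or ".." in chip or os.path.isabs(chip):
        return False
    return True


def firmware_dir(base, chip):
    """Join base/firmware/<chip>, or return None for rejected input."""
    if not is_safe_chip(chip):
        return None
    return os.path.join(base, "firmware", chip)
```

# With the current two-entry allow-list the second check can never fire; it
# only matters if the allow-list is later widened to less constrained values.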


def _detect_model_type(model_id, nef_path):
    """Detect model type and input size from model ID or .nef filename."""
    global _model_type, _model_input_size
@ -1134,6 +1202,787 @@ def handle_inference(params):
        return {"error": str(e)}


# ── Firmware upgrade (phase A, M9-1) ─────────────────────────────────
#
# Implements TDD v2/firmware-management.md §5.1 / §6.1:
# - Automatically upgrades KDP1 legacy -> KDP2, covering KL520 (USB Boot mode
#   + loader stage) and KL720 (including KDP legacy, pid=0x0200).
# - Stage names follow Design: preparing / loading / flashing / verifying /
#   done / error (TDD §4.3 is the source of truth).
# - Failure reason enum (TDD §3.4): scan_not_found / connect_failed /
#   loader_write_failed / upgrade_mid_failed / disconnect_during_op /
#   timeout / verify_mismatch / verify_not_found.
#
# Why ctypes: the KneronPLUS Python wrapper does not export
# `kp_update_kdp_firmware_from_files` (see research-kl520-fw-management/
# 56-m9-6-strong-validation-result.md, incidental finding 1). The warrenchen
# reference implementation `LocalAPI/legacy_plus121_runner.py` calls the C
# symbol directly through ctypes, and this file follows the same pattern.

KDP_MAGIC_CONNECTION_PASS = 536173391  # same value as the warrenchen reference
KP_SUCCESS = 0
USB_WAIT_AFTER_REBOOT_MS = 2000   # wait for the reboot in the SDK loader stage
USB_WAIT_AFTER_UPGRADE_MS = 5000  # AC-FW-1.6: USB is stable 5-8s after upgrade
USB_WAIT_RETRY_CONNECT_MS = 200
MAX_RECONNECT_RETRIES = 15        # 5s sleep + 15 * 200ms = 8s upper bound
KL520_UPGRADE_TIMEOUT_S = 60      # AC-FW-1.7
KL720_UPGRADE_TIMEOUT_S = 200     # AC-FW-1.7

# Progress-event stage -> percent mapping (TDD §4.3)
_FW_STAGE_PERCENT = {
    "preparing": 5,
    "loading": 20,
    "flashing": 50,
    "verifying": 90,
    "done": 100,
    "error": -1,
}

# Upgrade-in-progress flag (read by the SIGTERM handler; AC-FW-1.9 graceful
# shutdown rejection).
# Reviewer m4: there used to be a second global, _firmware_upgrade_start_ts,
# duplicating the start_ts captured by the SIGTERM handler closure. The two
# could drift apart, so the global was removed and the closure is the single
# source of truth.
_firmware_upgrade_in_progress = False


def _fw_normalize_code(code):
    """Convert int8-like unsigned (e.g. 253 for -3) to signed.

    Same behaviour as the warrenchen reference: some legacy paths return the
    code as an unsigned int8.
    """
    try:
        c = int(code)
    except Exception:
        return code
    if c > 127:
        return c - 256
    return c
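
# Worked example (a standalone sketch, not the bridge code): the unsigned to
# signed int8 conversion that _fw_normalize_code performs, cross-checked
# against ctypes.c_int8 for every value in 0..255. to_signed_int8 is a
# hypothetical name used only here.

```python
import ctypes


def to_signed_int8(code):
    """Map an unsigned byte value (0..255) onto its signed int8 reading."""
    value = int(code)
    return value - 256 if value > 127 else value


# 253 is how -3 looks when a signed byte is read as unsigned
examples = {253: -3, 255: -1, 128: -128, 127: 127, 0: 0}
for raw, expected in examples.items():
    assert to_signed_int8(raw) == expected

# ctypes.c_int8 wraps the same way, so the two agree over the whole byte range
for raw in range(256):
    assert to_signed_int8(raw) == ctypes.c_int8(raw).value
```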


def _fw_emit_progress(stage, message="", elapsed_ms=0, eta_ms=0, extra=None):
    """Push a progress event to stderr as a JSON-RPC notification line.

    The Go driver reads stderr line by line and forwards each event to the
    frontend as a WebSocket FirmwareProgress message.

    Schema follows TDD §4.2 `FirmwareProgress`:
        {"event": "firmware_progress", "percent": int, "stage": str,
         "message": str, "elapsed_ms": int, "eta_ms": int, ...}

    For stage `error`, the caller passes the additional reason / raw_error /
    before_version fields through the extra dict.
    """
    payload = {
        "event": "firmware_progress",
        "percent": _FW_STAGE_PERCENT.get(stage, 0),
        "stage": stage,
        "message": message,
        "elapsed_ms": int(elapsed_ms),
        "eta_ms": int(eta_ms),
    }
    if extra:
        payload.update(extra)
    try:
        # Written to stderr, the same fd as the existing _log(), but as JSON
        # without the [kneron_bridge] prefix, so the Go driver can tell a
        # progress-event JSON line from a free-text log line.
        print(json.dumps(payload), file=sys.stderr, flush=True)
    except Exception:
        # A failed progress emit must not affect the upgrade itself
        pass
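
# Illustrative sketch: how a stderr consumer could separate progress events
# from free-text log lines. The real consumer is the Go driver, which is not
# part of this diff; split_stderr_line is a hypothetical Python stand-in that
# relies only on the "event": "firmware_progress" field documented above.

```python
import json


def split_stderr_line(line):
    """Classify one stderr line as ("progress", dict) or ("log", str)."""
    text = line.strip()
    if text.startswith("{"):
        try:
            payload = json.loads(text)
        except ValueError:
            # Looked like JSON but was not: treat it as ordinary log text
            return ("log", text)
        if isinstance(payload, dict) and payload.get("event") == "firmware_progress":
            return ("progress", payload)
    return ("log", text)
```

# In this sketch any other JSON line (for example a different event type)
# falls through as plain log text; a real consumer would likely dispatch on
# the "event" field instead.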


def _fw_load_libkplus():
    """Load the libkplus shared library via ctypes and bind the needed C symbol signatures.

    Cross-platform (macOS .dylib / Linux .so / Windows .dll). The library is
    looked up next to the installed `kp` module, in the wheel's lib/ directory,
    to avoid loading a second, mismatching copy.

    Raises:
        RuntimeError: libkplus was not found or symbol binding failed.
    """
    import ctypes
    import importlib.util

    spec = importlib.util.find_spec("kp")
    if spec is None or not spec.submodule_search_locations:
        raise RuntimeError("kp module spec not found")
    kp_dir = spec.submodule_search_locations[0]
    lib_dir = os.path.join(kp_dir, "lib")

    # Library filename for the current platform
    if sys.platform == "darwin":
        lib_name = "libkplus.dylib"
    elif sys.platform == "win32":
        lib_name = "libkplus.dll"
    else:
        lib_name = "libkplus.so"
    lib_path = os.path.join(lib_dir, lib_name)

    if not os.path.isfile(lib_path):
        # Windows may use a different name (the warrenchen reference uses
        # libkplus.dll), so fall back to any libkplus* file.
        # Reviewer m2: sorted() makes the choice deterministic instead of
        # depending on the order os.listdir returns.
        candidates = sorted(
            f for f in os.listdir(lib_dir) if f.startswith("libkplus")
        )
        if not candidates:
            raise RuntimeError(f"libkplus not found in {lib_dir}")
        lib_path = os.path.join(lib_dir, candidates[0])
        _log(f"WARNING: libkplus fallback using {candidates[0]} (primary {lib_name} not found)")

    # Windows: add_dll_directory lets dependent DLLs resolve
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        try:
            os.add_dll_directory(lib_dir)
        except Exception:
            pass

    lib = ctypes.CDLL(lib_path)

    # Bind C symbol signatures (identical to the warrenchen reference)
    lib.kp_connect_devices.argtypes = [
        ctypes.c_int,                  # num_devices
        ctypes.POINTER(ctypes.c_int),  # usb_port_ids
        ctypes.POINTER(ctypes.c_int),  # status_out
    ]
    lib.kp_connect_devices.restype = ctypes.c_void_p  # device_group handle

    lib.kp_set_timeout.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lib.kp_set_timeout.restype = None

    lib.kp_load_firmware_from_file.argtypes = [
        ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p
    ]
    lib.kp_load_firmware_from_file.restype = ctypes.c_int

    lib.kp_update_kdp_firmware_from_files.argtypes = [
        ctypes.c_void_p,  # device_group
        ctypes.c_char_p,  # scpu_or_loader path
        ctypes.c_char_p,  # ncpu path or NULL
        ctypes.c_bool,    # auto_reboot
    ]
    lib.kp_update_kdp_firmware_from_files.restype = ctypes.c_int

    lib.kp_disconnect_devices.argtypes = [ctypes.c_void_p]
    lib.kp_disconnect_devices.restype = ctypes.c_int

    if hasattr(lib, "kp_error_string"):
        lib.kp_error_string.argtypes = [ctypes.c_int]
        lib.kp_error_string.restype = ctypes.c_char_p
    return lib


def _fw_errstr(lib, code):
    """Decode a kp error code to a string via kp_error_string().

    Same as the warrenchen reference: try the raw code first, and if that
    yields nothing, try the signed-normalized value.
    """
    signed = _fw_normalize_code(code)
    if hasattr(lib, "kp_error_string"):
        try:
            msg = lib.kp_error_string(int(code))
            if not msg and signed != code:
                msg = lib.kp_error_string(int(signed))
            if msg:
                return msg.decode("utf-8", errors="replace")
        except Exception:
            pass
    return f"code={code}"


def _fw_connect_with_magic(lib, port_id):
    """Connect with magic pass = 536173391 (lets a KDP1 legacy device connect).

    Returns:
        device_group handle (c_void_p int).

    Raises:
        RuntimeError("connect_failed: ...") on failure.
    """
    import ctypes

    port_ids = (ctypes.c_int * 1)(int(port_id))
    status = ctypes.c_int(KDP_MAGIC_CONNECTION_PASS)
    dg = lib.kp_connect_devices(1, port_ids, ctypes.byref(status))
    if not dg or status.value != KP_SUCCESS:
        signed = _fw_normalize_code(status.value)
        raise RuntimeError(
            f"connect_failed: raw_code={status.value}, signed={signed}, "
            f"msg={_fw_errstr(lib, status.value)}"
        )
    return dg


def _fw_scan_target(port):
    """Scan devices via kp.core.scan_devices() and find target by usb_port_id.

    Returns:
        descriptor or None.
    """
    try:
        descs = kp.core.scan_devices()
    except Exception as e:
        _log(f"fw_scan_target: scan_devices failed: {e}")
        return None
    if descs.device_descriptor_number == 0:
        return None
    for i in range(descs.device_descriptor_number):
        dev = descs.device_descriptor_list[i]
        if port and str(dev.usb_port_id) == str(port):
            return dev
    return None


def _fw_rescan_and_wait(port, max_wait_s=8.0, initial_sleep_s=5.0):
    """Wait for USB re-enumeration to settle, then rescan for the target by port (AC-FW-1.6).

    Args:
        port: usb_port_id (the device usually keeps the same port after
            re-enumerating post-upgrade).
        max_wait_s: upper bound on the total wait. After initial_sleep_s,
            polling continues for up to max_wait_s - initial_sleep_s more
            seconds. 5 seconds was enough in practice; the 8-second bound
            follows AC-FW-1.6.
        initial_sleep_s: fixed wait before the first rescan.

    Returns:
        (descriptor or None, total_wait_s).
    """
    time.sleep(initial_sleep_s)
    waited = initial_sleep_s
    target = _fw_scan_target(port)
    if target is not None:
        return target, waited
    # Several rounds of short polling
    poll_step = 0.5
    while waited < max_wait_s:
        time.sleep(poll_step)
        waited += poll_step
        target = _fw_scan_target(port)
        if target is not None:
            return target, waited
    return None, waited
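
# Worked example (a sketch, not bridge code): the scan schedule implied by the
# defaults of _fw_rescan_and_wait. rescan_schedule is a hypothetical helper
# that replays the same loop arithmetic without sleeping or touching USB.

```python
def rescan_schedule(max_wait_s=8.0, initial_sleep_s=5.0, poll_step=0.5):
    """List the elapsed times (seconds) at which a scan would be attempted."""
    times = [initial_sleep_s]  # first scan happens right after the fixed sleep
    waited = initial_sleep_s
    while waited < max_wait_s:
        waited += poll_step
        times.append(waited)
    return times


schedule = rescan_schedule()
# One scan at 5.0s, then one every 0.5s up to and including 8.0s
assert schedule == [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
```

# So with the defaults a device that never comes back costs at most seven
# scans before the caller reports verify_not_found.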


def _fw_classify_legacy(firmware_str, product_id):
    """Decide whether the device is in a KDP1 legacy state (needs the loader stage).

    KL520 legacy signals: the firmware string is "KDP", "KDP1", "KDP1.x",
    "USB Boot", "USB Boot Loader", "LOADER" or a similar legacy-state string,
    or it is empty (some USB Boot states report no firmware string).
    KL720 legacy signal: product_id == 0x0200 (KP_DEVICE_KL720_LEGACY).

    Reviewer M3 + s3: the original check was the substring match
    `"KDP" in fw and "KDP2" not in fw`, which would misclassify KDP3 (a future
    firmware) as legacy. It now uses an explicit prefix table plus an
    enumeration of known strings, covering the KDP1 firmware string variants
    while staying forward-compatible with KDP3+.

    Returns True if the device needs the SDK loader stage, False if it can
    short-circuit to flashing.
    """
    if product_id == 0x0200:
        return True  # KL720 KDP1 legacy (signalled by pid, not by the firmware string)
    fw = (firmware_str or "").strip().upper()
    # Full enumeration of known KDP1 legacy firmware strings (exact match, no substring)
    legacy_exact = {
        "",  # some USB Boot states report no firmware string
        "KDP",
        "KDP1",
        "USB BOOT",
        "USB BOOT LOADER",
        "LOADER",
        "BOOTLOADER",
    }
    if fw in legacy_exact:
        return True
    # KDP1.x (version strings such as KDP1.0 / KDP1.5)
    if fw.startswith("KDP1.") or fw.startswith("KDP1 "):
        return True
    # Explicitly pass KDP2 / KDP3+ (forward-compat; avoids substring matches
    # misjudging future firmware). KDP2.x / KDP3.x / KDP4.x ... are all modern
    # firmware and do not need the loader.
    for prefix in ("KDP2", "KDP3", "KDP4", "KDP5", "KDP6", "KDP7", "KDP8", "KDP9"):
        if fw.startswith(prefix):
            return False
    # Unknown firmware string: conservative default = no loader stage, to avoid
    # bricking the device by triggering the loader stage by mistake.
    # Example: a future firmware with an entirely new name ("NEF", "K3", ...)
    # is assumed to be modern. If that assumption is wrong, the verify stage
    # detects verify_mismatch, so the device is not bricked.
    return False
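
# Worked example (a condensed standalone sketch of the rules above, not the
# bridge function itself): a table of firmware strings and the classification
# each one should get. classify_legacy, LEGACY_EXACT and MODERN_PREFIXES are
# hypothetical names for this illustration.

```python
LEGACY_EXACT = {"", "KDP", "KDP1", "USB BOOT", "USB BOOT LOADER", "LOADER", "BOOTLOADER"}
MODERN_PREFIXES = tuple(f"KDP{n}" for n in range(2, 10))  # KDP2 .. KDP9


def classify_legacy(firmware_str, product_id):
    """True if the device looks like KDP1 legacy, False otherwise."""
    if product_id == 0x0200:  # KL720 legacy is signalled by pid alone
        return True
    fw = (firmware_str or "").strip().upper()
    if fw in LEGACY_EXACT or fw.startswith(("KDP1.", "KDP1 ")):
        return True
    if fw.startswith(MODERN_PREFIXES):
        return False
    return False  # unknown strings default to "modern"


cases = [
    ("KDP", 0x0100, True),
    ("kdp1.5", 0x0100, True),    # case-insensitive, version suffix
    (None, 0x0100, True),        # no firmware string reported
    ("KDP2.5", 0x0100, False),
    ("KDP3", 0x0100, False),     # the case a substring match got wrong
    ("K3", 0x0100, False),       # unknown name, conservative default
    ("KDP2", 0x0200, True),      # pid wins over the firmware string
]
for fw, pid, expected in cases:
    assert classify_legacy(fw, pid) is expected
```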


def _fw_eta_ms(chip, current_stage):
    """Estimate the remaining milliseconds (for a "~X s" display in the frontend; not exact).

    TDD §4.2: the UI shows "~X remaining"; low precision is acceptable.
    """
    # Estimated completion time of each stage (upgrade start = 0)
    if chip == "KL520":
        total_ms = 30000  # AC-FW-1.7 estimate: 30s
        cum = {"preparing": 2000, "loading": 8000, "flashing": 22000, "verifying": 28000}
    else:  # KL720
        total_ms = 180000  # AC-FW-1.7 estimate: 180s
        cum = {"preparing": 5000, "loading": 30000, "flashing": 160000, "verifying": 175000}
    done_at = cum.get(current_stage, total_ms)
    return max(0, total_ms - done_at)


# ── Firmware upgrade exceptions + failure handler ────────────────────
#
# Reviewer M1: _FwError / _FwTimeoutError / _fw_handle_failure were originally
# declared **after** handle_firmware_upgrade. Python loads the whole module
# before any handler runs, so the happy path never raised NameError, but
# readability suffered, and module-level code inserted mid-file that invoked
# the handler would have failed. They now sit before the handler so the error
# flow reads top to bottom.

class _FwError(Exception):
    """Internal exception carrying (stage, reason, message) for firmware ops."""

    def __init__(self, stage, reason, message):
        super().__init__(message)
        self.stage = stage
        self.reason = reason
        self.message = message


class _FwTimeoutError(Exception):
    """Raised when total upgrade duration exceeds chip timeout."""

    def __init__(self, stage):
        super().__init__(f"timeout at stage={stage}")
        self.stage = stage


def _fw_handle_failure(stage, reason, message, before_fw, start_ts, dg, lib, raw=""):
    """Emit the failure progress event and build the error dict returned to the caller.

    Follows the TDD §6.1 failure return format:
        {"error":<str>, "stage":<str>, "reason":<str>, "raw_error":<str>}

    Reviewer m3: this helper used to disconnect, and the caller's finally
    block disconnected as well. The SDK's behaviour on a double disconnect is
    undefined, so the code now follows a "single owner of disconnect" rule:
    the helper no longer disconnects and the caller's finally block handles
    it. This function only emits the progress event and assembles the error
    dict; dg and lib are accepted but not used here.
    """
    elapsed = int((time.monotonic() - start_ts) * 1000)
    _log(f"firmware_upgrade FAILED: stage={stage}, reason={reason}, "
         f"message={message}, elapsed_ms={elapsed}")
    _fw_emit_progress(
        "error",
        message=message,
        elapsed_ms=elapsed,
        eta_ms=0,
        extra={
            "error": message,
            "reason": reason,
            "raw_error": raw or message,
            "before_version": before_fw,
        },
    )
    return {
        "error": message,
        "stage": stage,
        "reason": reason,
        "raw_error": raw or message,
    }


def handle_firmware_upgrade(params):
    """Phase A, M9-1: automatically upgrade KDP1 -> KDP2 on KL520 and KL720.

    Implements the TDD §6.1 + §5.1 flow.

    Input: {"port": "<usb_port_id>", "chip": "KL520" | "KL720"}

    Output (success):
        {"status":"upgraded", "before_firmware":<str>, "after_firmware":<str>,
         "method":"ctypes_kp_update_kdp_firmware_from_files",
         "duration_ms":<int>}

    Output (failure):
        {"error":<str>, "stage":<preparing|loading|flashing|verifying>,
         "reason":<scan_not_found|connect_failed|loader_write_failed|
                   upgrade_mid_failed|disconnect_during_op|timeout|
                   verify_mismatch|verify_not_found>,
         "raw_error":<str>}

    On entering each stage, a progress event is pushed to stderr via
    _fw_emit_progress(); the Go driver reads stderr line by line and forwards
    it to the frontend as a WebSocket FirmwareProgress message.
    """
    global _firmware_upgrade_in_progress

    if not HAS_KP:
        return {"error": "kp module not available", "stage": "preparing",
                "reason": "scan_not_found", "raw_error": "kp not available"}

    chip = params.get("chip", "KL520")
    port = str(params.get("port", ""))
    if chip not in ("KL520", "KL720"):
        return {"error": f"unsupported chip for phase A: {chip}",
                "stage": "preparing", "reason": "scan_not_found",
                "raw_error": f"chip={chip} not in (KL520, KL720)"}

    timeout_s = KL520_UPGRADE_TIMEOUT_S if chip == "KL520" else KL720_UPGRADE_TIMEOUT_S
    start_ts = time.monotonic()

    def elapsed_ms():
        return int((time.monotonic() - start_ts) * 1000)

    def check_timeout(current_stage):
        if (time.monotonic() - start_ts) > timeout_s:
            raise _FwTimeoutError(current_stage)

    # ── AC-FW-1.9 graceful shutdown rejection: mark the upgrade as running ──
    # Reviewer m4: a global _firmware_upgrade_start_ts used to be written here
    # too. It duplicated the SIGTERM handler's closure and was removed; the
    # start_ts captured by the closure is the single source of truth.
    _firmware_upgrade_in_progress = True
    # Register the SIGTERM handler while the upgrade is in its critical
    # section (on SIGTERM: do not exit, log a warning event instead). The
    # actual server-side lock is implemented by the M9-2 Go driver / M9-3
    # service; bridge.py is only responsible for refusing to be killed while
    # an upgrade is running.
    _fw_register_sigterm_handler(start_ts)

    method = "ctypes_kp_update_kdp_firmware_from_files"
    before_fw = ""
    lib = None
    dg = None
    try:
        # ── preparing: scan + connect ────────────────────────────────
        _fw_emit_progress(
            "preparing",
            message=f"scanning {chip} on port {port}",
            elapsed_ms=elapsed_ms(),
            eta_ms=_fw_eta_ms(chip, "preparing"),
        )
        check_timeout("preparing")

        # Disconnect any existing _device_group first to avoid a handle conflict
        _clear_device_group()

        target = _fw_scan_target(port)
        if target is None:
            raise _FwError(
                "preparing", "scan_not_found",
                f"device with port_id={port} not found in scan",
            )
        before_fw = str(target.firmware)
        target_port_id = int(target.usb_port_id)
        target_pid = int(target.product_id)
        _log(f"firmware_upgrade: chip={chip}, port={target_port_id}, "
             f"pid=0x{target_pid:04X}, firmware='{before_fw}'")

        # ── Resolve firmware file paths ──────────────────────────────
        fw_paths = _resolve_firmware_paths_full(chip)
        if fw_paths["scpu"] is None or fw_paths["ncpu"] is None:
            raise _FwError(
                "preparing", "scan_not_found",
                f"firmware files not found for {chip} "
                f"(scpu/ncpu missing in server/scripts/firmware/{chip}/)",
            )

        # ── Load libkplus + ctypes binding ───────────────────────────
        try:
            lib = _fw_load_libkplus()
        except Exception as e:
            raise _FwError(
                "preparing", "connect_failed",
                f"libkplus load failed: {e}",
            )

        # ── Connect with magic (allow KDP1 legacy device) ────────────
        try:
            dg = _fw_connect_with_magic(lib, target_port_id)
        except RuntimeError as e:
            raise _FwError("preparing", "connect_failed", str(e))

        # Set the timeout for SDK operations. Note: this is the timeout of a
        # single SDK call, not of the whole upgrade; it keeps one
        # kp_load/update call from hanging.
        lib.kp_set_timeout(dg, int(timeout_s * 1000))

        # ── Decide whether to run the SDK loader stage ───────────────
        # Reviewer M2: the control flow used to be implicit, with a nested
        # `if needs_loader: if loader_path is None: ...` that hid the
        # condition under which the loading stage actually runs. It now uses
        # three explicit booleans:
        #
        #   needs_loader                = device is in a KDP1 legacy state
        #                                 (_fw_classify_legacy)
        #   should_run_loader_stage     = the loading stage will actually run
        #                                 (needs_loader and loader.bin exists)
        #   loader_required_but_missing = KL520 KDP1 legacy without loader.bin
        #                                 (always fails)
        #
        # Flow for the four scenarios:
        #   1. KL520 KDP1 legacy + loader.bin present -> loading -> flashing (SDK load)
        #      -> verifying -> done (should_run_loader_stage=True)
        #   2. KL520 KDP1 legacy + loader.bin missing -> fail at loading (loader_write_failed)
        #   3. KL720 KDP1 legacy + loader.bin missing -> skip loading, go straight to
        #      flashing (warrenchen mode) -> verifying -> done (should_run_loader_stage=False)
        #   4. already KDP2 (KL520/KL720) -> skip loading, go straight to flashing
        #      (warrenchen mode) -> verifying -> done (should_run_loader_stage=False)
        needs_loader = _fw_classify_legacy(before_fw, target_pid)
        loader_path = fw_paths["loader"]
        should_run_loader_stage = needs_loader and loader_path is not None
        loader_required_but_missing = (
            needs_loader and loader_path is None and chip == "KL520"
        )
        _log(f"firmware_upgrade: needs_loader={needs_loader}, "
             f"should_run_loader_stage={should_run_loader_stage}, "
             f"loader_required_but_missing={loader_required_but_missing}, "
             f"legacy={'yes' if needs_loader else 'no'}")

        # ── Scenario 2: KL520 KDP1 legacy without loader.bin -> fail immediately ──
        if loader_required_but_missing:
            check_timeout("loading")
            raise _FwError(
                "loading", "loader_write_failed",
                f"fw_loader.bin not found for {chip} but device is in "
                f"KDP1 legacy state (firmware='{before_fw}')",
            )

        # ── Scenario 1: run the loading stage (KL520 KDP1 legacy + loader.bin) ──
        if should_run_loader_stage:
            check_timeout("loading")
            _fw_emit_progress(
                "loading",
                message="writing USB Boot loader firmware",
                elapsed_ms=elapsed_ms(),
                eta_ms=_fw_eta_ms(chip, "loading"),
            )
            ret = lib.kp_update_kdp_firmware_from_files(
                dg,
                loader_path.encode("utf-8"),
                None,  # loader stage: ncpu = NULL
                True,  # auto_reboot
            )
            if ret != KP_SUCCESS:
                raise _FwError(
                    "loading", "loader_write_failed",
                    f"kp_update_kdp_firmware_from_files(loader) ret={ret} "
                    f"({_fw_errstr(lib, ret)})",
                )
            # After auto_reboot the disconnect may fail (USB re-enumerates); tolerate it
            try:
                lib.kp_disconnect_devices(dg)
            except Exception:
                pass
            # Clear dg after disconnecting so the finally block does not
            # disconnect an already freed handle
            dg = None

            # Wait for the device to reboot into USB Boot mode (loader firmware loaded)
            time.sleep(USB_WAIT_AFTER_REBOOT_MS / 1000.0)

            # Rescan + reconnect with magic
            target = _fw_scan_target(port)
            if target is None:
                raise _FwError(
                    "loading", "disconnect_during_op",
                    f"device disappeared after loader write, port={port}",
                )
            try:
                dg = _fw_connect_with_magic(lib, int(target.usb_port_id))
            except RuntimeError as e:
                raise _FwError(
                    "loading", "connect_failed",
                    f"reconnect after loader failed: {e}",
                )
            lib.kp_set_timeout(dg, int(timeout_s * 1000))
        elif needs_loader:
            # Scenario 3: KL720 KDP1 legacy has no loader.bin -> skip loading and
            # go straight to flashing (warrenchen mode: a single
            # kp_update_kdp_firmware_from_files(scpu, ncpu, True) call)
            _log(f"firmware_upgrade: {chip} legacy without loader.bin, "
                 f"skipping loading stage, will go directly to flashing")

        # ── flashing: write the KDP2 firmware (scpu + ncpu) ──────────
        check_timeout("flashing")
        _fw_emit_progress(
            "flashing",
            message="writing KDP2 firmware (scpu + ncpu)",
            elapsed_ms=elapsed_ms(),
            eta_ms=_fw_eta_ms(chip, "flashing"),
        )

        if should_run_loader_stage:
            # Scenario 1: the loader stage already put the device in Loader
            # mode, so kp_load_firmware_from_file loads scpu + ncpu into RAM
            ret = lib.kp_load_firmware_from_file(
                dg,
                fw_paths["scpu"].encode("utf-8"),
                fw_paths["ncpu"].encode("utf-8"),
            )
            if ret != KP_SUCCESS:
                raise _FwError(
                    "flashing", "upgrade_mid_failed",
                    f"kp_load_firmware_from_file ret={ret} "
                    f"({_fw_errstr(lib, ret)})",
                )
        else:
            # Scenario 3 / 4: the loader stage was skipped (KL720 legacy
            # without loader.bin, or already KDP2) -> warrenchen mode: a single
            # kp_update_kdp_firmware_from_files(scpu, ncpu, True) call
            ret = lib.kp_update_kdp_firmware_from_files(
                dg,
                fw_paths["scpu"].encode("utf-8"),
                fw_paths["ncpu"].encode("utf-8"),
                True,  # auto_reboot
            )
            if ret != KP_SUCCESS:
                raise _FwError(
                    "flashing", "upgrade_mid_failed",
                    f"kp_update_kdp_firmware_from_files ret={ret} "
                    f"({_fw_errstr(lib, ret)})",
                )

        # Disconnect after the upgrade; a failure here is expected after
        # auto_reboot and is tolerated
        try:
            lib.kp_disconnect_devices(dg)
        except Exception:
            pass
        dg = None

        # ── verifying: wait for USB re-enumeration -> rescan -> check firmware string ──
        check_timeout("verifying")
        _fw_emit_progress(
            "verifying",
            message="waiting USB re-enumerate and verifying firmware version",
            elapsed_ms=elapsed_ms(),
            eta_ms=_fw_eta_ms(chip, "verifying"),
        )

        # AC-FW-1.6: wait 5-8 seconds for USB to stabilise
        target_after, waited = _fw_rescan_and_wait(
            port,
            max_wait_s=USB_WAIT_AFTER_UPGRADE_MS / 1000.0 + 3.0,  # 5 + 3 = 8s upper bound
            initial_sleep_s=USB_WAIT_AFTER_UPGRADE_MS / 1000.0,
        )
        if target_after is None:
            raise _FwError(
                "verifying", "verify_not_found",
                f"device not found after upgrade (waited {waited:.1f}s), "
                f"USB may still be re-enumerating, please re-plug",
            )
        after_fw = str(target_after.firmware)
        after_pid = int(target_after.product_id)

        # Verify the firmware string is now KDP2 (no longer KDP1 legacy)
        if _fw_classify_legacy(after_fw, after_pid):
            raise _FwError(
                "verifying", "verify_mismatch",
                f"firmware after upgrade still appears legacy: "
                f"firmware='{after_fw}', pid=0x{after_pid:04X}",
            )

        # ── done ──
        duration_ms = elapsed_ms()
        _fw_emit_progress(
            "done",
            message=f"upgraded from '{before_fw}' to '{after_fw}'",
            elapsed_ms=duration_ms,
            eta_ms=0,
        )
        return {
            "status": "upgraded",
            "before_firmware": before_fw,
            "after_firmware": after_fw,
            "method": method,
            "duration_ms": duration_ms,
        }

    except _FwTimeoutError as e:
        return _fw_handle_failure(
            e.stage, "timeout",
            f"upgrade exceeded {timeout_s}s timeout at stage={e.stage}",
            before_fw, start_ts, dg, lib, raw=str(e),
        )
    except _FwError as e:
        return _fw_handle_failure(
            e.stage, e.reason, e.message, before_fw, start_ts, dg, lib, raw=str(e),
        )
    except Exception as e:
        import traceback
        tb = traceback.format_exc()
        _log(f"firmware_upgrade UNEXPECTED EXCEPTION: {type(e).__name__}: {e}\n{tb}")
        return _fw_handle_failure(
            "flashing", "upgrade_mid_failed",
            f"unexpected: {type(e).__name__}: {e}",
            before_fw, start_ts, dg, lib, raw=tb,
        )
    finally:
        _firmware_upgrade_in_progress = False
        # Reviewer m3: this finally block is the single owner of disconnect.
        # _fw_handle_failure no longer disconnects, which avoids a double
        # disconnect.
        # Success path: the code above already disconnected after flashing and
        # set dg=None, so the `if dg is not None` check below skips it.
        # Failure path: dg may still hold a handle, and this block cleans it up.
        if dg is not None and lib is not None:
            try:
                lib.kp_disconnect_devices(dg)
            except Exception:
                pass
            dg = None  # make sure it cannot be reused by mistake
        _fw_unregister_sigterm_handler()
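
# Illustrative sketch: the four scenarios from the Reviewer M2 comment,
# reduced to the sequence of progress stages each one should emit.
# plan_stage_events is a hypothetical helper; it only replays the three
# booleans and assumes every SDK call succeeds.

```python
def plan_stage_events(chip, needs_loader, loader_available):
    """Return the expected progress-event stage sequence for one scenario."""
    should_run_loader_stage = needs_loader and loader_available
    loader_required_but_missing = (
        needs_loader and not loader_available and chip == "KL520"
    )
    if loader_required_but_missing:
        # Scenario 2 fails before the "loading" event is emitted
        return ["preparing", "error"]
    stages = ["preparing"]
    if should_run_loader_stage:
        stages.append("loading")  # scenario 1 only
    stages += ["flashing", "verifying", "done"]
    return stages


# Scenario 1: KL520 legacy with loader.bin
assert plan_stage_events("KL520", True, True) == [
    "preparing", "loading", "flashing", "verifying", "done"]
# Scenario 2: KL520 legacy without loader.bin
assert plan_stage_events("KL520", True, False) == ["preparing", "error"]
# Scenario 3: KL720 legacy without loader.bin
assert plan_stage_events("KL720", True, False) == [
    "preparing", "flashing", "verifying", "done"]
# Scenario 4: already KDP2
assert plan_stage_events("KL520", False, True) == [
    "preparing", "flashing", "verifying", "done"]
```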
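

# ── SIGTERM handler (AC-FW-1.9 graceful shutdown rejection) ──────────
#
# When SIGTERM arrives during an upgrade, do not exit; push a warning event to
# stderr instead. The actual server-side lock is implemented in M9-2 / M9-3
# (recorded as a dependency under "unresolved issues" in progress.md).
# bridge.py's responsibility is only to refuse being killed while an upgrade
# is running.
#
# Linux/macOS use a signal handler. On Windows no handler is installed: the
# two functions below return early.

_fw_original_sigterm_handler = None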


def _fw_register_sigterm_handler(start_ts):
    """Register a SIGTERM handler that refuses the signal and logs a warning while an upgrade runs."""
    global _fw_original_sigterm_handler
    if sys.platform == "win32":
        return  # no SIGTERM handling on Windows
    try:
        import signal

        def handler(signum, frame):
            if _firmware_upgrade_in_progress:
                elapsed = int((time.monotonic() - start_ts) * 1000)
                try:
                    print(
                        json.dumps({
                            "event": "shutdown_rejected",
                            "reason": "firmware_upgrade_in_progress",
                            "task": "firmware_upgrade",
                            "elapsed_ms": elapsed,
                        }),
                        file=sys.stderr,
                        flush=True,
                    )
                except Exception:
                    pass
                # Refuse SIGTERM: no sys.exit, no raise, the upgrade continues
                return
            # No upgrade in progress: fall back to the previous behaviour
            if callable(_fw_original_sigterm_handler):
                _fw_original_sigterm_handler(signum, frame)
            else:
                sys.exit(0)

        _fw_original_sigterm_handler = signal.signal(signal.SIGTERM, handler)
    except Exception as e:
        _log(f"SIGTERM handler registration failed: {e}")


def _fw_unregister_sigterm_handler():
    """Restore the SIGTERM handler to its state before installation."""
    global _fw_original_sigterm_handler
    if sys.platform == "win32":
        return
    try:
        import signal
        if _fw_original_sigterm_handler is not None:
            signal.signal(signal.SIGTERM, _fw_original_sigterm_handler)
            _fw_original_sigterm_handler = None
        else:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
    except Exception:
        pass
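
# Illustrative sketch: the same register/restore pairing expressed as a
# context manager, which makes the restore hard to forget. This is an
# alternative shape, not what the bridge does; refuse_sigterm and on_rejected
# are hypothetical names, and signal.signal must be called from the main
# thread.

```python
import contextlib
import signal


@contextlib.contextmanager
def refuse_sigterm(on_rejected):
    """While the block runs, swallow SIGTERM and report it via on_rejected()."""

    def handler(signum, frame):
        # Do not exit and do not raise: just report the rejected signal
        on_rejected(signum)

    previous = signal.signal(signal.SIGTERM, handler)
    try:
        yield
    finally:
        # signal.signal returns None if the old handler was not set from Python
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, restore)
```

# Usage: `with refuse_sigterm(log_rejection): run_upgrade()` where
# log_rejection and run_upgrade stand for whatever the caller provides.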


# ── Main loop ────────────────────────────────────────────────────────
def main():
@ -1175,6 +2024,8 @@ def main():
            result = handle_load_model(cmd)
        elif action == "inference":
            result = handle_inference(cmd)
        elif action == "firmware_upgrade":
            result = handle_firmware_upgrade(cmd)
        else:
            result = {"error": f"unknown command: {action}"}
        _respond(result)


@ -0,0 +1,840 @@
#!/usr/bin/env python3
"""Unit tests for kneron_bridge.handle_firmware_upgrade (phase A, M9-1).

Mock-based tests: no real Kneron dongle needed. Covers the TDD §6.1 handler
contract:
  - 5 success-path tests; every stage fires a progress event
  - 4 failure paths (scan_not_found / connect_failed / loader_write_failed /
    verify_mismatch)
  - timeout guard (KL520 60s / KL720 200s)
  - graceful shutdown: SIGTERM rejection while an upgrade is in progress

Run with:
    cd server/scripts && python3 test_kneron_bridge_firmware.py
"""
from __future__ import annotations

import io
import json
import os
import sys
import time
import unittest
from unittest import mock

# Make sure the import path is correct
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))


# ── Fake the kp module before importing the bridge (no hardware dependency) ──
class _FakeKpCore:
    def scan_devices(self):
        raise NotImplementedError("must be patched per test")

    def disconnect_devices(self, *args, **kwargs):
        return 0


class _FakeKp:
    core = _FakeKpCore()


# Inject the fake kp so the bridge picks it up at import time instead of the real kp
sys.modules.setdefault("kp", _FakeKp())

import kneron_bridge as bridge  # noqa: E402


# ── Helper: fake device descriptor ───────────────────────────────────
class FakeDeviceDescriptor:
    def __init__(self, usb_port_id, product_id, firmware, kn_number=0x12345678,
                 is_connectable=True):
        self.usb_port_id = usb_port_id
        self.product_id = product_id
        self.firmware = firmware
        self.kn_number = kn_number
        self.is_connectable = is_connectable


class FakeDeviceList:
    def __init__(self, devices):
        self.device_descriptor_list = devices
        self.device_descriptor_number = len(devices)


# ── Helper: fake libkplus (stand-in for ctypes.CDLL) ─────────────────
class FakeLib:
    """Mock libkplus shared library with same surface as ctypes binding."""

    def __init__(self):
        self.upgrade_calls = []  # list of (scpu_or_loader, ncpu_or_None, auto_reboot)
        self.load_calls = []     # list of (scpu, ncpu)
        self.connect_calls = []
        self.disconnect_calls = 0
        self.timeout_calls = []
        # Knobs that control the mock's behaviour
        self.upgrade_return = 0
        self.load_return = 0
        self.connect_return = (0xCAFEBABE, 0)  # (handle, status)
        # Marks that time.sleep is stubbed out (tests do not really sleep)
        self._sleep_skipped = True

    def kp_scan_devices(self):
        return 0xDEADBEEF  # never used: _fw_scan_target goes through kp.core.scan_devices

    def kp_connect_devices(self, n, ports_ptr, status_ptr):
        # Read the port id out of the ctypes array
        port_id = ports_ptr[0]
        self.connect_calls.append(port_id)
        handle, status = self.connect_return
        status_ptr._obj.value = status
        return handle

    def kp_set_timeout(self, dg, ms):
        self.timeout_calls.append(ms)

    def kp_load_firmware_from_file(self, dg, scpu, ncpu):
        self.load_calls.append((scpu, ncpu))
        return self.load_return

    def kp_update_kdp_firmware_from_files(self, dg, scpu_or_loader, ncpu_or_none, auto_reboot):
        self.upgrade_calls.append((scpu_or_loader, ncpu_or_none, auto_reboot))
        return self.upgrade_return

    def kp_disconnect_devices(self, dg):
        self.disconnect_calls += 1
        return 0

    def kp_error_string(self, code):
        return f"mock_err({code})".encode("utf-8")


# ── Shared fixture: each test gets a clean FakeLib + sleep stub ──────
class FirmwareUpgradeTestBase(unittest.TestCase):
    """Patches common to all tests so handler doesn't touch real Kneron stack."""

    def setUp(self):
        self.fake_lib = FakeLib()
        # Collects every progress event
        self.progress_events = []
        # Buffer intended for capturing stderr so test output stays clean
        self._stderr_capture = io.StringIO()

        # Patch HAS_KP = True
        self._has_kp_patch = mock.patch.object(bridge, "HAS_KP", True)
        self._has_kp_patch.start()

        # Patch _fw_load_libkplus → return our FakeLib
        self._load_lib_patch = mock.patch.object(
            bridge, "_fw_load_libkplus", return_value=self.fake_lib
        )
        self._load_lib_patch.start()

        # Patch the firmware path resolver: by default scpu/ncpu/loader all exist
        self._fw_paths = {
            "scpu": "/fake/firmware/KL520/fw_scpu.bin",
            "ncpu": "/fake/firmware/KL520/fw_ncpu.bin",
            "loader": "/fake/firmware/KL520/fw_loader.bin",
            "version": "v2.2.0",
        }
        self._resolve_paths_patch = mock.patch.object(
            bridge, "_resolve_firmware_paths_full",
            side_effect=lambda chip: self._fw_paths,
        )
        self._resolve_paths_patch.start()

        # Patch time.sleep → no-op (tests do not really sleep)
        self._sleep_patch = mock.patch.object(bridge.time, "sleep", lambda x: None)
        self._sleep_patch.start()

        # Patch _fw_emit_progress to collect events; the replacement only
        # records them and writes nothing to stderr
        original_emit = bridge._fw_emit_progress

        def _capture_emit(stage, message="", elapsed_ms=0, eta_ms=0, extra=None):
            event = {
                "stage": stage,
                "message": message,
                "elapsed_ms": elapsed_ms,
                "eta_ms": eta_ms,
            }
            if extra:
                event.update(extra)
            self.progress_events.append(event)

        self._emit_patch = mock.patch.object(
            bridge, "_fw_emit_progress", side_effect=_capture_emit
        )
        self._emit_patch.start()

        # Patch _clear_device_group → no-op (avoid touching the _device_group global)
        self._clear_patch = mock.patch.object(
            bridge, "_clear_device_group", lambda: None
        )
        self._clear_patch.start()

    def tearDown(self):
        self._has_kp_patch.stop()
        self._load_lib_patch.stop()
        self._resolve_paths_patch.stop()
        self._sleep_patch.stop()
        self._emit_patch.stop()
        self._clear_patch.stop()
        # Make sure the SIGTERM handler is restored (tests must not affect each other)
        try:
            bridge._fw_unregister_sigterm_handler()
        except Exception:
            pass
        bridge._firmware_upgrade_in_progress = False

    def stub_scan_returning(self, *device_lists):
        """Patch kp.core.scan_devices to return a different device list on each call.

        Args:
            *device_lists: each item is a list [FakeDeviceDescriptor, ...]
        """
        results = [FakeDeviceList(devs) for devs in device_lists]
        it = iter(results)

        def _next_scan():
            try:
                return next(it)
            except StopIteration:
                # Extra scan calls repeat the last result (tolerated in tests)
                return results[-1] if results else FakeDeviceList([])

        return mock.patch.object(bridge.kp.core, "scan_devices", side_effect=_next_scan)
# ── 5 個成功路徑測試 ──────────────────────────────────────────────────
class TestFirmwareUpgradeSuccess(FirmwareUpgradeTestBase):
def test_kl520_kdp1_legacy_full_5_stages(self):
"""KL520 KDP1 legacy → KDP2preparing/loading/flashing/verifying/done 5 stage 都 fire."""
legacy_dev = FakeDeviceDescriptor(
usb_port_id=42, product_id=0x100, firmware="KDP",
)
post_loader_dev = FakeDeviceDescriptor(
usb_port_id=42, product_id=0x100, firmware="USB Boot Loader",
)
kdp2_dev = FakeDeviceDescriptor(
usb_port_id=42, product_id=0x100, firmware="KDP2",
)
with self.stub_scan_returning([legacy_dev], [post_loader_dev], [kdp2_dev]):
result = bridge.handle_firmware_upgrade({"port": "42", "chip": "KL520"})
self.assertEqual(result["status"], "upgraded", msg=f"unexpected result: {result}")
self.assertEqual(result["before_firmware"], "KDP")
self.assertEqual(result["after_firmware"], "KDP2")
self.assertIn("ctypes", result["method"])
self.assertGreaterEqual(result["duration_ms"], 0)
# 驗證 5 個 stage 都 fire
stages = [e["stage"] for e in self.progress_events]
self.assertEqual(
stages, ["preparing", "loading", "flashing", "verifying", "done"],
msg=f"unexpected stage sequence: {stages}",
)
# KL520 KDP1 legacyloader.bin 寫一次kp_update_kdp_firmware_from_files
# + scpu/ncpu 載一次kp_load_firmware_from_file
self.assertEqual(len(self.fake_lib.upgrade_calls), 1,
msg="loader write should be called exactly once")
self.assertEqual(len(self.fake_lib.load_calls), 1,
msg="kp_load_firmware_from_file should be called once after loader")
def test_kl520_already_kdp2_short_circuit(self):
"""KL520 已是 KDP2跳過 loader stage、直接 flashing用 kp_update_kdp_firmware_from_files."""
kdp2_dev = FakeDeviceDescriptor(
usb_port_id=10, product_id=0x100, firmware="KDP2.5",
)
kdp2_after = FakeDeviceDescriptor(
usb_port_id=10, product_id=0x100, firmware="KDP2.5",
)
with self.stub_scan_returning([kdp2_dev], [kdp2_after]):
result = bridge.handle_firmware_upgrade({"port": "10", "chip": "KL520"})
self.assertEqual(result["status"], "upgraded")
stages = [e["stage"] for e in self.progress_events]
# KDP2 short-circuitpreparing → flashing → verifying → done無 loading
self.assertEqual(stages, ["preparing", "flashing", "verifying", "done"])
# kp_update_kdp_firmware_from_files 用兩 path 模式scpu + ncpu
self.assertEqual(len(self.fake_lib.upgrade_calls), 1)
scpu_path, ncpu_path, auto_reboot = self.fake_lib.upgrade_calls[0]
self.assertIn(b"fw_scpu", scpu_path)
self.assertIn(b"fw_ncpu", ncpu_path)
self.assertTrue(auto_reboot)
    def test_kl720_kdp_legacy(self):
        """KL720 KDP1 legacy (pid=0x200): takes the flashing path (warrenchen mode)."""
        legacy = FakeDeviceDescriptor(
            usb_port_id=5, product_id=0x200, firmware="KDP",
        )
        after = FakeDeviceDescriptor(
            usb_port_id=5, product_id=0x720, firmware="KDP2",
        )
        # KL720 has no loader.bin (warrenchen does not ship one either).
        self._fw_paths["loader"] = None
        with self.stub_scan_returning([legacy], [after]):
            result = bridge.handle_firmware_upgrade({"port": "5", "chip": "KL720"})
        self.assertEqual(result["status"], "upgraded")
        # KL720 legacy has no loader.bin: preparing → flashing → verifying → done.
        stages = [e["stage"] for e in self.progress_events]
        self.assertEqual(stages, ["preparing", "flashing", "verifying", "done"])

    def test_progress_event_schema_has_required_fields(self):
        """Progress event schema follows TDD §4.2: stage, elapsed_ms, and eta_ms are required."""
        dev = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP2")
        with self.stub_scan_returning([dev], [dev]):
            bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        for e in self.progress_events:
            self.assertIn("stage", e)
            self.assertIn("elapsed_ms", e)
            self.assertIn("eta_ms", e)
            self.assertIsInstance(e["elapsed_ms"], int)

    def test_done_stage_returns_duration_ms(self):
        """The done result must include duration_ms (callers read elapsed_ms from the done event)."""
        dev = FakeDeviceDescriptor(usb_port_id=2, product_id=0x100, firmware="KDP2")
        with self.stub_scan_returning([dev], [dev]):
            result = bridge.handle_firmware_upgrade({"port": "2", "chip": "KL520"})
        self.assertIn("duration_ms", result)
        # The done event's elapsed_ms should equal duration_ms (finishing-time alignment).
        done_event = [e for e in self.progress_events if e["stage"] == "done"][0]
        self.assertEqual(done_event["elapsed_ms"], result["duration_ms"])

# ── Failure-path tests ────────────────────────────────────────────────

class TestFirmwareUpgradeFailure(FirmwareUpgradeTestBase):
    def test_scan_not_found(self):
        """Scan cannot find the target port: preparing-stage failure with reason=scan_not_found."""
        with self.stub_scan_returning([]):  # empty scan
            result = bridge.handle_firmware_upgrade({"port": "999", "chip": "KL520"})
        self.assertIn("error", result)
        self.assertEqual(result["stage"], "preparing")
        self.assertEqual(result["reason"], "scan_not_found")
        self.assertIn("not found", result["error"].lower())
        # Both the preparing and error events should fire.
        stages = [e["stage"] for e in self.progress_events]
        self.assertIn("preparing", stages)
        self.assertIn("error", stages)

    def test_connect_failed(self):
        """ctypes connect returns status != KP_SUCCESS: preparing/connect_failed."""
        dev = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP")
        # Make connect return a non-zero status.
        self.fake_lib.connect_return = (0, -3)  # handle=NULL, status=-3
        with self.stub_scan_returning([dev]):
            result = bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        self.assertEqual(result["stage"], "preparing")
        self.assertEqual(result["reason"], "connect_failed")
        # The error message should include a hint from the raw error.
        self.assertIn("connect", result["error"].lower())

    def test_loader_write_failed(self):
        """KL520 KDP1 legacy: loader write returns non-zero → loading/loader_write_failed."""
        legacy_dev = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP")
        # Make kp_update_kdp_firmware_from_files return an error code.
        self.fake_lib.upgrade_return = -7
        with self.stub_scan_returning([legacy_dev]):
            result = bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        self.assertEqual(result["stage"], "loading")
        self.assertEqual(result["reason"], "loader_write_failed")
        # The loader call did actually happen.
        self.assertEqual(len(self.fake_lib.upgrade_calls), 1)

    def test_verify_mismatch(self):
        """Upgrade finishes but the verify stage finds the firmware string is still legacy → verify_mismatch."""
        legacy_dev = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP")
        # Still KDP1 after the upgrade (mock: upgrade succeeds but the device firmware string is unchanged).
        stuck_legacy = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP")
        with self.stub_scan_returning([legacy_dev], [stuck_legacy], [stuck_legacy]):
            result = bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        self.assertEqual(result["stage"], "verifying")
        self.assertEqual(result["reason"], "verify_mismatch")
        self.assertIn("legacy", result["error"].lower())

    def test_verify_not_found(self):
        """Device disappears in the verify stage (rescan finds nothing) → verify_not_found."""
        legacy_dev = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP")
        # Loader and flashing both succeed; the verify-stage scan returns empty
        # (the device has not re-enumerated yet).
        with self.stub_scan_returning([legacy_dev], [legacy_dev], []):
            result = bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        self.assertEqual(result["stage"], "verifying")
        self.assertEqual(result["reason"], "verify_not_found")

    def test_failure_event_carries_reason_and_raw_error(self):
        """The error event must carry reason and raw_error (TDD §4.2 failure fields)."""
        with self.stub_scan_returning([]):
            bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        err_events = [e for e in self.progress_events if e["stage"] == "error"]
        self.assertEqual(len(err_events), 1, "error event should fire exactly once")
        e = err_events[0]
        self.assertIn("reason", e)
        self.assertIn("raw_error", e)
        self.assertIn("before_version", e)

    def test_chip_unsupported(self):
        """Phase A does not support KL630 / KL730: reject immediately (preparing/scan_not_found)."""
        result = bridge.handle_firmware_upgrade({"port": "1", "chip": "KL630"})
        self.assertIn("error", result)
        self.assertEqual(result["stage"], "preparing")
        self.assertIn("KL630", result["error"])

    # ── Reviewer s4: add the 4 missing test cases ────────────────────

    def test_loading_stage_disconnect_during_op(self):
        """Reviewer s4 (1): loader write succeeds in the loading stage, then the rescan cannot find the device.

        Corresponds to kneron_bridge.py 1753-1758: disconnect_during_op in the loading stage.
        Stage sequence: preparing → loading → error(disconnect_during_op)
        """
        legacy_dev = FakeDeviceDescriptor(
            usb_port_id=42, product_id=0x100, firmware="KDP",
        )
        # Loader write succeeds (upgrade_return=0 by default), but the rescan after reboot
        # returns empty (the device did not re-enumerate).
        with self.stub_scan_returning([legacy_dev], []):  # second scan is empty
            result = bridge.handle_firmware_upgrade({"port": "42", "chip": "KL520"})
        self.assertEqual(result["stage"], "loading")
        self.assertEqual(result["reason"], "disconnect_during_op")
        self.assertIn("disappear", result["error"].lower())
        # The first upgrade_call is the loader write (successful), so there should be exactly 1 call.
        self.assertEqual(len(self.fake_lib.upgrade_calls), 1)

    def test_loading_stage_reconnect_failed(self):
        """Reviewer s4 (2): loader write succeeds and the rescan finds the device, but reconnect fails.

        Corresponds to kneron_bridge.py 1759-1765: connect_failed in the loading stage (reconnect failure).
        Stage sequence: preparing → loading → error(connect_failed)
        """
        legacy_dev = FakeDeviceDescriptor(
            usb_port_id=42, product_id=0x100, firmware="KDP",
        )
        post_loader_dev = FakeDeviceDescriptor(
            usb_port_id=42, product_id=0x100, firmware="USB Boot Loader",
        )
        # First connect (preparing) succeeds; second connect (reconnect after loader) fails.
        call_count = [0]
        original_connect = self.fake_lib.kp_connect_devices

        def maybe_fail_connect(n, ports_ptr, status_ptr):
            call_count[0] += 1
            if call_count[0] == 2:
                # Second connect fails.
                status_ptr._obj.value = -5
                return 0  # NULL handle
            return original_connect(n, ports_ptr, status_ptr)

        self.fake_lib.kp_connect_devices = maybe_fail_connect
        with self.stub_scan_returning([legacy_dev], [post_loader_dev]):
            result = bridge.handle_firmware_upgrade({"port": "42", "chip": "KL520"})
        self.assertEqual(result["stage"], "loading")
        self.assertEqual(result["reason"], "connect_failed")
        self.assertIn("reconnect", result["error"].lower())

    def test_failure_event_full_extra_fields(self):
        """Reviewer s4 (3): the error event must contain the full set of TDD §4.2 failure fields.

        TDD §4.2 lists the fields the error event's extra dict should contain:
            error / reason / raw_error / before_version
        This test verifies that the caller (_fw_handle_failure) actually assembles them.
        """
        legacy_dev = FakeDeviceDescriptor(
            usb_port_id=99, product_id=0x100, firmware="KDP1.5",
        )
        # Make the loader write fail so that _fw_handle_failure is reached.
        self.fake_lib.upgrade_return = -7
        with self.stub_scan_returning([legacy_dev]):
            result = bridge.handle_firmware_upgrade({"port": "99", "chip": "KL520"})
        err_events = [e for e in self.progress_events if e["stage"] == "error"]
        self.assertEqual(len(err_events), 1)
        e = err_events[0]
        # Fields required by TDD §4.2.
        for field in ("error", "reason", "raw_error", "before_version"):
            self.assertIn(field, e, f"error event missing field: {field}")
        # before_version should capture the firmware string seen at scan time.
        self.assertEqual(e["before_version"], "KDP1.5")
        # raw_error should carry SDK error context (return code / function name).
        self.assertIn("loader", e["raw_error"].lower())

# ── Reviewer s4 (4): ctypes binding signature test ────────────────────

class TestCtypesBindingSignatures(unittest.TestCase):
    """Verify that _fw_load_libkplus sets the correct argtypes / restype.

    The mock-based tests bypass _fw_load_libkplus (FakeLib is injected directly),
    but on real hardware a wrong binding signature raises ctypes.ArgumentError on
    the first call or, worse, silently corrupts memory. This test runs
    _fw_load_libkplus against a mock CDLL object and checks that every C symbol
    gets the correct argtypes / restype.
    """

    def test_libkplus_binding_signatures(self):
        """_fw_load_libkplus sets the correct argtypes / restype on every C symbol."""
        import ctypes

        # Mock CDLL: records argtypes / restype assignments without loading the real library.
        class MockSymbol:
            def __init__(self, name):
                self.name = name
                self.argtypes = None
                self.restype = None

        class MockCDLL:
            def __init__(self, *args, **kwargs):
                self.kp_connect_devices = MockSymbol("kp_connect_devices")
                self.kp_set_timeout = MockSymbol("kp_set_timeout")
                self.kp_load_firmware_from_file = MockSymbol("kp_load_firmware_from_file")
                self.kp_update_kdp_firmware_from_files = MockSymbol(
                    "kp_update_kdp_firmware_from_files"
                )
                self.kp_disconnect_devices = MockSymbol("kp_disconnect_devices")
                self.kp_error_string = MockSymbol("kp_error_string")

            def __getattr__(self, name):
                # Any symbol not set explicitly returns a MockSymbol (for hasattr checks).
                sym = MockSymbol(name)
                setattr(self, name, sym)
                return sym

        # Mock importlib + os.path.isfile + ctypes.CDLL
        mock_spec = mock.MagicMock()
        mock_spec.submodule_search_locations = ["/fake/kp_dir"]
        with mock.patch("importlib.util.find_spec", return_value=mock_spec), \
                mock.patch("os.path.isfile", return_value=True), \
                mock.patch.object(ctypes, "CDLL", side_effect=lambda p: MockCDLL()):
            lib = bridge._fw_load_libkplus()
        # Check kp_connect_devices(int, c_int*, c_int*) -> c_void_p
        self.assertEqual(
            lib.kp_connect_devices.argtypes,
            [ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)],
        )
        self.assertEqual(lib.kp_connect_devices.restype, ctypes.c_void_p)
        # Check kp_set_timeout(c_void_p, c_int) -> None
        self.assertEqual(lib.kp_set_timeout.argtypes, [ctypes.c_void_p, ctypes.c_int])
        self.assertIsNone(lib.kp_set_timeout.restype)
        # Check kp_load_firmware_from_file(c_void_p, c_char_p, c_char_p) -> c_int
        self.assertEqual(
            lib.kp_load_firmware_from_file.argtypes,
            [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p],
        )
        self.assertEqual(lib.kp_load_firmware_from_file.restype, ctypes.c_int)
        # Check kp_update_kdp_firmware_from_files(c_void_p, c_char_p, c_char_p, c_bool) -> c_int
        self.assertEqual(
            lib.kp_update_kdp_firmware_from_files.argtypes,
            [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_bool],
        )
        self.assertEqual(lib.kp_update_kdp_firmware_from_files.restype, ctypes.c_int)
        # Check kp_disconnect_devices(c_void_p) -> c_int
        self.assertEqual(lib.kp_disconnect_devices.argtypes, [ctypes.c_void_p])
        self.assertEqual(lib.kp_disconnect_devices.restype, ctypes.c_int)
        # Check kp_error_string(c_int) -> c_char_p (if present)
        self.assertEqual(lib.kp_error_string.argtypes, [ctypes.c_int])
        self.assertEqual(lib.kp_error_string.restype, ctypes.c_char_p)

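# Sketch only, not the bridge implementation: one way the signatures asserted above
# could be applied to a loaded library object. `bind_kp_signatures_sketch`,
# `KP_SIGNATURES_SKETCH` and `_FakeLib` are hypothetical names; the argtypes / restype
# values are copied from the assertions in TestCtypesBindingSignatures, and the real
# _fw_load_libkplus may be organized differently.

```python
import ctypes

# Signature table copied from the assertions above: symbol -> (argtypes, restype).
KP_SIGNATURES_SKETCH = {
    "kp_connect_devices": (
        [ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)],
        ctypes.c_void_p,
    ),
    "kp_set_timeout": ([ctypes.c_void_p, ctypes.c_int], None),
    "kp_load_firmware_from_file": (
        [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p],
        ctypes.c_int,
    ),
    "kp_update_kdp_firmware_from_files": (
        [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_bool],
        ctypes.c_int,
    ),
    "kp_disconnect_devices": ([ctypes.c_void_p], ctypes.c_int),
    "kp_error_string": ([ctypes.c_int], ctypes.c_char_p),
}


def bind_kp_signatures_sketch(lib):
    """Set argtypes / restype for every symbol in the table on a CDLL-like object."""
    for name, (argtypes, restype) in KP_SIGNATURES_SKETCH.items():
        symbol = getattr(lib, name)
        symbol.argtypes = argtypes
        symbol.restype = restype
    return lib


class _FakeSymbol:
    """Plain attribute holder standing in for a ctypes function pointer."""
    argtypes = None
    restype = None


class _FakeLib:
    """Stand-in for ctypes.CDLL so the sketch runs without the KneronPLUS library."""
    def __init__(self):
        for name in KP_SIGNATURES_SKETCH:
            setattr(self, name, _FakeSymbol())


lib = bind_kp_signatures_sketch(_FakeLib())
```

# The pointer-returning symbol is the one that matters most: ctypes assumes a C int
# return type by default, which can truncate a 64-bit handle unless restype is set
# to c_void_p.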
# ── Timeout tests ─────────────────────────────────────────────────────

class TestFirmwareUpgradeTimeout(FirmwareUpgradeTestBase):
    def test_timeout_kl520(self):
        """A KL520 upgrade exceeding 60s hits the timeout → reason=timeout.

        Mocks time.monotonic so that calls read a fast-jumping clock, ensuring
        the 60s timeout is hit before the second stage check.
        """
        legacy_dev = FakeDeviceDescriptor(usb_port_id=1, product_id=0x100, firmware="KDP")
        # The first two calls return 0 (start_ts); later calls return 99s (past the 60s timeout).
        clock_values = iter([0.0, 0.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0])

        def fake_monotonic():
            try:
                return next(clock_values)
            except StopIteration:
                return 99.0

        monotonic_patch = mock.patch.object(
            bridge.time, "monotonic", side_effect=fake_monotonic
        )
        with monotonic_patch, self.stub_scan_returning([legacy_dev]):
            result = bridge.handle_firmware_upgrade({"port": "1", "chip": "KL520"})
        self.assertEqual(result["reason"], "timeout",
                         msg=f"expected timeout, got: {result}")
        self.assertIn(result["stage"], ("preparing", "loading", "flashing", "verifying"))

    def test_timeout_kl720_uses_200s_bound(self):
        """KL720 uses the 200s timeout (the KL520 60s bound does not apply)."""
        # Assert the constants directly (guards against accidental changes later).
        self.assertEqual(bridge.KL720_UPGRADE_TIMEOUT_S, 200)
        self.assertEqual(bridge.KL520_UPGRADE_TIMEOUT_S, 60)

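# Sketch only: the fake-clock technique used in test_timeout_kl520, reduced to a
# standalone example. `run_stages_sketch` is a hypothetical stand-in for the handler's
# per-stage deadline check; the bridge's actual control flow is not shown in this file
# and may differ. Only the 60s bound comes from the tests above.

```python
import time
from unittest import mock

KL520_TIMEOUT_S_SKETCH = 60  # same bound as the constant asserted above


def run_stages_sketch(stages, timeout_s):
    """Walk the stages; report reason=timeout once the monotonic deadline has passed."""
    start = time.monotonic()
    for stage in stages:
        if time.monotonic() - start > timeout_s:
            return {"stage": stage, "reason": "timeout"}
    return {"stage": "done", "reason": None}


# Fake clock: 0.0 for the start timestamp and the first check, then 99.0 forever.
ticks = iter([0.0, 0.0])
with mock.patch.object(time, "monotonic", side_effect=lambda: next(ticks, 99.0)):
    timed_out = run_stages_sketch(
        ["preparing", "loading", "flashing"], KL520_TIMEOUT_S_SKETCH
    )

# With the real clock the same stages finish well inside the bound.
finished = run_stages_sketch(["preparing", "loading", "flashing"], KL520_TIMEOUT_S_SKETCH)
```

# Driving the clock from an iterator keeps the test instant and deterministic: no real
# sleep is needed to cross the 60s bound.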
# ── Graceful shutdown (SIGTERM) rejection tests ──────────────────────

class TestFirmwareUpgradeGracefulShutdown(unittest.TestCase):
    """Test AC-FW-1.9: a SIGTERM received during an upgrade must not exit immediately."""

    def setUp(self):
        # Make sure the flags are reset.
        bridge._firmware_upgrade_in_progress = False
        bridge._firmware_upgrade_start_ts = 0.0

    def tearDown(self):
        try:
            bridge._fw_unregister_sigterm_handler()
        except Exception:
            pass
        bridge._firmware_upgrade_in_progress = False

    @unittest.skipIf(sys.platform == "win32", "SIGTERM not on Windows")
    def test_sigterm_rejected_during_upgrade(self):
        """During an upgrade, the SIGTERM handler refuses and pushes a shutdown_rejected event."""
        import signal
        bridge._firmware_upgrade_in_progress = True
        start_ts = time.monotonic()
        bridge._firmware_upgrade_start_ts = start_ts
        # Capture stderr.
        capture = io.StringIO()
        bridge._fw_register_sigterm_handler(start_ts)
        with mock.patch.object(sys, "stderr", capture):
            os.kill(os.getpid(), signal.SIGTERM)
            # Give the signal handler a moment to run.
            time.sleep(0.05)
        # The process did not exit (the test is still running and can read stderr).
        output = capture.getvalue()
        # A shutdown_rejected event should be present.
        self.assertIn("shutdown_rejected", output,
                      msg=f"expected shutdown_rejected in stderr, got: {output}")
        # Parse the JSON and check the schema.
        for line in output.strip().split("\n"):
            if not line.strip():
                continue
            try:
                ev = json.loads(line)
            except json.JSONDecodeError:
                continue
            if ev.get("event") == "shutdown_rejected":
                self.assertEqual(ev["reason"], "firmware_upgrade_in_progress")
                self.assertEqual(ev["task"], "firmware_upgrade")
                self.assertIn("elapsed_ms", ev)
                return
        self.fail("shutdown_rejected event not found")

    @unittest.skipIf(sys.platform == "win32", "SIGTERM not on Windows")
    def test_sigterm_handler_unregistered_after_upgrade(self):
        """After the upgrade ends, the SIGTERM handler should be restored.

        This keeps a later graceful server shutdown unaffected.
        """
        import signal
        # Read the current handler (Python's default is SIG_DFL) and put it back.
        prev = signal.signal(signal.SIGTERM, signal.SIG_DFL)
        signal.signal(signal.SIGTERM, prev)
        # register
        bridge._fw_register_sigterm_handler(time.monotonic())
        # Confirm the handler was replaced (reading it this way swaps in SIG_DFL).
        current = signal.signal(signal.SIGTERM, signal.SIG_DFL)
        self.assertNotEqual(current, signal.SIG_DFL,
                            msg="handler should be installed during upgrade")
        # Re-install, then unregister.
        bridge._fw_register_sigterm_handler(time.monotonic())
        bridge._fw_unregister_sigterm_handler()
        # After unregister the handler should be restored (no longer our wrapper).
        # Note: it may be the original handler or SIG_DFL; our wrapper must no longer be active.
        bridge._firmware_upgrade_in_progress = False
        # The point of this test is that unregister does not raise and that
        # _fw_original_sigterm_handler has been reset to None.
        self.assertIsNone(bridge._fw_original_sigterm_handler)

# ── _fw_classify_legacy logic tests ───────────────────────────────────

class TestClassifyLegacy(unittest.TestCase):
    def test_kl720_legacy_by_product_id(self):
        self.assertTrue(bridge._fw_classify_legacy("any", 0x0200))

    def test_kl520_legacy_by_firmware_string(self):
        self.assertTrue(bridge._fw_classify_legacy("KDP", 0x0100))

    def test_kl520_kdp2_not_legacy(self):
        self.assertFalse(bridge._fw_classify_legacy("KDP2", 0x0100))
        self.assertFalse(bridge._fw_classify_legacy("KDP2.5", 0x0100))

    def test_kl720_kdp2_not_legacy(self):
        self.assertFalse(bridge._fw_classify_legacy("KDP2", 0x0720))

    # ── Reviewer M3 + s3: extended firmware-string coverage ──────────
    # The original substring match `"KDP" in fw and "KDP2" not in fw` either missed or
    # misclassified the USB Boot / Loader / empty-string / KDP3+ cases. After switching
    # to an explicit enumeration plus a prefix lookup table, the cases below must pass:

    def test_kl520_legacy_empty_firmware_string(self):
        """Some devices in USB Boot state report no firmware string; treat them as legacy."""
        self.assertTrue(bridge._fw_classify_legacy("", 0x0100))
        self.assertTrue(bridge._fw_classify_legacy(None, 0x0100))

    def test_kl520_legacy_usb_boot_strings(self):
        """Legacy-state strings such as USB Boot / Loader / Bootloader should all be classified as legacy."""
        for fw in ("USB Boot", "USB Boot Loader", "Loader", "Bootloader",
                   "USB BOOT", "loader", "BOOTLOADER"):
            with self.subTest(firmware=fw):
                self.assertTrue(
                    bridge._fw_classify_legacy(fw, 0x0100),
                    f"firmware={fw!r} should be classified as legacy",
                )

    def test_kl520_legacy_kdp1_variants(self):
        """Version strings such as KDP1 / KDP1.x / 'KDP1 <suffix>' should all be classified as legacy."""
        for fw in ("KDP1", "KDP1.0", "KDP1.5", "KDP1 v1.0", "kdp1.5"):
            with self.subTest(firmware=fw):
                self.assertTrue(
                    bridge._fw_classify_legacy(fw, 0x0100),
                    f"firmware={fw!r} should be classified as legacy",
                )

    def test_kdp3_kdp4_not_legacy(self):
        """Reviewer s3: KDP3 / KDP4+ (future firmware) must not be misclassified as legacy."""
        # The original substring match `"KDP" in fw and "KDP2" not in fw` misclassified
        # KDP3.0 as legacy. With the explicit prefix table, KDP3 / KDP4 are treated as
        # modern firmware.
        for fw in ("KDP3", "KDP3.0", "KDP3.5", "KDP4", "KDP4.2", "KDP9"):
            with self.subTest(firmware=fw):
                self.assertFalse(
                    bridge._fw_classify_legacy(fw, 0x0100),
                    f"firmware={fw!r} (modern KDP3+) should NOT be classified as legacy",
                )

    def test_unknown_firmware_default_not_legacy(self):
        """Unknown firmware strings conservatively default to not-legacy (no loader step, to avoid bricking)."""
        # Example: future firmware with entirely new naming. It is unclear whether running
        # the loader would brick the device, so the conservative choice is to skip it.
        # If that is a misclassification, the verify stage reports verify_mismatch rather
        # than bricking the device.
        for fw in ("NEF", "K3", "FOO", "RANDOM"):
            with self.subTest(firmware=fw):
                self.assertFalse(
                    bridge._fw_classify_legacy(fw, 0x0100),
                    f"firmware={fw!r} (unknown) should default to not-legacy",
                )

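# Sketch only: a classifier that would satisfy the TestClassifyLegacy cases above using
# an explicit enumeration plus a prefix table. `classify_legacy_sketch` and the constant
# names are hypothetical; the real _fw_classify_legacy is not shown in this file and may
# use a different table. Caveat of this sketch: a string such as "KDP10" would match the
# "kdp1" prefix, so a production table would need a stricter version parse.

```python
from typing import Optional

# Assumption taken from the tests: a KL720 on KDP1 enumerates with product id 0x0200.
KL720_LEGACY_PID_SKETCH = 0x0200
# Exact (case-insensitive) firmware strings treated as legacy.
LEGACY_EXACT_SKETCH = {"kdp", "loader", "bootloader"}
# Case-insensitive prefixes treated as legacy.
LEGACY_PREFIXES_SKETCH = ("kdp1", "usb boot")


def classify_legacy_sketch(firmware: Optional[str], product_id: int) -> bool:
    """Return True when the device looks like KDP1 / boot-loader state."""
    if product_id == KL720_LEGACY_PID_SKETCH:
        return True
    fw = (firmware or "").strip().lower()
    if not fw:
        # No firmware string reported: treated as legacy, as in the tests above.
        return True
    if fw in LEGACY_EXACT_SKETCH:
        return True
    # Anything else (KDP2, KDP3+, unknown names) defaults to not-legacy.
    return fw.startswith(LEGACY_PREFIXES_SKETCH)


examples = {
    fw: classify_legacy_sketch(fw, 0x0100)
    for fw in ("KDP", "KDP1.5", "USB Boot Loader", "KDP2.5", "KDP3.0", "NEF")
}
```

# Defaulting unknown strings to not-legacy mirrors the conservative choice described in
# test_unknown_firmware_default_not_legacy.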
# ── _fw_eta_ms logic tests ────────────────────────────────────────────

class TestEtaEstimation(unittest.TestCase):
    def test_eta_decreases_through_stages(self):
        kl520_etas = [
            bridge._fw_eta_ms("KL520", s)
            for s in ("preparing", "loading", "flashing", "verifying")
        ]
        # ETA should not increase from one stage to the next.
        self.assertEqual(kl520_etas, sorted(kl520_etas, reverse=True))

    def test_kl720_eta_larger_than_kl520(self):
        self.assertGreater(
            bridge._fw_eta_ms("KL720", "preparing"),
            bridge._fw_eta_ms("KL520", "preparing"),
        )

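# Sketch only: a table-driven ETA estimate with the two properties checked above
# (non-increasing across stages, KL720 larger than KL520). Every number here is a
# placeholder assumption, and `eta_ms_sketch` is a hypothetical name; the values used
# by the real _fw_eta_ms are not shown in this file.

```python
# Placeholder total durations per chip, in milliseconds (assumed, not measured).
TYPICAL_TOTAL_MS_SKETCH = {"KL520": 30_000, "KL720": 90_000}
# Placeholder share of the total still remaining when each stage starts, in percent.
REMAINING_PERCENT_SKETCH = {
    "preparing": 100,
    "loading": 80,
    "flashing": 50,
    "verifying": 10,
}


def eta_ms_sketch(chip: str, stage: str) -> int:
    """Estimate remaining milliseconds; unknown chips or stages yield 0."""
    total_ms = TYPICAL_TOTAL_MS_SKETCH.get(chip, 0)
    remaining = REMAINING_PERCENT_SKETCH.get(stage, 0)
    # Integer arithmetic keeps the result an exact int for the event schema.
    return total_ms * remaining // 100


kl520_etas_sketch = [
    eta_ms_sketch("KL520", s) for s in ("preparing", "loading", "flashing", "verifying")
]
```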
# ── _resolve_firmware_paths_full tests (using real files) ─────────────

class TestResolveFirmwarePathsFull(unittest.TestCase):
    def test_kl520_has_loader(self):
        """KL520 should resolve all three files: scpu / ncpu / loader."""
        paths = bridge._resolve_firmware_paths_full("KL520")
        self.assertIsNotNone(paths["scpu"], "fw_scpu.bin missing")
        self.assertIsNotNone(paths["ncpu"], "fw_ncpu.bin missing")
        self.assertIsNotNone(paths["loader"],
                             "fw_loader.bin missing (required for KDP1→KDP2)")
        self.assertTrue(os.path.exists(paths["scpu"]))
        self.assertTrue(os.path.exists(paths["loader"]))

    def test_kl720_has_scpu_ncpu(self):
        paths = bridge._resolve_firmware_paths_full("KL720")
        self.assertIsNotNone(paths["scpu"])
        self.assertIsNotNone(paths["ncpu"])
        # KL720 is expected to have no loader.bin, so it is not checked.
        self.assertTrue(os.path.exists(paths["scpu"]))

    def test_unknown_chip_returns_none(self):
        paths = bridge._resolve_firmware_paths_full("KL999")
        self.assertIsNone(paths["scpu"])
        self.assertIsNone(paths["ncpu"])

# ── _fw_emit_progress JSON schema tests ───────────────────────────────

class TestEmitProgress(unittest.TestCase):
    def test_emit_writes_json_line_to_stderr(self):
        capture = io.StringIO()
        with mock.patch.object(sys, "stderr", capture):
            bridge._fw_emit_progress(
                "flashing",
                message="testing",
                elapsed_ms=1234,
                eta_ms=5678,
            )
        line = capture.getvalue().strip()
        ev = json.loads(line)
        self.assertEqual(ev["event"], "firmware_progress")
        self.assertEqual(ev["stage"], "flashing")
        self.assertEqual(ev["percent"], 50)
        self.assertEqual(ev["message"], "testing")
        self.assertEqual(ev["elapsed_ms"], 1234)
        self.assertEqual(ev["eta_ms"], 5678)

    def test_emit_with_extra_includes_failure_fields(self):
        capture = io.StringIO()
        with mock.patch.object(sys, "stderr", capture):
            bridge._fw_emit_progress(
                "error",
                message="bad",
                elapsed_ms=100,
                extra={"reason": "scan_not_found", "raw_error": "details"},
            )
        ev = json.loads(capture.getvalue().strip())
        self.assertEqual(ev["stage"], "error")
        self.assertEqual(ev["percent"], -1)
        self.assertEqual(ev["reason"], "scan_not_found")
        self.assertEqual(ev["raw_error"], "details")

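# Sketch only: a JSON-line progress emitter shaped like the events asserted in
# TestEmitProgress. `emit_progress_sketch` is a hypothetical name. Only flashing=50 and
# error=-1 are confirmed by the tests above; the other percent values are placeholders,
# and the real _fw_emit_progress may build its event differently.

```python
import contextlib
import io
import json
import sys

STAGE_PERCENT_SKETCH = {
    "preparing": 5,    # placeholder
    "loading": 25,     # placeholder
    "flashing": 50,    # asserted in the tests above
    "verifying": 90,   # placeholder
    "done": 100,       # placeholder
    "error": -1,       # asserted in the tests above
}


def emit_progress_sketch(stage, message="", elapsed_ms=0, eta_ms=None, extra=None):
    """Write one firmware_progress event to stderr as a single JSON line."""
    event = {
        "event": "firmware_progress",
        "stage": stage,
        "percent": STAGE_PERCENT_SKETCH.get(stage, -1),
        "message": message,
        "elapsed_ms": int(elapsed_ms),
        "eta_ms": eta_ms,
    }
    if extra:
        # Failure fields are merged at the top level, since the tests read ev["reason"].
        event.update(extra)
    sys.stderr.write(json.dumps(event, ensure_ascii=False) + "\n")
    sys.stderr.flush()
    return event


buffer = io.StringIO()
with contextlib.redirect_stderr(buffer):
    emit_progress_sketch(
        "error", message="bad", elapsed_ms=100,
        extra={"reason": "scan_not_found", "raw_error": "details"},
    )
parsed = json.loads(buffer.getvalue())
```

# One event per line keeps the stream parseable by a line-oriented reader on the other
# side of the pipe, which is how test_sigterm_rejected_during_upgrade consumes stderr.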
if __name__ == "__main__":
    unittest.main(verbosity=2)