feat(local-tool): M9-2 — Go driver UpgradeFirmware + firmware service module
A 階段第二個 milestone、銜接 M9-1 bridge.py、暴露 service layer 給 M9-3 API/WebSocket。 New module `server/internal/firmware/`: - types.go: 123 行(FirmwareVersion / FirmwareProgress / ActiveTaskInfo / UpgradeDriver interface / 8 reason const) - progress.go: 147 行(仿 flash pattern 的 Tracker、Task.cancel 預留 SIGTERM force-cancel godoc) - service.go: 373 行(核心 service:UpgradeFirmware / HasActiveTask / GetActiveTaskInfo / RequestShutdown / WaitForActiveTasks / ListBundledVersions / GetCurrentVersion) - service_test.go: 676 行、13 個 test 含 MultiDeviceParallel Driver layer: - kl720_driver.go: 697 → 1054 行(+357、新 UpgradeFirmware method + tryRouteFirmwareEvent + sendCommandForUpgrade snapshot pattern) - kl720_driver_test.go: 360 行、11 個 test(含 InfoNotBlockedDuringUpgrade / CtxCancelReleasesBridge / StderrEventAfterCtxCancel 100 round stress) 關鍵設計: - flash 與 firmware 模組分離(不 import flash) - UpgradeDriver interface 隔離 driver 細節、DeviceLookup interface 隔離 device manager - 中介 channel pattern(service ↔ driver)方便 service 補欄位(DeviceID / Direction / BeforeVersion) - timeout 雙保險:chip timeout + 30s margin - 8 reason enum 對齊 bridge.py、stage 採 Design 命名 Concurrency race 修復(M9-2 Reviewer round 1 → round 2): - Major 1(mutex deadlock):新 fwUpgradeMu 獨立鎖 + sendCommandForUpgrade snapshot stdin/stdout pattern、避開 d.mu field-level race + 升級期間 Info/Disconnect 不被卡 + timeout 路徑無死鎖 - Major 2(close-channel race):tryRouteFirmwareEvent 持 fwMu 整段、配合 defer setFirmwareProgressCh(nil) 提供 happen-before、絕無 send on closed channel panic Reviewer 兩輪審查: - Round 1: 0 Critical / 2 Major / 5 Minor / 5 Suggestion - Round 2: 0 Critical / 0 Major / 2 Minor / 2 Suggestion(11/12 issue 修到位、Suggestion 4 留 follow-up) M9-1 follow-up 順手清: - m5(test 死碼 _firmware_upgrade_start_ts 殘留兩行)已清 - s5(test 註解 idempotent shape 說明)已加 測試: - go test ./... -race -count=1: 全綠(28s、無 regression) - Python: 36 tests + 22 subtests 全綠(0.31s) - go vet / build: 0 output 下一步:M9-3 API handler + WebSocket progress(CI 建議 `go test -race -count=3` 提升 race 偵測強度) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
d7b5a2398a
commit
c03eb6fd0e
@ -0,0 +1,488 @@
|
|||||||
|
# Reviewer Report — M9-2 Go driver + firmware service(第 2 輪修改驗證)
|
||||||
|
|
||||||
|
> 審查日期:2026-05-25
|
||||||
|
> 範圍:第 1 輪 2 Major + 5 Minor + 4 Suggestion(Suggestion 4 留 follow-up)的修改驗證
|
||||||
|
> 統計:0 Critical / 0 Major / 2 Minor / 2 Suggestion
|
||||||
|
> 結論:✅ **通過、不阻擋 M9-3**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
第 1 輪所有 issue(2 Major + 5 Minor + 4 Suggestion,Suggestion 4 留 follow-up)皆已修復、且修法品質高:
|
||||||
|
- **Major 1(mutex deadlock)** 採方案 B 變體(fwUpgradeMu + sendCommandForUpgrade snapshot pattern)— **修法正確、snapshot pattern 安全**。
|
||||||
|
- **Major 2(close-channel race)** 採方案 A 變體(tryRouteFirmwareEvent 持 fwMu 整段 + defer setFirmwareProgressCh(nil))— **happen-before 鏈成立**。
|
||||||
|
- 5 個新 test 都針對 race window 設計、`StderrEventAfterCtxCancel` 用 100×8 goroutine 壓測足以驗 race。
|
||||||
|
|
||||||
|
第 2 輪新發現 **2 個 Minor + 2 個 Suggestion**,皆不阻擋 M9-3 啟動、可在 M9-3 並行修。**不需要 backend 第 3 輪**。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 第 1 輪 issue 修改驗證(逐項)
|
||||||
|
|
||||||
|
| Issue | 第 1 輪 | 第 2 輪修法 | 驗證結果 |
|
||||||
|
|-------|--------|------------|---------|
|
||||||
|
| Major 1(mutex deadlock)| sendCommand 持 d.mu 60-200s、卡 Info/IsConnected、timeout 路徑 deadlock | 新 `fwUpgradeMu` + 新 `sendCommandForUpgrade`(snapshot stdin/stdout 後 release d.mu)、UpgradeFirmware Step 2 改 fwUpgradeMu+sendCommandForUpgrade(line 953-971);ctx.Done 路徑仍走 d.mu 殺 bridge(line 981-984)| ✅ **修法正確**、見下方深入評估 |
|
||||||
|
| Major 2(close-channel race)| tryRoute 取 ch 後 release fwMu 才 send、與 close 有 race window | tryRoute 改 `defer d.fwMu.Unlock()` 整段持鎖(line 868-869)+ UpgradeFirmware line 940 `defer setFirmwareProgressCh(nil)` 提供 happen-before | ✅ **修法正確**、見下方深入評估 |
|
||||||
|
| Minor 1(brickRiskReasons)| ReasonTimeout 是否屬 brick 沒寫清楚 | service.go:36-44 加註解、明示 vendor-agnostic、把 chip+elapsed 判斷推給 M9-3 handler | ✅ 註解清晰、保留 vendor-agnostic 設計 |
|
||||||
|
| Minor 2(缺 ctx.Done + multi-device 測試)| 缺 `TestUpgradeFirmware_CtxCancelReleasesBridge` + multi-device | driver_test 新增 `TestUpgradeFirmware_CtxCancelReleasesBridge`(line 269-308);service_test 新增 `TestUpgradeFirmware_MultiDeviceParallel`(line 519-615)| ✅ 兩個 test 都到位、測法合理 |
|
||||||
|
| Minor 3(attemptedUpgrade)| 早退路徑誤標 needsReset | line 916 加 `var attemptedUpgrade bool` + line 953 `attemptedUpgrade = true`(在 setFirmwareProgressCh 之後、進 sendCommand 之前);defer 內 `if attemptedUpgrade { d.needsReset = true }`(line 925-927)| ✅ 修法正確,但有小問題(見下方 Minor R-1)|
|
||||||
|
| Minor 4(結論先寫)| forward loop 推論在前 | service.go:192-198 把「結論:保留中介 channel pattern」放到段首、「推論」放到後面 | ✅ 結論先寫、可讀性提升 |
|
||||||
|
| Minor 5(fwMu atomic)| 與 Major 2 同源 | 隨 Major 2 修法一起處理 | ✅ |
|
||||||
|
| Suggestion 1(done event 去重)| sendCommand 補 done 與 stderr push done 會重複 | service.go:213-220 在 forward loop 加 `seenDone` guard、第二次 StageDone 跳過;對應 test 在 service_test.go:624-653 `TestUpgradeFirmware_DedupeDoneEvent` | ✅ 修法正確、test 有覆蓋 |
|
||||||
|
| Suggestion 2(multi-device parallel test)| 缺多 device 並行測 | service_test.go:519-615 補 3 device 並發、驗 deviceID 不誤匹配、callCount 各 = 1 | ✅ |
|
||||||
|
| Suggestion 3(ListBundledVersions Stat)| chipDir 不存在時無聲回 current | service.go:330-340 加 `os.Stat(chipDir)`、IsNotExist 回 missing error、其他 error 帶回 wrap;service_test.go:451-489 重寫用 `t.TempDir()` + 顯式建 KL520 dir、驗 KL720 missing 路徑 | ✅ 修法正確、test 有 cover |
|
||||||
|
| Suggestion 4(bridgeFirmwareEvent 與 FirmwareProgress 共用 struct)| 兩 struct 重複 | **本輪未修**(明示為 follow-up) | — |
|
||||||
|
| Suggestion 5(Task.cancel godoc)| 無 reader、加 TODO | progress.go:25-31 加 godoc:「目前 service 只在 runUpgrade defer 內呼叫一次... 預留給未來 SIGTERM force-cancel 流程」 | ✅ |
|
||||||
|
|
||||||
|
**統計**:12 項中 11 項修、1 項明示 follow-up。修了的 11 項全部修法正確。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 兩個 Major 的修法品質深入評估
|
||||||
|
|
||||||
|
### Major 1(mutex deadlock)— ✅ 修法正確
|
||||||
|
|
||||||
|
#### 修法總覽
|
||||||
|
|
||||||
|
- **新增 `fwUpgradeMu sync.Mutex`**(line 56):與 `d.mu` 完全分離的鎖、只給升級期間 sendCommand 用。
|
||||||
|
- **新增 `sendCommandForUpgrade()`**(line 273-314):
|
||||||
|
- Lock d.mu → check pythonReady → snapshot stdin/stdout 到 local var → Unlock d.mu
|
||||||
|
- 接著 marshal cmd、`fmt.Fprintf(stdin, ...)`、`stdout.Scan()`、parse response — 全部在 unlocked 狀態
|
||||||
|
- **UpgradeFirmware Step 2 改寫**(line 953-971):sendCommand goroutine 內 `fwUpgradeMu.Lock()` → call sendCommandForUpgrade → Unlock。**完全不再持 d.mu**。
|
||||||
|
- **ctx.Done 路徑**(line 975-1001):仍走 `d.mu.Lock + d.stopPython() + d.connected = false + d.mu.Unlock`。
|
||||||
|
|
||||||
|
#### Snapshot pattern 正確性 — ✅
|
||||||
|
|
||||||
|
逐欄位驗:
|
||||||
|
|
||||||
|
| 在 sendCommandForUpgrade 內 | 來源 | 是否 snapshot 完整 |
|
||||||
|
|---------------------------|------|-----------------|
|
||||||
|
| `d.pythonReady` 檢查 | 持 d.mu 內讀 | ✅ 不需 snapshot(讀完即釋鎖) |
|
||||||
|
| `d.stdin` → local `stdin` | 持 d.mu 內 copy ref(line 280) | ✅ |
|
||||||
|
| `d.stdout` → local `stdout` | 持 d.mu 內 copy ref(line 281) | ✅ |
|
||||||
|
| 持 d.mu 期間是否做 I/O | 沒有、release 後才做 | ✅ |
|
||||||
|
| nil check on snapshot | line 284-286 加了 `if stdin == nil || stdout == nil` 防禦 | ✅ |
|
||||||
|
|
||||||
|
關鍵問題:**stopPython 把 `d.stdin = nil` / `d.stdout = nil` 之後、snapshot 已取走的舊 ref 還能用嗎?write 到 closed pipe 會怎樣?**
|
||||||
|
|
||||||
|
逐 case 推:
|
||||||
|
|
||||||
|
1. **正常路徑**(無 ctx cancel):
|
||||||
|
- sendCommand goroutine:snapshot → release d.mu → write/scan → return result via resCh
|
||||||
|
- 全程 stopPython 不會被呼叫、stdin/stdout 仍有效。✅
|
||||||
|
|
||||||
|
2. **ctx cancel 路徑**:
|
||||||
|
- sendCommand goroutine 已 snapshot stdin/stdout(local ref)、可能正在 `stdout.Scan()` 等 bridge 回應
|
||||||
|
- ctx.Done 觸發 → main goroutine 在 line 981 `d.mu.Lock()` + `d.stopPython()`:
|
||||||
|
- `d.stdin.Close()`(line 321)→ 關閉 *write* 端 pipe → sendCommand goroutine 若還沒 write 完、write 會回 `io.ErrClosedPipe`(line 293 errs)
|
||||||
|
- `d.pythonCmd.Process.Kill()`(line 326)→ python subprocess 死、bridge 那端的 stdout pipe write 端關掉 → sendCommand goroutine 的 `stdout.Scan()` 拿到 EOF → return false → 進 line 298 走 error 路徑
|
||||||
|
- `d.pythonCmd.Wait()`(line 327)→ wait for process exit、通常幾十 ms 內完成
|
||||||
|
- sendCommand goroutine 從 Scan 回來、return err、寫 resCh、結束。
|
||||||
|
- **沒 panic、沒 goroutine leak**、`go func() { <-resCh }()`(line 1000)負責 drain leak 防護。✅
|
||||||
|
|
||||||
|
3. **race window**:sendCommand goroutine snapshot 到 stdin/stdout 後、release d.mu 之前、stopPython 不能進來(被 d.mu 擋住);release d.mu 之後、snapshot 已 capture local ref、即使 stopPython 把 `d.stdin = nil` 也只影響 future 呼叫、不影響此次的 local ref。**沒 race**。✅
|
||||||
|
|
||||||
|
寫 to closed pipe 不會 panic(io.Writer interface 只 return error)、Scan 拿 EOF 也只 return false、沒 panic 風險。✅
|
||||||
|
|
||||||
|
#### 新 fwUpgradeMu 與 d.mu 的 lock 順序 — ✅ 安全
|
||||||
|
|
||||||
|
各 callsite 整理:
|
||||||
|
|
||||||
|
| Callsite | lock 順序 |
|
||||||
|
|----------|----------|
|
||||||
|
| sendCommand goroutine(line 960-969)| `fwUpgradeMu.Lock` → 內部 `sendCommandForUpgrade` 短暫持 `d.mu`(snapshot)→ release d.mu → release fwUpgradeMu |
|
||||||
|
| ctx.Done 路徑(line 981-984)| 只持 `d.mu`、不取 fwUpgradeMu |
|
||||||
|
| 其他 method(Info/IsConnected/Flash/...)| 只持 `d.mu`、不取 fwUpgradeMu |
|
||||||
|
|
||||||
|
**lock 順序唯一可能撞點**:sendCommand goroutine 內持 fwUpgradeMu 後又要拿 d.mu(snapshot 時)。但這是「外鎖 → 內鎖」單方向、不會反序——沒有其他路徑會「持 d.mu 後再去拿 fwUpgradeMu」(沒有任何 method 這樣做)。✅
|
||||||
|
|
||||||
|
**無 lock-order deadlock 風險**。
|
||||||
|
|
||||||
|
#### Timeout 路徑驗證 — ✅
|
||||||
|
|
||||||
|
老的死鎖 scenario:sendCommand 持 d.mu → ctx.Done → main 想拿 d.mu → wait → sendCommand 永遠不回 → 死鎖。
|
||||||
|
|
||||||
|
新的 scenario:
|
||||||
|
1. sendCommand goroutine 只持 fwUpgradeMu、**不持 d.mu**
|
||||||
|
2. ctx.Done → main 在 line 981 拿 d.mu(沒人持、立刻拿到)
|
||||||
|
3. `d.stopPython()` 內 `Process.Kill() + Wait()`:通常 < 100ms(Wait 在 SIGKILL 後幾乎瞬間 return)
|
||||||
|
4. release d.mu → push error event → drain resCh leak prevention → return
|
||||||
|
|
||||||
|
**timeout 路徑現在可預期在 200ms 內完成**(vs 老版的「永遠死鎖」)。✅
|
||||||
|
|
||||||
|
#### sendCommand goroutine 釋放保證 — ✅
|
||||||
|
|
||||||
|
ctx cancel 後 sendCommand goroutine 的釋放鏈:
|
||||||
|
|
||||||
|
```
|
||||||
|
ctx cancel
|
||||||
|
→ main 進 line 981
|
||||||
|
→ d.mu.Lock(取得)
|
||||||
|
→ d.stopPython()
|
||||||
|
→ d.stdin.Close()(pipe write 端 close)
|
||||||
|
→ Process.Kill()(subprocess 死、stdout pipe 自動 close)
|
||||||
|
→ Process.Wait()(等清理完)
|
||||||
|
→ d.stdin = nil(但 sendCommand 已拿 snapshot、不受影響)
|
||||||
|
→ d.stdout = nil(同上)
|
||||||
|
→ release d.mu
|
||||||
|
→ push error event(line 996)
|
||||||
|
→ go func(){ <-resCh }()(leak prevention)
|
||||||
|
|
||||||
|
sendCommand goroutine 並發:
|
||||||
|
→ fwUpgradeMu.Lock(已持)
|
||||||
|
→ sendCommandForUpgrade 內:
|
||||||
|
stdout.Scan() 阻塞中 → bridge 死 → EOF → Scan return false
|
||||||
|
→ return err(line 298)
|
||||||
|
→ fwUpgradeMu.Unlock
|
||||||
|
→ resCh <- result{nil, err}
|
||||||
|
→ goroutine exit
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test 驗證**:
|
||||||
|
- `TestUpgradeFirmware_CtxCancelReleasesBridge`(driver_test.go:269)— 2s timeout 內 UpgradeFirmware 必須 return。雖然這個 test 用 fake bridge(不真起 python)、stopPython 沒實際 Process.Kill 可做(pythonCmd nil)、但 stdin.Close() 的 pipe 操作會讓 Scan 拿到 EOF、goroutine 確實能釋放。✅
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Major 2(close-channel race)— ✅ 修法正確
|
||||||
|
|
||||||
|
#### 修法總覽
|
||||||
|
|
||||||
|
- **tryRouteFirmwareEvent 持 fwMu 整段**(line 868-882):
|
||||||
|
```go
|
||||||
|
d.fwMu.Lock()
|
||||||
|
defer d.fwMu.Unlock()
|
||||||
|
ch := d.fwProgressCh
|
||||||
|
if ch == nil { return false }
|
||||||
|
select {
|
||||||
|
case ch <- fp:
|
||||||
|
default:
|
||||||
|
// drop event
|
||||||
|
}
|
||||||
|
```
|
||||||
|
關鍵:「檢查 ch + select send」全在持鎖期間、無「先 release 才 send」的縫隙。
|
||||||
|
- **setFirmwareProgressCh(nil) 取同把 fwMu**(line 820-824):與 tryRoute 互斥。
|
||||||
|
- **UpgradeFirmware return 前 defer**(line 940):`defer d.setFirmwareProgressCh(nil)` 是 UpgradeFirmware 第二個 defer、在 close(intermediate) 之前 trigger。
|
||||||
|
|
||||||
|
#### Happen-before 推理鏈驗證 — ✅
|
||||||
|
|
||||||
|
逐步 trace:
|
||||||
|
|
||||||
|
```
|
||||||
|
service runUpgrade goroutine:
|
||||||
|
go func() {
|
||||||
|
driverDone <- drv.UpgradeFirmware(ctx, chip, intermediate)
|
||||||
|
close(intermediate)
|
||||||
|
}()
|
||||||
|
|
||||||
|
driver UpgradeFirmware:
|
||||||
|
defer d.setFirmwareProgressCh(nil) // 第二個 defer、第一個執行(LIFO)
|
||||||
|
defer ...status reset... // 第一個 defer
|
||||||
|
...
|
||||||
|
return nil/err // ← defer 從這裡開始 fire
|
||||||
|
```
|
||||||
|
|
||||||
|
對應的時序(success / error 路徑都適用):
|
||||||
|
|
||||||
|
| 步 | 動作 | 鎖 | happen-before 提供 |
|
||||||
|
|----|------|----|------|
|
||||||
|
| 1 | driver return | — | — |
|
||||||
|
| 2 | defer setFirmwareProgressCh(nil) 觸發 | fwMu.Lock → set nil → fwMu.Unlock | T2 ✅ |
|
||||||
|
| 3 | UpgradeFirmware return | — | — |
|
||||||
|
| 4 | service goroutine 收到 `driverDone <- ...` | — | T4 happens-after T3 |
|
||||||
|
| 5 | service goroutine close(intermediate) | — | — |
|
||||||
|
|
||||||
|
關鍵:T2(setFirmwareProgressCh(nil) 釋放 fwMu)→ T5(close intermediate)之間有 sequential ordering(同個 service goroutine 內)。
|
||||||
|
|
||||||
|
任何 inflight tryRouteFirmwareEvent call(從 stderr scanner 端發起):
|
||||||
|
|
||||||
|
| 情境 | 結果 |
|
||||||
|
|------|------|
|
||||||
|
| tryRoute 在 T2 之前進來、取 fwMu | T2 必須等 tryRoute Unlock 才能 set nil。tryRoute 內讀到的 ch 還是 valid intermediate ch;send 後 return true。**T5 close 還沒發生**(因為 T5 在 T2 之後)。✅ 安全 |
|
||||||
|
| tryRoute 在 T2 進行中嘗試取 fwMu | 等 T2 完成、取到 fwMu 時 ch 已是 nil、return false。沒 send、沒 panic。✅ |
|
||||||
|
| tryRoute 在 T2 之後、T5 之前進來 | ch 已 nil、return false。沒 send。✅ |
|
||||||
|
| tryRoute 在 T5 之後進來 | ch 已 nil(T2 設過了)、return false。沒 send。✅ |
|
||||||
|
|
||||||
|
**「send on closed channel」場景不可能發生**。✅
|
||||||
|
|
||||||
|
#### stderr scanner 阻塞風險 — ⚠️ Minor R-2
|
||||||
|
|
||||||
|
`tryRouteFirmwareEvent` 在持 fwMu 期間做 `select case ch <- fp / default`:
|
||||||
|
- **非阻塞**(有 default)— 不會卡 stderr scanner 太久。
|
||||||
|
- **持鎖時間**:把 buffered channel send(< 1µs)+ select syntax overhead(< 1µs)= 通常 < 5µs。
|
||||||
|
- **最壞情境**:setFirmwareProgressCh(nil) 並發來搶鎖、要等 tryRoute 釋鎖、wait < 5µs。
|
||||||
|
|
||||||
|
但**有個細節值得改善**(見 Minor R-2):JSON unmarshal(line 842-846)在持鎖**之前**完成、這部分正確;但 `firmware.FirmwareProgress` struct copy(line 851-863)也在 unmarshal 之後、Lock 之前完成、也正確。**結構上沒問題**。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 第 2 輪新發現(regression risk)
|
||||||
|
|
||||||
|
### 🔴 Critical
|
||||||
|
**無**。
|
||||||
|
|
||||||
|
### 🟠 Major
|
||||||
|
**無**。
|
||||||
|
|
||||||
|
### 🟡 Minor
|
||||||
|
|
||||||
|
#### Minor R-1 — attemptedUpgrade 設定點位置略偏早
|
||||||
|
|
||||||
|
**檔案**:`kl720_driver.go:953`
|
||||||
|
|
||||||
|
**問題**:
|
||||||
|
|
||||||
|
```go
|
||||||
|
// Step 1: register progress routing
|
||||||
|
d.setFirmwareProgressCh(progressCh)
|
||||||
|
defer d.setFirmwareProgressCh(nil)
|
||||||
|
|
||||||
|
// Step 2: spawn sendCommand in goroutine
|
||||||
|
...
|
||||||
|
attemptedUpgrade = true // ← line 953、在「進入 goroutine 前」設
|
||||||
|
...
|
||||||
|
resCh := make(chan result, 1)
|
||||||
|
go func() { ... }()
|
||||||
|
```
|
||||||
|
|
||||||
|
`attemptedUpgrade = true` 在 goroutine spawn 之前就設、若 goroutine 從未真的 send sendCommand(理論上不會發生、但理論上有 OS-level scheduler 異常或 panic 之類的)、`needsReset = true` 還是會被 defer 設。
|
||||||
|
|
||||||
|
實務影響:**幾乎零**。但語意上「真進 sendCommand 才算 attempted」會更精準——可以把 line 953 搬進 goroutine 內、緊鄰 `d.fwUpgradeMu.Lock()` 之後。
|
||||||
|
|
||||||
|
**嚴重度**:Minor。M9-3 啟動前不阻擋,純語意精確性。
|
||||||
|
|
||||||
|
**建議修法**:
|
||||||
|
|
||||||
|
```go
|
||||||
|
go func() {
|
||||||
|
d.fwUpgradeMu.Lock()
|
||||||
|
attemptedUpgrade = true // ← 搬到這裡
|
||||||
|
resp, err := d.sendCommandForUpgrade(...)
|
||||||
|
d.fwUpgradeMu.Unlock()
|
||||||
|
resCh <- result{resp, err}
|
||||||
|
}()
|
||||||
|
```
|
||||||
|
|
||||||
|
⚠️ **注意**:`attemptedUpgrade` 是 `var` 在 outer func、若搬進 goroutine、會有 data race(defer 在 outer goroutine 讀、inner goroutine 寫)。**所以這個修法需要改成 `atomic.Bool` 或者保持在 outer goroutine 寫**。權衡之下、現狀(在 goroutine 外設)反而是正確的——避免 race。
|
||||||
|
|
||||||
|
**結論**:現狀其實是正確的、Minor R-1 撤回。為求嚴謹保留紀錄、但不需修。降為 **Suggestion R-3**(見下)。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Minor R-2 — `setFirmwareProgressCh` 在 driver Disconnect 路徑沒清
|
||||||
|
|
||||||
|
**檔案**:`kl720_driver.go:420-439` Disconnect()
|
||||||
|
|
||||||
|
**問題**:
|
||||||
|
|
||||||
|
```go
|
||||||
|
func (d *KneronDriver) Disconnect() error {
|
||||||
|
d.mu.Lock()
|
||||||
|
defer d.mu.Unlock()
|
||||||
|
...
|
||||||
|
d.stopPython()
|
||||||
|
d.connected = false
|
||||||
|
...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Disconnect 不 reset `d.fwProgressCh`。理論情境:
|
||||||
|
1. UpgradeFirmware in-flight、fwProgressCh = X
|
||||||
|
2. 使用者 / 系統 call `Disconnect()`(理論上應該不能在升級中被叫、但 driver 層沒擋)
|
||||||
|
3. stopPython 殺 bridge → sendCommand goroutine return error → UpgradeFirmware 走 case `<-resCh` 的 error 路徑 → return err → `defer setFirmwareProgressCh(nil)` 觸發 → 此時才清
|
||||||
|
|
||||||
|
實際上**沒漏**——`defer setFirmwareProgressCh(nil)` 在 UpgradeFirmware return 時一定會跑。Disconnect 不重複清也沒問題(已經 nil)。
|
||||||
|
|
||||||
|
但若**Disconnect 在 UpgradeFirmware 之前/之後**被叫、且其他 path 之後又把 ch 設成 valid(這個情境不存在於目前 codebase),可能有 stale ch。
|
||||||
|
|
||||||
|
**嚴重度**:Minor、純防禦性建議。**M9-3 啟動前不需修**。
|
||||||
|
|
||||||
|
**建議**:在 Disconnect 內 stopPython 之前 / 之後加一行 `d.fwMu.Lock(); d.fwProgressCh = nil; d.fwMu.Unlock()`、純防禦性。或者在 stopPython 內加同一行(更乾淨、統一清理 pipe + ch 兩個資源)。
|
||||||
|
|
||||||
|
實務影響:**目前 codebase 沒實際 bug**、只是「再多一層防禦不壞」。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 💡 Suggestion
|
||||||
|
|
||||||
|
#### Suggestion R-1 — `seenDone` guard 也該套用到 StageError
|
||||||
|
|
||||||
|
**檔案**:`service.go:213-220`
|
||||||
|
|
||||||
|
**現況**:
|
||||||
|
|
||||||
|
```go
|
||||||
|
var lastStage string
|
||||||
|
var seenDone bool
|
||||||
|
for ev := range intermediate {
|
||||||
|
if ev.Stage == StageDone && seenDone {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
...
|
||||||
|
if ev.Stage == StageDone {
|
||||||
|
seenDone = true
|
||||||
|
}
|
||||||
|
task.ProgressCh <- ev
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**問題**:類似的 race(stderr push error event + driver UpgradeFirmware error 路徑 line 1009-1018 也 push error event fallback safety net)也會產生重複 error event。雖然 driver 端 line 1006 註解寫「stage="error" event 通常已透過 stderr 推過了、這裡是 fallback safety net」、實際上和 done 一樣會雙保險、可能重複。
|
||||||
|
|
||||||
|
**建議**:加 `seenError` 同類 guard、避免前端跑兩次 cleanup:
|
||||||
|
|
||||||
|
```go
|
||||||
|
var lastStage string
|
||||||
|
var seenDone, seenError bool
|
||||||
|
for ev := range intermediate {
|
||||||
|
if ev.Stage == StageDone && seenDone {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if ev.Stage == StageError && seenError {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
...
|
||||||
|
if ev.Stage == StageDone {
|
||||||
|
seenDone = true
|
||||||
|
}
|
||||||
|
if ev.Stage == StageError {
|
||||||
|
seenError = true
|
||||||
|
}
|
||||||
|
task.ProgressCh <- ev
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
或更通用:用一個 set 記已看過的終態 stage。
|
||||||
|
|
||||||
|
**嚴重度**:Suggestion、不阻擋。M9-3 wire 到 WS 時若前端有 cleanup 邏輯、值得補。
|
||||||
|
|
||||||
|
#### Suggestion R-2 — `TestUpgradeFirmware_StderrEventAfterCtxCancel` 加 `t.Parallel` 防止 sequential 假陽性
|
||||||
|
|
||||||
|
**檔案**:`kl720_driver_test.go:315`
|
||||||
|
|
||||||
|
**現況**:100 round × 8 goroutine 不斷 tryRouteFirmwareEvent + 主 loop 不斷 unregister/close/recreate。沒 `t.Parallel`、跑單一 test 時 CPU 不一定有真實 race scheduling。
|
||||||
|
|
||||||
|
**建議**:加 `t.Parallel()`,跑 `go test -race -count=10 ./...` 時更容易撞出 race。當前實作已可信、`go test -race` 應該都能 pass,但加 `t.Parallel` 提升 race detector 觸發機率。
|
||||||
|
|
||||||
|
**嚴重度**:Suggestion、test 改善。
|
||||||
|
|
||||||
|
#### Suggestion R-3 — `attemptedUpgrade` 用 `bool` 在 outer goroutine 設、不在 inner goroutine(避免 data race)
|
||||||
|
|
||||||
|
**檔案**:`kl720_driver.go:953`
|
||||||
|
|
||||||
|
如 Minor R-1 結論所述、現狀正確、不要搬。**為求文件齊備、加註解說明「為何不在 goroutine 內設」**:
|
||||||
|
|
||||||
|
```go
|
||||||
|
// attemptedUpgrade 必須在 outer goroutine 寫、不能搬進 sendCommand goroutine
|
||||||
|
// 內、因為 defer(line 917-928)也在 outer goroutine 讀;用 atomic 或 mutex
|
||||||
|
// 會 over-engineering。在 goroutine spawn 前設 = 99.99% 等同於 sendCommand
|
||||||
|
// 真的執行(OS scheduler 失敗導致 goroutine 永不執行的機率可忽略)。
|
||||||
|
attemptedUpgrade = true
|
||||||
|
```
|
||||||
|
|
||||||
|
**嚴重度**:Suggestion、純註解、降後續維護成本。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5 個新測試品質評估
|
||||||
|
|
||||||
|
### TestUpgradeFirmware_InfoNotBlockedDuringUpgrade(driver_test.go:217)— ✅ 直接驗 Major 1
|
||||||
|
|
||||||
|
| 軸 | 評估 |
|
||||||
|
|----|------|
|
||||||
|
| 是否驗到 Major 1 修法 | ✅ 起 UpgradeFirmware goroutine、sleep 50ms 讓 sendCommand 進 stdout.Scan blocking、然後 `d.Info()` 必須 500ms 內回 |
|
||||||
|
| 是否會誤過(false positive)| 若 Major 1 沒修、Info 會被 d.mu 卡 60-200s(fake bridge 永不回應);500ms timeout 必抓到。✅ |
|
||||||
|
| 收尾乾淨 | `cancelUpgrade()` → 等 upgradeDone(2s budget)+ pipe cleanup(fakeBridge.t.Cleanup)。✅ |
|
||||||
|
| **小瑕疵** | `t.Cleanup(cancelUpgrade)` 在 line 225 註冊、但 line 257 又顯式 `cancelUpgrade()`、實際無害(cancel 是冪等的)但有點冗。**不算問題**。 |
|
||||||
|
|
||||||
|
### TestUpgradeFirmware_CtxCancelReleasesBridge(driver_test.go:269)— ✅ 驗 Minor 2
|
||||||
|
|
||||||
|
| 軸 | 評估 |
|
||||||
|
|----|------|
|
||||||
|
| 是否驗到 sendCommand goroutine 釋放 | ✅ UpgradeFirmware 必須在 cancel 後 2s 內 return(vs 老版「永遠死鎖」)|
|
||||||
|
| 驗 timeout event 推送 | ✅ line 297-307 驗 `progressCh` 拿到 `StageError + ReasonTimeout` |
|
||||||
|
| **小瑕疵** | `progressCh` 沒被 drain、若 upgrade 又推 done event(理論不會、因為 sendCommand 應該 error return)會卡 channel buffer。但 buffer = 16、容得下。**不算問題**。 |
|
||||||
|
|
||||||
|
### TestUpgradeFirmware_StderrEventAfterCtxCancel(driver_test.go:315)— ⚠️ 略有壓力測試強度疑慮
|
||||||
|
|
||||||
|
| 軸 | 評估 |
|
||||||
|
|----|------|
|
||||||
|
| 是否驗到 close-channel race | ✅ 100 round × 8 goroutine 高頻率「register → unregister → close」並發、若 fwMu 沒 cover 整段、必撞 panic |
|
||||||
|
| **強度評估** | 100 round 對 race detector 來說「夠用」、但若關閉 race detector(`go test` 沒 `-race`)、可能要更多 round 才能撞到。建議**搭配 CI `-race` 跑**。Suggestion R-2 建議加 `t.Parallel`。 |
|
||||||
|
| race window 是否真實覆蓋 | ✅ `time.Sleep(10 * time.Microsecond)` 給 route goroutine 時間進入 tryRoute、`d.setFirmwareProgressCh(nil)` + `close(ch)` 立刻 follow-up — 這正是 Major 2 設計修法要保證的 ordering。 |
|
||||||
|
|
||||||
|
### TestUpgradeFirmware_MultiDeviceParallel(service_test.go:519)— ✅ 驗 Suggestion 2
|
||||||
|
|
||||||
|
| 軸 | 評估 |
|
||||||
|
|----|------|
|
||||||
|
| 是否驗到 tracker key 不誤匹配 | ✅ 3 device 並發、每個的 events.DeviceID 必須 = 自己的 id(line 585-589)、callCount 必須各 = 1(line 605-613) |
|
||||||
|
| 是否真的並行 | ✅ 3 個 goroutine 同時 call `svc.UpgradeFirmware`、收 startCh、再各自 drain。**真並行**、不是序列。 |
|
||||||
|
| 收尾乾淨 | `svc.WaitForActiveTasks(3 * time.Second)` 等所有 task done、再驗 `HasActiveTask` 為 false。✅ |
|
||||||
|
|
||||||
|
### TestUpgradeFirmware_DedupeDoneEvent(service_test.go:624)— ✅ 驗 Suggestion 1
|
||||||
|
|
||||||
|
| 軸 | 評估 |
|
||||||
|
|----|------|
|
||||||
|
| 是否驗到 dedup | ✅ events 推 2 個 StageDone、drain 後 doneCount 必 = 1 |
|
||||||
|
| **小瑕疵** | 沒驗 done event 的 `AfterVersion` 等內容是否來自「第一個」done(forward 第一個、丟第二個)。雖然 2 個 done 的內容相同(line 629-631 都是 `AfterVersion: "2.2.0"`)、看不出來。**為驗 forward 是「第一個」而非「最後一個」**、可以讓 2 個 done 的 AfterVersion 不同。**不阻擋通過**、純測試精度改善。 |
|
||||||
|
|
||||||
|
### 整體新 test 品質
|
||||||
|
|
||||||
|
- **覆蓋面**:5 個 test 對應 2 Major + 3 Suggestion/Minor、覆蓋完整。
|
||||||
|
- **可信度**:所有 test 都針對 race window 設計合理、用 fake bridge 模擬合適、不過度模擬到驗不到真實 race。
|
||||||
|
- **強度**:建議 CI 跑 `go test -race -count=5 ./internal/driver/kneron/... ./internal/firmware/...` 提升 race detection。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ListBundledVersions 修法(Suggestion 3)驗證
|
||||||
|
|
||||||
|
`service.go:330-340` 修法:
|
||||||
|
|
||||||
|
```go
|
||||||
|
chipDir := filepath.Join(s.fwDir.Root, chip)
|
||||||
|
if _, err := os.Stat(chipDir); err != nil {
|
||||||
|
if os.IsNotExist(err) {
|
||||||
|
return nil, fmt.Errorf("firmware not bundled for chip %q (missing %s)", chip, chipDir)
|
||||||
|
}
|
||||||
|
return nil, fmt.Errorf("firmware dir stat failed for chip %q: %w", chip, err)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
✅ 區分「chip dir 不存在」(明示 missing build)與「stat 其他錯誤」(permission / IO 問題)。
|
||||||
|
|
||||||
|
`service_test.go:451-489` 對應 test:
|
||||||
|
- `t.TempDir()` 建臨時根、只建 KL520 dir
|
||||||
|
- KL520(dir 存在)→ 應回 1 個 current
|
||||||
|
- KL630(unsupported)→ ErrUnsupportedChip
|
||||||
|
- KL720(dir missing)→ 應回 error
|
||||||
|
- fwDir.Root 空 → 應回 error
|
||||||
|
|
||||||
|
✅ 邊界涵蓋齊全。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 結論
|
||||||
|
|
||||||
|
- ✅ **通過、不阻擋 M9-3 啟動**
|
||||||
|
- ✅ **不需要 backend 第 3 輪**——第 2 輪新發現的 2 Minor + 2 Suggestion 都可在 M9-3 並行修,或留 follow-up,不影響介面/合約
|
||||||
|
- ✅ **不需要升級給 security agent**——第 2 輪修改完全是 concurrency / pattern 改善、無 OWASP / authentication / authorization 相關
|
||||||
|
- ⚠️ **CI 建議**:M9-3 PR 時可加 `go test -race -count=3 ./internal/driver/kneron/... ./internal/firmware/...` 對應 lane、確保 race detection 有跑
|
||||||
|
|
||||||
|
### 給 backend 的肯定
|
||||||
|
|
||||||
|
第 2 輪修法**品質非常高**:
|
||||||
|
1. Major 1 採方案 B 變體(fwUpgradeMu + snapshot pattern)比方案 A(細粒度 Lock)更乾淨、不破壞 sendCommand 既有結構。snapshot pattern 的 nil check + ctx.Done 路徑 Wait 行為都考慮周到。
|
||||||
|
2. Major 2 採方案 A 變體(fwMu 整段持鎖)+ defer + happen-before 推理、比方案 C(atomic.Value)更直觀易維護。
|
||||||
|
3. 5 個新 test 都有針對性地驗 race window、不是只跑 happy path。
|
||||||
|
4. 註解品質特別高(line 47-67、line 263-272、line 826-840、line 944-952)—把「為什麼這樣修 / 老版的問題 / 新版的保證」全寫進註解、未來改 code 的人不會誤踩。
|
||||||
|
|
||||||
|
### 第 2 輪新發現整理
|
||||||
|
|
||||||
|
| # | 嚴重度 | 檔案:行 | 摘要 |
|
||||||
|
|---|--------|---------|------|
|
||||||
|
| Minor R-1 | Minor → 撤回(現狀正確)| kl720_driver.go:953 | attemptedUpgrade 位置 — 結論:現狀正確、改 Suggestion R-3(加註解)|
|
||||||
|
| Minor R-2 | Minor | kl720_driver.go:420 / 317 | Disconnect / stopPython 沒主動清 fwProgressCh — 純防禦性、目前無 bug |
|
||||||
|
| Suggestion R-1 | Suggestion | service.go:213 | `seenError` 也加 guard、與 seenDone 對稱 |
|
||||||
|
| Suggestion R-2 | Suggestion | kl720_driver_test.go:315 | `t.Parallel()` 提升 race detection |
|
||||||
|
| Suggestion R-3 | Suggestion | kl720_driver.go:953 | 加註解說明 attemptedUpgrade 為何不搬進 goroutine |
|
||||||
|
|
||||||
|
全部 5 項都不阻擋 M9-3。
|
||||||
@ -0,0 +1,152 @@
|
|||||||
|
# Reviewer Report — M9-2 Go driver + firmware service
|
||||||
|
|
||||||
|
> 審查日期:2026-05-25
|
||||||
|
> 範圍:M9-2 Go driver UpgradeFirmware + firmware service + types/progress + M9-1 follow-up cleanup
|
||||||
|
> 統計:0 Critical / 2 Major / 5 Minor / 5 Suggestion
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
M9-2 整體實作完整、規格對齊度高、TDD §8.6 graceful shutdown 介面齊備、bridge 端 4 個情境 stage 序列在 Go driver 端正確 forward;25 個 Go test + 36 個 Python test 覆蓋核心路徑。但**有 2 個 Major 與 driver mutex 範圍 + close-channel race 相關**,建議 backend 第 2 輪修;其餘 Minor / Suggestion 不阻擋 M9-3 啟動。**不升級 security agent**。
|
||||||
|
|
||||||
|
## 審查範圍
|
||||||
|
|
||||||
|
| # | 檔案 | 行數 | 性質 |
|
||||||
|
|---|------|------|------|
|
||||||
|
| 1 | `server/internal/firmware/types.go` | 123(新) | 列舉 / Schema |
|
||||||
|
| 2 | `server/internal/firmware/progress.go` | 141(新) | ProgressTracker |
|
||||||
|
| 3 | `server/internal/firmware/service.go` | 346(新) | 核心 service |
|
||||||
|
| 4 | `server/internal/firmware/service_test.go` | 517(新、11 tests) | 單元測試 |
|
||||||
|
| 5 | `server/internal/driver/kneron/kl720_driver.go` | 697→948 | 修改、+251 行 |
|
||||||
|
| 6 | `server/internal/driver/kneron/kl720_driver_test.go` | 177(新、8 tests) | 單元測試 |
|
||||||
|
| 7 | `server/scripts/test_kneron_bridge_firmware.py` | 清 m5/s5、+9/-2 | M9-1 follow-up |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔴 Critical(必修、阻擋 merge)
|
||||||
|
|
||||||
|
**無**。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🟠 Major(強烈建議修、建議第 2 輪修)
|
||||||
|
|
||||||
|
### Major 1 — driver mutex hold 整段升級 + timeout deadlock
|
||||||
|
**檔案**:`kl720_driver.go:860-866 + 875`
|
||||||
|
|
||||||
|
**問題**:sendCommand goroutine 在 `d.mu.Lock() → sendCommand → d.mu.Unlock()` 整段持鎖、sendCommand 內部會 blocking 等 bridge stdout 回應 60-200s。期間:
|
||||||
|
- 其他 method(Info / IsConnected / Disconnect)被卡 60-200s
|
||||||
|
- M9-3 GetActiveTaskInfo / DeviceInfo 會打 `driver.Info()` → 被卡
|
||||||
|
- **timeout 路徑死鎖**:service ctx.Done → driver line 875 `d.mu.Lock()` 等 → 但 sendCommand goroutine 還持有 d.mu → sendCommand goroutine 永遠不回(bridge 卡住、沒人殺它)→ 死鎖
|
||||||
|
|
||||||
|
**建議**:sendCommand goroutine 改為只在 stdin write / stdout read 細粒度 lock、不整段持鎖;或為 firmware 升級開獨立 sub-mutex 避免阻塞 Info。
|
||||||
|
|
||||||
|
### Major 2 — fwProgressCh close-channel race
|
||||||
|
**檔案**:`kl720_driver.go:758-797 + service.go:197-203`
|
||||||
|
|
||||||
|
**問題**:tryRouteFirmwareEvent 取 ch 後 release fwMu 才 send、與 service goroutine close(intermediate) 之間有 race window。
|
||||||
|
- 視窗很窄(一個 LOCK/UNLOCK 之間)
|
||||||
|
- stderr 在 device disconnect / process die 的 inflight event 處理時最容易撞到
|
||||||
|
- **panic: send on closed channel** 可能發生
|
||||||
|
|
||||||
|
**建議修法(任選)**:
|
||||||
|
- A:tryRouteFirmwareEvent 用 `defer recover()` 包 send
|
||||||
|
- B:driver 自己擁有 stderr push ch、UpgradeFirmware return 前先 close own ch + drain 後再 close service intermediate
|
||||||
|
- C:用 atomic.Value 存 ch + close 前先 setFirmwareProgressCh(nil)(已做、但與 close 沒嚴格 happen-before)
|
||||||
|
|
||||||
|
production 跑 1 萬次升級會撞上、建議第 2 輪修。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🟡 Minor(建議修、不阻擋)
|
||||||
|
|
||||||
|
| # | 檔案:行 | 問題 | 建議 |
|
||||||
|
|---|--------|------|------|
|
||||||
|
| Minor 1 | service.go:36-39 | brickRiskReasons 集合少 `ReasonTimeout`(TDD §3.4 第 6 列「>180s KL720」標 BRICK_RISK)| 加註解或加入集合 |
|
||||||
|
| Minor 2 | service_test.go / kl720_driver_test.go | 缺 `ctx.Done` 後 sendCommand goroutine 釋放測試、缺多 device 平行 upgrade 測試 | 補 `TestUpgradeFirmware_CtxCancelReleasesBridge` + `TestUpgradeFirmware_MultiDeviceParallel` |
|
||||||
|
| Minor 3 | kl720_driver.go:830-840 | 早退路徑(pythonReady=false / unsupported chip)defer 仍把 needsReset 設 true、語意不準 | 加 `if !attemptedUpgrade { return }` guard |
|
||||||
|
| Minor 4 | service.go:189-196 | 「結論先寫、推論在前」順序、初讀者要多看一次 | 把結論移到段首 |
|
||||||
|
| Minor 5 | kl720_driver.go:758-797 | fwMu 取 ch 後 release 才 send 的 atomic 缺口(同 Major 2 同源、但跨 session lifecycle 風險)| 與 Major 2 一起處理 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 💡 Suggestion(純改善建議)
|
||||||
|
|
||||||
|
1. **kl720_driver.go:925-937**:sendCommand 成功後補 done event 容錯、會與 stderr 推的 done event 重複。M9-3 wire 到 WS 後前端可能跑兩次 cleanup。建議 service.runUpgrade forward loop 對 `StageDone` 去重、或前端 idempotent guard。
|
||||||
|
2. 補 `TestUpgradeFirmware_MultiDeviceParallel` 3 device 同時 upgrade、驗 ProgressTracker key 不誤匹配。
|
||||||
|
3. `service.go:303-324` ListBundledVersions 加 `os.Stat(chipDir)` lightly check、區分 chip 不支援 vs firmware 漏 build。
|
||||||
|
4. `bridgeFirmwareEvent` struct 與 `firmware.FirmwareProgress` 共用 struct + 雙 JSON tag、避免欄位增刪維護兩處。
|
||||||
|
5. `progress.go:25` `Task.cancel func()` field 目前 service 寫入但沒 reader、加 TODO 或 godoc 註解說明預留 SIGTERM force-cancel。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4 個情境 stage 序列在 Go driver 端的對應
|
||||||
|
|
||||||
|
| 情境 | bridge 行為(M9-1)| Go driver 處理 | 結果 |
|
||||||
|
|------|---------|--------------|------|
|
||||||
|
| KL520 KDP1 + loader | 5 stage:preparing → loading → flashing → verifying → done | stderr scanner 偵 5 line JSON → tryRouteFirmwareEvent → forward intermediate → runUpgrade 補欄位 → task.ProgressCh | ✅ TestUpgradeFirmware_SuccessFiveStages 驗證 |
|
||||||
|
| KL520 KDP2 short-circuit | 4 stage:preparing → flashing → verifying → done(無 loading) | 同上、4 event forward;driver 端對 stage 順序無 hardcode 校驗 | ✅ 不誤判 |
|
||||||
|
| KL720 legacy(無 loader) | 4 stage:preparing → flashing → verifying → done | 同上、4 event forward | ✅ TestKL720KDPLegacy 等效驗證 |
|
||||||
|
| 已 KDP2 / no-op | bridge 端仍走 verify rescan 確認 | done event 推到 progressCh + sendCommand 回 success 後 line 928 補 done event 容錯(雙保險、可能 done 出現 2 次、Suggestion 1)| ✅ 雙保險(但需去重) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## TDD 對齊度評估
|
||||||
|
|
||||||
|
| TDD 章節 | 對應實作 | 對齊狀態 |
|
||||||
|
|---------|---------|---------|
|
||||||
|
| §4.2 FirmwareProgress 9 欄位 | types.go:34-53 | ✅ 完全對齊 |
|
||||||
|
| §4.3 Stage 列舉 6 個 | types.go:56-63 | ✅ |
|
||||||
|
| §5.1 A 階段流程 | service.go + driver.UpgradeFirmware | ✅ |
|
||||||
|
| §5.2 B2 降版流程 | 留 B2、本期 only Upgrade | ✅ Direction enum 兩值都保留 |
|
||||||
|
| §6.1 8 reason enum | types.go:73-82 | ✅ |
|
||||||
|
| §6.1 handler 回傳格式 | bridgeFirmwareEvent struct kl720_driver.go:731-744 | ✅ JSON tag snake_case 對齊 bridge.py |
|
||||||
|
| §8.6.1 HasActiveTask / RequestShutdown / WaitForActiveTasks | service.go:258-293 | ✅ |
|
||||||
|
| §8.6.2 GetActiveTaskInfo(含 EtaSeconds) | service.go:266-268 + progress.go:108-141 | ✅ |
|
||||||
|
| §8.6.3 SIGTERM handler 整合 main.go | 延後 M9-3 | ✅ service 端介面齊備 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Concurrency 評估(重點關注)
|
||||||
|
|
||||||
|
整體 goroutine / mutex / channel 拓樸:
|
||||||
|
|
||||||
|
```
|
||||||
|
service.UpgradeFirmware(caller 同步)
|
||||||
|
└─ taskWg.Add(1)
|
||||||
|
└─ go runUpgrade
|
||||||
|
├─ go drv.UpgradeFirmware(ctx, chip, intermediate)
|
||||||
|
│ └─ go func() sendCommand → resCh
|
||||||
|
│ └─ select ctx.Done / resCh → 推 event 到 intermediate
|
||||||
|
│ └─ return → defer setFirmwareProgressCh(nil)
|
||||||
|
├─ for ev := range intermediate { forward + 補欄位 → task.ProgressCh }
|
||||||
|
├─ <-driverDone(err)
|
||||||
|
├─ if err && lastStage != Done/Error → 補 error event
|
||||||
|
└─ defer close(task.ProgressCh) → markDone → taskWg.Done
|
||||||
|
|
||||||
|
外部 stderr goroutine(在 driver.startPython 內、process-level)
|
||||||
|
└─ scanner.Scan() loop
|
||||||
|
└─ if firmware_progress prefix → tryRouteFirmwareEvent → 取 fwProgressCh → write
|
||||||
|
```
|
||||||
|
|
||||||
|
**Race / deadlock 詳細分析見上方 Major 1 / Major 2。**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## M9-1 follow-up 清理驗證
|
||||||
|
|
||||||
|
| 項目 | 預期 | 實際 | 狀態 |
|
||||||
|
|------|------|------|------|
|
||||||
|
| m5 行 616 死碼刪除 | 砍 `bridge._firmware_upgrade_start_ts` | line 615-617 改為只設 `_firmware_upgrade_in_progress = False` + Reviewer 註解 | ✅ |
|
||||||
|
| m5 行 632 死碼刪除 | 同上 | line 631-636 改為 `start_ts = time.monotonic()` 本地變數 + 註解明示 closure capture | ✅ |
|
||||||
|
| s5 註解 | 第二次 register 的「刻意設計」說明 | line 684-688 註解明確說明 idempotent shape + retest install→unregister | ✅ |
|
||||||
|
|
||||||
|
**正面評價**:M9-1 reviewer 留下的問題清單在 M9-2 確實被執行。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 結論
|
||||||
|
|
||||||
|
- ✅ **通過 with 2 Major + 5 Minor + 5 Suggestion**
|
||||||
|
- ✅ **不阻擋 M9-3 啟動**——Major 1 / 2 是 driver 內部 concurrency 不影響 M9-3(HTTP handler + WebSocket)介面對齊;M9-3 可平行起手、Major 1 / 2 第 2 輪修
|
||||||
|
- ✅ **不需要升級給 security agent**——security 軸無 OWASP / authentication / authorization / injection 等深審需求
|
||||||
|
- ⚠️ **建議 backend 第 2 輪修 Major 1 + Major 2**:mutex 範圍 + close-channel race 是 production 跑萬次後會踩雷的問題、修起來成本不高、值得在 M9-2 收尾前修完
|
||||||
@ -208,7 +208,49 @@
|
|||||||
- TDD §6.1 對齊度維持 98%、M3 forward-compat 對未來 KDP3+ device brick 風險顯著改善
|
- TDD §6.1 對齊度維持 98%、M3 forward-compat 對未來 KDP3+ device brick 風險顯著改善
|
||||||
- 既有 6 handler 零改動驗證通過
|
- 既有 6 handler 零改動驗證通過
|
||||||
- [x] **M9-1 整體完成**(2026-05-25)→ 通過 with Suggestions、可進 M9-2
|
- [x] **M9-1 整體完成**(2026-05-25)→ 通過 with Suggestions、可進 M9-2
|
||||||
- [ ] M9-2 Go driver UpgradeFirmware + firmware/service.go
|
- [x] **M9-2 Go driver + firmware service 完成**(2026-05-25)
|
||||||
|
- `server/internal/firmware/types.go`:新檔 123 行(FirmwareVersion / FirmwareProgress / ActiveTaskInfo / UpgradeDriver interface / 8 reason const)
|
||||||
|
- `server/internal/firmware/progress.go`:新檔 141 行(仿 flash pattern 的 Tracker)
|
||||||
|
- `server/internal/firmware/service.go`:新檔 346 行(核心 service)
|
||||||
|
- `server/internal/firmware/service_test.go`:新檔 517 行、11 tests
|
||||||
|
- `server/internal/driver/kneron/kl720_driver.go`:697 → 948 行(+251、UpgradeFirmware method + helpers + stderr route)
|
||||||
|
- `server/internal/driver/kneron/kl720_driver_test.go`:新檔 177 行、8 tests
|
||||||
|
- `test_kneron_bridge_firmware.py`:清 M9-1 留下的 m5 + s5(+9/-2)
|
||||||
|
- **測試結果**:Go 25/25 pass(1.8s+1.0s)、全 server `go test ./...` 全 pass 無 regression、race-clean、Python 36/36 pass
|
||||||
|
- **TDD 對齊**:§4.2 / §4.3 / §5.1 / §6 / §6.1 / §8.6 / §3.4 全到位
|
||||||
|
- **2 個 Deviation**(合理):
|
||||||
|
- `guards.go` / `versions.go` 未建(B2 階段 M9-11 才需要、A 階段 service.go 已內嵌最簡實作)
|
||||||
|
- SIGTERM handler 串接 main.go 留 M9-3(避免大改既有 shutdown flow)
|
||||||
|
- **給 M9-3 的銜接 API**:`taskID, progressCh, err := svc.UpgradeFirmware(ctx, deviceID, chip)` + WebSocket room `firmware:<deviceID>` 廣播 + `HasActiveTask()` / `GetActiveTaskInfo()` 給 control panel
|
||||||
|
- **M9-1 follow-up 順手清**:m5 + s5 全做
|
||||||
|
- [x] **M9-2 Reviewer 第 1 輪完成**(2026-05-25)→ `.autoflow/05-implementation/review/m9-2-go-driver-firmware-service-review.md`
|
||||||
|
- 結論:**0 Critical / 2 Major / 5 Minor / 5 Suggestion、不阻擋 M9-3、不升 security**
|
||||||
|
- **Major 1**:sendCommand goroutine 持 d.mu 整段升級 60-200s + timeout deadlock(kl720_driver.go:860-866+875)
|
||||||
|
- **Major 2**:fwProgressCh close-channel race(kl720_driver.go:758-797 + service.go:197-203、production 跑萬次會踩 panic)
|
||||||
|
- 5 Minor(brickRiskReasons / 補測試 / needsReset 早退 / 註解順序 / fwMu atomic 缺口)
|
||||||
|
- 5 Suggestion(done event 去重 / multi-device 測試 / ListBundledVersions Stat / struct 共用 / cancel field godoc)
|
||||||
|
- M9-1 follow-up(m5 + s5)正面驗證通過
|
||||||
|
- **建議 backend 第 2 輪修 Major 1+2**(production race、修起來成本不高)
|
||||||
|
- [x] **M9-2 Backend 第 2 輪修改完成**(2026-05-25)
|
||||||
|
- `kl720_driver.go`:948 → 1054(+106、新 sendCommandForUpgrade 60 行 snapshot pattern + fwUpgradeMu)
|
||||||
|
- `kl720_driver_test.go`:177 → 360(+183、3 新 test:InfoNotBlockedDuringUpgrade / CtxCancelReleasesBridge / StderrEventAfterCtxCancel 100 round stress)
|
||||||
|
- `service.go`:346 → 373(+27)
|
||||||
|
- `service_test.go`:517 → 676(+159、2 新 test + ListBundledVersions 改 tempdir)
|
||||||
|
- `progress.go`:141 → 147(+6、Task.cancel godoc)
|
||||||
|
- **Major 1 修法**:方案 B 變體(fwUpgradeMu 獨立鎖 + `sendCommandForUpgrade()` snapshot stdin/stdout pattern、避開 d.mu field-level race)
|
||||||
|
- **Major 2 修法**:方案 A 變體(tryRouteFirmwareEvent 持 fwMu 整段、配合 driver defer setFirmwareProgressCh(nil) 提供 happen-before 保證)
|
||||||
|
- Minor 1-5 + Suggestion 1/2/3/5 全修
|
||||||
|
- **留 follow-up**:Suggestion 4(bridgeFirmwareEvent / FirmwareProgress struct 合併、可在 M9-3 wire WS 時順手做)
|
||||||
|
- 測試:go test ./... -race -count=1 全綠(28s)/ Python 36 tests + 22 subtests 全綠(0.31s)/ go vet / build 0 output / 0 regression
|
||||||
|
- [x] **M9-2 Reviewer 第 2 輪通過**(2026-05-25)→ `.autoflow/05-implementation/review/m9-2-go-driver-firmware-service-review-round2.md`
|
||||||
|
- 結論:**0 Critical / 0 Major / 2 Minor / 2 Suggestion、不阻擋 M9-3、不需 backend 第 3 輪**
|
||||||
|
- 第 1 輪 11/12 issue 全修到位(Suggestion 4 backend 明示留 follow-up)
|
||||||
|
- **Major 1 snapshot pattern 驗證**:stdin/stdout snapshot 後 release d.mu、ctx.Done 路徑自然 return、無 panic 無 leak、lock 順序單向(fwUpgradeMu → d.mu)無反序 deadlock
|
||||||
|
- **Major 2 happen-before 驗證**:fwMu 保證 inflight tryRouteFirmwareEvent 必定 either「set nil 前完成 send」or「set nil 後讀到 ch==nil return false」、絕無 send on closed channel panic
|
||||||
|
- Minor R-1(attemptedUpgrade 位置)驗證後撤回降為 Suggestion R-3
|
||||||
|
- Minor R-2(Disconnect/stopPython 沒主動清 fwProgressCh、純防禦性、目前無 bug)
|
||||||
|
- **CI 建議**:M9-3 PR 加 `go test -race -count=3` 提升 race detection 強度
|
||||||
|
- [x] **M9-2 整體完成**(2026-05-25)→ 通過、可進 M9-3
|
||||||
- [ ] M9-3 API handler + WebSocket progress
|
- [ ] M9-3 API handler + WebSocket progress
|
||||||
- [ ] M9-4 Frontend FW badge + 升級 modal
|
- [ ] M9-4 Frontend FW badge + 升級 modal
|
||||||
- [ ] M9-5 三平台實機驗證
|
- [ ] M9-5 三平台實機驗證
|
||||||
|
|||||||
@ -2,6 +2,7 @@ package kneron
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"bufio"
|
"bufio"
|
||||||
|
"context"
|
||||||
"encoding/base64"
|
"encoding/base64"
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
"fmt"
|
"fmt"
|
||||||
@ -15,6 +16,7 @@ import (
|
|||||||
"time"
|
"time"
|
||||||
|
|
||||||
"visiona-local/server/internal/driver"
|
"visiona-local/server/internal/driver"
|
||||||
|
"visiona-local/server/internal/firmware"
|
||||||
"visiona-local/server/pkg/logger"
|
"visiona-local/server/pkg/logger"
|
||||||
)
|
)
|
||||||
|
|
||||||
@ -41,6 +43,28 @@ type KneronDriver struct {
|
|||||||
pythonReady bool
|
pythonReady bool
|
||||||
logBroadcaster *logger.Broadcaster
|
logBroadcaster *logger.Broadcaster
|
||||||
needsReset bool // true on first connect after server start to clear stale models
|
needsReset bool // true on first connect after server start to clear stale models
|
||||||
|
|
||||||
|
// fwUpgradeMu 是 firmware 升級期間 sendCommand 專用的細粒度 mutex。
|
||||||
|
//
|
||||||
|
// Major 1 修法:升級期間 sendCommand goroutine 可能 blocking 等 bridge stdout
|
||||||
|
// 60-200s、若整段持 d.mu 會卡住 Info / IsConnected / Disconnect 等 method、
|
||||||
|
// 並造成 ctx.Done 路徑 deadlock(service ctx.Done → driver 想 d.mu.Lock 殺
|
||||||
|
// bridge → 但 sendCommand goroutine 還持 d.mu)。
|
||||||
|
//
|
||||||
|
// 解法:升級期間 sendCommand 改持 fwUpgradeMu、不持 d.mu;其他 method 仍
|
||||||
|
// 走 d.mu、彼此不互卡。ctx.Done 路徑也走 d.mu、可以順利 stopPython。
|
||||||
|
fwUpgradeMu sync.Mutex
|
||||||
|
|
||||||
|
// firmware 升降版進行中專用的 progress channel。stderr goroutine 偵測到
|
||||||
|
// `{"event":"firmware_progress",...}` JSON line 時、會 route 到這個 channel。
|
||||||
|
// nil 時 stderr goroutine 走預設 broadcaster 路徑。
|
||||||
|
//
|
||||||
|
// Major 2 修法:fwMu 在 tryRouteFirmwareEvent 內全程持有(檢查 + send 不可
|
||||||
|
// 分開)、保證與 setFirmwareProgressCh(nil) 互斥;service 端在 close 之前
|
||||||
|
// 透過 driver 的 defer setFirmwareProgressCh(nil) + fwMu happen-before 保證
|
||||||
|
// close(intermediate) 後不會再有 inflight send。
|
||||||
|
fwMu sync.Mutex
|
||||||
|
fwProgressCh chan<- firmware.FirmwareProgress
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewKneronDriver creates a new KneronDriver with the given device info and
|
// NewKneronDriver creates a new KneronDriver with the given device info and
|
||||||
@ -147,10 +171,24 @@ func (d *KneronDriver) startPython() error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Forward bridge stderr line-by-line to os.Stderr + broadcaster.
|
// Forward bridge stderr line-by-line to os.Stderr + broadcaster.
|
||||||
|
// 同時截 firmware_progress JSON event 給目前的 firmware upgrade goroutine(如有)。
|
||||||
go func() {
|
go func() {
|
||||||
scanner := bufio.NewScanner(stderrPipe)
|
scanner := bufio.NewScanner(stderrPipe)
|
||||||
|
// stderr 行可能很長(traceback、JSON event);保險開到 1MB。
|
||||||
|
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
|
||||||
for scanner.Scan() {
|
for scanner.Scan() {
|
||||||
line := scanner.Text()
|
line := scanner.Text()
|
||||||
|
// 嘗試解析為 firmware_progress event(成本:strings.HasPrefix
|
||||||
|
// 短路、JSON.Unmarshal 只在前綴 match 時跑)
|
||||||
|
if strings.HasPrefix(line, `{"event":"firmware_progress"`) ||
|
||||||
|
strings.Contains(line, `"event": "firmware_progress"`) {
|
||||||
|
if d.tryRouteFirmwareEvent(line) {
|
||||||
|
// 已 route 給 firmware ch;同時還是寫一份到 os.Stderr,
|
||||||
|
// 方便終端使用者看見 progress,但不灌進 broadcaster 避免噪音
|
||||||
|
fmt.Fprintln(os.Stderr, line)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
}
|
||||||
fmt.Fprintln(os.Stderr, line)
|
fmt.Fprintln(os.Stderr, line)
|
||||||
if d.logBroadcaster != nil {
|
if d.logBroadcaster != nil {
|
||||||
d.logBroadcaster.Push("DEBUG", line)
|
d.logBroadcaster.Push("DEBUG", line)
|
||||||
@ -183,6 +221,9 @@ func (d *KneronDriver) startPython() error {
|
|||||||
|
|
||||||
// sendCommand sends a JSON command to the Python subprocess and returns
|
// sendCommand sends a JSON command to the Python subprocess and returns
|
||||||
// the parsed JSON response.
|
// the parsed JSON response.
|
||||||
|
//
|
||||||
|
// 呼叫者必須持有 d.mu(保證 d.stdin / d.stdout / d.pythonReady 一致)。
|
||||||
|
// firmware upgrade 場景因為要避開長期持 d.mu、走 sendCommandUnlocked。
|
||||||
func (d *KneronDriver) sendCommand(cmd map[string]interface{}) (map[string]interface{}, error) {
|
func (d *KneronDriver) sendCommand(cmd map[string]interface{}) (map[string]interface{}, error) {
|
||||||
if !d.pythonReady {
|
if !d.pythonReady {
|
||||||
return nil, fmt.Errorf("python bridge is not running")
|
return nil, fmt.Errorf("python bridge is not running")
|
||||||
@ -219,6 +260,59 @@ func (d *KneronDriver) sendCommand(cmd map[string]interface{}) (map[string]inter
|
|||||||
return resp, nil
|
return resp, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// sendCommandForUpgrade 是 firmware upgrade 專用的 sendCommand 變形。
|
||||||
|
//
|
||||||
|
// Major 1 修法配套:升級期間 d.mu 不能長持(會卡 Info / IsConnected),但
|
||||||
|
// sendCommand 直接讀 d.stdin / d.stdout / d.pythonReady 三個 field 會與
|
||||||
|
// stopPython(ctx.Done 路徑下持 d.mu 寫這些 field)產生 race。
|
||||||
|
//
|
||||||
|
// 解法:本函式在持 d.mu 期間 snapshot stdin / stdout / pythonReady 到 local
|
||||||
|
// var、release d.mu 後在 ref 上做 I/O。stopPython 之後即使把 d.stdin = nil
|
||||||
|
// 也只影響 future 呼叫;本次拿到的 stdin / stdout ref 仍可 I/O;process kill
|
||||||
|
// 後 stdout pipe EOF、Scan 自然 fail return error。
|
||||||
|
func (d *KneronDriver) sendCommandForUpgrade(cmd map[string]interface{}) (map[string]interface{}, error) {
|
||||||
|
// snapshot 階段持 d.mu 確保三個 field 一致;只持很短的時間。
|
||||||
|
d.mu.Lock()
|
||||||
|
if !d.pythonReady {
|
||||||
|
d.mu.Unlock()
|
||||||
|
return nil, fmt.Errorf("python bridge is not running")
|
||||||
|
}
|
||||||
|
stdin := d.stdin
|
||||||
|
stdout := d.stdout
|
||||||
|
d.mu.Unlock()
|
||||||
|
|
||||||
|
if stdin == nil || stdout == nil {
|
||||||
|
return nil, fmt.Errorf("python bridge stdin/stdout not initialized")
|
||||||
|
}
|
||||||
|
|
||||||
|
data, err := json.Marshal(cmd)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to marshal command: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if _, err := fmt.Fprintf(stdin, "%s\n", data); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to write to python bridge: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if !stdout.Scan() {
|
||||||
|
if err := stdout.Err(); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to read from python bridge: %w", err)
|
||||||
|
}
|
||||||
|
return nil, fmt.Errorf("python bridge closed unexpectedly")
|
||||||
|
}
|
||||||
|
|
||||||
|
var resp map[string]interface{}
|
||||||
|
if err := json.Unmarshal([]byte(stdout.Text()), &resp); err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to parse python response: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
if errMsg, ok := resp["error"].(string); ok {
|
||||||
|
return nil, fmt.Errorf("python bridge error: %s", errMsg)
|
||||||
|
}
|
||||||
|
|
||||||
|
return resp, nil
|
||||||
|
}
|
||||||
|
|
||||||
// stopPython kills the Python subprocess and cleans up resources.
|
// stopPython kills the Python subprocess and cleans up resources.
|
||||||
func (d *KneronDriver) stopPython() {
|
func (d *KneronDriver) stopPython() {
|
||||||
d.pythonReady = false
|
d.pythonReady = false
|
||||||
@ -695,3 +789,266 @@ func (d *KneronDriver) GetModelInfo() (*driver.ModelInfo, error) {
|
|||||||
LoadedAt: time.Now(),
|
LoadedAt: time.Now(),
|
||||||
}, nil
|
}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// Firmware upgrade (M9-2、A 階段 KDP1 → KDP2)
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// bridgeFirmwareEvent 對應 bridge.py `_fw_emit_progress` 推到 stderr
|
||||||
|
// 的 JSON schema(kneron_bridge.py:1263 / TDD §4.2)。
|
||||||
|
//
|
||||||
|
// 欄位 snake_case 與 bridge.py 一致;Go 端轉成 firmware.FirmwareProgress
|
||||||
|
//(camelCase)後再走 WebSocket。
|
||||||
|
type bridgeFirmwareEvent struct {
|
||||||
|
Event string `json:"event"`
|
||||||
|
Percent int `json:"percent"`
|
||||||
|
Stage string `json:"stage"`
|
||||||
|
Message string `json:"message"`
|
||||||
|
ElapsedMs int64 `json:"elapsed_ms"`
|
||||||
|
EtaMs int64 `json:"eta_ms"`
|
||||||
|
Error string `json:"error,omitempty"`
|
||||||
|
Reason string `json:"reason,omitempty"`
|
||||||
|
RawError string `json:"raw_error,omitempty"`
|
||||||
|
BeforeVersion string `json:"before_version,omitempty"`
|
||||||
|
AfterVersion string `json:"after_version,omitempty"`
|
||||||
|
Method string `json:"method,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// setFirmwareProgressCh 註冊 / 解除 firmware progress 路由 channel。
|
||||||
|
//
|
||||||
|
// nil 代表解除(回到「全部 stderr 走 broadcaster」的預設模式)。
|
||||||
|
func (d *KneronDriver) setFirmwareProgressCh(ch chan<- firmware.FirmwareProgress) {
|
||||||
|
d.fwMu.Lock()
|
||||||
|
d.fwProgressCh = ch
|
||||||
|
d.fwMu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
// tryRouteFirmwareEvent 嘗試把一行 stderr JSON 解析成 firmware progress event
|
||||||
|
// 並寫入目前註冊的 fwProgressCh。失敗(無 channel / JSON 不合 / event 不對)
|
||||||
|
// 回 false、caller 應走預設 broadcaster 路徑。
|
||||||
|
//
|
||||||
|
// Major 2 修法(close-channel race):fwMu 在整段「檢查 ch + send」期間都持
|
||||||
|
// 有、不再「取出 ref 後 release 才 send」。配合 setFirmwareProgressCh(nil)
|
||||||
|
// 取得同一把 fwMu、保證以下時序:
|
||||||
|
//
|
||||||
|
// 1. driver UpgradeFirmware return → defer setFirmwareProgressCh(nil)
|
||||||
|
// 〔happen-before 透過 fwMu〕
|
||||||
|
// 2. service goroutine 收到 driverDone → close(intermediate)
|
||||||
|
//
|
||||||
|
// 步驟 1 完成後、任何後續 tryRouteFirmwareEvent 取到的 ch 都是 nil、直接
|
||||||
|
// return false、不會 send 到 close 的 channel;inflight 的 tryRoute call
|
||||||
|
// 也因持鎖、setFirmwareProgressCh(nil) 會等到它做完。
|
||||||
|
func (d *KneronDriver) tryRouteFirmwareEvent(line string) bool {
|
||||||
|
// 先嘗試 parse、避免持鎖期間做 CPU-bound JSON 解析。
|
||||||
|
var ev bridgeFirmwareEvent
|
||||||
|
if err := json.Unmarshal([]byte(line), &ev); err != nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
if ev.Event != "firmware_progress" {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
fp := firmware.FirmwareProgress{
|
||||||
|
Stage: ev.Stage,
|
||||||
|
Percent: ev.Percent,
|
||||||
|
Message: ev.Message,
|
||||||
|
ElapsedMs: ev.ElapsedMs,
|
||||||
|
EtaMs: ev.EtaMs,
|
||||||
|
Error: ev.Error,
|
||||||
|
Reason: ev.Reason,
|
||||||
|
RawError: ev.RawError,
|
||||||
|
BeforeVersion: ev.BeforeVersion,
|
||||||
|
AfterVersion: ev.AfterVersion,
|
||||||
|
Method: ev.Method,
|
||||||
|
}
|
||||||
|
|
||||||
|
// fwMu 持鎖期間做「檢查 nil + send」、與 setFirmwareProgressCh(nil) 互斥。
|
||||||
|
// 非阻塞 send;channel 滿時寧可丟單一 event 也不害住 stderr goroutine、
|
||||||
|
// 也不長時間持 fwMu。
|
||||||
|
d.fwMu.Lock()
|
||||||
|
defer d.fwMu.Unlock()
|
||||||
|
ch := d.fwProgressCh
|
||||||
|
if ch == nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
select {
|
||||||
|
case ch <- fp:
|
||||||
|
default:
|
||||||
|
// 落到 broadcaster 走 DEBUG 線、debug 用
|
||||||
|
if d.logBroadcaster != nil {
|
||||||
|
d.logBroadcaster.Push("WARN", "[kneron] firmware progress channel full, dropping event: "+line)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
|
||||||
|
// UpgradeFirmware 觸發 bridge.py firmware_upgrade handler、收集 stderr 上來
|
||||||
|
// 的 firmware_progress events 寫到 progressCh、回終態 error。
|
||||||
|
//
|
||||||
|
// 實作 firmware.UpgradeDriver interface。
|
||||||
|
//
|
||||||
|
// 流程(對應 TDD §5.1):
|
||||||
|
// 1. lock check:須已 connected(pythonReady);chip 須 supported
|
||||||
|
// 2. 設定 fwProgressCh、stderr goroutine 開始 route firmware events
|
||||||
|
// 3. 把 firmware_upgrade JSON command 寫到 bridge stdin
|
||||||
|
// 4. spawn goroutine 等 bridge stdout 回應(用 sendCommand 拉 stdout)
|
||||||
|
// 5. select:ctx.Done()(service timeout)→ stopPython 強殺 bridge;
|
||||||
|
// stdout 回應 → 解析成功 / 失敗
|
||||||
|
// 6. 推終態 event(done / error)給 progressCh
|
||||||
|
// 7. 解除 fwProgressCh、回 error
|
||||||
|
//
|
||||||
|
// 注意:升級期間 driver 的 d.mu 不持續 lock(sendCommand 內已自己處理);
|
||||||
|
// 但仍需要 d.pythonReady = true(已 connected)才能呼叫、否則 caller 應
|
||||||
|
// 先 Connect()。
|
||||||
|
func (d *KneronDriver) UpgradeFirmware(
|
||||||
|
ctx context.Context,
|
||||||
|
chip string,
|
||||||
|
progressCh chan<- firmware.FirmwareProgress,
|
||||||
|
) error {
|
||||||
|
d.mu.Lock()
|
||||||
|
pythonReady := d.pythonReady
|
||||||
|
port := d.info.Port
|
||||||
|
d.info.Status = "upgrading"
|
||||||
|
d.mu.Unlock()
|
||||||
|
|
||||||
|
// Minor 3 修法:只有真進入 sendCommand 後、才把 needsReset 設 true(避免
|
||||||
|
// 早退路徑[pythonReady=false / unsupported chip]誤標 needsReset)。
|
||||||
|
var attemptedUpgrade bool
|
||||||
|
defer func() {
|
||||||
|
d.mu.Lock()
|
||||||
|
// 升級結束(不論成功失敗)→ Status 回 Connected、由上層決定是否 rescan
|
||||||
|
if d.connected {
|
||||||
|
d.info.Status = driver.StatusConnected
|
||||||
|
}
|
||||||
|
// 對應 TDD §8.5「needsReset 重用」:真做過升級的、設 needsReset=true
|
||||||
|
// 讓下次 connect 走完整 reset、避開 Error 15。
|
||||||
|
if attemptedUpgrade {
|
||||||
|
d.needsReset = true
|
||||||
|
}
|
||||||
|
d.mu.Unlock()
|
||||||
|
}()
|
||||||
|
|
||||||
|
if !pythonReady {
|
||||||
|
return fmt.Errorf("python bridge not running; device must be connected first")
|
||||||
|
}
|
||||||
|
if !firmware.SupportedUpgradeChip(chip) {
|
||||||
|
return fmt.Errorf("unsupported chip for upgrade: %q (A 階段限 KL520/KL720)", chip)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Step 1: register progress routing
|
||||||
|
d.setFirmwareProgressCh(progressCh)
|
||||||
|
defer d.setFirmwareProgressCh(nil)
|
||||||
|
|
||||||
|
// Step 2: spawn sendCommand in goroutine so we can race with ctx.Done()
|
||||||
|
//
|
||||||
|
// Major 1 修法:sendCommand goroutine 用 fwUpgradeMu 序列化、不持 d.mu。
|
||||||
|
// 升級期間可能 blocking 60-200s 等 bridge stdout、若持 d.mu 會卡住
|
||||||
|
// Info / IsConnected / Disconnect、並造成 ctx.Done 路徑 deadlock(service
|
||||||
|
// timeout → driver 想 d.mu.Lock 殺 bridge → sendCommand goroutine 還持
|
||||||
|
// d.mu → 永遠死鎖)。fwUpgradeMu 與 d.mu 完全分離、避免互卡。
|
||||||
|
//
|
||||||
|
// 注意:仍須在進 sendCommand 前確認 fwUpgradeMu 取得、避免同一 driver 上
|
||||||
|
// 兩個升級 race(不過 service 端 tracker 已防同 device 重複 task、這是
|
||||||
|
// 雙保險)。
|
||||||
|
attemptedUpgrade = true
|
||||||
|
type result struct {
|
||||||
|
resp map[string]interface{}
|
||||||
|
err error
|
||||||
|
}
|
||||||
|
resCh := make(chan result, 1)
|
||||||
|
go func() {
|
||||||
|
d.fwUpgradeMu.Lock()
|
||||||
|
// 用 sendCommandForUpgrade:snapshot d.stdin / d.stdout 後 release d.mu、
|
||||||
|
// 避免長期持 d.mu 卡住 Info / IsConnected,並避免 d.stdin / d.stdout 與
|
||||||
|
// stopPython(ctx.Done 路徑)的 field-level race。
|
||||||
|
resp, err := d.sendCommandForUpgrade(map[string]interface{}{
|
||||||
|
"cmd": "firmware_upgrade",
|
||||||
|
"port": port,
|
||||||
|
"chip": chip,
|
||||||
|
})
|
||||||
|
d.fwUpgradeMu.Unlock()
|
||||||
|
resCh <- result{resp, err}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Step 3: race ctx vs sendCommand
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
// service 端 timeout 或被 cancel;強制 kill bridge 讓 sendCommand goroutine 結束。
|
||||||
|
// 注意:因為 sendCommand goroutine 不持 d.mu(只持 fwUpgradeMu)、這裡
|
||||||
|
// d.mu.Lock 不會 deadlock。stopPython 會 kill bridge process、讓 stdout
|
||||||
|
// scanner 拿到 EOF、sendCommand 從 Scan 中返回 error、goroutine 結束。
|
||||||
|
d.driverLog("WARN", "[kneron] firmware_upgrade context done (%v), killing bridge to release sendCommand", ctx.Err())
|
||||||
|
d.mu.Lock()
|
||||||
|
d.stopPython()
|
||||||
|
d.connected = false
|
||||||
|
d.mu.Unlock()
|
||||||
|
|
||||||
|
// 推 timeout event(讓上層收到終態)
|
||||||
|
ev := firmware.FirmwareProgress{
|
||||||
|
Stage: firmware.StageError,
|
||||||
|
Percent: -1,
|
||||||
|
Error: fmt.Sprintf("upgrade context done: %v", ctx.Err()),
|
||||||
|
Reason: firmware.ReasonTimeout,
|
||||||
|
RawError: ctx.Err().Error(),
|
||||||
|
}
|
||||||
|
// 走非阻塞、避免 service 端已不收
|
||||||
|
select {
|
||||||
|
case progressCh <- ev:
|
||||||
|
default:
|
||||||
|
}
|
||||||
|
// drain sendCommand goroutine(避免 leak)
|
||||||
|
go func() { <-resCh }()
|
||||||
|
return fmt.Errorf("firmware_upgrade timeout / cancel: %w", ctx.Err())
|
||||||
|
|
||||||
|
case r := <-resCh:
|
||||||
|
if r.err != nil {
|
||||||
|
// bridge 回 error JSON 或 stdout 中斷;推 error event 給 caller
|
||||||
|
// 注意:bridge 端的 stage="error" event 通常已透過 stderr 推過了、
|
||||||
|
// 這裡是 fallback safety net(避免 channel 終態事件遺失)
|
||||||
|
d.driverLog("ERROR", "[kneron] firmware_upgrade bridge error: %v", r.err)
|
||||||
|
select {
|
||||||
|
case progressCh <- firmware.FirmwareProgress{
|
||||||
|
Stage: firmware.StageError,
|
||||||
|
Percent: -1,
|
||||||
|
Error: r.err.Error(),
|
||||||
|
Reason: firmware.ReasonUpgradeMidFailed,
|
||||||
|
RawError: r.err.Error(),
|
||||||
|
}:
|
||||||
|
default:
|
||||||
|
}
|
||||||
|
return fmt.Errorf("firmware_upgrade failed: %w", r.err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 成功路徑:bridge 回 `{"status":"upgraded", "before_firmware":..., ...}`。
|
||||||
|
// stderr goroutine 應該已推過 done event;這裡補充 / 容錯:若 status
|
||||||
|
// 是 upgraded 但沒看到 stage=done event、補一個 done event 給 caller
|
||||||
|
// 確保終態被觀察到。
|
||||||
|
afterFw, _ := r.resp["after_firmware"].(string)
|
||||||
|
beforeFw, _ := r.resp["before_firmware"].(string)
|
||||||
|
method, _ := r.resp["method"].(string)
|
||||||
|
duration, _ := r.resp["duration_ms"].(float64) // JSON number
|
||||||
|
|
||||||
|
// 確保 caller 一定能看到一個 done event(即便 stderr 沒推到、或事件
|
||||||
|
// 已被吃掉)。重複 done 對 service 也無害(service forward 是冪等的)。
|
||||||
|
select {
|
||||||
|
case progressCh <- firmware.FirmwareProgress{
|
||||||
|
Stage: firmware.StageDone,
|
||||||
|
Percent: 100,
|
||||||
|
BeforeVersion: beforeFw,
|
||||||
|
AfterVersion: afterFw,
|
||||||
|
Method: method,
|
||||||
|
ElapsedMs: int64(duration),
|
||||||
|
}:
|
||||||
|
default:
|
||||||
|
}
|
||||||
|
|
||||||
|
// 更新 driver Info 的 firmware 字串
|
||||||
|
if afterFw != "" {
|
||||||
|
d.mu.Lock()
|
||||||
|
d.info.FirmwareVer = afterFw
|
||||||
|
d.mu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
360
local-tool/server/internal/driver/kneron/kl720_driver_test.go
Normal file
360
local-tool/server/internal/driver/kneron/kl720_driver_test.go
Normal file
@ -0,0 +1,360 @@
|
|||||||
|
package kneron
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bufio"
|
||||||
|
"context"
|
||||||
|
"io"
|
||||||
|
"sync"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"visiona-local/server/internal/driver"
|
||||||
|
"visiona-local/server/internal/firmware"
|
||||||
|
)
|
||||||
|
|
||||||
|
// testCtx 回傳 test 用的 ctx with 5 秒 timeout,加自動 cleanup。
|
||||||
|
func testCtx(t *testing.T) context.Context {
|
||||||
|
t.Helper()
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||||
|
t.Cleanup(cancel)
|
||||||
|
return ctx
|
||||||
|
}
|
||||||
|
|
||||||
|
// shortTimeout 回傳一個 500ms 的 timer channel、供 test 偵測非阻塞行為。
|
||||||
|
func shortTimeout() <-chan time.Time {
|
||||||
|
return time.After(500 * time.Millisecond)
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTryRouteFirmwareEvent_ValidProgress:valid firmware_progress JSON line
|
||||||
|
// 會被 unmarshal 並寫到 fwProgressCh。
|
||||||
|
func TestTryRouteFirmwareEvent_ValidProgress(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test"}, "/dev/null")
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 4)
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
|
||||||
|
line := `{"event":"firmware_progress","percent":50,"stage":"flashing","message":"writing KDP2","elapsed_ms":1500,"eta_ms":3000}`
|
||||||
|
if !d.tryRouteFirmwareEvent(line) {
|
||||||
|
t.Fatalf("expected tryRouteFirmwareEvent=true")
|
||||||
|
}
|
||||||
|
|
||||||
|
select {
|
||||||
|
case ev := <-ch:
|
||||||
|
if ev.Stage != firmware.StageFlashing {
|
||||||
|
t.Errorf("Stage = %q, want flashing", ev.Stage)
|
||||||
|
}
|
||||||
|
if ev.Percent != 50 {
|
||||||
|
t.Errorf("Percent = %d, want 50", ev.Percent)
|
||||||
|
}
|
||||||
|
if ev.Message != "writing KDP2" {
|
||||||
|
t.Errorf("Message = %q", ev.Message)
|
||||||
|
}
|
||||||
|
if ev.ElapsedMs != 1500 {
|
||||||
|
t.Errorf("ElapsedMs = %d, want 1500", ev.ElapsedMs)
|
||||||
|
}
|
||||||
|
if ev.EtaMs != 3000 {
|
||||||
|
t.Errorf("EtaMs = %d, want 3000", ev.EtaMs)
|
||||||
|
}
|
||||||
|
default:
|
||||||
|
t.Fatalf("no event on channel")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTryRouteFirmwareEvent_ErrorEvent:error event 帶 reason / raw_error /
|
||||||
|
// before_version 都要保留。
|
||||||
|
func TestTryRouteFirmwareEvent_ErrorEvent(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test"}, "/dev/null")
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 4)
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
|
||||||
|
line := `{"event":"firmware_progress","percent":-1,"stage":"error","message":"loader missing","elapsed_ms":800,"eta_ms":0,"error":"loader missing","reason":"loader_write_failed","raw_error":"_FwError loader","before_version":"KDP"}`
|
||||||
|
if !d.tryRouteFirmwareEvent(line) {
|
||||||
|
t.Fatalf("expected tryRouteFirmwareEvent=true")
|
||||||
|
}
|
||||||
|
|
||||||
|
ev := <-ch
|
||||||
|
if ev.Stage != firmware.StageError {
|
||||||
|
t.Errorf("Stage = %q, want error", ev.Stage)
|
||||||
|
}
|
||||||
|
if ev.Reason != firmware.ReasonLoaderWriteFailed {
|
||||||
|
t.Errorf("Reason = %q, want loader_write_failed", ev.Reason)
|
||||||
|
}
|
||||||
|
if ev.RawError != "_FwError loader" {
|
||||||
|
t.Errorf("RawError = %q", ev.RawError)
|
||||||
|
}
|
||||||
|
if ev.BeforeVersion != "KDP" {
|
||||||
|
t.Errorf("BeforeVersion = %q, want KDP", ev.BeforeVersion)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTryRouteFirmwareEvent_NoChannel:channel 未註冊時 return false、
|
||||||
|
// caller 應 fall back 到 broadcaster。
|
||||||
|
func TestTryRouteFirmwareEvent_NoChannel(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test"}, "/dev/null")
|
||||||
|
// 不註冊 channel
|
||||||
|
if d.tryRouteFirmwareEvent(`{"event":"firmware_progress","stage":"done","percent":100}`) {
|
||||||
|
t.Errorf("expected false when no channel registered")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTryRouteFirmwareEvent_NonFirmwareEvent:非 firmware_progress event 不 route。
|
||||||
|
func TestTryRouteFirmwareEvent_NonFirmwareEvent(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test"}, "/dev/null")
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 4)
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
|
||||||
|
// shutdown_rejected event 不 route
|
||||||
|
if d.tryRouteFirmwareEvent(`{"event":"shutdown_rejected","reason":"firmware_upgrade_in_progress"}`) {
|
||||||
|
t.Errorf("expected false for non-firmware event")
|
||||||
|
}
|
||||||
|
// 亂 JSON
|
||||||
|
if d.tryRouteFirmwareEvent(`{not json}`) {
|
||||||
|
t.Errorf("expected false for malformed JSON")
|
||||||
|
}
|
||||||
|
// 完全不是 JSON
|
||||||
|
if d.tryRouteFirmwareEvent(`[kneron_bridge] regular log line`) {
|
||||||
|
t.Errorf("expected false for plain log line")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTryRouteFirmwareEvent_ChannelFull:channel 滿 → 不 block、route 仍回 true。
|
||||||
|
func TestTryRouteFirmwareEvent_ChannelFull(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test"}, "/dev/null")
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 1)
|
||||||
|
// 先塞滿
|
||||||
|
ch <- firmware.FirmwareProgress{Stage: "x"}
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
|
||||||
|
line := `{"event":"firmware_progress","stage":"flashing","percent":50}`
|
||||||
|
// 應該回 true(route 函式視為已嘗試處理)、但不 block test
|
||||||
|
done := make(chan bool, 1)
|
||||||
|
go func() {
|
||||||
|
ok := d.tryRouteFirmwareEvent(line)
|
||||||
|
done <- ok
|
||||||
|
}()
|
||||||
|
// 不該卡住
|
||||||
|
select {
|
||||||
|
case ok := <-done:
|
||||||
|
if !ok {
|
||||||
|
t.Errorf("expected true (event was a firmware_progress event)")
|
||||||
|
}
|
||||||
|
case <-shortTimeout():
|
||||||
|
t.Fatalf("tryRouteFirmwareEvent should not block when channel is full")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestSetFirmwareProgressCh_Unregister:設 nil 後恢復原狀。
|
||||||
|
func TestSetFirmwareProgressCh_Unregister(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test"}, "/dev/null")
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 1)
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
d.setFirmwareProgressCh(nil)
|
||||||
|
|
||||||
|
if d.tryRouteFirmwareEvent(`{"event":"firmware_progress","stage":"done","percent":100}`) {
|
||||||
|
t.Errorf("expected false after unregister")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestUpgradeFirmware_UnsupportedChip:A 階段限 KL520 / KL720。
|
||||||
|
func TestUpgradeFirmware_UnsupportedChip(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test", Type: "KL720"}, "/dev/null")
|
||||||
|
// 強制 pythonReady=true、避開 "bridge not running" 早退(測 chip check)
|
||||||
|
d.pythonReady = true
|
||||||
|
defer func() { d.pythonReady = false }()
|
||||||
|
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 4)
|
||||||
|
err := d.UpgradeFirmware(testCtx(t), "KL630", ch)
|
||||||
|
if err == nil {
|
||||||
|
t.Fatalf("expected error for unsupported chip")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestUpgradeFirmware_NoPythonBridge:未 Connect 時 UpgradeFirmware 應拒絕。
|
||||||
|
func TestUpgradeFirmware_NoPythonBridge(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "test", Type: "KL720"}, "/dev/null")
|
||||||
|
// 預設 pythonReady=false
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 4)
|
||||||
|
err := d.UpgradeFirmware(testCtx(t), "KL720", ch)
|
||||||
|
if err == nil {
|
||||||
|
t.Fatalf("expected error when python bridge not running")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 以下測試使用 in-memory pipes 模擬 bridge stdin / stdout、不真正起 Python。
|
||||||
|
//
|
||||||
|
// setupFakeBridge 把 driver 連到一對 io.Pipe、回 (stdinReader, stdoutWriter, close)。
|
||||||
|
// 測試代碼可從 stdinReader 讀 driver 寫出的 command JSON、用 stdoutWriter
|
||||||
|
// 模擬 bridge 回應或保持不寫(模擬卡住)。
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
type fakeBridge struct {
|
||||||
|
stdinR *io.PipeReader // driver 寫入端的讀方(test 端)
|
||||||
|
stdoutW *io.PipeWriter // driver 讀取端的寫方(test 端)
|
||||||
|
}
|
||||||
|
|
||||||
|
func setupFakeBridge(t *testing.T, d *KneronDriver) *fakeBridge {
|
||||||
|
t.Helper()
|
||||||
|
stdinR, stdinW := io.Pipe()
|
||||||
|
stdoutR, stdoutW := io.Pipe()
|
||||||
|
d.stdin = stdinW
|
||||||
|
d.stdout = bufio.NewScanner(stdoutR)
|
||||||
|
d.stdout.Buffer(make([]byte, 0, 64*1024), 1024*1024)
|
||||||
|
d.pythonReady = true
|
||||||
|
t.Cleanup(func() {
|
||||||
|
_ = stdinR.Close()
|
||||||
|
_ = stdinW.Close()
|
||||||
|
_ = stdoutR.Close()
|
||||||
|
_ = stdoutW.Close()
|
||||||
|
d.pythonReady = false
|
||||||
|
})
|
||||||
|
return &fakeBridge{stdinR: stdinR, stdoutW: stdoutW}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestUpgradeFirmware_InfoNotBlockedDuringUpgrade(Major 1 驗證)
|
||||||
|
//
|
||||||
|
// 升級期間 sendCommand goroutine 持 fwUpgradeMu 卡住等 stdout 回應、不可
|
||||||
|
// 阻塞 d.Info() 等 d.mu 操作。
|
||||||
|
func TestUpgradeFirmware_InfoNotBlockedDuringUpgrade(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "dev-x", Type: "KL720", Port: "USB"}, "/dev/null")
|
||||||
|
setupFakeBridge(t, d)
|
||||||
|
|
||||||
|
progressCh := make(chan firmware.FirmwareProgress, 16)
|
||||||
|
|
||||||
|
// UpgradeFirmware 在 goroutine 跑、sendCommand 會卡在 stdout.Scan 等 bridge 回應
|
||||||
|
upgradeCtx, cancelUpgrade := context.WithCancel(context.Background())
|
||||||
|
t.Cleanup(cancelUpgrade)
|
||||||
|
upgradeDone := make(chan error, 1)
|
||||||
|
go func() {
|
||||||
|
upgradeDone <- d.UpgradeFirmware(upgradeCtx, "KL720", progressCh)
|
||||||
|
}()
|
||||||
|
|
||||||
|
// 等一下讓 sendCommand goroutine 進入 stdout.Scan blocking
|
||||||
|
time.Sleep(50 * time.Millisecond)
|
||||||
|
|
||||||
|
// Info 必須能立即回(不被 sendCommand 持鎖卡住)
|
||||||
|
infoDone := make(chan driver.DeviceInfo, 1)
|
||||||
|
go func() { infoDone <- d.Info() }()
|
||||||
|
|
||||||
|
select {
|
||||||
|
case info := <-infoDone:
|
||||||
|
if info.ID != "dev-x" {
|
||||||
|
t.Errorf("Info().ID = %q, want dev-x", info.ID)
|
||||||
|
}
|
||||||
|
case <-time.After(500 * time.Millisecond):
|
||||||
|
t.Fatalf("Info() blocked during firmware upgrade — Major 1 deadlock not fixed")
|
||||||
|
}
|
||||||
|
|
||||||
|
// IsConnected 也不該被卡(即使 d.connected=false、它走 d.mu)
|
||||||
|
icDone := make(chan bool, 1)
|
||||||
|
go func() { icDone <- d.IsConnected() }()
|
||||||
|
select {
|
||||||
|
case <-icDone:
|
||||||
|
case <-time.After(500 * time.Millisecond):
|
||||||
|
t.Fatalf("IsConnected() blocked during firmware upgrade")
|
||||||
|
}
|
||||||
|
|
||||||
|
// 收尾:cancel ctx、讓 UpgradeFirmware 走 timeout 路徑(stopPython 殺 bridge)
|
||||||
|
cancelUpgrade()
|
||||||
|
select {
|
||||||
|
case <-upgradeDone:
|
||||||
|
case <-time.After(2 * time.Second):
|
||||||
|
t.Fatalf("UpgradeFirmware did not return after ctx cancel")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestUpgradeFirmware_CtxCancelReleasesBridge(Minor 2 驗證)
|
||||||
|
//
|
||||||
|
// ctx cancel 後 UpgradeFirmware 應在合理時間內 return;不能因 sendCommand
|
||||||
|
// goroutine 持鎖造成 d.mu.Lock(stopPython) deadlock。
|
||||||
|
func TestUpgradeFirmware_CtxCancelReleasesBridge(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "dev-x", Type: "KL520", Port: "USB"}, "/dev/null")
|
||||||
|
setupFakeBridge(t, d)
|
||||||
|
|
||||||
|
progressCh := make(chan firmware.FirmwareProgress, 16)
|
||||||
|
ctx, cancel := context.WithCancel(context.Background())
|
||||||
|
|
||||||
|
errCh := make(chan error, 1)
|
||||||
|
go func() {
|
||||||
|
errCh <- d.UpgradeFirmware(ctx, "KL520", progressCh)
|
||||||
|
}()
|
||||||
|
|
||||||
|
// 等 sendCommand 進 stdout.Scan
|
||||||
|
time.Sleep(50 * time.Millisecond)
|
||||||
|
|
||||||
|
// cancel ctx、UpgradeFirmware 應走 ctx.Done branch、stopPython 殺 bridge、return
|
||||||
|
cancel()
|
||||||
|
|
||||||
|
select {
|
||||||
|
case err := <-errCh:
|
||||||
|
if err == nil {
|
||||||
|
t.Errorf("expected ctx-cancel error, got nil")
|
||||||
|
}
|
||||||
|
case <-time.After(2 * time.Second):
|
||||||
|
t.Fatalf("UpgradeFirmware did not return within 2s after ctx cancel (deadlock?)")
|
||||||
|
}
|
||||||
|
|
||||||
|
// timeout event 應該被推到 progressCh
|
||||||
|
select {
|
||||||
|
case ev := <-progressCh:
|
||||||
|
if ev.Stage != firmware.StageError {
|
||||||
|
t.Errorf("first event.Stage = %q, want error", ev.Stage)
|
||||||
|
}
|
||||||
|
if ev.Reason != firmware.ReasonTimeout {
|
||||||
|
t.Errorf("first event.Reason = %q, want timeout", ev.Reason)
|
||||||
|
}
|
||||||
|
case <-time.After(500 * time.Millisecond):
|
||||||
|
t.Fatalf("no timeout event pushed after ctx cancel")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestUpgradeFirmware_StderrEventAfterCtxCancel(Major 2 驗證)
|
||||||
|
//
|
||||||
|
// 模擬 stderr goroutine 在 service 已 close intermediate channel 後仍嘗試
|
||||||
|
// route event;fwMu + setFirmwareProgressCh(nil) 應保證不會 send on closed
|
||||||
|
// channel panic。
|
||||||
|
func TestUpgradeFirmware_StderrEventAfterCtxCancel(t *testing.T) {
|
||||||
|
d := NewKneronDriver(driver.DeviceInfo{ID: "dev-x", Type: "KL520"}, "/dev/null")
|
||||||
|
|
||||||
|
ch := make(chan firmware.FirmwareProgress, 4)
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
|
||||||
|
// 模擬 service 端流程:unregister → close channel
|
||||||
|
// 為了驗證 race window、用很多 goroutine 同時做 tryRouteFirmwareEvent
|
||||||
|
var wg sync.WaitGroup
|
||||||
|
const N = 100
|
||||||
|
|
||||||
|
// 一組 goroutine 不斷試 route(模擬 stderr 上來的 inflight events)
|
||||||
|
stopRoute := make(chan struct{})
|
||||||
|
for i := 0; i < 8; i++ {
|
||||||
|
wg.Add(1)
|
||||||
|
go func() {
|
||||||
|
defer wg.Done()
|
||||||
|
for {
|
||||||
|
select {
|
||||||
|
case <-stopRoute:
|
||||||
|
return
|
||||||
|
default:
|
||||||
|
_ = d.tryRouteFirmwareEvent(`{"event":"firmware_progress","stage":"flashing","percent":50}`)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
}
|
||||||
|
|
||||||
|
// 主流程:unregister → close
|
||||||
|
for i := 0; i < N; i++ {
|
||||||
|
// 暫停一下讓 route goroutine 抓到 ch
|
||||||
|
time.Sleep(10 * time.Microsecond)
|
||||||
|
d.setFirmwareProgressCh(nil) // 必須 happen-before close
|
||||||
|
close(ch)
|
||||||
|
|
||||||
|
// 換新 ch、再來一輪
|
||||||
|
ch = make(chan firmware.FirmwareProgress, 4)
|
||||||
|
d.setFirmwareProgressCh(ch)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 收工
|
||||||
|
close(stopRoute)
|
||||||
|
wg.Wait()
|
||||||
|
|
||||||
|
// 沒 panic 就算過
|
||||||
|
}
|
||||||
147
local-tool/server/internal/firmware/progress.go
Normal file
147
local-tool/server/internal/firmware/progress.go
Normal file
@ -0,0 +1,147 @@
|
|||||||
|
package firmware
|
||||||
|
|
||||||
|
import (
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Task 代表一個進行中的 firmware 升降版 task(TDD §6 + §8.6.1)。
|
||||||
|
// service 用 ProgressTracker 維護 deviceID → *Task、用來:
|
||||||
|
// 1. 拒絕同一 device 同時兩個 firmware task(§3.3 FW_DEVICE_BUSY)
|
||||||
|
// 2. 給 graceful shutdown handler 查 HasActiveTask(§8.6.1)
|
||||||
|
// 3. 給 control panel modal 顯示 active task 資訊(§8.6.2)
|
||||||
|
type Task struct {
|
||||||
|
ID string
|
||||||
|
DeviceID string
|
||||||
|
DeviceName string
|
||||||
|
Chip string
|
||||||
|
Direction string
|
||||||
|
StartTs time.Time
|
||||||
|
ProgressCh chan FirmwareProgress
|
||||||
|
|
||||||
|
mu sync.Mutex
|
||||||
|
stage string // 目前推到的 stage、tracker 內部維護
|
||||||
|
done bool
|
||||||
|
// cancel 是 task 的 context.CancelFunc、由 service.UpgradeFirmware 寫入。
|
||||||
|
//
|
||||||
|
// 目前 service 只在 runUpgrade 的 defer 內呼叫一次(避免 ctx leak)、
|
||||||
|
// 沒有外部 reader 主動呼叫。預留給未來 SIGTERM force-cancel 流程
|
||||||
|
// (TDD §8.6.3)使用:graceful shutdown 等不到 task 結束時、可逐
|
||||||
|
// task 呼叫 cancel() 提前釋放 driver 資源。
|
||||||
|
cancel func()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Stage 回傳該 task 最後一次廣播的 stage(thread-safe)。
|
||||||
|
func (t *Task) Stage() string {
|
||||||
|
t.mu.Lock()
|
||||||
|
defer t.mu.Unlock()
|
||||||
|
return t.stage
|
||||||
|
}
|
||||||
|
|
||||||
|
func (t *Task) setStage(s string) {
|
||||||
|
t.mu.Lock()
|
||||||
|
t.stage = s
|
||||||
|
t.mu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Done 回傳 task 是否完成(thread-safe)。
|
||||||
|
func (t *Task) Done() bool {
|
||||||
|
t.mu.Lock()
|
||||||
|
defer t.mu.Unlock()
|
||||||
|
return t.done
|
||||||
|
}
|
||||||
|
|
||||||
|
func (t *Task) markDone() {
|
||||||
|
t.mu.Lock()
|
||||||
|
t.done = true
|
||||||
|
t.mu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
// ProgressTracker 仿 flash.ProgressTracker 的 pattern、map[deviceID]*Task。
|
||||||
|
//
|
||||||
|
// 注意:key 是 deviceID 而非 taskID(每 device 同時只能有 1 個 firmware
|
||||||
|
// task、不像 flash 可以同 device 不同 model)。
|
||||||
|
type ProgressTracker struct {
|
||||||
|
tasks map[string]*Task
|
||||||
|
mu sync.Mutex
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewProgressTracker 建立空 tracker。
|
||||||
|
func NewProgressTracker() *ProgressTracker {
|
||||||
|
return &ProgressTracker{tasks: make(map[string]*Task)}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create 嘗試建立新 task。若同 deviceID 已有未完成的 task、回傳 nil
|
||||||
|
// (caller 應回 FW_DEVICE_BUSY 給上層)。
|
||||||
|
func (pt *ProgressTracker) Create(deviceID, deviceName, chip, direction string) *Task {
|
||||||
|
pt.mu.Lock()
|
||||||
|
defer pt.mu.Unlock()
|
||||||
|
|
||||||
|
if existing, ok := pt.tasks[deviceID]; ok && !existing.Done() {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
taskID := direction + "-" + deviceID + "-" + time.Now().UTC().Format("20060102T150405.000")
|
||||||
|
t := &Task{
|
||||||
|
ID: taskID,
|
||||||
|
DeviceID: deviceID,
|
||||||
|
DeviceName: deviceName,
|
||||||
|
Chip: chip,
|
||||||
|
Direction: direction,
|
||||||
|
StartTs: time.Now(),
|
||||||
|
ProgressCh: make(chan FirmwareProgress, 32),
|
||||||
|
stage: StagePreparing,
|
||||||
|
}
|
||||||
|
pt.tasks[deviceID] = t
|
||||||
|
return t
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get 取得指定 device 的 task(thread-safe)、沒有時回 nil。
|
||||||
|
func (pt *ProgressTracker) Get(deviceID string) *Task {
|
||||||
|
pt.mu.Lock()
|
||||||
|
defer pt.mu.Unlock()
|
||||||
|
return pt.tasks[deviceID]
|
||||||
|
}
|
||||||
|
|
||||||
|
// Remove 移除指定 device 的 task entry(caller 應在 progressCh 已 close
|
||||||
|
// 且讀者已完成消費後呼叫、否則前端可能讀不到尾端事件)。
|
||||||
|
func (pt *ProgressTracker) Remove(deviceID string) {
|
||||||
|
pt.mu.Lock()
|
||||||
|
defer pt.mu.Unlock()
|
||||||
|
delete(pt.tasks, deviceID)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ActiveTasks 列出所有未完成 task 的快照(thread-safe)、給 graceful
|
||||||
|
// shutdown handler 用(TDD §8.6.1)。
|
||||||
|
func (pt *ProgressTracker) ActiveTasks() []*ActiveTaskInfo {
|
||||||
|
pt.mu.Lock()
|
||||||
|
defer pt.mu.Unlock()
|
||||||
|
|
||||||
|
out := make([]*ActiveTaskInfo, 0, len(pt.tasks))
|
||||||
|
for _, t := range pt.tasks {
|
||||||
|
if t.Done() {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
elapsed := time.Since(t.StartTs).Milliseconds()
|
||||||
|
info := &ActiveTaskInfo{
|
||||||
|
TaskID: t.ID,
|
||||||
|
DeviceID: t.DeviceID,
|
||||||
|
DeviceName: t.DeviceName,
|
||||||
|
Chip: t.Chip,
|
||||||
|
Direction: t.Direction,
|
||||||
|
Stage: t.Stage(),
|
||||||
|
StartTs: t.StartTs,
|
||||||
|
ElapsedMs: elapsed,
|
||||||
|
}
|
||||||
|
// ETA:粗估「該 chip 該 stage 還剩多少秒」、用 UpgradeTimeoutFor
|
||||||
|
// 減去 elapsed 作為上界(不精確、Design §9.6 文案已用 "~{n}s")。
|
||||||
|
timeout := UpgradeTimeoutFor(t.Chip)
|
||||||
|
remaining := int((timeout.Milliseconds() - elapsed) / 1000)
|
||||||
|
if remaining < 0 {
|
||||||
|
remaining = 0
|
||||||
|
}
|
||||||
|
info.EtaSeconds = remaining
|
||||||
|
out = append(out, info)
|
||||||
|
}
|
||||||
|
return out
|
||||||
|
}
|
||||||
373
local-tool/server/internal/firmware/service.go
Normal file
373
local-tool/server/internal/firmware/service.go
Normal file
@ -0,0 +1,373 @@
|
|||||||
|
package firmware
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"errors"
|
||||||
|
"fmt"
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"strings"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"visiona-local/server/internal/driver"
|
||||||
|
)
|
||||||
|
|
||||||
|
// 已預期的錯誤型別(caller 可走 errors.Is 識別、後續 M9-3 API handler 把
|
||||||
|
// 這些對應到 HTTP 錯誤碼、見 TDD §3.3)。
|
||||||
|
var (
|
||||||
|
// ErrDeviceBusy 對應 FW_DEVICE_BUSY(HTTP 409):同 device 已有未完成
|
||||||
|
// 的 firmware task、或 device status 在不可升級狀態。
|
||||||
|
ErrDeviceBusy = errors.New("firmware: device busy")
|
||||||
|
// ErrDeviceNotFound 對應 FW_VERSION_NOT_FOUND(HTTP 404):deviceID
|
||||||
|
// 在 device manager 找不到。
|
||||||
|
ErrDeviceNotFound = errors.New("firmware: device not found")
|
||||||
|
// ErrUnsupportedChip 對應 400:A 階段只接受 KL520 / KL720。
|
||||||
|
ErrUnsupportedChip = errors.New("firmware: unsupported chip")
|
||||||
|
// ErrUpgradeFailed 對應 FW_UPGRADE_FAILED(HTTP 500):bridge.py 回
|
||||||
|
// error 但 device 仍可用(recoverable)。
|
||||||
|
ErrUpgradeFailed = errors.New("firmware: upgrade failed")
|
||||||
|
// ErrUpgradeBrickRisk 對應 FW_UPGRADE_BRICK_RISK(HTTP 500):升級
|
||||||
|
// 期間 device disconnect 或 verify 失敗、可能損壞(unrecoverable)。
|
||||||
|
ErrUpgradeBrickRisk = errors.New("firmware: upgrade brick risk")
|
||||||
|
)
|
||||||
|
|
||||||
|
// brickRiskReasons 是「升到一半失敗、可能損壞」的 reason 集合(TDD §3.4
|
||||||
|
// 對應表第 5 / 7 種失敗)。其他 reason 視為 recoverable。
|
||||||
|
//
|
||||||
|
// Minor 1 註:ReasonTimeout 是否歸 brick 由 API handler M9-3 依 chip + elapsed
|
||||||
|
// 判斷(例:KL720 elapsed > 180s 視為 brick;KL520 不會走到這個情境)、不在
|
||||||
|
// service 層直接歸類。Service 保持 vendor-agnostic、只負責 forward。
|
||||||
|
var brickRiskReasons = map[string]struct{}{
|
||||||
|
ReasonVerifyMismatch: {},
|
||||||
|
ReasonDisconnectDuringOp: {},
|
||||||
|
}
|
||||||
|
|
||||||
|
// UpgradeDriver 是 firmware service 對 driver 層的介面 contract。
|
||||||
|
//
|
||||||
|
// 實作方為 kneron.KneronDriver(M9-2 內加 method);test 用 mock 實作。
|
||||||
|
// 介面保持小:只暴露 service 需要的 method、不引入 device manager 依賴。
|
||||||
|
type UpgradeDriver interface {
|
||||||
|
// Info 回傳 driver 認知的 device 基本資訊(chipType / firmware 字串 /
|
||||||
|
// status 等)。service 不直接靠 Info().Status 判斷 busy(device 層
|
||||||
|
// status 可能與 firmware task 狀態不同步)、是用 ProgressTracker。
|
||||||
|
Info() driver.DeviceInfo
|
||||||
|
|
||||||
|
// UpgradeFirmware 觸發 bridge.py firmware_upgrade、把 stderr 上來的
|
||||||
|
// firmware_progress JSON 轉成 FirmwareProgress 推到 progressCh。
|
||||||
|
//
|
||||||
|
// ctx:service 端的 timeout / cancel context、driver 應在 ctx.Done()
|
||||||
|
// 時主動 kill bridge subprocess、避免 goroutine leak。
|
||||||
|
// chip:必為 KL520 / KL720(A 階段、TDD §6.1)。
|
||||||
|
// progressCh:service 提供、driver 只寫不 close(close 由 service 負責)。
|
||||||
|
//
|
||||||
|
// 回傳 error 表示終態失敗(recoverable 或 brick)。成功時 driver 應該
|
||||||
|
// 已經把 done event 推到 progressCh、回 nil。
|
||||||
|
UpgradeFirmware(ctx context.Context, chip string, progressCh chan<- FirmwareProgress) error
|
||||||
|
}
|
||||||
|
|
||||||
|
// DeviceLookup 是 firmware service 對 device manager 的介面 contract。
|
||||||
|
// 測試時 mock 之、避免拉整個 device.Manager 依賴鏈進來。
|
||||||
|
type DeviceLookup interface {
|
||||||
|
GetUpgradeDriver(deviceID string) (UpgradeDriver, error)
|
||||||
|
}
|
||||||
|
|
||||||
|
// FirmwareDir 描述 bundled firmware 檔的根目錄、供 ListBundledVersions 用。
|
||||||
|
//
|
||||||
|
// A 階段(M9-2)目錄結構:firmware/KL520/fw_*.bin、firmware/KL720/fw_*.bin
|
||||||
|
//(扁平結構、無 CURRENT_VERSION)。B2 階段(M9-11)會升級成多版本。
|
||||||
|
type FirmwareDir struct {
|
||||||
|
// Root 是 server/scripts/firmware/ 的絕對路徑、由 caller(main.go)注入。
|
||||||
|
Root string
|
||||||
|
}
|
||||||
|
|
||||||
|
// Service 是 firmware 升降版的中央 orchestrator(TDD §6.1)。
|
||||||
|
//
|
||||||
|
// 職責:
|
||||||
|
// 1. 收 API 層的 UpgradeFirmware 呼叫、查 device、呼叫 driver、廣播 progress
|
||||||
|
// 2. 拒絕同 device 重複 task、graceful shutdown 拒絕(§8.6)
|
||||||
|
// 3. timeout 護欄(外層、避免 driver 卡死)
|
||||||
|
// 4. 列舉 bundled firmware 版本(A 階段最簡實作)
|
||||||
|
//
|
||||||
|
// 不負責:HTTP handler / WebSocket broadcast — M9-3 串接。
|
||||||
|
type Service struct {
|
||||||
|
deviceLookup DeviceLookup
|
||||||
|
tracker *ProgressTracker
|
||||||
|
fwDir FirmwareDir
|
||||||
|
|
||||||
|
// shutdownMu 保護「shutdown 進行中時不再接新 task」。
|
||||||
|
shutdownMu sync.RWMutex
|
||||||
|
shutdownReq bool
|
||||||
|
|
||||||
|
// taskWg 追蹤所有進行中的 task goroutine、graceful shutdown 時 Wait。
|
||||||
|
taskWg sync.WaitGroup
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewService 建立 firmware service。
|
||||||
|
func NewService(deviceLookup DeviceLookup, fwDir FirmwareDir) *Service {
|
||||||
|
return &Service{
|
||||||
|
deviceLookup: deviceLookup,
|
||||||
|
tracker: NewProgressTracker(),
|
||||||
|
fwDir: fwDir,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// UpgradeFirmware 啟動 device 的 firmware 升級流程(A 階段:自動 KDP1 → KDP2、
|
||||||
|
// KL520 / KL720)。
|
||||||
|
//
|
||||||
|
// 流程(TDD §5.1):
|
||||||
|
// 1. 驗 chip / device 存在 / 沒有重複 task / 沒在 shutdown 中
|
||||||
|
// 2. 在 tracker 註冊 task
|
||||||
|
// 3. spawn goroutine:呼叫 driver.UpgradeFirmware(ctx 帶 timeout)
|
||||||
|
// ├── driver 推 progress events 到 task.ProgressCh
|
||||||
|
// ├── service 同步把 events 廣播給訂閱者(M9-3 wire 到 WebSocket)
|
||||||
|
// └── 終態 → markDone、close channel、taskWg.Done
|
||||||
|
// 4. 立即回 taskID + progressCh 給 caller
|
||||||
|
//
|
||||||
|
// 注意:本 method 不阻塞、回後立刻接受下一個請求;實際升級在 goroutine
|
||||||
|
// 跑。caller 應拿 progressCh 消費完所有 events 再呼叫 CleanupTask。
|
||||||
|
func (s *Service) UpgradeFirmware(ctx context.Context, deviceID, chip string) (string, <-chan FirmwareProgress, error) {
|
||||||
|
// 1. shutdown 中拒絕
|
||||||
|
s.shutdownMu.RLock()
|
||||||
|
if s.shutdownReq {
|
||||||
|
s.shutdownMu.RUnlock()
|
||||||
|
return "", nil, fmt.Errorf("%w: server is shutting down", ErrDeviceBusy)
|
||||||
|
}
|
||||||
|
s.shutdownMu.RUnlock()
|
||||||
|
|
||||||
|
// 2. chip 驗證(A 階段 only KL520 / KL720)
|
||||||
|
if !SupportedUpgradeChip(chip) {
|
||||||
|
return "", nil, fmt.Errorf("%w: %q (A 階段僅支援 KL520 / KL720)", ErrUnsupportedChip, chip)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 3. device 查找
|
||||||
|
drv, err := s.deviceLookup.GetUpgradeDriver(deviceID)
|
||||||
|
if err != nil {
|
||||||
|
return "", nil, fmt.Errorf("%w: %s: %v", ErrDeviceNotFound, deviceID, err)
|
||||||
|
}
|
||||||
|
info := drv.Info()
|
||||||
|
|
||||||
|
// 4. tracker 註冊(同 device 重複 task → 拒絕)
|
||||||
|
task := s.tracker.Create(deviceID, info.Name, chip, DirectionUpgrade)
|
||||||
|
if task == nil {
|
||||||
|
return "", nil, fmt.Errorf("%w: device %s already has an active firmware task", ErrDeviceBusy, deviceID)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 5. 建外層 ctx with timeout(內層 driver 各自還有自己的 SDK timeout、
|
||||||
|
// 這是雙保險、避免 driver bug 導致 goroutine 永遠卡住)
|
||||||
|
chipTimeout := UpgradeTimeoutFor(chip)
|
||||||
|
taskCtx, cancel := context.WithTimeout(ctx, chipTimeout+30*time.Second) // +30s margin 給 verify 階段
|
||||||
|
task.cancel = cancel
|
||||||
|
|
||||||
|
s.taskWg.Add(1)
|
||||||
|
go s.runUpgrade(taskCtx, cancel, task, drv, chip, info.FirmwareVer)
|
||||||
|
|
||||||
|
return task.ID, task.ProgressCh, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// runUpgrade 是 task goroutine、由 UpgradeFirmware spawn。
|
||||||
|
//
|
||||||
|
// - driver.UpgradeFirmware 是 blocking call、內部會推 progress events 到
|
||||||
|
// intermediateCh、service 把它們轉成 FirmwareProgress(補 deviceID)
|
||||||
|
// 後寫到 task.ProgressCh。
|
||||||
|
// - driver 回 error 時、若終態 event 還沒推(例如 ctx 超時被 service
|
||||||
|
// 強制 cancel)、service 在這裡補一個 error event。
|
||||||
|
// - 終態(done / error)後 close task.ProgressCh、markDone、taskWg.Done。
|
||||||
|
//
|
||||||
|
// caller(service)必須在 close 後才允許讀者繼續消費既有 buffered events
|
||||||
|
// 再呼叫 CleanupTask 移除 tracker entry。
|
||||||
|
func (s *Service) runUpgrade(
|
||||||
|
ctx context.Context,
|
||||||
|
cancel context.CancelFunc,
|
||||||
|
task *Task,
|
||||||
|
drv UpgradeDriver,
|
||||||
|
chip string,
|
||||||
|
beforeVersion string,
|
||||||
|
) {
|
||||||
|
defer s.taskWg.Done()
|
||||||
|
defer cancel() // 確保 context 不洩漏
|
||||||
|
defer task.markDone()
|
||||||
|
defer close(task.ProgressCh)
|
||||||
|
|
||||||
|
// 結論:保留中介 channel pattern、清晰勝過 micro-optimization。
|
||||||
|
//
|
||||||
|
// 推論:driver 推 event 時已填 Stage / Percent / Message / ElapsedMs / EtaMs;
|
||||||
|
// DeviceID / Direction / BeforeVersion 由 service 在 forward loop 補(driver
|
||||||
|
// 不知道 service 層的 deviceID 命名、Direction 由 task 決定)。原本可以讓
|
||||||
|
// driver 直接寫 task.ProgressCh 省一道 channel 複製、但補欄位邏輯會散到
|
||||||
|
// driver 端、不乾淨;中介 channel 讓 service 集中補欄位、勝過 micro-opt。
|
||||||
|
intermediate := make(chan FirmwareProgress, 32)
|
||||||
|
driverDone := make(chan error, 1)
|
||||||
|
|
||||||
|
go func() {
|
||||||
|
driverDone <- drv.UpgradeFirmware(ctx, chip, intermediate)
|
||||||
|
close(intermediate)
|
||||||
|
}()
|
||||||
|
|
||||||
|
// forward loop:補欄位 + 更新 task.stage、轉到 task.ProgressCh
|
||||||
|
//
|
||||||
|
// Suggestion 1 修法:對 StageDone 去重——driver 端「sendCommand 成功補
|
||||||
|
// done event」與 bridge stderr 推的 done event 會雙保險、可能重複。
|
||||||
|
// 重複 done 雖然對 service 無害、但傳到前端可能跑兩次 cleanup、改在這裡
|
||||||
|
// 一次清掉、單一真相來源。
|
||||||
|
var lastStage string
|
||||||
|
var seenDone bool
|
||||||
|
for ev := range intermediate {
|
||||||
|
if ev.Stage == StageDone && seenDone {
|
||||||
|
// 已 forward 過 done、忽略後續重複 done(典型情境:stderr push 完
|
||||||
|
// done 後 sendCommand 成功又補一發 done、見 kl720_driver.go 925-937)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
// 補欄位
|
||||||
|
ev.DeviceID = task.DeviceID
|
||||||
|
if ev.Direction == "" {
|
||||||
|
ev.Direction = task.Direction
|
||||||
|
}
|
||||||
|
if ev.BeforeVersion == "" {
|
||||||
|
ev.BeforeVersion = beforeVersion
|
||||||
|
}
|
||||||
|
if ev.ElapsedMs == 0 {
|
||||||
|
ev.ElapsedMs = time.Since(task.StartTs).Milliseconds()
|
||||||
|
}
|
||||||
|
task.setStage(ev.Stage)
|
||||||
|
lastStage = ev.Stage
|
||||||
|
if ev.Stage == StageDone {
|
||||||
|
seenDone = true
|
||||||
|
}
|
||||||
|
task.ProgressCh <- ev
|
||||||
|
}
|
||||||
|
|
||||||
|
// driver goroutine 終態
|
||||||
|
err := <-driverDone
|
||||||
|
|
||||||
|
// driver 沒推終態 event(例如 ctx 超時 / panic)→ 補一個 error event
|
||||||
|
if err != nil && lastStage != StageDone && lastStage != StageError {
|
||||||
|
reason := ReasonUpgradeMidFailed
|
||||||
|
if errors.Is(err, context.DeadlineExceeded) {
|
||||||
|
reason = ReasonTimeout
|
||||||
|
} else if errors.Is(err, context.Canceled) {
|
||||||
|
reason = ReasonDisconnectDuringOp
|
||||||
|
}
|
||||||
|
task.setStage(StageError)
|
||||||
|
task.ProgressCh <- FirmwareProgress{
|
||||||
|
DeviceID: task.DeviceID,
|
||||||
|
Stage: StageError,
|
||||||
|
Percent: -1,
|
||||||
|
Direction: task.Direction,
|
||||||
|
Error: err.Error(),
|
||||||
|
Reason: reason,
|
||||||
|
RawError: err.Error(),
|
||||||
|
BeforeVersion: beforeVersion,
|
||||||
|
ElapsedMs: time.Since(task.StartTs).Milliseconds(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// CleanupTask 在 caller 消費完 progressCh 後移除 tracker entry。
|
||||||
|
func (s *Service) CleanupTask(deviceID string) {
|
||||||
|
s.tracker.Remove(deviceID)
|
||||||
|
}
|
||||||
|
|
||||||
|
// HasActiveTask 回傳是否有任一 device 在進行 firmware task(TDD §8.6.1)。
|
||||||
|
// graceful shutdown handler(M9-2 範圍外、由 main.go signal handler 串接)
|
||||||
|
// 用此判斷是否延遲 SIGTERM。
|
||||||
|
func (s *Service) HasActiveTask() bool {
|
||||||
|
return len(s.tracker.ActiveTasks()) > 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetActiveTaskInfo 回傳所有進行中 task 的快照、給 control panel modal 顯示
|
||||||
|
// 「韌體切換進行中、device X、剩 Y 秒」(TDD §8.6.2)。
|
||||||
|
//
|
||||||
|
// 注意:回傳 slice 是 snapshot copy、caller 可安全持有、不會被後續變動。
|
||||||
|
func (s *Service) GetActiveTaskInfo() []*ActiveTaskInfo {
|
||||||
|
return s.tracker.ActiveTasks()
|
||||||
|
}
|
||||||
|
|
||||||
|
// RequestShutdown 設定 shutdown 旗標、之後新 task 都會被拒絕。caller 接著
|
||||||
|
// 應呼叫 WaitForActiveTasks 等到既有 task 結束(或 timeout)。
|
||||||
|
func (s *Service) RequestShutdown() {
|
||||||
|
s.shutdownMu.Lock()
|
||||||
|
s.shutdownReq = true
|
||||||
|
s.shutdownMu.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
// WaitForActiveTasks 等所有進行中的 task 結束、最多等 maxWait。
|
||||||
|
// 回傳 true 表示乾淨結束、false 表示 timeout 仍有 task 卡住(caller 應強制
|
||||||
|
// 走 shutdown、log 警告、TDD §8.6.1)。
|
||||||
|
func (s *Service) WaitForActiveTasks(maxWait time.Duration) bool {
|
||||||
|
done := make(chan struct{})
|
||||||
|
go func() {
|
||||||
|
s.taskWg.Wait()
|
||||||
|
close(done)
|
||||||
|
}()
|
||||||
|
select {
|
||||||
|
case <-done:
|
||||||
|
return true
|
||||||
|
case <-time.After(maxWait):
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ListBundledVersions 列出 chip 對應的 bundled firmware 版本(TDD §4.4)。
|
||||||
|
//
|
||||||
|
// A 階段(M9-2)扁平結構:fwDir/<chip>/fw_*.bin、只有單一版本、且不含
|
||||||
|
// CURRENT_VERSION metadata。本實作回傳「current」單一版本(IsCurrent=true、
|
||||||
|
// Version="current"、Notes="A 階段扁平結構")、避免 M9-3 API handler 端
|
||||||
|
// 還要為「列舉空」做特例處理。
|
||||||
|
//
|
||||||
|
// B2 階段(M9-11)會擴展成讀 CURRENT_VERSION + 列舉子目錄。
|
||||||
|
func (s *Service) ListBundledVersions(chip string) ([]FirmwareVersion, error) {
|
||||||
|
if !SupportedUpgradeChip(chip) {
|
||||||
|
return nil, fmt.Errorf("%w: %q", ErrUnsupportedChip, chip)
|
||||||
|
}
|
||||||
|
|
||||||
|
// A 階段:只要 firmware 檔在、就視為有 current 版本可升
|
||||||
|
// (不檢查具體檔案存在性、那是 bridge.py 在升級時的事)
|
||||||
|
if s.fwDir.Root == "" {
|
||||||
|
return nil, fmt.Errorf("firmware directory not configured")
|
||||||
|
}
|
||||||
|
|
||||||
|
chipDir := filepath.Join(s.fwDir.Root, chip)
|
||||||
|
|
||||||
|
// Suggestion 3:lightly check chip 目錄存在、區分「chip 不支援」vs
|
||||||
|
// 「chip 支援但 firmware 漏 build」。後者 API handler 應回 500 或在
|
||||||
|
// 安裝包 release notes 標記、不該無聲回 current bundled。
|
||||||
|
if _, err := os.Stat(chipDir); err != nil {
|
||||||
|
if os.IsNotExist(err) {
|
||||||
|
return nil, fmt.Errorf("firmware not bundled for chip %q (missing %s)", chip, chipDir)
|
||||||
|
}
|
||||||
|
return nil, fmt.Errorf("firmware dir stat failed for chip %q: %w", chip, err)
|
||||||
|
}
|
||||||
|
|
||||||
|
return []FirmwareVersion{
|
||||||
|
{
|
||||||
|
Version: "current",
|
||||||
|
DisplayName: chip + " current (A 階段 bundled)",
|
||||||
|
IsCurrent: true,
|
||||||
|
IsBundled: true,
|
||||||
|
Notes: "A 階段扁平結構、firmware path: " + chipDir,
|
||||||
|
},
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetCurrentVersion 回傳該 device 目前回報的 firmware 版本(取自 driver.Info)。
|
||||||
|
// 注意這是「device 上實際跑的」版本、不是「bundled current」版本。
|
||||||
|
// M9-3 API handler 可用這個 + ListBundledVersions 對照、決定 firmwareIsLegacy /
|
||||||
|
// firmwareCanUpgrade 衍生欄位。
|
||||||
|
func (s *Service) GetCurrentVersion(deviceID string) (FirmwareVersion, error) {
|
||||||
|
drv, err := s.deviceLookup.GetUpgradeDriver(deviceID)
|
||||||
|
if err != nil {
|
||||||
|
return FirmwareVersion{}, fmt.Errorf("%w: %s: %v", ErrDeviceNotFound, deviceID, err)
|
||||||
|
}
|
||||||
|
info := drv.Info()
|
||||||
|
ver := strings.TrimSpace(info.FirmwareVer)
|
||||||
|
if ver == "" {
|
||||||
|
ver = "unknown"
|
||||||
|
}
|
||||||
|
return FirmwareVersion{
|
||||||
|
Version: ver,
|
||||||
|
DisplayName: ver,
|
||||||
|
IsCurrent: true,
|
||||||
|
IsBundled: false,
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
676
local-tool/server/internal/firmware/service_test.go
Normal file
676
local-tool/server/internal/firmware/service_test.go
Normal file
@ -0,0 +1,676 @@
|
|||||||
|
package firmware
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"errors"
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"sync"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"visiona-local/server/internal/driver"
|
||||||
|
)
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// Mocks
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// fakeDriver 是 UpgradeDriver 的測試實作、可程式化 progress 序列與終態 error。
|
||||||
|
type fakeDriver struct {
|
||||||
|
info driver.DeviceInfo
|
||||||
|
|
||||||
|
// scripted progress events:runUpgrade 收到 ctx.Done 之前會依序 push
|
||||||
|
events []FirmwareProgress
|
||||||
|
// 每個 event 之間的 delay
|
||||||
|
delay time.Duration
|
||||||
|
// 最終回傳 error
|
||||||
|
finalErr error
|
||||||
|
|
||||||
|
// 觀察:實際被呼叫的 chip / call count
|
||||||
|
mu sync.Mutex
|
||||||
|
calledChip string
|
||||||
|
callCount int
|
||||||
|
gotCtxDone bool
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f *fakeDriver) Info() driver.DeviceInfo { return f.info }
|
||||||
|
|
||||||
|
func (f *fakeDriver) UpgradeFirmware(ctx context.Context, chip string, progressCh chan<- FirmwareProgress) error {
|
||||||
|
f.mu.Lock()
|
||||||
|
f.calledChip = chip
|
||||||
|
f.callCount++
|
||||||
|
f.mu.Unlock()
|
||||||
|
|
||||||
|
for _, ev := range f.events {
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
f.mu.Lock()
|
||||||
|
f.gotCtxDone = true
|
||||||
|
f.mu.Unlock()
|
||||||
|
return ctx.Err()
|
||||||
|
case <-time.After(f.delay):
|
||||||
|
}
|
||||||
|
select {
|
||||||
|
case progressCh <- ev:
|
||||||
|
case <-ctx.Done():
|
||||||
|
f.mu.Lock()
|
||||||
|
f.gotCtxDone = true
|
||||||
|
f.mu.Unlock()
|
||||||
|
return ctx.Err()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return f.finalErr
|
||||||
|
}
|
||||||
|
|
||||||
|
// fakeLookup 將 deviceID 映射到 fakeDriver。
|
||||||
|
type fakeLookup struct {
|
||||||
|
drivers map[string]UpgradeDriver
|
||||||
|
}
|
||||||
|
|
||||||
|
func (l *fakeLookup) GetUpgradeDriver(deviceID string) (UpgradeDriver, error) {
|
||||||
|
d, ok := l.drivers[deviceID]
|
||||||
|
if !ok {
|
||||||
|
return nil, errors.New("no such device")
|
||||||
|
}
|
||||||
|
return d, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func newServiceWith(drv UpgradeDriver, deviceID string) *Service {
|
||||||
|
return NewService(
|
||||||
|
&fakeLookup{drivers: map[string]UpgradeDriver{deviceID: drv}},
|
||||||
|
FirmwareDir{Root: "/tmp/test-firmware"},
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 1. 5-stage 成功路徑:events 都被 forward、driver 端 chip 收到、終態 done
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_SuccessFiveStages(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-1", Name: "KL520-A", FirmwareVer: "KDP"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5, Message: "scan"},
|
||||||
|
{Stage: StageLoading, Percent: 20, Message: "loader"},
|
||||||
|
{Stage: StageFlashing, Percent: 50, Message: "kdp2"},
|
||||||
|
{Stage: StageVerifying, Percent: 90, Message: "verify"},
|
||||||
|
{Stage: StageDone, Percent: 100, AfterVersion: "2.2.0"},
|
||||||
|
},
|
||||||
|
delay: 1 * time.Millisecond,
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
taskID, ch, err := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpgradeFirmware error: %v", err)
|
||||||
|
}
|
||||||
|
if taskID == "" {
|
||||||
|
t.Fatalf("expected non-empty taskID")
|
||||||
|
}
|
||||||
|
|
||||||
|
got := drainProgress(t, ch, 5*time.Second)
|
||||||
|
|
||||||
|
if len(got) != 5 {
|
||||||
|
t.Fatalf("expected 5 events, got %d: %+v", len(got), got)
|
||||||
|
}
|
||||||
|
wantStages := []string{StagePreparing, StageLoading, StageFlashing, StageVerifying, StageDone}
|
||||||
|
for i, ev := range got {
|
||||||
|
if ev.Stage != wantStages[i] {
|
||||||
|
t.Errorf("ev[%d].Stage = %q, want %q", i, ev.Stage, wantStages[i])
|
||||||
|
}
|
||||||
|
if ev.DeviceID != "dev-1" {
|
||||||
|
t.Errorf("ev[%d].DeviceID = %q, want dev-1", i, ev.DeviceID)
|
||||||
|
}
|
||||||
|
if ev.Direction != DirectionUpgrade {
|
||||||
|
t.Errorf("ev[%d].Direction = %q, want upgrade", i, ev.Direction)
|
||||||
|
}
|
||||||
|
if ev.BeforeVersion != "KDP" {
|
||||||
|
t.Errorf("ev[%d].BeforeVersion = %q, want KDP", i, ev.BeforeVersion)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// driver 收到 chip
|
||||||
|
if drv.calledChip != ChipKL520 {
|
||||||
|
t.Errorf("driver chip = %q, want KL520", drv.calledChip)
|
||||||
|
}
|
||||||
|
if drv.callCount != 1 {
|
||||||
|
t.Errorf("driver callCount = %d, want 1", drv.callCount)
|
||||||
|
}
|
||||||
|
|
||||||
|
// task 已 done(tracker 不再 ActiveTasks 列出)
|
||||||
|
svc.WaitForActiveTasks(2 * time.Second)
|
||||||
|
if svc.HasActiveTask() {
|
||||||
|
t.Errorf("HasActiveTask still true after done")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 2. 失敗路徑:各種 reason 都正確 forward
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_FailureReasons(t *testing.T) {
|
||||||
|
failureCases := []struct {
|
||||||
|
name string
|
||||||
|
reason string
|
||||||
|
stage string
|
||||||
|
}{
|
||||||
|
{"scan_not_found", ReasonScanNotFound, StagePreparing},
|
||||||
|
{"connect_failed", ReasonConnectFailed, StagePreparing},
|
||||||
|
{"loader_write_failed", ReasonLoaderWriteFailed, StageLoading},
|
||||||
|
{"upgrade_mid_failed", ReasonUpgradeMidFailed, StageFlashing},
|
||||||
|
{"verify_mismatch", ReasonVerifyMismatch, StageVerifying},
|
||||||
|
{"verify_not_found", ReasonVerifyNotFound, StageVerifying},
|
||||||
|
{"disconnect_during_op", ReasonDisconnectDuringOp, StageFlashing},
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, tc := range failureCases {
|
||||||
|
t.Run(tc.name, func(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-x", FirmwareVer: "KDP"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{
|
||||||
|
Stage: StageError,
|
||||||
|
Percent: -1,
|
||||||
|
Error: "simulated " + tc.reason,
|
||||||
|
Reason: tc.reason,
|
||||||
|
RawError: "raw " + tc.reason,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
finalErr: errors.New("bridge failed: " + tc.reason),
|
||||||
|
delay: 1 * time.Millisecond,
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-x")
|
||||||
|
_, ch, err := svc.UpgradeFirmware(context.Background(), "dev-x", ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpgradeFirmware setup error: %v", err)
|
||||||
|
}
|
||||||
|
got := drainProgress(t, ch, 5*time.Second)
|
||||||
|
|
||||||
|
// 找到 error event
|
||||||
|
var errEv *FirmwareProgress
|
||||||
|
for i := range got {
|
||||||
|
if got[i].Stage == StageError {
|
||||||
|
errEv = &got[i]
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if errEv == nil {
|
||||||
|
t.Fatalf("no error event in %+v", got)
|
||||||
|
}
|
||||||
|
if errEv.Reason != tc.reason {
|
||||||
|
t.Errorf("Reason = %q, want %q", errEv.Reason, tc.reason)
|
||||||
|
}
|
||||||
|
if errEv.Percent != -1 {
|
||||||
|
t.Errorf("Percent = %d, want -1", errEv.Percent)
|
||||||
|
}
|
||||||
|
if errEv.DeviceID != "dev-x" {
|
||||||
|
t.Errorf("DeviceID 應被補上、got %q", errEv.DeviceID)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 3. Timeout:service 端外層 context 超時、driver 收到 ctx.Done
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_ContextTimeout(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-slow"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{Stage: StageLoading, Percent: 20},
|
||||||
|
},
|
||||||
|
delay: 200 * time.Millisecond, // 每個 event 200ms,2 個 events ≈ 400ms
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-slow")
|
||||||
|
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
_, ch, err := svc.UpgradeFirmware(ctx, "dev-slow", ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpgradeFirmware setup error: %v", err)
|
||||||
|
}
|
||||||
|
got := drainProgress(t, ch, 3*time.Second)
|
||||||
|
|
||||||
|
// 應該有一個 timeout error event
|
||||||
|
var sawTimeout bool
|
||||||
|
for _, ev := range got {
|
||||||
|
if ev.Stage == StageError && ev.Reason == ReasonTimeout {
|
||||||
|
sawTimeout = true
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !sawTimeout {
|
||||||
|
t.Errorf("expected timeout error event in %+v", got)
|
||||||
|
}
|
||||||
|
|
||||||
|
// driver 應該觀察到 ctx.Done
|
||||||
|
drv.mu.Lock()
|
||||||
|
defer drv.mu.Unlock()
|
||||||
|
if !drv.gotCtxDone {
|
||||||
|
t.Errorf("driver should have observed ctx.Done")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 4. Mutex:同 device 同時兩個 upgrade → 第二個拒絕(FW_DEVICE_BUSY)
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_DeviceBusy(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-1"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{Stage: StageDone, Percent: 100, AfterVersion: "2.2.0"},
|
||||||
|
},
|
||||||
|
delay: 100 * time.Millisecond, // 給時間讓第一個還沒做完時第二個就來
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
// 第一個 task
|
||||||
|
_, ch1, err := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("first UpgradeFirmware error: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 第二個 task 應該被拒絕(device busy)
|
||||||
|
_, _, err2 := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if err2 == nil {
|
||||||
|
t.Fatalf("second UpgradeFirmware should fail with ErrDeviceBusy")
|
||||||
|
}
|
||||||
|
if !errors.Is(err2, ErrDeviceBusy) {
|
||||||
|
t.Errorf("second error = %v, want ErrDeviceBusy", err2)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 等第一個 task 結束、確認 channel 收到 done
|
||||||
|
got := drainProgress(t, ch1, 5*time.Second)
|
||||||
|
if len(got) == 0 {
|
||||||
|
t.Fatalf("first task got no events")
|
||||||
|
}
|
||||||
|
if got[len(got)-1].Stage != StageDone {
|
||||||
|
t.Errorf("last event stage = %q, want done", got[len(got)-1].Stage)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 第一個結束後、可以再 start 一個(同 device)
|
||||||
|
svc.WaitForActiveTasks(2 * time.Second)
|
||||||
|
svc.CleanupTask("dev-1")
|
||||||
|
_, _, err3 := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if err3 != nil {
|
||||||
|
t.Errorf("third UpgradeFirmware after cleanup should succeed, got: %v", err3)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 5. HasActiveTask / GetActiveTaskInfo
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestHasActiveTask_AndActiveTaskInfo(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-1", Name: "KL720-Z", FirmwareVer: "KDP"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{Stage: StageLoading, Percent: 20},
|
||||||
|
{Stage: StageDone, Percent: 100, AfterVersion: "2.2.0"},
|
||||||
|
},
|
||||||
|
delay: 80 * time.Millisecond,
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
if svc.HasActiveTask() {
|
||||||
|
t.Fatalf("HasActiveTask should be false initially")
|
||||||
|
}
|
||||||
|
if len(svc.GetActiveTaskInfo()) != 0 {
|
||||||
|
t.Fatalf("GetActiveTaskInfo should be empty initially")
|
||||||
|
}
|
||||||
|
|
||||||
|
_, ch, err := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL720)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpgradeFirmware error: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 等 progress 推一個出來、確保 task 已在 active 狀態
|
||||||
|
select {
|
||||||
|
case <-ch:
|
||||||
|
case <-time.After(2 * time.Second):
|
||||||
|
t.Fatalf("no progress received")
|
||||||
|
}
|
||||||
|
|
||||||
|
if !svc.HasActiveTask() {
|
||||||
|
t.Errorf("HasActiveTask should be true while upgrading")
|
||||||
|
}
|
||||||
|
infos := svc.GetActiveTaskInfo()
|
||||||
|
if len(infos) != 1 {
|
||||||
|
t.Fatalf("ActiveTaskInfo len = %d, want 1", len(infos))
|
||||||
|
}
|
||||||
|
info := infos[0]
|
||||||
|
if info.DeviceID != "dev-1" {
|
||||||
|
t.Errorf("DeviceID = %q, want dev-1", info.DeviceID)
|
||||||
|
}
|
||||||
|
if info.DeviceName != "KL720-Z" {
|
||||||
|
t.Errorf("DeviceName = %q, want KL720-Z", info.DeviceName)
|
||||||
|
}
|
||||||
|
if info.Chip != ChipKL720 {
|
||||||
|
t.Errorf("Chip = %q, want KL720", info.Chip)
|
||||||
|
}
|
||||||
|
if info.Direction != DirectionUpgrade {
|
||||||
|
t.Errorf("Direction = %q, want upgrade", info.Direction)
|
||||||
|
}
|
||||||
|
if info.EtaSeconds <= 0 || info.EtaSeconds > 200 {
|
||||||
|
t.Errorf("EtaSeconds = %d, want 0 < n <= 200", info.EtaSeconds)
|
||||||
|
}
|
||||||
|
|
||||||
|
// drain rest
|
||||||
|
drainProgress(t, ch, 5*time.Second)
|
||||||
|
svc.WaitForActiveTasks(2 * time.Second)
|
||||||
|
if svc.HasActiveTask() {
|
||||||
|
t.Errorf("HasActiveTask should be false after task done")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 6. UnsupportedChip:KL630 / KL730 / 亂值 → ErrUnsupportedChip
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_UnsupportedChip(t *testing.T) {
|
||||||
|
drv := &fakeDriver{info: driver.DeviceInfo{ID: "dev-1"}}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
for _, chip := range []string{"KL630", "KL730", "", "kl520", "garbage"} {
|
||||||
|
_, _, err := svc.UpgradeFirmware(context.Background(), "dev-1", chip)
|
||||||
|
if !errors.Is(err, ErrUnsupportedChip) {
|
||||||
|
t.Errorf("chip=%q: err=%v, want ErrUnsupportedChip", chip, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 7. DeviceNotFound
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_DeviceNotFound(t *testing.T) {
|
||||||
|
svc := NewService(&fakeLookup{drivers: map[string]UpgradeDriver{}}, FirmwareDir{Root: "/tmp"})
|
||||||
|
_, _, err := svc.UpgradeFirmware(context.Background(), "ghost", ChipKL520)
|
||||||
|
if !errors.Is(err, ErrDeviceNotFound) {
|
||||||
|
t.Errorf("err=%v, want ErrDeviceNotFound", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 8. RequestShutdown:shutdown 後新 task 被拒
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_ShutdownRejection(t *testing.T) {
|
||||||
|
drv := &fakeDriver{info: driver.DeviceInfo{ID: "dev-1"}}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
svc.RequestShutdown()
|
||||||
|
|
||||||
|
_, _, err := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if !errors.Is(err, ErrDeviceBusy) {
|
||||||
|
t.Errorf("err=%v, want ErrDeviceBusy after shutdown", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 9. WaitForActiveTasks:有 active 時 wait 會回 false(timeout)
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestWaitForActiveTasks_TimeoutWhenBusy(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-1"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{Stage: StageDone, Percent: 100},
|
||||||
|
},
|
||||||
|
delay: 300 * time.Millisecond,
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
_, ch, err := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpgradeFirmware error: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
clean := svc.WaitForActiveTasks(50 * time.Millisecond)
|
||||||
|
if clean {
|
||||||
|
t.Errorf("expected timeout (false), got clean (true)")
|
||||||
|
}
|
||||||
|
|
||||||
|
// drain rest 確保不 leak
|
||||||
|
drainProgress(t, ch, 5*time.Second)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 10. ListBundledVersions / GetCurrentVersion 基本行為
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestListBundledVersions(t *testing.T) {
|
||||||
|
// 建 temp firmware 結構(Suggestion 3 後、ListBundledVersions 會 os.Stat chipDir)
|
||||||
|
tmpRoot := t.TempDir()
|
||||||
|
if err := os.MkdirAll(filepath.Join(tmpRoot, ChipKL520), 0o755); err != nil {
|
||||||
|
t.Fatalf("setup mkdir: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
svc := NewService(&fakeLookup{drivers: map[string]UpgradeDriver{}}, FirmwareDir{Root: tmpRoot})
|
||||||
|
|
||||||
|
versions, err := svc.ListBundledVersions(ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("ListBundledVersions error: %v", err)
|
||||||
|
}
|
||||||
|
if len(versions) != 1 {
|
||||||
|
t.Fatalf("versions len = %d, want 1 (A 階段扁平)", len(versions))
|
||||||
|
}
|
||||||
|
if !versions[0].IsCurrent || !versions[0].IsBundled {
|
||||||
|
t.Errorf("expected IsCurrent + IsBundled, got %+v", versions[0])
|
||||||
|
}
|
||||||
|
|
||||||
|
// 不支援 chip
|
||||||
|
_, err = svc.ListBundledVersions("KL630")
|
||||||
|
if !errors.Is(err, ErrUnsupportedChip) {
|
||||||
|
t.Errorf("KL630 error = %v, want ErrUnsupportedChip", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 支援的 chip 但 firmware 漏 build(Suggestion 3 行為)
|
||||||
|
_, err = svc.ListBundledVersions(ChipKL720)
|
||||||
|
if err == nil {
|
||||||
|
t.Errorf("expected error when chip dir missing")
|
||||||
|
}
|
||||||
|
|
||||||
|
// 沒設 fwDir
|
||||||
|
svc2 := NewService(&fakeLookup{}, FirmwareDir{})
|
||||||
|
_, err = svc2.ListBundledVersions(ChipKL520)
|
||||||
|
if err == nil {
|
||||||
|
t.Errorf("expected error when fwDir.Root is empty")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestGetCurrentVersion(t *testing.T) {
|
||||||
|
drv := &fakeDriver{info: driver.DeviceInfo{ID: "dev-1", FirmwareVer: "v2.2.0"}}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
ver, err := svc.GetCurrentVersion("dev-1")
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("GetCurrentVersion error: %v", err)
|
||||||
|
}
|
||||||
|
if ver.Version != "v2.2.0" {
|
||||||
|
t.Errorf("Version = %q, want v2.2.0", ver.Version)
|
||||||
|
}
|
||||||
|
|
||||||
|
// not found
|
||||||
|
_, err = svc.GetCurrentVersion("ghost")
|
||||||
|
if !errors.Is(err, ErrDeviceNotFound) {
|
||||||
|
t.Errorf("err = %v, want ErrDeviceNotFound", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 11. Multi-device 平行 upgrade(Minor 2 + Suggestion 2)
|
||||||
|
//
|
||||||
|
// 3 個 device 同時升級、確認:
|
||||||
|
// - tracker key 是 deviceID、不同 device 互不影響(不誤匹配)
|
||||||
|
// - 每個 task 的 ProgressCh 收到自己的 events、DeviceID 對得到
|
||||||
|
// - 3 個 task 都能完成、HasActiveTask 在最後回 false
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_MultiDeviceParallel(t *testing.T) {
|
||||||
|
mkDrv := func(id string) *fakeDriver {
|
||||||
|
return &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: id, Name: id + "-name", FirmwareVer: "KDP"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{Stage: StageFlashing, Percent: 50},
|
||||||
|
{Stage: StageDone, Percent: 100, AfterVersion: "2.2.0"},
|
||||||
|
},
|
||||||
|
delay: 20 * time.Millisecond,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
drvA := mkDrv("dev-A")
|
||||||
|
drvB := mkDrv("dev-B")
|
||||||
|
drvC := mkDrv("dev-C")
|
||||||
|
|
||||||
|
svc := NewService(
|
||||||
|
&fakeLookup{drivers: map[string]UpgradeDriver{
|
||||||
|
"dev-A": drvA,
|
||||||
|
"dev-B": drvB,
|
||||||
|
"dev-C": drvC,
|
||||||
|
}},
|
||||||
|
FirmwareDir{Root: "/tmp"},
|
||||||
|
)
|
||||||
|
|
||||||
|
type startResult struct {
|
||||||
|
id string
|
||||||
|
taskID string
|
||||||
|
ch <-chan FirmwareProgress
|
||||||
|
err error
|
||||||
|
}
|
||||||
|
|
||||||
|
startCh := make(chan startResult, 3)
|
||||||
|
for _, id := range []string{"dev-A", "dev-B", "dev-C"} {
|
||||||
|
go func(deviceID string) {
|
||||||
|
tid, ch, err := svc.UpgradeFirmware(context.Background(), deviceID, ChipKL520)
|
||||||
|
startCh <- startResult{id: deviceID, taskID: tid, ch: ch, err: err}
|
||||||
|
}(id)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 收集 3 個 start 結果、組成 deviceID → progressCh map
|
||||||
|
results := make(map[string]startResult, 3)
|
||||||
|
for i := 0; i < 3; i++ {
|
||||||
|
select {
|
||||||
|
case r := <-startCh:
|
||||||
|
if r.err != nil {
|
||||||
|
t.Fatalf("%s start err: %v", r.id, r.err)
|
||||||
|
}
|
||||||
|
results[r.id] = r
|
||||||
|
case <-time.After(3 * time.Second):
|
||||||
|
t.Fatalf("timeout waiting for 3 starts; got %d", i)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 各自 drain、驗證 events 的 DeviceID 對得到、且收到 done
|
||||||
|
var wg sync.WaitGroup
|
||||||
|
for id, r := range results {
|
||||||
|
wg.Add(1)
|
||||||
|
go func(id string, ch <-chan FirmwareProgress) {
|
||||||
|
defer wg.Done()
|
||||||
|
evs := drainProgress(t, ch, 5*time.Second)
|
||||||
|
if len(evs) == 0 {
|
||||||
|
t.Errorf("%s: no events", id)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
// 每個 event 的 DeviceID 必須 = id(不能誤匹配到別的 device)
|
||||||
|
for i, ev := range evs {
|
||||||
|
if ev.DeviceID != id {
|
||||||
|
t.Errorf("%s ev[%d].DeviceID = %q, want %q", id, i, ev.DeviceID, id)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// 最後 event 必為 done
|
||||||
|
if last := evs[len(evs)-1]; last.Stage != StageDone {
|
||||||
|
t.Errorf("%s last stage = %q, want done", id, last.Stage)
|
||||||
|
}
|
||||||
|
}(id, r.ch)
|
||||||
|
}
|
||||||
|
wg.Wait()
|
||||||
|
|
||||||
|
// 全部 done 後 HasActiveTask 必為 false
|
||||||
|
svc.WaitForActiveTasks(3 * time.Second)
|
||||||
|
if svc.HasActiveTask() {
|
||||||
|
t.Errorf("HasActiveTask still true after all 3 tasks done")
|
||||||
|
}
|
||||||
|
|
||||||
|
// 每個 driver 都剛好被呼叫一次(沒誤匹配到別的 deviceID)
|
||||||
|
for _, d := range []*fakeDriver{drvA, drvB, drvC} {
|
||||||
|
d.mu.Lock()
|
||||||
|
if d.callCount != 1 {
|
||||||
|
t.Errorf("driver(%s) callCount = %d, want 1", d.info.ID, d.callCount)
|
||||||
|
}
|
||||||
|
if d.calledChip != ChipKL520 {
|
||||||
|
t.Errorf("driver(%s) chip = %q, want KL520", d.info.ID, d.calledChip)
|
||||||
|
}
|
||||||
|
d.mu.Unlock()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 12. Suggestion 1:StageDone 去重
|
||||||
|
//
|
||||||
|
// driver 端送雙保險 done(stderr push + sendCommand 補一發)、service 端
|
||||||
|
// forward 應只 forward 一次 done 給 caller。
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestUpgradeFirmware_DedupeDoneEvent(t *testing.T) {
|
||||||
|
drv := &fakeDriver{
|
||||||
|
info: driver.DeviceInfo{ID: "dev-1", FirmwareVer: "KDP"},
|
||||||
|
events: []FirmwareProgress{
|
||||||
|
{Stage: StagePreparing, Percent: 5},
|
||||||
|
{Stage: StageDone, Percent: 100, AfterVersion: "2.2.0"},
|
||||||
|
// 第二次 done(模擬 stderr + sendCommand 雙保險)
|
||||||
|
{Stage: StageDone, Percent: 100, AfterVersion: "2.2.0"},
|
||||||
|
},
|
||||||
|
delay: 1 * time.Millisecond,
|
||||||
|
}
|
||||||
|
svc := newServiceWith(drv, "dev-1")
|
||||||
|
|
||||||
|
_, ch, err := svc.UpgradeFirmware(context.Background(), "dev-1", ChipKL520)
|
||||||
|
if err != nil {
|
||||||
|
t.Fatalf("UpgradeFirmware error: %v", err)
|
||||||
|
}
|
||||||
|
got := drainProgress(t, ch, 5*time.Second)
|
||||||
|
|
||||||
|
// 應該只看到一個 done(去重後)
|
||||||
|
doneCount := 0
|
||||||
|
for _, ev := range got {
|
||||||
|
if ev.Stage == StageDone {
|
||||||
|
doneCount++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if doneCount != 1 {
|
||||||
|
t.Errorf("doneCount = %d, want 1 (deduped); events: %+v", doneCount, got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
// 13. drainProgress helper
|
||||||
|
// ──────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func drainProgress(t *testing.T, ch <-chan FirmwareProgress, maxWait time.Duration) []FirmwareProgress {
|
||||||
|
t.Helper()
|
||||||
|
var out []FirmwareProgress
|
||||||
|
timer := time.NewTimer(maxWait)
|
||||||
|
defer timer.Stop()
|
||||||
|
for {
|
||||||
|
select {
|
||||||
|
case ev, ok := <-ch:
|
||||||
|
if !ok {
|
||||||
|
return out
|
||||||
|
}
|
||||||
|
out = append(out, ev)
|
||||||
|
case <-timer.C:
|
||||||
|
t.Fatalf("drainProgress timeout after %v, got %d events: %+v", maxWait, len(out), out)
|
||||||
|
return out
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
123
local-tool/server/internal/firmware/types.go
Normal file
123
local-tool/server/internal/firmware/types.go
Normal file
@ -0,0 +1,123 @@
|
|||||||
|
// Package firmware provides Kneron NPU firmware upgrade / downgrade
|
||||||
|
// services. See docs/autoflow/04-architecture/v2/firmware-management.md
|
||||||
|
// (TDD §2.10 v2.2) for the design.
|
||||||
|
//
|
||||||
|
// A 階段(M9-2)只實作 upgrade(KDP1 → KDP2、KL520 / KL720)。
|
||||||
|
// Downgrade、多版本管理(B 階段 M9-11/M9-12)延後。
|
||||||
|
package firmware
|
||||||
|
|
||||||
|
import "time"
|
||||||
|
|
||||||
|
// FirmwareVersion 描述一個 bundled firmware 版本(TDD §4.1)。
|
||||||
|
//
|
||||||
|
// A 階段(M9-2)`IsBundled` 永遠為 true(不做線上更新通道);B2 階段
|
||||||
|
// 多版本機制(M9-11)擴展時保持同一 struct。
|
||||||
|
type FirmwareVersion struct {
|
||||||
|
Version string `json:"version"` // "v2.2.0" / "kdp1" / "SDK-v2.5.7"
|
||||||
|
DisplayName string `json:"displayName"` // "v2.2.0 (current)" / "KDP1 (legacy)"
|
||||||
|
IsCurrent bool `json:"isCurrent"` // 是否為當前 bundled current
|
||||||
|
IsBundled bool `json:"isBundled"` // 是否在安裝包內、A 階段永遠 true
|
||||||
|
ReleaseDate string `json:"releaseDate,omitempty"` // ISO 8601、optional
|
||||||
|
Notes string `json:"notes,omitempty"` // 「KDP1:限制 NPU 功能」等說明
|
||||||
|
}
|
||||||
|
|
||||||
|
// FirmwareProgress 是 service 層往 progressCh 推、再經 M9-3 WebSocket
|
||||||
|
// 廣播給前端的進度事件(TDD §4.2)。
|
||||||
|
//
|
||||||
|
// 成功路徑欄位:DeviceID / Stage / Percent / Direction / Message /
|
||||||
|
// ElapsedMs / EtaMs。失敗(Stage="error"、Percent=-1)時還會填 Reason /
|
||||||
|
// RawError / BeforeVersion / ErrorCode。完成(Stage="done"、Percent=100)
|
||||||
|
// 時會填 AfterVersion / Method。
|
||||||
|
//
|
||||||
|
// JSON tag 對齊 TDD §4.2、與 bridge.py 推出的 `firmware_progress` JSON
|
||||||
|
// schema 一致(欄位 snake_case 對應)、Go 端轉成 camelCase 走 WebSocket。
|
||||||
|
type FirmwareProgress struct {
|
||||||
|
DeviceID string `json:"deviceId"`
|
||||||
|
Stage string `json:"stage"` // preparing/loading/flashing/verifying/done/error
|
||||||
|
Percent int `json:"percent"` // 0-100、-1 表 error
|
||||||
|
Direction string `json:"direction,omitempty"` // "upgrade" / "downgrade"
|
||||||
|
Message string `json:"message,omitempty"`
|
||||||
|
ElapsedMs int64 `json:"elapsedMs"`
|
||||||
|
EtaMs int64 `json:"etaMs,omitempty"`
|
||||||
|
|
||||||
|
// 失敗欄位(Stage == "error" 時填)
|
||||||
|
Error string `json:"error,omitempty"` // 使用者可讀訊息
|
||||||
|
Reason string `json:"reason,omitempty"` // scan_not_found/connect_failed/loader_write_failed/upgrade_mid_failed/disconnect_during_op/timeout/verify_mismatch/verify_not_found
|
||||||
|
RawError string `json:"rawError,omitempty"` // bridge.py 原始 traceback、客服診斷
|
||||||
|
BeforeVersion string `json:"beforeVersion,omitempty"` // 升降版前 firmware 字串
|
||||||
|
ErrorCode string `json:"errorCode,omitempty"` // 內部追蹤碼
|
||||||
|
|
||||||
|
// 成功 done 欄位
|
||||||
|
AfterVersion string `json:"afterVersion,omitempty"` // 升降版後 firmware 字串
|
||||||
|
Method string `json:"method,omitempty"` // ctypes_kp_update_kdp_firmware_from_files 等
|
||||||
|
}
|
||||||
|
|
||||||
|
// Stage 列舉(TDD §4.3、採 Design Spec §8 命名)。
|
||||||
|
const (
|
||||||
|
StagePreparing = "preparing"
|
||||||
|
StageLoading = "loading"
|
||||||
|
StageFlashing = "flashing"
|
||||||
|
StageVerifying = "verifying"
|
||||||
|
StageDone = "done"
|
||||||
|
StageError = "error"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Direction 列舉。
|
||||||
|
const (
|
||||||
|
DirectionUpgrade = "upgrade"
|
||||||
|
DirectionDowngrade = "downgrade"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Reason 列舉(TDD §3.4 / §6.1 reason enum、與 bridge.py 對齊)。
|
||||||
|
const (
|
||||||
|
ReasonScanNotFound = "scan_not_found"
|
||||||
|
ReasonConnectFailed = "connect_failed"
|
||||||
|
ReasonLoaderWriteFailed = "loader_write_failed"
|
||||||
|
ReasonUpgradeMidFailed = "upgrade_mid_failed"
|
||||||
|
ReasonDisconnectDuringOp = "disconnect_during_op"
|
||||||
|
ReasonTimeout = "timeout"
|
||||||
|
ReasonVerifyMismatch = "verify_mismatch"
|
||||||
|
ReasonVerifyNotFound = "verify_not_found"
|
||||||
|
// downgrade 專屬(B2 階段、保留 reason key)
|
||||||
|
ReasonValidateFailed = "validate_failed"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Chip 列舉(A 階段只允許 KL520 / KL720、TDD §6.1)。
|
||||||
|
const (
|
||||||
|
ChipKL520 = "KL520"
|
||||||
|
ChipKL720 = "KL720"
|
||||||
|
)
|
||||||
|
|
||||||
|
// SupportedUpgradeChip 回傳 chip 是否被 A 階段支援。KL630 / KL730 在
|
||||||
|
// B 階段 M9-10 才開、現在拒絕。
|
||||||
|
func SupportedUpgradeChip(chip string) bool {
|
||||||
|
return chip == ChipKL520 || chip == ChipKL720
|
||||||
|
}
|
||||||
|
|
||||||
|
// ActiveTaskInfo 給 graceful shutdown handler / control panel modal 顯示用
|
||||||
|
// (TDD §8.6.1 + §8.6.2、簡化的 task 狀態快照)。
|
||||||
|
type ActiveTaskInfo struct {
|
||||||
|
TaskID string `json:"taskId"`
|
||||||
|
DeviceID string `json:"deviceId"`
|
||||||
|
DeviceName string `json:"deviceName,omitempty"`
|
||||||
|
Chip string `json:"chip"`
|
||||||
|
Direction string `json:"direction"`
|
||||||
|
Stage string `json:"stage"`
|
||||||
|
StartTs time.Time `json:"startTs"`
|
||||||
|
ElapsedMs int64 `json:"elapsedMs"`
|
||||||
|
EtaSeconds int `json:"etaSeconds,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// UpgradeTimeoutFor 依 chip 回傳 service 層 timeout 上界(TDD §3.4
|
||||||
|
// 與 §8.6.1 對齊:KL520 60s、KL720 200s)。bridge.py 端有同名 constant
|
||||||
|
// 做第一道防護、service 端是第二道(外層保險、避免 bridge.py SIGTERM
|
||||||
|
// 拒絕導致 service goroutine 永遠等不到回應)。
|
||||||
|
func UpgradeTimeoutFor(chip string) time.Duration {
|
||||||
|
switch chip {
|
||||||
|
case ChipKL720:
|
||||||
|
return 200 * time.Second
|
||||||
|
default:
|
||||||
|
// KL520 + 預設 fallback
|
||||||
|
return 60 * time.Second
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -612,8 +612,9 @@ class TestFirmwareUpgradeGracefulShutdown(unittest.TestCase):
|
|||||||
|
|
||||||
def setUp(self):
|
def setUp(self):
|
||||||
# 確保旗標歸零
|
# 確保旗標歸零
|
||||||
|
# Reviewer m5(M9-1 round 2、cleanup in M9-2):m4 已砍 bridge._firmware_upgrade_start_ts
|
||||||
|
# 全域變數、test 端的 setattr 留下死碼會掩蓋 m4 修改徹底性、移除。
|
||||||
bridge._firmware_upgrade_in_progress = False
|
bridge._firmware_upgrade_in_progress = False
|
||||||
bridge._firmware_upgrade_start_ts = 0.0
|
|
||||||
|
|
||||||
def tearDown(self):
|
def tearDown(self):
|
||||||
try:
|
try:
|
||||||
@ -628,8 +629,11 @@ class TestFirmwareUpgradeGracefulShutdown(unittest.TestCase):
|
|||||||
import signal
|
import signal
|
||||||
|
|
||||||
bridge._firmware_upgrade_in_progress = True
|
bridge._firmware_upgrade_in_progress = True
|
||||||
|
# Reviewer m5(M9-1 round 2、cleanup in M9-2):m4 已砍
|
||||||
|
# bridge._firmware_upgrade_start_ts 全域變數、start_ts 改由
|
||||||
|
# closure capture(_fw_register_sigterm_handler 參數)、test 端
|
||||||
|
# 不再 setattr。
|
||||||
start_ts = time.monotonic()
|
start_ts = time.monotonic()
|
||||||
bridge._firmware_upgrade_start_ts = start_ts
|
|
||||||
|
|
||||||
# 攔截 stderr
|
# 攔截 stderr
|
||||||
capture = io.StringIO()
|
capture = io.StringIO()
|
||||||
@ -677,6 +681,11 @@ class TestFirmwareUpgradeGracefulShutdown(unittest.TestCase):
|
|||||||
msg="handler should be installed during upgrade")
|
msg="handler should be installed during upgrade")
|
||||||
|
|
||||||
# 重新 install 後 unregister
|
# 重新 install 後 unregister
|
||||||
|
# Reviewer s5(M9-1 round 2、cleanup in M9-2):第二次呼叫
|
||||||
|
# _fw_register_sigterm_handler 是刻意設計、用來驗證 register 行為
|
||||||
|
# 可重入(idempotent shape)—— 上一次 unregister 後或同步再 register
|
||||||
|
# 一次仍能正確 install handler、之後 unregister 仍能還原為非 wrapper。
|
||||||
|
# 這裡 retest 一次 install→unregister 對的清空效果。
|
||||||
bridge._fw_register_sigterm_handler(time.monotonic())
|
bridge._fw_register_sigterm_handler(time.monotonic())
|
||||||
bridge._fw_unregister_sigterm_handler()
|
bridge._fw_unregister_sigterm_handler()
|
||||||
|
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user