jim800121chen 8cd5751ce3 feat(local-tool): M8 重構 — Wails 控制台 + 瀏覽器 Web UI（R5 決策）

依 R5 五輪決策把 visionA-local 從「Wails 內嵌 Next.js」重構為「Wails
本機伺服器控制台 + 瀏覽器 Web UI」模式（類比 Docker Desktop / Ollama）。

程式碼變動
  - M8-1 砍 yt-dlp 全套（後端 resolver / URL handler / 前端 URL tab /
    Makefile vendor / installer / bootstrap / CI workflow，-555 行）
  - M8-2 砍 Mock 模式全套（driver/mock、mock_camera、Settings runtimeMode、
    VISIONA_MOCK 環境變數，-528 行）
  - M8-3 ffmpeg 從 GPL 切換到 LGPL 混合方案：Windows/Linux 用 BtbN 現成
    LGPL binary，macOS 自 build minimal decoder-only 進 git
    （vendor/ffmpeg/macos/ffmpeg 5.7MB + ffprobe 5.6MB，比 GPL 版省 85% 空間）
  - M8-4 Wails Server Controller：state machine、log ring buffer 2000 行、
    preferences.json atomic write、boot-id、Gin SkipPaths、shutdown 7+1 秒、
    notify_*.go 三平台 OS 通知、watchServer 改 Error state 不 os.Exit
  - M8-4b 啟動階段管線 R5-E：6 階段進度 event、20s soft / 60s hard timeout、
    stage 5/6 skip 規則、sentinel file、RestartStartupSequence 5 步驟
  - M8-5 Wails 控制台 vanilla HTML/JS/CSS（9 檔 ~2012 行）取代 M7-B splash：
    state 視覺、log panel、startup progress panel、Stage 6 manual CTA
    pulse、shutdown modal、Settings、Dark Mode、i18n 中英雙語
  - M8-6 上傳影片副檔名擴充（mp4/avi/mov/mpeg/mpg）
  - M8-7 Web UI Server Offline Overlay（role=alertdialog + focus trap +
    wsEverConnected 容錯 + Page Visibility）
  - M8-8 CORS middleware（127.0.0.1/localhost only + suffix attack 防護）+
    ws/origin.go 獨立 WebSocket CheckOrigin 避 package cycle
  - MAJ-4 server:shutdown-imminent WebSocket broadcast 機制
    （/ws/system endpoint + notifyShutdownImminent helper）
  - M8-9 Boot-ID + 瀏覽器 tab 自動重連（sessionStorage loop guard）

品質
  - ~105+ 新 unit test + race detector (-count=2) 全綠
  - 10 個 milestone 全部通過 Reviewer 審查
  - 三方 v2 + v2.1 文件（PRD / Design Spec / TDD）+ 交叉互審紀錄
    收錄在 .autoflow/

交付前待處理（M8-10）
  - 重跑 make payload-macos 把舊 GPL 77MB binary 換成新 LGPL
  - 三平台 end-to-end build 驗證

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-15 17:57:54 +08:00

23 KiB

Raw Permalink Blame History

Reviewer 審查 M8-4 ServerController + Log Ring Buffer + Preferences + Notify + Boot-ID（2026-04-15）

摘要

總結論：⚠️ 需小改（核心架構正確，有 2 個 Major 阻擋後續 milestone）
阻擋情況：
- 阻擋 M8-5/M8-7/M8-9：watchServerV2 在 Stop 後仍存活會誤觸 Error state（誤發崩潰通知）→ 直接破壞 M8-5 控制台 UX；server 端 server:shutdown-imminent WebSocket 廣播未實作會直接讓 M8-9 的 race 風險回來
- 不阻擋 M8-4b：M8-4b 是擴充 ServerController 加 pipeline hook，本層 state machine 介面已就緒，可平行進行
親跑驗證：build / vet / test / race detector 全綠；smoke test boot-id + SkipPaths 完全符合預期

A. Log Ring Buffer（`log_buffer.go`）

檢查項	結論	備註
容量 2000 行	✅	`logBufferCap = 2000`（L26）
Thread-safe	✅	Append/Snapshot/Reset/Size/Dropped 全 mutex
Wrap-around	✅	head 環形遞增、size 飽和、dropped 計數正確（test 驗證 line-100..2099）
Snapshot 回 copy	✅	`out := make([]LogLine, 0, n)` + append（L96-105）
Rate limit 200/sec burst	⚠️ Minor	演算法本身正確（固定視窗 CAS），但呼叫單位有偏差（見 D 段）
`parseLogLevel`	✅	bracket / 冒號 / Gin 三種格式齊備；空字串、非 ASCII 不 panic
5 個 unit test	✅	TestLogBuffer_*、TestParseLogLevel 全綠

Minor 1（A.1）：Snapshot 從 i := 0 走到 b.size，前 skip 行只是 continue，可以直接從 (start + skip) % logBufferCap 開始走 n 步，效率較好。當 size = 2000、n = 100 時白跑 1900 圈。非阻擋，效能影響小於 0.1 ms。

Minor 2（A.2）：ShouldEmit 內 CAS 與 Store(0) 的微 race — A 視窗內 CAS 成功還沒 Store(0) 之前，B 已先 Add(1)，可能讓新視窗第一行被舊 count 計算。實際只會多放 ≤ 1 行 emit，不影響功能正確性，可不修。

B. Preferences（`preferences.go`）

檢查項	結論	備註
`DefaultPreferences()` 平台分支	✅	`runtime.GOOS != "linux"` → true（L42）
Atomic write-rename	✅	tmp → `os.Rename` → cleanup（L78-95）
讀取失敗 fallback	✅	缺檔 / JSON 損毀 → 回 default + warning（L57-71）
JSON 欄位齊全	✅	autoOpenBrowser / locale / logRingSize
5 個 unit test	✅	平台預設 / missing file / corrupt JSON / roundtrip / 殘留 tmp

極佳：SavePreferences 失敗時 os.Remove(tmpPath) 把垃圾 tmp 清掉（L92），符合 atomic 持久化最佳實踐。

Minor 3（B.1）：SavePreferences 路徑在跨檔案系統 rename 時會失敗（罕見邊界，例如 dataDir 是 NFS / overlayfs）。建議將來加 io.Copy + os.Remove(tmp) fallback，但本 milestone 不必處理。

C. ServerController State Machine（`server_control.go`）

C.1 雙鎖機制

檢查項	結論	備註
State enum 完整	✅	Idle / Starting / Running / Stopping / Stopped / Error（L43-50）
txMu 序列化 transition	✅	Start/Stop/ForceKill 都 `c.txMu.Lock(); defer c.txMu.Unlock()`
mu 保護欄位讀寫	✅	state / proc / startedAt / lastError 全走 mu
setState emit Wails event	✅	L120 `EventsEmit("server:state-change", status)`
鎖順序避免 deadlock	✅	a.mu → c.mu 全部單向
race detector	✅	`go test -race ./...` 全綠

C.2 Start / Stop / Restart / ForceKill 行為

行為	結論	備註
Start 允許 cold start fallback	✅	`startInternal(0)` → `pickPort(defaultPreferredPort)`
Restart 強制保留 port	✅	L233-244 讀 oldPort → Stop → StartWithPort(oldPort)
StartWithPort 對佔 port 不 fallback	✅	L408-411 直接 return error
`killStaleServerOnPort` 嘗試清乾淨	✅	L405-408（500 ms wait）
ForceKill 不發 crash 通知	✅	直接呼叫 `proc.forceKill()`，不走 handleWatchFailure
Shutdown 7 + 1 秒 modal	✅	`shutdownGraceV2 = 7s`、modalTimer = 1s（L319-339）
Windows 沒 SIGTERM 直接 Kill	✅	L307 `runtime.GOOS == "windows"` 分支
watchServerV2 連 3 次失敗進 Error	✅	L644-649 `handleWatchFailure`
不 os.Exit	✅	watchServerV2 完全沒呼叫 `reportFatal` / `os.Exit`

C.3 Major 與 Minor 問題（針對 ServerController）

Major 1（C.MAJ-1）：Stop() / ForceKill() 不會 cancel watchCancel，導致 watchServerV2 在 server 死後仍存活，30 秒（3 × 10s）後會把 state 從 Stopped 翻成 Error，並發 OS 崩潰通知**。

檔案/行：server_control.go:198-229（Stop）、server_control.go:251-265（ForceKill）、server_control.go:617-652（watchServerV2）。

重現：

Start → Running

Stop → state = Stopped、proc = nil

watchServerV2 仍持有舊 sp 與舊 port，下一個 ticker 到 → health check fail → failures++

30 秒後 failures = 3 → handleWatchFailure → setState(Error) + EventsEmit("server:error") + sendCrashNotification("Server 崩潰")

後果：使用者按 Stop 後 30 秒看到「visionA Local — Server 崩潰」OS 通知；M8-5 控制台 UI 會把 Stopped 翻紅變 Error。這是嚴重 UX bug。

修法：Stop() 與 ForceKill() 在持有 txMu 期間先 cancel a.watchCancel：
a.mu.Lock()
if a.watchCancel != nil { a.watchCancel(); a.watchCancel = nil }
a.mu.Unlock()
一行集中到 helper 函式更乾淨。

Major 2（C.MAJ-2）：handleWatchFailure 不持 txMu，會與正在進行的 Stop / ForceKill / Restart race。

檔案/行：server_control.go:269-291。

重現：

watchServerV2 連續失敗 2 次（failures = 2）

使用者按 Stop → ctrl.Stop() 取得 txMu → setState(Stopping) → proc.stopGraceful() → setState(Stopped)

同時 watchServerV2 第 3 次失敗 → handleWatchFailure → setState(Error)

競態結果：state 可能是 Error（覆蓋 Stopped），即使使用者剛剛主動 Stop

修法：handleWatchFailure 進場先取 txMu，並在 setState 前重新檢查 state（已是 Stopped/Stopping/Idle 就 return）。或更穩：watchServerV2 在 ticker 失敗時也檢查 c.State() != ServerStateRunning 直接退出 goroutine。

Minor 4（C.MIN-1）：Restart() 拆成獨立的 Stop() 與 StartWithPort() 兩段 txMu 持有，中間有窗口期讓另一個 binding 呼叫插隊。實務上前端按鈕在 Stopping 時 disable 應該擋住，但仍是潛在 race。建議 Restart() 自己持 txMu 整段（或加 restartInternal 不再呼叫 Stop / StartWithPort，而是內部 inline 邏輯）。

Minor 5（C.MIN-2）：stopGraceful 走完 done 路徑後立刻 closeLogFiles()，但 logPump goroutine 還在處理 lineCh 中 buffered 的最後幾行（會寫 fileWriter）。pipe EOF 與 stopGraceful 的時序可能有極窄的 race（race detector 沒抓到是因為 unit test 沒實際 spawn process）。建議：stopGraceful 等待 logPump 結束（加 done channel）再 close file，或讓 logPump 持有 file ownership 自己 defer close。

D. Log Pump goroutine

檢查項	結論	備註
每個 pipe 一個 goroutine	✅	`go a.logPump(stdoutPipe, ...)` × 2（L491-492）
10 ms micro-batch	✅	`time.NewTicker(10 * time.Millisecond)`
Rate limit fallback：超量只寫檔不 emit	⚠️ 偏差	見下方 D.MIN-1
pipe EOF 後 goroutine 自行退出	✅	scanner.Scan() 結束 → close(scanDone) → main loop 收到 → flush + return
lineCh 沒被 close	⚠️ Minor	由 GC 處理；scanDone 訊號替代

D.MIN-1（rate limit 計數單位）：TDD §4.5 + Reviewer prompt 描述「1 秒 200 行」，但 logPump 是每個 batch 呼叫一次 ShouldEmit()（server_control.go:572），不是每行。因為 batch 大小 1~數十不等，實際 burst 容許行數可能達 200 × 平均 batch size。

不算 bug（限流仍能擋住災難級瀑布），但單元測試 TestLogBuffer_RateLimit（line 108-127）的假設與真實使用方式不符 — test 模擬「每行一次 ShouldEmit」，實際 logPump 是「每 flush 一次 ShouldEmit」。

建議：要嘛把 ShouldEmit(n int) 改成接行數參數一次扣 n；要嘛在 logPump flush 時 for i := 0; i < len(batch); i++ { if !ShouldEmit() { drop } }。不阻擋本 milestone。

D.MAJ-1（雖然影響不大，但屬於資料丟失 → 標 Major）：logPump 在 server 子程序剛 exit 時可能丟掉最後幾行 stderr（崩潰原因）。

檔案/行：server_control.go:579-608。

原因：select 同時看 lineCh 與 scanDone。當 scanner goroutine 結束 → close(scanDone)，lineCh buffer 內可能仍有最多 128 行未處理。Go select 在多個 ready case 隨機選擇，可能直接走 <-scanDone flush 後 return，把 lineCh 中剩餘的行全數丟棄，包括寫檔與 ring buffer。

後果：server crash 時最後幾行 stderr（最有價值的 stack trace）有機率丟。

修法：scanner goroutine 結束時 close(lineCh) 而不是 close(scanDone)；main loop 用 case line, ok := <-lineCh: if !ok { return } 作為唯一退出條件。或更簡單：scanDone 收到後先 drain lineCh 再 return：
case <-scanDone:
    for {
        select {
        case line := <-lineCh:
            // 處理寫檔 + ring buffer + batch
        default:
            flush(); return
        }
    }

Minor 6（D.MIN-2）：scanner goroutine 內 select { case lineCh <- text: default: lineCh <- text } 寫法奇怪 — default 之後又 blocking send 同一個值，等同沒 default。建議直接 lineCh <- text 一行。

E. OnBeforeClose handler（`main.go`）

檢查項	結論	備註
呼叫關閉時觸發 graceful shutdown	✅	return false → Wails OnShutdown → app.shutdown → ctrl.Stop
return false 允許關閉	✅	L34
不彈 confirm 對話框	✅	無對話框邏輯，符合 R5-2

潛在 UX 風險（非 bug）：Wails 視窗一旦開始 close，前端 JS 是否還能渲染 shutdown:modal-show 是不確定的（不同 OS 行為不同）。M8-5 控制台 UI 改寫時應實機驗證；現階段不阻擋。

F. Notify 三平台

檔案	build tag	機制	結論
`notify_darwin.go`	`//go:build darwin` ✅	osascript display notification + escape	✅
`notify_linux.go`	`//go:build linux` ✅	notify-send -u critical -i dialog-error	✅
`notify_windows.go`	`//go:build windows` ✅	PowerShell BurntToast + msg * fallback	✅

檢查項	結論	備註
三平台 build tag 齊全	✅	沒重複定義 / 沒缺失
AppleScript escape	✅	反斜線 + 雙引號
PowerShell escape	✅	單引號 → ''
fire-and-forget 呼叫	✅	呼叫端用 `go sendCrashNotification(...)`
失敗只 log 不 fail	✅	三平台都 fmt.Fprintf stderr

Minor 7（F.MIN-1）：三平台都用 cmd.Run() 同步等子程序結束。雖然呼叫端用 go wrapper，但 goroutine 內若 osascript / powershell hang 住會 leak（極罕見）。建議：包一個 context.WithTimeout(5*time.Second) + exec.CommandContext，過期 Kill。本 milestone 可不修。

Minor 8（F.MIN-2）：notify_linux.go 假設 notify-send 一定在 PATH 上。雖然 prompt 提到「若不存在 log warning 不 fail」，實作確實不會 crash，但 exec.Command.Run() 在 binary not found 時回的是 *exec.Error，輸出的 warning 對使用者不夠友善（純 path not found）。可接受，不修。

G. Boot-ID + SkipPaths

檢查項	結論	備註
`crypto/rand.Read` + `hex.EncodeToString`	✅	`system_handler.go:27-32`
32 字元 hex	✅	smoke test 實測 `bd94ae7f976a6e526bd9840fc385b92d`
process-scoped（每次啟動新 ID）	✅	`NewSystemHandler` 內呼叫 `newBootID()`
`GET /api/system/boot-id` 註冊	✅	`router.go:59`
Response shape	✅	`{success, data: {bootId, startedAt}}`
`broadcasterLoggerSkipPaths` 含 boot-id + health	✅	`router.go:120-123`
skip 邏輯只比對 path	✅	`router.go:140-142`
Smoke test：兩個 endpoint 不寫 access log	✅	實測 stdout 只有 `[GIN] 200 ... GET /api/devices`，沒有 boot-id 或 health 的 GIN log

Minor 9（G.MIN-1）：boot-id 產生位置偏離 TDD §9.1。TDD 寫的是「server/main.go 在啟動時產生 boot-id，傳入 SystemHandler」，實作把 newBootID() 放在 NewSystemHandler 內部呼叫（system_handler.go:42）。功能完全等價，但讓單元測試難以注入固定 boot-id 做 deterministic test。Minor 偏離規格，可接受。

Major 3（G.MAJ-1）：TDD §8.1 + milestone-plan 要求 server/main.go 的 shutdownFn timeout 從 10 秒改成 6 秒，未改。

檔案/行：server/main.go:166。
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
應改為 6*time.Second。

後果：Wails ctrl.Stop 在第 7 秒會 SIGKILL，但 server 自己內部還在 10 秒 graceful cleanup window 中 — 可能出現「server 還在 sync 檔案到一半被打斷」。對 v2 「7 + 1 秒 modal」UX 設計直接破壞。

H. v1/v2 並存策略評估

實作保留了 ServerProcess.stop() / kill() / watchServer v1 路徑（app.go:583-685）+ v1 shutdownGracePeriod = 5s 常數，宣稱為「fallback」 + 「M8-5 前端改寫期間相容」。

評估結論：⚠️ 策略合理但需立即標記砍除時程

問題	風險	結論
v1/v2 路徑會被同時呼叫嗎？	不會	startup 只走 ctrl.Start → startServerV2；shutdown 只走 ctrl.Stop → stopGraceful。`app.stopServer()` 與 `App.startServer` 已被 dead code
v1 常數 `shutdownGracePeriod = 5s` 會被誤用嗎？	低	只有 v1 ServerProcess.stop() 用到，已是 dead path
v1 `watchServer` 會被誤啟動嗎？	低	只在 v1 startServer 的 dead path 啟動；新 startServerV2 走 watchServerV2
reportFatal v1 路徑仍存在	中	startup 失敗 / single-instance 失敗仍會走，這合理（startup 完全失敗時控制台還沒就緒，無 fallback 只能原生對話框）
未來何時砍	未明確	必須在 progress.md「未解決問題」記錄：M8-5 完成後立即砍全部 v1：`ServerProcess.stop/kill`、`watchServer`、`stopServer`、`shutdownGracePeriod`、`startServer`

Minor 10（H.MIN-1）：shutdown()（app.go:222）的 fallback 路徑 a.stopServer() 是 dead code，因為 ctrl 在 startup 一定會被建立 — 直接刪 else 分支，避免讓未來的開發者誤以為兩條路徑都有效。

Minor 11（H.MIN-2）：v1 watchServer 與 v2 watchServerV2 內容 90% 相同，只差最後失敗的 reportFatal vs handleWatchFailure。等 M8-5 砍 v1 即可消除這份重複。

I. Build / Test / Race detector 結果

cd server && go build ./...                  ✅ PASS
cd server && go vet ./...                    ✅ PASS
cd server && go test ./...                   ✅ PASS（device + model 既有 test）
cd server && go test -race ./...             ✅ PASS

cd visiona-local && go build .               ✅ PASS
cd visiona-local && go vet ./...             ✅ PASS
cd visiona-local && go test -count=1 ./...   ✅ PASS（20 tests 全綠）
cd visiona-local && go test -race ./...      ✅ PASS

20 個 unit test 明細：

log_buffer_test.go：5 個（Append / Wrap / Reset / RateLimit / parseLogLevel）
preferences_test.go：5 個（platform default / missing / corrupt / roundtrip / atomic rename）
server_control_test.go：10 個（Initial / setState / Stop×2 / ForceKill / snapshotStatus / GetRecentLogs / GetSystemInfo / ClearLogs / GetServerStatusV2）

race detector 沒抓到任何 race，但要注意 unit test 沒有實際 spawn server binary，因此 logPump↔stopGraceful 的 file handle race（C.MIN-2 / D.MAJ-1 / Major 1）必須靠人工 review 與 integration test 才能驗證。

J. Smoke test 結果

執行：go run server 啟動真實 server，curl 各 endpoint。

步驟	結果
1. `curl /api/system/boot-id`	✅ `{"data":{"bootId":"bd94ae7f976a6e526bd9840fc385b92d","startedAt":1776213642971},"success":true}`（32 字元 hex）
2. `curl /api/system/health`	✅ `{"status":"ok"}`
3. `curl /api/devices`	✅ 正常回應
4. stdout 內 GIN access log	✅ 只有 `[GIN] 200 \| 137.674µs \| GET /api/devices`，沒有 boot-id / health 的 access log
5. 連續 4 次 poll boot-id 都回同一 ID（process-scoped）	✅

SkipPaths 機制完全符合 TDD §9.1a 與 milestone-plan 預期。

K. 遺漏 / 誤解

Major 4（K.MAJ-1）：未實作 server:shutdown-imminent WebSocket 廣播（milestone-plan #157，「Minor 4」明確要求）。

檔案/行：server_control.go:302-340（stopGraceful 內應在 SIGTERM 前先廣播）；server 端應有 /ws/system 或擴充 /ws/server-logs 來廣播 server:shutdown-imminent。

影響：M8-9 的 web UI offline overlay race condition 防護機制完全缺席。M8-9 將會被迫補做、或 fall back 到「3 次 poll 失敗 = 15s 才顯示 overlay」的舊行為。

狀態：阻擋 M8-9，建議在 M8-4b 一併補做或拆獨立小 task。

Minor 12（K.MIN-1）：milestone-plan #153 要求「shutdownGracePeriod 5 s → 7 s」，實作改用新常數 shutdownGraceV2 = 7s、舊常數 5s 還在。功能等效但字面違反 milestone-plan。可接受作為 v1/v2 並存策略的副作用，但要在 H 段提到的「砍 v1」清單中一併處理。

Minor 13（K.MIN-2）：ForceKillServer 確認不發崩潰通知 — 走 proc.forceKill() 路徑直接 SIGKILL，不經過 handleWatchFailure，符合 prompt 要求。✅

Minor 14（K.MIN-3）：preferences 讀取失敗的 fallback 行為符合 TDD §11（不 fail、用 default）。✅

Minor 15（K.MIN-4）：notify 三平台 timeout 處理（F.MIN-1）— 已在 F 段提，不重複列。

L. 問題清單

Major（阻擋後續 milestone）

#	檔案:行	問題	影響的 milestone	修法
MAJ-1	server_control.go:198-229 (Stop), 251-265 (ForceKill)	`Stop()` / `ForceKill()` 不 cancel watchCancel，30 秒後誤觸 Error state + 發 OS 崩潰通知	M8-5、M8-7	Stop/ForceKill 進場後加 `if a.watchCancel != nil { a.watchCancel(); a.watchCancel = nil }`
MAJ-2	server_control.go:269-291	`handleWatchFailure` 不取 txMu，與 Stop/ForceKill race，可能把 Stopped 翻成 Error	M8-5	進場取 txMu，setState 前重新檢查 state；或 watchServerV2 ticker 內檢查 `c.State() != ServerStateRunning` 直接退出
MAJ-3	server/main.go:166	shutdownFn timeout 仍是 10s，未對齊 TDD §8.1 + milestone-plan 的 6s	M8-5、M8-9	改 `6*time.Second`
MAJ-4	server_control.go:302-340 + 缺 server WebSocket 廣播	未實作 `server:shutdown-imminent` 廣播（milestone-plan Minor 4）	M8-9	server 端加 `/ws/system` 或擴充 `/ws/server-logs`；ctrl.Stop SIGTERM 前 HTTP/WS 觸發
MAJ-5	server_control.go:579-608 (logPump main loop)	scanDone 觸發後 lineCh 仍有 buffered 行未處理 → 資料丟失（含崩潰時最後幾行 stderr）	M8-5 (debug experience)	scanDone case 加 drain loop 或讓 scanner goroutine 自己 close lineCh

Minor（建議修，不阻擋）

#	檔案	問題
MIN-1	log_buffer.go:99-105	Snapshot 從 i=0 走 size 圈，可改為從 (start+skip) 走 n 圈
MIN-2	log_buffer.go:139-151	ShouldEmit CAS race：A 視窗 Store(0) 前 B 已 Add(1)，極小機率多放 1 行
MIN-3	preferences.go:91	跨 filesystem rename 失敗時無 io.Copy fallback（罕見邊界）
MIN-4	server_control.go:232-245	Restart 拆兩段 txMu，中間有插隊窗口
MIN-5	server_control.go:302-340	stopGraceful 結束時關 file 與 logPump goroutine 還在寫的時序 race
MIN-6	server_control.go:556-562	scanner goroutine `select { case lineCh<-text: default: lineCh<-text }` 寫法奇怪、等同 blocking
MIN-7	notify_*.go	三平台都用 cmd.Run() 無 timeout，極罕見情況下 hang 會 leak goroutine
MIN-8	notify_linux.go	notify-send not in PATH 時 warning 不夠友善
MIN-9	system_handler.go:42	newBootID 放在 handler 內呼叫，偏離 TDD §9.1 「main.go 注入」設計，不利於 deterministic test
MIN-10	app.go:222-227	shutdown() 的 v1 fallback 是 dead code，建議刪
MIN-11	app.go:583-622	watchServer v1 與 watchServerV2 高度重複，待 M8-5 後砍
MIN-12	app.go:46	shutdownGracePeriod = 5s 仍在，違反 milestone-plan #153 字面
MIN-13	log_buffer_test.go:108-127	TestLogBuffer_RateLimit 假設「每行一次 ShouldEmit」，與 logPump 實際「每 batch 一次」不一致

M. 結論

M8-4 的核心架構正確且品質高：

做得好的地方（值得讚許）：

雙鎖機制 txMu + mu 設計清晰、race detector 全綠
LogBuffer 的 ring buffer + atomic rate limit 結構漂亮
Preferences atomic write-rename 含 tmp cleanup
三平台 notify 用 build tag 隔離乾淨、escape 處理到位
boot-id 用 crypto/rand 不引 google/uuid，符合 TDD 定版決策
broadcasterLogger SkipPaths 實測完美過 smoke test
20 unit test 全綠、race detector 全綠
v1/v2 並存策略雖然增加維護成本，但邏輯上沒有衝突
文件註解極詳細，每個函式都標 TDD 出處

必須處理的問題（阻擋後續 milestone）：

5 個 Major — MAJ-1 ~ MAJ-5。其中 MAJ-1（watch leak 誤觸 Error）與 MAJ-2（handleWatchFailure race）是直接破壞 M8-5 UX 的 bug，第二輪修改前必須處理。MAJ-3（shutdownFn 6s）與 MAJ-4（shutdown-imminent 廣播）是純遺漏，補就好。MAJ-5（logPump 丟最後幾行）影響 debug 體驗。
15 個 Minor — 大多可以放到 v1 砍除時一併處理，或當作技術債記錄。

對 Orchestrator 的建議：

第二輪修改範圍（必須）：
- MAJ-1：Stop/ForceKill cancel watchCancel
- MAJ-2：handleWatchFailure 取 txMu + 二次檢查 state
- MAJ-3：server/main.go shutdownFn 6s
- MAJ-5：logPump scanDone 加 drain loop
MAJ-4 處理時機：M8-4b 一併做，因為它需要 server 端新增 /ws/system endpoint，與 startup-pipeline 的 event 設計可一起規劃。或拆獨立 micro-task 立刻補做，不要等到 M8-9 才發現缺。
v1 砍除追蹤：在 progress.md「未解決問題」加一條：「M8-4 v1 路徑（ServerProcess.stop/kill、watchServer、stopServer、shutdownGracePeriod、startServer）保留作為相容過渡，M8-5 完成後立即砍除（含相關 dead code 清理 MIN-10/11/12）」。
整合測試：M8-4 的 unit test 沒實際 spawn server binary，因此 logPump file handle race（MIN-5）、stopGraceful timing、preferences 跨檔案系統等只能靠 manual / integration test 驗證。建議在 M8-5 完成後安排一次完整 integration smoke test（Start → 噴 log → Stop → Restart → Force Kill → 模擬 crash → 各事件都要正確）。

結論：⚠️ 需小改後通過。修完 5 個 Major 即可進 M8-4b / M8-5。Minor 不阻擋。

23 KiB Raw Permalink Blame History Unescape Escape