From 5aa374625f0053e1827d8efd2cc21d9f3c9387cd Mon Sep 17 00:00:00 2001
From: abin <abinchen0914@gmail.com>
Date: Mon, 6 Apr 2026 19:31:52 +0800
Subject: [PATCH] docs: add autoflow project docs and test infrastructure

- Add .autoflow/ with health check, PRD, Design Doc, TDD, progress tracking
- Add tests/conftest.py with PyQt5/KP SDK stubs for unit testing
- Add pytest config to pyproject.toml (pythonpath, import-mode, test naming)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .autoflow/00-onboarding/health-check.md |  141 +++
 .autoflow/02-prd/PRD.md                 |  344 +++++++
 .autoflow/04-architecture/TDD.md        | 1149 +++++++++++++++++++++++
 .autoflow/04-architecture/design-doc.md |  581 ++++++++++++
 .autoflow/progress.md                   |   39 +
 pyproject.toml                          |    8 +
 tests/conftest.py                       |   46 +
 7 files changed, 2308 insertions(+)
 create mode 100644 .autoflow/00-onboarding/health-check.md
 create mode 100644 .autoflow/02-prd/PRD.md
 create mode 100644 .autoflow/04-architecture/TDD.md
 create mode 100644 .autoflow/04-architecture/design-doc.md
 create mode 100644 .autoflow/progress.md
 create mode 100644 tests/conftest.py

diff --git a/.autoflow/00-onboarding/health-check.md b/.autoflow/00-onboarding/health-check.md
new file mode 100644
index 0000000..6c7c243
--- /dev/null
+++ b/.autoflow/00-onboarding/health-check.md
@@ -0,0 +1,141 @@
+# Project Health Check Report
+
+## Basic Information
+
+- **Project name**: Cluster4NPU UI - Visual Pipeline Designer
+- **Version**: v0.0.3
+- **Code source**: local path `C:\Users\sungs\Documents\abin\temp\cluster4npu`
+- **Git branch**: developer (the main branch is `main`)
+- **Latest commit**: feat: Reorganize test scripts and improve YOLOv5 postprocessing
+- **Health check date**: 2026-04-05
+
+---
+
+## Tech Stack
+
+| Layer | Technology | Version |
+|------|------|------|
+| Language | Python | >=3.9, <3.12 |
+| GUI framework | PyQt5 | >=5.15.11 |
+| Visual node editor | NodeGraphQt | >=0.6.40 |
+| Image processing | OpenCV | (runtime dependency) |
+| Numerical computing | NumPy | (runtime dependency) |
+| Hardware SDK | Kneron KP SDK | (runtime, NPU dongle driver) |
+| Package management | uv | N/A |
+| Packaging | PyInstaller (main.spec) | N/A |
+
+**Supported hardware:** Kneron NPU dongles: KL520, KL720, KL1080
+
+---
+
+## Project Structure Overview
+
+```
+cluster4npu/
+├── main.py                        # Application entry point
+├── config/                        # Settings and theme (settings.py, theme.py)
+├── core/
+│   ├── pipeline.py                # Pipeline analysis, stage detection, validation
+│   ├── functions/
+│   │   ├── InferencePipeline.py   # Multi-stage pipeline execution engine (multithreaded)
+│   │   ├── Multidongle.py         # NPU dongle management and auto-detection
+│   │   ├── camera_source.py       # Camera input source
+│   │   ├── video_source.py        # Video input source
+│   │   ├── result_handler.py      # Inference result handling
+│   │   ├── workflow_orchestrator.py
+│   │   ├── mflow_converter.py     # .mflow format conversion
+│   │   └── yolo_v5_postprocess_reference.py
+│   └── nodes/                     # Node definitions (5 types)
+│       ├── base_node.py
+│       ├── input_node.py
+│       ├── model_node.py
+│       ├── preprocess_node.py
+│       ├── postprocess_node.py
+│       ├── output_node.py
+│       ├── simple_input_node.py
+│       └── exact_nodes.py
+├── ui/
+│   ├── windows/                   # Main windows (login.py, dashboard.py, pipeline_editor.py)
+│   ├── components/                # Reusable components (node_palette, properties_widget, common_widgets)
+│   └── dialogs/                   # Dialogs (deployment, performance, stage_config, etc.)
+├── utils/                         # Utility functions (file_utils, folder_dialog, ui_utils)
+├── example_utils/                 # Example postprocessing tools (ByteTrack, etc.)
+├── tests/                         # Test scripts (42; mostly script-style, not a formal test suite)
+├── resources/                     # Resource files
+└── output/                        # Inference output
+```
+
+---
+
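+## Documentation Completeness
+
+| Document type | Status | Location | Notes |
+|---------|------|------|------|
+| README | ✅ Yes | `README.md` | Detailed; covers installation and architecture |
+| Product requirements / PRD | ⚠️ Partial | `PROJECT_SUMMARY.md` | Has a vision and planned features, but not in formal PRD format |
+| Development roadmap | ✅ Yes | `DEVELOPMENT_ROADMAP.md` | Four phases with concrete goals |
+| Architecture design doc | ❌ None | N/A | README has a brief overview, but there is no formal Design Doc |
+| API documentation | ❌ None | N/A | No formal API documentation |
+| Design mockups | ❌ None | Only `Flowchart.jpg` | No wireframes or UI specs |
+| Technical design doc (TDD) | ❌ None | N/A | None |
+| Test plan | ❌ None | N/A | Test scripts exist, but there is no formal test plan |
+| Deployment docs | ⚠️ Partial | In README | Basic steps only; no complete deployment guide |
+| Release notes | ✅ Yes | `release_note.md` | Currently up to v0.0.2 |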
+
+---
+
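+## Code Health
+
+- **Test coverage**: ⚠️ Partial. `tests/` contains 42 scripts, but most are scenario scripts rather than pytest unit tests, so coverage is not systematic
+- **Code quality**: Moderate. Modules are clearly separated; some root-level scripts (debug_*.py, force_cleanup.py, etc.) are leftovers from development and make the layout slightly messy
+- **Security**: Low risk (local desktop application, no network API)
+- **Technical debt**:
+  - Several debug/cleanup scripts in the repository root have not been tidied up
+  - Script naming and grouping under tests/ is inconsistent (some files do not start with test_)
+  - There is no formal pytest test structure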
+
+---
+
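+## Existing Features
+
+| Feature | Description | Status |
+|------|------|------|
+| Visual pipeline editor | Build pipelines by dragging and dropping nodes (NodeGraphQt) | ✅ Done |
+| 5 node types | Input / Preprocess / Model / Postprocess / Output | ✅ Done |
+| Pipeline validation | Real-time stage detection and error highlighting | ✅ Done |
+| .mflow file format | Pipeline save and load (JSON) | ✅ Done |
+| Multi-NPU dongle support | KL520 / KL720 / KL1080 auto-detection | ✅ Done |
+| Multi-stage inference engine | Multithreaded pipeline execution | ✅ Done |
+| Performance monitoring | Live FPS and latency display | ✅ Done (with known bugs) |
+| Camera / video / image input | Multiple input sources | ✅ Done |
+| Project management | Login screen, recent projects, create/load pipeline | ✅ Done |
+| YOLOv5 postprocessing | Formatting of detection results | ✅ Done (recently improved) |
+| ByteTrack tracking | Object-tracking postprocessing | ✅ Done (example_utils) |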
+
+---
+
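+## Summary of Missing Items (Planned)
+
+According to `PROJECT_SUMMARY.md` and `DEVELOPMENT_ROADMAP.md`:
+
+1. **Performance visualization**: parallel vs sequential execution comparison, speedup indicator (Phase 1)
+2. **Benchmarking system**: automated performance tests, comparison charts (Phase 1)
+3. **Device management UI**: visual device assignment, load balancing (Phase 2)
+4. **Real-time monitoring dashboard**: FPS/latency charts, resource utilization (Phase 2)
+5. **Optimization engine**: automated suggestions, performance prediction (Phase 3)
+
+Known bugs:
+- Node property display issue
+- Output visualization (including postprocessing)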
+
+---
+
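+## CI/CD and Infrastructure
+
+| Item | Status |
+|------|------|
+| Docker | ❌ None |
+| CI/CD | ❌ None |
+| Deployment configuration | ❌ None (local desktop application; a PyInstaller spec exists) |
+| Environment variable management | ❌ None |
+| Version control | ✅ Git (GitHub remote) |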
diff --git a/.autoflow/02-prd/PRD.md b/.autoflow/02-prd/PRD.md
new file mode 100644
index 0000000..ff262be
--- /dev/null
+++ b/.autoflow/02-prd/PRD.md
@@ -0,0 +1,344 @@
+# PRD — Cluster4NPU UI
+
+> This PRD was reverse-engineered from the existing code and documents and reflects the actual state as of 2026-04-05.  
+> Version: v0.0.3 (developer branch)
+
+---
+
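+## 1. Product Overview
+
+### 1.1 Product Vision
+
+Cluster4NPU UI aims to let anyone, without writing code, design and run parallel AI inference pipelines through an intuitive drag-and-drop visual interface, make full use of Kneron NPU dongle hardware, and clearly see the performance gain from parallel processing.
+
+**One-line description**: "Design AI pipelines by drag and drop and, without a single line of code, let multiple NPU dongles accelerate your AI inference workload in parallel."
+
+### 1.2 Target Users
+
+**Primary users: AI application integration engineers / system integrators**
+
+- Know how to use AI models but are not necessarily familiar with low-level NPU programming
+- Need to quickly validate the performance of multi-model pipelines
+- Want to adjust pipeline settings and hardware allocation without modifying code
+
+**Secondary users: AI researchers / technical evaluators**
+
+- Need to compare performance across different pipeline configurations
+- Want visualized data that demonstrates the benefit of parallel processing (for proposals or reports)
+
+**Potential users: Kneron hardware sales team**
+
+- Need a demo tool to show prospective customers the performance advantage of Kneron NPUs
+
+### 1.3 Core Value Proposition
+
+1. **No-code pipeline design**: build complex multi-model AI pipelines with a drag-and-drop interface
+2. **Visible parallel performance**: clearly show the performance difference between parallel and sequential processing (2x, 3x, 4x speedup)
+3. **Automatic hardware management**: automatically detect NPU dongles and optimize their allocation, lowering the barrier to entry
+4. **Professional monitoring tools**: real-time FPS, latency, and throughput monitoring for engineer-grade analysis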
+
+---
+
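+## 2. Market Background
+
+### 2.1 Problem Statement
+
+As Edge AI applications become widespread, users face the following problems:
+
+1. **Complex setup**: running parallel AI inference on multiple NPU dongles requires writing a lot of low-level code
+2. **Opaque performance**: the gain from parallel processing is hard to quantify, which makes it hard to argue for
+3. **Difficult pipeline design**: chaining multiple models (e.g. detection → tracking → classification) requires handling the data flow by hand
+4. **Hardware management burden**: there is no unified tool for allocating, monitoring, and debugging multiple NPU dongles
+
+### 2.2 Target Market
+
+- **Primary market**: system integrators and enterprise users of Kneron NPU hardware (KL520, KL720, KL1080)
+- **Market scope**: Edge AI inference, leaning toward industrial vision, security surveillance, smart retail, and similar scenarios
+- **Geographic scope**: currently Traditional Chinese and English environments (Taiwan, Asia-Pacific)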
+
+---
+
+## 3. User Stories
+
+The following user stories are based on existing and planned features:
+
+**Implemented user stories:**
+
+- As a system integrator, I want to design an AI inference pipeline by dragging and dropping nodes, so that I can build complex multi-model workflows without writing code.
+- As a developer, I want to see real-time pipeline validation errors, so that I can fix configuration issues before deployment.
+- As a user, I want to save my pipeline configuration to a file (.mflow), so that I can reuse and share it with teammates.
+- As an engineer, I want to see live FPS and latency metrics during inference, so that I can monitor pipeline performance in real time.
+- As a hardware manager, I want the application to automatically detect available NPU dongles, so that I don't need to manually configure device connections.
+- As a user, I want to load video files, camera streams, or images as pipeline inputs, so that I can test my pipeline with different data sources.
+
+**Planned user stories:**
+
+- As a user, I want to compare parallel vs sequential inference performance side by side, so that I can clearly see the speedup benefit of using multiple NPU dongles.
+- As an engineer, I want to run automated benchmarks with one click, so that I can measure performance without manual testing.
+- As a hardware manager, I want to visually assign NPU dongles to specific pipeline stages, so that I have fine-grained control over device allocation.
+- As a user, I want to see live performance graphs (FPS, latency over time), so that I can identify bottlenecks during pipeline execution.
+- As an engineer, I want to receive automated optimization suggestions, so that I can improve pipeline performance without deep NPU expertise.
+- As a sales engineer, I want to generate a performance report showing speedup metrics, so that I can present the ROI of parallel NPU processing to clients.
+
+---
+
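+## 4. Functional Requirements
+
+### 4.1 Completed Features (Existing)
+
+The following features are implemented in v0.0.3 (source: health check report):
+
+| Feature | Description | Status |
+|------|------|------|
+| Visual pipeline editor | Drag-and-drop node interface based on NodeGraphQt | Done |
+| 5 node types | Input / Preprocess / Model / Postprocess / Output | Done |
+| Real-time pipeline validation | Real-time stage detection and error highlighting | Done |
+| .mflow file format | Pipeline save and load (JSON format) | Done |
+| Three-panel UI layout | Left: node palette; center: editor; right: settings and monitoring | Done |
+| Multi-NPU dongle support | KL520 / KL720 / KL1080 auto-detection | Done |
+| Multi-stage inference engine | Multithreaded parallel pipeline execution | Done |
+| Basic performance monitoring | Live FPS and latency display (with known bugs) | Done (with defects) |
+| Multiple input sources | Camera (USB), video (MP4/AVI/MOV), image (JPG/PNG/BMP), RTSP stream (basic) | Done |
+| Project management | Login screen, recent-projects list, create / load pipeline | Done |
+| YOLOv5 postprocessing | Detection result formatting and bounding-box handling | Done |
+| ByteTrack tracking | Object-tracking postprocessing (example_utils) | Done |
+| Firmware upload support | `upload_fw` option integrated with the inference flow | Done (v0.0.2) |
+| PyInstaller packaging | Standalone executable packaging (main.spec) | Done |
+
+**Known bugs (recorded in v0.0.2):**
+
+- Node property display issue
+- Output visualization (including postprocessing results) behaves abnormally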
+
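+### 4.2 Planned Features (by Priority)
+
+#### Phase 1: Performance Visualization (weeks 1-2, priority: P0)
+
+**Feature 1: Parallel vs sequential performance comparison**
+
+- **Description**: Compare parallel and sequential processing performance and visualize the speedup factor (e.g. "3.2x FASTER")
+- **Acceptance criteria**:
+  - The same pipeline can be run in either "single-device" or "multi-device" mode
+  - FPS and latency are shown for both modes
+  - The speedup is presented with visual indicators (progress bar, multiplier text)
+  - Comparison results remain available in the UI for later review
+- **Priority**: P0
+- **Phase**: Phase 1
+
+**Feature 2: Automated performance benchmark system (PerformanceBenchmarker)**
+
+- **Description**: Start a performance test with one click; automatically run the single-device vs multi-device comparison and record the results
+- **Acceptance criteria**:
+  - A "Run Benchmark" button is provided
+  - The test completes automatically and presents result charts
+  - Results are stored historically (to track performance changes)
+  - Regression testing is supported (comparing performance across versions)
+- **Priority**: P0
+- **Phase**: Phase 1
+
+**Feature 3: Real-time performance dashboard (PerformanceDashboard)**
+
+- **Description**: Show live line charts of FPS, latency, and throughput while inference is running
+- **Acceptance criteria**:
+  - FPS over time is shown as a chart
+  - Latency distribution is shown as a chart
+  - Update rate >= 1 Hz
+  - Inference performance is not affected (CPU usage increase < 5%)
+- **Priority**: P0
+- **Phase**: Phase 1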
+
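+#### Phase 2: Device Management (weeks 3-4, priority: P1)
+
+**Feature 4: Visual device management panel (DeviceManagementPanel)**
+
+- **Description**: Provide an overview of NPU dongle status, including device health, model, and current assignment
+- **Acceptance criteria**:
+  - All detected NPU dongles are listed with their status (online/offline/busy)
+  - The model of each device is shown (KL520/KL720/KL1080)
+  - The pipeline stage each device is currently assigned to is shown
+- **Priority**: P1
+- **Phase**: Phase 2
+
+**Feature 5: Manual device assignment UI**
+
+- **Description**: Let users manually assign a specific NPU dongle to a specific pipeline stage
+- **Acceptance criteria**:
+  - Devices can be assigned through a drop-down menu or by drag and drop
+  - An assignment is reflected immediately in the pipeline execution settings
+  - Invalid assignments (e.g. an offline device) produce an error message
+- **Priority**: P1
+- **Phase**: Phase 2
+
+**Feature 6: Device performance analysis (DeviceManager enhancement)**
+
+- **Description**: Track performance metrics and history for each NPU dongle
+- **Acceptance criteria**:
+  - Inference throughput (inferences/sec) is shown per device
+  - Device utilization is shown as a percentage
+  - Automatic load-balancing recommendations are provided
+- **Priority**: P1
+- **Phase**: Phase 2
+
+**Feature 7: Bottleneck detection and alerting**
+
+- **Description**: Automatically identify performance bottlenecks in the pipeline and raise alerts
+- **Acceptance criteria**:
+  - An alert is triggered when a stage's queue stays backed up
+  - The bottleneck stage is visually marked in the UI
+  - Basic improvement suggestions are given (e.g. add more devices)
+- **Priority**: P1
+- **Phase**: Phase 2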
+
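+#### Phase 3: Advanced Features (weeks 5-6, priority: P2)
+
+**Feature 8: Automated optimization engine (OptimizationEngine)**
+
+- **Description**: Analyze the current pipeline configuration and automatically generate performance optimization suggestions
+- **Acceptance criteria**:
+  - Analyzes performance differences between stages and suggests the best device allocation
+  - Identifies unnecessary pre/postprocessing steps and suggests changes
+  - Suggestions are shown as cards that the user can accept or ignore
+- **Priority**: P2
+- **Phase**: Phase 3
+
+**Feature 9: Pipeline configuration templates**
+
+- **Description**: Provide preset pipeline templates for common scenarios (e.g. YOLOv5 detection, object tracking)
+- **Acceptance criteria**:
+  - At least 3 common templates are provided
+  - Templates can be loaded directly and modified
+  - An existing pipeline can be saved as a custom template
+- **Priority**: P2
+- **Phase**: Phase 3
+
+**Feature 10: Performance prediction (pre-run estimate)**
+
+- **Description**: Estimate performance from the hardware configuration before the pipeline is run
+- **Acceptance criteria**:
+  - Estimated FPS and latency ranges are shown
+  - The estimate is within 20% of the measured value (based on historical data)
+- **Priority**: P2
+- **Phase**: Phase 3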
+
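+#### Phase 4: Professional Polish (weeks 7-8, priority: P2)
+
+**Feature 11: Performance report export**
+
+- **Description**: Export benchmark results as a shareable report
+- **Acceptance criteria**:
+  - Export to PDF or CSV is supported
+  - The report includes pipeline settings, device configuration, performance metrics, and speedup factor
+- **Priority**: P2
+- **Phase**: Phase 4
+
+**Feature 12: Advanced analytics and trend charts**
+
+- **Description**: Track historical trends of performance metrics and identify long-term degradation
+- **Acceptance criteria**:
+  - A performance trend chart across multiple runs is shown
+  - Filtering by time range is supported
+- **Priority**: P2
+- **Phase**: Phase 4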
+
+---
+
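+## 5. Non-Functional Requirements
+
+### 5.1 Performance Requirements
+
+- UI interaction response time < 200 ms (node dragging, property switching)
+- Real-time pipeline validation latency < 100 ms
+- Performance dashboard updates must not reduce inference FPS by more than 5%
+- Application startup time (including hardware detection) < 10 seconds
+
+### 5.2 Compatibility Requirements
+
+- **Operating system**: Windows 10/11 (primary); Linux (secondary)
+- **Python version**: 3.9 or later and below 3.12 (i.e. 3.9–3.11)
+- **Hardware**: Kneron NPU dongles (KL520, KL720, KL1080), connected over USB 3.0
+- **PyQt5 version**: >= 5.15.11
+
+### 5.3 Usability Requirements
+
+- A first-time user should be able to complete a basic pipeline design (drag in 5 nodes and connect them) within 5 minutes
+- The node settings panel must not show a horizontal scrollbar (fixed in v0.0.2)
+- All error messages should be readable and avoid technical jargon
+
+### 5.4 Reliability Requirements
+
+- Running inference repeatedly must not produce errors (fixed in v0.0.2)
+- Saving a pipeline (.mflow) must fully restore node settings and connections
+- After an abnormal shutdown, the recent-projects list should still be shown on the next launch
+
+### 5.5 Maintainability Requirements
+
+- Each new node type must come with matching unit tests
+- Core modules (InferencePipeline, Multidongle) need pytest-style test coverage
+- Debug/cleanup scripts in the repository root should be tidied up and moved to `tools/` or `tests/`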
+
+---
+
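+## 6. Success Metrics
+
+### 6.1 Core Usage Goals (by Product Phase)
+
+**Phase 1 completion criteria (performance visualization):**
+- A user can start a benchmark and see the speedup comparison in 3 steps or fewer
+- The dashboard updates smoothly (no noticeable stutter)
+
+**Phase 2 completion criteria (device management):**
+- A user can manually adjust device assignment without modifying code
+- Bottleneck detection identifies the correct stage in > 80% of test scenarios
+
+**Phase 3 completion criteria (advanced features):**
+- Device allocations suggested by OptimizationEngine improve measured performance by > 10%
+- At least 3 ready-to-use pipeline templates are provided
+
+**Overall product quality criteria:**
+- All known bugs (node property display, output visualization) are fixed
+- Core modules have complete pytest coverage
+
+### 6.2 User Experience Metrics
+
+- Pipeline design time (target: < 5 minutes on first use, < 2 minutes once familiar)
+- Time from one-click benchmark start to results (target: < 30 seconds)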
+
+---
+
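+## 7. Out of Scope
+
+The following are explicitly outside the development scope from v0.0.3 through Phase 4:
+
+1. **Cloud features**: no cloud storage, remote execution, or SaaS service
+2. **Non-Kneron hardware**: NPUs from other vendors (e.g. Hailo, Coral) are not supported
+3. **Model training**: this tool handles inference only and does not include model training
+4. **Mobile app**: desktop application only (Windows / Linux)
+5. **Multi-user collaboration**: simultaneous editing of the same pipeline is not supported
+6. **Payment / licensing system**: there is currently no commercial licensing mechanism
+7. **Automatic language switching / full localization**: the UI is primarily English, with no formal multi-language support
+8. **Full RTSP streaming support**: RTSP is currently basic only; full stream management is out of scope for now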
+
+---
+
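+## Appendix
+
+### A. Version History Summary
+
+| Version | Date | Main changes |
+|------|------|---------|
+| v0.0.1 | N/A | Initial version (exact date unknown) |
+| v0.0.2 | 2025-07-31 | Automatic data cleanup, firmware upload support, fix for repeated-inference errors, FPS fix |
+| v0.0.3 | In progress | YOLOv5 postprocessing improvements, test script reorganization (developer branch) |
+
+### B. Related Documents
+
+- Health check report: `C:\Users\sungs\Documents\abin\temp\cluster4npu\.autoflow\00-onboarding\health-check.md`
+- Development roadmap: `C:\Users\sungs\Documents\abin\temp\cluster4npu\DEVELOPMENT_ROADMAP.md`
+- Project summary: `C:\Users\sungs\Documents\abin\temp\cluster4npu\PROJECT_SUMMARY.md`
+- README: `C:\Users\sungs\Documents\abin\temp\cluster4npu\README.md`
+
+### C. Technical Constraints
+
+- This tool depends heavily on the Kneron KP SDK; SDK updates may affect hardware compatibility
+- The NodeGraphQt visual editor version (>= 0.6.40) limits some UI customization
+- The Python version constraint (3.9–3.11) comes from PyQt5 and Kneron SDK compatibility requirements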
diff --git a/.autoflow/04-architecture/TDD.md b/.autoflow/04-architecture/TDD.md
new file mode 100644
index 0000000..89977d3
--- /dev/null
+++ b/.autoflow/04-architecture/TDD.md
@@ -0,0 +1,1149 @@
+# TDD — Cluster4NPU UI
+
+## Author: Architect Agent
+## Status: Draft
+## Last updated: 2026-04-05
+## Applies to version: v0.0.3 (developer branch)
+
+---
+
+## 1. Module List
+
+| Module path | Classes / main functions | Status |
+|---------|------------|------|
+| `core/pipeline.py` | `PipelineStage`, `analyze_pipeline_stages`, `validate_pipeline_structure`, `get_pipeline_summary` | Done |
+| `core/nodes/base_node.py` | `BaseNodeWithProperties`, `create_node_property_widget` | Done |
+| `core/nodes/input_node.py` | `InputNode` | Done |
+| `core/nodes/model_node.py` | `ModelNode` | Done |
+| `core/nodes/preprocess_node.py` | `PreprocessNode` | Done |
+| `core/nodes/postprocess_node.py` | `PostprocessNode` | Done |
+| `core/nodes/output_node.py` | `OutputNode` | Done |
+| `core/nodes/simple_input_node.py` | `SimpleInputNode` | Done |
+| `core/nodes/exact_nodes.py` | `ExactInputNode`, `ExactModelNode`, etc. | Done |
+| `core/functions/InferencePipeline.py` | `StageConfig`, `PipelineData`, `PipelineStage` (execution), `InferencePipeline` | Done |
+| `core/functions/Multidongle.py` | `MultiDongle`, `PreProcessor`, `PostProcessor`, `PostProcessorOptions`, result data classes | Done |
+| `core/functions/camera_source.py` | Camera input source | Done |
+| `core/functions/video_source.py` | Video input source | Done |
+| `core/functions/result_handler.py` | Inference result handling | Done |
+| `core/functions/mflow_converter.py` | .mflow format conversion | Done |
+| `core/functions/workflow_orchestrator.py` | Workflow orchestration | Done |
+| `core/functions/yolo_v5_postprocess_reference.py` | YOLOv5 postprocessing (reference implementation) | Done |
+| `example_utils/` | Postprocessing tools such as ByteTrack object tracking | Done |
+| `main.py` | `SingleInstance`, `setup_application`, `main` | Done |
+| `ui/windows/login.py` | `DashboardLogin` | Done |
+| `ui/windows/dashboard.py` | `DashboardWindow` | Done |
+| `ui/windows/pipeline_editor.py` | `PipelineEditor` | Done |
+| `ui/components/` | `NodePalette`, `PropertiesWidget`, common widgets | Done |
+| `ui/dialogs/` | Deployment, performance, and stage configuration dialogs | Done |
+| `config/settings.py` | Application settings | Done |
+| `config/theme.py` | Qt theme application | Done |
+| **`core/performance/benchmarker.py`** | `PerformanceBenchmarker` | Planned |
+| **`core/performance/history.py`** | `PerformanceHistory` | Planned |
+| **`core/device/device_manager.py`** | `DeviceManager` | Planned |
+| **`core/optimization/engine.py`** | `OptimizationEngine` | Planned |
+| **`ui/components/performance_dashboard.py`** | `PerformanceDashboard` | Planned |
+| **`ui/components/device_management_panel.py`** | `DeviceManagementPanel` | Planned |
+
+---
+
+## 2. Technical Specifications of Existing Modules
+
+### 2.1 `core/pipeline.py`
+
+#### Public Interface
+
+```python
+def analyze_pipeline_stages(node_graph) -> List[PipelineStage]: ...
+def get_stage_count(node_graph) -> int: ...
+def validate_pipeline_structure(node_graph) -> Tuple[bool, str]: ...
+def get_pipeline_summary(node_graph) -> Dict[str, Any]: ...
+def find_connected_nodes(node, visited=None, direction='forward') -> List: ...
+```
+
+#### `PipelineStage` class
+
+```python
+class PipelineStage:
+    stage_id: int
+    model_node: ModelNode
+    preprocess_nodes: List[PreprocessNode]
+    postprocess_nodes: List[PostprocessNode]
+    input_connections: list
+    output_connections: list
+
+    def add_preprocess_node(self, node: PreprocessNode) -> None: ...
+    def add_postprocess_node(self, node: PostprocessNode) -> None: ...
+    def get_stage_config(self) -> Dict[str, Any]: ...
+    # Return format:
+    # {
+    #   'stage_id': int,
+    #   'model_config': Dict,        # from ModelNode.get_inference_config()
+    #   'preprocess_configs': List,  # from PreprocessNode.get_preprocessing_config()
+    #   'postprocess_configs': List  # from PostprocessNode.get_postprocessing_config()
+    # }
+    def validate_stage(self) -> Tuple[bool, str]: ...
+```
+
+#### `get_pipeline_summary` return format
+
+```python
+{
+    'stage_count': int,
+    'valid': bool,
+    'error': Optional[str],
+    'stages': List[Dict],        # get_stage_config() result for each stage
+    'total_nodes': int,
+    'input_nodes': int,
+    'output_nodes': int,
+    'model_nodes': int,
+    'preprocess_nodes': int,
+    'postprocess_nodes': int
+}
+```
+
+#### Dependencies
+
+- Input: NodeGraphQt `NodeGraph` object
+- Depends on: `core/nodes/*.py` (type identification)
+- Output: plain Python data structures (Dict, List)
+
+#### Node Identification Strategy (multiple fallbacks)
+
+A node is treated as a ModelNode if any of the following holds:
+1. `node.__identifier__` contains "model"
+2. `node.type_` contains "model"
+3. `node.NODE_NAME` contains "model"
+4. The `type(node)` name contains "model"
+5. `hasattr(node, 'get_inference_config')` is True
+6. The `type(node)` name contains "exactmodel"
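The fallback chain above can be sketched as a single predicate. This is a minimal illustration, not the code in `core/pipeline.py`: it assumes the four name-like attributes are plain strings and that matching is case-insensitive (under that assumption check 6 is already covered by check 4). `is_model_node` and the two stand-in classes are hypothetical names.

```python
from typing import Any


def is_model_node(node: Any) -> bool:
    """Return True if any of the fallback checks identifies a model node."""
    name_like = [
        getattr(node, "__identifier__", ""),  # check 1
        getattr(node, "type_", ""),           # check 2
        getattr(node, "NODE_NAME", ""),       # check 3
        type(node).__name__,                  # checks 4 and 6
    ]
    # Assumption: case-insensitive substring match on string attributes.
    if any("model" in str(value).lower() for value in name_like):
        return True
    # Check 5: duck typing on the inference-config accessor.
    return hasattr(node, "get_inference_config")


class ExactModelNode:  # stand-in, not the real NodeGraphQt node
    NODE_NAME = "Exact Node"


class CameraInput:  # stand-in with no model markers
    NODE_NAME = "Input Node"


class CustomInference:  # stand-in that is recognized only through check 5
    def get_inference_config(self) -> dict:
        return {}
```

Ordering the cheap string checks first keeps the duck-typing check as the last resort, which mirrors the "any condition is enough" wording of the list.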
+
+---
+
+### 2.2 `core/nodes/base_node.py`
+
+#### `BaseNodeWithProperties` class
+
+```python
+class BaseNodeWithProperties(NodeGraphQt.BaseNode):
+    # Internal state
+    _property_options: Dict[str, Any]
+    _property_validators: Dict[str, Callable]
+    _business_properties: Dict[str, Any]
+
+    # Main methods
+    def create_business_property(
+        self, name: str,
+        default_value: Any,
+        options: Optional[Dict[str, Any]] = None
+    ) -> None: ...
+
+    def validate_property(self, name: str, value: Any) -> bool: ...
+    # Validation rules (options format):
+    # - {'min': X, 'max': Y}  → numeric range check
+    # - [choice1, choice2]    → choice list check
+    # - {'type': 'file_path'} → file path (existence is not checked)
+
+    def get_node_config(self) -> Dict[str, Any]: ...
+    # Return format:
+    # {
+    #   'type': str,         # class name
+    #   'name': str,         # node display name
+    #   'properties': Dict,  # all business properties
+    #   'position': Tuple    # (x, y)
+    # }
+
+    def load_node_config(self, config: Dict[str, Any]) -> None: ...
+    def update_business_property(self, name: str, value: Any) -> bool: ...  # validates first
+```
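The three `options` shapes listed for `validate_property` can be illustrated with a standalone function. This is a sketch under stated assumptions, not the method body from `base_node.py`: the free function `check_against_options` is a hypothetical name, and treating a `file_path` value as valid whenever it is a string is an assumption (the spec only says existence is not checked).

```python
from typing import Any


def check_against_options(value: Any, options: Any = None) -> bool:
    """Validate a property value against the three documented option shapes."""
    if options is None:
        return True  # no constraint registered for this property
    if isinstance(options, list):
        return value in options  # choice list check
    if isinstance(options, dict):
        if options.get("type") == "file_path":
            # Assumption: only the type is checked, never file existence.
            return isinstance(value, str)
        if "min" in options and value < options["min"]:
            return False  # below the numeric range
        if "max" in options and value > options["max"]:
            return False  # above the numeric range
    return True
```

For example, `check_against_options(4, {"min": 1, "max": 16})` accepts the value, while `check_against_options("999", ["520", "720", "1080", "Custom"])` rejects it.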
+
+#### `create_node_property_widget` function
+
+```python
+def create_node_property_widget(
+    node: BaseNodeWithProperties,
+    prop_name: str,
+    prop_value: Any,
+    options: Optional[Dict[str, Any]] = None
+) -> QWidget: ...
+```
+
+| prop_value type | options condition | Widget produced |
+|--------------|-------------|-----------|
+| Any | `options.get('type') == 'file_path'` | `QPushButton` (opens a file dialog) |
+| bool | N/A | `QCheckBox` |
+| Any | `isinstance(options, list)` | `QComboBox` |
+| int | N/A | `QSpinBox` (min/max taken from options) |
+| float | N/A | `QDoubleSpinBox` (min/max/decimals/step) |
+| str (default) | N/A | `QLineEdit` |
+
+---
+
+### 2.3 `core/nodes/model_node.py`
+
+#### `ModelNode` class
+
+```python
+class ModelNode(BaseNodeWithProperties):
+    __identifier__ = 'com.cluster.model_node'
+    NODE_NAME = 'Model Node'
+
+    # Ports
+    # input: 'input' (single, orange #FF8C00)
+    # output: 'output' (green #00FF00)
+    # Node color: RGB(65, 84, 102)
+
+    def validate_configuration(self) -> Tuple[bool, str]: ...
+    # Validation rules:
+    # - model_path must not be empty
+    # - dongle_series must be one of '520', '720', '1080', 'Custom'
+    # - num_dongles must be an int and >= 1
+
+    def get_inference_config(self) -> Dict[str, Any]: ...
+    # Return format:
+    # {
+    #   'node_id': str,
+    #   'node_name': str,
+    #   'model_path': str,
+    #   'dongle_series': str,
+    #   'num_dongles': int,
+    #   'port_id': str,
+    #   'batch_size': int,
+    #   'max_queue_size': int,
+    #   'enable_preprocessing': bool,
+    #   'enable_postprocessing': bool
+    # }
+
+    def get_hardware_requirements(self) -> Dict[str, Any]: ...
+    # Includes dongle_series, num_dongles, port_id,
+    # estimated_memory (MB), estimated_power (W)
+```
+
+#### ModelNode Property Specification
+
+| Property | Type | Default | Validation |
+|---------|------|--------|------|
+| `model_path` | file_path | `''` | Non-empty (before execution) |
+| `dongle_series` | choice | `'520'` | Must be 520/720/1080/Custom |
+| `num_dongles` | int | `1` | 1–16 |
+| `port_id` | str | `''` | None (auto is accepted) |
+| `batch_size` | int | `1` | 1–32 |
+| `max_queue_size` | int | `10` | 1–100 |
+| `enable_preprocessing` | bool | `True` | N/A |
+| `enable_postprocessing` | bool | `True` | N/A |
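The three rules documented for `validate_configuration` can be sketched against a plain property dictionary. This is an illustration only: the function operates on a dict instead of a node, and the error message strings are assumptions, not the messages the project emits.

```python
from typing import Any, Dict, Tuple

VALID_SERIES = ("520", "720", "1080", "Custom")


def validate_model_properties(props: Dict[str, Any]) -> Tuple[bool, str]:
    """Apply the three documented ModelNode rules; return (ok, error_message)."""
    if not props.get("model_path"):
        return False, "model_path must not be empty"
    if props.get("dongle_series") not in VALID_SERIES:
        return False, "dongle_series must be one of 520, 720, 1080, Custom"
    num_dongles = props.get("num_dongles")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(num_dongles, bool) or not isinstance(num_dongles, int) or num_dongles < 1:
        return False, "num_dongles must be an int >= 1"
    return True, ""
```

Returning on the first failing rule matches the `Tuple[bool, str]` signature, which has room for only one message.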
+
+---
+
+### 2.4 `core/functions/InferencePipeline.py`
+
+#### Data Structures
+
+```python
+@dataclass
+class StageConfig:
+    stage_id: str
+    port_ids: List[int]
+    scpu_fw_path: str
+    ncpu_fw_path: str
+    model_path: str
+    upload_fw: bool
+    max_queue_size: int = 50
+    multi_series_config: Optional[Dict[str, Any]] = None
+    input_preprocessor: Optional[PreProcessor] = None
+    output_postprocessor: Optional[PostProcessor] = None
+    stage_preprocessor: Optional[PreProcessor] = None
+    stage_postprocessor: Optional[PostProcessor] = None
+
+@dataclass
+class PipelineData:
+    data: Any
+    metadata: Dict[str, Any]  # includes start_timestamp, end_timestamp, total_processing_time
+    stage_results: Dict[str, Any]  # key = stage_id
+    pipeline_id: str               # format: "pipeline_{counter}"
+    timestamp: float
+```
+
+#### `InferencePipeline` class
+
+```python
+class InferencePipeline:
+    def __init__(
+        self, stage_configs: List[StageConfig],
+        final_postprocessor: Optional[PostProcessor] = None,
+        pipeline_name: str = "InferencePipeline"
+    ): ...
+
+    # Lifecycle
+    def initialize(self) -> None: ...    # Initialize all stages (sequentially)
+    def start(self) -> None: ...         # Start the coordinator thread and all stage workers
+    def stop(self) -> None: ...          # Graceful stop (sentinel pattern + join)
+
+    # Data I/O
+    def put_data(self, data: Any, timeout: float = 1.0) -> bool: ...
+    # If the input queue is full: drop the oldest frame (real-time behavior first)
+
+    def get_result(self, timeout: float = 0.1) -> Optional[PipelineData]: ...
+
+    # Callback setup
+    def set_result_callback(self, callback: Callable[[PipelineData], None]) -> None: ...
+    def set_error_callback(self, callback: Callable[[PipelineData], None]) -> None: ...
+    def set_stats_callback(self, callback: Callable[[Dict[str, Any]], None]) -> None: ...
+
+    # Performance
+    def get_current_fps(self) -> float: ...
+    # Formula: completed_counter / (now - fps_start_time)
+    # fps_start_time is set when the first valid result completes
+
+    def get_pipeline_statistics(self) -> Dict[str, Any]: ...
+    # Return format:
+    # {
+    #   'pipeline_name': str,
+    #   'total_stages': int,
+    #   'pipeline_input_submitted': int,
+    #   'pipeline_completed': int,
+    #   'pipeline_errors': int,
+    #   'pipeline_input_queue_size': int,
+    #   'pipeline_output_queue_size': int,
+    #   'current_fps': float,
+    #   'stage_statistics': List[Dict]  # statistics for each stage
+    # }
+
+    def start_stats_reporting(self, interval: float = 5.0) -> None: ...
+```
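The FPS formula noted for `get_current_fps` (`completed_counter / (now - fps_start_time)`, with the clock starting at the first valid result) can be sketched as a small counter. `FpsCounter` is a hypothetical helper, not a class in `InferencePipeline.py`; the optional `now` argument exists only so the arithmetic can be checked with fixed timestamps.

```python
import time
from typing import Optional


class FpsCounter:
    """Tracks completed results and derives FPS from the first completion time."""

    def __init__(self) -> None:
        self.completed_counter = 0
        self.fps_start_time: Optional[float] = None

    def record_result(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if self.fps_start_time is None:
            # The clock starts when the first valid result completes.
            self.fps_start_time = now
        self.completed_counter += 1

    def current_fps(self, now: Optional[float] = None) -> float:
        if self.fps_start_time is None:
            return 0.0  # nothing completed yet
        now = time.monotonic() if now is None else now
        elapsed = now - self.fps_start_time
        return self.completed_counter / elapsed if elapsed > 0 else 0.0
```

Starting the clock at the first completion keeps model loading and warm-up out of the denominator, at the cost of a slightly optimistic reading during the first few frames.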
+
+#### `PipelineStage` (execution layer; distinct from the `PipelineStage` in pipeline.py)
+
+```python
+class PipelineStage:  # defined in InferencePipeline.py
+    def put_data(self, data: PipelineData, timeout: float = 1.0) -> bool: ...
+    def get_result(self, timeout: float = 0.1) -> Optional[PipelineData]: ...
+    def get_statistics(self) -> Dict[str, Any]: ...
+    # Return format:
+    # {
+    #   'stage_id': str,
+    #   'processed_count': int,
+    #   'error_count': int,
+    #   'avg_processing_time': float,
+    #   'input_queue_size': int,
+    #   'output_queue_size': int,
+    #   'multidongle_stats': Dict  # from MultiDongle.get_statistics()
+    # }
+```
+
+#### Queue Specification
+
+| Queue | maxsize | Policy when full |
+|------|---------|---------|
+| `pipeline_input_queue` | 100 | Drop the oldest frame (favors real-time behavior) |
+| Stage `input_queue` | `StageConfig.max_queue_size` (default 50) | Discard (drop) |
+| Stage `output_queue` | `StageConfig.max_queue_size` (default 50) | Drop the oldest and replace it |
+| `pipeline_output_queue` | 50 | Drop the oldest result (preventive cleanup) |
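The "drop the oldest" policy used by three of the four queues can be sketched with the standard library `queue.Queue`. This is a generic illustration of the policy, not the project's implementation; `put_drop_oldest` is a hypothetical helper name.

```python
import queue
from typing import Any


def put_drop_oldest(q: "queue.Queue[Any]", item: Any) -> bool:
    """Enqueue item; if the queue is full, discard the oldest entry first.

    Returns True if at least one old entry was dropped to make room.
    """
    dropped = False
    while True:
        try:
            q.put_nowait(item)
            return dropped
        except queue.Full:
            try:
                q.get_nowait()  # evict the oldest entry
                dropped = True
            except queue.Empty:
                # A consumer emptied the queue in between; just retry the put.
                pass
```

The retry loop matters under concurrency: between the failed `put_nowait` and the eviction, another producer or consumer may have changed the queue, so a single get-then-put is not guaranteed to succeed.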
+
+#### Logic for Deciding Whether an Inference Result Is Valid
+
+`_has_valid_inference_result(pipeline_data)` applies these conditions:
+
+```python
+# Valid results (counted toward FPS / placed on the output queue):
+# - Tuple (prob, result_str): prob is not None and result_str not in ['Processing']
+# - Dict: status is not "processing"/"async", and result is not "Processing"
+
+# Invalid results (not counted toward FPS, discarded):
+# - {'status': 'processing'} or {'status': 'async'}
+# - {'result': 'Processing'}
+# - Empty result
+```
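The validity rules above can be sketched as a predicate over a single stage result. This is a simplified illustration: the real `_has_valid_inference_result` receives a `PipelineData`, whereas `is_valid_result` (a hypothetical name) takes the raw result value, and treating other non-empty payloads as valid is an assumption the spec does not state.

```python
from typing import Any


def is_valid_result(result: Any) -> bool:
    """Return True for results that should count toward FPS."""
    if result is None or result == {} or result == ():
        return False  # empty result
    if isinstance(result, tuple) and len(result) == 2:
        prob, result_str = result
        return prob is not None and result_str not in ["Processing"]
    if isinstance(result, dict):
        if result.get("status") in ("processing", "async"):
            return False  # still in flight
        return result.get("result") != "Processing"
    # Assumption: any other non-empty payload (e.g. a result dataclass) is valid.
    return True
```

Filtering placeholders before they reach the FPS counter is what keeps asynchronous "still processing" acknowledgements from inflating the reported frame rate.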
+
+---
+
+### 2.5 `core/functions/Multidongle.py`
+
+#### Result Data Classes
+
+```python
+@dataclass
+class BoundingBox:
+    x1: int; y1: int; x2: int; y2: int
+    score: float
+    class_num: int
+    class_name: str
+
+@dataclass
+class ObjectDetectionResult:
+    class_count: int
+    box_count: int
+    box_list: List[BoundingBox]
+    # Letterbox mapping information (used to restore coordinates):
+    model_input_width: int; model_input_height: int
+    resized_img_width: int; resized_img_height: int
+    pad_left: int; pad_top: int; pad_right: int; pad_bottom: int
+
+@dataclass
+class ClassificationResult:
+    probability: float
+    class_name: str
+    class_num: int
+    confidence_threshold: float
+    # property: is_positive -> probability > confidence_threshold
+```
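The letterbox fields on `ObjectDetectionResult` exist to map boxes from model-input coordinates back to the source image. A minimal sketch of that inverse mapping follows, using the conventional letterbox formula (subtract the padding, then scale by original size over resized size). Whether the project applies exactly this formula is not confirmed by the spec, and `orig_w`/`orig_h` are assumed extra inputs because the dataclass does not store the source image size.

```python
from typing import Tuple


def unletterbox_box(
    box: Tuple[int, int, int, int],
    pad_left: int,
    pad_top: int,
    resized_w: int,
    resized_h: int,
    orig_w: int,
    orig_h: int,
) -> Tuple[int, int, int, int]:
    """Map (x1, y1, x2, y2) from letterboxed model input to original pixels."""
    scale_x = orig_w / resized_w
    scale_y = orig_h / resized_h

    def clamp(value: float, upper: int) -> int:
        # Keep coordinates inside the original image bounds.
        return max(0, min(int(round(value)), upper))

    x1, y1, x2, y2 = box
    return (
        clamp((x1 - pad_left) * scale_x, orig_w),
        clamp((y1 - pad_top) * scale_y, orig_h),
        clamp((x2 - pad_left) * scale_x, orig_w),
        clamp((y2 - pad_top) * scale_y, orig_h),
    )
```

As a worked case, a 1280x720 frame letterboxed into a 640x640 input is resized to 640x360 with 140 pixels of padding at the top and bottom, so a model-space box `(100, 200, 300, 400)` maps back to `(200, 120, 600, 520)`.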
+
+#### Postprocessing Types (PostProcessType Enum)
+
+```python
+class PostProcessType(Enum):
+    FIRE_DETECTION = "fire_detection"   # Binary classification (fire detection)
+    YOLO_V3 = "yolo_v3"                 # Object detection
+    YOLO_V5 = "yolo_v5"                 # Object detection (uses the reference implementation)
+    CLASSIFICATION = "classification"   # General classification
+    RAW_OUTPUT = "raw_output"           # Raw numpy output
+```
+
+#### `PostProcessorOptions` settings
+
+```python
+@dataclass
+class PostProcessorOptions:
+    postprocess_type: PostProcessType = PostProcessType.FIRE_DETECTION
+    threshold: float = 0.5             # Confidence threshold
+    class_names: List[str] = field(default_factory=list)  # Class names (a bare [] default raises ValueError in a dataclass)
+    nms_threshold: float = 0.45        # NMS threshold (YOLO)
+    max_detections_per_class: int = 100
+```
+
+#### `PreProcessor` processing flow
+
+```python
+class PreProcessor(DataProcessor):
+    def process(frame: np.ndarray, target_size: tuple, target_format: str) -> np.ndarray:
+        # Step 1: resize (cv2.resize by default)
+        # Step 2: format convert
+        #   - 'BGR565'   → cv2.cvtColor(frame, COLOR_BGR2BGR565)
+        #   - 'RGB8888'  → cv2.cvtColor(frame, COLOR_BGR2RGBA)
+        #   - other formats → returned unchanged
+```
+
+#### Device Compute Specification (DongleSeriesSpec)
+
+```python
+SERIES_SPECS = {
+    "KL520": {"product_id": 0x100,  "gops": 2},
+    "KL720": {"product_id": 0x720,  "gops": 28},
+    "KL630": {"product_id": 0x630,  "gops": 400},
+    "KL730": {"product_id": 0x730,  "gops": 1600},
+}
+```
+
+---
+
+## 3. Technical Specifications of Planned Features
+
+### Phase 1: Performance Visualization / Benchmarking
+
+#### 3.1.1 `PerformanceBenchmarker` (`core/performance/benchmarker.py`)
+
+**Responsibility:** Automatically run single-device vs multi-device performance tests and compute the speedup factor.
+
+```python
+@dataclass
+class BenchmarkConfig:
+    pipeline_config: List[StageConfig]   # Pipeline configuration from the UI
+    test_input_source: str               # Test input (video/camera); required, so it precedes defaulted fields
+    test_duration_seconds: float = 30.0  # Test duration
+    warmup_frames: int = 50              # Warm-up frames (excluded from statistics)
+
+@dataclass
+class BenchmarkResult:
+    mode: str                            # 'sequential' | 'parallel'
+    fps: float
+    avg_latency_ms: float
+    p95_latency_ms: float
+    total_frames: int
+    timestamp: float
+    device_config: Dict[str, Any]        # Device allocation settings
+
+class PerformanceBenchmarker:
+    def run_sequential_benchmark(self, config: BenchmarkConfig) -> BenchmarkResult: ...
+    # Force the pipeline to run on a single dongle
+
+    def run_parallel_benchmark(self, config: BenchmarkConfig) -> BenchmarkResult: ...
+    # Run the pipeline on all available dongles
+
+    def calculate_speedup(
+        self, seq: BenchmarkResult,
+        par: BenchmarkResult
+    ) -> float: ...
+    # Formula: par.fps / seq.fps
+
+    def run_full_benchmark(self, config: BenchmarkConfig) -> Tuple[BenchmarkResult, BenchmarkResult, float]: ...
+    # Returns: (sequential_result, parallel_result, speedup)
+```
+
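+**Execution sequence:**
+1. Warm up (`warmup_frames` frames, not counted)
+2. Sequential mode: force a single dongle → collect data for `test_duration_seconds` seconds
+3. Clear the queues and restart the pipeline
+4. Parallel mode: use all dongles → collect data for the same duration
+5. Compute the speedup factor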
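The arithmetic behind steps 1 and 5 can be sketched independently of any hardware. This is an illustration under assumptions: `summarize_latencies` is a hypothetical helper, and the nearest-rank method for the 95th percentile is one common choice that the spec does not prescribe. The speedup follows the documented formula `par.fps / seq.fps`.

```python
from typing import Sequence, Tuple


def summarize_latencies(
    latencies_ms: Sequence[float], warmup_frames: int = 50
) -> Tuple[float, float]:
    """Return (average, p95) latency after discarding the warm-up frames."""
    measured = sorted(latencies_ms[warmup_frames:])
    if not measured:
        raise ValueError("no frames left after warm-up")
    average = sum(measured) / len(measured)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer math.
    rank = max(1, (95 * len(measured) + 99) // 100)
    return average, measured[rank - 1]


def calculate_speedup(seq_fps: float, par_fps: float) -> float:
    """Speedup factor as documented: parallel FPS divided by sequential FPS."""
    if seq_fps <= 0:
        raise ValueError("sequential FPS must be positive")
    return par_fps / seq_fps
```

For instance, a sequential run at 15.0 FPS and a parallel run at 45.0 FPS give `calculate_speedup(15.0, 45.0) == 3.0`, which the dialog would render as "3.0x FASTER".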
+
+#### 3.1.2 `PerformanceHistory` (`core/performance/history.py`)
+
+**Responsibility:** Store benchmark history locally and support regression tracking.
+
+```python
+class PerformanceHistory:
+    def __init__(self, storage_path: str = "~/.cluster4npu/benchmark_history.json"): ...
+
+    def record(self, result: BenchmarkResult) -> None: ...
+    def get_history(
+        self, limit: int = 50,
+        mode: Optional[str] = None
+    ) -> List[BenchmarkResult]: ...
+
+    def get_regression_report(
+        self, baseline_id: str,
+        compare_id: str
+    ) -> Dict[str, Any]: ...
+    # Compares the FPS/latency difference between two test runs
+```
+
+**Storage format (JSON):**
+```json
+{
+  "records": [
+    {
+      "id": "benchmark_20260405_143022",
+      "mode": "parallel",
+      "fps": 45.2,
+      "avg_latency_ms": 22.1,
+      "p95_latency_ms": 35.0,
+      "total_frames": 1356,
+      "timestamp": 1743856222.0,
+      "device_config": {"KL720": 2}
+    }
+  ]
+}
+```
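A minimal sketch of reading and appending to a file in the format above, using only the standard library. `JsonHistoryStore` is a hypothetical stand-in for `PerformanceHistory`: it works on plain dictionaries instead of `BenchmarkResult`, rewrites the whole file on each record, and does no locking, all of which are simplifying assumptions.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


class JsonHistoryStore:
    """Append-only benchmark history kept in one JSON file."""

    def __init__(self, storage_path: str) -> None:
        # expanduser() resolves a leading "~" such as in the default path.
        self.path = Path(storage_path).expanduser()

    def _load(self) -> Dict[str, List[Dict[str, Any]]]:
        if not self.path.exists():
            return {"records": []}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def record(self, entry: Dict[str, Any]) -> None:
        data = self._load()
        data["records"].append(entry)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def get_history(
        self, limit: int = 50, mode: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        records = self._load()["records"]
        if mode is not None:
            records = [r for r in records if r.get("mode") == mode]
        # Return the most recent entries, oldest first.
        return records[-limit:] if limit > 0 else []
```

Rewriting the full file is acceptable for a history capped at a few hundred records; a larger history would be better served by an append-friendly format such as JSON Lines.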
+
+#### 3.1.3 `PerformanceDashboard` (`ui/components/performance_dashboard.py`)
+
+**Responsibility:** Show live FPS and latency line charts with an update rate >= 1 Hz.
+
+```python
+class PerformanceDashboard(QWidget):
+    # Qt Signals
+    update_requested = pyqtSignal(dict)  # receives statistics from InferencePipeline
+
+    def __init__(self, parent: Optional[QWidget] = None): ...
+
+    def update_stats(self, stats: Dict[str, Any]) -> None: ...
+    # stats has the same format as InferencePipeline.get_pipeline_statistics()
+
+    def reset(self) -> None: ...  # clears the chart history
+
+    def set_display_window(self, seconds: int = 60) -> None: ...
+    # Sets the time window shown in the charts (seconds)
+```
+
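+**Performance constraints:**
+- Chart updates must not reduce inference FPS by more than 5% (rate-limit with `QTimer` and compute on a background thread)
+- pyqtgraph is recommended (it performs better than matplotlib)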
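The time window configured by `set_display_window` implies a buffer that forgets samples older than N seconds. A minimal, Qt-free sketch of such a buffer follows; `RollingWindow` is a hypothetical helper, and wiring it to `QTimer` or pyqtgraph is left out on purpose so the logic can be tested without a GUI.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingWindow:
    """Keeps only the (timestamp, value) samples from the last `seconds` seconds."""

    def __init__(self, seconds: float = 60.0) -> None:
        self.seconds = seconds
        self._samples: Deque[Tuple[float, float]] = deque()

    def add(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))
        cutoff = timestamp - self.seconds
        # Samples arrive in time order, so stale ones are always at the left.
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def values(self) -> List[float]:
        return [value for _, value in self._samples]
```

Because eviction is amortized O(1) per sample, the buffer adds negligible work per update, which helps stay within the 5% overhead budget stated above.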
+
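+#### 3.1.4 Benchmark Trigger UI (`ui/dialogs/benchmark_dialog.py`)
+
+**Responsibility:** A dialog that starts a benchmark with one click and shows progress and results.
+
+```python
+class BenchmarkDialog(QDialog):
+    def __init__(
+        self, parent: QWidget,
+        pipeline_config: List[StageConfig]
+    ): ...
+
+    # Displays:
+    # - Progress bar (warm-up / sequential test / parallel test)
+    # - Live FPS readout
+    # - On completion: speedup factor (large font, e.g. "3.2x FASTER")
+    # - Sequential vs parallel comparison table of FPS and latency
+```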
+
+---
+
+### Phase 2: Device Management
+
+#### 3.2.1 `DeviceManager` (`core/device/device_manager.py`)
+
+**Responsibility:** Extend MultiDongle's device management and supply the data needed by the visual assignment UI.
+
+```python
+@dataclass
+class DeviceInfo:
+    device_id: str              # Unique identifier (e.g. USB port location)
+    series: str                 # "KL520" | "KL720" | "KL1080"
+    product_id: int             # from DongleSeriesSpec
+    status: str                 # "online" | "offline" | "busy"
+    gops: int                   # Compute capacity (from DongleSeriesSpec)
+    assigned_stage: Optional[str]  # Stage ID currently assigned
+    current_fps: float          # Current inference throughput
+    utilization_pct: float      # Utilization percentage (0.0–100.0)
+
+@dataclass
+class DeviceHealth:
+    device_id: str
+    temperature_celsius: Optional[float]  # if the SDK supports it
+    error_count: int
+    last_error: Optional[str]
+    uptime_seconds: float
+
+class DeviceManager:
+    def scan_devices(self) -> List[DeviceInfo]: ...
+    # Calls the Kneron KP SDK to scan for connected dongles
+
+    def get_device_health(self, device_id: str) -> DeviceHealth: ...
+
+    def assign_device(self, device_id: str, stage_id: str) -> bool: ...
+    # Returns False if device_id is offline or already assigned to another stage
+
+    def unassign_device(self, device_id: str) -> bool: ...
+
+    def get_load_balance_recommendation(
+        self, stages: List[str]
+    ) -> Dict[str, str]: ...
+    # Assigns devices to stages based on DongleSeriesSpec.gops
+    # Return format: {stage_id: device_id}
+
+    def get_device_statistics(self) -> Dict[str, DeviceInfo]: ...
+    # Returns the live status of all devices
+```
+
+**負載平衡演算法（初版）：**
+- 計算每個 Stage 的推論需求（GOPS）
+- 依 Dongle 的 GOPS 算力做比例分配
+- 優先分配高算力 Dongle 給第一個（或最繁忙的）Stage
+
+#### 3.2.2 `DeviceManagementPanel` (`ui/components/device_management_panel.py`)
+
+**Responsibility:** Show the live status and assignment of every NPU Dongle.
+
+```python
+class DeviceManagementPanel(QWidget):
+    device_assignment_changed = pyqtSignal(str, str)  # (device_id, stage_id)
+
+    def __init__(self, device_manager: DeviceManager, parent: Optional[QWidget] = None) -> None: ...
+
+    def refresh(self) -> None: ...
+    # Rescan devices and update the UI
+
+    def set_auto_refresh(self, interval_ms: int = 2000) -> None: ...
+    # Set the auto-refresh interval
+```
+
+**UI contents:**
+- One card per Dongle: model, status indicator, currently assigned Stage, live FPS
+- A drop-down for changing the assignment manually
+- An "Auto balance" button that calls `get_load_balance_recommendation()`
+
+#### 3.2.3 Bottleneck detection (integrated into `InferencePipeline`)
+
+**Trigger condition:** a Stage's `input_queue.qsize() > max_queue_size * 0.8` holds for more than 5 seconds.
+
+```python
+@dataclass
+class BottleneckAlert:
+    stage_id: str
+    queue_fill_rate: float       # queue fill rate (0.0–1.0)
+    suggested_action: str        # e.g. "Add more Dongles to this Stage"
+    severity: str                # "warning" | "critical"
+
+# To be added to InferencePipeline:
+def get_bottleneck_alerts(self) -> List[BottleneckAlert]: ...
+def set_bottleneck_callback(self, callback: Callable[[BottleneckAlert], None]) -> None: ...
+```
+
+---
+
+### Phase 3: Advanced features
+
+#### 3.3.1 `OptimizationEngine` (`core/optimization/engine.py`)
+
+**Responsibility:** Analyze Pipeline runtime statistics and produce actionable optimization suggestions.
+
+```python
+@dataclass
+class OptimizationSuggestion:
+    suggestion_id: str
+    type: str               # "rebalance_devices" | "remove_node" | "add_devices" | "adjust_queue"
+    description: str        # user-readable explanation (avoid technical jargon)
+    estimated_improvement_pct: float   # estimated improvement, in percent
+    confidence: str         # "high" | "medium" | "low"
+    action_params: Dict[str, Any]      # parameters needed to apply the suggestion
+
+class OptimizationEngine:
+    def analyze_pipeline(
+        self, stats: Dict[str, Any]   # from InferencePipeline.get_pipeline_statistics()
+    ) -> List[OptimizationSuggestion]: ...
+
+    def apply_suggestion(
+        self, suggestion: OptimizationSuggestion,
+        device_manager: DeviceManager
+    ) -> bool: ...
+
+    def predict_performance(
+        self, config: List[StageConfig],
+        available_devices: List[DeviceInfo]
+    ) -> Dict[str, Any]: ...
+    # Return format:
+    # {
+    #   'estimated_fps': float,
+    #   'estimated_latency_ms': float,
+    #   'confidence_range': Tuple[float, float]  # (min, max) FPS
+    # }
+```
+
+**Optimization rules (initial version):**
+1. `rebalance_devices`: if a Stage's queue fill rate is > 70%, suggest assigning a higher-capacity Dongle to that Stage
+2. `adjust_queue`: if `avg_processing_time` differs by more than 2x across Stages, suggest adjusting queue sizes
+3. `add_devices`: if every Dongle's utilization is > 85%, suggest adding more Dongles
+
+#### 3.3.2 Pipeline configuration templates (`core/templates/`)
+
+**Responsibility:** Provide preset Pipeline templates for common use cases.
+
+```python
+@dataclass
+class PipelineTemplate:
+    template_id: str
+    name: str               # e.g. "YOLOv5 object detection"
+    description: str
+    nodes: List[Dict]       # node definitions (same format as .mflow)
+    connections: List[Dict]
+
+class TemplateManager:
+    def get_builtin_templates(self) -> List[PipelineTemplate]: ...
+    # Provide at least 3 templates:
+    # 1. YOLOv5 object detection (Input → Preprocess → Model → Postprocess → Output)
+    # 2. Fire detection classification (Input → Model → Postprocess → Output)
+    # 3. Two-model cascade (Input → Model1 → Postprocess1 → Model2 → Postprocess2 → Output)
+
+    def save_as_template(
+        self, pipeline_config: Dict,
+        name: str,
+        description: str
+    ) -> PipelineTemplate: ...
+
+    def load_template(self, template_id: str) -> PipelineTemplate: ...
+```
+
+---
+
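+### Phase 4: Performance report export (PDF/CSV)
+
+#### 3.4.1 Module location
+
+```
+core/
+└── performance/
+    └── report_exporter.py   # ReportExporter main class, ReportData dataclass
+```
+
+UI entry point: `ui/dialogs/export_report_dialog.py` (`ExportReportDialog`, which lets the user choose the format and save path)
+
+#### 3.4.2 Library selection
+
+| Purpose | Library | Rationale |
+|------|-----------|------|
+| PDF export | `reportlab` | Full-featured, pure Python, no system dependencies, works well alongside PyQt5; supports tables, embedded charts, and custom layouts |
+| CSV export | `csv` (standard library) | Zero dependencies; sufficient for exporting tabular data |
+| Chart capture (embedded in the PDF) | `pyqtgraph` + `QWidget.grab()` | Captures the line chart straight from the existing `PerformanceDashboard`, with no need to redraw it |
+
+**Why not `fpdf2`:** embedding a pyqtgraph chart capture would need an extra image-processing step, whereas reportlab's `ImageReader` accepts PIL images and file-like objects, so PNG bytes converted from a `QPixmap` can be passed in more directly.
+
+Install requirement (add to `requirements.txt`):
+```
+reportlab>=4.0.0
+```
+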
+#### 3.4.3 Data structures
+
+```python
+from __future__ import annotations
+from dataclasses import dataclass, field
+from typing import List, Optional, Dict, Any
+import time
+
+@dataclass
+class DeviceSummary:
+    """Summary of a single device, sourced from DeviceManager"""
+    device_id: str
+    product_name: str          # e.g. "KL720"
+    firmware_version: str
+    is_active: bool
+
+@dataclass
+class ReportData:
+    """
+    All data the report needs. The caller (UI layer) gathers it from each module and passes it to ReportExporter.
+    Designed as a plain data container decoupled from the UI and SDK, which makes unit testing easy.
+    """
+    # Basic report information
+    report_title: str = "效能測試報告"   # "Performance Test Report"
+    generated_at: float = field(default_factory=time.time)   # UNIX timestamp
+    pipeline_name: str = ""        # from the .mflow file name or a user-supplied name
+
+    # Benchmark results (from PerformanceBenchmarker.run_full_benchmark())
+    sequential_result: Optional[Any] = None   # BenchmarkResult
+    parallel_result: Optional[Any] = None     # BenchmarkResult
+    speedup: Optional[float] = None           # par.fps / seq.fps
+
+    # History (from PerformanceHistory.get_history())
+    history_records: List[Any] = field(default_factory=list)  # List[BenchmarkResult]
+
+    # Device information (from DeviceManager.get_all_devices())
+    devices: List[DeviceSummary] = field(default_factory=list)
+
+    # Chart capture (grabbed by the UI layer before export)
+    chart_image_bytes: Optional[bytes] = None  # PNG bytes from PerformanceDashboard
+```
+
+#### 3.4.4 Core class design: `ReportExporter`
+
+```python
+from pathlib import Path
+from typing import Optional, Union
+import csv, io, time
+
+class ReportExporter:
+    """
+    Serializes ReportData to a PDF or CSV file.
+    Stateless by design: create a new instance per export or call the static methods directly.
+    """
+
+    # --- PDF export ---
+
+    def export_pdf(
+        self,
+        data: ReportData,
+        output_path: Union[str, Path]
+    ) -> Path:
+        """
+        Export the full performance report as a PDF.
+        Returns the path of the file actually written.
+        Creates the parent directory of output_path if it does not exist.
+        """
+
+    def _build_cover_page(
+        self,
+        canvas,         # reportlab.pdfgen.canvas.Canvas
+        data: ReportData
+    ) -> None:
+        """Draw the cover: report title, generation time, Pipeline name, device list"""
+
+    def _build_benchmark_table(
+        self,
+        story: list,    # list of reportlab Flowables
+        data: ReportData
+    ) -> None:
+        """
+        Build the benchmark comparison table (reportlab Table).
+        Columns: metric / sequential mode / parallel mode / diff %
+        Metrics: FPS, average latency (ms), P95 latency (ms), total frames
+        """
+
+    def _build_trend_chart(
+        self,
+        story: list,
+        data: ReportData
+    ) -> None:
+        """
+        If data.chart_image_bytes is not None, embed the chart PNG in the PDF.
+        If it is None, insert a "no chart data" notice instead.
+        """
+
+    def _build_history_table(
+        self,
+        story: list,
+        data: ReportData
+    ) -> None:
+        """
+        Build the history table (shows at most 20 records; extra records are truncated and noted).
+        Columns: test time / mode / FPS / average latency (ms) / P95 latency (ms)
+        """
+
+    def _build_device_info(
+        self,
+        story: list,
+        data: ReportData
+    ) -> None:
+        """List the devices connected during the test: device ID, model, firmware version, active flag"""
+
+    # --- CSV export ---
+
+    def export_csv(
+        self,
+        data: ReportData,
+        output_path: Union[str, Path]
+    ) -> Path:
+        """
+        Export the benchmark results and history as CSV.
+        The CSV has two logical sections (separated by a blank line):
+        1. Benchmark summary (sequential vs. parallel)
+        2. History (one row per BenchmarkResult)
+        Returns the path of the file actually written.
+        """
+
+    # --- Helper method (static, so it is easy to test) ---
+
+    @staticmethod
+    def _get_timestamp_str(ts: float) -> str:
+        """Format a UNIX timestamp as 'YYYY-MM-DD HH:MM:SS' (local time)"""
+```
+
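+#### 3.4.5 PDF report structure
+
+| Section | Contents | Implemented by |
+|------|------|---------|
+| Cover | Report title, generation time (local time), Pipeline name, device-count summary | `_build_cover_page` |
+| Benchmark results table | Sequential vs. parallel FPS / latency comparison, with the speedup shown in large type (e.g. "3.2x speedup") | `_build_benchmark_table` |
+| Performance trend chart | pyqtgraph line chart captured from `PerformanceDashboard` (embedded PNG) | `_build_trend_chart` |
+| History table | The 20 most recent benchmark records (time, mode, FPS, latency) | `_build_history_table` |
+| Device information | Devices connected during the test (model, firmware, active flag) | `_build_device_info` |
+
+PDF page size: A4 (210×297 mm); reportlab's default unit is the point (72 pt = 1 inch).
+
+#### 3.4.6 CSV export format
+
+**Section 1: Benchmark summary**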
+```
+section,metric,sequential,parallel,diff_pct
+benchmark_summary,fps,14.2,45.6,+221.1%
+benchmark_summary,avg_latency_ms,70.4,21.9,-68.9%
+benchmark_summary,p95_latency_ms,95.0,33.2,-65.1%
+benchmark_summary,total_frames,426,1368,—
+benchmark_summary,speedup,—,3.21x,—
+```
+
+**Section 2: History** (follows after one blank line)
+```
+id,timestamp,mode,fps,avg_latency_ms,p95_latency_ms,total_frames
+benchmark_20260405_143022,2026-04-05 14:30:22,parallel,45.2,22.1,35.0,1356
+...
+```
+
+#### 3.4.7 Integration points with existing modules
+
+| Data source | How it is obtained | `ReportData` field |
+|---------|---------|----------------------|
+| `PerformanceBenchmarker.run_full_benchmark()` | Returns `(sequential_result, parallel_result, speedup)` | `sequential_result`, `parallel_result`, `speedup` |
+| `PerformanceHistory.get_history(limit=20)` | Returns `List[BenchmarkResult]` | `history_records` |
+| `DeviceManager.get_all_devices()` | Returns the device list (model, firmware, etc.) | `devices` (converted to `DeviceSummary`) |
+| `PerformanceDashboard` (UI layer) | `QPixmap` capture converted to PNG bytes | `chart_image_bytes` |
+
+**Integration principle:** `ReportExporter` does not depend directly on any of the modules above. The UI layer (`ExportReportDialog`) collects the data, assembles a `ReportData`, and passes it to `ReportExporter`. This lets `ReportExporter` be unit-tested in environments without PyQt5, such as CI.
+
+#### 3.4.8 UI entry point (`ui/dialogs/export_report_dialog.py`)
+
+```python
+class ExportReportDialog(QDialog):
+    def __init__(
+        self,
+        parent: QWidget,
+        benchmarker: PerformanceBenchmarker,
+        history: PerformanceHistory,
+        device_manager: DeviceManager,
+        dashboard: PerformanceDashboard
+    ) -> None: ...
+
+    def _collect_report_data(self) -> ReportData:
+        """Collect data from each module and assemble a ReportData"""
+
+    def _on_export_pdf(self) -> None:
+        """Get the save path via QFileDialog, then call ReportExporter.export_pdf()"""
+
+    def _on_export_csv(self) -> None:
+        """Get the save path via QFileDialog, then call ReportExporter.export_csv()"""
+```
+
+Dialog UI elements:
+- Export format selector (PDF / CSV / both)
+- Save path field (with a "Browse" button backed by `QFileDialog`)
+- Progress indicator (`QProgressBar`, shown while the PDF is generated)
+- Result message (success: shows the path; failure: shows the error message)
+
+#### 3.4.9 New tests for Phase 4
+
+Test file: `tests/unit/test_report_exporter.py`
+
+```python
+def test_export_csv_creates_file_at_given_path():
+    """export_csv() should create a CSV file at the given path"""
+
+def test_export_csv_contains_benchmark_summary_section():
+    """The CSV should contain the fps/latency rows of the benchmark_summary section"""
+
+def test_export_csv_contains_history_section():
+    """The CSV should contain a history section whose row count equals the number of history_records"""
+
+def test_export_csv_empty_history_produces_only_summary():
+    """With empty history_records, the CSV contains only the benchmark summary section"""
+
+def test_export_csv_no_benchmark_result_raises_value_error():
+    """Should raise ValueError when sequential_result or parallel_result is None"""
+
+def test_export_pdf_creates_file_at_given_path():
+    """export_pdf() should create a PDF file at the given path (checks existence only, not contents)"""
+
+def test_export_pdf_auto_creates_parent_directory():
+    """export_pdf() should create the parent directory of the output path if it is missing"""
+
+def test_export_pdf_without_chart_image_does_not_raise():
+    """PDF export should not raise when chart_image_bytes is None"""
+
+def test_get_timestamp_str_format():
+    """_get_timestamp_str should return a string formatted as 'YYYY-MM-DD HH:MM:SS'"""
+
+def test_report_data_defaults_are_sane():
+    """ReportData defaults: report_title is non-empty and generated_at is close to the current time"""
+```
+
+---
+
+## 4. Testing strategy
+
+### 4.1 Test architecture
+
+**Goal:** Set up a pytest framework and reach at least 80% coverage on the core modules.
+
+**Test directory layout (proposed):**
+```
+tests/
+├── unit/
+│   ├── test_pipeline_analysis.py      # Stage analysis logic in core/pipeline.py
+│   ├── test_node_properties.py        # property management in BaseNodeWithProperties
+│   ├── test_model_node.py             # ModelNode validation and configuration
+│   ├── test_inference_pipeline.py     # InferencePipeline queues and FPS calculation
+│   ├── test_postprocessor.py          # post-processing logic for each PostProcessor type
+│   └── test_benchmarker.py            # PerformanceBenchmarker (Phase 1)
+├── integration/
+│   ├── test_pipeline_execution.py     # full Pipeline execution flow (mock hardware)
+│   └── test_mflow_persistence.py      # .mflow save and load
+└── e2e/
+    └── test_ui_workflow.py            # UI workflow (requires a display)
+```
+
+### 4.2 Unit test focus areas
+
+#### `core/pipeline.py` test focus
+
+```python
+# test_pipeline_analysis.py
+
+def test_analyze_returns_empty_for_graph_without_input_output():
+    """A graph without Input/Output nodes should return an empty Stage list"""
+
+def test_analyze_counts_model_nodes_as_stages():
+    """The number of Stages should equal the number of ModelNodes"""
+
+def test_stage_ordering_by_distance_from_input():
+    """Stages should be ordered by their distance from the Input node"""
+
+def test_validate_pipeline_structure_requires_input_output_model():
+    """Validation should fail when an Input, Output, or Model node is missing"""
+
+def test_node_detection_by_identifier():
+    """is_model_node should recognize a ModelNode via __identifier__"""
+
+def test_node_detection_by_get_inference_config():
+    """is_model_node should recognize a ModelNode via its get_inference_config method"""
+```
+
+#### `core/functions/InferencePipeline.py` test focus
+
+```python
+# test_inference_pipeline.py
+
+def test_fps_returns_zero_before_first_result():
+    """FPS should be 0.0 right after initialization"""
+
+def test_fps_excludes_processing_status_results():
+    """Results with status='processing' should not count toward FPS"""
+
+def test_input_queue_drops_oldest_when_full():
+    """When the input queue is full, the oldest frame should be dropped"""
+
+def test_pipeline_stop_gracefully():
+    """stop() should finish within the timeout (< 10 seconds)"""
+
+def test_result_callback_called_for_valid_results():
+    """Only valid inference results should trigger result_callback"""
+```
+
+#### `core/functions/Multidongle.py` post-processing test focus
+
+```python
+# test_postprocessor.py
+
+def test_fire_detection_above_threshold_returns_fire():
+    """probability > threshold should return class_name='Fire'"""
+
+def test_fire_detection_below_threshold_returns_no_fire():
+    """probability <= threshold should return class_name='No Fire'"""
+
+def test_classification_multiclass_returns_highest_confidence():
+    """Multi-class classification should return the class with the highest confidence"""
+
+def test_yolo_v5_returns_object_detection_result():
+    """YOLOv5 post-processing should return an ObjectDetectionResult"""
+
+def test_yolo_v5_empty_output_returns_zero_boxes():
+    """Empty inference output should return box_count=0"""
+```
+
+### 4.3 Integration test focus
+
+```python
+# test_pipeline_execution.py (uses a mocked Kneron SDK)
+
+def test_pipeline_processes_frames_end_to_end():
+    """A frame should pass through every Stage end to end"""
+
+def test_pipeline_restarts_cleanly():
+    """Calling initialize() and start() again after stop() should leave no stale state"""
+
+def test_multistage_pipeline_preserves_stage_order():
+    """Stage 0 results should be produced before Stage 1 results"""
+```
+
+### 4.4 Performance test targets
+
+| Test item | Target |
+|---------|------|
+| UI interaction response | < 200 ms (node dragging, property switching) |
+| Live Pipeline validation latency | < 100 ms |
+| Overhead of PerformanceDashboard updates | < 5% impact on inference FPS |
+| Application startup time (including device detection) | < 10 seconds |
+| Benchmark, from one click to displayed result | < 30 seconds |
+
+### 4.5 Mock strategy
+
+Because the core depends on the Kneron KP SDK (the `kp` module), which needs real hardware, unit and integration tests should use mocks:
+
+```python
+# conftest.py
+import pytest
+from unittest.mock import MagicMock, patch
+
+@pytest.fixture
+def mock_kp():
+    """Mock Kneron KP SDK"""
+    with patch('core.functions.Multidongle.kp') as mock:
+        mock.scan_devices.return_value = [MagicMock(product_id=0x720)]
+        mock.load_model.return_value = MagicMock()
+        yield mock
+
+@pytest.fixture
+def mock_multidongle(mock_kp):
+    """Mock MultiDongle with configurable inference results"""
+    # Depends on mock_kp so the SDK stays patched while this dongle is in use
+    dongle = MagicMock()
+    dongle.get_latest_inference_result.return_value = (0.85, "Fire")
+    dongle.model_input_shape = (224, 224)
+    return dongle
+```
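
Editorial note on TDD.md section 3.2.3 (bottleneck detection). The trigger condition stated there, a queue above 80% of `max_queue_size` for more than 5 seconds, could be tracked with a small stateful helper. What follows is only a sketch under stated assumptions: `BottleneckDetector`, its `observe()` method, the injectable `clock` parameter, and the rule that a completely full queue counts as "critical" are hypothetical and do not appear in the TDD. Only the `BottleneckAlert` fields and the 0.8 / 5-second thresholds are taken from it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BottleneckAlert:
    stage_id: str
    queue_fill_rate: float   # queue fill rate (0.0-1.0)
    suggested_action: str
    severity: str            # "warning" | "critical"


class BottleneckDetector:
    """Hypothetical helper (not part of the TDD): watches one Stage's queue."""

    def __init__(
        self,
        stage_id: str,
        max_queue_size: int,
        threshold: float = 0.8,       # from the trigger condition
        hold_seconds: float = 5.0,    # from the trigger condition
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.stage_id = stage_id
        self.max_queue_size = max_queue_size
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._clock = clock
        self._above_since: Optional[float] = None

    def observe(self, qsize: int) -> Optional[BottleneckAlert]:
        """Feed one queue-size sample; return an alert once the condition holds."""
        now = self._clock()
        if qsize <= self.max_queue_size * self.threshold:
            self._above_since = None      # condition broken: restart the timer
            return None
        if self._above_since is None:
            self._above_since = now       # first sample above the threshold
        if now - self._above_since <= self.hold_seconds:
            return None                   # not sustained long enough yet
        fill = min(qsize / self.max_queue_size, 1.0)
        # Assumption: a completely full queue is "critical", otherwise "warning".
        severity = "critical" if fill >= 1.0 else "warning"
        return BottleneckAlert(
            self.stage_id, fill, "Add more Dongles to this Stage", severity
        )
```

A stats thread could call `observe(stage.input_queue.qsize())` on each tick and forward any returned alert to the callback registered through `set_bottleneck_callback()`. Injecting the clock keeps the 5-second rule testable without sleeping.
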
diff --git a/.autoflow/04-architecture/design-doc.md b/.autoflow/04-architecture/design-doc.md
new file mode 100644
index 0000000..b54ec94
--- /dev/null
+++ b/.autoflow/04-architecture/design-doc.md
@@ -0,0 +1,581 @@
+# Design Doc — Cluster4NPU UI
+
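+## Author: Architect Agent
+## Status: Draft
+## Last updated: 2026-04-05
+## Corresponding version: v0.0.3 (developer branch)
+
+---
+
+## 1. Background and goals
+
+### 1.1 Background
+
+Cluster4NPU UI is a desktop application that lets users design and run AI inference Pipelines through a visual drag-and-drop interface, without writing code, and distributes the workload across multiple Kneron NPU Dongles (KL520, KL720, KL1080) for parallel execution.
+
+The existing system already has the foundations of the core Pipeline designer and inference engine, but it lacks:
+- Performance visualization (no intuitive view of the speedup from parallel processing)
+- An advanced device management UI
+- An automated benchmark system
+- An optimization suggestion engine
+
+### 1.2 Goals
+
+1. **Core goal**: let any AI application engineer design a Pipeline and see inference results within 5 minutes
+2. **Differentiation goal**: clearly visualize the speedup (2x, 3x, 4x) delivered by parallel processing on multiple NPU Dongles
+3. **Engineering goal**: provide an extensible architecture that supports feature iteration across Phases 1-4
+
+### 1.3 Scope
+
+**This document covers:**
+- A complete description of the existing (v0.0.3) core architecture
+- Architectural direction for the planned Phase 1-3 features
+
+**Out of scope:**
+- Cloud features, non-Kneron hardware, model training, mobile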
+
+---
+
+## 2. System architecture overview
+
+### 2.1 Layered architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  UI Layer                                               │
+│                                                         │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
+│  │ Login Window │  │  Dashboard   │  │   Dialogs    │   │
+│  │  (login.py)  │  │(dashboard.py)│  │ (deployment, │   │
+│  └──────────────┘  └──────────────┘  │  performance)│   │
+│                          │           └──────────────┘   │
+│  ┌──────────────────────────────────────────────────┐   │
+│  │                Three-Panel Layout                │   │
+│  │ ┌────────────┐ ┌───────────────┐ ┌─────────────┐ │   │
+│  │ │ Left panel │ │ Center panel  │ │ Right panel │ │   │
+│  │ │Node palette│ │Pipeline editor│ │Settings/mon.│ │   │
+│  │ │ (palette)  │ │ (NodeGraphQt) │ │(properties) │ │   │
+│  │ └────────────┘ └───────────────┘ └─────────────┘ │   │
+│  └──────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────┘
+                           │
+┌─────────────────────────────────────────────────────────┐
+│  Core Layer                                             │
+│                                                         │
+│  ┌────────────────────┐  ┌──────────────────────────┐   │
+│  │  Pipeline analyzer │  │   Node system (Nodes)    │   │
+│  │  (pipeline.py)     │  │  (base/input/model/      │   │
+│  │                    │  │   preprocess/postprocess/│   │
+│  │  - Stage detection │  │   output nodes)          │   │
+│  │  - Validation      │  │                          │   │
+│  │  - Path analysis   │  │  - Business properties   │   │
+│  │  - Config export   │  │  - Config serialization  │   │
+│  └────────────────────┘  └──────────────────────────┘   │
+│                                                         │
+│  ┌──────────────────────────────────────────────────┐   │
+│  │  Inference Execution Layer                       │   │
+│  │                                                  │   │
+│  │ ┌─────────────────────┐  ┌────────────────────┐  │   │
+│  │ │  InferencePipeline  │  │    MultiDongle     │  │   │
+│  │ │                     │  │                    │  │   │
+│  │ │ - Stage coordination│  │ - NPU device mgmt  │  │   │
+│  │ │ - Thread management │  │ - Async inference  │  │   │
+│  │ │ - Queue management  │  │ - Pre/post-process │  │   │
+│  │ │ - FPS calculation   │  │ - Device scheduling│  │   │
+│  │ └─────────────────────┘  └────────────────────┘  │   │
+│  └──────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────┘
+                           │
+┌─────────────────────────────────────────────────────────┐
+│  Hardware Abstraction Layer                             │
+│                                                         │
+│  ┌──────────────────────────────────────────────────┐   │
+│  │                 Kneron KP SDK                    │   │
+│  │                                                  │   │
+│  │  KL520 Dongle  │  KL720 Dongle  │  KL1080 Dongle │   │
+│  └──────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────┘
+```
+
+### 2.2 Module dependencies
+
+```
+main.py
+  └── ui/windows/login.py (DashboardLogin)
+        └── ui/windows/dashboard.py (DashboardWindow)
+              ├── ui/windows/pipeline_editor.py
+              │     └── core/pipeline.py (PipelineAnalyzer)
+              │           └── core/nodes/*.py
+              ├── ui/components/properties_widget.py
+              │     └── core/nodes/*.py
+              └── core/functions/InferencePipeline.py
+                    └── core/functions/Multidongle.py
+                          └── kp (Kneron KP SDK)
+```
+
+---
+
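+## 3. Core components
+
+### 3.1 Pipeline analysis engine (`core/pipeline.py`)
+
+**Responsibility:** Analyze the NodeGraphQt visual graph, identify the Pipeline's Stage structure, validate it, and produce the execution configuration.
+
+**Key classes and functions:**
+
+| Class/function | Responsibility |
+|---------|------|
+| `PipelineStage` | Represents one inference Stage: a ModelNode plus optional Pre/Postprocess Nodes |
+| `analyze_pipeline_stages(node_graph)` | Identifies all Stages in the visual graph and orders them by distance |
+| `get_stage_count(node_graph)` | Counts the Stages in the Pipeline (used for UI display) |
+| `validate_pipeline_structure(node_graph)` | Checks that the Pipeline contains the required nodes (Input, Model, Output) |
+| `get_pipeline_summary(node_graph)` | Returns a Pipeline summary (node count, Stage count, validation result) |
+
+**Design decisions:**
+- Nodes are identified through several fallback strategies (`__identifier__`, `type_`, `NODE_NAME`, class name, presence of specific methods) to improve compatibility
+- Stage ordering: the shortest-path distance (BFS) from each ModelNode to the input node
+- Every graph traversal method includes defensive exception handling so that inconsistent NodeGraphQt object state does not crash the application
+
+**Interface:**
+```python
+# Main public interface
+def get_stage_count(node_graph: NodeGraph) -> int: ...
+def analyze_pipeline_stages(node_graph: NodeGraph) -> List[PipelineStage]: ...
+def validate_pipeline_structure(node_graph: NodeGraph) -> Tuple[bool, str]: ...
+def get_pipeline_summary(node_graph: NodeGraph) -> Dict[str, Any]: ...
+```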
+
+### 3.2 Node system (`core/nodes/`)
+
+**Responsibility:** Define the node types used in a Pipeline and provide business-property management and configuration serialization.
+
+**Inheritance hierarchy:**
+```
+NodeGraphQt.BaseNode
+  └── BaseNodeWithProperties (base_node.py)
+        ├── InputNode (input_node.py)
+        ├── ModelNode (model_node.py)
+        ├── PreprocessNode (preprocess_node.py)
+        ├── PostprocessNode (postprocess_node.py)
+        └── OutputNode (output_node.py)
+```
+
+**Core capabilities of `BaseNodeWithProperties`:**
+- `create_business_property(name, default, options)`: creates a business property with validation options
+- `validate_property(name, value)`: validates numeric ranges and option lists
+- `get_node_config()` / `load_node_config(config)`: JSON serialization and restoration
+- `create_node_property_widget(node, prop_name, value, options)`: generates a Qt widget automatically from the property type
+
+**ModelNode properties (the main node type):**
+
+| Property | Type | Description |
+|------|------|------|
+| `model_path` | file_path | Path to the .nef model file |
+| `dongle_series` | choice | KL520 / KL720 / KL1080 |
+| `num_dongles` | int (1-16) | Number of Dongles assigned to this Stage |
+| `port_id` | string | USB port ID (or auto) |
+| `batch_size` | int (1-32) | Inference batch size |
+| `max_queue_size` | int (1-100) | Maximum input queue length |
+
+### 3.3 Inference execution engine (`core/functions/InferencePipeline.py`)
+
+**Responsibility:** Manage the lifecycle of a multi-Stage Pipeline, coordinate data flow between threads, and compute performance metrics.
+
+**Main data structures:**
+
+```python
+@dataclass
+class StageConfig:
+    stage_id: str
+    port_ids: List[int]
+    scpu_fw_path: str        # SCPU firmware path
+    ncpu_fw_path: str        # NCPU firmware path
+    model_path: str          # .nef model path
+    upload_fw: bool          # whether to upload firmware
+    max_queue_size: int      # queue size (default 50)
+    multi_series_config: Optional[Dict]  # multi-series mode configuration
+    input_preprocessor: Optional[PreProcessor]
+    output_postprocessor: Optional[PostProcessor]
+
+@dataclass
+class PipelineData:
+    data: Any                # current data (image, intermediate result)
+    metadata: Dict[str, Any] # timestamps, processing info
+    stage_results: Dict[str, Any]  # per-Stage inference results
+    pipeline_id: str         # unique identifier
+    timestamp: float
+```
+
+**Threading model:**
+
+```
+Main thread (UI)
+  │
+  ├── InferencePipeline.coordinator_thread (coordinator)
+  │     │  takes data from pipeline_input_queue
+  │     │  hands it to each Stage in order
+  │     └── collects results into pipeline_output_queue
+  │
+  ├── PipelineStage[0].worker_thread (Stage 0 worker thread)
+  │     └── takes data from input_queue → MultiDongle inference → puts into output_queue
+  │
+  ├── PipelineStage[1].worker_thread (Stage 1 worker thread)
+  │     └── ...
+  │
+  └── stats_thread (performance statistics reporting)
+```
+
+**FPS calculation:** cumulative (`completed_counter / elapsed_time`), matching the Kneron example program, and counting only real inference results (async/processing states are excluded).
+
+**Queue management policy:**
+- When the input queue is full: drop the oldest frame (to keep live streams real-time)
+- Output queue capped at 50 entries: when exceeded, drop the oldest result to avoid unbounded memory growth
+
+### 3.4 Hardware abstraction layer (`core/functions/Multidongle.py`)
+
+**Responsibility:** Wrap the Kneron KP SDK and provide a unified NPU Dongle management interface that supports single-device and multi-device (multi-series) modes.
+
+**Core abstract classes:**
+
+```python
+class DataProcessor(ABC):
+    def process(self, data: Any, *args, **kwargs) -> Any: ...
+
+class PreProcessor(DataProcessor):
+    """Image resize plus format conversion (BGR → BGR565/RGB8888)."""
+
+class PostProcessor(DataProcessor):
+    """Supports five post-processing types:
+    - FIRE_DETECTION (fire classification)
+    - CLASSIFICATION (general classification)
+    - YOLO_V3 (object detection)
+    - YOLO_V5 (object detection, uses the reference implementation)
+    - RAW_OUTPUT (raw output)"""
+```
+
+**Device specifications (DongleSeriesSpec):**
+
+| Series | Product ID | Compute (GOPS) |
+|------|-----------|---------|
+| KL520 | 0x100 | 2 GOPS |
+| KL720 | 0x720 | 28 GOPS |
+| KL630 | 0x630 | 400 GOPS |
+| KL730 | 0x730 | 1600 GOPS |
+
+**Inference result data structures:**
+
+```python
+@dataclass
+class ClassificationResult:
+    probability: float
+    class_name: str
+    class_num: int
+    confidence_threshold: float
+
+@dataclass
+class ObjectDetectionResult:
+    class_count: int
+    box_count: int
+    box_list: List[BoundingBox]
+    # Letterbox mapping info (used to map back to original image coordinates)
+    model_input_width: int; model_input_height: int
+    pad_left: int; pad_top: int; pad_right: int; pad_bottom: int
+```
+
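+### 3.5 User interface layer (`ui/`)
+
+**Responsibility:** Present the visual Pipeline design environment and manage node property editing and performance monitoring displays.
+
+**Main windows:**
+- `DashboardLogin` (`ui/windows/login.py`): startup screen, recent project list, new/load Pipeline
+- `DashboardWindow` (`ui/windows/dashboard.py`): main workspace with the three-panel layout
+- `PipelineEditor` (`ui/windows/pipeline_editor.py`): embedded NodeGraphQt visual editor
+
+**Three-panel configuration:**
+
+| Panel | Width ratio | Main contents |
+|------|---------|---------|
+| Left panel | 25% | Node palette (drag source), Pipeline action buttons |
+| Center panel | 50% | NodeGraphQt visual editor, global status bar |
+| Right panel | 25% | Properties tab (node settings), Performance tab (performance monitoring), Dongles tab (device management) |
+
+### 3.6 Application entry point (`main.py`)
+
+**Responsibility:** Application initialization, single-instance protection, Qt environment setup.
+
+**Single-instance mechanism:** the `SingleInstance` class uses two layers of protection (items 1 and 2) plus stale-lock cleanup (item 3):
+1. Qt `QSharedMemory` (cross-platform)
+2. File lock (Unix: fcntl / Windows: O_CREAT|O_EXCL)
+3. Automatic cleanup of stale lock files older than 5 minutes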
+
+---
+
+## 4. Data flow
+
+### 4.1 Design-time data flow
+
+```
+User drags nodes
+      │
+      ▼
+NodeGraphQt visual graph
+      │
+      ▼
+core/pipeline.py
+  analyze_pipeline_stages()
+      │
+      ▼
+List[PipelineStage] (logical Stage list)
+      │
+      ├──→ UI shows the Stage count (status bar)
+      └──→ Validation error messages
+```
+
+### 4.2 Runtime data flow
+
+```
+Input source (camera / video / image)
+      │
+      ▼
+camera_source.py / video_source.py
+      │  numpy.ndarray (BGR image frame)
+      ▼
+InferencePipeline.put_data()
+      │
+      ▼
+pipeline_input_queue (Queue, maxsize=100)
+      │
+      ▼
+coordinator_thread (coordinator thread)
+  wraps the frame in a PipelineData
+      │
+      ▼ (passes through each Stage in order)
+PipelineStage[0].input_queue
+      │
+      ▼
+worker_thread[0]
+  1. input_preprocessor (optional inter-Stage preprocessing)
+  2. MultiDongle.preprocess_frame() (BGR → BGR565 format conversion)
+  3. MultiDongle.put_input() (enqueue for inference)
+  4. MultiDongle.get_latest_inference_result() (non-blocking result fetch)
+  5. update PipelineData.stage_results
+      │
+      ▼
+PipelineStage[0].output_queue
+      │
+      ▼ (next Stage...)
+      │
+      ▼
+pipeline_output_queue (Queue, maxsize=50)
+      │
+      ├──→ result_callback (UI update)
+      └──→ stats_callback (performance statistics)
+```
+
+### 4.3 .mflow file format
+
+Pipelines are saved as JSON:
+
+```json
+{
+  "nodes": [
+    {
+      "type": "ModelNode",
+      "name": "Stage 1 Model",
+      "properties": {
+        "model_path": "/path/to/model.nef",
+        "dongle_series": "720",
+        "num_dongles": 2
+      },
+      "position": [100, 200]
+    }
+  ],
+  "connections": [
+    {"from_node": "input_0", "from_port": "output", "to_node": "model_0", "to_port": "input"}
+  ]
+}
+```
+
+---
+
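+## 5. Architecture decision records (ADR)
+
+### ADR-001: PyQt5 as the GUI framework
+
+**Decision**: use PyQt5 (>= 5.15.11)
+
+**Rationale:**
+- NodeGraphQt depends on PyQt5, so no other framework can be used
+- PyQt5 has mature support on Windows
+- It offers a rich set of widgets and the Signal/Slot mechanism
+
+**Trade-offs:**
+- Restricts Python to 3.9–3.11 (PyQt5 + Kneron SDK compatibility)
+- PyQt6 is not backward compatible; migration is not planned in the short term
+
+### ADR-002: NodeGraphQt as the visual node editor
+
+**Decision**: use NodeGraphQt (>= 0.6.40)
+
+**Rationale:**
+- Provides complete drag-and-drop node graph editing at low development cost
+- Supports node connections, a properties panel, and visual output
+
+**Trade-offs:**
+- NodeGraphQt offers limited UI customization (e.g. node colors and shapes)
+- Node identification uses multiple fallbacks (`__identifier__`, `NODE_NAME`, etc.) because API differences between NodeGraphQt versions can cause inconsistencies
+
+### ADR-003: Multithreaded Pipeline architecture
+
+**Decision**: one worker thread per Stage plus one coordinator thread
+
+**Rationale:**
+- Inference is CPU/hardware intensive; multithreading keeps the UI from blocking
+- A separate thread per Stage allows pipelined concurrency and higher throughput
+
+**Trade-offs:**
+- The coordinator passes data sequentially, so execution is not truly parallel (true parallelism needs a DAG scheduler)
+- Threads communicate through `queue.Queue`, which has a fixed memory bound
+
+### ADR-004: Non-blocking retrieval of inference results
+
+**Decision**: `MultiDongle.get_latest_inference_result()` is non-blocking
+
+**Rationale:**
+- Consistent with the design pattern of the Kneron example code (example.py)
+- Keeps inference latency from blocking the whole Pipeline thread
+
+**Trade-offs:**
+- The result may be None (not finished yet), so filtering logic for async/processing states is required
+
+### ADR-005: Cumulative FPS calculation
+
+**Decision**: `completed_counter / elapsed_time` (measured from the first result)
+
+**Rationale:**
+- Matches the calculation in the official Kneron example, which keeps results comparable
+- Excludes the abnormally low FPS of the warm-up period
+
+**Trade-offs:**
+- Cannot reflect momentary FPS fluctuations (suits steady workloads, not latency-sensitive ones)
+
+### ADR-006: PyInstaller packaging
+
+**Decision**: use PyInstaller (`main.spec`) to build a standalone executable
+
+**Rationale:**
+- Target users (system integrators) may not have a Python environment
+- Simplifies deployment
+
+**Trade-offs:**
+- The packaged executable is large
+- The Kneron KP SDK's dynamic libraries must be included correctly in the packaging configuration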
+
+---
+
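+## 6. Known limitations and technical debt
+
+### 6.1 Known bugs
+
+| Bug | Status | Impact |
+|-----|------|------|
+| Node property display issue | Unfixed (recorded in v0.0.2) | The Properties tab in the right panel may display incorrectly |
+| Output visualization anomaly (with post-processing results) | Unfixed (recorded in v0.0.2) | The output view may be incorrect |
+
+### 6.2 Technical debt
+
+| Item | Severity | Description |
+|------|--------|------|
+| Debug scripts in the repository root are unorganized | Low | `debug_*.py`, `force_cleanup.py`, etc. should move to `tools/` |
+| Inconsistent naming in tests/ | Medium | 42 scripts lack systematic organization; some do not start with test_ |
+| No pytest test framework | Medium | Core modules (InferencePipeline, MultiDongle) have no pytest coverage |
+| Coordinator is sequential | Medium | True Stage parallelism requires refactoring the coordinator into a DAG model |
+| Multiple node-identification fallbacks | Low | Hard to read; should be unified into a single identification strategy |
+| Only basic RTSP streaming support | Low | Full RTSP functionality is not on the current roadmap |
+
+### 6.3 Performance limitations
+
+- **Sequential coordinator**: the Coordinator currently passes data to Stage 0 → Stage 1 in order, so there is no truly parallel inference (that requires refactoring to a pipelined queue model)
+- **FPS does not reflect momentary fluctuations**: cumulative FPS is accurate over long runs, but short-term fluctuations are not visible
+- **Output queue capped at 50**: may become a bottleneck in high-throughput scenarios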
+
+---
+
+## 7. 未來架構演進方向
+
+### Phase 1：效能視覺化（對應 DEVELOPMENT_ROADMAP Phase 1）
+
+**需要新增的架構元件：**
+
+```python
+# 新增模組：core/performance/
+class PerformanceBenchmarker:
+    """自動化效能測試器"""
+    def run_sequential_benchmark(self, pipeline_config) -> BenchmarkResult
+    def run_parallel_benchmark(self, pipeline_config) -> BenchmarkResult
+    def calculate_speedup(self, seq: BenchmarkResult, par: BenchmarkResult) -> float
+
+class PerformanceHistory:
+    """效能歷史記錄（本地 JSON 儲存）"""
+    def record(self, result: BenchmarkResult)
+    def get_history(self, limit: int) -> List[BenchmarkResult]
+```
+
+**UI 層新增：**
+- `ui/components/performance_dashboard.py`：即時 FPS/延遲折線圖（使用 pyqtgraph 或 matplotlib）
+- `ui/dialogs/benchmark_dialog.py`：Benchmark 啟動與結果呈現
+
+**架構考量：**
+- Benchmark 需要能控制 `InferencePipeline` 以單裝置/多裝置模式執行，需要在 `StageConfig` 層級提供模式切換介面
+- 效能圖表更新須在獨立執行緒中產生資料，透過 Qt Signal 傳遞到 UI 執行緒
+
+### Phase 2：裝置管理（對應 DEVELOPMENT_ROADMAP Phase 2）
+
+**需要新增的架構元件：**
+
+```python
+# 強化 core/functions/Multidongle.py
+class DeviceManager:
+    """裝置管理器"""
+    def scan_devices() -> List[DeviceInfo]
+    def get_device_health(device_id: str) -> DeviceHealth
+    def assign_device(device_id: str, stage_id: str)
+    def get_load_balance_recommendation() -> Dict[str, str]
+
+@dataclass
+class DeviceInfo:
+    device_id: str
+    series: str        # KL520/KL720/KL1080
+    status: str        # online/offline/busy
+    gops: int          # 算力（來自 DongleSeriesSpec）
+    assigned_stage: Optional[str]
+```
+
+**UI 層新增：**
+- `ui/components/device_management_panel.py`：裝置狀態儀表板
+
+### Phase 3：優化引擎（對應 DEVELOPMENT_ROADMAP Phase 3）
+
+**需要新增的架構元件：**
+
+```python
+# New module: core/optimization/
+class OptimizationEngine:
+    def analyze_pipeline(self, stats: PipelineStats) -> List[OptimizationSuggestion]: ...
+    def predict_performance(self, config: PipelineConfig) -> PerformancePrediction: ...
+
+@dataclass
+class OptimizationSuggestion:
+    type: str          # "rebalance_devices" | "remove_redundant_node" | ...
+    description: str
+    estimated_improvement: float  # estimated performance gain, in %
+    action: Callable   # executable action that applies the improvement
+```
+
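+### Long-Term Considerations for Architecture Evolution
+
+1. **Coordinator refactor**: The current sequential coordinator is a bottleneck in multi-stage pipelines. In the long term it should be refactored into a pipelined design, so that Stage N+1 processes the previous frame's result while Stage N is already processing the next frame.
+
+2. **Test infrastructure**: Set up a pytest-based test suite; core modules should reach at least 80% coverage (especially the queue logic in `InferencePipeline` and the stage-analysis logic in `pipeline.py`).
+
+3. **Type annotations**: Some modules currently lack complete type annotations; adopting mypy static analysis incrementally is recommended.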
diff --git a/.autoflow/progress.md b/.autoflow/progress.md
new file mode 100644
index 0000000..3db1176
--- /dev/null
+++ b/.autoflow/progress.md
@@ -0,0 +1,39 @@
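+# Project Progress: Cluster4NPU UI
+
+## Goal: Onboard the existing project → fill in documentation → Phase 1 development
+## Current phase: Phase 1 development complete, tests pending
+## Current status: In progress
+## Last updated: 2026-04-05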
+
+## Progress
+
+| Stage | Status | Completed | Notes |
+|------|------|----------|------|
+| Project onboarding | ✅ Done | 2026-04-05 | Local path |
+| Project health check | ✅ Done | 2026-04-05 | See 00-onboarding/health-check.md |
+| PRD | ✅ Done | 2026-04-05 | 02-prd/PRD.md |
+| Design Doc | ✅ Done | 2026-04-05 | 04-architecture/design-doc.md |
+| TDD | ✅ Done | 2026-04-05 | 04-architecture/TDD.md |
+| Cross-review | ✅ Done | 2026-04-05 | PM reviewed the TDD; gaps filled |
+| TDD addendum (Phase 4, feature 11) | ✅ Done | 2026-04-05 | reportlab PDF + standard-library csv |
+| Phase 1 backend implementation | ✅ Review passed | 2026-04-05 | PerformanceBenchmarker + PerformanceHistory (31 tests) |
+| Phase 1 UI implementation | ✅ Review passed | 2026-04-05 | PerformanceDashboard + BenchmarkDialog (58 tests total) |
+| Phase 1 dashboard integration | ✅ Review passed | 2026-04-05 | Integrated into dashboard.py |
+| Phase 2 backend implementation | ✅ Review passed | 2026-04-05 | DeviceManager + BottleneckAlert (94 tests) |
+| Phase 2 UI implementation | ✅ Review passed | 2026-04-05 | DeviceManagementPanel, integrated into the dashboard |
+| Phase 3 development | ✅ Review passed | 2026-04-06 | OptimizationEngine + TemplateManager (154 tests) |
+| Phase 4 development | ✅ Review passed | 2026-04-06 | ReportExporter + ExportReportDialog (192 tests) |
+
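+## Current To-Dos
+
+- [ ] Run the Phase 1 integration tests to confirm that all components work together
+- [ ] Decide whether to proceed with Phase 2
+
+## Open Issues
+
+- None
+
+## Key Decisions
+
+- Code source: local path (not GitHub)
+- Documentation strategy: reverse-engineer the docs from the code; no design mockups are produced (no existing UI screenshots or wireframes)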
diff --git a/pyproject.toml b/pyproject.toml
index a60e5ba..5583b21 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -8,3 +8,11 @@ dependencies = [
     "nodegraphqt>=0.6.40",
     "pyqt5>=5.15.11",
 ]
+
+[tool.pytest.ini_options]
+testpaths = ["tests/unit"]
+pythonpath = ["."]
+addopts = "--import-mode=importlib"
+python_files = ["test_*.py"]
+python_classes = ["Test*"]
+python_functions = ["test_*", "should_*"]
diff --git a/tests/conftest.py b/tests/conftest.py
new file mode 100644
index 0000000..727c401
--- /dev/null
+++ b/tests/conftest.py
@@ -0,0 +1,46 @@
+"""
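+tests/conftest.py: unit-test environment setup.
+
+This conftest.py lives in the tests/ directory (not a Python package),
+so it can inject mocks before the root __init__.py is triggered.
+
+This allows the pure-Python logic in core/performance/ to be tested
+in environments without Kneron NPU hardware, PyQt5, or NodeGraphQt.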
+"""
+import sys
+from unittest.mock import MagicMock
+
+
+def _install_mock(name: str) -> None:
+    """Install an empty MagicMock as a stand-in if the module is not yet in sys.modules."""
+    if name not in sys.modules:
+        sys.modules[name] = MagicMock()
+
+
+# Kneron KP SDK (requires hardware drivers)
+_install_mock("kp")
+
+# NumPy (may not be installed)
+try:
+    import numpy  # noqa: F401
+except ImportError:
+    _install_mock("numpy")
+
+# PyQt5 modules (require a GUI environment)
+for _mod in [
+    "PyQt5",
+    "PyQt5.QtWidgets",
+    "PyQt5.QtCore",
+    "PyQt5.QtGui",
+    "PyQt5.QtChart",
+]:
+    _install_mock(_mod)
+
+# NodeGraphQt (depends on PyQt5)
+_install_mock("NodeGraphQt")
+_install_mock("NodeGraphQt.constants")
+_install_mock("NodeGraphQt.base")
+_install_mock("NodeGraphQt.base.node")
+
+# OpenCV (may not be installed)
+_install_mock("cv2")