Refactor workers to use backend interfaces for quantization, compilation, and evaluation; add optional flags for simulation in request schemas and update documentation accordingly.

warrenchen 2026-02-06 08:24:08 +00:00
parent bc98456d74
commit fdebf4db5d
17 changed files with 353 additions and 53 deletions

@ -27,6 +27,28 @@ ONNX → BIE → NEF. The system uses the Scheduler as the control plane, together with the Worker Pool and
7) NEF Worker finishes execution
8) Scheduler marks the task COMPLETED
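## Worker API Flags (Optional)
These flags control the evaluator / simulator steps. Every flag has a default, so omitting them leaves the existing flow unchanged.
- ONNX `/api/onnx/process`
  - `enable_evaluate` (default: `false`): whether to run the IP evaluator (OFF in the original Web GUI flow)
  - `enable_sim_fp` (default: `false`): whether to run floating-point E2E simulation (not yet wired)
- BIE `/api/bie/process`
  - `enable_sim_fixed` (default: `false`): whether to run fixed-point E2E simulation (not yet wired)
- NEF `/api/nef/process`
  - `enable_sim_hw` (default: `false`): whether to run hardware E2E simulation (not yet wired)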
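A minimal client-side sketch of how request bodies can rely on these defaults. The endpoint paths and flag names come from the list above. `FLAG_DEFAULTS` and `with_flag_defaults` are hypothetical and are not part of the workers' code. Other request fields (such as the input file) are left out for brevity.

```python
from typing import Any, Dict

# Optional flags per endpoint and their defaults, as documented above.
FLAG_DEFAULTS: Dict[str, Dict[str, bool]] = {
    "/api/onnx/process": {"enable_evaluate": False, "enable_sim_fp": False},
    "/api/bie/process": {"enable_sim_fixed": False},
    "/api/nef/process": {"enable_sim_hw": False},
}


def with_flag_defaults(endpoint: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fill in any optional flag the caller omitted."""
    merged: Dict[str, Any] = dict(FLAG_DEFAULTS[endpoint])  # start from the defaults
    merged.update(body)  # explicit values in the body win
    return merged


onnx_body = with_flag_defaults(
    "/api/onnx/process",
    {"model_id": 10, "version": "0001", "platform": "520", "enable_evaluate": True},
)
print(onnx_body["enable_evaluate"], onnx_body["enable_sim_fp"])  # True False
```

Omitting a flag is equivalent to sending `false`, which is why existing clients keep working unchanged.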
## Default Step Toggles (Original Web GUI vs. Current Workers)
| Step | Original Web GUI default | Current Workers default | Flag |
|---|---|---|---|
| ONNX conversion/optimization | ON | ON | None |
| IP Evaluator | OFF | OFF | `enable_evaluate` |
| FP E2E simulation | OFF | OFF | `enable_sim_fp` |
| BIE quantization | ON | ON | None |
| Fixed-point E2E simulation | OFF | OFF | `enable_sim_fixed` |
| NEF compile | ON | ON | None |
| HW E2E simulation | OFF | OFF | `enable_sim_hw` |
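The table can be read as a step plan. Steps without a flag always run, and flagged steps run only when their flag is `true`. The sketch below encodes that reading. `PIPELINE_STEPS` and `planned_steps` are illustrative names only. The simulator flags are not yet wired, so for those rows this describes the intended behavior, not the current one.

```python
from typing import Dict, List, Optional, Tuple

# (step, controlling flag or None if the step is always on), in pipeline order.
PIPELINE_STEPS: List[Tuple[str, Optional[str]]] = [
    ("ONNX conversion/optimization", None),
    ("IP Evaluator", "enable_evaluate"),
    ("FP E2E simulation", "enable_sim_fp"),
    ("BIE quantization", None),
    ("Fixed-point E2E simulation", "enable_sim_fixed"),
    ("NEF compile", None),
    ("HW E2E simulation", "enable_sim_hw"),
]


def planned_steps(flags: Dict[str, bool]) -> List[str]:
    """Return the steps that would run; a missing flag counts as False."""
    return [step for step, flag in PIPELINE_STEPS if flag is None or flags.get(flag, False)]


print(planned_steps({}))
# ['ONNX conversion/optimization', 'BIE quantization', 'NEF compile']
```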
## Non-Goals
- No task persistence
- No resume after a crash

@ -165,16 +165,34 @@ error:
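- Input: the single file in the working directory (no assumptions about file name or extension)
- Output: `out.onnx`
- Output location: the same working directory
- Optional flags:
  - `enable_evaluate` (default: `false`): whether to run the IP evaluator (OFF in the original Web GUI flow)
  - `enable_sim_fp` (default: `false`): whether to run floating-point E2E simulation (not yet wired)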
### 4.1.3 BIE Worker
- Input: `out.onnx` + `ref_images/*`
- Output: `out.bie`
- Output location: the same working directory
- Optional flags:
  - `enable_sim_fixed` (default: `false`): whether to run fixed-point E2E simulation (not yet wired)
### 4.1.4 NEF Worker
- Input: `out.bie`
- Output: `out.nef`
- Output location: the same working directory
- Optional flags:
  - `enable_sim_hw` (default: `false`): whether to run hardware E2E simulation (not yet wired)
### 4.1.6 Default Step Toggles (Original Web GUI vs. Current Workers)
| Step | Original Web GUI default | Current Workers default | Flag |
|---|---|---|---|
| ONNX conversion/optimization | ON | ON | None |
| IP Evaluator | OFF | OFF | `enable_evaluate` |
| FP E2E simulation | OFF | OFF | `enable_sim_fp` |
| BIE quantization | ON | ON | None |
| Fixed-point E2E simulation | OFF | OFF | `enable_sim_fixed` |
| NEF compile | ON | ON | None |
| HW E2E simulation | OFF | OFF | `enable_sim_hw` |
### 4.1.5 Core / Toolchain Path Consistency
- The Worker must pass the working-directory path to core
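All three workers read and write inside one working directory, so every expected artifact can be derived from that single path. The file names `out.onnx`, `out.bie`, `out.nef` and `ref_images/` come from the worker sections above. `stage_paths` is a hypothetical helper. `export_workdir` copies the `os.environ.setdefault("KTC_WORKDIR", work_dir)` pattern used in `process_nef_core` in this commit.

```python
import os
from pathlib import Path
from typing import Dict


def stage_paths(work_dir: str) -> Dict[str, Path]:
    """Hypothetical helper: map each stage to its artifact inside `work_dir`."""
    root = Path(work_dir)
    return {
        "onnx": root / "out.onnx",          # ONNX worker output, BIE worker input
        "ref_images": root / "ref_images",  # quantization reference images
        "bie": root / "out.bie",            # BIE worker output, NEF worker input
        "nef": root / "out.nef",            # NEF worker output
    }


def export_workdir(work_dir: str) -> None:
    # Same pattern as process_nef_core: only set KTC_WORKDIR if it is not already set.
    os.environ.setdefault("KTC_WORKDIR", work_dir)


paths = stage_paths("/tmp/task-123")
assert all(p.parent == Path("/tmp/task-123") for p in paths.values())
```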

@ -245,7 +245,41 @@ If a file currently mixes multiple module responsibilities:
- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
- After each phase, re-check boundaries and adjust.
## 7) Current Structure and Replacement Strategy (As-Is)
Based on the refactor just completed, the effective call chain is:
```
workers (ONNX/BIE/NEF)
-> backends (interfaces + Kneron implementations)
-> ktc (toolchain python API)
-> vendor sys_flow / libs / libs_V2 / prebuilt binaries
```
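The sketch below shows the layering in miniature. It assumes a worker receives its backend as an argument instead of importing `ktc` itself; in this commit the workers call `get_compiler_backend()` instead. The `compile(bie_path, output_dir, **kwargs)` shape matches `CompilerBackend` in this commit. `FakeCompilerBackend`, `run_nef_step` and the fake output name are hypothetical.

```python
from typing import Protocol


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Compile BIE into NEF and return the generated NEF path."""


class FakeCompilerBackend:
    """Hypothetical stand-in: no ktc, no vendor sys_flow, no prebuilt binaries."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        return f"{output_dir}/fake_{kwargs['platform']}.nef"


def run_nef_step(backend: CompilerBackend, bie_path: str, work_dir: str, params: dict) -> str:
    # The worker only sees the interface; the caller decides which implementation runs.
    return backend.compile(
        bie_path,
        work_dir,
        model_id=params["model_id"],
        version=params["version"],
        platform=params["platform"],
    )


params = {"model_id": 1, "version": "0001", "platform": "520"}
print(run_nef_step(FakeCompilerBackend(), "/w/out.bie", "/w", params))  # /w/fake_520.nef
```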
### 7.1 What this means today
- Workers reach quantization, compilation, and evaluation only through **backend interfaces**; they no longer call `ktc.ModelConfig` directly.
- Kneron specifics are concentrated in backend implementations.
- `ktc` still wraps the Kneron toolchain and binaries; that dependency remains, but it is **now isolated**.
### 7.2 How to replace later
1) **Replace backend implementations** (lowest-risk)
- Keep backend interfaces stable.
- Swap `Kneron*Backend` for `Your*Backend` without touching workers.
2) **Keep backend layer, but replace `ktc` calls**
- Modify `Kneron*Backend` to call your own library instead of `ktc`.
- Workers stay unchanged; only backend code moves.
3) **Introduce multiple backends**
- Add `get_*_backend(name=...)` selection based on config/env.
- Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.
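Option 3 could be implemented with a registry keyed by name, using an environment variable as a fallback. This is a sketch under assumptions. `QUANT_BACKEND` is a hypothetical variable name, and `OssQuantizationBackend` is a placeholder. Both classes here are stubs that raise `NotImplementedError`. The current `get_quantization_backend` ignores `name` and always returns the Kneron implementation.

```python
import os
from typing import Callable, Dict, Optional


class KneronQuantizationBackend:
    def analyze(self, onnx_path, input_mapping, output_dir, **kwargs):
        raise NotImplementedError("stub: the real implementation calls ktc")


class OssQuantizationBackend:
    def analyze(self, onnx_path, input_mapping, output_dir, **kwargs):
        raise NotImplementedError("stub: placeholder for a non-Kneron quantizer")


# Hypothetical registry: backend name -> factory.
_REGISTRY: Dict[str, Callable[[], object]] = {
    "kneron": KneronQuantizationBackend,
    "oss": OssQuantizationBackend,
}


def get_quantization_backend(name: Optional[str] = None):
    # An explicit argument wins, then the (hypothetical) env var, then the Kneron default.
    key = (name or os.environ.get("QUANT_BACKEND") or "kneron").lower()
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(f"unknown quantization backend: {key!r}") from None
```

Because callers already use `get_*_backend()`, this kind of change stays inside the backend module and the workers remain untouched.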
### 7.3 Where to implement replacements
- `services/backends/quantization.py`
- `services/backends/compiler.py`
- `services/backends/evaluator.py`
- `services/backends/simulator.py`
## 8) Minimum Viable API Proposal
Keep it minimal to avoid churn:
```python
...
```
@ -260,6 +294,24 @@ class CompilerBackend:
Then the pipeline is just a pure composition of these two + ONNX ops.
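"Pure composition" can be read as follows: the pipeline is a function of the ONNX step and the two backends, and nothing else is injected. The sketch below follows that reading. The stub backends, `run_pipeline`, and the lambda standing in for the ONNX conversion/optimization step are all hypothetical. Method shapes follow `QuantizationBackend.analyze` and `CompilerBackend.compile` from this commit.

```python
from typing import Callable, Dict, List


class StubQuantizer:
    def analyze(self, onnx_path: str, input_mapping: Dict, output_dir: str, **kwargs) -> str:
        return f"{output_dir}/out.bie"


class StubCompiler:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        return f"{output_dir}/out.nef"


def run_pipeline(onnx_step: Callable[[str], str], quantizer, compiler, src: str,
                 input_mapping: Dict[str, List], work_dir: str, **model_kwargs) -> str:
    onnx_path = onnx_step(src)  # ONNX conversion/optimization
    bie_path = quantizer.analyze(onnx_path, input_mapping, work_dir, **model_kwargs)
    return compiler.compile(bie_path, work_dir, **model_kwargs)


nef = run_pipeline(lambda src: "/w/out.onnx", StubQuantizer(), StubCompiler(),
                   "/w/model.tflite", {"input": []}, "/w",
                   model_id=1, version="0001", platform="520")
print(nef)  # /w/out.nef
```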
## 9) What This Enables
- Replace ONNX converters / optimizers without touching quantization.
- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
- Swap in future Kneron versions only inside backend adapters.
- Experiment with alternative quantization or compiler backends.
---
## 10) Next Steps (if you want)
I can draft the following next:
1) A small refactor plan with concrete file edits and minimal API changes.
2) A diagram (Mermaid) of the new modular flow.
3) A compatibility matrix (current vs target dependencies per module).
Tell me which one you want, and I'll prepare it.

docs/refactor_progress.md Normal file
@ -0,0 +1,45 @@
# Refactor Progress Log
## 2026-02-05
- Started modularization refactor per `docs/flow_modularization_notes.md`.
- Goal: introduce backend interfaces, decouple ONNX evaluation, keep behavior stable.
### Planned Steps
1) Create backend interfaces (quantization/compiler, optional evaluator/simulator).
2) Update ONNX/BIE/NEF workers to use backends and make eval optional.
3) Review boundaries and document issues.
### Issues / Risks
- None yet.
## 2026-02-05 Update
- Added backend interfaces under `services/backends`.
- ONNX worker now makes IP evaluation optional via `parameters.enable_evaluate`.
- BIE/NEF workers now call backend interfaces instead of direct `ModelConfig` usage.
### Issues / Risks
- `services/workers/onnx/core.py` now sets `eval_report` to an empty string when evaluation is disabled; check whether any caller relies on it being non-empty.
- Quantization backend supports optional `onnx_model` to avoid duplicate optimization.
## 2026-02-05 Update 2
- Added explicit request flags for evaluator/simulator toggles in worker schemas:
  - ONNX: `enable_evaluate`, `enable_sim_fp`
  - BIE: `enable_sim_fixed`
  - NEF: `enable_sim_hw`
### Issues / Risks
- Simulator flags are defined but not yet wired to execution paths.
## 2026-02-05 Update 3
- Documented worker API flags in `README.md` and `docs/Design.md`.
## 2026-02-05 Update 4
- Set `enable_evaluate` default to `false` to match original Web GUI flow.
- Documented original Web GUI ON/OFF expectations in `README.md` and `docs/Design.md`.
## 2026-02-05 Update 5
- Added ON/OFF comparison table for original Web GUI vs current workers in `README.md` and `docs/Design.md`.
## 2026-02-05 Update 6
- Default `enable_evaluate` in `process_onnx_core` set to `False` to match Web GUI defaults.
- Full worker test set passed (onnx/bie/nef/e2e/e2e-tflite).

@ -0,0 +1 @@
"""Backend interfaces and implementations."""

@ -0,0 +1,26 @@
from __future__ import annotations

from typing import Protocol


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Compile BIE into NEF and return the generated NEF path."""


class KneronCompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        # Lazy import: ktc is only loaded when this backend is actually used.
        import ktc

        km = ktc.ModelConfig(
            kwargs["model_id"],
            kwargs["version"],
            kwargs["platform"],
            bie_path=bie_path,
        )
        return ktc.compile([km], output_dir=output_dir or None)


def get_compiler_backend(name: str | None = None) -> CompilerBackend:
    # `name` is reserved for future backend selection; only Kneron exists today.
    _ = name
    return KneronCompilerBackend()

@ -0,0 +1,26 @@
from __future__ import annotations

from typing import Protocol


class EvaluatorBackend(Protocol):
    def evaluate(self, onnx_path: str, **kwargs) -> str:
        """Run IP evaluation and return a report string."""


class KneronEvaluatorBackend:
    def evaluate(self, onnx_path: str, **kwargs) -> str:
        # Lazy import: ktc is only loaded when this backend is actually used.
        import ktc

        km = ktc.ModelConfig(
            kwargs["model_id"],
            kwargs["version"],
            kwargs["platform"],
            onnx_path=onnx_path,
        )
        return km.evaluate()


def get_evaluator_backend(name: str | None = None) -> EvaluatorBackend:
    # `name` is reserved for future backend selection; only Kneron exists today.
    _ = name
    return KneronEvaluatorBackend()
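The ONNX worker calls the evaluator only when `enable_evaluate` is true and keeps the part of the report before the first comma. Otherwise it leaves the result as an empty string. The sketch below reproduces that gating with a fake evaluator so it runs without `ktc`. `FakeEvaluator`, `maybe_evaluate` and the report text are hypothetical.

```python
from typing import Any, Dict


class FakeEvaluator:
    """Stand-in for KneronEvaluatorBackend; returns a made-up report string."""

    def evaluate(self, onnx_path: str, **kwargs) -> str:
        return f"report for {onnx_path},details omitted"


def maybe_evaluate(parameters: Dict[str, Any], onnx_path: str, evaluator) -> str:
    eval_result = ""  # stays empty when evaluation is disabled
    if parameters.get("enable_evaluate", False):  # default OFF, as in the Web GUI
        report = evaluator.evaluate(
            onnx_path,
            model_id=int(parameters["model_id"]),
            version=str(parameters["version"]),
            platform=str(parameters["platform"]),
        )
        eval_result = report.split(",")[0]  # keep only the first field
    return eval_result


params = {"model_id": 10, "version": "0001", "platform": "520"}
print(repr(maybe_evaluate(params, "out.onnx", FakeEvaluator())))  # ''
print(maybe_evaluate({**params, "enable_evaluate": True}, "out.onnx", FakeEvaluator()))
# report for out.onnx
```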

@ -0,0 +1,46 @@
from __future__ import annotations

from typing import Dict, Protocol


class QuantizationBackend(Protocol):
    def analyze(
        self,
        onnx_path: str,
        input_mapping: Dict,
        output_dir: str,
        **kwargs,
    ) -> str:
        """Run quantization and return the generated BIE path."""


class KneronQuantizationBackend:
    def analyze(
        self,
        onnx_path: str,
        input_mapping: Dict,
        output_dir: str,
        **kwargs,
    ) -> str:
        # Lazy import: ktc is only loaded when this backend is actually used.
        import ktc

        # Reuse a model supplied by the caller; otherwise load and optimize it here.
        model = kwargs.get("onnx_model")
        if model is None:
            import onnx

            model = onnx.load(onnx_path)
            model = ktc.onnx_optimizer.onnx2onnx_flow(model, eliminate_tail=True, opt_matmul=True)
        km = ktc.ModelConfig(
            kwargs["model_id"],
            kwargs["version"],
            kwargs["platform"],
            onnx_model=model,
        )
        return km.analysis(input_mapping, output_dir=output_dir)


def get_quantization_backend(name: str | None = None) -> QuantizationBackend:
    # Placeholder for future backend selection logic.
    _ = name
    return KneronQuantizationBackend()
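The optional `onnx_model` keyword lets the caller hand over a model it already holds, so the backend skips the load-and-optimize pass. `process_bie_core` passes `onnx_model=model` in this commit. The sketch shows the same pattern with injectable `load` and `optimize` callables standing in for `onnx.load` and `ktc.onnx_optimizer.onnx2onnx_flow`. `resolve_model` and the fakes are hypothetical, and the counter only makes the skipped pass visible.

```python
from typing import Any, Callable, Optional


def resolve_model(
    onnx_path: str,
    load: Callable[[str], Any],
    optimize: Callable[[Any], Any],
    onnx_model: Optional[Any] = None,
) -> Any:
    """Return `onnx_model` unchanged if given; otherwise load and optimize from disk."""
    if onnx_model is not None:
        return onnx_model
    return optimize(load(onnx_path))


calls = {"optimize": 0}


def fake_optimize(model: Any) -> Any:
    calls["optimize"] += 1
    return f"optimized({model})"


def fake_load(path: str) -> Any:
    return f"loaded({path})"


reused = resolve_model("out.onnx", fake_load, fake_optimize, onnx_model="caller-model")
fresh = resolve_model("out.onnx", fake_load, fake_optimize)
print(reused, fresh, calls["optimize"])  # caller-model optimized(loaded(out.onnx)) 1
```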

@ -0,0 +1,28 @@
from __future__ import annotations

from typing import Protocol, Sequence


class SimulatorBackend(Protocol):
    def simulate(self, input_data: Sequence, **kwargs):
        """Run E2E simulation and return results."""


class KneronSimulatorBackend:
    def simulate(self, input_data: Sequence, **kwargs):
        # Lazy import: ktc is only loaded when this backend is actually used.
        import ktc

        return ktc.kneron_inference(
            input_data,
            onnx_file=kwargs.get("onnx_file"),
            bie_file=kwargs.get("bie_file"),
            nef_file=kwargs.get("nef_file"),
            input_names=kwargs.get("input_names"),
            platform=kwargs.get("platform"),
            model_id=kwargs.get("model_id"),
        )


def get_simulator_backend(name: str | None = None) -> SimulatorBackend:
    # `name` is reserved for future backend selection; only Kneron exists today.
    _ = name
    return KneronSimulatorBackend()
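The simulator flags exist in the request schemas but are not yet wired. The sketch below shows one possible wiring, under an assumption: each flag selects which artifact goes to the simulator, using the `onnx_file`, `bie_file` or `nef_file` keyword that `KneronSimulatorBackend.simulate` already forwards. `SIM_FLAG_TO_ARTIFACT`, `maybe_simulate` and the `input_names` parameter key are hypothetical. A recording fake replaces the real backend.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical mapping from request flag to the simulate() keyword it would fill.
SIM_FLAG_TO_ARTIFACT = {
    "enable_sim_fp": "onnx_file",     # floating-point model
    "enable_sim_fixed": "bie_file",   # fixed-point (quantized) model
    "enable_sim_hw": "nef_file",      # compiled hardware model
}


class RecordingSimulator:
    """Fake SimulatorBackend that records its keyword arguments instead of running ktc."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def simulate(self, input_data: Sequence, **kwargs):
        self.calls.append(kwargs)
        return []


def maybe_simulate(parameters: Dict[str, Any], flag: str, artifact_path: str,
                   input_data: Sequence, simulator) -> Optional[Any]:
    if not parameters.get(flag, False):  # every simulator flag defaults to False
        return None
    return simulator.simulate(
        input_data,
        **{SIM_FLAG_TO_ARTIFACT[flag]: artifact_path},
        input_names=parameters.get("input_names"),  # hypothetical parameter key
        platform=parameters["platform"],
        model_id=parameters["model_id"],
    )


sim = RecordingSimulator()
params = {"model_id": 1, "platform": 520, "enable_sim_fixed": True}
print(maybe_simulate(params, "enable_sim_hw", "/w/out.nef", [], sim))  # None
print(maybe_simulate(params, "enable_sim_fixed", "/w/out.bie", [], sim))  # []
```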

@ -46,13 +46,6 @@ def process_bie_core(
input_node_height = input_node.type.tensor_type.shape.dim[2].dim_value
input_node_width = input_node.type.tensor_type.shape.dim[3].dim_value
km = ktc.ModelConfig(
parameters["model_id"],
parameters["version"],
parameters["platform"],
onnx_model=model,
)
img_list = []
for dir_path, _, file_names in os.walk(data_dir):
for file_name in file_names:
@ -66,7 +59,18 @@ def process_bie_core(
)
img_list.append(img_data)
bie_model_path = km.analysis({input_node_name: img_list}, output_dir=output_dir or ".")
from services.backends.quantization import get_quantization_backend
backend = get_quantization_backend()
bie_model_path = backend.analyze(
onnx_file_path,
{input_node_name: img_list},
output_dir or ".",
onnx_model=model,
model_id=parameters["model_id"],
version=parameters["version"],
platform=parameters["platform"],
)
if os.path.abspath(bie_model_path) != os.path.abspath(output_path):
# Move to avoid keeping duplicate large binaries on disk.

@ -64,6 +64,10 @@ class BIEProcessRequest(BaseModel):
    version: str = Field(..., regex=r'^[0-9a-fA-F]{4}$')
    platform: str = Field(..., regex=r'^(520|720|530|630|730)$')
    data_dir: str = Field(..., min_length=1)
    enable_sim_fixed: bool = Field(
        False,
        description="Run fixed-point E2E simulation after quantization (not yet wired).",
    )


class TaskStatusResponse(BaseModel):
    task_id: str

@ -23,16 +23,16 @@ def process_nef_core(
os.environ.setdefault("KTC_WORKDIR", work_dir)
os.environ.setdefault("KTC_SCRIPT_RES", res_dir)
import ktc
from services.backends.compiler import get_compiler_backend
km = ktc.ModelConfig(
parameters["model_id"],
parameters["version"],
parameters["platform"],
bie_path=bie_file_path,
backend = get_compiler_backend()
nef_model_path = backend.compile(
bie_file_path,
output_dir or None,
model_id=parameters["model_id"],
version=parameters["version"],
platform=parameters["platform"],
)
nef_model_path = ktc.compile([km], output_dir=output_dir or None)
if os.path.abspath(nef_model_path) != os.path.abspath(output_path):
# Move to avoid keeping duplicate large binaries on disk.
shutil.move(str(nef_model_path), output_path)

@ -63,6 +63,10 @@ class NEFProcessRequest(BaseModel):
    model_id: int = Field(..., ge=1, le=65535)
    version: str = Field(..., regex=r'^[0-9a-fA-F]{4}$')
    platform: str = Field(..., regex=r'^(520|720|530|630|730)$')
    enable_sim_hw: bool = Field(
        False,
        description="Run hardware E2E simulation after compilation (not yet wired).",
    )


class TaskStatusResponse(BaseModel):
    task_id: str

@ -36,13 +36,17 @@ def process_onnx_core(
model = ktc.onnx_optimizer.onnx2onnx_flow(model, eliminate_tail=True, opt_matmul=True)
onnx.save(model, output_path)
km = ktc.ModelConfig(
int(parameters["model_id"]),
str(parameters["version"]),
str(parameters["platform"]),
onnx_model=model,
eval_result = ""
if parameters.get("enable_evaluate", False):
from services.backends.evaluator import get_evaluator_backend
evaluator = get_evaluator_backend()
evaluate_result = evaluator.evaluate(
output_path,
model_id=int(parameters["model_id"]),
version=str(parameters["version"]),
platform=str(parameters["platform"]),
)
evaluate_result = km.evaluate()
eval_result = evaluate_result.split(",")[0]
return {

@ -72,6 +72,14 @@ class ONNXProcessRequest(BaseModel):
    model_id: int = Field(..., ge=1, le=65535)
    version: str = Field(..., regex=r'^[0-9a-fA-F]{4}$')
    platform: str = Field(..., regex=r'^(520|720|530|630|730)$')
    enable_evaluate: bool = Field(
        False,
        description="Run IP evaluator (toolchain) after ONNX optimization.",
    )
    enable_sim_fp: bool = Field(
        False,
        description="Run floating-point E2E simulation (not yet wired).",
    )


class TaskStatusResponse(BaseModel):
    task_id: str
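The request schemas constrain the shared fields with a regex for `version` and `platform` and with bounds for `model_id`. The stdlib sketch below checks a hand-built payload against the same constraints, copied from the `Field` definitions in this commit. `validate_common_fields` is hypothetical and not part of the service. The check matters in practice: the worker tests call `process_onnx_core` directly with versions such as `"e2e"`, which would fail these request-level checks.

```python
import re
from typing import Any, Dict, List

# Same patterns as the Field(regex=...) definitions; fullmatch rejects trailing newlines.
VERSION_RE = re.compile(r'^[0-9a-fA-F]{4}$')
PLATFORM_RE = re.compile(r'^(520|720|530|630|730)$')
FLAGS = ("enable_evaluate", "enable_sim_fp", "enable_sim_fixed", "enable_sim_hw")


def validate_common_fields(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passes."""
    errors: List[str] = []
    model_id = payload.get("model_id")
    if isinstance(model_id, bool) or not isinstance(model_id, int) or not 1 <= model_id <= 65535:
        errors.append("model_id must be an int in [1, 65535]")
    if not VERSION_RE.fullmatch(str(payload.get("version", ""))):
        errors.append("version must be 4 hex digits")
    if not PLATFORM_RE.fullmatch(str(payload.get("platform", ""))):
        errors.append("platform must be one of 520/720/530/630/730")
    for flag in FLAGS:
        if flag in payload and not isinstance(payload[flag], bool):
            errors.append(f"{flag} must be a boolean")
    return errors


print(validate_common_fields({"model_id": 10, "version": "0001", "platform": "520"}))  # []
```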

@ -43,7 +43,13 @@ def test_worker_flow_e2e_uses_single_workdir():
work_input_file = work_inputs[0]
onnx_output = work_dir / "out.onnx"
onnx_params = {"model_id": 10, "version": "e2e", "platform": "520", "work_dir": str(work_dir)}
onnx_params = {
"model_id": 10,
"version": "e2e",
"platform": "520",
"work_dir": str(work_dir),
"enable_evaluate": False,
}
onnx_result = process_onnx_core(
{"file_path": str(work_input_file)},
str(onnx_output),

@ -43,7 +43,13 @@ def test_worker_flow_e2e_tflite_uses_single_workdir():
work_input_file = work_inputs[0]
onnx_output = work_dir / "out.onnx"
onnx_params = {"model_id": 20, "version": "e2e-tflite", "platform": "520", "work_dir": str(work_dir)}
onnx_params = {
"model_id": 20,
"version": "e2e-tflite",
"platform": "520",
"work_dir": str(work_dir),
"enable_evaluate": False,
}
onnx_result = process_onnx_core(
{"file_path": str(work_input_file)},
str(onnx_output),