Refactor workers to use backend interfaces for quantization, compilation, and evaluation; add optional flags for simulation in request schemas and update documentation accordingly.
parent bc98456d74
commit fdebf4db5d
22
README.md
@@ -27,6 +27,28 @@ ONNX → BIE → NEF. The system uses the Scheduler as the control plane, together with a Worker Pool and
7) NEF Worker finishes execution
8) Scheduler marks the task COMPLETED

## Worker API Flags (optional)
These flags control the evaluator / simulator steps. Each has a default, so omitting them leaves the existing flow unchanged.

- ONNX `/api/onnx/process`
  - `enable_evaluate` (default: `false`): whether to run the IP evaluator (OFF in the original Web GUI flow)
  - `enable_sim_fp` (default: `false`): whether to run floating-point E2E simulation (not yet wired)
- BIE `/api/bie/process`
  - `enable_sim_fixed` (default: `false`): whether to run fixed-point E2E simulation (not yet wired)
- NEF `/api/nef/process`
  - `enable_sim_hw` (default: `false`): whether to run hardware E2E simulation (not yet wired)
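As a rough illustration of how a client might use the flags listed above, the sketch below merges the documented defaults with caller overrides and serializes a request body. The helper `build_payload` and the `FLAG_DEFAULTS` table are hypothetical names introduced here; only the endpoint paths, flag names, and defaults come from the README text. It does not send anything over the network.

```python
import json

# Defaults as documented above; every flag is optional.
FLAG_DEFAULTS = {
    "/api/onnx/process": {"enable_evaluate": False, "enable_sim_fp": False},
    "/api/bie/process": {"enable_sim_fixed": False},
    "/api/nef/process": {"enable_sim_hw": False},
}


def build_payload(endpoint: str, base: dict, **flags: bool) -> str:
    """Hypothetical helper: merge documented defaults with overrides, return JSON."""
    allowed = FLAG_DEFAULTS[endpoint]
    unknown = set(flags) - set(allowed)
    if unknown:
        # Reject flags that belong to a different worker endpoint.
        raise ValueError(f"unsupported flags for {endpoint}: {sorted(unknown)}")
    return json.dumps({**base, **allowed, **flags}, sort_keys=True)


payload = build_payload(
    "/api/onnx/process",
    {"model_id": 10, "version": "0001", "platform": "520"},
    enable_evaluate=True,
)
print(payload)
```

Omitting `enable_evaluate` in the call would leave it at `false`, which matches the "existing flow is unchanged" statement above.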
## Default step toggles (original Web GUI vs. current Workers)
| Step | Original Web GUI default | Current Workers default | Flag |
|---|---|---|---|
| ONNX conversion/optimization | ON | ON | none |
| IP Evaluator | OFF | OFF | `enable_evaluate` |
| FP E2E simulation | OFF | OFF | `enable_sim_fp` |
| BIE quantization | ON | ON | none |
| Fixed-Point E2E simulation | OFF | OFF | `enable_sim_fixed` |
| NEF Compile | ON | ON | none |
| HW E2E simulation | OFF | OFF | `enable_sim_hw` |
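The table can also be read as data: steps without a flag always run, and flagged steps run only when the request enables them. A minimal sketch under that reading follows; `PIPELINE_STEPS` and `enabled_steps` are illustrative names, not part of the repository.

```python
from typing import Dict, List, Optional, Tuple

# (step name, controlling flag or None when the step always runs)
PIPELINE_STEPS: List[Tuple[str, Optional[str]]] = [
    ("ONNX conversion/optimization", None),
    ("IP Evaluator", "enable_evaluate"),
    ("FP E2E simulation", "enable_sim_fp"),
    ("BIE quantization", None),
    ("Fixed-Point E2E simulation", "enable_sim_fixed"),
    ("NEF Compile", None),
    ("HW E2E simulation", "enable_sim_hw"),
]


def enabled_steps(flags: Optional[Dict[str, bool]] = None) -> List[str]:
    """Return the steps that would run for a given set of request flags."""
    flags = flags or {}
    return [
        name
        for name, flag in PIPELINE_STEPS
        if flag is None or flags.get(flag, False)
    ]


print(enabled_steps())
print(enabled_steps({"enable_evaluate": True}))
```

With no flags set, only the three always-on steps are returned, mirroring the "Current Workers default" column.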
## Non-goals
- No task persistence
- No resume after a crash
@@ -165,16 +165,34 @@ error:
- Input: the only file in the working directory (no assumption about its name or extension)
- Output: `out.onnx`
- Output location: the same working directory
- Optional flags:
  - `enable_evaluate` (default: `false`): whether to run the IP evaluator (OFF in the original Web GUI flow)
  - `enable_sim_fp` (default: `false`): whether to run floating-point E2E simulation (not yet wired)

### 4.1.3 BIE Worker
- Input: `out.onnx` + `ref_images/*`
- Output: `out.bie`
- Output location: the same working directory
- Optional flags:
  - `enable_sim_fixed` (default: `false`): whether to run fixed-point E2E simulation (not yet wired)

### 4.1.4 NEF Worker
- Input: `out.bie`
- Output: `out.nef`
- Output location: the same working directory
- Optional flags:
  - `enable_sim_hw` (default: `false`): whether to run hardware E2E simulation (not yet wired)

### 4.1.6 Default step toggles (original Web GUI vs. current Workers)
| Step | Original Web GUI default | Current Workers default | Flag |
|---|---|---|---|
| ONNX conversion/optimization | ON | ON | none |
| IP Evaluator | OFF | OFF | `enable_evaluate` |
| FP E2E simulation | OFF | OFF | `enable_sim_fp` |
| BIE quantization | ON | ON | none |
| Fixed-Point E2E simulation | OFF | OFF | `enable_sim_fixed` |
| NEF Compile | ON | ON | none |
| HW E2E simulation | OFF | OFF | `enable_sim_hw` |

### 4.1.5 Core / Toolchain path consistency
- The Worker must pass the working directory path to core
@@ -245,7 +245,41 @@ If a file currently mixes multiple module responsibilities:
- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
- After each phase, re-check boundaries and adjust.

## 6) Minimum Viable API Proposal
## 7) Current Structure and Replacement Strategy (As-Is)
Based on the refactor just completed, the effective call chain is:

```
workers (ONNX/BIE/NEF)
  -> backends (interfaces + Kneron implementations)
    -> ktc (toolchain python API)
      -> vendor sys_flow / libs / libs_V2 / prebuilt binaries
```

### 7.1 What this means today
- Workers only depend on **backend interfaces**. They no longer call `ktc.ModelConfig` directly.
- Kneron specifics are concentrated in backend implementations.
- `ktc` still wraps the Kneron toolchain and binaries; that dependency remains, but it is **now isolated**.

### 7.2 How to replace later
1) **Replace backend implementations** (lowest-risk)
   - Keep backend interfaces stable.
   - Swap `Kneron*Backend` for `Your*Backend` without touching workers.

2) **Keep backend layer, but replace `ktc` calls**
   - Modify `Kneron*Backend` to call your own library instead of `ktc`.
   - Workers stay unchanged; only backend code moves.

3) **Introduce multiple backends**
   - Add `get_*_backend(name=...)` selection based on config/env.
   - Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.

### 7.3 Where to implement replacements
- `services/backends/quantization.py`
- `services/backends/compiler.py`
- `services/backends/evaluator.py`
- `services/backends/simulator.py`
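Item 3 of section 7.2 (`get_*_backend(name=...)` selection based on config/env) could look roughly like the sketch below. This is one possible shape, not the committed implementation: the registry, the `COMPILER_BACKEND` environment variable, and `EchoCompilerBackend` are all hypothetical, and a stand-in backend is registered here only so the example runs without `ktc`.

```python
from __future__ import annotations

import os
from typing import Callable, Dict, Protocol


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Compile BIE into NEF and return the generated NEF path."""


class EchoCompilerBackend:
    """Hypothetical stand-in: builds an output path without compiling anything."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        return os.path.join(output_dir or ".", "out.nef")


# Name -> factory. A real setup would presumably register the Kneron backend here too.
_REGISTRY: Dict[str, Callable[[], CompilerBackend]] = {"echo": EchoCompilerBackend}


def get_compiler_backend(name: str | None = None) -> CompilerBackend:
    """Pick a backend by explicit name, then by env var, then by a default."""
    chosen = name or os.environ.get("COMPILER_BACKEND", "echo")  # env var name is an assumption
    try:
        return _REGISTRY[chosen]()
    except KeyError:
        raise ValueError(f"unknown compiler backend: {chosen!r}") from None


backend = get_compiler_backend("echo")
print(backend.compile("out.bie", "/tmp/work"))
```

Because `CompilerBackend` is a `Protocol`, any class with a matching `compile` method qualifies without inheriting from it, which is what allows backends to be swapped without touching workers.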
## 8) Minimum Viable API Proposal
Keep it minimal to avoid churn:

```python
@@ -260,6 +294,24 @@ class CompilerBackend:

Then the pipeline is just a pure composition of these two + ONNX ops.

## 9) What This Enables
- Replace ONNX converters / optimizers without touching quantization.
- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
- Swap in future Kneron versions only inside backend adapters.
- Experiment with alternative quantization or compiler backends.
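To make the "pure OSS environments (CI, dev)" point in the list above concrete, here is a minimal sketch of a test double that satisfies the quantization interface without importing `ktc`. `FakeQuantizationBackend` and `run_quantization` are hypothetical names; the `analyze` signature is copied from `services/backends/quantization.py` in this commit.

```python
import os
import tempfile
from typing import Dict, Protocol


class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: Dict, output_dir: str, **kwargs) -> str:
        """Run quantization and return the generated BIE path."""


class FakeQuantizationBackend:
    """Hypothetical test double: records calls and writes a placeholder file."""

    def __init__(self) -> None:
        self.calls = []

    def analyze(self, onnx_path: str, input_mapping: Dict, output_dir: str, **kwargs) -> str:
        self.calls.append((onnx_path, sorted(input_mapping), kwargs))
        out = os.path.join(output_dir, "out.bie")
        with open(out, "wb") as fh:
            fh.write(b"placeholder")  # not a real BIE file
        return out


def run_quantization(backend: QuantizationBackend, onnx_path: str, output_dir: str) -> str:
    """Worker-side code sees only the interface, never the toolchain."""
    return backend.analyze(
        onnx_path, {"input": []}, output_dir,
        model_id=1, version="0001", platform="520",
    )


with tempfile.TemporaryDirectory() as tmp:
    fake = FakeQuantizationBackend()
    bie_path = run_quantization(fake, "out.onnx", tmp)
    produced = os.path.exists(bie_path)
print(produced, fake.calls[0][0])
```

A test built this way exercises the worker's control flow and file handling while leaving the Kneron binaries out of the environment entirely.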
---

## 10) Next Steps (if you want)
I can draft the following next:
1) A small refactor plan with concrete file edits and minimal API changes.
2) A diagram (Mermaid) of the new modular flow.
3) A compatibility matrix (current vs target dependencies per module).

Tell me which one you want, and I’ll prepare it.

Then the pipeline is just a pure composition of these two + ONNX ops.

## 7) What This Enables
- Replace ONNX converters / optimizers without touching quantization.
- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
45
docs/refactor_progress.md
Normal file
@@ -0,0 +1,45 @@
# Refactor Progress Log

## 2026-02-05
- Started modularization refactor per `docs/flow_modularization_notes.md`.
- Goal: introduce backend interfaces, decouple ONNX evaluation, keep behavior stable.

### Planned Steps
1) Create backend interfaces (quantization/compiler, optional evaluator/simulator).
2) Update ONNX/BIE/NEF workers to use backends and make eval optional.
3) Review boundaries and document issues.

### Issues / Risks
- None yet.

## 2026-02-05 Update
- Added backend interfaces under `services/backends`.
- ONNX worker now makes IP evaluation optional via `parameters.enable_evaluate`.
- BIE/NEF workers now call backend interfaces instead of direct `ModelConfig` usage.

### Issues / Risks
- `services/workers/onnx/core.py` now sets `eval_report` to empty string when disabled; check callers if they rely on non-empty.
- Quantization backend supports optional `onnx_model` to avoid duplicate optimization.

## 2026-02-05 Update 2
- Added explicit request flags for evaluator/simulator toggles in worker schemas:
  - ONNX: `enable_evaluate`, `enable_sim_fp`
  - BIE: `enable_sim_fixed`
  - NEF: `enable_sim_hw`

### Issues / Risks
- Simulator flags are defined but not yet wired to execution paths.

## 2026-02-05 Update 3
- Documented worker API flags in `README.md` and `docs/Design.md`.

## 2026-02-05 Update 4
- Set `enable_evaluate` default to `false` to match original Web GUI flow.
- Documented original Web GUI ON/OFF expectations in `README.md` and `docs/Design.md`.

## 2026-02-05 Update 5
- Added ON/OFF comparison table for original Web GUI vs current workers in `README.md` and `docs/Design.md`.

## 2026-02-05 Update 6
- Default `enable_evaluate` in `process_onnx_core` set to `False` to match Web GUI defaults.
- Full worker test set passed (onnx/bie/nef/e2e/e2e-tflite).
1
services/backends/__init__.py
Normal file
@@ -0,0 +1 @@
"""Backend interfaces and implementations."""
26
services/backends/compiler.py
Normal file
@@ -0,0 +1,26 @@
from __future__ import annotations

from typing import Protocol


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Compile BIE into NEF and return the generated NEF path."""


class KneronCompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        import ktc

        km = ktc.ModelConfig(
            kwargs["model_id"],
            kwargs["version"],
            kwargs["platform"],
            bie_path=bie_path,
        )
        return ktc.compile([km], output_dir=output_dir or None)


def get_compiler_backend(name: str | None = None) -> CompilerBackend:
    _ = name
    return KneronCompilerBackend()
26
services/backends/evaluator.py
Normal file
@@ -0,0 +1,26 @@
from __future__ import annotations

from typing import Protocol


class EvaluatorBackend(Protocol):
    def evaluate(self, onnx_path: str, **kwargs) -> str:
        """Run IP evaluation and return a report string."""


class KneronEvaluatorBackend:
    def evaluate(self, onnx_path: str, **kwargs) -> str:
        import ktc

        km = ktc.ModelConfig(
            kwargs["model_id"],
            kwargs["version"],
            kwargs["platform"],
            onnx_path=onnx_path,
        )
        return km.evaluate()


def get_evaluator_backend(name: str | None = None) -> EvaluatorBackend:
    _ = name
    return KneronEvaluatorBackend()
46
services/backends/quantization.py
Normal file
@@ -0,0 +1,46 @@
from __future__ import annotations

from typing import Dict, Protocol


class QuantizationBackend(Protocol):
    def analyze(
        self,
        onnx_path: str,
        input_mapping: Dict,
        output_dir: str,
        **kwargs,
    ) -> str:
        """Run quantization and return the generated BIE path."""


class KneronQuantizationBackend:
    def analyze(
        self,
        onnx_path: str,
        input_mapping: Dict,
        output_dir: str,
        **kwargs,
    ) -> str:
        import ktc

        model = kwargs.get("onnx_model")
        if model is None:
            import onnx

            model = onnx.load(onnx_path)
            model = ktc.onnx_optimizer.onnx2onnx_flow(model, eliminate_tail=True, opt_matmul=True)

        km = ktc.ModelConfig(
            kwargs["model_id"],
            kwargs["version"],
            kwargs["platform"],
            onnx_model=model,
        )
        return km.analysis(input_mapping, output_dir=output_dir)


def get_quantization_backend(name: str | None = None) -> QuantizationBackend:
    # Placeholder for future backend selection logic.
    _ = name
    return KneronQuantizationBackend()
28
services/backends/simulator.py
Normal file
@@ -0,0 +1,28 @@
from __future__ import annotations

from typing import Protocol, Sequence


class SimulatorBackend(Protocol):
    def simulate(self, input_data: Sequence, **kwargs):
        """Run E2E simulation and return results."""


class KneronSimulatorBackend:
    def simulate(self, input_data: Sequence, **kwargs):
        import ktc

        return ktc.kneron_inference(
            input_data,
            onnx_file=kwargs.get("onnx_file"),
            bie_file=kwargs.get("bie_file"),
            nef_file=kwargs.get("nef_file"),
            input_names=kwargs.get("input_names"),
            platform=kwargs.get("platform"),
            model_id=kwargs.get("model_id"),
        )


def get_simulator_backend(name: str | None = None) -> SimulatorBackend:
    _ = name
    return KneronSimulatorBackend()
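The simulator flags in this commit are documented as not yet wired. One possible way to connect a flag such as `enable_sim_fixed` to a `simulate` call is sketched below, purely as an assumption about future work: `maybe_simulate` is a hypothetical helper, and `fake_simulate` stands in for `SimulatorBackend.simulate` so the example runs without `ktc`.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def maybe_simulate(
    parameters: Dict[str, Any],
    flag: str,
    simulate: Callable[..., Any],
    input_data: Sequence,
    **sim_kwargs: Any,
) -> Optional[Any]:
    """Call `simulate` only when the request enabled `flag`; otherwise skip."""
    if not parameters.get(flag, False):
        return None
    return simulate(input_data, **sim_kwargs)


def fake_simulate(input_data: Sequence, **kwargs: Any) -> dict:
    """Hypothetical stand-in for a simulator backend (no toolchain needed)."""
    return {"n_inputs": len(input_data), "bie_file": kwargs.get("bie_file")}


skipped = maybe_simulate({}, "enable_sim_fixed", fake_simulate, [1, 2, 3], bie_file="out.bie")
ran = maybe_simulate(
    {"enable_sim_fixed": True}, "enable_sim_fixed", fake_simulate, [1, 2, 3], bie_file="out.bie"
)
print(skipped, ran)
```

Keeping the gate in one helper would let the ONNX, BIE, and NEF workers share the same default-off behaviour for their respective flags.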
@@ -46,13 +46,6 @@ def process_bie_core(
     input_node_height = input_node.type.tensor_type.shape.dim[2].dim_value
     input_node_width = input_node.type.tensor_type.shape.dim[3].dim_value

-    km = ktc.ModelConfig(
-        parameters["model_id"],
-        parameters["version"],
-        parameters["platform"],
-        onnx_model=model,
-    )
-
     img_list = []
     for dir_path, _, file_names in os.walk(data_dir):
         for file_name in file_names:
@@ -66,7 +59,18 @@ def process_bie_core(
             )
             img_list.append(img_data)

-    bie_model_path = km.analysis({input_node_name: img_list}, output_dir=output_dir or ".")
+    from services.backends.quantization import get_quantization_backend
+
+    backend = get_quantization_backend()
+    bie_model_path = backend.analyze(
+        onnx_file_path,
+        {input_node_name: img_list},
+        output_dir or ".",
+        onnx_model=model,
+        model_id=parameters["model_id"],
+        version=parameters["version"],
+        platform=parameters["platform"],
+    )

     if os.path.abspath(bie_model_path) != os.path.abspath(output_path):
         # Move to avoid keeping duplicate large binaries on disk.
@@ -64,6 +64,10 @@ class BIEProcessRequest(BaseModel):
     version: str = Field(..., regex=r'^[0-9a-fA-F]{4}$')
     platform: str = Field(..., regex=r'^(520|720|530|630|730)$')
     data_dir: str = Field(..., min_length=1)
+    enable_sim_fixed: bool = Field(
+        False,
+        description="Run fixed-point E2E simulation after quantization (not yet wired).",
+    )

 class TaskStatusResponse(BaseModel):
     task_id: str
@@ -23,16 +23,16 @@ def process_nef_core(
     os.environ.setdefault("KTC_WORKDIR", work_dir)
     os.environ.setdefault("KTC_SCRIPT_RES", res_dir)

-    import ktc
+    from services.backends.compiler import get_compiler_backend

-    km = ktc.ModelConfig(
-        parameters["model_id"],
-        parameters["version"],
-        parameters["platform"],
-        bie_path=bie_file_path,
+    backend = get_compiler_backend()
+    nef_model_path = backend.compile(
+        bie_file_path,
+        output_dir or None,
+        model_id=parameters["model_id"],
+        version=parameters["version"],
+        platform=parameters["platform"],
     )
-
-    nef_model_path = ktc.compile([km], output_dir=output_dir or None)
     if os.path.abspath(nef_model_path) != os.path.abspath(output_path):
         # Move to avoid keeping duplicate large binaries on disk.
         shutil.move(str(nef_model_path), output_path)
@@ -63,6 +63,10 @@ class NEFProcessRequest(BaseModel):
     model_id: int = Field(..., ge=1, le=65535)
     version: str = Field(..., regex=r'^[0-9a-fA-F]{4}$')
     platform: str = Field(..., regex=r'^(520|720|530|630|730)$')
+    enable_sim_hw: bool = Field(
+        False,
+        description="Run hardware E2E simulation after compilation (not yet wired).",
+    )

 class TaskStatusResponse(BaseModel):
     task_id: str
@@ -36,14 +36,18 @@ def process_onnx_core(
     model = ktc.onnx_optimizer.onnx2onnx_flow(model, eliminate_tail=True, opt_matmul=True)
     onnx.save(model, output_path)

-    km = ktc.ModelConfig(
-        int(parameters["model_id"]),
-        str(parameters["version"]),
-        str(parameters["platform"]),
-        onnx_model=model,
-    )
-    evaluate_result = km.evaluate()
-    eval_result = evaluate_result.split(",")[0]
+    eval_result = ""
+    if parameters.get("enable_evaluate", False):
+        from services.backends.evaluator import get_evaluator_backend
+
+        evaluator = get_evaluator_backend()
+        evaluate_result = evaluator.evaluate(
+            output_path,
+            model_id=int(parameters["model_id"]),
+            version=str(parameters["version"]),
+            platform=str(parameters["platform"]),
+        )
+        eval_result = evaluate_result.split(",")[0]

     return {
         "file_path": output_path,
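With this hunk, the evaluation string is empty whenever `enable_evaluate` is off, which the progress log flags as a risk for callers that expect a non-empty report. A caller-side guard might look like the sketch below. The result key `eval_report` is taken from the progress log and is an assumption here, since the hunk does not show the full return dict; `first_field` and `summarize_eval` are illustrative names.

```python
def first_field(evaluate_result: str) -> str:
    """Mirror the worker's `evaluate_result.split(",")[0]` step."""
    return evaluate_result.split(",")[0]


def summarize_eval(result: dict) -> str:
    """Treat a missing or empty report as 'evaluation skipped' instead of failing."""
    # "eval_report" is the key named in the progress log; treat it as an assumption.
    report = result.get("eval_report", "")
    return f"evaluation: {report}" if report else "evaluation skipped"


print(first_field("a,b"))
print(summarize_eval({"file_path": "out.onnx", "eval_report": ""}))
```

Note that `"".split(",")[0]` is also the empty string, so the disabled path and an empty evaluator report are indistinguishable to a caller unless the worker reports the flag separately.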
@@ -72,6 +72,14 @@ class ONNXProcessRequest(BaseModel):
     model_id: int = Field(..., ge=1, le=65535)
     version: str = Field(..., regex=r'^[0-9a-fA-F]{4}$')
     platform: str = Field(..., regex=r'^(520|720|530|630|730)$')
+    enable_evaluate: bool = Field(
+        False,
+        description="Run IP evaluator (toolchain) after ONNX optimization.",
+    )
+    enable_sim_fp: bool = Field(
+        False,
+        description="Run floating-point E2E simulation (not yet wired).",
+    )

 class TaskStatusResponse(BaseModel):
     task_id: str
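The field constraints in the request schemas can be checked in isolation with the standard library. The sketch below reuses the two regular expressions and the `model_id` bounds from the schema above; `request_errors` is a hypothetical helper and is not how pydantic reports validation failures. It also shows that a version string such as `"e2e"`, which the worker tests pass directly to `process_onnx_core`, would not match the schema pattern if sent through the API.

```python
import re
from typing import List

# Patterns copied from the Field(..., regex=...) declarations above.
VERSION_RE = re.compile(r"^[0-9a-fA-F]{4}$")
PLATFORM_RE = re.compile(r"^(520|720|530|630|730)$")


def request_errors(model_id: int, version: str, platform: str) -> List[str]:
    """Return the names of fields that violate the documented constraints."""
    errors = []
    if not 1 <= model_id <= 65535:
        errors.append("model_id")
    if not VERSION_RE.match(version):
        errors.append("version")
    if not PLATFORM_RE.match(platform):
        errors.append("platform")
    return errors


print(request_errors(10, "0001", "520"))
print(request_errors(10, "e2e", "520"))
```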
@@ -43,7 +43,13 @@ def test_worker_flow_e2e_uses_single_workdir():
     work_input_file = work_inputs[0]

     onnx_output = work_dir / "out.onnx"
-    onnx_params = {"model_id": 10, "version": "e2e", "platform": "520", "work_dir": str(work_dir)}
+    onnx_params = {
+        "model_id": 10,
+        "version": "e2e",
+        "platform": "520",
+        "work_dir": str(work_dir),
+        "enable_evaluate": False,
+    }
     onnx_result = process_onnx_core(
         {"file_path": str(work_input_file)},
         str(onnx_output),
@@ -43,7 +43,13 @@ def test_worker_flow_e2e_tflite_uses_single_workdir():
     work_input_file = work_inputs[0]

     onnx_output = work_dir / "out.onnx"
-    onnx_params = {"model_id": 20, "version": "e2e-tflite", "platform": "520", "work_dir": str(work_dir)}
+    onnx_params = {
+        "model_id": 20,
+        "version": "e2e-tflite",
+        "platform": "520",
+        "work_dir": str(work_dir),
+        "enable_evaluate": False,
+    }
     onnx_result = process_onnx_core(
         {"file_path": str(work_input_file)},
         str(onnx_output),