Toolchain Flow Modularization Notes (Draft)
Based on docs/manual_v0.17.2.pdf and the current repo structure, this note outlines a modular decomposition plan, its risks, and a staged engineering approach to reduce coupling to Kneron prebuilt binaries while preserving the end-to-end flow.
1) Project Plan Review (Your Proposal)
Your plan is sound and low-risk:
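- Split the steps into modules and complete the full flow first → keeps the system usable and regression-testable.
- Review each module one by one to decide whether it can be rewritten/rebuilt → focuses effort on the highest-risk dependency points.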
This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.
2) Manual-Driven Flow Stages (v0.17.2)
From the manual, the official workflow breaks down cleanly into these steps:
ONNX Workflow
- A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
- B. ONNX optimization (general optimizer)
- C. Opset upgrade (if needed)
- D. IP evaluation (performance / support check)
- E. E2E simulator check (floating point)
BIE Workflow
- F. Quantization (analysis → produce BIE)
- G. E2E simulator check (fixed point)
NEF Workflow
- H. Batch compile (BIE → NEF)
- I. E2E simulator check (hardware)
- J. NEF combine (optional)
This mapping is consistent with current repo services:
- `services/workers/onnx/core.py`: A/B (+ D, currently via `evaluate()`)
- `services/workers/bie/core.py`: F (+ optional G if added)
- `services/workers/nef/core.py`: H (+ optional I if added)
2.1 Flow Diagram (Mermaid)
```mermaid
flowchart TD
    A[Format Model<br/>Keras/TFLite/Caffe/PyTorch] --> B[Convert to ONNX]
    B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
    C --> D{IP Evaluation?}
    D -- optional --> E[IP Evaluator Report]
    C --> F{E2E FP Sim?}
    F -- optional --> G[Float E2E Simulator Check]
    C --> H[Quantization / Analysis]
    H --> I[BIE Output]
    I --> J{E2E Fixed-Point Sim?}
    J -- optional --> K[Fixed-Point E2E Check]
    I --> L[Compile]
    L --> M[NEF Output]
    M --> N{E2E Hardware Sim?}
    N -- optional --> O[Hardware E2E Check]
    M --> P{NEF Combine?}
    P -- optional --> Q[Combined NEF]
    subgraph OSS-Friendly
        A
        B
        C
    end
    subgraph Kneron-Dependent
        D
        E
        F
        G
        H
        I
        J
        K
        L
        M
        N
        O
        P
        Q
    end
```
3) Recommended Module Split (Initial)
A clean split with minimal coupling and clear replacement points:
3.1 Format & Graph Layer (OSS-Friendly)
- FormatConverters
  - Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
  - Pure Python; use `libs/ONNX_Convertor` + `libs/kneronnxopt`
- OnnxGraphOps
  - optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
  - Pure Python, independent of toolchain binaries
3.2 Validation & Simulation Layer (Kneron-Dependent)
- IPEvaluator
  - `ModelConfig.evaluate()` (toolchain evaluator)
  - Coupled to `sys_flow` + prebuilt binaries
  - Should be an optional plug-in, not a hard dependency of ONNX conversion
- E2ESimulator
  - float/fixed/hardware validation (`kneron_inference`)
  - Coupled to Kneron libs; keep as a plugin backend
3.3 Quantization & Compile Layer (Kneron-Dependent)
- QuantizationBackend (BIE)
  - Current: `ModelConfig.analysis()` → sys_flow binaries
  - Make a backend interface with a Kneron implementation
- CompilerBackend (NEF)
  - Current: `ktc.compile()` → prebuilt compiler
  - Same backend interface style; Kneron implementation for now
3.4 Packaging/Orchestration Layer
- Pipeline Orchestrator
  - Defines the sequence and exchange formats (ONNX, BIE, NEF)
  - Should not import Kneron libs directly; it reaches them only through backend interfaces
3.5 Module Dependency / Replaceability Matrix
| Module | Inputs | Outputs | Current Dependency | Replaceability | Notes |
|---|---|---|---|---|---|
| FormatConverters | model files | ONNX | `libs/ONNX_Convertor` (OSS) | High | Already OSS; keep isolated. |
| OnnxGraphOps | ONNX | ONNX | `libs/kneronnxopt`, `onnx` | High | Pure Python, safe to refactor. |
| IPEvaluator | ONNX/BIE | report | `sys_flow` + prebuilt bins | Low | Optional plugin; avoid a hard dependency in the ONNX flow. |
| E2ESimulator FP | ONNX | results | Kneron inference libs | Low | Optional; keep as a plugin backend. |
| QuantizationBackend | ONNX + inputs | BIE | `sys_flow` + prebuilt bins | Low | Core Kneron dependency. |
| E2ESimulator Fixed | BIE | results | Kneron inference libs | Low | Optional; can be skipped in web flow. |
| CompilerBackend | BIE | NEF | `compiler/*` prebuilt | Low | Core Kneron dependency. |
| E2ESimulator HW | NEF | results | Kneron inference libs | Low | Optional; likely external toolchain use. |
| NEFCombine | NEF list | NEF | Kneron utils | Medium | Small wrapper; keep separate. |
| Pipeline Orchestrator | modules | end-to-end | None (pure) | High | Ownable; should be OSS-only. |
4) Key Risks & Coupling Points (Observed in Repo)
- `ktc.toolchain` calls `sys_flow` / `sys_flow_v2` (hard dependency on prebuilt binaries).
- `ktc.ModelConfig.evaluate` / `analysis` / `compile` are all Kneron-specific.
- `services/workers/onnx/core.py` calls `evaluate()` by default → this ties the ONNX flow to Kneron.
5) Suggested Refactor Sequence (Low Disruption)
Phase 1: Interface Extraction
- Introduce two small interfaces:
  - `QuantizationBackend` (BIE)
  - `CompilerBackend` (NEF)
- Wrap existing Kneron calls as default implementations.
Phase 2: ONNX Flow Decoupling
- Make `IPEvaluator` optional in the ONNX flow.
- Keep current behavior by default but allow bypass.
Phase 3: Modular Pipeline Assembly
- Build a pipeline that composes:
  - conversion → optimization → (optional evaluator)
  - quantization backend
  - compiler backend
Phase 4: Replaceability Audit
- For each module, decide whether it:
  - can be OSS (conversion/optimization)
  - must remain a Kneron backend (quantization/compile)
  - can be partially replaced (simulation/eval)
5.1 Concrete Refactor Plan (Minimal Interface Changes)
Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.
Step 1: Introduce backend interfaces (no behavior change)
Create simple interfaces and wrappers.
- New files:
  - `services/backends/quantization.py`
  - `services/backends/compiler.py`
  - (optional) `services/backends/evaluator.py`
  - (optional) `services/backends/simulator.py`
Minimal interface (example):
```python
from typing import Protocol

class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...  # returns BIE path

class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...  # returns NEF path
```
Implement Kneron-backed versions wrapping existing calls:
- `KneronQuantizationBackend` → `ktc.ModelConfig(...).analysis(...)`
- `KneronCompilerBackend` → `ktc.compile(...)`
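A minimal sketch of the Kneron-backed quantization adapter, under the assumption that the existing `ktc.ModelConfig(...).analysis(...)` invocation is packaged as a callable and injected. `run_kneron_analysis` and `AnalysisFn` are hypothetical names, and the actual `ktc` arguments are deliberately left unspecified. Keeping the `ktc` import inside that callable's module, rather than in the worker, is what lets workers import cleanly on machines without Kneron.

```python
import os
from typing import Callable

# Hypothetical signature of the injected Kneron call:
# (onnx_path, input_mapping, output_dir, **kwargs) -> path of the BIE it produced.
AnalysisFn = Callable[..., str]


class KneronQuantizationBackend:
    """Satisfies the QuantizationBackend interface by delegating to Kneron tooling.

    Everything Kneron-specific (the ktc call, env setup, output moves) stays
    behind `run_kneron_analysis`, so workers never see it.
    """

    def __init__(self, run_kneron_analysis: AnalysisFn) -> None:
        self._run = run_kneron_analysis

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        # Fail early with a clear error instead of deep inside sys_flow.
        if not os.path.isfile(onnx_path):
            raise FileNotFoundError(onnx_path)
        os.makedirs(output_dir, exist_ok=True)
        # The real adapter would run ktc.ModelConfig(...).analysis(...) in here.
        bie_path = self._run(onnx_path, input_mapping, output_dir, **kwargs)
        # Enforce the interface contract: the returned path must exist.
        if not os.path.isfile(bie_path):
            raise RuntimeError(f"backend returned {bie_path!r}, but no file exists there")
        return bie_path
```

`KneronCompilerBackend` would follow the same shape around `ktc.compile(...)`.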
Step 2: Decouple ONNX flow from evaluator (optional switch)
Modify services/workers/onnx/core.py:
- Add parameter `enable_evaluate` (default `True` to preserve behavior).
- Guard `km.evaluate()` behind the flag.
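One way the guarded call could look, as a sketch: `run_onnx_stage` and `OnnxStageResult` are hypothetical names, and `optimize` / `evaluate` stand in for the existing onnx2onnx step and `km.evaluate()`. With `enable_evaluate=True` (the default) current behavior is kept; passing `False` lets CI or web flows skip the Kneron evaluator entirely.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class OnnxStageResult:
    onnx_path: str               # optimized model path
    eval_report: Optional[Any]   # None when evaluation was skipped


def run_onnx_stage(
    onnx_path: str,
    optimize: Callable[[str], str],
    evaluate: Optional[Callable[[str], Any]] = None,
    enable_evaluate: bool = True,  # True preserves today's always-evaluate behavior
) -> OnnxStageResult:
    optimized = optimize(onnx_path)
    report = None
    if enable_evaluate:
        if evaluate is None:
            raise ValueError("enable_evaluate=True but no evaluator was provided")
        report = evaluate(optimized)  # e.g. the existing km.evaluate() wrapped in a callable
    return OnnxStageResult(optimized, report)
```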
Step 3: Replace direct calls in BIE/NEF workers
Modify:
- `services/workers/bie/core.py` to use `QuantizationBackend`.
- `services/workers/nef/core.py` to use `CompilerBackend`.
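A sketch of the worker side after this step, assuming the interfaces are `typing.Protocol` classes: the workers type-hint only the interface, so any object with a matching `analyze` / `compile` method works. `run_bie_stage`, `run_nef_stage`, and the `Fake*Backend` test doubles are hypothetical; the fakes just write empty files so the BIE/NEF plumbing can be exercised without prebuilt binaries.

```python
import os
from typing import Protocol


class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...


def run_bie_stage(backend: QuantizationBackend, onnx_path: str, input_mapping: dict, output_dir: str) -> str:
    """BIE worker body: no ktc import, only the interface."""
    return backend.analyze(onnx_path, input_mapping, output_dir)


def run_nef_stage(backend: CompilerBackend, bie_path: str, output_dir: str) -> str:
    """NEF worker body: same pattern for the compiler."""
    return backend.compile(bie_path, output_dir)


def _touch(output_dir: str, src_path: str, ext: str) -> str:
    # Create an empty file named after the input model, with a new extension.
    stem = os.path.splitext(os.path.basename(src_path))[0]
    path = os.path.join(output_dir, stem + ext)
    open(path, "wb").close()
    return path


class FakeQuantizationBackend:
    """Test double: writes an empty .bie inside output_dir, no Kneron binaries needed."""

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        return _touch(output_dir, onnx_path, ".bie")


class FakeCompilerBackend:
    """Test double: writes an empty .nef inside output_dir."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        return _touch(output_dir, bie_path, ".nef")
```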
Step 4: Optional simulator integration
Add optional steps to workers:
- `enable_sim_fp` in the ONNX flow.
- `enable_sim_fixed` in the BIE flow.
- `enable_sim_hw` in the NEF flow.

These should call a simulator backend and default to off.
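A possible shape for the optional simulator hook, offered as a sketch: `SimMode`, `SimulatorBackend.simulate`, and `maybe_simulate` are hypothetical names, and the enum values only mirror the three flags above. The check is a no-op unless its flag is set, which keeps the default flow unchanged.

```python
from enum import Enum
from typing import Any, Optional, Protocol


class SimMode(Enum):
    FLOAT = "fp"        # ONNX flow (enable_sim_fp)
    FIXED = "fixed"     # BIE flow  (enable_sim_fixed)
    HARDWARE = "hw"     # NEF flow  (enable_sim_hw)


class SimulatorBackend(Protocol):
    def simulate(self, model_path: str, mode: SimMode, **kwargs) -> Any: ...


def maybe_simulate(
    simulator: Optional[SimulatorBackend],
    model_path: str,
    mode: SimMode,
    enabled: bool = False,  # default off, matching the plan above
) -> Optional[Any]:
    """Run the E2E check only when its flag is on; otherwise return None."""
    if not enabled:
        return None
    if simulator is None:
        raise ValueError(f"simulation '{mode.value}' requested but no simulator backend is configured")
    return simulator.simulate(model_path, mode)
```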
Step 5: Pipeline orchestrator (optional)
Add a thin orchestrator module to compose stages:
- `services/pipeline/toolchain_pipeline.py`
- Allows swapping backends from config/env.
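A thin orchestrator could be little more than the sketch below, which assumes the two backend interfaces from Step 1 and represents ONNX graph ops as plain path-to-path functions. `ToolchainPipeline` and the fields of `PipelineResult` are illustrative choices, not existing repo code; the point is that the module sequences stages without importing any Kneron package.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...


@dataclass
class PipelineResult:
    onnx_path: str
    bie_path: str
    nef_path: str
    steps: List[str] = field(default_factory=list)  # executed stage names, in order


@dataclass
class ToolchainPipeline:
    """Sequences stages; imports no Kneron code itself."""
    onnx_ops: List[Callable[[str], str]]  # ONNX -> ONNX passes (optimize, opset upgrade, ...)
    quantizer: QuantizationBackend
    compiler: CompilerBackend

    def run(self, onnx_path: str, input_mapping: dict, output_dir: str) -> PipelineResult:
        steps: List[str] = []
        for op in self.onnx_ops:
            onnx_path = op(onnx_path)
            steps.append(getattr(op, "__name__", "onnx_op"))
        bie = self.quantizer.analyze(onnx_path, input_mapping, output_dir)
        steps.append("quantize")
        nef = self.compiler.compile(bie, output_dir)
        steps.append("compile")
        return PipelineResult(onnx_path, bie, nef, steps)
```

Swapping backends from config/env then reduces to choosing which objects are passed as `quantizer` and `compiler`.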
File Touch List (Minimal)
- `services/workers/onnx/core.py` (optional eval toggle)
- `services/workers/bie/core.py` (use `QuantizationBackend`)
- `services/workers/nef/core.py` (use `CompilerBackend`)
- `services/backends/quantization.py` (new)
- `services/backends/compiler.py` (new)
- (optional) `services/backends/evaluator.py` (new)
- (optional) `services/backends/simulator.py` (new)
6) Coupling Rules / Extraction Guidelines
Goal: keep module boundaries stable so future swapping does not cascade changes.
6.1 Horizontal vs Vertical Coupling (Rule of Thumb)
- Horizontal coupling = avoid (core A directly importing core B).
- Vertical references = allowed (shared types/config/IO schemas used by all).
Keep shared references narrowly scoped to:
- Interface contracts (Protocol / abstract base classes)
- Common data structures (DTOs / results / error types)
- Configuration schemas and environment keys
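The horizontal rule can be checked mechanically. The sketch below uses Python's standard `ast` module to list absolute imports that reach into a sibling package; the function name `horizontal_imports` and the `services.workers` layout are assumptions based on the paths in this note, and relative imports are deliberately ignored to keep it short.

```python
import ast
from typing import List


def horizontal_imports(source: str, own_package: str, sibling_root: str) -> List[str]:
    """Return absolute imports that reach into a sibling package under `sibling_root`.

    Example: for services/workers/bie/core.py use own_package="services.workers.bie"
    and sibling_root="services.workers"; services.workers.nef... is flagged, while
    shared modules such as services.backends... are not.
    """
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Join module and imported name so `from services.workers import nef` is caught too.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # relative imports (level > 0) and non-import nodes are skipped
        for name in names:
            in_siblings = name.startswith(sibling_root + ".")
            is_own = name == own_package or name.startswith(own_package + ".")
            if in_siblings and not is_own:
                found.append(name)
    return found
```

Running it over each `services/workers/*/core.py` in CI would flag a new BIE → NEF import before it lands, while imports of shared modules pass.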
6.2 What Belongs in Shared vs Module
Put in shared only if it remains stable across backend swaps.
- If replacing a backend would require changing a piece of code, that code does not belong in shared.
Examples
- Shared: `QuantizationBackend` interface, `CompilerBackend` interface, `PipelineResult`
- Module: Kneron-specific env setup, `sys_flow` invocation, output file moving rules
6.3 Extracting Logic from Combined Files
If a file currently mixes multiple module responsibilities:
- Extract module-specific logic into the owning module.
- Keep shared file limited to interface + cross-cutting types.
Decision test
- Would this code change if we swapped the Kneron backend for an OSS backend?
  - Yes → belongs to the module backend
  - No → can live in shared
6.4 Incremental Refactor Guidance
- Don’t attempt perfect separation in one pass.
- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
- After each phase, re-check boundaries and adjust.
7) Current Structure and Replacement Strategy (As-Is)
Based on the refactor just completed, the effective call chain is:
```text
workers (ONNX/BIE/NEF)
  -> backends (interfaces + Kneron implementations)
    -> ktc (toolchain python API)
      -> vendor sys_flow / libs / libs_V2 / prebuilt binaries
```
7.1 What this means today
- Workers depend only on backend interfaces; they no longer call `ktc.ModelConfig` directly.
- Kneron specifics are concentrated in backend implementations.
- `ktc` still wraps the Kneron toolchain and binaries; that dependency remains, but it is now isolated.
7.2 How to replace later
- Replace backend implementations (lowest risk)
  - Keep backend interfaces stable.
  - Swap `Kneron*Backend` for `Your*Backend` without touching workers.
- Keep the backend layer, but replace `ktc` calls
  - Modify `Kneron*Backend` to call your own library instead of `ktc`.
  - Workers stay unchanged; only backend code moves.
- Introduce multiple backends
  - Add `get_*_backend(name=...)` selection based on config/env.
  - Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.
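A minimal registry sketch for the `get_*_backend(name=...)` idea, shown for quantization only. `TOOLCHAIN_QUANT_BACKEND`, `register_quantization_backend`, and the `"kneron"` default name are assumptions, not existing config keys. Factories are zero-argument callables, so a backend's heavy imports (for example `ktc`) only run if that backend is actually selected.

```python
import os
from typing import Any, Callable, Dict, Optional

# Hypothetical env key; rename to match the repo's config conventions.
QUANT_BACKEND_ENV = "TOOLCHAIN_QUANT_BACKEND"

# name -> zero-arg factory; a backend's imports run only when it is built.
_QUANT_BACKENDS: Dict[str, Callable[[], Any]] = {}


def register_quantization_backend(name: str, factory: Callable[[], Any]) -> None:
    _QUANT_BACKENDS[name] = factory


def get_quantization_backend(name: Optional[str] = None) -> Any:
    """Resolve a backend: explicit name, then env var, then the "kneron" default."""
    chosen = name or os.environ.get(QUANT_BACKEND_ENV, "kneron")
    if chosen not in _QUANT_BACKENDS:
        known = ", ".join(sorted(_QUANT_BACKENDS)) or "<none>"
        raise ValueError(f"unknown quantization backend {chosen!r}; registered: {known}")
    return _QUANT_BACKENDS[chosen]()
```

A `get_compiler_backend` would mirror this with its own registry and env key, which is what makes mixed runs possible.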
7.3 Where to implement replacements
- `services/backends/quantization.py`
- `services/backends/compiler.py`
- `services/backends/evaluator.py`
- `services/backends/simulator.py`
8) Minimum Viable API Proposal
Keep it minimal to avoid churn:
```python
from typing import Protocol

class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        """Return BIE path"""

class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Return NEF path"""
```
Then the pipeline is just a pure composition of these two + ONNX ops.
9) What This Enables
- Replace ONNX converters / optimizers without touching quantization.
- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
- Swap in future Kneron versions only inside backend adapters.
- Experiment with alternative quantization or compiler backends.
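For the CI/dev case, a sketch of how the pipeline might decide which stages to run, assuming Kneron availability is approximated by whether the `ktc` package can be found. `importlib.util.find_spec` only locates a top-level package without importing it, so this does not confirm that the prebuilt binaries behind it are present. `kneron_available` and `plan_stages` are hypothetical helpers.

```python
import importlib.util
from typing import List


def kneron_available(package: str = "ktc") -> bool:
    """True if the package can be located; find_spec does not import a top-level package."""
    return importlib.util.find_spec(package) is not None


def plan_stages(has_kneron: bool) -> List[str]:
    """ONNX stages always run; BIE/NEF stages are added only when Kneron tooling is present."""
    stages = ["convert", "optimize"]
    if has_kneron:
        stages += ["quantize", "compile"]
    return stages
```

In CI this would be called as `plan_stages(kneron_available())`, so the same pipeline code runs the ONNX-only subset on machines without Kneron installed.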
10) Next Steps (if you want)
I can draft the following next:
- A small refactor plan with concrete file edits and minimal API changes.
- A diagram (Mermaid) of the new modular flow.
- A compatibility matrix (current vs target dependencies per module).
Tell me which one you want, and I’ll prepare it.