# Toolchain Flow Modularization Notes (Draft)
Based on `docs/manual_v0.17.2.pdf` and the current repo structure, this note outlines a modular decomposition plan, risks, and a staged engineering approach to reduce Kneron prebuilt coupling while preserving the end-to-end flow.
## 1) Project Plan Review (Your Proposal)
Your plan is sound and low-risk:
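1) **Split the steps into modules and get the full flow working first** → keeps the pipeline usable and regression-testable.
2) **Review each module in turn to decide whether it can be rewritten or rebuilt** → focuses effort on the highest-risk dependencies.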
This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.
## 2) Manual-Driven Flow Stages (v0.17.2)
From the manual, the official workflow breaks down cleanly into these steps:
**ONNX Workflow**
- A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
- B. ONNX optimization (general optimizer)
- C. Opset upgrade (if needed)
- D. IP evaluation (performance / support check)
- E. E2E simulator check (floating point)
**BIE Workflow**
- F. Quantization (analysis → produce BIE)
- G. E2E simulator check (fixed point)
**NEF Workflow**
- H. Batch compile (BIE → NEF)
- I. E2E simulator check (hardware)
- J. NEF combine (optional)
This mapping is consistent with current repo services:
- `services/workers/onnx/core.py`: A/B (+ D currently via `evaluate`)
- `services/workers/bie/core.py`: F (+ optionally G, if added)
- `services/workers/nef/core.py`: H (+ optionally I, if added)
## 2.1 Flow Diagram (Mermaid)
```mermaid
flowchart TD
A[Format Model<br/>Keras/TFLite/Caffe/PyTorch] --> B[Convert to ONNX]
B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
C --> D{IP Evaluation?}
D -- optional --> E[IP Evaluator Report]
C --> F{E2E FP Sim?}
F -- optional --> G[Float E2E Simulator Check]
C --> H[Quantization / Analysis]
H --> I[BIE Output]
I --> J{E2E Fixed-Point Sim?}
J -- optional --> K[Fixed-Point E2E Check]
I --> L[Compile]
L --> M[NEF Output]
M --> N{E2E Hardware Sim?}
N -- optional --> O[Hardware E2E Check]
M --> P{NEF Combine?}
P -- optional --> Q[Combined NEF]
subgraph OSS-Friendly
A
B
C
end
subgraph Kneron-Dependent
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
end
```
## 3) Recommended Module Split (Initial)
A clean split with minimal coupling and clear replacement points:
### 3.1 Format & Graph Layer (OSS-Friendly)
1. **FormatConverters**
   - Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
   - Pure Python; use `libs/ONNX_Convertor` + `libs/kneronnxopt`
2. **OnnxGraphOps**
   - optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
   - Pure Python, independent from toolchain binaries
### 3.2 Validation & Simulation Layer (Kneron-Dependent)
3. **IPEvaluator**
   - `ModelConfig.evaluate()` (toolchain evaluator)
   - Coupled to `sys_flow` + prebuilt binaries
   - Should be an optional plug-in, not a hard dependency of ONNX conversion
4. **E2ESimulator**
   - float/fixed/hardware validation (kneron_inference)
   - Coupled to Kneron libs; keep as a plugin backend
### 3.3 Quantization & Compile Layer (Kneron-Dependent)
5. **QuantizationBackend** (BIE)
   - Current: `ModelConfig.analysis()` → sys_flow binaries
   - Make a backend interface with a Kneron implementation
6. **CompilerBackend** (NEF)
   - Current: `ktc.compile()` → prebuilt compiler
   - Same backend interface style; Kneron impl for now
### 3.4 Packaging/Orchestration Layer
7. **Pipeline Orchestrator**
   - Defines the sequence and exchange formats (ONNX, BIE, NEF)
   - Should not import Kneron libs directly; only through backend interfaces
## 3.5 Module Dependency / Replaceability Matrix
| Module | Inputs | Outputs | Current Dependency | Replaceability | Notes |
|---|---|---|---|---|---|
| FormatConverters | model files | ONNX | `libs/ONNX_Convertor` (OSS) | High | Already OSS; keep isolated. |
| OnnxGraphOps | ONNX | ONNX | `libs/kneronnxopt`, onnx | High | Pure Python, safe to refactor. |
| IPEvaluator | ONNX/BIE | report | `sys_flow` + prebuilt bins | Low | Optional plugin; avoid a hard dependency in the ONNX flow. |
| E2ESimulator FP | ONNX | results | Kneron inference libs | Low | Optional; keep plugin backend. |
| QuantizationBackend | ONNX + inputs | BIE | `sys_flow` + prebuilt bins | Low | Core Kneron dependency. |
| E2ESimulator Fixed | BIE | results | Kneron inference libs | Low | Optional; can be skipped in web flow. |
| CompilerBackend | BIE | NEF | `compiler/*` prebuilt | Low | Core Kneron dependency. |
| E2ESimulator HW | NEF | results | Kneron inference libs | Low | Optional; likely external toolchain use. |
| NEFCombine | NEF list | NEF | Kneron utils | Medium | Small wrapper; keep separate. |
| Pipeline Orchestrator | modules | end-to-end | None (pure) | High | Ownable; should be OSS-only. |
## 4) Key Risks & Coupling Points (Observed in Repo)
- `ktc.toolchain` calls `sys_flow` / `sys_flow_v2` (hard dependency on prebuilt binaries).
- `ktc.ModelConfig.evaluate/analysis/compile` are all Kneron-specific.
- `services/workers/onnx/core.py` calls `evaluate()` by default → this ties ONNX flow to Kneron.
## 5) Suggested Refactor Sequence (Low Disruption)
**Phase 1: Interface Extraction**
- Introduce two small interfaces:
  - `QuantizationBackend` (BIE)
  - `CompilerBackend` (NEF)
- Wrap existing Kneron calls as default implementations.
**Phase 2: ONNX Flow Decoupling**
- Make `IPEvaluator` optional in ONNX flow.
- Keep current behavior by default but allow bypass.
**Phase 3: Modular Pipeline Assembly**
- Build a pipeline that composes:
  - conversion → optimization → (optional evaluator)
  - quantization backend
  - compiler backend
**Phase 4: Replaceability Audit**
- For each module, decide whether it:
  - can be OSS (conversion/optimization)
  - must remain a Kneron backend (quantization/compile)
  - can be partially replaced (simulation/eval)
## 5.1 Concrete Refactor Plan (Minimal Interface Changes)
Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.
### Step 1: Introduce backend interfaces (no behavior change)
Create simple interfaces and wrappers.
- New files:
  - `services/backends/quantization.py`
  - `services/backends/compiler.py`
  - (optional) `services/backends/evaluator.py`
  - (optional) `services/backends/simulator.py`
Minimal interface (example):
```python
class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...


class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...
```
Implement Kneron-backed versions wrapping existing calls:
- `KneronQuantizationBackend` → `ktc.ModelConfig(...).analysis(...)`
- `KneronCompilerBackend` → `ktc.compile(...)`
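
A minimal sketch of what these two wrappers might look like. To keep it runnable without the toolchain, the actual `ktc` objects are injected: `make_model_config` and `compile_fn` are hypothetical parameters standing in for `ktc.ModelConfig(...)` and `ktc.compile(...)`. Whether the wrapped calls accept `output_dir` and return a path is an assumption to check against the installed toolchain.

```python
from typing import Any, Callable, Mapping


class KneronQuantizationBackend:
    """Hides the analysis call behind the QuantizationBackend shape."""

    def __init__(self, make_model_config: Callable[[str], Any]):
        # Hypothetical factory: in production it would build the
        # ktc.ModelConfig(...) object for the given ONNX path.
        self._make_model_config = make_model_config

    def analyze(self, onnx_path: str, input_mapping: Mapping[str, Any],
                output_dir: str, **kwargs: Any) -> str:
        model_config = self._make_model_config(onnx_path)
        # Assumption: analysis() accepts output_dir and returns the BIE path.
        bie_path = model_config.analysis(
            input_mapping, output_dir=output_dir, **kwargs
        )
        return str(bie_path)


class KneronCompilerBackend:
    """Hides the compile call behind the CompilerBackend shape."""

    def __init__(self, compile_fn: Callable[..., Any]):
        # Hypothetical stand-in for ktc.compile.
        self._compile_fn = compile_fn

    def compile(self, bie_path: str, output_dir: str, **kwargs: Any) -> str:
        # Assumption: the wrapped call accepts output_dir and returns the NEF path.
        return str(self._compile_fn(bie_path, output_dir=output_dir, **kwargs))
```

Injecting the callables keeps the `ktc` import in one place (wherever the backend is constructed) and lets the wrappers be unit-tested with fakes.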
### Step 2: Decouple ONNX flow from evaluator (optional switch)
Modify `services/workers/onnx/core.py`:
- Add parameter `enable_evaluate` (default true to preserve behavior).
- Guard `km.evaluate()` behind the flag.
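
The guard could look roughly like the sketch below. `run_onnx_flow`, `optimize`, and `evaluate` are hypothetical names used for illustration; the real worker function in `services/workers/onnx/core.py` is not shown in this note. Only the `enable_evaluate` flag and its default come from the plan above.

```python
from typing import Any, Callable, Dict, Optional


def run_onnx_flow(
    onnx_path: str,
    optimize: Callable[[str], str],
    evaluate: Optional[Callable[[str], Any]] = None,
    enable_evaluate: bool = True,
) -> Dict[str, Any]:
    """Optimize the ONNX model; run IP evaluation only when enabled."""
    optimized_path = optimize(onnx_path)
    result: Dict[str, Any] = {"onnx_path": optimized_path, "evaluation": None}
    if enable_evaluate:
        # Fail loudly instead of silently skipping a requested evaluation.
        if evaluate is None:
            raise RuntimeError("enable_evaluate=True but no evaluator was provided")
        result["evaluation"] = evaluate(optimized_path)
    return result
```

With `enable_evaluate=False` the evaluator is never touched, so the ONNX flow can run where the Kneron evaluator is unavailable.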
### Step 3: Replace direct calls in BIE/NEF workers
Modify:
- `services/workers/bie/core.py` to use `QuantizationBackend`.
- `services/workers/nef/core.py` to use `CompilerBackend`.
### Step 4: Optional simulator integration
Add optional steps to workers:
- `enable_sim_fp` in ONNX flow.
- `enable_sim_fixed` in BIE flow.
- `enable_sim_hw` in NEF flow.
These should call a simulator backend; default off.
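
One possible shape for these switches, as a sketch only. The three flag names come from the list above; `SimulationFlags`, `maybe_simulate`, and the `(stage, artifact_path)` simulator signature are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class SimulationFlags:
    """All simulator steps default to off."""
    enable_sim_fp: bool = False
    enable_sim_fixed: bool = False
    enable_sim_hw: bool = False


# Maps a stage name to the flag that controls it.
_FLAG_BY_STAGE = {
    "fp": "enable_sim_fp",
    "fixed": "enable_sim_fixed",
    "hw": "enable_sim_hw",
}


def maybe_simulate(
    stage: str,
    artifact_path: str,
    flags: SimulationFlags,
    simulator: Callable[[str, str], Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Call the simulator backend only if the flag for this stage is set."""
    if not getattr(flags, _FLAG_BY_STAGE[stage]):
        return None
    return simulator(stage, artifact_path)
```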
### Step 5: Pipeline orchestrator (optional)
Add a thin orchestrator module to compose stages:
- `services/pipeline/toolchain_pipeline.py`
- Allows swapping backends from config/env.
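
A thin orchestrator along these lines might be enough. This is a sketch under assumptions: `ToolchainPipeline` and `optimize_onnx` are hypothetical names, and the backends are duck-typed objects exposing the `analyze` / `compile` methods from Step 1.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass
class ToolchainPipeline:
    """Composes ONNX ops with injected quantization and compiler backends."""

    optimize_onnx: Callable[[str], str]  # hypothetical ONNX-ops entry point
    quantizer: Any                       # object with .analyze(...)
    compiler: Any                        # object with .compile(...)

    def run(self, onnx_path: str, input_mapping: Mapping[str, Any],
            output_dir: str) -> Dict[str, str]:
        # Each stage only sees the path produced by the previous one.
        optimized_path = self.optimize_onnx(onnx_path)
        bie_path = self.quantizer.analyze(optimized_path, input_mapping, output_dir)
        nef_path = self.compiler.compile(bie_path, output_dir)
        return {"onnx": optimized_path, "bie": bie_path, "nef": nef_path}
```

Because the orchestrator receives its backends as constructor arguments, it never needs to import Kneron libraries itself.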
### File Touch List (Minimal)
1) `services/workers/onnx/core.py` (optional eval toggle)
2) `services/workers/bie/core.py` (use QuantizationBackend)
3) `services/workers/nef/core.py` (use CompilerBackend)
4) `services/backends/quantization.py` (new)
5) `services/backends/compiler.py` (new)
6) (optional) `services/backends/evaluator.py` (new)
7) (optional) `services/backends/simulator.py` (new)
## 6) Coupling Rules / Extraction Guidelines
Goal: keep module boundaries stable so future swapping does not cascade changes.
### 6.1 Horizontal vs Vertical Coupling (Rule of Thumb)
- **Horizontal coupling = avoid** (core A directly importing core B).
- **Vertical references = allowed** (shared types/config/IO schemas used by all).
Keep shared references narrowly scoped to:
- Interface contracts (Protocol / abstract base classes)
- Common data structures (DTOs / results / error types)
- Configuration schemas and environment keys
### 6.2 What Belongs in Shared vs Module
**Put in shared only if it remains stable across backend swaps.**
- If replacing a backend requires changing the code, it does *not* belong in shared.
**Examples**
- Shared: `QuantizationBackend` interface, `CompilerBackend` interface, `PipelineResult`
- Module: Kneron-specific env setup, sys_flow invocation, output file moving rules
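
As an illustration of what the shared layer might contain, here is a sketch using `typing.Protocol`. The two method signatures follow the interfaces in this note; the fields of `PipelineResult` are hypothetical, since the note only names the type.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: Mapping[str, Any],
                output_dir: str, **kwargs: Any) -> str:
        """Return the BIE path."""
        ...


@runtime_checkable
class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs: Any) -> str:
        """Return the NEF path."""
        ...


@dataclass(frozen=True)
class PipelineResult:
    """Backend-neutral result DTO (field names are assumptions)."""
    onnx_path: str
    bie_path: Optional[str] = None
    nef_path: Optional[str] = None
    warnings: Tuple[str, ...] = ()
```

Nothing here mentions `ktc` or `sys_flow`, which is the property that lets it stay unchanged across backend swaps. Note that `runtime_checkable` only verifies that the method exists, not its signature.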
### 6.3 Extracting Logic from Combined Files
If a file currently mixes multiple module responsibilities:
- Extract module-specific logic into the owning module.
- Keep shared file limited to interface + cross-cutting types.
**Decision test**
- *Would this code change if we swapped Kneron backend with OSS backend?*
  - Yes → belongs to module backend
  - No → can live in shared
### 6.4 Incremental Refactor Guidance
- Don't attempt perfect separation in one pass.
- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
- After each phase, re-check boundaries and adjust.
## 7) Current Structure and Replacement Strategy (As-Is)
Based on the refactor just completed, the effective call chain is:
```
workers (ONNX/BIE/NEF)
-> backends (interfaces + Kneron implementations)
-> ktc (toolchain python API)
-> vendor sys_flow / libs / libs_V2 / prebuilt binaries
```
### 7.1 What this means today
- Workers only depend on **backend interfaces**. They no longer call `ktc.ModelConfig` directly.
- Kneron specifics are concentrated in backend implementations.
- `ktc` still wraps the Kneron toolchain and binaries; that dependency remains, but it is **now isolated**.
### 7.2 How to replace later
1) **Replace backend implementations** (lowest-risk)
   - Keep backend interfaces stable.
   - Swap `Kneron*Backend` for `Your*Backend` without touching workers.
2) **Keep backend layer, but replace `ktc` calls**
   - Modify `Kneron*Backend` to call your own library instead of `ktc`.
   - Workers stay unchanged; only backend code moves.
3) **Introduce multiple backends**
   - Add `get_*_backend(name=...)` selection based on config/env.
   - Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.
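
The `get_*_backend(name=...)` selection could be a small registry, sketched below for the quantization side (the compiler side would mirror it). The environment variable `QUANTIZATION_BACKEND`, the default name `"kneron"`, and `register_quantization_backend` are assumptions, not existing repo names.

```python
import os
from typing import Any, Callable, Dict, Optional

# Maps a backend name to a zero-argument factory.
_QUANTIZATION_BACKENDS: Dict[str, Callable[[], Any]] = {}


def register_quantization_backend(name: str, factory: Callable[[], Any]) -> None:
    """Make a backend selectable by name."""
    _QUANTIZATION_BACKENDS[name] = factory


def get_quantization_backend(name: Optional[str] = None) -> Any:
    """Resolve by explicit name, then env var, then the assumed default."""
    resolved = name or os.environ.get("QUANTIZATION_BACKEND", "kneron")
    try:
        factory = _QUANTIZATION_BACKENDS[resolved]
    except KeyError:
        known = ", ".join(sorted(_QUANTIZATION_BACKENDS)) or "<none>"
        raise ValueError(
            f"Unknown quantization backend {resolved!r}; registered: {known}"
        ) from None
    return factory()
```

Using factories rather than instances means a backend's heavy imports only happen when that backend is actually selected.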
### 7.3 Where to implement replacements
- `services/backends/quantization.py`
- `services/backends/compiler.py`
- `services/backends/evaluator.py`
- `services/backends/simulator.py`
## 8) Minimum Viable API Proposal
Keep it minimal to avoid churn:
```python
class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        """Return BIE path"""
        raise NotImplementedError


class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Return NEF path"""
        raise NotImplementedError
```
Then the pipeline is just a pure composition of these two + ONNX ops.
## 9) What This Enables
- Replace ONNX converters / optimizers without touching quantization.
- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
- Swap in future Kneron versions only inside backend adapters.
- Experiment with alternative quantization or compiler backends.
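
For the CI/dev case, one option is a pair of stub backends that satisfy the same interface without any Kneron binaries. This is a sketch, not part of the current repo: the class names are hypothetical and the files they write are placeholders, not valid BIE or NEF artifacts.

```python
from pathlib import Path
from typing import Any, Mapping


class StubQuantizationBackend:
    """Writes a placeholder .bie file so downstream plumbing can be tested."""

    def analyze(self, onnx_path: str, input_mapping: Mapping[str, Any],
                output_dir: str, **kwargs: Any) -> str:
        out = Path(output_dir) / (Path(onnx_path).stem + ".bie")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(b"stub-bie")  # placeholder content only
        return str(out)


class StubCompilerBackend:
    """Writes a placeholder .nef file next to the requested output dir."""

    def compile(self, bie_path: str, output_dir: str, **kwargs: Any) -> str:
        out = Path(output_dir) / (Path(bie_path).stem + ".nef")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(b"stub-nef")  # placeholder content only
        return str(out)
```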
---
## 10) Next Steps (if you want)
I can draft the following next:
1) A small refactor plan with concrete file edits and minimal API changes.
2) A diagram (Mermaid) of the new modular flow.
3) A compatibility matrix (current vs target dependencies per module).
Tell me which one you want, and I'll prepare it.