From bc98456d74ff284d2c00975fc83008c004bb8a71 Mon Sep 17 00:00:00 2001
From: warrenchen
Date: Thu, 5 Feb 2026 17:08:17 +0000
Subject: [PATCH] Add initial draft of Toolchain Flow Modularization Notes

---
 docs/flow_modularization_notes.md | 277 ++++++++++++++++++++++++++++++
 1 file changed, 277 insertions(+)
 create mode 100644 docs/flow_modularization_notes.md

diff --git a/docs/flow_modularization_notes.md b/docs/flow_modularization_notes.md
new file mode 100644
index 0000000..83d6d63
--- /dev/null
+++ b/docs/flow_modularization_notes.md
@@ -0,0 +1,277 @@
+# Toolchain Flow Modularization Notes (Draft)
+
+Based on `docs/manual_v0.17.2.pdf` and the current repo structure, this note outlines a modular decomposition plan, its risks, and a staged engineering approach that reduces coupling to Kneron prebuilt binaries while preserving the end-to-end flow.
+
+## 1) Project Plan Review (Your Proposal)
+Your plan is sound and low-risk:
+1) **Split the steps into modules and complete the full flow first** → preserves usability and keeps the flow regression-testable.
+2) **Review each module in turn to decide whether it can be rewritten or rebuilt** → focuses effort on the highest-risk dependencies.
+
+This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.
+
+## 2) Manual-Driven Flow Stages (v0.17.2)
+From the manual, the official workflow breaks down cleanly into these steps:
+
+**ONNX Workflow**
+- A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
+- B. ONNX optimization (general optimizer)
+- C. Opset upgrade (if needed)
+- D. IP evaluation (performance / support check)
+- E. E2E simulator check (floating point)
+
+**BIE Workflow**
+- F. Quantization (analysis → produce BIE)
+- G. E2E simulator check (fixed point)
+
+**NEF Workflow**
+- H. Batch compile (BIE → NEF)
+- I. E2E simulator check (hardware)
+- J. NEF combine (optional)
+
+This mapping is consistent with the current repo services:
+- `services/workers/onnx/core.py`: A/B (+ D, currently via `evaluate`)
+- `services/workers/bie/core.py`: F (+ optionally G, if added)
+- `services/workers/nef/core.py`: H (+ optionally I, if added)
+
+## 2.1 Flow Diagram (Mermaid)
+```mermaid
+flowchart TD
+  A[Format Model<br/>Keras/TFLite/Caffe/PyTorch] --> B[Convert to ONNX]
+  B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
+  C --> D{IP Evaluation?}
+  D -- optional --> E[IP Evaluator Report]
+  C --> F{E2E FP Sim?}
+  F -- optional --> G[Float E2E Simulator Check]
+  C --> H[Quantization / Analysis]
+  H --> I[BIE Output]
+  I --> J{E2E Fixed-Point Sim?}
+  J -- optional --> K[Fixed-Point E2E Check]
+  I --> L[Compile]
+  L --> M[NEF Output]
+  M --> N{E2E Hardware Sim?}
+  N -- optional --> O[Hardware E2E Check]
+  M --> P{NEF Combine?}
+  P -- optional --> Q[Combined NEF]
+
+  subgraph OSS-Friendly
+    A
+    B
+    C
+  end
+
+  subgraph Kneron-Dependent
+    D
+    E
+    F
+    G
+    H
+    I
+    J
+    K
+    L
+    M
+    N
+    O
+    P
+    Q
+  end
+```
+
+## 3) Recommended Module Split (Initial)
+A clean split with minimal coupling and clear replacement points:
+
+### 3.1 Format & Graph Layer (OSS-Friendly)
+1. **FormatConverters**
+   - Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
+   - Pure Python; uses `libs/ONNX_Convertor` + `libs/kneronnxopt`
+
+2. **OnnxGraphOps**
+   - Optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
+   - Pure Python, independent of toolchain binaries
+
+### 3.2 Validation & Simulation Layer (Kneron-Dependent)
+3. **IPEvaluator**
+   - `ModelConfig.evaluate()` (toolchain evaluator)
+   - Coupled to `sys_flow` + prebuilt binaries
+   - Should be an optional plug-in, not a hard dependency of ONNX conversion
+
+4. **E2ESimulator**
+   - Float/fixed/hardware validation (kneron_inference)
+   - Coupled to Kneron libs; keep as a plugin backend
+
+### 3.3 Quantization & Compile Layer (Kneron-Dependent)
+5. **QuantizationBackend** (BIE)
+   - Current: `ModelConfig.analysis()` → sys_flow binaries
+   - Define a backend interface with a Kneron implementation
+
+6. **CompilerBackend** (NEF)
+   - Current: `ktc.compile()` → prebuilt compiler
+   - Same backend interface style; Kneron implementation for now
+
+### 3.4 Packaging/Orchestration Layer
+7. **Pipeline Orchestrator**
+   - Defines the sequence and exchange formats (ONNX, BIE, NEF)
+   - Should not import Kneron libs directly; only through backend interfaces
+
+## 3.5 Module Dependency / Replaceability Matrix
+| Module | Inputs | Outputs | Current Dependency | Replaceability | Notes |
+|---|---|---|---|---|---|
+| FormatConverters | model files | ONNX | `libs/ONNX_Convertor` (OSS) | High | Already OSS; keep isolated. |
+| OnnxGraphOps | ONNX | ONNX | `libs/kneronnxopt`, onnx | High | Pure Python, safe to refactor. |
+| IPEvaluator | ONNX/BIE | report | `sys_flow` + prebuilt bins | Low | Optional plugin; avoid a hard dependency in the ONNX flow. |
+| E2ESimulator FP | ONNX | results | Kneron inference libs | Low | Optional; keep as plugin backend. |
+| QuantizationBackend | ONNX + inputs | BIE | `sys_flow` + prebuilt bins | Low | Core Kneron dependency. |
+| E2ESimulator Fixed | BIE | results | Kneron inference libs | Low | Optional; can be skipped in web flow. |
+| CompilerBackend | BIE | NEF | `compiler/*` prebuilt | Low | Core Kneron dependency. |
+| E2ESimulator HW | NEF | results | Kneron inference libs | Low | Optional; likely external toolchain use. |
+| NEFCombine | NEF list | NEF | Kneron utils | Medium | Small wrapper; keep separate. |
+| Pipeline Orchestrator | modules | end-to-end | None (pure) | High | Ownable; should be OSS-only. |
+
+## 4) Key Risks & Coupling Points (Observed in Repo)
+- `ktc.toolchain` calls `sys_flow` / `sys_flow_v2` (hard dependency on prebuilt binaries).
+- `ktc.ModelConfig.evaluate/analysis/compile` are all Kneron-specific.
+- `services/workers/onnx/core.py` calls `evaluate()` by default → this ties the ONNX flow to Kneron.
+
+## 5) Suggested Refactor Sequence (Low Disruption)
+**Phase 1: Interface Extraction**
+- Introduce two small interfaces:
+  - `QuantizationBackend` (BIE)
+  - `CompilerBackend` (NEF)
+- Wrap existing Kneron calls as default implementations.
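The wrapping step in Phase 1 can be sketched as a thin adapter. This is a minimal sketch under stated assumptions, not the repo's implementation: `run_analysis` and `fake_analysis` are hypothetical names, and the injected callable stands in for whatever code currently invokes `ktc.ModelConfig(...).analysis(...)`. Injecting the call keeps the adapter importable in environments without Kneron binaries.

```python
from typing import Callable, Protocol


class QuantizationBackend(Protocol):
    """Contract for any quantization step: ONNX in, BIE path out."""

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...


class KneronQuantizationBackend:
    """Default implementation that delegates to the existing Kneron call.

    `run_analysis` is a hypothetical hook; in the real repo it would wrap the
    current `ktc.ModelConfig(...).analysis(...)` invocation.
    """

    def __init__(self, run_analysis: Callable[..., str]) -> None:
        self._run_analysis = run_analysis

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        # Delegate unchanged so behaviour stays identical to the current flow.
        return self._run_analysis(onnx_path, input_mapping, output_dir, **kwargs)


def fake_analysis(onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
    """Stand-in for the Kneron call, used only to exercise the adapter."""
    return f"{output_dir}/model.bie"


backend: QuantizationBackend = KneronQuantizationBackend(fake_analysis)
bie_path = backend.analyze("model.onnx", {"input": []}, "/tmp/out")
```

A `CompilerBackend` adapter around `ktc.compile(...)` would presumably follow the same shape, with a BIE path in and a NEF path out.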
+
+**Phase 2: ONNX Flow Decoupling**
+- Make `IPEvaluator` optional in the ONNX flow.
+- Keep current behavior by default but allow bypass.
+
+**Phase 3: Modular Pipeline Assembly**
+- Build a pipeline that composes:
+  - conversion → optimization → (optional evaluator)
+  - quantization backend
+  - compiler backend
+
+**Phase 4: Replaceability Audit**
+- For each module, decide whether it:
+  - can be OSS (conversion/optimization)
+  - must remain a Kneron backend (quantization/compile)
+  - can be partially replaced (simulation/eval)
+
+## 5.1 Concrete Refactor Plan (Minimal Interface Changes)
+Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.
+
+### Step 1: Introduce backend interfaces (no behavior change)
+Create simple interfaces and wrappers.
+- New files:
+  - `services/backends/quantization.py`
+  - `services/backends/compiler.py`
+  - (optional) `services/backends/evaluator.py`
+  - (optional) `services/backends/simulator.py`
+
+Minimal interface (example):
+```python
+class QuantizationBackend:
+    # Takes an ONNX model and returns the path of the produced BIE file.
+    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...
+
+class CompilerBackend:
+    # Takes a BIE file and returns the path of the produced NEF file.
+    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...
+```
+
+Implement Kneron-backed versions wrapping existing calls:
+- `KneronQuantizationBackend` → `ktc.ModelConfig(...).analysis(...)`
+- `KneronCompilerBackend` → `ktc.compile(...)`
+
+### Step 2: Decouple ONNX flow from evaluator (optional switch)
+Modify `services/workers/onnx/core.py`:
+- Add parameter `enable_evaluate` (default `True` to preserve behavior).
+- Guard `km.evaluate()` behind the flag.
+
+### Step 3: Replace direct calls in BIE/NEF workers
+Modify:
+- `services/workers/bie/core.py` to use `QuantizationBackend`.
+- `services/workers/nef/core.py` to use `CompilerBackend`.
+
+### Step 4: Optional simulator integration
+Add optional steps to workers:
+- `enable_sim_fp` in ONNX flow.
+- `enable_sim_fixed` in BIE flow.
+- `enable_sim_hw` in NEF flow.
+
+Each of these flags should call a simulator backend and default to off.
+
+### Step 5: Pipeline orchestrator (optional)
+Add a thin orchestrator module to compose stages:
+- `services/pipeline/toolchain_pipeline.py`
+- Allows swapping backends from config/env.
+
+### File Touch List (Minimal)
+1) `services/workers/onnx/core.py` (optional eval toggle)
+2) `services/workers/bie/core.py` (use QuantizationBackend)
+3) `services/workers/nef/core.py` (use CompilerBackend)
+4) `services/backends/quantization.py` (new)
+5) `services/backends/compiler.py` (new)
+6) (optional) `services/backends/evaluator.py` (new)
+7) (optional) `services/backends/simulator.py` (new)
+8) (optional) `services/pipeline/toolchain_pipeline.py` (new)
+
+## 6) Coupling Rules / Extraction Guidelines
+Goal: keep module boundaries stable so that future backend swaps do not cascade into other modules.
+
+### 6.1 Horizontal vs Vertical Coupling (Rule of Thumb)
+- **Horizontal coupling = avoid** (core A directly importing core B).
+- **Vertical references = allowed** (shared types/config/IO schemas used by all).
+
+Keep shared references narrowly scoped to:
+- Interface contracts (Protocol / abstract base classes)
+- Common data structures (DTOs / results / error types)
+- Configuration schemas and environment keys
+
+### 6.2 What Belongs in Shared vs Module
+**Put code in shared only if it remains stable across backend swaps.**
+- If replacing a backend would require changing a piece of code, that code does *not* belong in shared.
+
+**Examples**
+- Shared: `QuantizationBackend` interface, `CompilerBackend` interface, `PipelineResult`
+- Module: Kneron-specific env setup, sys_flow invocation, output file moving rules
+
+### 6.3 Extracting Logic from Combined Files
+If a file currently mixes multiple module responsibilities:
+- Extract module-specific logic into the owning module.
+- Keep the shared file limited to interfaces + cross-cutting types.
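As an illustration of the shared-versus-module split in 6.2, the sketch below puts a result type and an error type on the shared side and one Kneron-specific rule on the module side. `PipelineResult` is named in the note, but its fields are assumptions; `BackendError` and `kneron_bie_output_path` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


# --- shared: should not change when a backend is swapped ---
class BackendError(RuntimeError):
    """Single error type the orchestrator needs to know about."""


@dataclass(frozen=True)
class PipelineResult:
    """Paths produced by each stage; later stages may be absent."""

    onnx_path: str
    bie_path: Optional[str] = None
    nef_path: Optional[str] = None
    reports: Dict[str, str] = field(default_factory=dict)


# --- module: backend-specific details stay behind the backend ---
def kneron_bie_output_path(output_dir: str, model_name: str) -> str:
    """Hypothetical output-naming rule; a different backend would replace it."""
    if not model_name:
        # Surface failures through the shared error type only.
        raise BackendError("model_name is required to name the BIE output")
    return f"{output_dir}/{model_name}.bie"


result = PipelineResult(
    onnx_path="model.onnx",
    bie_path=kneron_bie_output_path("/tmp/out", "model"),
)
```

The naming rule would change if the backend were swapped, so it sits with the backend; the result and error types would not, so they can be shared.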
+
+**Decision test**
+- *Would this code change if we swapped the Kneron backend for an OSS backend?*
+  - Yes → it belongs to the backend module
+  - No → it can live in shared
+
+### 6.4 Incremental Refactor Guidance
+- Don’t attempt perfect separation in one pass.
+- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
+- After each phase, re-check boundaries and adjust.
+
+## 7) Minimum Viable API Proposal
+Keep it minimal to avoid churn:
+
+```python
+class QuantizationBackend:
+    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
+        """Return BIE path"""
+
+class CompilerBackend:
+    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
+        """Return NEF path"""
+```
+
+The pipeline is then a pure composition of these two backends plus the ONNX operations.
+
+## 8) What This Enables
+- Replace ONNX converters / optimizers without touching quantization.
+- Run the ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
+- Absorb future Kneron versions by changing only the backend adapters.
+- Experiment with alternative quantization or compiler backends.
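The composition described under the Minimum Viable API Proposal can be sketched as follows. This is a sketch under stated assumptions rather than the planned `toolchain_pipeline.py`: `run_pipeline`, `FakeQuantizer`, `FakeCompiler`, and the `evaluator` callable are hypothetical, and the fakes stand in for the Kneron-backed implementations so the example runs without Kneron binaries. The `enable_evaluate` flag mirrors the optional switch proposed in Step 2.

```python
from typing import Callable, Dict, Optional


class FakeQuantizer:
    """Stand-in for a QuantizationBackend implementation."""

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        return f"{output_dir}/model.bie"


class FakeCompiler:
    """Stand-in for a CompilerBackend implementation."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        return f"{output_dir}/model.nef"


def run_pipeline(
    onnx_path: str,
    input_mapping: dict,
    output_dir: str,
    quantizer,
    compiler,
    evaluator: Optional[Callable[[str], str]] = None,
    enable_evaluate: bool = True,
) -> Dict[str, str]:
    """Compose the stages; only the backend interfaces are referenced."""
    stages = {"onnx": onnx_path}
    # Evaluation runs only when it is both enabled and available.
    if enable_evaluate and evaluator is not None:
        stages["report"] = evaluator(onnx_path)
    stages["bie"] = quantizer.analyze(onnx_path, input_mapping, output_dir)
    stages["nef"] = compiler.compile(stages["bie"], output_dir)
    return stages


# Without an evaluator, as in an OSS-only environment.
out = run_pipeline("model.onnx", {"input": []}, "/tmp/out",
                   FakeQuantizer(), FakeCompiler(), enable_evaluate=False)

# With an evaluator plugged in.
out_eval = run_pipeline("model.onnx", {"input": []}, "/tmp/out",
                        FakeQuantizer(), FakeCompiler(),
                        evaluator=lambda path: f"report for {path}")
```

Because the orchestrator touches only the two backend methods, replacing a fake with a Kneron-backed or alternative implementation should not require changes to `run_pipeline` itself.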