# Toolchain Flow Modularization Notes (Draft)

Based on `docs/manual_v0.17.2.pdf` and the current repo structure, this note outlines a modular decomposition plan, risks, and a staged engineering approach to reduce Kneron prebuilt coupling while preserving the end-to-end flow.

## 1) Project Plan Review (Your Proposal)

Your plan is sound and low-risk:

1) **Split the steps into modules and get the complete flow working first** → keeps the flow usable and regression-testable.
2) **Review each module in turn to decide whether it can be rewritten or rebuilt** → focuses on the highest-risk dependency points.

This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.

## 2) Manual-Driven Flow Stages (v0.17.2)

From the manual, the official workflow breaks down cleanly into these steps:

**ONNX Workflow**
- A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
- B. ONNX optimization (general optimizer)
- C. Opset upgrade (if needed)
- D. IP evaluation (performance / support check)
- E. E2E simulator check (floating point)

**BIE Workflow**
- F. Quantization (analysis → produce BIE)
- G. E2E simulator check (fixed point)

**NEF Workflow**
- H. Batch compile (BIE → NEF)
- I. E2E simulator check (hardware)
- J. NEF combine (optional)

This mapping is consistent with the current repo services:
- `services/workers/onnx/core.py`: A/B (+ D, currently via `evaluate`)
- `services/workers/bie/core.py`: F (+ optional G, if you add it)
- `services/workers/nef/core.py`: H (+ optional I, if you add it)

## 2.1 Flow Diagram (Mermaid)

```mermaid
flowchart TD
    A["Format Model<br/>Keras/TFLite/Caffe/PyTorch"] --> B[Convert to ONNX]
    B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
    C --> D{IP Evaluation?}
    D -- optional --> E[IP Evaluator Report]
    C --> F{E2E FP Sim?}
    F -- optional --> G[Float E2E Simulator Check]
    C --> H[Quantization / Analysis]
    H --> I[BIE Output]
    I --> J{E2E Fixed-Point Sim?}
    J -- optional --> K[Fixed-Point E2E Check]
    I --> L[Compile]
    L --> M[NEF Output]
    M --> N{E2E Hardware Sim?}
    N -- optional --> O[Hardware E2E Check]
    M --> P{NEF Combine?}
    P -- optional --> Q[Combined NEF]

    subgraph OSS-Friendly
        A
        B
        C
    end

    subgraph Kneron-Dependent
        D
        E
        F
        G
        H
        I
        J
        K
        L
        M
        N
        O
        P
        Q
    end
```

## 3) Recommended Module Split (Initial)

A clean split with minimal coupling and clear replacement points:

### 3.1 Format & Graph Layer (OSS-Friendly)

1. **FormatConverters**
   - Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
   - Pure Python; use `libs/ONNX_Convertor` + `libs/kneronnxopt`
2. **OnnxGraphOps**
   - optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
   - Pure Python, independent of toolchain binaries

### 3.2 Validation & Simulation Layer (Kneron-Dependent)

3. **IPEvaluator**
   - `ModelConfig.evaluate()` (toolchain evaluator)
   - Coupled to `sys_flow` + prebuilt binaries
   - Should be an optional plug-in, not a hard dependency of ONNX conversion
4. **E2ESimulator**
   - float/fixed/hardware validation (kneron_inference)
   - Coupled to Kneron libs; keep as a plugin backend

### 3.3 Quantization & Compile Layer (Kneron-Dependent)

5. **QuantizationBackend** (BIE)
   - Current: `ModelConfig.analysis()` → sys_flow binaries
   - Make a backend interface with a Kneron implementation
6. **CompilerBackend** (NEF)
   - Current: `ktc.compile()` → prebuilt compiler
   - Same backend interface style; Kneron implementation for now

### 3.4 Packaging/Orchestration Layer

7. **Pipeline Orchestrator**
   - Defines the sequence and exchange formats (ONNX, BIE, NEF)
   - Should not import Kneron libs directly; only through backend interfaces

## 3.5 Module Dependency / Replaceability Matrix

| Module | Inputs | Outputs | Current Dependency | Replaceability | Notes |
|---|---|---|---|---|---|
| FormatConverters | model files | ONNX | `libs/ONNX_Convertor` (OSS) | High | Already OSS; keep isolated. |
| OnnxGraphOps | ONNX | ONNX | `libs/kneronnxopt`, onnx | High | Pure Python, safe to refactor. |
| IPEvaluator | ONNX/BIE | report | `sys_flow` + prebuilt bins | Low | Optional plugin; avoid a hard dependency in the ONNX flow. |
| E2ESimulator FP | ONNX | results | Kneron inference libs | Low | Optional; keep as a plugin backend. |
| QuantizationBackend | ONNX + inputs | BIE | `sys_flow` + prebuilt bins | Low | Core Kneron dependency. |
| E2ESimulator Fixed | BIE | results | Kneron inference libs | Low | Optional; can be skipped in web flow. |
| CompilerBackend | BIE | NEF | `compiler/*` prebuilt | Low | Core Kneron dependency. |
| E2ESimulator HW | NEF | results | Kneron inference libs | Low | Optional; likely external toolchain use. |
| NEFCombine | NEF list | NEF | Kneron utils | Medium | Small wrapper; keep separate. |
| Pipeline Orchestrator | modules | end-to-end | None (pure) | High | Ownable; should be OSS-only. |

## 4) Key Risks & Coupling Points (Observed in Repo)

- `ktc.toolchain` calls `sys_flow` / `sys_flow_v2` (hard dependency on prebuilt binaries).
- `ktc.ModelConfig.evaluate/analysis/compile` are all Kneron-specific.
- `services/workers/onnx/core.py` calls `evaluate()` by default → this ties the ONNX flow to Kneron.

## 5) Suggested Refactor Sequence (Low Disruption)

**Phase 1: Interface Extraction**
- Introduce two small interfaces:
  - `QuantizationBackend` (BIE)
  - `CompilerBackend` (NEF)
- Wrap existing Kneron calls as default implementations.

**Phase 2: ONNX Flow Decoupling**
- Make `IPEvaluator` optional in ONNX flow.
- Keep current behavior by default but allow bypass.

**Phase 3: Modular Pipeline Assembly**
- Build a pipeline that composes:
  - conversion → optimization → (optional evaluator)
  - quantization backend
  - compiler backend

**Phase 4: Replaceability Audit**
- For each module, decide if:
  - can be OSS (conversion/optimization)
  - must remain Kneron backend (quantization/compile)
  - can be partially replaced (simulation/eval)

## 5.1 Concrete Refactor Plan (Minimal Interface Changes)

Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.

### Step 1: Introduce backend interfaces (no behavior change)

Create simple interfaces and wrappers.

- New files:
  - `services/backends/quantization.py`
  - `services/backends/compiler.py`
  - (optional) `services/backends/evaluator.py`
  - (optional) `services/backends/simulator.py`

Minimal interface (example):

```python
class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        ...


class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        ...
```

Implement Kneron-backed versions wrapping existing calls:
- `KneronQuantizationBackend` → `ktc.ModelConfig(...).analysis(...)`
- `KneronCompilerBackend` → `ktc.compile(...)`

### Step 2: Decouple ONNX flow from evaluator (optional switch)

Modify `services/workers/onnx/core.py`:
- Add parameter `enable_evaluate` (default true to preserve behavior).
- Guard `km.evaluate()` behind the flag.

### Step 3: Replace direct calls in BIE/NEF workers

Modify:
- `services/workers/bie/core.py` to use `QuantizationBackend`.
- `services/workers/nef/core.py` to use `CompilerBackend`.

### Step 4: Optional simulator integration

Add optional steps to workers:
- `enable_sim_fp` in ONNX flow.
- `enable_sim_fixed` in BIE flow.
- `enable_sim_hw` in NEF flow.

These should call a simulator backend; default off.
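Steps 1 to 3 above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repo's actual code: it expresses the interfaces with `typing.Protocol`, and `StubQuantizationBackend`, `StubCompilerBackend`, `run_onnx_stage`, `run_bie_nef_stages`, and the `evaluator` callable are hypothetical names. The stubs only compute output paths; Kneron-backed implementations would wrap the existing `ktc` calls instead.

```python
from pathlib import Path
from typing import Callable, Optional, Protocol


class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        """Return the path of the produced BIE file."""
        ...


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Return the path of the produced NEF file."""
        ...


class StubQuantizationBackend:
    """Hypothetical stand-in: only derives the BIE path, does no real work."""

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        return str(Path(output_dir) / (Path(onnx_path).stem + ".bie"))


class StubCompilerBackend:
    """Hypothetical stand-in: only derives the NEF path, does no real work."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        return str(Path(output_dir) / (Path(bie_path).stem + ".nef"))


def run_onnx_stage(
    onnx_path: str,
    evaluator: Optional[Callable[[str], str]] = None,
    enable_evaluate: bool = True,
) -> dict:
    """Step 2: the evaluator runs only if the flag is on and one was supplied."""
    result = {"onnx_path": onnx_path, "report": None}
    if enable_evaluate and evaluator is not None:
        result["report"] = evaluator(onnx_path)
    return result


def run_bie_nef_stages(
    onnx_path: str,
    input_mapping: dict,
    output_dir: str,
    quantizer: QuantizationBackend,
    compiler: CompilerBackend,
) -> str:
    """Step 3: workers depend only on the interfaces, never on ktc directly."""
    bie_path = quantizer.analyze(onnx_path, input_mapping, output_dir)
    return compiler.compile(bie_path, output_dir)
```

Because the worker functions accept any object that matches the protocol, a stub like the ones here can stand in for the Kneron backend in tests or in environments without the prebuilt binaries.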
### Step 5: Pipeline orchestrator (optional)

Add a thin orchestrator module to compose stages:
- `services/pipeline/toolchain_pipeline.py`
- Allows swapping backends from config/env.

### File Touch List (Minimal)

1) `services/workers/onnx/core.py` (optional eval toggle)
2) `services/workers/bie/core.py` (use QuantizationBackend)
3) `services/workers/nef/core.py` (use CompilerBackend)
4) `services/backends/quantization.py` (new)
5) `services/backends/compiler.py` (new)
6) (optional) `services/backends/evaluator.py` (new)
7) (optional) `services/backends/simulator.py` (new)

## 6) Coupling Rules / Extraction Guidelines

Goal: keep module boundaries stable so future swapping does not cascade changes.

### 6.1 Horizontal vs Vertical Coupling (Rule of Thumb)

- **Horizontal coupling = avoid** (core A directly importing core B).
- **Vertical references = allowed** (shared types/config/IO schemas used by all).

Keep shared references narrowly scoped to:
- Interface contracts (Protocol / abstract base classes)
- Common data structures (DTOs / results / error types)
- Configuration schemas and environment keys

### 6.2 What Belongs in Shared vs Module

**Put in shared only if it remains stable across backend swaps.**
- If replacing a backend requires changing the code, it does *not* belong in shared.

**Examples**
- Shared: `QuantizationBackend` interface, `CompilerBackend` interface, `PipelineResult`
- Module: Kneron-specific env setup, sys_flow invocation, output file moving rules

### 6.3 Extracting Logic from Combined Files

If a file currently mixes multiple module responsibilities:
- Extract module-specific logic into the owning module.
- Keep the shared file limited to interface + cross-cutting types.

**Decision test**
- *Would this code change if we swapped Kneron backend with OSS backend?*
  - Yes → belongs to module backend
  - No → can live in shared

### 6.4 Incremental Refactor Guidance

- Don’t attempt perfect separation in one pass.
- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
- After each phase, re-check boundaries and adjust.

## 7) Current Structure and Replacement Strategy (As-Is)

Based on the refactor just completed, the effective call chain is:

```
workers (ONNX/BIE/NEF)
  -> backends (interfaces + Kneron implementations)
    -> ktc (toolchain python API)
      -> vendor sys_flow / libs / libs_V2 / prebuilt binaries
```

### 7.1 What this means today

- Workers only depend on **backend interfaces**. They no longer call `ktc.ModelConfig` directly.
- Kneron specifics are concentrated in backend implementations.
- `ktc` still wraps the Kneron toolchain and binaries; that dependency remains, but it is **now isolated**.

### 7.2 How to replace later

1) **Replace backend implementations** (lowest-risk)
   - Keep backend interfaces stable.
   - Swap `Kneron*Backend` for `Your*Backend` without touching workers.
2) **Keep backend layer, but replace `ktc` calls**
   - Modify `Kneron*Backend` to call your own library instead of `ktc`.
   - Workers stay unchanged; only backend code moves.
3) **Introduce multiple backends**
   - Add `get_*_backend(name=...)` selection based on config/env.
   - Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.

### 7.3 Where to implement replacements

- `services/backends/quantization.py`
- `services/backends/compiler.py`
- `services/backends/evaluator.py`
- `services/backends/simulator.py`

## 8) Minimum Viable API Proposal

Keep it minimal to avoid churn:

```python
class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        """Return BIE path"""


class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Return NEF path"""
```

Then the pipeline is just a pure composition of these two + ONNX ops.

## 9) What This Enables

- Replace ONNX converters / optimizers without touching quantization.
- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
- Swap in future Kneron versions only inside backend adapters.
- Experiment with alternative quantization or compiler backends.

---

## 10) Next Steps (if you want)

I can draft the following next:

1) A small refactor plan with concrete file edits and minimal API changes.
2) A diagram (Mermaid) of the new modular flow.
3) A compatibility matrix (current vs target dependencies per module).

Tell me which one you want, and I’ll prepare it.