From bc98456d74ff284d2c00975fc83008c004bb8a71 Mon Sep 17 00:00:00 2001
From: warrenchen <warrenchen@innovedus.com>
Date: Thu, 5 Feb 2026 17:08:17 +0000
Subject: [PATCH] Add initial draft of Toolchain Flow Modularization Notes

---
 docs/flow_modularization_notes.md | 277 ++++++++++++++++++++++++++++++
 1 file changed, 277 insertions(+)
 create mode 100644 docs/flow_modularization_notes.md
diff --git a/docs/flow_modularization_notes.md b/docs/flow_modularization_notes.md
new file mode 100644
index 0000000..83d6d63
--- /dev/null
+++ b/docs/flow_modularization_notes.md
@@ -0,0 +1,277 @@
+# Toolchain Flow Modularization Notes (Draft)
+
+Based on `docs/manual_v0.17.2.pdf` and the current repo structure, this note outlines a modular decomposition plan, its risks, and a staged engineering approach that reduces coupling to Kneron prebuilt binaries while preserving the end-to-end flow.
+
+## 1) Project Plan Review (Your Proposal)
+Your plan is sound and low-risk:
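+1) **Split the steps into modules and get the complete flow working first** → keeps the toolchain usable and regression-testable.
+2) **Review each module in turn to decide whether it can be rewritten or rebuilt** → focuses effort on the highest-risk dependencies.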
+
+This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.
+
+## 2) Manual-Driven Flow Stages (v0.17.2)
+From the manual, the official workflow breaks down cleanly into these steps:
+
+**ONNX Workflow**
+- A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
+- B. ONNX optimization (general optimizer)
+- C. Opset upgrade (if needed)
+- D. IP evaluation (performance / support check)
+- E. E2E simulator check (floating point)
+
+**BIE Workflow**
+- F. Quantization (analysis → produce BIE)
+- G. E2E simulator check (fixed point)
+
+**NEF Workflow**
+- H. Batch compile (BIE → NEF)
+- I. E2E simulator check (hardware)
+- J. NEF combine (optional)
+
+This mapping is consistent with current repo services:
+- `services/workers/onnx/core.py`: A/B (plus D, currently via `evaluate`)
+- `services/workers/bie/core.py`: F (plus G, if added as an optional step)
+- `services/workers/nef/core.py`: H (plus I, if added as an optional step)
+
+## 2.1 Flow Diagram (Mermaid)
+```mermaid
+flowchart TD
+  A[Format Model<br/>Keras/TFLite/Caffe/PyTorch] --> B[Convert to ONNX]
+  B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
+  C --> D{IP Evaluation?}
+  D -- optional --> E[IP Evaluator Report]
+  C --> F{E2E FP Sim?}
+  F -- optional --> G[Float E2E Simulator Check]
+  C --> H[Quantization / Analysis]
+  H --> I[BIE Output]
+  I --> J{E2E Fixed-Point Sim?}
+  J -- optional --> K[Fixed-Point E2E Check]
+  I --> L[Compile]
+  L --> M[NEF Output]
+  M --> N{E2E Hardware Sim?}
+  N -- optional --> O[Hardware E2E Check]
+  M --> P{NEF Combine?}
+  P -- optional --> Q[Combined NEF]
+
+  subgraph OSS-Friendly
+    A
+    B
+    C
+  end
+
+  subgraph Kneron-Dependent
+    D
+    E
+    F
+    G
+    H
+    I
+    J
+    K
+    L
+    M
+    N
+    O
+    P
+    Q
+  end
+```
+
+## 3) Recommended Module Split (Initial)
+A clean split with minimal coupling and clear replacement points:
+
+### 3.1 Format & Graph Layer (OSS-Friendly)
+1. **FormatConverters**
+   - Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
+   - Pure Python; use `libs/ONNX_Convertor` + `libs/kneronnxopt`
+
+2. **OnnxGraphOps**
+   - optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
+   - Pure Python, independent from toolchain binaries
+
+### 3.2 Validation & Simulation Layer (Kneron-Dependent)
+3. **IPEvaluator**
+   - `ModelConfig.evaluate()` (toolchain evaluator)
+   - Coupled to `sys_flow` + prebuilt binaries
+   - Should be an optional plug-in, not a hard dependency of ONNX conversion
+
+4. **E2ESimulator**
+   - float/fixed/hardware validation (kneron_inference)
+   - Coupled to Kneron libs; keep as a plug-in backend
+
+### 3.3 Quantization & Compile Layer (Kneron-Dependent)
+5. **QuantizationBackend** (BIE)
+   - Current: `ModelConfig.analysis()` → sys_flow binaries
+   - Make a backend interface with a Kneron implementation
+
+6. **CompilerBackend** (NEF)
+   - Current: `ktc.compile()` → prebuilt compiler
+   - Same backend interface style; Kneron impl for now
+
+### 3.4 Packaging/Orchestration Layer
+7. **Pipeline Orchestrator**
+   - Defines the sequence and exchange formats (ONNX, BIE, NEF)
+   - Should not import Kneron libs directly; only through backend interfaces
+
+## 3.5 Module Dependency / Replaceability Matrix
+| Module | Inputs | Outputs | Current Dependency | Replaceability | Notes |
+|---|---|---|---|---|---|
+| FormatConverters | model files | ONNX | `libs/ONNX_Convertor` (OSS) | High | Already OSS; keep isolated. |
+| OnnxGraphOps | ONNX | ONNX | `libs/kneronnxopt`, onnx | High | Pure Python, safe to refactor. |
+| IPEvaluator | ONNX/BIE | report | `sys_flow` + prebuilt bins | Low | Optional plugin; avoid a hard dependency in the ONNX flow. |
+| E2ESimulator FP | ONNX | results | Kneron inference libs | Low | Optional; keep plugin backend. |
+| QuantizationBackend | ONNX + inputs | BIE | `sys_flow` + prebuilt bins | Low | Core Kneron dependency. |
+| E2ESimulator Fixed | BIE | results | Kneron inference libs | Low | Optional; can be skipped in web flow. |
+| CompilerBackend | BIE | NEF | `compiler/*` prebuilt | Low | Core Kneron dependency. |
+| E2ESimulator HW | NEF | results | Kneron inference libs | Low | Optional; likely external toolchain use. |
+| NEFCombine | NEF list | NEF | Kneron utils | Medium | Small wrapper; keep separate. |
+| Pipeline Orchestrator | modules | end-to-end | None (pure) | High | Ownable; should be OSS-only. |
+
+## 4) Key Risks & Coupling Points (Observed in Repo)
+- `ktc.toolchain` calls `sys_flow` / `sys_flow_v2` (hard dependency on prebuilt binaries).
+- `ktc.ModelConfig.evaluate/analysis/compile` are all Kneron-specific.
+- `services/workers/onnx/core.py` calls `evaluate()` by default → this ties ONNX flow to Kneron.
+
+## 5) Suggested Refactor Sequence (Low Disruption)
+**Phase 1: Interface Extraction**
+- Introduce two small interfaces:
+  - `QuantizationBackend` (BIE)
+  - `CompilerBackend` (NEF)
+- Wrap existing Kneron calls as default implementations.
+
+**Phase 2: ONNX Flow Decoupling**
+- Make `IPEvaluator` optional in ONNX flow.
+- Keep current behavior by default but allow bypass.
+
+**Phase 3: Modular Pipeline Assembly**
+- Build a pipeline that composes:
+  - conversion → optimization → (optional evaluator)
+  - quantization backend
+  - compiler backend
+
+**Phase 4: Replaceability Audit**
+- For each module, decide whether it:
+  - can be OSS (conversion/optimization)
+  - must remain a Kneron backend (quantization/compile)
+  - can be partially replaced (simulation/eval)
+
+## 5.1 Concrete Refactor Plan (Minimal Interface Changes)
+Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.
+
+### Step 1: Introduce backend interfaces (no behavior change)
+Create simple interfaces and wrappers.
+- New files:
+  - `services/backends/quantization.py`
+  - `services/backends/compiler.py`
+  - (optional) `services/backends/evaluator.py`
+  - (optional) `services/backends/simulator.py`
+
+Minimal interface (example):
+```python
+class QuantizationBackend:
+    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...
+
+class CompilerBackend:
+    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...
+```
+
+Implement Kneron-backed versions wrapping existing calls:
+- `KneronQuantizationBackend` → `ktc.ModelConfig(...).analysis(...)`
+- `KneronCompilerBackend` → `ktc.compile(...)`
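+
+The wrapper idea above can be sketched as a thin adapter. This is only an illustration under stated assumptions: `run_analysis` is a hypothetical injected callable standing in for the `ktc.ModelConfig(...).analysis(...)` call, and its signature is assumed rather than taken from the toolchain.
+
+```python
+from typing import Callable
+
+
+class QuantizationBackend:
+    """Interface from this note: analyze an ONNX model and return the BIE path."""
+
+    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
+        raise NotImplementedError
+
+
+class KneronQuantizationBackend(QuantizationBackend):
+    """Adapter sketch. `run_analysis` is a hypothetical callable that would wrap
+    the existing Kneron call; its signature here is an assumption."""
+
+    def __init__(self, run_analysis: Callable[..., str]):
+        self._run_analysis = run_analysis
+
+    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
+        # Delegate to the injected call and hand back the BIE path it reports.
+        return self._run_analysis(onnx_path, input_mapping, output_dir, **kwargs)
+
+
+def fake_analysis(onnx_path, input_mapping, output_dir, **kwargs):
+    # Stand-in for the real toolchain call, so the sketch runs without Kneron binaries.
+    return f"{output_dir}/model.bie"
+```
+
+Injecting the call keeps the adapter importable in environments without the prebuilt binaries; for example, `KneronQuantizationBackend(fake_analysis).analyze("m.onnx", {}, "out")` exercises the interface with the stand-in.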
+
+### Step 2: Decouple ONNX flow from evaluator (optional switch)
+Modify `services/workers/onnx/core.py`:
+- Add a parameter `enable_evaluate` (default `True`, to preserve current behavior).
+- Guard `km.evaluate()` behind the flag.
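+
+A minimal sketch of the guard, assuming a hypothetical worker entry point named `process_onnx` that receives the evaluator as a callable; the real function in `services/workers/onnx/core.py` may be shaped differently.
+
+```python
+from typing import Callable
+
+
+def process_onnx(onnx_path: str,
+                 evaluate: Callable[[str], str],
+                 enable_evaluate: bool = True) -> dict:
+    """Hypothetical entry point; names and return shape are assumptions."""
+    result = {"onnx_path": onnx_path, "evaluation": None}
+    # The evaluator runs by default, so existing callers see no change.
+    if enable_evaluate:
+        result["evaluation"] = evaluate(onnx_path)
+    return result
+```
+
+With `enable_evaluate=False` the evaluator is never called, which is what would let the ONNX flow run where the Kneron binaries are absent.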
+
+### Step 3: Replace direct calls in BIE/NEF workers
+Modify:
+- `services/workers/bie/core.py` to use `QuantizationBackend`.
+- `services/workers/nef/core.py` to use `CompilerBackend`.
+
+### Step 4: Optional simulator integration
+Add optional steps to the workers, each calling a simulator backend and defaulting to off:
+- `enable_sim_fp` in ONNX flow.
+- `enable_sim_fixed` in BIE flow.
+- `enable_sim_hw` in NEF flow.
+
+### Step 5: Pipeline orchestrator (optional)
+Add a thin orchestrator module to compose stages:
+- `services/pipeline/toolchain_pipeline.py`
+- Allows swapping backends from config/env.
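+
+One possible shape for such an orchestrator is sketched below. The registry, the `TOOLCHAIN_QUANT_BACKEND` variable name, and the function names are all assumptions made for illustration; nothing here exists in the repo yet.
+
+```python
+import os
+from typing import Callable, Dict, Mapping
+
+# Registry mapping a backend name to a factory; the names are illustrative only.
+QUANT_BACKENDS: Dict[str, Callable[[], object]] = {}
+
+
+def register_quant_backend(name: str, factory: Callable[[], object]) -> None:
+    QUANT_BACKENDS[name] = factory
+
+
+def select_quant_backend(env: Mapping[str, str] = os.environ):
+    # TOOLCHAIN_QUANT_BACKEND is an assumed variable name, defaulting to "kneron".
+    name = env.get("TOOLCHAIN_QUANT_BACKEND", "kneron")
+    if name not in QUANT_BACKENDS:
+        raise KeyError(f"unknown quantization backend: {name}")
+    return QUANT_BACKENDS[name]()
+
+
+def run_pipeline(onnx_path, input_mapping, output_dir, quant, compiler) -> str:
+    # Pure composition: ONNX -> BIE -> NEF, with no Kneron imports at this level.
+    bie_path = quant.analyze(onnx_path, input_mapping, output_dir)
+    return compiler.compile(bie_path, output_dir)
+```
+
+Because the orchestrator only sees objects with `analyze` and `compile` methods, a test or CI job can register a fake backend and run the whole composition without the toolchain installed.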
+
+### File Touch List (Minimal)
+1) `services/workers/onnx/core.py` (optional eval toggle)
+2) `services/workers/bie/core.py` (use QuantizationBackend)
+3) `services/workers/nef/core.py` (use CompilerBackend)
+4) `services/backends/quantization.py` (new)
+5) `services/backends/compiler.py` (new)
+6) (optional) `services/backends/evaluator.py` (new)
+7) (optional) `services/backends/simulator.py` (new)
+
+## 6) Coupling Rules / Extraction Guidelines
+Goal: keep module boundaries stable so future swapping does not cascade changes.
+
+### 6.1 Horizontal vs Vertical Coupling (Rule of Thumb)
+- **Horizontal coupling = avoid** (core A directly importing core B).
+- **Vertical references = allowed** (shared types/config/IO schemas used by all).
+
+Keep shared references narrowly scoped to:
+- Interface contracts (Protocol / abstract base classes)
+- Common data structures (DTOs / results / error types)
+- Configuration schemas and environment keys
+
+### 6.2 What Belongs in Shared vs Module
+**Put in shared only if it remains stable across backend swaps.**
+- If replacing a backend requires changing the code, it does *not* belong in shared.
+
+**Examples**
+- Shared: `QuantizationBackend` interface, `CompilerBackend` interface, `PipelineResult`
+- Module: Kneron-specific env setup, sys_flow invocation, output file moving rules
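+
+As an illustration of what the shared side might hold, here is a sketch of `PipelineResult` and a common error type. The field names and the `BackendError` class are assumptions; the note only names `PipelineResult` without defining it.
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, Optional
+
+
+class BackendError(Exception):
+    """Hypothetical shared error type that any backend adapter could raise."""
+
+
+@dataclass(frozen=True)
+class PipelineResult:
+    """Hypothetical shape for the shared result; field names are assumptions."""
+    onnx_path: str
+    bie_path: Optional[str] = None
+    nef_path: Optional[str] = None
+    reports: Dict[str, str] = field(default_factory=dict)  # e.g. evaluator output
+
+    @property
+    def completed(self) -> bool:
+        # The flow counts as complete once a NEF has been produced.
+        return self.nef_path is not None
+```
+
+Nothing in this sketch mentions Kneron, which is the property the rule above asks of shared code: swapping the backend should leave these types untouched.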
+
+### 6.3 Extracting Logic from Combined Files
+If a file currently mixes multiple module responsibilities:
+- Extract module-specific logic into the owning module.
+- Keep shared file limited to interface + cross-cutting types.
+
+**Decision test**
+- *Would this code change if we swapped Kneron backend with OSS backend?*  
+  - Yes → belongs to module backend  
+  - No → can live in shared
+
+### 6.4 Incremental Refactor Guidance
+- Don’t attempt perfect separation in one pass.
+- Move high-risk dependencies first (prebuilt calls, sys_flow usage).
+- After each phase, re-check boundaries and adjust.
+
+## 7) Minimum Viable API Proposal
+Keep it minimal to avoid churn:
+
+```python
+class QuantizationBackend:
+    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
+        """Return BIE path"""
+
+class CompilerBackend:
+    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
+        """Return NEF path"""
+```
+
+The pipeline then reduces to a pure composition of these two backends plus the ONNX operations.
+
+## 8) What This Enables
+- Replace ONNX converters / optimizers without touching quantization.
+- Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
+- Swap in future Kneron versions only inside backend adapters.
+- Experiment with alternative quantization or compiler backends.