kneron_model_converter/docs/flow_modularization_notes.md


Toolchain Flow Modularization Notes (Draft)

Based on docs/manual_v0.17.2.pdf and the current repo structure, this note outlines a modular decomposition plan, risks, and a staged engineering approach to reduce Kneron prebuilt coupling while preserving the end-to-end flow.

1) Project Plan Review (Your Proposal)

Your plan is sound and low-risk:

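  1. Split the steps into modules and get the complete flow working first → preserves usability and regression testability.
  2. Review the modules one by one to decide whether each can be rewritten or rebuilt → focuses on the highest-risk dependencies.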

This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.

2) Manual-Driven Flow Stages (v0.17.2)

From the manual, the official workflow breaks down cleanly into these steps:

ONNX Workflow

  • A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
  • B. ONNX optimization (general optimizer)
  • C. Opset upgrade (if needed)
  • D. IP evaluation (performance / support check)
  • E. E2E simulator check (floating point)

BIE Workflow

  • F. Quantization (analysis → produce BIE)
  • G. E2E simulator check (fixed point)

NEF Workflow

  • H. Batch compile (BIE → NEF)
  • I. E2E simulator check (hardware)
  • J. NEF combine (optional)

This mapping is consistent with current repo services:

  • services/workers/onnx/core.py: A/B (plus D, currently via evaluate)
  • services/workers/bie/core.py: F (plus G, if added as an optional step)
  • services/workers/nef/core.py: H (plus I, if added as an optional step)

2.1 Flow Diagram (Mermaid)

flowchart TD
  A[Format Model<br/>Keras/TFLite/Caffe/PyTorch] --> B[Convert to ONNX]
  B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
  C --> D{IP Evaluation?}
  D -- optional --> E[IP Evaluator Report]
  C --> F{E2E FP Sim?}
  F -- optional --> G[Float E2E Simulator Check]
  C --> H[Quantization / Analysis]
  H --> I[BIE Output]
  I --> J{E2E Fixed-Point Sim?}
  J -- optional --> K[Fixed-Point E2E Check]
  I --> L[Compile]
  L --> M[NEF Output]
  M --> N{E2E Hardware Sim?}
  N -- optional --> O[Hardware E2E Check]
  M --> P{NEF Combine?}
  P -- optional --> Q[Combined NEF]

  subgraph OSS-Friendly
    A
    B
    C
  end

  subgraph Kneron-Dependent
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
  end

3) Proposed Modular Decomposition

A clean split with minimal coupling and clear replacement points:

3.1 Format & Graph Layer (OSS-Friendly)

  1. FormatConverters

    • Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
    • Pure Python; use libs/ONNX_Convertor + libs/kneronnxopt
  2. OnnxGraphOps

    • optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
    • Pure Python, independent from toolchain binaries

3.2 Validation & Simulation Layer (Kneron-Dependent)

  1. IPEvaluator

    • ModelConfig.evaluate() (toolchain evaluator)
    • Coupled to sys_flow + prebuilt binaries
    • Should be optional plug-in, not hard-dependency of ONNX conversion
  2. E2ESimulator

    • float/fixed/hardware validation (kneron_inference)
    • Coupled to Kneron libs; keep as plugin backend

3.3 Quantization & Compile Layer (Kneron-Dependent)

  1. QuantizationBackend (BIE)

    • Current: ModelConfig.analysis() → sys_flow binaries
    • Make a backend interface with a Kneron implementation
  2. CompilerBackend (NEF)

    • Current: ktc.compile() → prebuilt compiler
    • Same backend interface style; Kneron impl for now

3.4 Packaging/Orchestration Layer

  1. Pipeline Orchestrator
    • Defines the sequence and exchange formats (ONNX, BIE, NEF)
    • Should not import Kneron libs directly; only through backend interfaces

3.5 Module Dependency / Replaceability Matrix

| Module | Inputs | Outputs | Current Dependency | Replaceability | Notes |
| --- | --- | --- | --- | --- | --- |
| FormatConverters | model files | ONNX | libs/ONNX_Convertor (OSS) | High | Already OSS; keep isolated. |
| OnnxGraphOps | ONNX | ONNX | libs/kneronnxopt, onnx | High | Pure Python, safe to refactor. |
| IPEvaluator | ONNX/BIE | report | sys_flow + prebuilt bins | Low | Optional plugin; avoid hard-depend in ONNX flow. |
| E2ESimulator FP | ONNX | results | Kneron inference libs | Low | Optional; keep plugin backend. |
| QuantizationBackend | ONNX + inputs | BIE | sys_flow + prebuilt bins | Low | Core Kneron dependency. |
| E2ESimulator Fixed | BIE | results | Kneron inference libs | Low | Optional; can be skipped in web flow. |
| CompilerBackend | BIE | NEF | compiler/* prebuilt | Low | Core Kneron dependency. |
| E2ESimulator HW | NEF | results | Kneron inference libs | Low | Optional; likely external toolchain use. |
| NEFCombine | NEF list | NEF | Kneron utils | Medium | Small wrapper; keep separate. |
| Pipeline Orchestrator | modules | end-to-end | None (pure) | High | Ownable; should be OSS-only. |

4) Key Risks & Coupling Points (Observed in Repo)

  • ktc.toolchain calls sys_flow / sys_flow_v2 (hard dependency on prebuilt binaries).
  • ktc.ModelConfig.evaluate/analysis/compile are all Kneron-specific.
  • services/workers/onnx/core.py calls evaluate() by default → this ties ONNX flow to Kneron.

5) Suggested Refactor Sequence (Low Disruption)

Phase 1: Interface Extraction

  • Introduce two small interfaces:
    • QuantizationBackend (BIE)
    • CompilerBackend (NEF)
  • Wrap existing Kneron calls as default implementations.

Phase 2: ONNX Flow Decoupling

  • Make IPEvaluator optional in ONNX flow.
  • Keep current behavior by default but allow bypass.

Phase 3: Modular Pipeline Assembly

  • Build a pipeline that composes:
    • conversion → optimization → (optional evaluator)
    • quantization backend
    • compiler backend

Phase 4: Replaceability Audit

  • For each module, decide whether it:
    • can be OSS (conversion/optimization)
    • must remain a Kneron backend (quantization/compile)
    • can be partially replaced (simulation/eval)

5.1 Concrete Refactor Plan (Minimal Interface Changes)

Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.

Step 1: Introduce backend interfaces (no behavior change)

Create simple interfaces and wrappers.

  • New files:
    • services/backends/quantization.py
    • services/backends/compiler.py
    • (optional) services/backends/evaluator.py
    • (optional) services/backends/simulator.py

Minimal interface (example):

class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...

class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...

Implement Kneron-backed versions wrapping existing calls:

  • KneronQuantizationBackend → ktc.ModelConfig(...).analysis(...)
  • KneronCompilerBackend → ktc.compile(...)
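
The wrapper idea can be sketched as follows. This is only an illustration under assumptions: the Kneron call is injected as a plain callable (the `run_analysis` parameter and the `fake_analysis` stand-in are hypothetical), so the adapter can be exercised without the toolchain installed. In the real adapter that callable would delegate to ktc.ModelConfig(...).analysis(...), whose exact arguments are not spelled out in this note.

```python
from typing import Callable


class QuantizationBackend:
    """Interface from this note: analyze an ONNX model and return a BIE path."""

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        raise NotImplementedError


class KneronQuantizationBackend(QuantizationBackend):
    """Adapter that forwards to an injected toolchain call.

    `run_analysis` is a hypothetical hook: in production it would wrap
    ktc.ModelConfig(...).analysis(...); here it is any callable with the
    same shape as analyze(), so tests need no Kneron binaries.
    """

    def __init__(self, run_analysis: Callable[..., str]):
        self._run_analysis = run_analysis

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        # All Kneron-specific behavior stays behind this single call.
        return self._run_analysis(onnx_path, input_mapping, output_dir, **kwargs)


def fake_analysis(onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
    """Stand-in for the toolchain: only derives an output path, writes nothing."""
    stem = onnx_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return f"{output_dir}/{stem}.bie"


backend = KneronQuantizationBackend(fake_analysis)
bie_path = backend.analyze("models/net.onnx", {"input": []}, "out")
```

A CompilerBackend adapter would follow the same shape, with the injected callable standing in for ktc.compile(...).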

Step 2: Decouple ONNX flow from evaluator (optional switch)

Modify services/workers/onnx/core.py:

  • Add parameter enable_evaluate (default true to preserve behavior).
  • Guard km.evaluate() behind the flag.
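
A minimal sketch of the guard, assuming the worker's steps can be passed in as callables. The function name `run_onnx_flow` and the `optimize` / `evaluate` parameters are hypothetical stand-ins; only the `enable_evaluate` flag and its default come from this note.

```python
from typing import Callable, Optional


def run_onnx_flow(
    onnx_path: str,
    optimize: Callable[[str], str],
    evaluate: Optional[Callable[[str], str]] = None,
    enable_evaluate: bool = True,
) -> dict:
    """Optimize an ONNX model and optionally run the IP evaluator.

    `optimize` and `evaluate` are hypothetical callables standing in for the
    worker's real steps; `evaluate` plays the role of km.evaluate().
    """
    optimized_path = optimize(onnx_path)
    result = {"onnx_path": optimized_path, "report": None}
    # Default True preserves today's behavior; False skips the Kneron call.
    if enable_evaluate:
        if evaluate is None:
            raise RuntimeError("enable_evaluate=True but no evaluator is configured")
        result["report"] = evaluate(optimized_path)
    return result


calls = []


def fake_optimize(path: str) -> str:
    return path.replace(".onnx", ".opt.onnx")


def fake_evaluate(path: str) -> str:
    calls.append(path)  # record that the evaluator actually ran
    return f"report for {path}"


with_eval = run_onnx_flow("net.onnx", fake_optimize, fake_evaluate)
without_eval = run_onnx_flow("net.onnx", fake_optimize, fake_evaluate, enable_evaluate=False)
```

With the flag off, the evaluator is never invoked, which is what lets the ONNX flow run where the Kneron binaries are absent.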

Step 3: Replace direct calls in BIE/NEF workers

Modify:

  • services/workers/bie/core.py to use QuantizationBackend.
  • services/workers/nef/core.py to use CompilerBackend.

Step 4: Optional simulator integration

Add optional steps to workers:

  • enable_sim_fp in ONNX flow.
  • enable_sim_fixed in BIE flow.
  • enable_sim_hw in NEF flow.

These should call a simulator backend; default off.
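
One way to picture the three flags is sketched below. It is a simplification under assumptions: the flags are gathered into one hypothetical `SimulationOptions` object and dispatched by one helper, whereas in the plan each flag lives in its own worker. The `simulate(mode, path)` callable is a made-up stand-in for the simulator backend.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SimulationOptions:
    """Flag names come from this note; every check is off by default."""

    enable_sim_fp: bool = False
    enable_sim_fixed: bool = False
    enable_sim_hw: bool = False


def run_enabled_simulations(
    options: SimulationOptions,
    artifacts: Dict[str, str],
    simulate: Callable[[str, str], str],
) -> List[str]:
    """Run only the enabled checks; `simulate(mode, path)` is a hypothetical backend call."""
    plan = [
        ("fp", options.enable_sim_fp, "onnx"),       # float check on the ONNX model
        ("fixed", options.enable_sim_fixed, "bie"),  # fixed-point check on the BIE
        ("hw", options.enable_sim_hw, "nef"),        # hardware check on the NEF
    ]
    return [simulate(mode, artifacts[kind]) for mode, enabled, kind in plan if enabled]


artifacts = {"onnx": "net.onnx", "bie": "net.bie", "nef": "net.nef"}


def fake_simulate(mode: str, path: str) -> str:
    return f"{mode}:{path}"


default_runs = run_enabled_simulations(SimulationOptions(), artifacts, fake_simulate)
fixed_only = run_enabled_simulations(
    SimulationOptions(enable_sim_fixed=True), artifacts, fake_simulate
)
```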

Step 5: Pipeline orchestrator (optional)

Add a thin orchestrator module to compose stages:

  • services/pipeline/toolchain_pipeline.py
  • Allows swapping backends from config/env.
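
A thin orchestrator along these lines might look like the sketch below. The class name `ToolchainPipeline`, its fields, and the fake backends are assumptions for illustration; only the two backend method signatures are taken from this note. The point is that the orchestrator composes stages through interfaces and imports no vendor library itself.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...


class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...


@dataclass
class ToolchainPipeline:
    """Composes ONNX ops with the two backends; knows nothing about ktc."""

    optimize: Callable[[str], str]
    quantizer: QuantizationBackend
    compiler: CompilerBackend

    def run(self, onnx_path: str, input_mapping: dict, output_dir: str) -> dict:
        optimized = self.optimize(onnx_path)
        bie_path = self.quantizer.analyze(optimized, input_mapping, output_dir)
        nef_path = self.compiler.compile(bie_path, output_dir)
        return {"onnx": optimized, "bie": bie_path, "nef": nef_path}


class FakeQuantizer:
    """Hypothetical stand-in; a Kneron-backed class would satisfy the same Protocol."""

    def analyze(self, onnx_path, input_mapping, output_dir, **kwargs):
        return f"{output_dir}/model.bie"


class FakeCompiler:
    def compile(self, bie_path, output_dir, **kwargs):
        return f"{output_dir}/model.nef"


pipeline = ToolchainPipeline(lambda path: path, FakeQuantizer(), FakeCompiler())
outputs = pipeline.run("net.onnx", {"input": []}, "out")
```

Because the backends are constructor arguments, swapping them from config or environment only changes the code that builds the pipeline, not the pipeline itself.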

File Touch List (Minimal)

  1. services/workers/onnx/core.py (optional eval toggle)
  2. services/workers/bie/core.py (use QuantizationBackend)
  3. services/workers/nef/core.py (use CompilerBackend)
  4. services/backends/quantization.py (new)
  5. services/backends/compiler.py (new)
  6. (optional) services/backends/evaluator.py (new)
  7. (optional) services/backends/simulator.py (new)

6) Coupling Rules / Extraction Guidelines

Goal: keep module boundaries stable so future swapping does not cascade changes.

6.1 Horizontal vs Vertical Coupling (Rule of Thumb)

  • Horizontal coupling = avoid (core A directly importing core B).
  • Vertical references = allowed (shared types/config/IO schemas used by all).

Keep shared references narrowly scoped to:

  • Interface contracts (Protocol / abstract base classes)
  • Common data structures (DTOs / results / error types)
  • Configuration schemas and environment keys
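
The first two kinds of shared reference can be sketched as below, as one possible shape rather than the repo's actual definitions. `PipelineResult` and `CompilerBackend` are named in this note; the field names, the `BackendError` type, and `DummyCompiler` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class CompilerBackend(Protocol):
    """Interface contract: any object with this method qualifies."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...


class BackendError(RuntimeError):
    """Hypothetical shared error type, so workers never catch vendor exceptions."""


@dataclass(frozen=True)
class PipelineResult:
    """Shared DTO; the field names are illustrative."""

    onnx_path: str
    bie_path: Optional[str] = None
    nef_path: Optional[str] = None


class DummyCompiler:
    """Module-side implementation; it depends on shared types, never the reverse."""

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        if not bie_path.endswith(".bie"):
            raise BackendError(f"expected a .bie file, got {bie_path!r}")
        return f"{output_dir}/out.nef"


compiler = DummyCompiler()
result = PipelineResult(
    onnx_path="net.onnx",
    bie_path="net.bie",
    nef_path=compiler.compile("net.bie", "out"),
)
```

Note that `runtime_checkable` only verifies that the method exists, not its signature, so it is a convenience check rather than a guarantee.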

6.2 What Belongs in Shared vs Module

Put in shared only if it remains stable across backend swaps.

  • If replacing a backend requires changing the code, it does not belong in shared.

Examples

  • Shared: QuantizationBackend interface, CompilerBackend interface, PipelineResult
  • Module: Kneron-specific env setup, sys_flow invocation, output file moving rules

6.3 Extracting Logic from Combined Files

If a file currently mixes multiple module responsibilities:

  • Extract module-specific logic into the owning module.
  • Keep shared file limited to interface + cross-cutting types.

Decision test

  • Would this code change if we swapped Kneron backend with OSS backend?
    • Yes → belongs to module backend
    • No → can live in shared

6.4 Incremental Refactor Guidance

  • Don't attempt perfect separation in one pass.
  • Move high-risk dependencies first (prebuilt calls, sys_flow usage).
  • After each phase, re-check boundaries and adjust.

7) Current Structure and Replacement Strategy (As-Is)

Based on the refactor just completed, the effective call chain is:

workers (ONNX/BIE/NEF)
  -> backends (interfaces + Kneron implementations)
    -> ktc (toolchain python API)
      -> vendor sys_flow / libs / libs_V2 / prebuilt binaries

7.1 What this means today

  • Workers only depend on backend interfaces. They no longer call ktc.ModelConfig directly.
  • Kneron specifics are concentrated in backend implementations.
  • ktc still wraps the Kneron toolchain and binaries; that dependency remains, but it is now isolated.

7.2 How to replace later

  1. Replace backend implementations (lowest-risk)

    • Keep backend interfaces stable.
    • Swap Kneron*Backend for Your*Backend without touching workers.
  2. Keep backend layer, but replace ktc calls

    • Modify Kneron*Backend to call your own library instead of ktc.
    • Workers stay unchanged; only backend code moves.
  3. Introduce multiple backends

    • Add get_*_backend(name=...) selection based on config/env.
    • Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.
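
Backend selection could be handled with a small registry, sketched here for the compiler side only. Everything in it is an assumption made for illustration: the registry, the `register_compiler_backend` helper, the `TOOLCHAIN_COMPILER_BACKEND` environment key, and the two stub classes. Only the `get_*_backend(name=...)` naming pattern comes from this note.

```python
import os
from typing import Callable, Dict, Optional

# Maps a backend name to a zero-argument factory.
_COMPILER_BACKENDS: Dict[str, Callable[[], object]] = {}


def register_compiler_backend(name: str, factory: Callable[[], object]) -> None:
    _COMPILER_BACKENDS[name] = factory


def get_compiler_backend(name: Optional[str] = None) -> object:
    """Resolve a backend by explicit name, then env var, then the default."""
    # TOOLCHAIN_COMPILER_BACKEND is a hypothetical env key.
    chosen = name or os.environ.get("TOOLCHAIN_COMPILER_BACKEND", "kneron")
    try:
        factory = _COMPILER_BACKENDS[chosen]
    except KeyError:
        known = ", ".join(sorted(_COMPILER_BACKENDS)) or "none"
        raise ValueError(f"unknown compiler backend {chosen!r}; registered: {known}") from None
    return factory()


class KneronCompilerStub:
    """Stand-in for a Kneron-backed compiler backend."""

    name = "kneron"


class OssCompilerStub:
    """Stand-in for an alternative backend."""

    name = "oss"


register_compiler_backend("kneron", KneronCompilerStub)
register_compiler_backend("oss", OssCompilerStub)
selected = get_compiler_backend("oss")
```

A matching `get_quantization_backend` would make mixed runs possible, since each stage resolves its backend independently.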

7.3 Where to implement replacements

  • services/backends/quantization.py
  • services/backends/compiler.py
  • services/backends/evaluator.py
  • services/backends/simulator.py

8) Minimum Viable API Proposal

Keep it minimal to avoid churn:

class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        """Return BIE path"""

class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Return NEF path"""

Then the pipeline is just a pure composition of these two + ONNX ops.

9) What This Enables

  • Replace ONNX converters / optimizers without touching quantization.
  • Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
  • Swap in future Kneron versions only inside backend adapters.
  • Experiment with alternative quantization or compiler backends.

10) Next Steps (if you want)

I can draft the following next:

  1. A small refactor plan with concrete file edits and minimal API changes.
  2. A diagram (Mermaid) of the new modular flow.
  3. A compatibility matrix (current vs target dependencies per module).

Tell me which one you want, and I'll prepare it.
