warrenchen fdebf4db5d Refactor workers to use backend interfaces for quantization, compilation, and evaluation; add optional flags for simulation in request schemas and update documentation accordingly.

2026-02-06 08:24:08 +00:00


Toolchain Flow Modularization Notes (Draft)

Based on docs/manual_v0.17.2.pdf and the current repo structure, this note outlines a modular decomposition plan, risks, and a staged engineering approach to reduce Kneron prebuilt coupling while preserving the end-to-end flow.

1) Project Plan Review (Your Proposal)

Your plan is sound and low-risk:

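Split the steps into modules and get the complete flow working first → keeps the system usable and regression-testable.
Review each module in turn to decide whether it can be rewritten or rebuilt → focuses effort on the highest-risk dependencies.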

This sequencing avoids a “big bang rewrite.” It also lets you replace a single module without breaking downstream steps.

2) Manual-Driven Flow Stages (v0.17.2)

From the manual, the official workflow breaks down cleanly into these steps:

ONNX Workflow

A. Model conversion to ONNX (Keras / PyTorch / Caffe / TFLite)
B. ONNX optimization (general optimizer)
C. Opset upgrade (if needed)
D. IP evaluation (performance / support check)
E. E2E simulator check (floating point)

BIE Workflow

F. Quantization (analysis → produce BIE)
G. E2E simulator check (fixed point)

NEF Workflow

H. Batch compile (BIE → NEF)
I. E2E simulator check (hardware)
J. NEF combine (optional)

This mapping is consistent with current repo services:

services/workers/onnx/core.py: A/B (+ D currently via evaluate)
services/workers/bie/core.py: F (+ G, if you add the optional fixed-point check)
services/workers/nef/core.py: H (+ I, if you add the optional hardware check)

2.1 Flow Diagram (Mermaid)

flowchart TD
  A[Format Model<br/>Keras/TFLite/Caffe/PyTorch] --> B[Convert to ONNX]
  B --> C[ONNX Optimize / Opset Upgrade / Graph Edits]
  C --> D{IP Evaluation?}
  D -- optional --> E[IP Evaluator Report]
  C --> F{E2E FP Sim?}
  F -- optional --> G[Float E2E Simulator Check]
  C --> H[Quantization / Analysis]
  H --> I[BIE Output]
  I --> J{E2E Fixed-Point Sim?}
  J -- optional --> K[Fixed-Point E2E Check]
  I --> L[Compile]
  L --> M[NEF Output]
  M --> N{E2E Hardware Sim?}
  N -- optional --> O[Hardware E2E Check]
  M --> P{NEF Combine?}
  P -- optional --> Q[Combined NEF]

  subgraph OSS-Friendly
    A
    B
    C
  end

  subgraph Kneron-Dependent
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
  end

3) Recommended Module Split (Initial)

A clean split with minimal coupling and clear replacement points:

3.1 Format & Graph Layer (OSS-Friendly)

FormatConverters
- Keras→ONNX, TFLite→ONNX, Caffe→ONNX, PyTorch-exported ONNX
- Pure Python; use libs/ONNX_Convertor + libs/kneronnxopt
OnnxGraphOps
- optimize (onnx2onnx), opset upgrade, graph editing, shape fixes
- Pure Python, independent from toolchain binaries

3.2 Validation & Simulation Layer (Kneron-Dependent)

IPEvaluator
- ModelConfig.evaluate() (toolchain evaluator)
- Coupled to sys_flow + prebuilt binaries
- Should be an optional plug-in, not a hard dependency of ONNX conversion
E2ESimulator
- float/fixed/hardware validation (kneron_inference)
- Coupled to Kneron libs; keep as plugin backend

3.3 Quantization & Compile Layer (Kneron-Dependent)

QuantizationBackend (BIE)
- Current: ModelConfig.analysis() → sys_flow binaries
- Make a backend interface with a Kneron implementation
CompilerBackend (NEF)
- Current: ktc.compile() → prebuilt compiler
- Same backend interface style; Kneron impl for now

3.4 Packaging/Orchestration Layer

Pipeline Orchestrator
- Defines the sequence and exchange formats (ONNX, BIE, NEF)
- Should not import Kneron libs directly; only through backend interfaces

3.5 Module Dependency / Replaceability Matrix

Module	Inputs	Outputs	Current Dependency	Replaceability	Notes
FormatConverters	model files	ONNX	`libs/ONNX_Convertor` (OSS)	High	Already OSS; keep isolated.
OnnxGraphOps	ONNX	ONNX	`libs/kneronnxopt`, onnx	High	Pure Python, safe to refactor.
IPEvaluator	ONNX/BIE	report	`sys_flow` + prebuilt bins	Low	Optional plugin; avoid hard-depend in ONNX flow.
E2ESimulator FP	ONNX	results	Kneron inference libs	Low	Optional; keep plugin backend.
QuantizationBackend	ONNX + inputs	BIE	`sys_flow` + prebuilt bins	Low	Core Kneron dependency.
E2ESimulator Fixed	BIE	results	Kneron inference libs	Low	Optional; can be skipped in web flow.
CompilerBackend	BIE	NEF	`compiler/*` prebuilt	Low	Core Kneron dependency.
E2ESimulator HW	NEF	results	Kneron inference libs	Low	Optional; likely external toolchain use.
NEFCombine	NEF list	NEF	Kneron utils	Medium	Small wrapper; keep separate.
Pipeline Orchestrator	modules	end-to-end	None (pure)	High	Ownable; should be OSS-only.

4) Key Risks & Coupling Points (Observed in Repo)

ktc.toolchain calls sys_flow / sys_flow_v2 (hard dependency on prebuilt binaries).
ktc.ModelConfig.evaluate/analysis/compile are all Kneron-specific.
services/workers/onnx/core.py calls evaluate() by default → this ties ONNX flow to Kneron.

5) Suggested Refactor Sequence (Low Disruption)

Phase 1: Interface Extraction

Introduce two small interfaces:
- QuantizationBackend (BIE)
- CompilerBackend (NEF)
Wrap existing Kneron calls as default implementations.

Phase 2: ONNX Flow Decoupling

Make IPEvaluator optional in ONNX flow.
Keep current behavior by default but allow bypass.

Phase 3: Modular Pipeline Assembly

Build a pipeline that composes:
- conversion → optimization → (optional evaluator)
- quantization backend
- compiler backend

Phase 4: Replaceability Audit

For each module, decide if:
- can be OSS (conversion/optimization)
- must remain Kneron backend (quantization/compile)
- can be partially replaced (simulation/eval)

5.1 Concrete Refactor Plan (Minimal Interface Changes)

Goal: preserve current behavior but make evaluation/simulation optional and enable backend swapping.

Step 1: Introduce backend interfaces (no behavior change)

Create simple interfaces and wrappers.

New files:
- services/backends/quantization.py
- services/backends/compiler.py
- (optional) services/backends/evaluator.py
- (optional) services/backends/simulator.py

Minimal interface (example):

class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str: ...

class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...

Implement Kneron-backed versions wrapping existing calls:

KneronQuantizationBackend → ktc.ModelConfig(...).analysis(...)
KneronCompilerBackend → ktc.compile(...)
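A minimal sketch of what the two Kneron-backed wrappers might look like. It assumes that analysis() returns the path of the generated BIE and that the compile call takes a list of model configs and returns the NEF path; verify both against the installed toolchain. model_config_factory and compile_fn are hypothetical injection points (not existing repo names) that would be bound to ktc.ModelConfig and ktc.compile in production, so the adapters can be exercised without the prebuilt binaries. Relocating outputs into output_dir is left out.

```python
from typing import Any, Callable


class KneronQuantizationBackend:
    """Adapter exposing the toolchain's analysis step as analyze()."""

    def __init__(self, model_config_factory: Callable[..., Any]):
        # Hypothetical injection point: in production this would build a
        # ktc.ModelConfig. Its constructor arguments are not specified here.
        self._make_config = model_config_factory

    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        km = self._make_config(onnx_path=onnx_path, output_dir=output_dir)
        # Assumption: analysis() returns the path of the generated BIE file.
        return str(km.analysis(input_mapping, **kwargs))


class KneronCompilerBackend:
    """Adapter exposing the toolchain's compile step as compile()."""

    def __init__(self, model_config_factory: Callable[..., Any], compile_fn: Callable[..., Any]):
        self._make_config = model_config_factory
        # Hypothetical injection point: in production this would be ktc.compile.
        self._compile = compile_fn

    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        km = self._make_config(bie_path=bie_path, output_dir=output_dir)
        # Assumption: the compile function takes a list of model configs
        # and returns the NEF path.
        return str(self._compile([km], **kwargs))
```

Because the toolchain entry points are passed in rather than imported at module level, importing this file does not require Kneron binaries to be present.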

Step 2: Decouple ONNX flow from evaluator (optional switch)

Modify services/workers/onnx/core.py:

Add a parameter enable_evaluate (default True, to preserve current behavior).
Guard the km.evaluate() call behind the flag.
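The guard itself can be very small. The sketch below assumes km is the model-config object the worker already holds and that evaluate() returns a report; run_evaluation_step is a hypothetical helper name, not an existing function in the repo.

```python
from typing import Any, Optional


def run_evaluation_step(km: Any, enable_evaluate: bool = True) -> Optional[Any]:
    """Run the IP evaluator only when requested; return its report or None."""
    if not enable_evaluate:
        # Bypass: the ONNX flow continues without touching the evaluator,
        # so no Kneron binaries are needed for this step.
        return None
    return km.evaluate()
```

With the default left at True, existing callers see no change; only callers that pass enable_evaluate=False skip the Kneron-dependent step.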

Step 3: Replace direct calls in BIE/NEF workers

Modify:

services/workers/bie/core.py to use QuantizationBackend.
services/workers/nef/core.py to use CompilerBackend.

Step 4: Optional simulator integration

Add optional steps to workers:

enable_sim_fp in ONNX flow.
enable_sim_fixed in BIE flow.
enable_sim_hw in NEF flow.

These should call a simulator backend; default off.
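One way to carry these switches through the workers is a small options object. This is a sketch only: StageOptions and from_request are hypothetical names, and the payload is assumed to be an already-parsed request body with boolean values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StageOptions:
    """Optional-step switches; defaults mirror the behavior described above."""

    enable_evaluate: bool = True    # on by default to preserve current behavior
    enable_sim_fp: bool = False     # float E2E check in the ONNX flow
    enable_sim_fixed: bool = False  # fixed-point E2E check in the BIE flow
    enable_sim_hw: bool = False     # hardware E2E check in the NEF flow

    @classmethod
    def from_request(cls, payload: dict) -> "StageOptions":
        # Ignore unrelated request keys; assumes values are already booleans.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```

Keeping the defaults in one place means a request that omits the flags behaves exactly like a request made before the flags existed.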

Step 5: Pipeline orchestrator (optional)

Add a thin orchestrator module to compose stages:

services/pipeline/toolchain_pipeline.py
Allows swapping backends from config/env.

File Touch List (Minimal)

services/workers/onnx/core.py (optional eval toggle)
services/workers/bie/core.py (use QuantizationBackend)
services/workers/nef/core.py (use CompilerBackend)
services/backends/quantization.py (new)
services/backends/compiler.py (new)
(optional) services/backends/evaluator.py (new)
(optional) services/backends/simulator.py (new)

6) Coupling Rules / Extraction Guidelines

Goal: keep module boundaries stable so future swapping does not cascade changes.

6.1 Horizontal vs Vertical Coupling (Rule of Thumb)

Horizontal coupling = avoid (core A directly importing core B).
Vertical references = allowed (shared types/config/IO schemas used by all).

Keep shared references narrowly scoped to:

Interface contracts (Protocol / abstract base classes)
Common data structures (DTOs / results / error types)
Configuration schemas and environment keys
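The horizontal-coupling rule can be checked mechanically. The sketch below is one possible approach, not an existing repo tool: it parses a worker's source and reports absolute imports of sibling worker packages. The services.workers root matches the paths used in this note; the function name is hypothetical. Relative imports and the form "from services.workers import bie" are not detected.

```python
import ast
from typing import List


def horizontal_imports(source: str, own_package: str,
                       workers_root: str = "services.workers") -> List[str]:
    """Return sibling worker packages imported by `source`."""
    offending = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import (skipped here)
        for name in names:
            if not name.startswith(workers_root + "."):
                continue  # shared or backend code: a vertical reference
            if name == own_package or name.startswith(own_package + "."):
                continue  # importing from your own package is fine
            offending.append(name)
    return offending
```

For example, running it over the ONNX worker's source with own_package="services.workers.onnx" would flag an import of services.workers.bie but not an import of services.backends.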

6.2 What Belongs in Shared vs Module

Put in shared only if it remains stable across backend swaps.

If swapping a backend would force a change to a piece of code, that code does not belong in shared.

Examples

Shared: QuantizationBackend interface, CompilerBackend interface, PipelineResult
Module: Kneron-specific env setup, sys_flow invocation, output file moving rules
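As an illustration of what the shared side could contain, the sketch below expresses the two interfaces as typing.Protocol classes plus a result type. The PipelineResult fields are assumptions; the note only names the type.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class QuantizationBackend(Protocol):
    def analyze(self, onnx_path: str, input_mapping: dict,
                output_dir: str, **kwargs) -> str: ...


@runtime_checkable
class CompilerBackend(Protocol):
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str: ...


@dataclass(frozen=True)
class PipelineResult:
    # Field names are illustrative; only the exchange formats come from the note.
    onnx_path: str
    bie_path: Optional[str] = None
    nef_path: Optional[str] = None
```

Nothing here mentions Kneron, sys_flow, or file-moving rules, which is the property that lets it stay unchanged across backend swaps. Note that runtime_checkable isinstance checks only verify that the method exists, not its signature.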

6.3 Extracting Logic from Combined Files

If a file currently mixes multiple module responsibilities:

Extract module-specific logic into the owning module.
Keep shared file limited to interface + cross-cutting types.

Decision test

Would this code change if we swapped the Kneron backend for an OSS backend?
- Yes → belongs to module backend
- No → can live in shared

6.4 Incremental Refactor Guidance

Don’t attempt perfect separation in one pass.
Move high-risk dependencies first (prebuilt calls, sys_flow usage).
After each phase, re-check boundaries and adjust.

7) Current Structure and Replacement Strategy (As-Is)

Based on the refactor just completed, the effective call chain is:

workers (ONNX/BIE/NEF)
  -> backends (interfaces + Kneron implementations)
    -> ktc (toolchain python API)
      -> vendor sys_flow / libs / libs_V2 / prebuilt binaries

7.1 What this means today

Workers only depend on backend interfaces. They no longer call ktc.ModelConfig directly.
Kneron specifics are concentrated in backend implementations.
ktc still wraps the Kneron toolchain and binaries; that dependency remains, but it is now isolated.

7.2 How to replace later

Replace backend implementations (lowest-risk)
- Keep backend interfaces stable.
- Swap Kneron*Backend for Your*Backend without touching workers.
Keep backend layer, but replace ktc calls
- Modify Kneron*Backend to call your own library instead of ktc.
- Workers stay unchanged; only backend code moves.
Introduce multiple backends
- Add get_*_backend(name=...) selection based on config/env.
- Allows mixed runs: Kneron for NEF, OSS for ONNX, etc.
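The selection function from the last option could be a small registry. This is a sketch under assumptions: the QUANTIZATION_BACKEND environment key, the "kneron" default name, and register_quantization_backend are all hypothetical, and a compiler registry would follow the same shape.

```python
import os
from typing import Callable, Dict, Optional

# Maps a backend name to a zero-argument factory that builds it.
_QUANTIZATION_BACKENDS: Dict[str, Callable[[], object]] = {}


def register_quantization_backend(name: str, factory: Callable[[], object]) -> None:
    _QUANTIZATION_BACKENDS[name] = factory


def get_quantization_backend(name: Optional[str] = None) -> object:
    # An explicit name wins; otherwise fall back to the (hypothetical) env key.
    chosen = name or os.environ.get("QUANTIZATION_BACKEND", "kneron")
    try:
        factory = _QUANTIZATION_BACKENDS[chosen]
    except KeyError:
        raise ValueError(
            f"unknown quantization backend {chosen!r}; "
            f"registered: {sorted(_QUANTIZATION_BACKENDS)}"
        ) from None
    return factory()
```

Registering factories rather than instances keeps Kneron imports lazy: the Kneron backend is only constructed when it is actually selected.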

7.3 Where to implement replacements

services/backends/quantization.py
services/backends/compiler.py
services/backends/evaluator.py
services/backends/simulator.py

8) Minimum Viable API Proposal

Keep it minimal to avoid churn:

class QuantizationBackend:
    def analyze(self, onnx_path: str, input_mapping: dict, output_dir: str, **kwargs) -> str:
        """Return BIE path"""

class CompilerBackend:
    def compile(self, bie_path: str, output_dir: str, **kwargs) -> str:
        """Return NEF path"""

Then the pipeline is just a pure composition of these two + ONNX ops.
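A sketch of that composition, assuming the two backends follow the interfaces above. run_pipeline and the optimize callable (standing in for the ONNX graph operations) are hypothetical names.

```python
from typing import Any, Callable, Dict


def run_pipeline(onnx_path: str, input_mapping: dict, output_dir: str,
                 optimize: Callable[[str], str],
                 quantizer: Any, compiler: Any) -> Dict[str, str]:
    """Compose ONNX ops -> quantization -> compile; return artifact paths."""
    optimized = optimize(onnx_path)                                  # ONNX -> ONNX
    bie = quantizer.analyze(optimized, input_mapping, output_dir)    # ONNX -> BIE
    nef = compiler.compile(bie, output_dir)                          # BIE -> NEF
    return {"onnx": optimized, "bie": bie, "nef": nef}
```

The function imports nothing from the toolchain; every Kneron-specific detail arrives through the quantizer and compiler arguments.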

9) What This Enables

Replace ONNX converters / optimizers without touching quantization.
Run ONNX flow in pure OSS environments (CI, dev) without Kneron binaries.
Swap in future Kneron versions only inside backend adapters.
Experiment with alternative quantization or compiler backends.
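For the pure-OSS case, a CI job needs a way to tell whether the toolchain is present without importing it. The sketch below uses importlib.util.find_spec from the standard library; module_available and runnable_stages are hypothetical helpers, and the stage names are illustrative. It assumes the toolchain Python API is importable as ktc and that the ONNX flow has evaluation disabled when the toolchain is absent.

```python
import importlib.util
from typing import List


def module_available(name: str) -> bool:
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def runnable_stages(toolchain_module: str = "ktc") -> List[str]:
    # The ONNX stage is OSS-only; BIE and NEF need the Kneron toolchain.
    stages = ["onnx"]
    if module_available(toolchain_module):
        stages += ["bie", "nef"]
    return stages
```

A test suite could use the same check to skip BIE and NEF tests on machines without the prebuilt binaries.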

10) Next Steps (if you want)

I can draft the following next:

A small refactor plan with concrete file edits and minimal API changes.
A diagram (Mermaid) of the new modular flow.
A compatibility matrix (current vs target dependencies per module).

Tell me which one you want, and I’ll prepare it.