Compare commits

...

10 Commits

Author SHA1 Message Date
7716a0060f feat: add golf dataset, kneron configs, and tools
- Add golf1/2/4/7/8 dataset classes for semantic segmentation
- Add kneron-specific configs (meconfig series, kn_stdc1_golf4class)
- Organize scripts into tools/check/ and tools/kneron/
- Add kneron_preprocessing module
- Update README with quick-start guide
- Update .gitignore to exclude data dirs, onnx, nef outputs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 13:14:30 +08:00
Chingning Chen
793c3a5bb0 doc: Update stdc_step_by_step.md 2022-05-05 22:09:25 +08:00
EricChunYi
0129e58d1f doc: update stdc step by step 2022-05-05 22:09:25 +08:00
chingning.chen
d368a79bf8 test: add coverage 2022-05-05 22:09:25 +08:00
chingning.chen
b135e1b950 test: add placeholder for kneron tests 2022-05-05 22:09:25 +08:00
chingning.chen
a1b28fc4fa fix: pytest cmd 2022-05-05 22:09:25 +08:00
chingning.chen
acb2f933f0 test: add doc coverage test 2022-05-05 22:09:25 +08:00
chingning.chen
0d8de455de workaround: known fail for BEiT.resize_rel_pos_embed 2022-05-05 22:09:25 +08:00
chingning.chen
1a17ac60c6 test: update .gitlab-ci.yml for pytest 2022-05-05 22:09:25 +08:00
chingning.chen
b94d0f818e chore: add kneron email to author_email 2022-05-05 22:09:25 +08:00
64 changed files with 7906 additions and 500 deletions

.gitignore

@@ -117,3 +117,20 @@ mmseg/.mim
# Pytorch
*.pth
# ONNX / NEF compiled outputs
*.onnx
*.nef
batch_compile_out/
conbinenef/
# Local data directories
data4/
data50/
data512/
data724362/
testdata/
# Misc
envs.txt
.claude/
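The new ignore rules are plain glob patterns. As a quick sketch (the sample paths are hypothetical), the file-level rules match against each path's basename; directory rules such as `batch_compile_out/` follow gitignore's richer semantics and are not covered by plain `fnmatch`:

```python
from fnmatch import fnmatch

# File-level patterns from the new .gitignore entries above.
patterns = ["*.onnx", "*.nef", "envs.txt"]

# Hypothetical working-tree paths.
paths = ["work_dirs/latest.onnx", "models_630.nef", "envs.txt", "README.md"]

# A bare gitignore pattern matches the basename of each path.
ignored = [p for p in paths
           if any(fnmatch(p.rsplit("/", 1)[-1], pat) for pat in patterns)]
print(ignored)  # ['work_dirs/latest.onnx', 'models_630.nef', 'envs.txt']
```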


@@ -1,7 +1,29 @@
stages:
- init
- test
lint:
stage: init
script:
- flake8
- interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-regex "__repr__" --fail-under 50 mmseg
build:
stage: init
script:
- python setup.py check -m -s
- python -m pip install -e .
unit-test:
stage: test
script:
- python -m coverage run --branch --source mmseg -m pytest tests/
- python -m coverage xml
- python -m coverage report -m
coverage: '/TOTAL.*\s([.\d]+)%/'
integration-test:
stage: test
script:
- echo "[WIP] This job examines integration tests (typically Kneron's)."
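The `coverage:` key above tells GitLab how to scrape the job's total coverage percentage from the `coverage report` output. A sketch of what that regex captures (the report line below is hypothetical):

```python
import re

# Same pattern as the GitLab `coverage:` setting, without the /…/ delimiters.
pattern = re.compile(r"TOTAL.*\s([.\d]+)%")

# A hypothetical final line of `python -m coverage report -m`.
line = "TOTAL                          1234    56    89.5%"
match = pattern.search(line)
print(match.group(1))  # 89.5
```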


@@ -1,70 +1,62 @@
Removed (previous README):

# Kneron AI Training/Deployment Platform (mmsegmentation-based)

## Introduction

[kneron-mmsegmentation](https://github.com/kneron/kneron-mmsegmentation) is a platform built upon the well-known [mmsegmentation](https://github.com/open-mmlab/mmsegmentation). If you are looking for the original mmsegmentation documentation, please visit [mmsegmentation docs](https://mmsegmentation.readthedocs.io/en/latest/) for detailed usage.

In this repository, we provide an end-to-end training/deployment flow for Kneron's AI accelerators:

1. **Training/Evaluation:**
   - Modified model configuration files, verified for the Kneron hardware platform
   - See [Overview of Benchmark and Model Zoo](#Overview-of-Benchmark-and-Model-Zoo) for the Kneron-verified model list
2. **Converting to ONNX:**
   - tools/pytorch2onnx_kneron.py (beta)
   - Exports *optimized*, *Kneron-toolchain-supported* ONNX
   - Automatically modifies the model for arbitrary data-normalization preprocessing
3. **Evaluation:**
   - tools/test_kneron.py (beta)
   - Evaluates the model with a *PyTorch checkpoint, ONNX, or Kneron-NEF*
4. **Testing:**
   - inference_kn (beta)
   - Verifies the converted [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) model on a Kneron USB accelerator with this API
5. **Converting to Kneron-NEF:** (toolchain feature)
   - Converts the trained PyTorch model to a [Kneron-NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) model, which can run on the Kneron hardware platform

## License

This project is released under the [Apache 2.0 license](LICENSE).

## Changelog

N/A

## Overview of Benchmark and Kneron Model Zoo

| Backbone | Crop Size | Mem (GB) | mIoU | Config | Download |
|:--------:|:---------:|:--------:|:----:|:------:|:--------:|
| STDC 1 | 512x1024 | 7.15 | 69.29 | [config](https://github.com/kneron/kneron-mmsegmentation/tree/master/configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py) | [model](https://github.com/kneron/Model_Zoo/blob/main/mmsegmentation/stdc_1/latest.zip) |

NOTE: The performance may differ slightly from the original implementation since the input size is smaller.

## Installation

- Please refer to Step 1 of [docs_kneron/stdc_step_by_step.md#step-1-environment](docs_kneron/stdc_step_by_step.md) for installation.
- Please refer to [Kneron PLUS - Python: Installation](http://doc.kneron.com/docs/#plus_python/introduction/install_dependency/) for the environment setup for the Kneron USB accelerator.

## Getting Started

### Tutorial - Kneron Edition

- [STDC-Seg: Step-By-Step](docs_kneron/stdc_step_by_step.md): A tutorial to get users started easily. For detailed documents, see below.

### Documents - Kneron Edition

- [Kneron ONNX Export] (under development)
- [Kneron Inference] (under development)
- [Kneron Toolchain Step-By-Step (YOLOv3)](http://doc.kneron.com/docs/#toolchain/yolo_example/)
- [Kneron Toolchain Manual](http://doc.kneron.com/docs/#toolchain/manual/#0-overview)

### Original mmsegmentation Documents

- [Original mmsegmentation getting started](https://github.com/open-mmlab/mmsegmentation#getting-started): Recommended reading for other mmsegmentation operations.
- [Original mmsegmentation readthedocs](https://mmsegmentation.readthedocs.io/en/latest/): The original mmsegmentation documentation.

## Contributing

[kneron-mmsegmentation](https://github.com/kneron/kneron-mmsegmentation) is a platform built upon [OpenMMLab-mmsegmentation](https://github.com/open-mmlab/mmsegmentation).

- For issues regarding the original [mmsegmentation](https://github.com/open-mmlab/mmsegmentation): We appreciate all contributions to improve [OpenMMLab-mmsegmentation](https://github.com/open-mmlab/mmsegmentation). Ongoing projects can be found in our [GitHub Projects](https://github.com/open-mmlab/mmsegmentation/projects). Community users are welcome to participate in these projects. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline.
- For issues regarding this repository [kneron-mmsegmentation](https://github.com/kneron/kneron-mmsegmentation): You are welcome to leave comments or submit pull requests here to improve kneron-mmsegmentation.

## Related Projects

- [kneron-mmdetection](https://github.com/kneron/kneron-mmdetection): Kneron training/deployment platform on the [OpenMMLab - mmdetection](https://github.com/open-mmlab/mmdetection) object detection toolbox

Added (new README):

# STDC GolfAce — Semantic Segmentation on Kneron

## Quick Start

### Environment Setup

```bash
# Create and activate the conda environment
conda create -n stdc_golface python=3.8 -y
conda activate stdc_golface

# Install PyTorch + CUDA 11.3
conda install pytorch=1.11.0 torchvision=0.12.0 torchaudio cudatoolkit=11.3 -c pytorch -y

# Install mmcv-full
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html

# Install this project
pip install -e .

# Install utility packages
pip install opencv-python tqdm matplotlib cityscapesscripts yapf==0.31.0
```

### Data Preparation

1. Export the dataset from **Roboflow**, choosing the `Semantic Segmentation Masks` format
2. Convert the Roboflow export to the Cityscapes layout with `seg2city.py`
3. Place the converted data under `data/cityscapes/`

### Training and Testing

```bash
# Train
python tools/train.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py

# Test (write visualization results)
python tools/test.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
    work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth \
    --show-dir work_dirs/vis_results
```

### Converting to ONNX / NEF (Kneron Toolchain)

```bash
# Start Docker (WSL environment)
docker run --rm -it \
    -v $(wslpath -u 'C:\Users\rd_de\stdc_git'):/workspace/stdc_git \
    kneron/toolchain:latest

# Convert to ONNX
python tools/pytorch2onnx_kneron.py \
    configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
    --checkpoint work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth \
    --output-file work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx \
    --verify

# Copy the NEF back to the host
docker cp <container_id>:/data1/kneron_flow/models_630.nef \
    "C:\Users\rd_de\stdc_git\work_dirs\nef\models_630.nef"
```


@@ -1,5 +1,6 @@
# dataset settings
# dataset_type = 'CityscapesDataset'
dataset_type = 'GolfDataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(
mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
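With `mean=128` and `std=256`, the `Normalize` transform maps 8-bit pixel values into roughly [-0.5, 0.5), per channel:

```python
# (x - mean) / std, as the Normalize transform applies per channel.
mean, std = 128.0, 256.0
normalized = [(x - mean) / std for x in (0, 128, 255)]
print(normalized)  # [-0.5, 0.0, 0.49609375]
```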


@@ -0,0 +1,70 @@
# dataset settings
dataset_type = 'GolfDataset'
data_root = 'data/cityscapes0/' # your data root directory
img_norm_cfg = dict(
mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (512, 1024)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline
),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline
),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline
)
)
# classes and matching palette (not passed to the dataset; used for plotting / inference visualization)
classes = ('car', 'grass', 'people', 'road')
palette = [
[246, 14, 135], # car
[233, 81, 78], # grass
[220, 148, 21], # people
[207, 215, 220], # road
]
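The `classes`/`palette` pair above can be applied to a predicted label map to build an RGB visualization: index the palette with each pixel's class id. A minimal sketch (the label map is hypothetical model output):

```python
classes = ('car', 'grass', 'people', 'road')
palette = [
    [246, 14, 135],   # car
    [233, 81, 78],    # grass
    [220, 148, 21],   # people
    [207, 215, 220],  # road
]

# Hypothetical 2x2 label map of class ids.
label_map = [[0, 1], [2, 3]]

# Look each class id up in the palette to get its RGB color.
color_map = [[palette[cls] for cls in row] for row in label_map]
print(color_map[0][0])  # [246, 14, 135]  (a 'car' pixel)
```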


@@ -0,0 +1,71 @@
# dataset settings
dataset_type = 'GolfDataset'
data_root = 'data/cityscapes0/' # your data root directory
img_norm_cfg = dict(
mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline
),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline
),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline
)
)
# classes and matching palette (not passed to the dataset; used for plotting / inference visualization)
classes = ('car', 'grass', 'people', 'road')
palette = [
[246, 14, 135], # car
[233, 81, 78], # grass
[220, 148, 21], # people
[207, 215, 220], # road
]
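With `keep_ratio=True`, the `Resize` step above scales the image to fit inside `img_scale` while preserving aspect ratio. A sketch of that rule, assuming mmcv's `imrescale` semantics (one factor, bounded by both the long and short side):

```python
def rescale_size(w, h, scale=(724, 362)):
    """Return the keep-ratio rescaled (w, h) that fits inside `scale`."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(w, h), max_short / min(w, h))
    # Round half up, as mmcv does with int(x * factor + 0.5).
    return int(w * factor + 0.5), int(h * factor + 0.5)

print(rescale_size(1448, 724))  # (724, 362): 2:1 input, bounded by both sides
print(rescale_size(512, 512))   # (362, 362): square input, bounded by the short side
```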


@@ -0,0 +1,22 @@
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
# optimizer config
optimizer_config = dict()
# learning policy
lr_config = dict(
policy='poly',
power=0.9,
min_lr=1e-4,
by_epoch=False
)
# runtime settings
runner = dict(type='IterBasedRunner', max_iters=2000)
# save a checkpoint every 2000 iterations (here, only once at the end)
checkpoint_config = dict(by_epoch=False, interval=2000)
# evaluation settings: run mIoU evaluation every 2000 iterations
evaluation = dict(interval=2000, metric='mIoU', pre_eval=True)
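The `poly` policy above decays the learning rate per iteration. A sketch of mmcv's poly rule with this schedule's numbers (ignoring warmup): `lr = (base_lr - min_lr) * (1 - iter / max_iters) ** power + min_lr`.

```python
def poly_lr(iteration, base_lr=0.01, min_lr=1e-4, power=0.9, max_iters=2000):
    """Poly learning-rate decay, as configured in the schedule above."""
    coeff = (1 - iteration / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

print(poly_lr(0))     # starts at base_lr
print(poly_lr(1000))  # decayed partway
print(poly_lr(2000))  # floors at min_lr
```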


@@ -0,0 +1,193 @@
# Copyright (c) OpenMMLab. All rights reserved.
# ---------------- Model settings ---------------- #
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False,
init_cfg=dict(
type='Pretrained',
checkpoint='https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
)
),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)
),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=4, # four classes
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)
),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=4, # four classes
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)
),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=4, # four classes
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)
),
dict(
type='STDCHead',
in_channels=256,
channels=64,
num_convs=1,
num_classes=4, # most important
boundary_threshold=0.1,
in_index=0,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=True,
loss_decode=[
dict(
type='CrossEntropyLoss',
loss_name='loss_ce',
use_sigmoid=True,
loss_weight=1.0),
dict(
type='DiceLoss',
loss_name='loss_dice',
loss_weight=1.0)
]
)
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
# ---------------- Dataset settings ---------------- #
dataset_type = 'GolfDataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(
mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (512, 1024)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 512),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline
),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline
),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline
)
)
# ---------------- Extra settings ---------------- #
log_config = dict(
interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
checkpoint_config = dict(by_epoch=False, interval=1000)
evaluation = dict(interval=2000, metric='mIoU', pre_eval=True)
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(
policy='poly',
power=0.9,
min_lr=0.0001,
by_epoch=False,
warmup='linear',
warmup_iters=1000)
runner = dict(type='IterBasedRunner', max_iters=20000)
cudnn_benchmark = True
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/kn_stdc1_golf4class'
gpu_ids = [0]
# optional: for visualization or post-processing only; not passed to the dataset
classes = ('car', 'grass', 'people', 'road')
palette = [
[246, 14, 135], # car
[233, 81, 78], # grass
[220, 148, 21], # people
[207, 215, 220], # road
]


@@ -1,14 +1,17 @@
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth' # noqa
_base_ = [
'../_base_/models/stdc.py', '../_base_/datasets/kn_cityscapes2.py',
'../_base_/default_runtime.py', '../_base_/schedules/schedule_2k.py'
]
lr_config = dict(warmup='linear', warmup_iters=1000)
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
)
model = dict(
backbone=dict(
backbone_cfg=dict(
init_cfg=dict(type='Pretrained', checkpoint=checkpoint))))
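The `_base_` list above pulls in the model, dataset, runtime, and schedule configs, and this file's keys override theirs dict-by-dict. A minimal sketch of that merge rule (ignoring mmcv's `_delete_` key and list handling; the sample dicts are hypothetical):

```python
def merge(base, child):
    """Recursively overlay `child` config keys onto `base`."""
    out = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # merge nested dicts key-by-key
        else:
            out[key] = value                   # child value wins outright
    return out

base = {'data': {'samples_per_gpu': 12, 'workers_per_gpu': 4}}
child = {'data': {'samples_per_gpu': 2}}
merged = merge(base, child)
print(merged)  # {'data': {'samples_per_gpu': 2, 'workers_per_gpu': 4}}
```

Note how the un-overridden `workers_per_gpu` survives from the base config.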

configs/stdc/meconfig.py

@@ -0,0 +1,137 @@
norm_cfg = dict(type='BN', requires_grad=True)
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=4,
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=4,
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=4,
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
dataset_type = 'GolfDataset'
data_root = 'data/cityscapes0/'
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=80000)
checkpoint_config = dict(by_epoch=False, interval=2000)
evaluation = dict(interval=2000, metric='mIoU', pre_eval=True)

configs/stdc/meconfig1.py

@@ -0,0 +1,146 @@
norm_cfg = dict(type='BN', requires_grad=True)
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=1, # grass only
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=1,
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=1,
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
# updated to the new dataset class
dataset_type = 'GrassOnlyDataset'
data_root = 'data/cityscapes/'
# add the classes and palette definitions
classes = ('grass',)
palette = [[0, 128, 0]]
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=80000)
checkpoint_config = dict(by_epoch=False, interval=2000)
evaluation = dict(interval=2000, metric='mIoU', pre_eval=True)

configs/stdc/meconfig2.py

@@ -0,0 +1,149 @@
norm_cfg = dict(type='BN', requires_grad=True)
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=2, # grass + road
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=2,
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=2,
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
# use Golf2Dataset (grass and road)
dataset_type = 'Golf2Dataset'
data_root = 'data/cityscapes/'
# classes and their colors
classes = ('grass', 'road')
palette = [
[0, 255, 0], # grass
[255, 165, 0], # road
]
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=80000)
checkpoint_config = dict(by_epoch=False, interval=2000)
evaluation = dict(interval=2000, metric='mIoU', pre_eval=True)

configs/stdc/meconfig4.py

@@ -0,0 +1,151 @@
norm_cfg = dict(type='BN', requires_grad=True)
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=4, # changed to 4 classes
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=4, # changed to 4 classes
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=4, # changed to 4 classes
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
# new dataset class
dataset_type = 'Golf4Dataset'
data_root = 'data/cityscapes/'
# classes and palette
classes = ('car', 'grass', 'people', 'road')
palette = [
[0, 0, 128], # car
[0, 255, 0], # grass
[255, 0, 0], # people
[255, 165, 0], # road
]
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=80000)
checkpoint_config = dict(by_epoch=False, interval=2000)
evaluation = dict(interval=2000, metric='mIoU', pre_eval=True)

configs/stdc/meconfig7.py Normal file

@ -0,0 +1,137 @@
norm_cfg = dict(type='BN', requires_grad=True)
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=7,
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=7,
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=7,
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
dataset_type = 'Golf8Dataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=320000)
checkpoint_config = dict(by_epoch=False, interval=32000)
evaluation = dict(interval=32000, metric='mIoU', pre_eval=True)

configs/stdc/meconfig8.py Normal file

@ -0,0 +1,137 @@
norm_cfg = dict(type='BN', requires_grad=True)
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/stdc/stdc1_20220308-5368626c.pth'
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=8,  # changed to 8 classes
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=8,  # changed to 8 classes
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=8,  # changed to 8 classes
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
dataset_type = 'Golf8Dataset'  # use Golf8Dataset
data_root = 'data/cityscapes/'
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=320000)
checkpoint_config = dict(by_epoch=False, interval=32000)
evaluation = dict(interval=32000, metric='mIoU', pre_eval=True)


@ -0,0 +1,147 @@
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=8,  # 8 classes
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=8,  # 8 classes
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=8,  # 8 classes
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
dataset_type = 'Golf8Dataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)]
)
dist_params = dict(backend='nccl')
log_level = 'INFO'
# fine-tuning settings
load_from = 'C:/Users/rd_de/kneronstdc/work_dirs/meconfig8/0619/latest.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
# recommended fine-tuning learning rate
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000)
evaluation = dict(interval=16000, metric='mIoU', pre_eval=True)

configs/stdc/test.py Normal file

@ -0,0 +1,147 @@
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
type='EncoderDecoder',
pretrained=None,
backbone=dict(
type='STDCContextPathNet',
backbone_cfg=dict(
type='STDCNet',
stdc_type='STDCNet1',
in_channels=3,
channels=(32, 64, 256, 512, 1024),
bottleneck_type='cat',
num_convs=4,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
with_final_conv=False),
last_in_channels=(1024, 512),
out_channels=128,
ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
decode_head=dict(
type='FCNHead',
in_channels=256,
channels=256,
num_convs=1,
num_classes=8,  # 8 classes
in_index=3,
concat_input=False,
dropout_ratio=0.1,
norm_cfg=norm_cfg,
align_corners=True,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=8,  # 8 classes
in_index=2,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
dict(
type='FCNHead',
in_channels=128,
channels=64,
num_convs=1,
num_classes=8,  # 8 classes
in_index=1,
norm_cfg=norm_cfg,
concat_input=False,
align_corners=False,
sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
],
train_cfg=dict(),
test_cfg=dict(mode='whole')
)
dataset_type = 'Golf8Dataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)
crop_size = (360, 720)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(type='Resize', img_scale=(724, 362), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(724, 362),
flip=False,
transforms=[
dict(type='Resize', img_scale=(724, 362), keep_ratio=False),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/train',
ann_dir='gtFine/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/val',
ann_dir='gtFine/val',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_root=data_root,
img_dir='leftImg8bit/test',
ann_dir='gtFine/test',
pipeline=test_pipeline)
)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook', by_epoch=False)]
)
dist_params = dict(backend='nccl')
log_level = 'INFO'
# fine-tuning settings
load_from = 'C:/Users/rd_de/kneronstdc/work_dirs/meconfig8/0619/latest.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
# recommended fine-tuning learning rate
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=160)
checkpoint_config = dict(by_epoch=False, interval=16)
evaluation = dict(interval=16, metric='mIoU', pre_eval=True)


@ -1,439 +1,449 @@
# Step 1: Environment
## Step 1-1: Prerequisites
- Python 3.6+
- PyTorch 1.3+ (we recommend installing PyTorch with Conda following the [Official PyTorch Installation Instructions](https://pytorch.org/))
- (Optional) CUDA 9.2+ (if you installed PyTorch with CUDA support via Conda as above, you can skip the separate CUDA installation)
- (Optional, used to build from source) GCC 5+
- [mmcv-full](https://mmcv.readthedocs.io/en/latest/#installation) (Note: not `mmcv`!)
**Note:** You need to run `pip uninstall mmcv` first if you have `mmcv` installed.
If mmcv and mmcv-full are both installed, there will be `ModuleNotFoundError`.
## Step 1-2: Install kneron-mmsegmentation
### Step 1-2-1: Install PyTorch
You can follow [Official PyTorch Installation Instruction](https://pytorch.org/) to install PyTorch using Conda:
```shell
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
```
### Step 1-2-2: Install mmcv-full
We recommend installing mmcv-full with pip:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
```
Replace `cu113` and `torch1.11.0` in the URL with your installed CUDA and PyTorch versions. For example, to install `mmcv-full` for `CUDA 11.1` and `PyTorch 1.9.0`, use the following command:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
```
If you see error messages while installing mmcv-full, check that the URL matches your installed PyTorch and CUDA versions, and see the [MMCV pip Installation Instructions](https://github.com/open-mmlab/mmcv#install-with-pip) for the MMCV versions compatible with each PyTorch and CUDA combination.
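The URL pattern above can be parameterized in a small helper (a convenience sketch; the `cu_version` and `torch_version` values below are examples you substitute with your own):

```python
# Build the mmcv-full wheel index URL from CUDA/PyTorch version strings.
# The values below are examples; substitute your installed versions.
cu_version = 'cu111'          # e.g. derived from `nvcc --version`
torch_version = 'torch1.9.0'  # e.g. derived from `torch.__version__`

url = (f'https://download.openmmlab.com/mmcv/dist/'
       f'{cu_version}/{torch_version}/index.html')
print(url)
```

Pass the printed URL to `pip install mmcv-full -f <url>` as in the commands above.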
### Step 1-2-3: Clone kneron-mmsegmentation Repository
```shell
git clone https://github.com/kneron/kneron-mmsegmentation.git
cd kneron-mmsegmentation
```
### Step 1-2-4: Install Required Python Libraries for Building and Installing kneron-mmsegmentation
```shell
pip install -r requirements_kneron.txt
pip install -v -e . # or "python setup.py develop"
```
# Step 2: Training Models on Standard Datasets
kneron-mmsegmentation provides many existing semantic segmentation models in its [Model Zoo](https://mmsegmentation.readthedocs.io/en/latest/model_zoo.html) and supports several standard datasets such as CityScapes, Pascal Context, COCO Stuff, and ADE20K. Here we demonstrate how to train *STDC-Seg*, a semantic segmentation algorithm, on *CityScapes*, a well-known semantic segmentation dataset.
## Step 2-1: Download CityScapes Dataset
1. Go to the [CityScapes Official Website](https://www.cityscapes-dataset.com) and click the *Download* link at the top of the page. If you're not logged in, it will navigate you to the login page.
2. If this is your first visit to the CityScapes website, you have to register an account before you can download the dataset.
3. Click the *Register* link and it will navigate you to the registration page.
4. Fill in all the *required* fields, accept the terms and conditions, and click the *Register* button. If everything goes well, you will see *Registration Successful* on the page and receive a registration confirmation mail in your email inbox.
5. Click the link in the confirmation mail, log in with your newly registered account and password, and you should be able to download the CityScapes dataset.
6. Download *leftImg8bit_trainvaltest.zip* (images) and *gtFine_trainvaltest.zip* (labels) and place them on your server.
## Step 2-2: Dataset Preparation
We suggest that you extract the zipped files to somewhere outside the project directory and symlink (`ln`) the dataset root to `kneron-mmsegmentation/data` so you can use the dataset outside this project, as shown below:
```shell
# Replace all "path/to/your" below with where you want to put the dataset!
# Extracting Cityscapes
mkdir -p path/to/your/cityscapes
unzip leftImg8bit_trainvaltest.zip -d path/to/your/cityscapes
unzip gtFine_trainvaltest.zip -d path/to/your/cityscapes
# symlink the dataset into kneron-mmsegmentation/data ("kneron-mmsegmentation" is the repository you cloned in Step 1-2-3)
mkdir -p kneron-mmsegmentation/data
ln -s $(realpath path/to/your/cityscapes) kneron-mmsegmentation/data
# Replace all "path/to/your" above with where you want to put the dataset!
```
Then, we need *cityscapesScripts* to preprocess the CityScapes dataset. If you followed [Step 1-2-4](#step-1-2-4-install-required-python-libraries-for-building-and-installing-kneron-mmsegmentation) completely, you should already have the python library *cityscapesScripts* installed (if not, run `pip install cityscapesScripts`).
```shell
# Replace "path/to/your" with where you want to put the dataset!
export CITYSCAPES_DATASET=$(realpath path/to/your/cityscapes)
csCreateTrainIdLabelImgs
```
Wait several minutes and you'll see something like this:
```plain
Processing 5000 annotation files
Progress: 100.0 %
```
The files inside the dataset folder should be something like:
```plain
kneron-mmsegmentation/data/cityscapes
├── gtFine
│ ├── test
│ │ ├── ...
│ ├── train
│ │ ├── ...
│ ├── val
│ │ ├── frankfurt
│ │ │ ├── frankfurt_000000_000294_gtFine_color.png
│ │ │ ├── frankfurt_000000_000294_gtFine_instanceIds.png
│ │ │ ├── frankfurt_000000_000294_gtFine_labelIds.png
│ │ │ ├── frankfurt_000000_000294_gtFine_labelTrainIds.png
│ │ │ ├── frankfurt_000000_000294_gtFine_polygons.png
│ │ │ ├── ...
│ │ ├── ...
├── leftImg8bit
│ ├── test
│ │ ├── ...
│ ├── train
│ │ ├── ...
│ ├── val
│ │ ├── frankfurt
│ │ │ ├── frankfurt_000000_000294_leftImg8bit.png
│ │ ├── ...
...
```
It's recommended that you *symlink* the dataset folder into the kneron-mmsegmentation folder as above. If you place the dataset elsewhere and do not want to symlink, you have to change the corresponding paths in the config file.
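For reference, overriding the dataset location is just a matter of editing `data_root` in the config (the path below is an illustrative example, not a required location):

```python
# Hypothetical alternative dataset location; adjust to wherever you
# actually extracted Cityscapes.
data_root = '/home/user/datasets/cityscapes/'

# Each split dict in the config references this root:
data = dict(
    train=dict(data_root=data_root, img_dir='leftImg8bit/train', ann_dir='gtFine/train'),
    val=dict(data_root=data_root, img_dir='leftImg8bit/val', ann_dir='gtFine/val'),
    test=dict(data_root=data_root, img_dir='leftImg8bit/test', ann_dir='gtFine/test'),
)
```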
Now the dataset should be ready for training.
## Step 2-3: Train STDC-Seg on CityScapes
Short-Term Dense Concatenate Network (STDC network) is a lightweight convolutional network architecture. Applied to the semantic segmentation task, it is called STDC-Seg, first introduced in [Rethinking BiSeNet For Real-time Semantic Segmentation](https://arxiv.org/abs/2104.13188). Please check the paper if you want to know the algorithm details.
We only need a configuration file to train a deep learning model in either the original MMSegmentation or kneron-mmsegmentation. STDC-Seg is provided in the original MMSegmentation repository, but its configuration file needs some modification due to our hardware limitations before the trained model can run on a Kneron dongle.
To make a configuration file compatible with our device, we have to:
* Change the mean and std value in image normalization to `mean=[128., 128., 128.]` and `std=[256., 256., 256.]`.
* Shrink the input size during the inference phase. The original CityScapes image size (2048(w)x1024(h)) is too large for our device; 1024(w)x512(h) is a good fit.
To achieve this, modify `img_scale` in `test_pipeline` and `img_norm_cfg` in the configuration file `configs/_base_/datasets/cityscapes.py`.
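Concretely, the two overrides look like this (a sketch assembled from the modified configs in this repository; treat the exact pipeline layout as illustrative of your own config):

```python
# Normalization required by the device: mean 128, std 256.
img_norm_cfg = dict(
    mean=[128., 128., 128.], std=[256., 256., 256.], to_rgb=True)

# Inference pipeline with the input shrunk to 1024(w)x512(h).
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 512),  # down from the original 2048x1024
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```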
Luckily, here in kneron-mmsegmentation, we provide a modified STDC-Seg configuration file (`configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py`) so we can easily apply the trained model to our device.
To train STDC-Seg compatible with our device, just execute:
```shell
cd kneron-mmsegmentation
python tools/train.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py
```
kneron-mmsegmentation will generate `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes` folder and save the configuration file and all checkpoints there.
# Step 3: Test Trained Model
`tools/test.py` is a script that runs our PyTorch model on the test set and, if the `--eval` argument is given, evaluates the results to see whether the model is well trained. Note that it's always good to evaluate the PyTorch model before deploying it.
```shell
python tools/test.py \
work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth \
--eval mIoU
```
* `kn_stdc1_in1k-pre_512x1024_80k_cityscapes/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py` can be your training config.
* `kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth` can be your model checkpoint.
The expected result of the command above should be something similar to the following text (the numbers may slightly differ):
```
...
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 97.49 | 98.59 |
| sidewalk | 80.17 | 88.71 |
| building | 89.52 | 95.25 |
| wall | 57.92 | 66.99 |
| fence | 55.5 | 70.15 |
| pole | 38.93 | 47.51 |
| traffic light | 49.95 | 59.97 |
| traffic sign | 62.1 | 70.05 |
| vegetation | 89.02 | 95.27 |
| terrain | 60.18 | 72.26 |
| sky | 91.84 | 96.34 |
| person | 68.98 | 84.35 |
| rider | 47.79 | 60.98 |
| car | 91.63 | 96.48 |
| truck | 74.31 | 83.52 |
| bus | 80.24 | 86.83 |
| train | 66.45 | 76.78 |
| motorcycle | 48.69 | 58.18 |
| bicycle | 65.81 | 81.68 |
+---------------+-------+-------+
Summary:
+------+-------+-------+
| aAcc | mIoU | mAcc |
+------+-------+-------+
| 94.3 | 69.29 | 78.42 |
+------+-------+-------+
```
**NOTE: The training process might take some time, depending on your computation resources. If you just want to take a quick look at the deployment flow, you can download our pretrained model and skip Steps 2 and 3:**
```
# If you don't want to train your own model:
mkdir -p work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes
pushd work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes
wget https://github.com/kneron/Model_Zoo/raw/main/mmsegmentation/stdc_1/latest.zip
unzip latest.zip
popd
```
# Step 4: Export ONNX and Verify
## Step 4-1: Export ONNX
`tools/pytorch2onnx_kneron.py` is a script provided by kneron-mmsegmentation to help users convert a trained PyTorch model to ONNX:
```shell
python tools/pytorch2onnx_kneron.py \
configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
--checkpoint work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth \
--output-file work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx \
--verify
```
* `configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py` can be your training config.
* `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth` can be your model checkpoint.
* `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx` can be any other path. Here for convenience, the ONNX file is placed in the same folder of our pytorch checkpoint.
## Step 4-2: Verify ONNX
`tools/deploy_test_kneron.py` is a script provided by kneron-mmsegmentation to help users verify that the exported ONNX model generates outputs similar to the PyTorch model's:
```shell
python tools/deploy_test_kneron.py \
configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx \
--eval mIoU
```
* `configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py` can be your training config.
* `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx` can be your exported ONNX file.
The expected result of the command above should be something similar to the following text (the numbers may slightly differ):
```
...
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 97.52 | 98.62 |
| sidewalk | 80.59 | 88.69 |
| building | 89.59 | 95.38 |
| wall | 58.02 | 66.85 |
| fence | 55.37 | 69.76 |
| pole | 44.4 | 52.28 |
| traffic light | 50.23 | 60.07 |
| traffic sign | 62.58 | 70.25 |
| vegetation | 89.0 | 95.27 |
| terrain | 60.47 | 72.27 |
| sky | 90.56 | 97.07 |
| person | 70.7 | 84.88 |
| rider | 48.66 | 61.37 |
| car | 91.58 | 95.98 |
| truck | 73.92 | 82.66 |
| bus | 79.92 | 85.95 |
| train | 66.26 | 75.92 |
| motorcycle | 48.88 | 57.91 |
| bicycle | 66.9 | 82.0 |
+---------------+-------+-------+
Summary:
+------+-------+-------+
| aAcc | mIoU | mAcc |
+------+-------+-------+
| 94.4 | 69.75 | 78.59 |
+------+-------+-------+
```
Note that the ONNX results may differ from the PyTorch results due to some implementation differences between PyTorch and ONNXRuntime.
# Step 5: Convert ONNX File to [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) Model for Kneron Platform
## Step 5-1: Install the Kneron toolchain docker
* Check [Kneron Toolchain Installation Document](http://doc.kneron.com/docs/#toolchain/manual/#1-installation)
## Step 5-2: Mount Kneron toolchain docker
* Mount a folder (e.g. `/mnt/hgfs/Competition`) into the toolchain docker container as `/data1`. The ONNX model exported in Step 4 should be put here. All toolchain operations should happen in this folder.
```
sudo docker run --rm -it -v /mnt/hgfs/Competition:/data1 kneron/toolchain:latest
```
## Step 5-3: Import KTC and the required libraries in python
```python
import ktc
import numpy as np
import os
import onnx
from PIL import Image
```
## Step 5-4: Optimize the ONNX model
```python
onnx_path = '/data1/latest.onnx'
m = onnx.load(onnx_path)
m = ktc.onnx_optimizer.onnx2onnx_flow(m)
onnx.save(m, 'latest.opt.onnx')
```
## Step 5-5: Configure and load the data needed for KTC, and check that the ONNX model works with the toolchain
```python
# npu (only) performance simulation
km = ktc.ModelConfig(model_id_on_public_field, "0001", "720", onnx_model=m)
eval_result = km.evaluate()
print("\nNpu performance evaluation result:\n" + str(eval_result))
```
## Step 5-6: Quantize the ONNX model
We [sampled 3 images from the Cityscapes dataset](https://www.kneron.com/tw/support/education-center/?folder=OpenMMLab%20Kneron%20Edition/misc/&download=41) as quantization data. To prepare the quantization data:
1. Download the [zip file](https://www.kneron.com/tw/support/education-center/?folder=OpenMMLab%20Kneron%20Edition/misc/&download=41)
2. Extract the zip file as a folder named `cityscapes_minitest`
3. Put the `cityscapes_minitest` into docker mounted folder (the path in docker container should be `/data1/cityscapes_minitest`)
The following script will preprocess (should be the same as training code) our quantization data, and put it in a list:
```python
import os
from os import walk
img_list = []
for (dirpath, dirnames, filenames) in walk("/data1/cityscapes_minitest"):
for f in filenames:
fullpath = os.path.join(dirpath, f)
image = Image.open(fullpath)
image = image.convert("RGB")
image = Image.fromarray(np.array(image)[...,::-1])
img_data = np.array(image.resize((1024, 512), Image.BILINEAR)) / 256 - 0.5
print(fullpath)
img_list.append(img_data)
```
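Note that the `/ 256 - 0.5` above is algebraically the same transform as the training-time `Normalize` step (`mean=128`, `std=256`); a quick sanity check:

```python
# Training-time Normalize: (x - mean) / std with mean=128, std=256.
# Quantization preprocessing above: x / 256 - 0.5.
# These are exactly equal for every 8-bit pixel value:
for x in range(256):
    assert (x - 128.0) / 256.0 == x / 256.0 - 0.5
print("quantization preprocessing matches img_norm_cfg")
```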
Then perform quantization. The generated BIE model will be saved at `/data1/output.bie`.
```python
# fixed-point analysis
bie_model_path = km.analysis({"input": img_list})
print("\nFixed-point analysis done. Save bie model to '" + str(bie_model_path) + "'")
```
## Step 5-7: Compile
The final step is to compile the BIE model into an NEF model.
```python
# compile
nef_model_path = ktc.compile([km])
print("\nCompile done. Save Nef file to '" + str(nef_model_path) + "'")
```
You can find the NEF file at `/data1/batch_compile/models_720.nef`. `models_720.nef` is the final compiled model.
# Step 6: Run [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) model on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
* N/A
# Step 7 (For Kneron AI Competition 2022): Run [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) model on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
[WARNING] Don't do this step in the toolchain docker environment mentioned in Step 5.
We recommend reading the [Kneron PLUS official documentation](http://doc.kneron.com/docs/#plus_python/#_top) first.
### Step 7-1: Download and Install PLUS python library(.whl)
* Go to [Kneron education center](https://www.kneron.com/tw/support/education-center/)
* Scroll down to OpenMMLab Kneron Edition table
* Select Kneron Plus v1.13.0 (pre-built python library)
* Your OS version(Ubuntu, Windows, MacOS, Raspberry pi)
* Download KneronPLUS-1.3.0-py3-none-any_{your_os}.whl
* unzip downloaded `KneronPLUS-1.3.0-py3-none-any.whl.zip`
* pip install KneronPLUS-1.3.0-py3-none-any.whl
### Step 7-2: Download STDC example code
* Go to [Kneron education center](https://www.kneron.com/tw/support/education-center/)
* Scroll down to **OpenMMLab Kneron Edition** table
* Select **kneron-mmsegmentation**
* Select **STDC**
* Download **stdc_plus_demo.zip**
* unzip downloaded **stdc_plus_demo**
### Step 7-3: Test enviroment is ready (require [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1))
In `stdc_plus_demo`, we provide a STDC-Seg example model and image for quick test.
* Plug in [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1) into your computer USB port
* Go to the stdc_plus_demo folder
```bash
cd /PATH/TO/stdc_plus_demo
```
* Install required python libraries
```bash
pip install -r requirements.txt
```
* Run example on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
```python
python KL720DemoGenericInferenceSTDC_BypassHwPreProc.py -nef ./example_stdc_720.nef -img 000000000641.jpg
```
Then you can see the inference result is saved as output_000000000641.jpg in the same folder.
The expected result of the command above will be something similar to the following text:
```plain
...
[Connect Device]
- Success
[Set Device Timeout]
- Success
[Upload Model]
- Success
[Read Image]
- Success
[Starting Inference Work]
- Starting inference loop 1 times
- .
[Retrieve Inference Node Output ]
- Success
[Output Result Image]
- Output bounding boxes on 'output_000000000641.jpg'
...
```
### Step 7-4: Run your NEF model and your image on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
Use the same script in previous step, but now we change the input NEF model path and image to yours
```bash
python KL720DemoGenericInferenceSTDC_BypassHwPreProc.py -img /PATH/TO/YOUR_IMAGE.bmp -nef /PATH/TO/YOUR/720_NEF_MODEL.nef
# Step 1: Environment
## Step 1-1: Prerequisites
- Python 3.6+
- PyTorch 1.3+ (we recommend installing PyTorch with Conda, following the [Official PyTorch Installation Instruction](https://pytorch.org/))
- (Optional) CUDA 9.2+ (if you installed PyTorch with CUDA support via Conda as above, you can skip the CUDA installation)
- (Optional, used to build from source) GCC 5+
- [mmcv-full](https://mmcv.readthedocs.io/en/latest/#installation) (Note: not `mmcv`!)
**Note:** If you already have `mmcv` installed, run `pip uninstall mmcv` first.
If both `mmcv` and `mmcv-full` are installed, you will get a `ModuleNotFoundError`.
## Step 1-2: Install kneron-mmsegmentation
### Step 1-2-1: Install PyTorch
You can follow [Official PyTorch Installation Instruction](https://pytorch.org/) to install PyTorch using Conda:
```shell
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
```
### Step 1-2-2: Install mmcv-full
We recommend installing mmcv-full with pip:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
```
Replace `cu113` and `torch1.11.0` in the URL with your installed versions. For example, to install `mmcv-full` for `CUDA 11.1` and `PyTorch 1.9.0`, use the following command:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
```
If you see error messages while installing mmcv-full, check that the URL matches your installed versions of PyTorch and CUDA, and see the [MMCV pip Installation Instruction](https://github.com/open-mmlab/mmcv#install-with-pip) for the MMCV versions compatible with each PyTorch/CUDA combination.
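As a quick sanity check before running pip, you can assemble the index URL from your versions. The helper below is purely illustrative (the function name `mmcv_index_url` is ours, not part of MMCV); it just mirrors the URL pattern shown above:

```python
def mmcv_index_url(torch_version, cuda_version=None):
    """Build the mmcv-full pip index URL for a given PyTorch/CUDA combination.

    Illustrative helper only. Pass cuda_version=None for a CPU-only build.
    """
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    torch_tag = "torch" + torch_version.split("+")[0]  # drop any "+cuXXX" suffix
    return f"https://download.openmmlab.com/mmcv/dist/{tag}/{torch_tag}/index.html"

print(mmcv_index_url("1.9.0", "11.1"))
# https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
```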
### Step 1-2-3: Clone kneron-mmsegmentation Repository
```shell
git clone https://github.com/kneron/kneron-mmsegmentation.git
cd kneron-mmsegmentation
```
### Step 1-2-4: Install Required Python Libraries for Building and Installing kneron-mmsegmentation
```shell
pip install -r requirements_kneron.txt
pip install -v -e . # or "python setup.py develop"
```
# Step 2: Training Models on Standard Datasets
kneron-mmsegmentation provides many popular semantic segmentation models in the [Model Zoo](https://mmsegmentation.readthedocs.io/en/latest/model_zoo.html), and supports several standard datasets such as CityScapes, Pascal Context, COCO-Stuff, and ADE20K. Here we demonstrate how to train *STDC-Seg*, a semantic segmentation algorithm, on *CityScapes*, a well-known semantic segmentation dataset.
## Step 2-1: Download CityScapes Dataset
1. Go to the [CityScapes Official Website](https://www.cityscapes-dataset.com) and click the *Download* link at the top of the page. If you are not logged in, it will redirect you to the login page.
2. If this is your first visit to the CityScapes website, you have to register an account before you can download the dataset.
3. Click the *Register* link to open the registration page.
4. Fill in all the *required* fields, accept the terms and conditions, and click the *Register* button. If everything goes well, you will see *Registration Successful* on the page and receive a confirmation mail in your inbox.
5. Click the link in the confirmation mail, log in with your newly registered account and password, and you should be able to download the CityScapes dataset.
6. Download *leftImg8bit_trainvaltest.zip* (images) and *gtFine_trainvaltest.zip* (labels) and place them on your server.
## Step 2-2: Dataset Preparation
We suggest extracting the zipped files somewhere outside the project directory and symlinking (`ln -s`) the dataset root to `kneron-mmsegmentation/data`, so you can also use the dataset outside this project:
```shell
# Replace all "path/to/your" below with where you want to put the dataset!
# Extracting Cityscapes
mkdir -p path/to/your/cityscapes
unzip leftImg8bit_trainvaltest.zip -d path/to/your/cityscapes
unzip gtFine_trainvaltest.zip -d path/to/your/cityscapes
# symlink dataset to kneron-mmsegmentation/data ("kneron-mmsegmentation" is the repository you cloned in Step 1-2-3)
mkdir -p kneron-mmsegmentation/data
ln -s $(realpath path/to/your/cityscapes) kneron-mmsegmentation/data
# Replace all "path/to/your" above with where you want to put the dataset!
```
Then we need *cityscapesScripts* to preprocess the CityScapes dataset. If you fully followed [Step 1-2-4](#step-1-2-4-install-required-python-libraries-for-building-and-installing-kneron-mmsegmentation), the *cityscapesScripts* python library is already installed (if not, run `pip install cityscapesScripts`).
```shell
# Replace "path/to/your" with where you want to put the dataset!
export CITYSCAPES_DATASET=$(realpath path/to/your/cityscapes)
csCreateTrainIdLabelImgs
```
Wait several minutes and you'll see something like this:
```plain
Processing 5000 annotation files
Progress: 100.0 %
```
The files inside the dataset folder should be something like:
```plain
kneron-mmsegmentation/data/cityscapes
├── gtFine
│ ├── test
│ │ ├── ...
│ ├── train
│ │ ├── ...
│ ├── val
│ │ ├── frankfurt
│ │ │ ├── frankfurt_000000_000294_gtFine_color.png
│ │ │ ├── frankfurt_000000_000294_gtFine_instanceIds.png
│ │ │ ├── frankfurt_000000_000294_gtFine_labelIds.png
│ │ │ ├── frankfurt_000000_000294_gtFine_labelTrainIds.png
│ │ │ ├── frankfurt_000000_000294_gtFine_polygons.json
│ │ │ ├── ...
│ │ ├── ...
├── leftImg8bit
│ ├── test
│ │ ├── ...
│ ├── train
│ │ ├── ...
│ ├── val
│ │ ├── frankfurt
│ │ │ ├── frankfurt_000000_000294_leftImg8bit.png
│ │ ├── ...
...
```
We recommend *symlinking* the dataset folder into the kneron-mmsegmentation folder as shown above. If you place the dataset elsewhere and do not want to symlink it, you have to change the corresponding paths in the config file.
Now the dataset is ready for training.
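If you want to double-check that `csCreateTrainIdLabelImgs` generated the `*_gtFine_labelTrainIds.png` files the training pipeline reads, a small recursive count is enough. This is a hedged convenience sketch; the `count_files` helper is ours and not part of the repository:

```python
import os

def count_files(root, suffix):
    """Count files under `root` (recursively) whose names end with `suffix`."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(name.endswith(suffix) for name in filenames)
    return total

# csCreateTrainIdLabelImgs reported "Processing 5000 annotation files",
# so roughly that many labelTrainIds images should now exist.
print(count_files("kneron-mmsegmentation/data/cityscapes/gtFine",
                  "_gtFine_labelTrainIds.png"))
```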
## Step 2-3: Train STDC-Seg on CityScapes
Short-Term Dense Concatenate network (STDC network) is a lightweight architecture for convolutional neural networks; applied to the semantic segmentation task, it is called STDC-Seg. It was first introduced in [Rethinking BiSeNet For Real-time Semantic Segmentation](https://arxiv.org/abs/2104.13188); please check the paper for the algorithm details.
In both the original MMSegmentation and kneron-mmsegmentation, a configuration file is all we need to train a deep learning model. STDC-Seg is provided in the original MMSegmentation repository, but its configuration file needs some modification to fit our hardware limitations, so that the trained model can run on a Kneron dongle.
To make a configuration file compatible with our device, we have to:
* Change the mean and std values for image normalization to `mean=[128., 128., 128.]` and `std=[256., 256., 256.]`.
* Shrink the input size for the inference phase. The original CityScapes image size (2048(w)x1024(h)) is too large for our device; 1024(w)x512(h) works well.
To achieve this, modify `img_scale` in `test_pipeline` and `img_norm_cfg` in the configuration file `configs/_base_/datasets/cityscapes.py`.
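The mean/std pair above is chosen so that MMSegmentation's per-pixel normalization `(x - mean) / std` coincides exactly with the `x / 256 - 0.5` form used later by the toolchain's quantization script (Step 5-6). Plain NumPy is enough to verify the identity:

```python
import numpy as np

# pixel values in [0, 255]; mean=128, std=256 as in the Kneron config
x = np.arange(256, dtype=np.float32).reshape(16, 16)

mmseg_norm = (x - 128.0) / 256.0   # (x - mean) / std
kneron_norm = x / 256.0 - 0.5      # form used at quantization time

assert np.allclose(mmseg_norm, kneron_norm)
print(mmseg_norm.min(), mmseg_norm.max())  # values land in [-0.5, 0.5)
```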
Luckily, kneron-mmsegmentation already provides a modified STDC-Seg configuration file (`configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py`), so the trained model can be applied to our device directly.
To train a device-compatible STDC-Seg, just execute:
```shell
cd kneron-mmsegmentation
python tools/train.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py
```
kneron-mmsegmentation will generate the `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes` folder and save the configuration file and all checkpoints there.
# Step 3: Test Trained Model
`tools/test.py` is a script that runs inference on the test set with our PyTorch model and, when the `--eval` argument is given, evaluates the results to check that the model is well trained. Note that it is always a good idea to evaluate the PyTorch model before deploying it.
```shell
python tools/test.py \
work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth \
--eval mIoU
```
* `kn_stdc1_in1k-pre_512x1024_80k_cityscapes/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py` can be your training config.
* `kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth` can be your model checkpoint.
The expected result of the command above should be something similar to the following text (the numbers may slightly differ):
```
...
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 97.49 | 98.59 |
| sidewalk | 80.17 | 88.71 |
| building | 89.52 | 95.25 |
| wall | 57.92 | 66.99 |
| fence | 55.5 | 70.15 |
| pole | 38.93 | 47.51 |
| traffic light | 49.95 | 59.97 |
| traffic sign | 62.1 | 70.05 |
| vegetation | 89.02 | 95.27 |
| terrain | 60.18 | 72.26 |
| sky | 91.84 | 96.34 |
| person | 68.98 | 84.35 |
| rider | 47.79 | 60.98 |
| car | 91.63 | 96.48 |
| truck | 74.31 | 83.52 |
| bus | 80.24 | 86.83 |
| train | 66.45 | 76.78 |
| motorcycle | 48.69 | 58.18 |
| bicycle | 65.81 | 81.68 |
+---------------+-------+-------+
Summary:
+------+-------+-------+
| aAcc | mIoU | mAcc |
+------+-------+-------+
| 94.3 | 69.29 | 78.42 |
+------+-------+-------+
```
**NOTE: The training process might take some time, depending on your computation resource. If you just want to take a quick look at the deployment flow, you can download our pretrained model so you can skip Step 1, 2, and 3:**
```
# If you don't want to train your own model:
mkdir -p work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes
pushd work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes
wget https://github.com/kneron/Model_Zoo/raw/main/mmsegmentation/stdc_1/latest.zip
unzip latest.zip
popd
```
# Step 4: Export ONNX and Verify
## Step 4-1: Export ONNX
`tools/pytorch2onnx_kneron.py` is a script provided by kneron-mmsegmentation that converts the trained PyTorch model to ONNX:
```shell
python tools/pytorch2onnx_kneron.py \
configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
--checkpoint work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth \
--output-file work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx \
--verify
```
* `configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py` can be your training config.
* `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth` can be your model checkpoint.
* `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx` can be any other path; for convenience, we place the ONNX file in the same folder as the PyTorch checkpoint.
## Step 4-2: Verify ONNX
`tools/deploy_test_kneron.py` is a script provided by kneron-mmsegmentation that verifies whether the exported ONNX model produces outputs similar to those of the PyTorch model:
```shell
python tools/deploy_test_kneron.py \
configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py \
work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx \
--eval mIoU
```
* `configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py` can be your training config.
* `work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx` can be your exported ONNX file.
The expected result of the command above should be something similar to the following text (the numbers may slightly differ):
```
...
+---------------+-------+-------+
| Class | IoU | Acc |
+---------------+-------+-------+
| road | 97.52 | 98.62 |
| sidewalk | 80.59 | 88.69 |
| building | 89.59 | 95.38 |
| wall | 58.02 | 66.85 |
| fence | 55.37 | 69.76 |
| pole | 44.4 | 52.28 |
| traffic light | 50.23 | 60.07 |
| traffic sign | 62.58 | 70.25 |
| vegetation | 89.0 | 95.27 |
| terrain | 60.47 | 72.27 |
| sky | 90.56 | 97.07 |
| person | 70.7 | 84.88 |
| rider | 48.66 | 61.37 |
| car | 91.58 | 95.98 |
| truck | 73.92 | 82.66 |
| bus | 79.92 | 85.95 |
| train | 66.26 | 75.92 |
| motorcycle | 48.88 | 57.91 |
| bicycle | 66.9 | 82.0 |
+---------------+-------+-------+
Summary:
+------+-------+-------+
| aAcc | mIoU | mAcc |
+------+-------+-------+
| 94.4 | 69.75 | 78.59 |
+------+-------+-------+
```
Note that the ONNX results may differ from the PyTorch results due to some implementation differences between PyTorch and ONNXRuntime.
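Beyond comparing the mIoU tables, a lightweight way to quantify how "similar" the two outputs are is the per-pixel agreement between the predicted label maps. The helper below is our own illustration, not a script shipped in the repository:

```python
import numpy as np

def seg_map_agreement(pred_a, pred_b):
    """Fraction of pixels on which two segmentation label maps agree."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    assert pred_a.shape == pred_b.shape
    return float((pred_a == pred_b).mean())

# toy example: two 2x2 label maps that disagree on one pixel
print(seg_map_agreement([[1, 2], [3, 4]], [[1, 2], [3, 0]]))  # 0.75
```

In practice you would feed it the argmax label maps produced by the PyTorch model and by ONNXRuntime for the same input image.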
# Step 5: Convert ONNX File to [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) Model for Kneron Platform
## Step 5-1: Install Kneron toolchain docker:
* Check [Kneron Toolchain Installation Document](http://doc.kneron.com/docs/#toolchain/manual/#1-installation)
## Step 5-2: Mount Kneron toolchain docker
* Mount a folder (e.g. `/mnt/hgfs/Competition`) into the toolchain docker container as `/data1`. The ONNX file exported in Step 4 should be put here; all toolchain operations should happen in this folder.
```shell
sudo docker run --rm -it -v /mnt/hgfs/Competition:/data1 kneron/toolchain:latest
```
## Step 5-3: Import KTC and the required libraries in python
```python
import ktc
import numpy as np
import os
import onnx
from PIL import Image
```
## Step 5-4: Optimize the onnx model
```python
onnx_path = '/data1/latest.onnx'
m = onnx.load(onnx_path)
m = ktc.onnx_optimizer.onnx2onnx_flow(m)
onnx.save(m, 'latest.opt.onnx')
```
## Step 5-5: Configure and load the data needed by ktc, and check if the onnx model works with the toolchain
```python
# npu (only) performance simulation
km = ktc.ModelConfig(model_id_on_public_field, "0001", "720", onnx_model=m)  # replace model_id_on_public_field with your model ID
eval_result = km.evaluate()
print("\nNpu performance evaluation result:\n" + str(eval_result))
```
## Step 5-6: Quantize the onnx model
We [sampled 3 images from the Cityscapes dataset](https://www.kneron.com/tw/support/education-center/?folder=OpenMMLab%20Kneron%20Edition/misc/&download=41) as quantization data. To test our quantized model:
1. Download the [zip file](https://www.kneron.com/tw/support/education-center/?folder=OpenMMLab%20Kneron%20Edition/misc/&download=41)
2. Extract the zip file as a folder named `cityscapes_minitest`
3. Put `cityscapes_minitest` into the docker-mounted folder (the path inside the docker container should be `/data1/cityscapes_minitest`)
The following script preprocesses the quantization data (the preprocessing must match the training code) and collects it into a list:
```python
import os
from os import walk

import numpy as np
from PIL import Image

img_list = []
for dirpath, dirnames, filenames in walk("/data1/cityscapes_minitest"):
    for f in filenames:
        fullpath = os.path.join(dirpath, f)
        print(fullpath)
        image = Image.open(fullpath).convert("RGB")
        # reverse the channel order, matching the training input format
        image = Image.fromarray(np.array(image)[..., ::-1])
        # resize to the 1024x512 model input and normalize: x/256 - 0.5 == (x - 128)/256
        img_data = np.array(image.resize((1024, 512), Image.BILINEAR)) / 256 - 0.5
        img_list.append(img_data)
```
Then perform quantization. The generated BIE model will be saved at `/data1/output.bie`.
```python
# fixed-point analysis
bie_model_path = km.analysis({"input": img_list})
print("\nFixed-point analysis done. Save bie model to '" + str(bie_model_path) + "'")
```
## Step 5-7: Compile
The final step is to compile the BIE model into an NEF model.
```python
# compile
nef_model_path = ktc.compile([km])
print("\nCompile done. Save Nef file to '" + str(nef_model_path) + "'")
```
You can find the NEF file at `/data1/batch_compile/models_720.nef`. `models_720.nef` is the final compiled model.
# Step 6: Run [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) model on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
* N/A
# Step 7 (For Kneron AI Competition 2022): Run [NEF](http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow) model on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
[WARNING] Do not run this step inside the toolchain docker environment mentioned in Step 5.
We recommend reading the [Kneron PLUS official document](http://doc.kneron.com/docs/#plus_python/#_top) first.
### Step 7-1: Download and install the PLUS python library (.whl)
* Go to [Kneron Education Center](https://www.kneron.com/tw/support/education-center/)
* Scroll down to `OpenMMLab Kneron Edition` table
* Select `Kneron Plus v1.3.0 (pre-built python library, firmware)`
* Select `python library`
* Select your OS version (Ubuntu, Windows, MacOS, Raspberry Pi)
* Download `KneronPLUS-1.3.0-py3-none-any_{your_os}.whl`
* Unzip the downloaded `KneronPLUS-1.3.0-py3-none-any.whl.zip`
* `pip install KneronPLUS-1.3.0-py3-none-any.whl`
### Step 7-2: Download and upgrade KL720 USB accelerator firmware
* Go to the [Kneron Education Center](https://www.kneron.com/tw/support/education-center/)
* Scroll down to the `OpenMMLab Kneron Edition` table
* Select `Kneron Plus v1.3.0 (pre-built python library, firmware)`
* Select `firmware`
* Download `kl720_frimware.zip` (contains `fw_ncpu.bin` and `fw_scpu.bin`)
* Unzip the downloaded `kl720_frimware.zip`
* Upgrade the KL720 USB accelerator firmware (`fw_ncpu.bin`, `fw_scpu.bin`) by following `Sec. 2. Update AI Device to KDP2 Firmware` and `Sec. 2.2 KL720` of this [document](http://doc.kneron.com/docs/#plus_python/getting_start/)
### Step 7-3: Download STDC example code
* Go to [Kneron education center](https://www.kneron.com/tw/support/education-center/)
* Scroll down to **OpenMMLab Kneron Edition** table
* Select **kneron-mmsegmentation**
* Select **STDC**
* Download **stdc_plus_demo.zip**
* Unzip the downloaded **stdc_plus_demo.zip**
### Step 7-4: Test that the environment is ready (requires a [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1))
In `stdc_plus_demo`, we provide an STDC-Seg example model and image for a quick test.
* Plug in [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1) into your computer USB port
* Go to the stdc_plus_demo folder
```bash
cd /PATH/TO/stdc_plus_demo
```
* Install required python libraries
```bash
pip install -r requirements.txt
```
* Run example on [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
```bash
python KL720DemoGenericInferenceSTDC_BypassHwPreProc.py -nef ./example_stdc_720.nef -img 000000000641.jpg
```
The inference result is saved as `output_000000000641.jpg` in the same folder.
The output of the command above should look similar to the following:
```plain
...
[Connect Device]
- Success
[Set Device Timeout]
- Success
[Upload Model]
- Success
[Read Image]
- Success
[Starting Inference Work]
- Starting inference loop 1 times
- .
[Retrieve Inference Node Output ]
- Success
[Output Result Image]
- Output bounding boxes on 'output_000000000641.jpg'
...
```
### Step 7-5: Run your NEF model and your image on the [KL720 USB accelerator](https://www.kneo.ai/products/hardwares/HW2020122500000007/1)
Use the same script as in the previous step, but pass your own NEF model path and image:
```bash
python KL720DemoGenericInferenceSTDC_BypassHwPreProc.py -img /PATH/TO/YOUR_IMAGE.bmp -nef /PATH/TO/YOUR/720_NEF_MODEL.nef
```
# Appendix: `kneron_preprocessing/API.py` (684 lines, excerpt)
# -*- coding: utf-8 -*-
import numpy as np
import os
from .funcs.utils import str2int, str2bool
from . import Flow
flow = Flow()
flow.set_numerical_type('floating')
flow_520 = Flow()
flow_520.set_numerical_type('520')
flow_720 = Flow()
flow_720.set_numerical_type('720')
DEFAULT = None
default = {
'crop':{
'align_w_to_4':False
},
'resize':{
'type':'bilinear',
'calculate_ratio_using_CSim':False
}
}
def set_default_as_520():
"""
Set some default parameter as 520 setting
crop.align_w_to_4 = True
crop.pad_square_to_4 = True
resize.type = 'fixed_520'
resize.calculate_ratio_using_CSim = True
"""
global default
default['crop']['align_w_to_4'] = True
default['resize']['type'] = 'fixed_520'
default['resize']['calculate_ratio_using_CSim'] = True
return
def set_default_as_floating():
"""
Set some default parameter as floating setting
crop.align_w_to_4 = False
crop.pad_square_to_4 = False
resize.type = 'bilinear'
resize.calculate_ratio_using_CSim = False
"""
global default
default['crop']['align_w_to_4'] = False
default['resize']['type'] = 'bilinear'
default['resize']['calculate_ratio_using_CSim'] = False
pass
def print_info_on():
"""
    turn print information on.
"""
flow.set_print_info(True)
flow_520.set_print_info(True)
def print_info_off():
"""
    turn print information off.
"""
flow.set_print_info(False)
flow_520.set_print_info(False)
def load_image(image):
"""
load_image function
    load an image and output it as an rgb888-format np.array
Args:
image: [np.array/str], can be np.array or image file path
Returns:
out: [np.array], rgb888 format
Examples:
"""
image = flow.load_image(image, is_raw = False)
return image
def load_bin(image, fmt=None, size=None):
"""
load_bin function
load bin file and output as rgb888 format np.array
Args:
image: [str], bin file path
fmt: [str], "rgb888" / "rgb565" / "nir"
        size: [tuple], (image_w, image_h)
Returns:
out: [np.array], rgb888 format
Examples:
>>> image_data = kneron_preprocessing.API.load_bin(image,'rgb565',(raw_w,raw_h))
"""
assert isinstance(size, tuple)
assert isinstance(fmt, str)
# assert (fmt.lower() in ['rgb888', "rgb565" , "nir",'RGB888', "RGB565" , "NIR", 'NIR888', 'nir888'])
image = flow.load_image(image, is_raw = True, raw_img_type='bin', raw_img_fmt = fmt, img_in_width = size[0], img_in_height = size[1])
flow.set_color_conversion(source_format=fmt, out_format = 'rgb888')
image,_ = flow.funcs['color'](image)
return image
def load_hex(file, fmt=None, size=None):
"""
load_hex function
load hex file and output as rgb888 format np.array
Args:
image: [str], hex file path
fmt: [str], "rgb888" / "yuv444" / "ycbcr444" / "yuv422" / "ycbcr422" / "rgb565"
        size: [tuple], (image_w, image_h)
Returns:
out: [np.array], rgb888 format
Examples:
>>> image_data = kneron_preprocessing.API.load_hex(image,'rgb565',(raw_w,raw_h))
"""
assert isinstance(size, tuple)
assert isinstance(fmt, str)
assert (fmt.lower() in ['rgb888',"yuv444" , "ycbcr444" , "yuv422" , "ycbcr422" , "rgb565"])
image = flow.load_image(file, is_raw = True, raw_img_type='hex', raw_img_fmt = fmt, img_in_width = size[0], img_in_height = size[1])
flow.set_color_conversion(source_format=fmt, out_format = 'rgb888')
image,_ = flow.funcs['color'](image)
return image
def dump_image(image, output=None, file_fmt='txt',image_fmt='rgb888',order=0):
"""
dump_image function
dump txt, bin or hex, default is txt
image format as following format: RGB888, RGBA8888, RGB565, NIR, YUV444, YCbCr444, YUV422, YCbCr422, default is RGB888
Args:
image: [np.array/str], can be np.array or image file path
output: [str], dump file path
file_fmt: [str], "bin" / "txt" / "hex", set dump file format, default is txt
image_fmt: [str], RGB888 / RGBA8888 / RGB565 / NIR / YUV444 / YCbCr444 / YUV422 / YCbCr422, default is RGB888
Examples:
>>> kneron_preprocessing.API.dump_image(image_data,out_path,fmt='bin')
"""
if isinstance(image, str):
image = load_image(image)
assert isinstance(image, np.ndarray)
if output is None:
return
flow.set_output_setting(is_dump=False, dump_format=file_fmt, image_format=image_fmt ,output_file=output)
flow.dump_image(image)
return
def convert(image, out_fmt = 'RGB888', source_fmt = 'RGB888'):
"""
color convert
Args:
image: [np.array], input
out_fmt: [str], "rgb888" / "rgba8888" / "rgb565" / "yuv" / "ycbcr" / "yuv422" / "ycbcr422"
source_fmt: [str], "rgb888" / "rgba8888" / "rgb565" / "yuv" / "ycbcr" / "yuv422" / "ycbcr422"
Returns:
out: [np.array]
Examples:
"""
flow.set_color_conversion(source_format = source_fmt, out_format=out_fmt, simulation=False)
image,_ = flow.funcs['color'](image)
return image
def get_crop_range(box,align_w_to_4=DEFAULT, pad_square_to_4=False,rounding_type=0):
"""
get exact crop box according different setting
Args:
        box: [tuple], (x1, y1, x2, y2)
align_w_to_4: [bool], crop length in w direction align to 4 or not, default False
pad_square_to_4: [bool], pad to square(align 4) or not, default False
rounding_type: [int], 0-> x1,y1 take floor, x2,y2 take ceil; 1->all take rounding
Returns:
out: [tuble,4], (crop_x1, crop_y1, crop_x2, crop_y2)
Examples:
>>> image_data = kneron_preprocessing.API.get_crop_range((272,145,461,341), align_w_to_4=True, pad_square_to_4=True)
(272, 145, 460, 341)
"""
if box is None:
return (0,0,0,0)
if align_w_to_4 is None:
align_w_to_4 = default['crop']['align_w_to_4']
flow.set_crop(type='specific', start_x=box[0],start_y=box[1],end_x=box[2],end_y=box[3], align_w_to_4=align_w_to_4, pad_square_to_4=pad_square_to_4,rounding_type=rounding_type)
image = np.zeros((1,1,3)).astype('uint8')
_,info = flow.funcs['crop'](image)
return info['box']
def crop(image, box=None, align_w_to_4=DEFAULT, pad_square_to_4=False,rounding_type=0 ,info_out = {}):
"""
crop function
specific crop range by box
Args:
image: [np.array], input
        box: [tuple], (x1, y1, x2, y2)
        align_w_to_4: [bool], crop length in w direction align to 4 or not, default False
        pad_square_to_4: [bool], pad to square(align 4) or not, default False
        rounding_type: [int], 0 -> x1,y1 take floor, x2,y2 take ceil; 1 -> all take rounding
        info_out: [dict], save the final crop box into info_out['box']
Returns:
out: [np.array]
Examples:
>>> info = {}
>>> image_data = kneron_preprocessing.API.crop(image_data,(272,145,461,341), align_w_to_4=True, info_out=info)
>>> info['box']
(272, 145, 460, 341)
>>> info = {}
>>> image_data = kneron_preprocessing.API.crop(image_data,(272,145,461,341), pad_square_to_4=True, info_out=info)
>>> info['box']
(268, 145, 464, 341)
"""
assert isinstance(image, np.ndarray)
if box is None:
return image
if align_w_to_4 is None:
align_w_to_4 = default['crop']['align_w_to_4']
flow.set_crop(type='specific', start_x=box[0],start_y=box[1],end_x=box[2],end_y=box[3], align_w_to_4=align_w_to_4, pad_square_to_4=pad_square_to_4,rounding_type=rounding_type)
image,info = flow.funcs['crop'](image)
info_out['box'] = info['box']
return image
def crop_center(image, range=None, align_w_to_4=DEFAULT, pad_square_to_4=False,rounding_type=0 ,info_out = {}):
"""
crop function
center crop by range
Args:
image: [np.array], input
        range: [tuple], (crop_w, crop_h)
        align_w_to_4: [bool], crop length in w direction align to 4 or not, default False
        pad_square_to_4: [bool], pad to square(align 4) or not, default False
        rounding_type: [int], 0 -> x1,y1 take floor, x2,y2 take ceil; 1 -> all take rounding
        info_out: [dict], save the final crop box into info_out['box']
Returns:
out: [np.array]
Examples:
>>> info = {}
>>> image_data = kneron_preprocessing.API.crop_center(image_data,(102,40), align_w_to_4=True,info_out=info)
>>> info['box']
(268, 220, 372, 260)
>>> info = {}
>>> image_data = kneron_preprocessing.API.crop_center(image_data,(102,40), pad_square_to_4=True, info_out=info)
>>> info['box']
(269, 192, 371, 294)
"""
assert isinstance(image, np.ndarray)
if range is None:
return image
if align_w_to_4 is None:
align_w_to_4 = default['crop']['align_w_to_4']
flow.set_crop(type='center', crop_w=range[0],crop_h=range[1], align_w_to_4=align_w_to_4, pad_square_to_4=pad_square_to_4,rounding_type=rounding_type)
image,info = flow.funcs['crop'](image)
info_out['box'] = info['box']
return image
def crop_corner(image, range=None, align_w_to_4=DEFAULT,pad_square_to_4=False,rounding_type=0 ,info_out = {}):
"""
crop function
corner crop by range
Args:
image: [np.array], input
        range: [tuple], (crop_w, crop_h)
        align_w_to_4: [bool], crop length in w direction align to 4 or not, default False
        pad_square_to_4: [bool], pad to square(align 4) or not, default False
        rounding_type: [int], 0 -> x1,y1 take floor, x2,y2 take ceil; 1 -> all take rounding
        info_out: [dict], save the final crop box into info_out['box']
Returns:
out: [np.array]
Examples:
>>> info = {}
>>> image_data = kneron_preprocessing.API.crop_corner(image_data,(102,40), align_w_to_4=True,info_out=info)
>>> info['box']
(0, 0, 104, 40)
>>> info = {}
>>> image_data = kneron_preprocessing.API.crop_corner(image_data,(102,40), pad_square_to_4=True,info_out=info)
>>> info['box']
(0, -28, 102, 74)
"""
assert isinstance(image, np.ndarray)
if range is None:
return image
if align_w_to_4 is None:
align_w_to_4 = default['crop']['align_w_to_4']
flow.set_crop(type='corner', crop_w=range[0],crop_h=range[1], align_w_to_4=align_w_to_4, pad_square_to_4=pad_square_to_4)
image, info = flow.funcs['crop'](image)
info_out['box'] = info['box']
return image
def resize(image, size=None, keep_ratio = True, zoom = True, type=DEFAULT, calculate_ratio_using_CSim = DEFAULT, info_out = {}):
"""
resize function
    resize type can be bilinear or bilicubic (floating types), or fixed / fixed_520 / fixed_720 (fixed-point types).
    the fixed_520/fixed_720 types add some functions to simulate 520/720 hardware behavior.
Args:
image: [np.array], input
        size: [tuple], (input_w, input_h)
        keep_ratio: [bool], keep the aspect ratio or not, default True
        zoom: [bool], allow resize to zoom the image or not, default True
        type: [str], "bilinear" / "bilicubic" / "cv2" / "fixed" / "fixed_520" / "fixed_720"
        calculate_ratio_using_CSim: [bool], calculate the ratio and scale using the CSim function and C float, default False
        info_out: [dict], save the final scale size (w, h) into info_out['size']
Returns:
out: [np.array]
Examples:
>>> info = {}
>>> image_data = kneron_preprocessing.API.resize(image_data,size=(56,56),type='fixed',info_out=info)
>>> info_out['size']
(54,56)
"""
assert isinstance(image, np.ndarray)
if size is None:
return image
if type is None:
type = default['resize']['type']
if calculate_ratio_using_CSim is None:
calculate_ratio_using_CSim = default['resize']['calculate_ratio_using_CSim']
flow.set_resize(resize_w = size[0], resize_h = size[1], type=type, keep_ratio=keep_ratio,zoom=zoom, calculate_ratio_using_CSim=calculate_ratio_using_CSim)
image, info = flow.funcs['resize'](image)
info_out['size'] = info['size']
return image
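The keep_ratio path above delegates the scale computation to flow.set_resize; as a minimal standalone sketch of aspect-preserving fitting (the helper name and rounding behavior are assumptions, not the library's internals):

```python
def fit_keep_ratio(src_w, src_h, dst_w, dst_h, zoom=True):
    # Pick one scale factor so the image fits inside (dst_w, dst_h)
    # without distortion; forbid enlargement when zoom is disabled.
    ratio = min(dst_w / src_w, dst_h / src_h)
    if not zoom:
        ratio = min(ratio, 1.0)
    return max(1, round(src_w * ratio)), max(1, round(src_h * ratio))

print(fit_keep_ratio(112, 116, 56, 56))  # (54, 56)
```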
def pad(image, pad_l=0, pad_r=0, pad_t=0, pad_b=0, pad_val=0):
"""
pad function
pad with specific left, right, top and bottom sizes.
Args:
image: [np.array], input
pad_l: [int], pad size from the left, default 0
pad_r: [int], pad size from the right, default 0
pad_t: [int], pad size from the top, default 0
pad_b: [int], pad size from the bottom, default 0
pad_val: [float], the pad value, default 0
Returns:
out: [np.array]
Examples:
>>> image_data = kneron_preprocessing.API.pad(image_data,20,40,20,40,-0.5)
"""
assert isinstance(image, np.ndarray)
flow.set_padding(type='specific',pad_l=pad_l,pad_r=pad_r,pad_t=pad_t,pad_b=pad_b,pad_val=pad_val)
image, _ = flow.funcs['padding'](image)
return image
def pad_center(image,size=None, pad_val=0):
"""
pad function
center pad to the given size.
Args:
image: [np.array], input
size: [tuple], (padded_size_w, padded_size_h)
pad_val: [float], the pad value, default 0
Returns:
out: [np.array]
Examples:
>>> image_data = kneron_preprocessing.API.pad_center(image_data,size=(56,56),pad_val=-0.5)
"""
assert isinstance(image, np.ndarray)
if size is None:
return image
assert ( (image.shape[0] <= size[1]) & (image.shape[1] <= size[0]) )
flow.set_padding(type='center',padded_w=size[0],padded_h=size[1],pad_val=pad_val)
image, _ = flow.funcs['padding'](image)
return image
def pad_corner(image,size=None, pad_val=0):
"""
pad function
corner pad to the given size.
Args:
image: [np.array], input
size: [tuple], (padded_size_w, padded_size_h)
pad_val: [float], the pad value, default 0
Returns:
out: [np.array]
Examples:
>>> image_data = kneron_preprocessing.API.pad_corner(image_data,size=(56,56),pad_val=-0.5)
"""
assert isinstance(image, np.ndarray)
if size is None:
return image
assert ( (image.shape[0] <= size[1]) & (image.shape[1] <= size[0]) )
flow.set_padding(type='corner',padded_w=size[0],padded_h=size[1],pad_val=pad_val)
image, _ = flow.funcs['padding'](image)
return image
def norm(image,scale=256.,bias=-0.5, mean=None, std=None):
"""
norm function
x = x/scale + bias
then optionally, per channel:
x[0,1,2] = x - mean[0,1,2]
x[0,1,2] = x / std[0,1,2]
Args:
image: [np.array], input
scale: [float], default = 256
bias: [float], default = -0.5
mean: [tuple,3], default = None
std: [tuple,3], default = None
Returns:
out: [np.array]
Examples:
>>> image_data = kneron_preprocessing.API.norm(image_data)
>>> image_data = kneron_preprocessing.API.norm(image_data,mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
"""
assert isinstance(image, np.ndarray)
flow.set_normalize(type='specific',scale=scale, bias=bias, mean=mean, std =std)
image, _ = flow.funcs['normalize'](image)
return image
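The default pipeline above computes x/scale + bias with optional per-channel mean/std; a self-contained numpy sketch of that arithmetic (not the library's actual normalize runner):

```python
import numpy as np

def normalize(x, scale=256., bias=-0.5, mean=None, std=None):
    # x/scale + bias, then optional per-channel mean subtraction and std division
    x = np.asarray(x, dtype=np.float64) / scale + bias
    if mean is not None:
        x = x - np.asarray(mean)
    if std is not None:
        x = x / np.asarray(std)
    return x

out = normalize(np.array([[[0., 128., 255.]]]))
print(out)  # values: -0.5, 0.0, 0.49609375
```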
def inproc_520(image,raw_fmt='rgb565',raw_size=None,npu_size=None, crop_box=None, pad_mode=0, norm='kneron', gray=False, rotate=0, radix=8, bit_width=8, round_w_to_16=True, NUM_BANK_LINE=32,BANK_ENTRY_CNT=512,MAX_IMG_PREPROC_ROW_NUM=511,MAX_IMG_PREPROC_COL_NUM=256):
"""
inproc_520
Args:
image: [np.array], input; a non-array input is loaded as a raw 'bin' image of raw_fmt with size raw_size
raw_fmt: [str], raw image format, default = 'rgb565'
raw_size: [tuple], (raw_w, raw_h), required when the input is a raw image
npu_size: [tuple], (npu_w, npu_h); if None, the image is returned unchanged
crop_box: [tuple], (x1, y1, x2, y2), if None crop is skipped
pad_mode: [int], 0: pad 2 sides, 1: pad 1 side, 2: no pad, default = 0
norm: [str], default = 'kneron'
rotate: [int], 0 / 1 / 2, default = 0
radix: [int], default = 8
bit_width: [int], default = 8
round_w_to_16: [bool], default = True
gray: [bool], default = False
Returns:
out: [np.array]
Examples:
>>> image_data = kneron_preprocessing.API.inproc_520(image_data,npu_size=(56,56),crop_box=(272,145,460,341),pad_mode=1)
"""
# assert isinstance(image, np.ndarray)
if (not isinstance(image, np.ndarray)):
flow_520.set_raw_img(is_raw_img='yes',raw_img_type = 'bin',raw_img_fmt=raw_fmt, img_in_width=raw_size[0], img_in_height=raw_size[1])
else:
flow_520.set_raw_img(is_raw_img='no')
flow_520.set_color_conversion(source_format='rgb888')
if npu_size is None:
return image
flow_520.set_model_size(w=npu_size[0],h=npu_size[1])
## Crop
if crop_box is not None:
flow_520.set_crop(start_x=crop_box[0],start_y=crop_box[1],end_x=crop_box[2],end_y=crop_box[3])
crop_fisrt = True
else:
crop_fisrt = False
## Color
if gray:
flow_520.set_color_conversion(out_format='l',simulation='no')
else:
flow_520.set_color_conversion(out_format='rgb888',simulation='no')
## Resize & Pad
pad_mode = str2int(pad_mode)
if (pad_mode == 0):
pad_type = 'center'
resize_keep_ratio = 'yes'
elif (pad_mode == 1):
pad_type = 'corner'
resize_keep_ratio = 'yes'
else:
pad_type = 'center'
resize_keep_ratio = 'no'
flow_520.set_resize(keep_ratio=resize_keep_ratio)
flow_520.set_padding(type=pad_type)
## Norm
flow_520.set_normalize(type=norm)
## 520 inproc
flow_520.set_520_setting(radix=radix,bit_width=bit_width,rotate=rotate,crop_fisrt=crop_fisrt,round_w_to_16=round_w_to_16,NUM_BANK_LINE=NUM_BANK_LINE,BANK_ENTRY_CNT=BANK_ENTRY_CNT,MAX_IMG_PREPROC_ROW_NUM=MAX_IMG_PREPROC_ROW_NUM,MAX_IMG_PREPROC_COL_NUM=MAX_IMG_PREPROC_COL_NUM)
image_data, _ = flow_520.run_whole_process(image)
return image_data
def inproc_720(image,raw_fmt='rgb565',raw_size=None,npu_size=None, crop_box=None, pad_mode=0, norm='kneron', gray=False):
"""
inproc_720
Args:
image: [np.array], input
crop_box: [tuble], (x1, y1, x2, y2), if None will skip crop
pad_mode: [int], 0: pad 2 sides, 1: pad 1 side, 2: no pad. default = 0
norm: [str], default = 'kneron'
rotate: [int], 0 / 1 / 2 ,default = 0
radix: [int], default = 8
bit_width: [int], default = 8
round_w_to_16: [bool], default = True
gray: [bool], default = False
Returns:
out: [np.array]
Examples:
>>> image_data = kneron_preprocessing.API.inproc_520(image_data,npu_size=(56,56),crop_box=(272,145,460,341),pad_mode=1)
"""
# assert isinstance(image, np.ndarray)
if (not isinstance(image, np.ndarray)):
flow_720.set_raw_img(is_raw_img='yes',raw_img_type = 'bin',raw_img_fmt=raw_fmt, img_in_width=raw_size[0], img_in_height=raw_size[1])
else:
flow_720.set_raw_img(is_raw_img='no')
flow_720.set_color_conversion(source_format='rgb888')
if npu_size is None:
return image
flow_720.set_model_size(w=npu_size[0],h=npu_size[1])
## Crop
if crop_box is not None:
flow_720.set_crop(start_x=crop_box[0],start_y=crop_box[1],end_x=crop_box[2],end_y=crop_box[3])
crop_fisrt = True
else:
crop_fisrt = False
## Color
if gray:
flow_720.set_color_conversion(out_format='l',simulation='no')
else:
flow_720.set_color_conversion(out_format='rgb888',simulation='no')
## Resize & Pad
pad_mode = str2int(pad_mode)
if (pad_mode == 0):
pad_type = 'center'
resize_keep_ratio = 'yes'
elif (pad_mode == 1):
pad_type = 'corner'
resize_keep_ratio = 'yes'
else:
pad_type = 'center'
resize_keep_ratio = 'no'
flow_720.set_resize(keep_ratio=resize_keep_ratio)
flow_720.set_padding(type=pad_type)
## 720 inproc
# flow_720.set_720_setting(radix=radix,bit_width=bit_width,rotate=rotate,crop_fisrt=crop_fisrt,round_w_to_16=round_w_to_16,NUM_BANK_LINE=NUM_BANK_LINE,BANK_ENTRY_CNT=BANK_ENTRY_CNT,MAX_IMG_PREPROC_ROW_NUM=MAX_IMG_PREPROC_ROW_NUM,MAX_IMG_PREPROC_COL_NUM=MAX_IMG_PREPROC_COL_NUM)
image_data, _ = flow_720.run_whole_process(image)
return image_data
def bit_match(data1, data2):
"""
bit_match function
check whether data1 equals data2.
Args:
data1: [np.array / str], array or txt/bin file path
data2: [np.array / str], array or txt/bin file path
Returns:
out1: [bool], match or not
out2: [np.array], positions of the mismatched elements when not matched
Examples:
>>> result, mismatched = kneron_preprocessing.API.bit_match(data1,data2)
"""
if isinstance(data1, str):
if os.path.splitext(data1)[1] == '.bin':
data1 = np.fromfile(data1, dtype='uint8')
elif os.path.splitext(data1)[1] == '.txt':
data1 = np.loadtxt(data1)
assert isinstance(data1, np.ndarray)
if isinstance(data2, str):
if os.path.splitext(data2)[1] == '.bin':
data2 = np.fromfile(data2, dtype='uint8')
elif os.path.splitext(data2)[1] == '.txt':
data2 = np.loadtxt(data2)
assert isinstance(data2, np.ndarray)
data1 = data1.reshape((-1,1))
data2 = data2.reshape((-1,1))
if len(data1) != len(data2):
print('error: length mismatch')
return False, np.zeros((1))
else:
# compare in signed arithmetic and flag differences in either direction
ans = data2.astype(np.int64) - data1.astype(np.int64)
mismatched = np.where(ans != 0)[0]
if len(mismatched) > 0:
print('error', mismatched)
return False, mismatched
else:
print('pass')
return True, np.zeros((1))
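bit_match above reduces to flattening both buffers and locating the indices where they differ; a standalone sketch of that comparison (using `!=` so differences in either direction are flagged):

```python
import numpy as np

def find_mismatches(a, b):
    # Flatten both buffers and return (is_match, indices_of_differences).
    a = np.asarray(a).reshape(-1)
    b = np.asarray(b).reshape(-1)
    if a.size != b.size:
        return False, np.zeros(1, dtype=np.int64)
    idx = np.where(a != b)[0]
    return idx.size == 0, idx

ok, idx = find_mismatches([1, 2, 3, 4], [1, 2, 9, 4])
print(ok, idx.tolist())  # False [2]
```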
def cpr_to_crp(x_start, x_end, y_start, y_end, pad_l, pad_r, pad_t, pad_b, rx_start, rx_end, ry_start, ry_end):
"""
calculate the parameters of crop->pad->resize flow to HW crop->resize->padding flow
Args:
Returns:
Examples:
"""
pad_l = round(pad_l * (rx_end-rx_start) / (x_end - x_start + pad_l + pad_r))
pad_r = round(pad_r * (rx_end-rx_start) / (x_end - x_start + pad_l + pad_r))
pad_t = round(pad_t * (ry_end-ry_start) / (y_end - y_start + pad_t + pad_b))
pad_b = round(pad_b * (ry_end-ry_start) / (y_end - y_start + pad_t + pad_b))
rx_start +=pad_l
rx_end -=pad_r
ry_start +=pad_t
ry_end -=pad_b
return x_start, x_end, y_start, y_end, pad_l, pad_r, pad_t, pad_b, rx_start, rx_end, ry_start, ry_end
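The conversion above rescales each source-space pad by the resize ratio before shrinking the resize box; a hedged numeric sketch of that rescaling step (function and argument names are illustrative, not the library's):

```python
def rescale_pads(crop_w, pad_l, pad_r, resized_w):
    # A pad that is pad_l/total of the padded source width should occupy
    # the same fraction of the resized width.
    total = crop_w + pad_l + pad_r
    return (round(pad_l * resized_w / total),
            round(pad_r * resized_w / total))

print(rescale_pads(100, 10, 10, 60))  # (5, 5)
```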

@@ -0,0 +1,172 @@
import numpy as np
import argparse
import kneron_preprocessing
def main_(args):
image = args.input_file
filefmt = args.file_fmt
if filefmt == 'bin':
raw_format = args.raw_format
raw_w = args.input_width
raw_h = args.input_height
image_data = kneron_preprocessing.API.load_bin(image,raw_format,(raw_w,raw_h))
else:
image_data = kneron_preprocessing.API.load_image(image)
npu_w = args.width
npu_h = args.height
crop_first = args.crop_first == "True"
if crop_first:
x1 = args.x_pos
y1 = args.y_pos
x2 = args.crop_w + x1
y2 = args.crop_h + y1
crop_box = [x1,y1,x2,y2]
else:
crop_box = None
pad_mode = args.pad_mode
norm_mode = args.norm_mode
bitwidth = args.bitwidth
radix = args.radix
rotate = args.rotate_mode
##
image_data = kneron_preprocessing.API.inproc_520(image_data,npu_size=(npu_w,npu_h),crop_box=crop_box,pad_mode=pad_mode,norm=norm_mode,rotate=rotate,radix=radix,bit_width=bitwidth)
output_file = args.output_file
kneron_preprocessing.API.dump_image(image_data,output_file,'bin','rgba')
return
if __name__ == "__main__":
argparser = argparse.ArgumentParser(
description="preprocessing"
)
argparser.add_argument(
'-i',
'--input_file',
help="input file name"
)
argparser.add_argument(
'-ff',
'--file_fmt',
help="input file format, jpg or bin"
)
argparser.add_argument(
'-rf',
'--raw_format',
help="input file image format, rgb or rgb565 or nir"
)
argparser.add_argument(
'-i_w',
'--input_width',
type=int,
help="input image width"
)
argparser.add_argument(
'-i_h',
'--input_height',
type=int,
help="input image height"
)
argparser.add_argument(
'-o',
'--output_file',
help="output file name"
)
argparser.add_argument(
'-s_w',
'--width',
type=int,
help="output width for npu input",
)
argparser.add_argument(
'-s_h',
'--height',
type=int,
help="output height for npu input",
)
argparser.add_argument(
'-c_f',
'--crop_first',
help="crop first True or False",
)
argparser.add_argument(
'-x',
'--x_pos',
type=int,
help="left up coordinate x",
)
argparser.add_argument(
'-y',
'--y_pos',
type=int,
help="left up coordinate y",
)
argparser.add_argument(
'-c_w',
'--crop_w',
type=int,
help="crop width",
)
argparser.add_argument(
'-c_h',
'--crop_h',
type=int,
help="crop height",
)
argparser.add_argument(
'-p_m',
'--pad_mode',
type=int,
help=" 0: pad 2 sides, 1: pad 1 side, 2: no pad.",
)
argparser.add_argument(
'-n_m',
'--norm_mode',
help="normalizaton mode: yolo, kneron, tf."
)
argparser.add_argument(
'-r_m',
'--rotate_mode',
type=int,
help="rotate mode:0,1,2"
)
argparser.add_argument(
'-bw',
'--bitwidth',
type=int,
help="Int for bitwidth"
)
argparser.add_argument(
'-r',
'--radix',
type=int,
help="Int for radix"
)
args = argparser.parse_args()
main_(args)

kneron_preprocessing/Flow.py (new file, 1226 lines): diff suppressed because it is too large

@@ -0,0 +1,2 @@
from .Flow import *
from .API import *

@@ -0,0 +1,285 @@
import numpy as np
from PIL import Image
from .utils import signed_rounding, clip, str2bool
format_bit = 10
c00_yuv = 1
c02_yuv = 1436
c10_yuv = 1
c11_yuv = -354
c12_yuv = -732
c20_yuv = 1
c21_yuv = 1814
c00_ycbcr = 1192
c02_ycbcr = 1634
c10_ycbcr = 1192
c11_ycbcr = -401
c12_ycbcr = -833
c20_ycbcr = 1192
c21_ycbcr = 2065
Matrix_ycbcr_to_rgb888 = np.array(
[[1.16438356e+00, 1.16438356e+00, 1.16438356e+00],
[2.99747219e-07, - 3.91762529e-01, 2.01723263e+00],
[1.59602686e+00, - 8.12968294e-01, 3.04059479e-06]])
Matrix_rgb888_to_ycbcr = np.array(
[[0.25678824, - 0.14822353, 0.43921569],
[0.50412941, - 0.29099216, - 0.36778824],
[0.09790588, 0.43921569, - 0.07142745]])
Matrix_rgb888_to_yuv = np.array(
[[ 0.29899106, -0.16877996, 0.49988381],
[ 0.5865453, -0.33110385, -0.41826072],
[ 0.11446364, 0.49988381, -0.08162309]])
# Matrix_rgb888_to_yuv = np.array(
# [[0.299, - 0.147, 0.615],
# [0.587, - 0.289, - 0.515],
# [0.114, 0.436, - 0.100]])
# Matrix_yuv_to_rgb888 = np.array(
# [[1.000, 1.000, 1.000],
# [0.000, - 0.394, 2.032],
# [1.140, - 0.581, 0.000]])
class runner(object):
def __init__(self):
self.set = {
'print_info':'no',
'model_size':[0,0],
'numerical_type':'floating',
"source_format": "rgb888",
"out_format": "rgb888",
"options": {
"simulation": "no",
"simulation_format": "rgb888"
}
}
def update(self, **kwargs):
#
self.set.update(kwargs)
## simulation
self.funs = []
if str2bool(self.set['options']['simulation']) and self.set['source_format'].lower() in ('rgb888', 'rgb'):
if self.set['options']['simulation_format'].lower() in ('yuv422', 'yuv'):
self.funs.append(self._ColorConversion_RGB888_to_YUV422)
self.set['source_format'] = 'YUV422'
elif self.set['options']['simulation_format'].lower() in ('ycbcr422', 'ycbcr'):
self.funs.append(self._ColorConversion_RGB888_to_YCbCr422)
self.set['source_format'] = 'YCbCr422'
elif self.set['options']['simulation_format'].lower() == 'rgb565':
self.funs.append(self._ColorConversion_RGB888_to_RGB565)
self.set['source_format'] = 'RGB565'
## to rgb888
if self.set['source_format'].lower() in ('yuv444', 'yuv422', 'yuv'):
self.funs.append(self._ColorConversion_YUV_to_RGB888)
elif self.set['source_format'].lower() in ('ycbcr444', 'ycbcr422', 'ycbcr'):
self.funs.append(self._ColorConversion_YCbCr_to_RGB888)
elif self.set['source_format'].lower() == 'rgb565':
self.funs.append(self._ColorConversion_RGB565_to_RGB888)
elif self.set['source_format'].lower() in ('l', 'nir'):
self.funs.append(self._ColorConversion_L_to_RGB888)
elif self.set['source_format'].lower() in ('rgba8888', 'rgba'):
self.funs.append(self._ColorConversion_RGBA8888_to_RGB888)
## output format
if self.set['out_format'].lower() == 'l':
self.funs.append(self._ColorConversion_RGB888_to_L)
elif self.set['out_format'].lower() == 'rgb565':
self.funs.append(self._ColorConversion_RGB888_to_RGB565)
elif self.set['out_format'].lower() in ('rgba', 'rgba8888'):
self.funs.append(self._ColorConversion_RGB888_to_RGBA8888)
elif self.set['out_format'].lower() in ('yuv', 'yuv444'):
self.funs.append(self._ColorConversion_RGB888_to_YUV444)
elif self.set['out_format'].lower() == 'yuv422':
self.funs.append(self._ColorConversion_RGB888_to_YUV422)
elif self.set['out_format'].lower() in ('ycbcr', 'ycbcr444'):
self.funs.append(self._ColorConversion_RGB888_to_YCbCr444)
elif self.set['out_format'].lower() == 'ycbcr422':
self.funs.append(self._ColorConversion_RGB888_to_YCbCr422)
def print_info(self):
print("<colorConversion>",
"source_format:", self.set['source_format'],
', out_format:', self.set['out_format'],
', simulation:', self.set['options']['simulation'],
', simulation_format:', self.set['options']['simulation_format'])
def run(self, image_data):
assert isinstance(image_data, np.ndarray)
# print info
if str2bool(self.set['print_info']):
self.print_info()
# color
for _, f in enumerate(self.funs):
image_data = f(image_data)
# output
info = {}
return image_data, info
def _ColorConversion_RGB888_to_YUV444(self, image):
## floating
image = image.astype('float')
image = (image @ Matrix_rgb888_to_yuv + 0.5).astype('uint8')
return image
def _ColorConversion_RGB888_to_YUV422(self, image):
# rgb888 to yuv444
image = self._ColorConversion_RGB888_to_YUV444(image)
# yuv444 to yuv422
u2 = image[:, 0::2, 1]
u4 = np.repeat(u2, 2, axis=1)
v2 = image[:, 1::2, 2]
v4 = np.repeat(v2, 2, axis=1)
image[..., 1] = u4
image[..., 2] = v4
return image
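The 4:2:2 step above keeps one U sample from each even column and one V sample from each odd column, then duplicates them back to full width; a tiny standalone demonstration of that subsampling:

```python
import numpy as np

# One row, four pixels, channels = (Y, U, V).
yuv = np.arange(1 * 4 * 3, dtype=np.uint8).reshape(1, 4, 3)
u = np.repeat(yuv[:, 0::2, 1], 2, axis=1)  # U from even columns, duplicated
v = np.repeat(yuv[:, 1::2, 2], 2, axis=1)  # V from odd columns, duplicated
yuv[..., 1] = u
yuv[..., 2] = v
print(yuv[0, :, 1].tolist(), yuv[0, :, 2].tolist())  # [1, 1, 7, 7] [5, 5, 11, 11]
```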
def _ColorConversion_YUV_to_RGB888(self, image):
## fixed
h, w, c = image.shape
image_f = image.reshape((h * w, c))
image_rgb_f = np.zeros(image_f.shape, dtype=np.uint8)
for i in range(h * w):
image_y = image_f[i, 0] *1024
if image_f[i, 1] > 127:
image_u = -((~(image_f[i, 1] - 1)) & 0xFF)
else:
image_u = image_f[i, 1]
if image_f[i, 2] > 127:
image_v = -((~(image_f[i, 2] - 1)) & 0xFF)
else:
image_v = image_f[i, 2]
image_r = c00_yuv * image_y + c02_yuv * image_v
image_g = c10_yuv * image_y + c11_yuv * image_u + c12_yuv * image_v
image_b = c20_yuv * image_y + c21_yuv * image_u
image_r = signed_rounding(image_r, format_bit)
image_g = signed_rounding(image_g, format_bit)
image_b = signed_rounding(image_b, format_bit)
image_r = image_r >> format_bit
image_g = image_g >> format_bit
image_b = image_b >> format_bit
image_rgb_f[i, 0] = clip(image_r, 0, 255)
image_rgb_f[i, 1] = clip(image_g, 0, 255)
image_rgb_f[i, 2] = clip(image_b, 0, 255)
image_rgb = image_rgb_f.reshape((h, w, c))
return image_rgb
def _ColorConversion_RGB888_to_YCbCr444(self, image):
## floating
image = image.astype('float')
image = (image @ Matrix_rgb888_to_ycbcr + 0.5).astype('uint8')
image[:, :, 0] += 16
image[:, :, 1] += 128
image[:, :, 2] += 128
return image
def _ColorConversion_RGB888_to_YCbCr422(self, image):
# rgb888 to ycbcr444
image = self._ColorConversion_RGB888_to_YCbCr444(image)
# ycbcr444 to ycbcr422
cb2 = image[:, 0::2, 1]
cb4 = np.repeat(cb2, 2, axis=1)
cr2 = image[:, 1::2, 2]
cr4 = np.repeat(cr2, 2, axis=1)
image[..., 1] = cb4
image[..., 2] = cr4
return image
def _ColorConversion_YCbCr_to_RGB888(self, image):
## floating
if (self.set['numerical_type'] == 'floating'):
image = image.astype('float')
image[:, :, 0] -= 16
image[:, :, 1] -= 128
image[:, :, 2] -= 128
image = ((image @ Matrix_ycbcr_to_rgb888) + 0.5).astype('uint8')
return image
## fixed
h, w, c = image.shape
image_f = image.reshape((h * w, c))
image_rgb_f = np.zeros(image_f.shape, dtype=np.uint8)
for i in range(h * w):
image_y = (image_f[i, 0] - 16) * c00_ycbcr
image_cb = image_f[i, 1] - 128
image_cr = image_f[i, 2] - 128
image_r = image_y + c02_ycbcr * image_cr
image_g = image_y + c11_ycbcr * image_cb + c12_ycbcr * image_cr
image_b = image_y + c21_ycbcr * image_cb
image_r = signed_rounding(image_r, format_bit)
image_g = signed_rounding(image_g, format_bit)
image_b = signed_rounding(image_b, format_bit)
image_r = image_r >> format_bit
image_g = image_g >> format_bit
image_b = image_b >> format_bit
image_rgb_f[i, 0] = clip(image_r, 0, 255)
image_rgb_f[i, 1] = clip(image_g, 0, 255)
image_rgb_f[i, 2] = clip(image_b, 0, 255)
image_rgb = image_rgb_f.reshape((h, w, c))
return image_rgb
def _ColorConversion_RGB888_to_RGB565(self, image):
assert (len(image.shape)==3)
assert (image.shape[2]>=3)
image_rgb565 = np.zeros(image.shape, dtype=np.uint8)
image_rgb = image.astype('uint8')
image_rgb565[:, :, 0] = image_rgb[:, :, 0] >> 3
image_rgb565[:, :, 1] = image_rgb[:, :, 1] >> 2
image_rgb565[:, :, 2] = image_rgb[:, :, 2] >> 3
return image_rgb565
def _ColorConversion_RGB565_to_RGB888(self, image):
assert (len(image.shape)==3)
assert (image.shape[2]==3)
image_rgb = np.zeros(image.shape, dtype=np.uint8)
image_rgb[:, :, 0] = image[:, :, 0] << 3
image_rgb[:, :, 1] = image[:, :, 1] << 2
image_rgb[:, :, 2] = image[:, :, 2] << 3
return image_rgb
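The shift pairs above make the RGB565 round trip easy to check: converting to 5/6/5 bits and back floors each channel to its quantization step. A standalone check in plain numpy (not the runner classes):

```python
import numpy as np

shifts = np.array([3, 2, 3], dtype=np.uint8)   # bits dropped per R, G, B channel
px = np.array([[[200, 100, 50]]], dtype=np.uint8)
rgb565 = px >> shifts                          # 5/6/5-bit channel values
back = rgb565 << shifts                        # floored reconstruction
print(back.tolist())  # [[[200, 100, 48]]]
```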
def _ColorConversion_L_to_RGB888(self, image):
image_L = image.astype('uint8')
img = Image.fromarray(image_L).convert('RGB')
image_data = np.array(img).astype('uint8')
return image_data
def _ColorConversion_RGB888_to_L(self, image):
image_rgb = image.astype('uint8')
img = Image.fromarray(image_rgb).convert('L')
image_data = np.array(img).astype('uint8')
return image_data
def _ColorConversion_RGBA8888_to_RGB888(self, image):
assert (len(image.shape)==3)
assert (image.shape[2]==4)
return image[:,:,:3]
def _ColorConversion_RGB888_to_RGBA8888(self, image):
assert (len(image.shape)==3)
assert (image.shape[2]==3)
imageA = np.concatenate((image, np.zeros((image.shape[0], image.shape[1], 1), dtype=np.uint8) ), axis=2)
return imageA

@@ -0,0 +1,145 @@
import numpy as np
from PIL import Image
from .utils import str2int, str2float, str2bool, pad_square_to_4
from .utils_520 import round_up_n
from .Runner_base import Runner_base, Param_base
class General(Param_base):
type = 'center'
align_w_to_4 = False
pad_square_to_4 = False
rounding_type = 0
crop_w = 0
crop_h = 0
start_x = 0.
start_y = 0.
end_x = 0.
end_y = 0.
def update(self, **dic):
self.type = dic['type']
self.align_w_to_4 = str2bool(dic['align_w_to_4'])
self.rounding_type = str2int(dic['rounding_type'])
self.crop_w = str2int(dic['crop_w'])
self.crop_h = str2int(dic['crop_h'])
self.start_x = str2float(dic['start_x'])
self.start_y = str2float(dic['start_y'])
self.end_x = str2float(dic['end_x'])
self.end_y = str2float(dic['end_y'])
def __str__(self):
str_out = [
', type:',str(self.type),
', align_w_to_4:',str(self.align_w_to_4),
', pad_square_to_4:',str(self.pad_square_to_4),
', crop_w:',str(self.crop_w),
', crop_h:',str(self.crop_h),
', start_x:',str(self.start_x),
', start_y:',str(self.start_y),
', end_x:',str(self.end_x),
', end_y:',str(self.end_y)]
return(' '.join(str_out))
class runner(Runner_base):
## overwrite the class in Runner_base
general = General()
def __str__(self):
return('<Crop>')
def update(self, **kwargs):
##
super().update(**kwargs)
##
if (self.general.start_x != self.general.end_x) and (self.general.start_y != self.general.end_y):
self.general.type = 'specific'
elif(self.general.type != 'specific'):
if self.general.crop_w == 0 or self.general.crop_h == 0:
self.general.crop_w = self.common.model_size[0]
self.general.crop_h = self.common.model_size[1]
assert(self.general.crop_w > 0)
assert(self.general.crop_h > 0)
assert(self.general.type.lower() in ['CENTER', 'Center', 'center', 'CORNER', 'Corner', 'corner'])
else:
assert(self.general.type == 'specific')
def run(self, image_data):
## init
img = Image.fromarray(image_data)
w, h = img.size
## get range
if self.general.type.lower() in ['CENTER', 'Center', 'center']:
x1, y1, x2, y2 = self._calcuate_xy_center(w, h)
elif self.general.type.lower() in ['CORNER', 'Corner', 'corner']:
x1, y1, x2, y2 = self._calcuate_xy_corner(w, h)
else:
x1 = self.general.start_x
y1 = self.general.start_y
x2 = self.general.end_x
y2 = self.general.end_y
assert( ((x1 != x2) and (y1 != y2)) )
## rounding
if self.general.rounding_type == 0:
x1 = int(np.floor(x1))
y1 = int(np.floor(y1))
x2 = int(np.ceil(x2))
y2 = int(np.ceil(y2))
else:
x1 = int(round(x1))
y1 = int(round(y1))
x2 = int(round(x2))
y2 = int(round(y2))
if self.general.align_w_to_4:
# x1 = (x1+1) &(~3) #//+2
# x2 = (x2+2) &(~3) #//+1
x1 = (x1+3) &(~3) #//+2
left = w - x2
left = (left+3) &(~3)
x2 = w - left
## pad_square_to_4
if str2bool(self.general.pad_square_to_4):
x1,x2,y1,y2 = pad_square_to_4(x1,x2,y1,y2)
# do crop
box = (x1,y1,x2,y2)
img = img.crop(box)
# print info
if str2bool(self.common.print_info):
self.general.start_x = x1
self.general.start_y = y1
self.general.end_x = x2
self.general.end_y = y2
self.general.crop_w = x2 - x1
self.general.crop_h = y2 - y1
self.print_info()
# output
image_data = np.array(img)
info = {}
info['box'] = box
return image_data, info
## protect fun
def _calcuate_xy_center(self, w, h):
x1 = w/2 - self.general.crop_w / 2
y1 = h/2 - self.general.crop_h / 2
x2 = w/2 + self.general.crop_w / 2
y2 = h/2 + self.general.crop_h / 2
return x1, y1, x2, y2
def _calcuate_xy_corner(self, _1, _2):
x1 = 0
y1 = 0
x2 = self.general.crop_w
y2 = self.general.crop_h
return x1, y1, x2, y2
def do_crop(self, image_data, startW, startH, endW, endH):
return image_data[startH:endH, startW:endW, :]

@@ -0,0 +1,186 @@
import numpy as np
from .utils import str2bool, str2int, str2float, clip_ary
class runner(object):
def __init__(self):
self.set = {
'general': {
'print_info':'no',
'model_size':[0,0],
'numerical_type':'floating',
'type': 'kneron'
},
'floating':{
"scale": 1,
"bias": 0,
"mean": "",
"std": "",
},
'hw':{
"radix":8,
"shift":"",
"sub":""
}
}
return
def update(self, **kwargs):
#
self.set.update(kwargs)
#
if self.set['general']['numerical_type'] == '520':
if self.set['general']['type'].lower() in ['TF', 'Tf', 'tf']:
self.fun_normalize = self._chen_520
self.shift = 7 - self.set['hw']['radix']
self.sub = 128
elif self.set['general']['type'].lower() in ['YOLO', 'Yolo', 'yolo']:
self.fun_normalize = self._chen_520
self.shift = 8 - self.set['hw']['radix']
self.sub = 0
elif self.set['general']['type'].lower() in ['KNERON', 'Kneron', 'kneron']:
self.fun_normalize = self._chen_520
self.shift = 8 - self.set['hw']['radix']
self.sub = 128
else:
self.fun_normalize = self._chen_520
self.shift = 0
self.sub = 0
elif self.set['general']['numerical_type'] == '720':
self.fun_normalize = self._chen_720
self.shift = 0
self.sub = 0
else:
if self.set['general']['type'].lower() in ['TORCH', 'Torch', 'torch']:
self.fun_normalize = self._normalize_torch
self.set['floating']['scale'] = 255.
self.set['floating']['mean'] = [0.485, 0.456, 0.406]
self.set['floating']['std'] = [0.229, 0.224, 0.225]
elif self.set['general']['type'].lower() in ['TF', 'Tf', 'tf']:
self.fun_normalize = self._normalize_tf
self.set['floating']['scale'] = 127.5
self.set['floating']['bias'] = -1.
elif self.set['general']['type'].lower() in ['CAFFE', 'Caffe', 'caffe']:
self.fun_normalize = self._normalize_caffe
self.set['floating']['mean'] = [103.939, 116.779, 123.68]
elif self.set['general']['type'].lower() in ['YOLO', 'Yolo', 'yolo']:
self.fun_normalize = self._normalize_yolo
self.set['floating']['scale'] = 255.
elif self.set['general']['type'].lower() in ['KNERON', 'Kneron', 'kneron']:
self.fun_normalize = self._normalize_kneron
self.set['floating']['scale'] = 256.
self.set['floating']['bias'] = -0.5
else:
self.fun_normalize = self._normalize_customized
self.set['floating']['scale'] = str2float(self.set['floating']['scale'])
self.set['floating']['bias'] = str2float(self.set['floating']['bias'])
if self.set['floating']['mean'] is not None:
if len(self.set['floating']['mean']) != 3:
self.set['floating']['mean'] = None
if self.set['floating']['std'] is not None:
if len(self.set['floating']['std']) != 3:
self.set['floating']['std'] = None
def print_info(self):
if self.set['general']['numerical_type'] == '520':
print("<normalize>",
'numerical_type', self.set['general']['numerical_type'],
", type:", self.set['general']['type'],
', shift:',self.shift,
', sub:', self.sub)
else:
print("<normalize>",
'numerical_type', self.set['general']['numerical_type'],
", type:", self.set['general']['type'],
', scale:',self.set['floating']['scale'],
', bias:', self.set['floating']['bias'],
', mean:', self.set['floating']['mean'],
', std:',self.set['floating']['std'])
def run(self, image_data):
# print info
if str2bool(self.set['general']['print_info']):
self.print_info()
# norm
image_data = self.fun_normalize(image_data)
# output
info = {}
return image_data, info
def _normalize_torch(self, x):
if len(x.shape) != 3:
return x
x = x.astype('float')
x = x / self.set['floating']['scale']
x[..., 0] -= self.set['floating']['mean'][0]
x[..., 1] -= self.set['floating']['mean'][1]
x[..., 2] -= self.set['floating']['mean'][2]
x[..., 0] /= self.set['floating']['std'][0]
x[..., 1] /= self.set['floating']['std'][1]
x[..., 2] /= self.set['floating']['std'][2]
return x
def _normalize_tf(self, x):
# print('_normalize_tf')
x = x.astype('float')
x = x / self.set['floating']['scale']
x = x + self.set['floating']['bias']
return x
def _normalize_caffe(self, x):
if len(x.shape) != 3:
return x
x = x.astype('float')
x = x[..., ::-1]
x[..., 0] -= self.set['floating']['mean'][0]
x[..., 1] -= self.set['floating']['mean'][1]
x[..., 2] -= self.set['floating']['mean'][2]
return x
def _normalize_yolo(self, x):
# print('_normalize_yolo')
x = x.astype('float')
x = x / self.set['floating']['scale']
return x
def _normalize_kneron(self, x):
# print('_normalize_kneron')
x = x.astype('float')
x = x/self.set['floating']['scale']
x = x + self.set['floating']['bias']
return x
def _normalize_customized(self, x):
# print('_normalize_customized')
x = x.astype('float')
if self.set['floating']['scale'] != 0:
x = x/ self.set['floating']['scale']
x = x + self.set['floating']['bias']
if self.set['floating']['mean'] is not None:
x[..., 0] -= self.set['floating']['mean'][0]
x[..., 1] -= self.set['floating']['mean'][1]
x[..., 2] -= self.set['floating']['mean'][2]
if self.set['floating']['std'] is not None:
x[..., 0] /= self.set['floating']['std'][0]
x[..., 1] /= self.set['floating']['std'][1]
x[..., 2] /= self.set['floating']['std'][2]
return x
def _chen_520(self, x):
# print('_chen_520')
x = (x - self.sub).astype('uint8')
x = (np.right_shift(x,self.shift))
x=x.astype('uint8')
return x
def _chen_720(self, x):
# the shift==1 and shift!=1 branches compute the same thing, so a single statement suffices
x = x + np.array([[self.sub], [self.sub], [self.sub]])
return x

@@ -0,0 +1,187 @@
import numpy as np
from PIL import Image
from .utils import str2bool, str2int, str2float
from .Runner_base import Runner_base, Param_base
class General(Param_base):
type = ''
pad_val = ''
padded_w = ''
padded_h = ''
pad_l = ''
pad_r = ''
pad_t = ''
pad_b = ''
padding_ch = 3
padding_ch_type = 'RGB'
def update(self, **dic):
self.type = dic['type']
self.pad_val = dic['pad_val']
self.padded_w = str2int(dic['padded_w'])
self.padded_h = str2int(dic['padded_h'])
self.pad_l = str2int(dic['pad_l'])
self.pad_r = str2int(dic['pad_r'])
self.pad_t = str2int(dic['pad_t'])
self.pad_b = str2int(dic['pad_b'])
def __str__(self):
str_out = [
', type:',str(self.type),
', pad_val:',str(self.pad_val),
', pad_l:',str(self.pad_l),
', pad_r:',str(self.pad_r),
', pad_t:',str(self.pad_t),
', pad_b:',str(self.pad_b),
', padding_ch:',str(self.padding_ch)]
return(' '.join(str_out))
class Hw(Param_base):
radix = 8
normalize_type = 'floating'
def update(self, **dic):
self.radix = dic['radix']
self.normalize_type = dic['normalize_type']
def __str__(self):
str_out = [
', radix:', str(self.radix),
', normalize_type:',str(self.normalize_type)]
return(' '.join(str_out))
class runner(Runner_base):
## overwrite the class in Runner_base
general = General()
hw = Hw()
def __str__(self):
return('<Padding>')
def update(self, **kwargs):
super().update(**kwargs)
## update pad type & pad length
if (self.general.pad_l != 0) or (self.general.pad_r != 0) or (self.general.pad_t != 0) or (self.general.pad_b != 0):
self.general.type = 'specific'
assert(self.general.pad_l >= 0)
assert(self.general.pad_r >= 0)
assert(self.general.pad_t >= 0)
assert(self.general.pad_b >= 0)
elif(self.general.type != 'specific'):
if self.general.padded_w == 0 or self.general.padded_h == 0:
self.general.padded_w = self.common.model_size[0]
self.general.padded_h = self.common.model_size[1]
assert(self.general.padded_w > 0)
assert(self.general.padded_h > 0)
assert(self.general.type.lower() in ['CENTER', 'Center', 'center', 'CORNER', 'Corner', 'corner'])
else:
assert(self.general.type == 'specific')
## decide pad_val & padding ch
# if numerical_type is floating
if (self.common.numerical_type == 'floating'):
if self.general.pad_val != 'edge':
self.general.pad_val = str2float(self.general.pad_val)
self.general.padding_ch = 3
self.general.padding_ch_type = 'RGB'
# if numerical_type is 520 or 720
else:
if self.general.pad_val == '':
if self.hw.normalize_type.lower() == 'tf':
self.general.pad_val = np.uint8(-128 >> (7 - self.hw.radix))
elif self.hw.normalize_type.lower() == 'yolo':
self.general.pad_val = np.uint8(0)
elif self.hw.normalize_type.lower() == 'kneron':
self.general.pad_val = np.uint8(-128 >> (8 - self.hw.radix))
else:
self.general.pad_val = np.uint8(0)
else:
self.general.pad_val = str2int(self.general.pad_val)
self.general.padding_ch = 4
self.general.padding_ch_type = 'RGBA'
def run(self, image_data):
# init
shape = image_data.shape
w = shape[1]
h = shape[0]
if len(shape) < 3:
self.general.padding_ch = 1
self.general.padding_ch_type = 'L'
else:
if shape[2] == 3 and self.general.padding_ch == 4:
image_data = np.concatenate((image_data, np.zeros((h, w, 1), dtype=np.uint8) ), axis=2)
## padding
if self.general.type.lower() == 'center':
img_pad = self._padding_center(image_data, w, h)
elif self.general.type.lower() == 'corner':
img_pad = self._padding_corner(image_data, w, h)
else:
img_pad = self._padding_sp(image_data, w, h)
# print info
if str2bool(self.common.print_info):
self.print_info()
# output
info = {}
return img_pad, info
## protect fun
def _padding_center(self, img, ori_w, ori_h):
padH = self.general.padded_h - ori_h
padW = self.general.padded_w - ori_w
self.general.pad_t = padH // 2
self.general.pad_b = (padH // 2) + (padH % 2)
self.general.pad_l = padW // 2
self.general.pad_r = (padW // 2) + (padW % 2)
if self.general.pad_l < 0 or self.general.pad_r < 0 or self.general.pad_t < 0 or self.general.pad_b < 0:
return img
img_pad = self._padding_sp(img,ori_w,ori_h)
return img_pad
def _padding_corner(self, img, ori_w, ori_h):
self.general.pad_l = 0
self.general.pad_r = self.general.padded_w - ori_w
self.general.pad_t = 0
self.general.pad_b = self.general.padded_h - ori_h
if self.general.pad_l < 0 or self.general.pad_r < 0 or self.general.pad_t < 0 or self.general.pad_b < 0:
return img
img_pad = self._padding_sp(img,ori_w,ori_h)
return img_pad
def _padding_sp(self, img, ori_w, ori_h):
if self.general.padding_ch == 1:
pad_range = ( (self.general.pad_t, self.general.pad_b),(self.general.pad_l, self.general.pad_r) )
else:
pad_range = ((self.general.pad_t, self.general.pad_b),(self.general.pad_l, self.general.pad_r),(0,0))
if isinstance(self.general.pad_val, str):
if self.general.pad_val == 'edge':
padded_image = np.pad(img, pad_range, mode="edge")
else:
padded_image = np.pad(img, pad_range, mode="constant",constant_values=0)
else:
padded_image = np.pad(img, pad_range, mode="constant",constant_values=self.general.pad_val)
return padded_image
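The center-padding path above splits the total pad evenly and pushes any odd remainder onto the bottom/right edge before delegating to `np.pad`. A minimal standalone sketch of that arithmetic (hypothetical function name, NumPy only):

```python
import numpy as np

def center_pad(img, padded_w, padded_h, pad_val=0):
    # split the total pad in half; the odd remainder lands on the
    # bottom/right edge, mirroring _padding_center above
    pad_h = padded_h - img.shape[0]
    pad_w = padded_w - img.shape[1]
    pad_t, pad_b = pad_h // 2, pad_h // 2 + pad_h % 2
    pad_l, pad_r = pad_w // 2, pad_w // 2 + pad_w % 2
    return np.pad(img, ((pad_t, pad_b), (pad_l, pad_r)),
                  mode="constant", constant_values=pad_val)
```

For a 2x2 image padded to 4x5, this places one row of padding on top and two on the bottom.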

View File

@ -0,0 +1,237 @@
import numpy as np
import cv2
from PIL import Image
from .utils import str2bool, str2int
from ctypes import c_float
from .Runner_base import Runner_base, Param_base
class General(Param_base):
type = 'bilinear'
keep_ratio = True
zoom = True
calculate_ratio_using_CSim = True
resize_w = 0
resize_h = 0
resized_w = 0
resized_h = 0
def update(self, **dic):
self.type = dic['type']
self.keep_ratio = str2bool(dic['keep_ratio'])
self.zoom = str2bool(dic['zoom'])
self.calculate_ratio_using_CSim = str2bool(dic['calculate_ratio_using_CSim'])
self.resize_w = str2int(dic['resize_w'])
self.resize_h = str2int(dic['resize_h'])
def __str__(self):
str_out = [
', type:',str(self.type),
', keep_ratio:',str(self.keep_ratio),
', zoom:',str(self.zoom),
', calculate_ratio_using_CSim:',str(self.calculate_ratio_using_CSim),
', resize_w:',str(self.resize_w),
', resize_h:',str(self.resize_h),
', resized_w:',str(self.resized_w),
', resized_h:',str(self.resized_h)]
return(' '.join(str_out))
class Hw(Param_base):
resize_bit = 12
def update(self, **dic):
pass
def __str__(self):
str_out = [
', resize_bit:',str(self.resize_bit)]
return(' '.join(str_out))
class runner(Runner_base):
## override the attributes declared in Runner_base
general = General()
hw = Hw()
def __str__(self):
return('<Resize>')
def update(self, **kwargs):
super().update(**kwargs)
## if the resize size has not been assigned, fall back to the model size
if self.general.resize_w == 0 or self.general.resize_h == 0:
self.general.resize_w = self.common.model_size[0]
self.general.resize_h = self.common.model_size[1]
assert(self.general.resize_w > 0)
assert(self.general.resize_h > 0)
##
if self.common.numerical_type == '520':
self.general.type = 'fixed_520'
elif self.common.numerical_type == '720':
self.general.type = 'fixed_720'
assert(self.general.type.lower() in ['bilinear', 'bicubic', 'fixed', 'fixed_520', 'fixed_720', 'cv', 'opencv', 'cv2'])
def run(self, image_data):
## init
ori_w = image_data.shape[1]
ori_h = image_data.shape[0]
info = {}
##
if self.general.keep_ratio:
self.general.resized_w, self.general.resized_h = self.calcuate_scale_keep_ratio(self.general.resize_w,self.general.resize_h, ori_w, ori_h, self.general.calculate_ratio_using_CSim)
else:
self.general.resized_w = int(self.general.resize_w)
self.general.resized_h = int(self.general.resize_h)
assert(self.general.resized_w > 0)
assert(self.general.resized_h > 0)
##
if (self.general.resized_w > ori_w) or (self.general.resized_h > ori_h):
if not self.general.zoom:
info['size'] = (ori_w,ori_h)
if str2bool(self.common.print_info):
print('no resize')
self.print_info()
return image_data, info
## resize
if self.general.type.lower() == 'bilinear':
image_data = self.do_resize_bilinear(image_data, self.general.resized_w, self.general.resized_h)
elif self.general.type.lower() == 'bicubic':
image_data = self.do_resize_bicubic(image_data, self.general.resized_w, self.general.resized_h)
elif self.general.type.lower() in ['cv', 'opencv', 'cv2']:
image_data = self.do_resize_cv2(image_data, self.general.resized_w, self.general.resized_h)
elif self.general.type.lower() in ['fixed', 'fixed_520', 'fixed_720']:
image_data = self.do_resize_fixed(image_data, self.general.resized_w, self.general.resized_h, self.hw.resize_bit, self.general.type)
# output
info['size'] = (self.general.resized_w, self.general.resized_h)
# print info
if str2bool(self.common.print_info):
self.print_info()
return image_data, info
def calcuate_scale_keep_ratio(self, tar_w, tar_h, ori_w, ori_h, calculate_ratio_using_CSim):
if not calculate_ratio_using_CSim:
scale_w = tar_w * 1.0 / ori_w*1.0
scale_h = tar_h * 1.0 / ori_h*1.0
scale = scale_w if scale_w < scale_h else scale_h
new_w = int(round(ori_w * scale))
new_h = int(round(ori_h * scale))
return new_w, new_h
## calculate_ratio_using_CSim
scale_w = c_float(tar_w * 1.0 / (ori_w * 1.0)).value
scale_h = c_float(tar_h * 1.0 / (ori_h * 1.0)).value
scale_ratio = 0.0
scale_target_w = 0
scale_target_h = 0
padH = 0
padW = 0
bScaleW = True if scale_w < scale_h else False
if bScaleW:
scale_ratio = scale_w
scale_target_w = int(c_float(scale_ratio * ori_w + 0.5).value)
scale_target_h = int(c_float(scale_ratio * ori_h + 0.5).value)
assert (abs(scale_target_w - tar_w) <= 1), "Error: scale down width cannot meet expectation\n"
padH = tar_h - scale_target_h
padW = 0
assert (padH >= 0), "Error: padH shouldn't be less than zero\n"
else:
scale_ratio = scale_h
scale_target_w = int(c_float(scale_ratio * ori_w + 0.5).value)
scale_target_h = int(c_float(scale_ratio * ori_h + 0.5).value)
assert (abs(scale_target_h - tar_h) <= 1), "Error: scale down height cannot meet expectation\n"
padW = tar_w - scale_target_w
padH = 0
assert (padW >= 0), "Error: padW shouldn't be less than zero\n"
new_w = tar_w - padW
new_h = tar_h - padH
return new_w, new_h
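Without the float32 (`c_float`) emulation of the C simulator, the keep-ratio branch above reduces to the usual min-scale formula. A sketch with hypothetical names:

```python
def keep_ratio_size(tar_w, tar_h, ori_w, ori_h):
    # pick the smaller scale so the resized image fits inside the target box
    scale = min(tar_w / ori_w, tar_h / ori_h)
    return int(round(ori_w * scale)), int(round(ori_h * scale))
```

The CSim path differs only in that it rounds each intermediate through 32-bit floats and back-computes the size via the pad amounts, so the two can disagree by one pixel on some inputs.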
def do_resize_bilinear(self, image_data, resized_w, resized_h):
img = Image.fromarray(image_data)
img = img.resize((resized_w, resized_h), Image.BILINEAR)
image_data = np.array(img).astype('uint8')
return image_data
def do_resize_bicubic(self, image_data, resized_w, resized_h):
img = Image.fromarray(image_data)
img = img.resize((resized_w, resized_h), Image.BICUBIC)
image_data = np.array(img).astype('uint8')
return image_data
def do_resize_cv2(self, image_data, resized_w, resized_h):
image_data = cv2.resize(image_data, (resized_w, resized_h))
image_data = np.array(image_data)
# image_data = np.array(image_data).astype('uint8')
return image_data
def do_resize_fixed(self, image_data, resized_w, resized_h, resize_bit, type):
if len(image_data.shape) < 3:
m, n = image_data.shape
tmp = np.zeros((m,n,3), dtype=np.uint8)
tmp[:,:,0] = image_data
image_data = tmp
c = 3
gray = True
else:
m, n, c = image_data.shape
gray = False
resolution = 1 << resize_bit
# Width
ratio = int(((n - 1) << resize_bit) / (resized_w - 1))
ratio_cnt = 0
src_x = 0
resized_image_w = np.zeros((m, resized_w, c), dtype=np.uint8)
for dst_x in range(resized_w):
while ratio_cnt > resolution:
ratio_cnt = ratio_cnt - resolution
src_x = src_x + 1
mul1 = np.ones((m, c)) * (resolution - ratio_cnt)
mul2 = np.ones((m, c)) * ratio_cnt
resized_image_w[:, dst_x, :] = np.multiply(np.multiply(
image_data[:, src_x, :], mul1) + np.multiply(image_data[:, src_x + 1, :], mul2), 1/resolution)
ratio_cnt = ratio_cnt + ratio
# Height
ratio = int(((m - 1) << resize_bit) / (resized_h - 1))
## NPU HW special case 2 , only on 520
if type.lower() == 'fixed_520':
if (((ratio * (resized_h - 1)) % 4096 == 0) and ratio != 4096):
ratio -= 1
ratio_cnt = 0
src_x = 0
resized_image = np.zeros(
(resized_h, resized_w, c), dtype=np.uint8)
for dst_x in range(resized_h):
while ratio_cnt > resolution:
ratio_cnt = ratio_cnt - resolution
src_x = src_x + 1
mul1 = np.ones((resized_w, c)) * (resolution - ratio_cnt)
mul2 = np.ones((resized_w, c)) * ratio_cnt
## NPU HW special case 1 , both on 520 / 720
if (((dst_x > 0) and ratio_cnt == resolution) and (ratio != resolution)):
if type.lower() in ['fixed_520', 'fixed_720']:
resized_image[dst_x, :, :] = np.multiply(np.multiply(
resized_image_w[src_x+1, :, :], mul1) + np.multiply(resized_image_w[src_x + 2, :, :], mul2), 1/resolution)
else:
resized_image[dst_x, :, :] = np.multiply(np.multiply(
resized_image_w[src_x, :, :], mul1) + np.multiply(resized_image_w[src_x + 1, :, :], mul2), 1/resolution)
ratio_cnt = ratio_cnt + ratio
if gray:
resized_image = resized_image[:,:,0]
return resized_image
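The fixed-point resize above walks the destination axis with an integer accumulator: `ratio` is the source step in 1/4096ths, and each output pixel blends the two neighbouring source pixels with integer weights. A one-dimensional standalone sketch of that loop (hypothetical name, assuming at least two samples on each side):

```python
def fixed_resize_1d(src, dst_len, bit=12):
    # integer-accumulator linear interpolation mirroring the HW loop;
    # a sketch assuming len(src) >= 2 and dst_len >= 2
    resolution = 1 << bit
    ratio = int(((len(src) - 1) << bit) / (dst_len - 1))
    out = []
    ratio_cnt = 0
    s = 0
    for _ in range(dst_len):
        while ratio_cnt > resolution:
            ratio_cnt -= resolution
            s += 1
        a = src[s]
        b = src[s + 1] if s + 1 < len(src) else src[s]
        # blend the two neighbours with integer weights, then renormalize
        out.append(((resolution - ratio_cnt) * a + ratio_cnt * b) // resolution)
        ratio_cnt += ratio
    return out
```

Resizing a ramp to the same length reproduces it exactly, which is a quick sanity check that the accumulator never drifts.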

View File

@ -0,0 +1,45 @@
import numpy as np
from .utils import str2bool, str2int
class runner(object):
def __init__(self, *args, **kwargs):
self.set = {
'operator': '',
"rotate_direction": 0,
}
self.update(*args, **kwargs)
def update(self, *args, **kwargs):
self.set.update(kwargs)
self.rotate_direction = str2int(self.set['rotate_direction'])
# print info
if str2bool(self.set['b_print']):
self.print_info()
def print_info(self):
print("<rotate>",
'rotate_direction', self.rotate_direction,)
def run(self, image_data):
image_data = self._rotate(image_data)
return image_data
def _rotate(self,img):
if self.rotate_direction == 1 or self.rotate_direction == 2:
col, row, unit = img.shape
pInBuf = img.reshape((-1,1))
pOutBufTemp = np.zeros((col* row* unit))
for r in range(row):
for c in range(col):
for u in range(unit):
if self.rotate_direction == 1:
pOutBufTemp[unit * (c * row + (row - r - 1))+u] = pInBuf[unit * (r * col + c)+u]
elif self.rotate_direction == 2:
pOutBufTemp[unit * (row * (col - c - 1) + r)+u] = pInBuf[unit * (r * col + c)+u]
img = pOutBufTemp.reshape((col,row,unit))
return img

View File

@ -0,0 +1,59 @@
from abc import ABCMeta, abstractmethod
class Param_base(object):
@abstractmethod
def update(self,**dic):
raise NotImplementedError("Must override")
def load_dic(self, key, **dic):
if key in dic:
setattr(self, key, dic[key])
def __str__(self):
str_out = []
return(' '.join(str_out))
class Common(Param_base):
print_info = False
model_size = [0,0]
numerical_type = 'floating'
def update(self, **dic):
self.print_info = dic['print_info']
self.model_size = dic['model_size']
self.numerical_type = dic['numerical_type']
def __str__(self):
str_out = ['numerical_type:',str(self.numerical_type)]
return(' '.join(str_out))
class Runner_base(metaclass=ABCMeta):
common = Common()
general = Param_base()
floating = Param_base()
hw = Param_base()
def update(self, **kwargs):
## update param
self.common.update(**kwargs['common'])
self.general.update(**kwargs['general'])
assert(self.common.numerical_type.lower() in ['floating', '520', '720'])
if (self.common.numerical_type == 'floating'):
if (self.floating.__class__.__name__ != 'Param_base'):
self.floating.update(**kwargs['floating'])
else:
if (self.hw.__class__.__name__ != 'Param_base'):
self.hw.update(**kwargs['hw'])
def print_info(self):
if (self.common.numerical_type == 'floating'):
print(self, self.common, self.general, self.floating)
else:
print(self, self.common, self.general, self.hw)
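`Runner_base.update()` above dispatches the config dict to the `floating` or `hw` parameter group depending on the numerical type. A minimal standalone sketch of that dispatch (hypothetical names, not the real classes):

```python
class Params:
    # hypothetical stand-in for Param_base subclasses
    def update(self, **dic):
        for k, v in dic.items():
            setattr(self, k, v)

class MiniRunner:
    def __init__(self):
        self.common = Params()
        self.general = Params()
        self.hw = Params()
    def update(self, **kwargs):
        self.common.update(**kwargs.get('common', {}))
        self.general.update(**kwargs.get('general', {}))
        # only the hardware group is updated for non-floating types
        if getattr(self.common, 'numerical_type', 'floating') != 'floating':
            self.hw.update(**kwargs.get('hw', {}))
```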

View File

@ -0,0 +1,2 @@
from . import ColorConversion, Padding, Resize, Crop, Normalize, Rotate

View File

@ -0,0 +1,372 @@
import numpy as np
from PIL import Image
import struct
def pad_square_to_4(x_start, x_end, y_start, y_end):
w_int = x_end - x_start
h_int = y_end - y_start
pad = w_int - h_int
if pad > 0:
pad_s = (pad >> 1) &(~3)
pad_e = pad - pad_s
y_start -= pad_s
y_end += pad_e
else:  # pad <= 0
pad_s = -(((pad) >> 1) &(~3))
pad_e = (-pad) - pad_s
x_start -= pad_s
x_end += pad_e
return x_start, x_end, y_start, y_end
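`pad_square_to_4` grows the shorter side of a box until it is square, rounding the leading pad down to a multiple of 4 so the box start stays 4-aligned. A standalone copy for illustration:

```python
def pad_square_to_4(x_start, x_end, y_start, y_end):
    # standalone copy of the helper above, for a quick behavioural check
    w_int = x_end - x_start
    h_int = y_end - y_start
    pad = w_int - h_int
    if pad > 0:
        # wider than tall: grow the y range
        pad_s = (pad >> 1) & (~3)
        pad_e = pad - pad_s
        y_start -= pad_s
        y_end += pad_e
    else:  # pad <= 0: taller than wide, grow the x range
        pad_s = -((pad >> 1) & (~3))
        pad_e = (-pad) - pad_s
        x_start -= pad_s
        x_end += pad_e
    return x_start, x_end, y_start, y_end
```

For a 10x6 box the whole 4-pixel pad lands after the box (leading pad 4 rounds down to 0), while a 6x10 box gets its full pad before it.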
def str_fill(value):
if len(value) == 1:
value = "0" + value
elif len(value) == 0:
value = "00"
return value
def clip_ary(value):
list_v = []
for i in range(len(value)):
v = value[i] % 256
list_v.append(v)
return list_v
def str2bool(v):
if isinstance(v, bool):
return v
return v.lower() in ('true', '1', 't', 'y', 'yes')
def str2int(s):
if s == "":
s = 0
s = int(s)
return s
def str2float(s):
if s == "":
s = 0
s = float(s)
return s
def clip(value, mini, maxi):
if value < mini:
result = mini
elif value > maxi:
result = maxi
else:
result = value
return result
def signed_rounding(value, bit):
if value < 0:
value = value - (1 << (bit - 1))
else:
value = value + (1 << (bit - 1))
return value
def hex_loader(data_folder,**kwargs):
format_mode = kwargs['raw_img_fmt']
src_h = kwargs['img_in_height']
src_w = kwargs['img_in_width']
if format_mode.lower() in ['yuv444', 'ycbcr444']:
output = hex_yuv444(data_folder,src_h,src_w)
elif format_mode.lower() == 'rgb565':
output = hex_rgb565(data_folder,src_h,src_w)
elif format_mode.lower() in ['yuv422', 'ycbcr422']:
output = hex_yuv422(data_folder,src_h,src_w)
else:
raise ValueError('unsupported raw_img_fmt: ' + str(format_mode))
return output
def hex_rgb565(hex_folder,src_h,src_w):
pix_per_line = 8
byte_per_line = 16
f = open(hex_folder)
pixel_r = []
pixel_g = []
pixel_b = []
# Ignore the first line
f.readline()
input_line = int((src_h * src_w)/pix_per_line)
for i in range(input_line):
readline = f.readline()
for j in range(int(byte_per_line/2)-1, -1, -1):
data1 = int(readline[(j * 4 + 0):(j * 4 + 2)], 16)
data0 = int(readline[(j * 4 + 2):(j * 4 + 4)], 16)
r = ((data1 & 0xf8) >> 3)
g = (((data0 & 0xe0) >> 5) + ((data1 & 0x7) << 3))
b = (data0 & 0x1f)
pixel_r.append(r)
pixel_g.append(g)
pixel_b.append(b)
ary_r = np.array(pixel_r, dtype=np.uint8)
ary_g = np.array(pixel_g, dtype=np.uint8)
ary_b = np.array(pixel_b, dtype=np.uint8)
output = np.concatenate((ary_r[:, None], ary_g[:, None], ary_b[:, None]), axis=1)
output = output.reshape((src_h, src_w, 3))
return output
def hex_yuv444(hex_folder,src_h,src_w):
pix_per_line = 4
byte_per_line = 16
f = open(hex_folder)
byte0 = []
byte1 = []
byte2 = []
byte3 = []
# Ignore the first line
f.readline()
input_line = int((src_h * src_w)/pix_per_line)
for i in range(input_line):
readline = f.readline()
for j in range(byte_per_line-1, -1, -1):
data = int(readline[(j*2):(j*2+2)], 16)
if (j+1) % 4 == 0:
byte0.append(data)
elif (j+2) % 4 == 0:
byte1.append(data)
elif (j+3) % 4 == 0:
byte2.append(data)
elif (j+4) % 4 == 0:
byte3.append(data)
# ary_a = np.array(byte0, dtype=np.uint8)
ary_v = np.array(byte1, dtype=np.uint8)
ary_u = np.array(byte2, dtype=np.uint8)
ary_y = np.array(byte3, dtype=np.uint8)
output = np.concatenate((ary_y[:, None], ary_u[:, None], ary_v[:, None]), axis=1)
output = output.reshape((src_h, src_w, 3))
return output
def hex_yuv422(hex_folder,src_h,src_w):
pix_per_line = 8
byte_per_line = 16
f = open(hex_folder)
pixel_y = []
pixel_u = []
pixel_v = []
# Ignore the first line
f.readline()
input_line = int((src_h * src_w)/pix_per_line)
for i in range(input_line):
readline = f.readline()
for j in range(int(byte_per_line/4)-1, -1, -1):
data3 = int(readline[(j * 8 + 0):(j * 8 + 2)], 16)
data2 = int(readline[(j * 8 + 2):(j * 8 + 4)], 16)
data1 = int(readline[(j * 8 + 4):(j * 8 + 6)], 16)
data0 = int(readline[(j * 8 + 6):(j * 8 + 8)], 16)
pixel_y.append(data3)
pixel_y.append(data1)
pixel_u.append(data2)
pixel_u.append(data2)
pixel_v.append(data0)
pixel_v.append(data0)
ary_y = np.array(pixel_y, dtype=np.uint8)
ary_u = np.array(pixel_u, dtype=np.uint8)
ary_v = np.array(pixel_v, dtype=np.uint8)
output = np.concatenate((ary_y[:, None], ary_u[:, None], ary_v[:, None]), axis=1)
output = output.reshape((src_h, src_w, 3))
return output
def bin_loader(data_folder,**kwargs):
format_mode = kwargs['raw_img_fmt']
src_h = kwargs['img_in_height']
src_w = kwargs['img_in_width']
if format_mode.lower() in ['yuv', 'yuv444', 'ycbcr', 'ycbcr444']:
output = bin_yuv444(data_folder,src_h,src_w)
elif format_mode.lower() == 'rgb565':
output = bin_rgb565(data_folder,src_h,src_w)
elif format_mode.lower() in ['nir', 'nir888']:
output = bin_nir(data_folder,src_h,src_w)
elif format_mode.lower() in ['yuv422', 'ycbcr422']:
output = bin_yuv422(data_folder,src_h,src_w)
elif format_mode.lower() == 'rgb888':
output = np.fromfile(data_folder, dtype='uint8')
output = output.reshape(src_h,src_w,3)
elif format_mode.lower() in ['rgba8888', 'rgba']:
output_temp = np.fromfile(data_folder, dtype='uint8')
output_temp = output_temp.reshape(src_h,src_w,4)
output = output_temp[:,:,0:3]
else:
raise ValueError('unsupported raw_img_fmt: ' + str(format_mode))
return output
def bin_yuv444(in_img_path,src_h,src_w):
# load bin
struct_fmt = '1B'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
row = src_h
col = src_w
pixels = row*col
raw = []
with open(in_img_path, "rb") as f:
while True:
data = f.read(struct_len)
if not data: break
s = struct_unpack(data)
raw.append(s[0])
raw = raw[:pixels*4]
#
output = np.zeros((pixels * 3), dtype=np.uint8)
cnt = 0
for i in range(0, pixels*4, 4):
#Y
output[cnt] = raw[i+3]
#U
cnt += 1
output[cnt] = raw[i+2]
#V
cnt += 1
output[cnt] = raw[i+1]
cnt += 1
output = output.reshape((src_h,src_w,3))
return output
def bin_yuv422(in_img_path,src_h,src_w):
# load bin
struct_fmt = '1B'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
row = src_h
col = src_w
pixels = row*col
raw = []
with open(in_img_path, "rb") as f:
while True:
data = f.read(struct_len)
if not data: break
s = struct_unpack(data)
raw.append(s[0])
raw = raw[:pixels*2]
#
output = np.zeros((pixels * 3), dtype=np.uint8)
cnt = 0
for i in range(0, pixels*2, 4):
#Y0
output[cnt] = raw[i+3]
#U0
cnt += 1
output[cnt] = raw[i+2]
#V0
cnt += 1
output[cnt] = raw[i]
#Y1
cnt += 1
output[cnt] = raw[i+1]
#U1
cnt += 1
output[cnt] = raw[i+2]
#V1
cnt += 1
output[cnt] = raw[i]
cnt += 1
output = output.reshape((src_h,src_w,3))
return output
def bin_rgb565(in_img_path,src_h,src_w):
# load bin
struct_fmt = '1B'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
row = src_h
col = src_w
pixels = row*col
rgba565 = []
with open(in_img_path, "rb") as f:
while True:
data = f.read(struct_len)
if not data: break
s = struct_unpack(data)
rgba565.append(s[0])
rgba565 = rgba565[:pixels*2]
# rgb565_bin to numpy_array
output = np.zeros((pixels * 3), dtype=np.uint8)
cnt = 0
for i in range(0, pixels*2, 2):
temp = rgba565[i]
temp2 = rgba565[i+1]
#R-5
output[cnt] = (temp2 >>3)
#G-6
cnt += 1
output[cnt] = ((temp & 0xe0) >> 5) + ((temp2 & 0x07) << 3)
#B-5
cnt += 1
output[cnt] = (temp & 0x1f)
cnt += 1
output = output.reshape((src_h,src_w,3))
return output
def bin_nir(in_img_path,src_h,src_w):
# load bin
struct_fmt = '1B'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
nir = []
with open(in_img_path, "rb") as f:
while True:
data = f.read(struct_len)
if not data: break
s = struct_unpack(data)
nir.append(s[0])
nir = nir[:src_h*src_w]
pixels = len(nir)
# nir_bin to numpy_array
output = np.zeros((len(nir) * 3), dtype=np.uint8)
for i in range(0, pixels):
output[i*3]=nir[i]
output[i*3+1]=nir[i]
output[i*3+2]=nir[i]
output = output.reshape((src_h,src_w,3))
return output
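Both the hex and bin RGB565 paths above unpack the same little-endian 5-6-5 layout: red from the top of the high byte, blue from the bottom of the low byte, and green stitched from the remaining bits of both. A standalone per-pixel sketch (hypothetical name):

```python
def unpack_rgb565(lo, hi):
    # decode one little-endian RGB565 pixel into 5/6/5-bit channel values
    r = (hi & 0xF8) >> 3                       # top 5 bits of high byte
    g = ((lo & 0xE0) >> 5) | ((hi & 0x07) << 3)  # 3 + 3 bits across the boundary
    b = lo & 0x1F                              # bottom 5 bits of low byte
    return r, g, b
```

Note the channels stay in their native 5/6/5-bit ranges here, matching the loaders above, which do not expand them to 8 bits.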

View File

@ -0,0 +1,50 @@
import math
def round_up_16(num):
return ((num + (16 - 1)) & ~(16 - 1))
def round_up_n(num, n):
if (num > 0):
temp = float(num) / n
return math.ceil(temp) * n
else:
return -math.ceil(float(-num) / n) * n
def cal_img_row_offset(crop_num, pad_num, start_row, out_row, orig_row):
scaled_img_row = int(out_row - (pad_num[1] + pad_num[3]))
if ((start_row - pad_num[1]) > 0):
img_str_row = int((start_row - pad_num[1]))
else:
img_str_row = 0
valid_row = int(orig_row - (crop_num[1] + crop_num[3]))
img_str_row = int(valid_row * img_str_row / scaled_img_row)
return int(img_str_row + crop_num[1])
def get_pad_num(pad_num_orig, left, up, right, bottom):
pad_num = [0]*4
for i in range(0,4):
pad_num[i] = pad_num_orig[i]
if not (left):
pad_num[0] = 0
if not (up):
pad_num[1] = 0
if not (right):
pad_num[2] = 0
if not (bottom):
pad_num[3] = 0
return pad_num
def get_byte_per_pixel(raw_fmt):
if raw_fmt.lower() in ['rgb888', 'rgb']:
return 4
elif raw_fmt.lower() in ['yuv', 'yuv422']:
return 2
elif raw_fmt.lower() == 'rgb565':
return 2
elif raw_fmt.lower() in ['nir888', 'nir']:
return 1
else:
return -1

View File

@ -0,0 +1,42 @@
import numpy as np
from PIL import Image
def twos_complement(value):
value = int(value)
# msb = (value & 0x8000) * (1/np.power(2, 15))
msb = (value & 0x8000) >> 15
if msb == 1:
if (((~value) & 0xFFFF) + 1) >= 0xFFFF:
result = ((~value) & 0xFFFF)
else:
result = (((~value) & 0xFFFF) + 1)
result = result * (-1)
else:
result = value
return result
def twos_complement_pix(value):
h, _ = value.shape
for i in range(h):
value[i, 0] = twos_complement(value[i, 0])
return value
def clip(value, mini, maxi):
if value < mini:
result = mini
elif value > maxi:
result = maxi
else:
result = value
return result
def clip_pix(value, mini, maxi):
h, _ = value.shape
for i in range(h):
value[i, 0] = clip(value[i, 0], mini, maxi)
return value
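`twos_complement` above decodes a 16-bit two's-complement word by inverting and re-negating when the sign bit is set. An equivalent closed form (a sketch, hypothetical name) makes the intent clearer:

```python
def decode_int16(value):
    # interpret the low 16 bits of an integer as a signed 16-bit value,
    # equivalent to the twos_complement helper above
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value
```

Incidentally, the `>= 0xFFFF` branch in the original can never be taken when the sign bit is set, so the two functions agree on all 16-bit inputs.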

View File

@ -18,6 +18,12 @@ from .pascal_context import PascalContextDataset, PascalContextDataset59
from .potsdam import PotsdamDataset
from .stare import STAREDataset
from .voc import PascalVOCDataset
from .golf_dataset import GolfDataset
from .golf7_dataset import Golf7Dataset
from .golf1_dataset import GrassOnlyDataset
from .golf4_dataset import Golf4Dataset
from .golf2_dataset import Golf2Dataset
from .golf8_dataset import Golf8Dataset
__all__ = [
'CustomDataset', 'build_dataloader', 'ConcatDataset', 'RepeatDataset',

View File

@ -0,0 +1,80 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class GrassOnlyDataset(CustomDataset):
"""GrassOnlyDataset for semantic segmentation with only one valid class: grass."""
CLASSES = ('grass',)
PALETTE = [
[0, 128, 0], # grass - green
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(GrassOnlyDataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
print("✅ [GrassOnlyDataset] initialization complete")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
print("🧪 [GrassOnlyDataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(GrassOnlyDataset, self).evaluate(results, metrics, logger)
print(f" ➤ returned evaluation metrics: {list(eval_results.keys())}")
return eval_results

View File

@ -0,0 +1,84 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class Golf2Dataset(CustomDataset):
"""Golf2Dataset for semantic segmentation with 2 valid classes (ignore background)."""
CLASSES = (
'grass', 'road'
)
PALETTE = [
[0, 255, 0], # grass - green
[255, 165, 0], # road - orange
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(Golf2Dataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
print("✅ [Golf2Dataset] initialization complete")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
print("🧪 [Golf2Dataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(Golf2Dataset, self).evaluate(results, metrics, logger)
print(f" ➤ returned evaluation metrics: {list(eval_results.keys())}")
return eval_results

View File

@ -0,0 +1,86 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class Golf4Dataset(CustomDataset):
"""Golf4Dataset for semantic segmentation with 4 valid classes (ignore background)."""
CLASSES = (
'car', 'grass', 'people', 'road'
)
PALETTE = [
[0, 0, 128], # car - dark blue
[0, 255, 0], # grass - green
[255, 0, 0], # people - red
[255, 165, 0], # road - orange
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(Golf4Dataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
print("✅ [Golf4Dataset] initialization complete")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
print("🧪 [Golf4Dataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(Golf4Dataset, self).evaluate(results, metrics, logger)
print(f" ➤ returned evaluation metrics: {list(eval_results.keys())}")
return eval_results

View File

@ -0,0 +1,90 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class Golf7Dataset(CustomDataset):
"""Golf7Dataset for semantic segmentation with 7 valid classes (background ignored)."""
CLASSES = (
'bunker', 'car', 'grass',
'greenery', 'person', 'road', 'tree'
)
PALETTE = [
[128, 0, 0], # bunker - dark red
[0, 0, 128], # car - dark blue
[0, 128, 0], # grass - green
[0, 255, 0], # greenery - light green
[255, 0, 0], # person - red
[255, 165, 0], # road - orange
[0, 255, 255], # tree - cyan
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(Golf7Dataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
print("✅ [Golf7Dataset] initialized")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
print("🧪 [Golf7Dataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(Golf7Dataset, self).evaluate(results, metrics, logger)
print(f" ➤ returned metric keys: {list(eval_results.keys())}")
return eval_results


@ -0,0 +1,92 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class Golf8Dataset(CustomDataset):
"""Golf8Dataset for semantic segmentation with 8 valid classes (ignore background)."""
CLASSES = (
'bunker', 'car', 'grass',
'greenery', 'person', 'pond',
'road', 'tree'
)
PALETTE = [
[128, 0, 0], # bunker - dark red
[0, 0, 128], # car - dark blue
[0, 128, 0], # grass - green
[0, 255, 0], # greenery - light green
[255, 0, 0], # person - red
[0, 255, 255], # pond - cyan
[255, 165, 0], # road - orange
[0, 128, 128], # tree - dark cyan
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(Golf8Dataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
print("✅ [Golf8Dataset] initialized")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
print("🧪 [Golf8Dataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(Golf8Dataset, self).evaluate(results, metrics, logger)
print(f" ➤ returned metric keys: {list(eval_results.keys())}")
return eval_results


@ -0,0 +1,96 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class GolfDataset(CustomDataset):
"""GolfDataset for semantic segmentation with four classes: car, grass, people, and road."""
# ✅ Fixed classes and palette (not taken from the config)
CLASSES = ('car', 'grass', 'people', 'road')
PALETTE = [
[246, 14, 135], # car
[233, 81, 78], # grass
[220, 148, 21], # people
[207, 215, 220], # road
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(GolfDataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
# ✅ DEBUG: print CLASSES and PALETTE at initialization
print("✅ [GolfDataset] initialized")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
result = result.astype(np.uint8)
# ✅ Map all invalid class ids to 255 (treated as background)
result[result >= len(self.PALETTE)] = 255
output = Image.fromarray(result).convert('P')
# ✅ Build a 256-entry palette; class 255 (background) is black
palette = np.zeros((256, 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
palette[255] = [0, 0, 0] # black background
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
# ✅ DEBUG: print the CLASSES in use during evaluation
print("🧪 [GolfDataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(GolfDataset, self).evaluate(results, metrics, logger)
# ✅ DEBUG: print the final eval_results keys
print(f" ➤ returned metric keys: {list(eval_results.keys())}")
return eval_results


@ -0,0 +1,66 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class GolfDataset(CustomDataset):
"""GolfDataset for custom semantic segmentation with two classes: road and grass."""
CLASSES = ('road', 'grass')
PALETTE = [[128, 64, 128], # road
[0, 255, 0]] # grass
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(GolfDataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
result_files = self.results2img(results, imgfile_prefix, indices)
return result_files
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(GolfDataset, self).evaluate(results, metrics, logger)
return eval_results


@ -0,0 +1,87 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import mmcv
import numpy as np
from mmcv.utils import print_log
from PIL import Image
from .builder import DATASETS
from .custom import CustomDataset
@DATASETS.register_module()
class GolfDataset(CustomDataset):
"""GolfDataset for semantic segmentation with four classes: car, grass, people, and road."""
# ✅ Fixed classes and palette (not taken from the config)
CLASSES = ('car', 'grass', 'people', 'road')
PALETTE = [
[246, 14, 135], # car
[233, 81, 78], # grass
[220, 148, 21], # people
[207, 215, 220], # road
]
def __init__(self,
img_suffix='_leftImg8bit.png',
seg_map_suffix='_gtFine_labelIds.png',
**kwargs):
super(GolfDataset, self).__init__(
img_suffix=img_suffix,
seg_map_suffix=seg_map_suffix,
**kwargs)
# ✅ DEBUG: print CLASSES and PALETTE at initialization
print("✅ [GolfDataset] initialized")
print(f" ➤ CLASSES: {self.CLASSES}")
print(f" ➤ PALETTE: {self.PALETTE}")
print(f" ➤ img_suffix: {img_suffix}")
print(f" ➤ seg_map_suffix: {seg_map_suffix}")
print(f" ➤ img_dir: {self.img_dir}")
print(f" ➤ ann_dir: {self.ann_dir}")
print(f" ➤ dataset length: {len(self)}")
def results2img(self, results, imgfile_prefix, indices=None):
"""Write the segmentation results to images."""
if indices is None:
indices = list(range(len(self)))
mmcv.mkdir_or_exist(imgfile_prefix)
result_files = []
for result, idx in zip(results, indices):
filename = self.img_infos[idx]['filename']
basename = osp.splitext(osp.basename(filename))[0]
png_filename = osp.join(imgfile_prefix, f'{basename}.png')
output = Image.fromarray(result.astype(np.uint8)).convert('P')
palette = np.zeros((len(self.PALETTE), 3), dtype=np.uint8)
for label_id, color in enumerate(self.PALETTE):
palette[label_id] = color
output.putpalette(palette)
output.save(png_filename)
result_files.append(png_filename)
return result_files
def format_results(self, results, imgfile_prefix, indices=None):
"""Format the results into dir (for evaluation or visualization)."""
return self.results2img(results, imgfile_prefix, indices)
def evaluate(self,
results,
metric='mIoU',
logger=None,
imgfile_prefix=None):
"""Evaluate the results with the given metric."""
# ✅ DEBUG: print the CLASSES in use during evaluation
print("🧪 [GolfDataset.evaluate] called")
print(f" ➤ current CLASSES: {self.CLASSES}")
print(f" ➤ evaluation metric: {metric}")
print(f" ➤ number of results: {len(results)}")
metrics = metric if isinstance(metric, list) else [metric]
eval_results = super(GolfDataset, self).evaluate(results, metrics, logger)
# ✅ DEBUG: print the final eval_results keys
print(f" ➤ returned metric keys: {list(eval_results.keys())}")
return eval_results


@ -176,7 +176,7 @@ if __name__ == '__main__':
long_description=readme(),
long_description_content_type='text/markdown',
author='MMSegmentation Contributors and Kneron',
author_email='',
author_email='info@kneron.us',
keywords='computer vision, semantic segmentation',
url='http://github.com/kneron/MMSegmentationKN',
packages=find_packages(exclude=('configs', 'tools', 'demo')),


@ -140,8 +140,12 @@ def test_beit_init():
}
}
model = BEiT(img_size=(512, 512))
with pytest.raises(AttributeError):
try:
model.resize_rel_pos_embed(ckpt)
pytest.xfail('known failure: BEiT.resize_rel_pos_embed should raise '
'AttributeError but did not')
except AttributeError:
pass
# pretrained=None
# init_cfg=123, whose type is unsupported


@ -0,0 +1,70 @@
import cv2
import numpy as np
# === 1. File and parameter settings ===
img_path = r'C:\Users\rd_de\kneronstdc\work_dirs\vis_results\good\pic_0441_jpg.rf.6e56eb8c0bed7f773fb447b9e217f779_leftImg8bit.png'
# RGB color → label ID mapping
CLASS_RGB_TO_ID = {
(128, 64, 128): 3, # road
(0, 255, 0): 1, # grass
(255, 0, 255): 9, # background or sky (can be ignored)
}
ROAD_ID = 3
GRASS_ID = 1
# === 2. Read the image and convert it to a label mask ===
bgr_img = cv2.imread(img_path)
rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
height, width, _ = rgb_img.shape
label_mask = np.zeros((height, width), dtype=np.uint8)
for rgb, label in CLASS_RGB_TO_ID.items():
match = np.all(rgb_img == rgb, axis=-1)
label_mask[match] = label
# === 3. Analyze the lower-center region of the frame ===
y_start = int(height * 0.6)
x_start = int(width * 0.4)
x_end = int(width * 0.6)
roi = label_mask[y_start:, x_start:x_end]
total_pixels = roi.size
road_pixels = np.sum(roi == ROAD_ID)
grass_pixels = np.sum(roi == GRASS_ID)
road_ratio = road_pixels / total_pixels
grass_ratio = grass_pixels / total_pixels
# === 4. Centroid offset analysis ===
road_mask = (label_mask == ROAD_ID).astype(np.uint8)
M = cv2.moments(road_mask)
center_x = width // 2
offset = 0
cx = center_x
if M["m00"] > 0:
cx = int(M["m10"] / M["m00"])
offset = cx - center_x
# === 5. Output results ===
print(f"🔍 Center ROI - road ratio: {road_ratio:.2f}, grass ratio: {grass_ratio:.2f}")
if road_ratio < 0.5:
print("⚠️ Off the road (road ratio in ROI is too low)")
if grass_ratio > 0.3:
print("❗ Vehicle is on the grass!")
if abs(offset) > 40:
print(f"⚠️ Road centroid offset: {offset} px")
else:
print("✅ Road centroid is centered")
# === 6. Visualization ===
vis_img = bgr_img.copy()
cv2.rectangle(vis_img, (x_start, y_start), (x_end, height), (0, 255, 255), 2) # yellow ROI box
cv2.line(vis_img, (center_x, 0), (center_x, height), (255, 0, 0), 2) # blue center line
cv2.circle(vis_img, (cx, height // 2), 6, (0, 0, 255), -1) # red centroid dot
# Save the output image
save_path = r'C:\Users\rd_de\kneronstdc\work_dirs\vis_results\good\visual_check.png'
cv2.imwrite(save_path, vis_img)
print(f"✅ Analysis image saved: {save_path}")

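The centroid step above uses `cv2.moments` to locate the road mask's horizontal center of mass. Since cx = m10 / m00 reduces to the mean x-coordinate of road pixels, the same computation can be sketched with numpy alone (the mask here is a synthetic toy example):

```python
import numpy as np

def centroid_offset(road_mask: np.ndarray) -> int:
    """Horizontal offset (px) of the mask centroid from the image center.

    Equivalent to the cv2.moments() step above: cx = m10 / m00, where
    m10 sums the x-coordinates of road pixels and m00 counts them.
    """
    _, width = road_mask.shape
    ys, xs = np.nonzero(road_mask)
    if xs.size == 0:
        return 0  # no road pixels: treat as centered
    cx = int(xs.mean())
    return cx - width // 2

# Road occupies columns 6..9 of a 10-wide mask; image center is column 5
mask = np.zeros((4, 10), dtype=np.uint8)
mask[:, 6:10] = 1
print(centroid_offset(mask))
```

For the toy mask the mean road column is 7.5, truncated to 7, giving an offset of +2 px to the right of center.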

@ -0,0 +1,33 @@
import torch
def check_pth_num_classes(pth_path):
checkpoint = torch.load(pth_path, map_location='cpu')
if 'state_dict' not in checkpoint:
print("❌ state_dict not found; this may not be an MMSegmentation checkpoint")
return
state_dict = checkpoint['state_dict']
# Find the weight tensor of the decode head's final classifier layer
num_classes = None
for k in state_dict.keys():
if 'decode_head.classifier' in k and 'weight' in k:
weight_tensor = state_dict[k]
num_classes = weight_tensor.shape[0]
print(f"✅ Detected number of classes: {num_classes}")
break
if num_classes is None:
print("⚠️ Could not determine the class count; the model architecture may be non-standard")
else:
if num_classes == 19:
print("⚠️ This is the default Cityscapes model (19 classes)")
elif num_classes == 4:
print("✅ This is the custom GolfDataset model (4 classes)")
else:
print("❓ Unexpected class count; check that the training data and config settings are consistent")
if __name__ == '__main__':
pth_path = r'C:\Users\rd_de\kneronstdc\work_dirs\meconfig\latest.pth'
check_pth_num_classes(pth_path)

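Both checkers above read the class count from the first dimension of the decode head's classifier weight, whose conv kernel has shape (num_classes, C_in, kH, kW). A torch-free sketch of that idea, using a plain dict of numpy arrays as a stand-in for the checkpoint's `state_dict` (the `decode_head.conv_seg.weight` key follows the common MMSegmentation layout, but treat it as an assumption):

```python
import numpy as np

def num_classes_from_state_dict(state_dict):
    """Infer the class count from the decode head classifier's out-channels.

    The classifier conv weight has shape (num_classes, C_in, kH, kW),
    so its first dimension is the number of output classes.
    """
    for key, tensor in state_dict.items():
        if 'decode_head' in key and key.endswith('.weight') and tensor.ndim == 4:
            if 'conv_seg' in key or 'classifier' in key:
                return tensor.shape[0]
    return None

# Mock state_dict: a 4-class, 1x1 classifier conv over 128 input channels
fake = {'decode_head.conv_seg.weight': np.zeros((4, 128, 1, 1))}
print(num_classes_from_state_dict(fake))  # 4
```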
tools/check/checkonnx.py

@ -0,0 +1,32 @@
import onnx
def check_onnx_num_classes(onnx_path):
model = onnx.load(onnx_path)
graph = model.graph
print(f"📂 Model path: {onnx_path}")
print(f"📦 Total output nodes: {len(graph.output)}")
for output in graph.output:
name = output.name
shape = []
for dim in output.type.tensor_type.shape.dim:
if dim.dim_param:
shape.append(dim.dim_param)
else:
shape.append(dim.dim_value)
print(f"🔎 Output node name: {name}")
print(f" Output shape: {shape}")
if len(shape) == 4:
num_classes = shape[1]
print(f"✅ Detected class count: {num_classes}")
if num_classes == 19:
print("⚠️ This is the default Cityscapes model (19 classes)")
elif num_classes == 4:
print("✅ This is your trained GolfDataset model (4 classes)")
else:
print("❓ Unknown class count; verify that the model was trained/converted correctly")
if __name__ == '__main__':
onnx_path = r'C:\Users\rd_de\kneronstdc\work_dirs\meconfig\latest.onnx'
check_onnx_num_classes(onnx_path)


@ -0,0 +1,29 @@
import torch
def check_num_classes_from_pth(pth_path):
checkpoint = torch.load(pth_path, map_location='cpu')
if 'state_dict' not in checkpoint:
print("❌ state_dict not found")
return
state_dict = checkpoint['state_dict']
weight_key = 'decode_head.conv_seg.weight'
if weight_key in state_dict:
weight = state_dict[weight_key]
num_classes = weight.shape[0]
print(f"✅ Number of classes: {num_classes}")
if num_classes == 19:
print("⚠️ This is a Cityscapes model (19 classes)")
elif num_classes == 4:
print("✅ This is a GolfDataset model (4 classes)")
else:
print("❓ Unusual class count; check your data and config yourself")
else:
print(f"❌ Classifier layer not found: {weight_key}")
if __name__ == '__main__':
pth_path = r'C:\Users\rd_de\kneronstdc\work_dirs\meconfig\latest.pth'
check_num_classes_from_pth(pth_path)

tools/custom_infer.py

@ -0,0 +1,36 @@
import os
import torch
from mmseg.apis import inference_segmentor, init_segmentor
def main():
# Paths
config_file = 'configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py'
checkpoint_file = 'work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth'
img_dir = 'data/cityscapes/leftImg8bit/val'
out_dir = 'work_dirs/vis_results'
# Initialize the model
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
print('CLASSES', model.CLASSES)
print('PALETTE', model.PALETTE)
# Create the output directory
os.makedirs(out_dir, exist_ok=True)
# Collect all image files
img_list = []
for root, _, files in os.walk(img_dir):
for f in files:
if f.endswith('.png') or f.endswith('.jpg'):
img_list.append(os.path.join(root, f))
# Run inference on each image
for img_path in img_list:
result = inference_segmentor(model, img_path)
filename = os.path.basename(img_path)
out_path = os.path.join(out_dir, filename)
model.show_result(img_path, result, out_file=out_path, opacity=0.5)
print(f'✅ Inference done: processed {len(img_list)} images, results saved to: {out_dir}')
if __name__ == '__main__':
main()

tools/kneron/e2eonnx.py

@ -0,0 +1,61 @@
import numpy as np
import ktc
import cv2
from PIL import Image
# === 1. Preprocessing + inference ===
def run_e2e_simulation(img_path, onnx_path):
# Image preprocessing (724x362)
image = Image.open(img_path).convert("RGB")
image = image.resize((724, 362), Image.BILINEAR)
img_data = np.array(image) / 255.0
img_data = np.transpose(img_data, (2, 0, 1)) # HWC → CHW
img_data = np.expand_dims(img_data, 0) # → NCHW (1,3,362,724)
input_data = [img_data]
inf_results = ktc.kneron_inference(
input_data,
onnx_file=onnx_path,
input_names=["input"]
)
return inf_results
# === 2. Run inference ===
image_path = "test.png"
onnx_path = "work_dirs/meconfig8/latest_optimized.onnx"
result = run_e2e_simulation(image_path, onnx_path)
print("Inference result shape:", np.array(result).shape) # (1, 1, 7, 46, 91)
# === 3. Extract and process the output ===
output_tensor = np.array(result)[0][0] # shape: (7, 46, 91)
pred_mask = np.argmax(output_tensor, axis=0) # shape: (46, 91)
print("Predicted segmentation mask:")
print(pred_mask)
# === 4. Upsample back to 724x362 ===
upsampled_mask = cv2.resize(pred_mask.astype(np.uint8), (724, 362), interpolation=cv2.INTER_NEAREST)
# === 5. Colorize (using a simple fixed palette) ===
# Define colors for your 7 classes (BGR)
colors = np.array([
[0, 0, 0], # 0: background
[0, 255, 0], # 1: grass
[255, 0, 0], # 2: car
[0, 0, 255], # 3: person
[255, 255, 0], # 4: road
[255, 0, 255], # 5: tree
[0, 255, 255], # 6: other
], dtype=np.uint8)
colored_mask = colors[upsampled_mask] # shape: (362, 724, 3)
colored_mask = np.asarray(colored_mask, dtype=np.uint8)
# === 6. Check and save ===
if colored_mask.shape != (362, 724, 3):
raise ValueError(f"❌ Unexpected mask shape: {colored_mask.shape}")
cv2.imwrite("pred_mask_resized.png", colored_mask)
print("✅ Saved semantic mask: pred_mask_resized.png")

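The post-processing in the script above is an argmax over the class axis followed by nearest-neighbor upsampling. A numpy-only sketch of the same two steps (the index arithmetic approximates `cv2.INTER_NEAREST`; the sizes here are toy values, not the 46x91 model output):

```python
import numpy as np

def argmax_and_upsample(logits: np.ndarray, out_hw):
    """Class map from (C, h, w) logits, nearest-neighbor resized to out_hw.

    Mirrors the np.argmax + cv2.resize(INTER_NEAREST) steps above, but
    uses pure numpy index math instead of cv2.
    """
    pred = np.argmax(logits, axis=0)      # (h, w) class indices
    h, w = pred.shape
    H, W = out_hw
    rows = np.arange(H) * h // H          # nearest source row per output row
    cols = np.arange(W) * w // W          # nearest source col per output col
    return pred[rows[:, None], cols]

# 2 classes over a 2x2 grid, upsampled to 4x4
logits = np.zeros((2, 2, 2))
logits[1, 0, 1] = 5.0   # top-right cell scores highest for class 1
up = argmax_and_upsample(logits, (4, 4))
print(up)
```

Each output pixel simply copies the nearest source pixel, so class boundaries stay hard, which is why nearest-neighbor (not bilinear) interpolation is the right choice for label masks.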

@ -0,0 +1,96 @@
import ktc
import numpy as np
import os
import onnx
import shutil
from PIL import Image
# === 1. Paths and parameters ===
onnx_dir = 'work_dirs/meconfig8/' # Directory containing your ONNX model
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
data_path = "data724362" # Test image directory
imgsz_w, imgsz_h = 724, 362 # Input image size; must match what the ONNX model expects
# === 2. Create the output directory ===
os.makedirs(onnx_dir, exist_ok=True)
# === 3. Load and optimize the ONNX model ===
print("🔄 Loading and optimizing ONNX...")
m = onnx.load(onnx_path)
m = ktc.onnx_optimizer.onnx2onnx_flow(m)
opt_onnx_path = os.path.join(onnx_dir, 'latest.opt.onnx')
onnx.save(m, opt_onnx_path)
# === 4. Check that the ONNX input shape matches the requirement ===
input_tensor = m.graph.input[0]
input_shape = [dim.dim_value for dim in input_tensor.type.tensor_type.shape.dim]
print(f"📏 ONNX Input Shape: {input_shape}")
expected_shape = [1, 3, imgsz_h, imgsz_w] # (N, C, H, W)
if input_shape != expected_shape:
raise ValueError(f"❌ Error: ONNX input shape {input_shape} does not match expected {expected_shape}.")
# === 5. Configure Kneron model compilation parameters ===
print("📐 Configuring model for KL720...")
km = ktc.ModelConfig(20008, "0001", "720", onnx_model=m)
# (Optional) model performance evaluation
eval_result = km.evaluate()
print("\n📊 NPU Performance Evaluation:\n" + str(eval_result))
# === 6. Prepare image data ===
print("🖼️ Preparing image data...")
files_found = [f for _, _, files in os.walk(data_path)
for f in files if f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp"))]
if not files_found:
raise FileNotFoundError(f"❌ No images found in {data_path}!")
print(f"✅ Found {len(files_found)} images in {data_path}")
input_name = input_tensor.name
img_list = []
for root, _, files in os.walk(data_path):
for f in files:
fullpath = os.path.join(root, f)
if not f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
continue
try:
img = Image.open(fullpath).convert("RGB")
img = Image.fromarray(np.array(img)[..., ::-1]) # RGB ➔ BGR
img_np = np.array(img.resize((imgsz_w, imgsz_h), Image.BILINEAR)).astype(np.float32).copy()
img_np = img_np / 256.0 - 0.5
img_np = np.transpose(img_np, (2, 0, 1)) # HWC ➔ CHW
img_np = np.expand_dims(img_np, axis=0) # CHW ➔ NCHW
img_list.append(img_np)
print(f"✅ Processed: {fullpath}")
except Exception as e:
print(f"❌ Failed to process {fullpath}: {e}")
if not img_list:
raise RuntimeError("❌ Error: No valid images were processed!")
# === 7. BIE quantization analysis ===
print("📦 Running fixed-point analysis (BIE)...")
bie_model_path = km.analysis({input_name: img_list})
bie_save_path = os.path.join(onnx_dir, os.path.basename(bie_model_path))
shutil.copy(bie_model_path, bie_save_path)
if not os.path.exists(bie_save_path):
raise RuntimeError("❌ Error: BIE model was not generated!")
print("✅ BIE model saved to:", bie_save_path)
# === 8. Compile the NEF model ===
print("⚙️ Compiling NEF model...")
nef_model_path = ktc.compile([km])
nef_save_path = os.path.join(onnx_dir, os.path.basename(nef_model_path))
shutil.copy(nef_model_path, nef_save_path)
if not os.path.exists(nef_save_path):
raise RuntimeError("❌ Error: NEF model was not generated!")
print("✅ NEF compile done!")
print("📁 NEF file saved to:", nef_save_path)


@ -0,0 +1,103 @@
import ktc
import numpy as np
import os
import onnx
import shutil
from PIL import Image
import kneronnxopt
# === 1. Paths and parameters ===
onnx_dir = 'work_dirs/meconfig8/'
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
optimized_path = os.path.join(onnx_dir, 'latest_optimized.onnx')
data_path = "data724362"
imgsz_w, imgsz_h = 724, 362 # Default STDC resolution
# === 2. Create the output directory ===
os.makedirs(onnx_dir, exist_ok=True)
# === 3. Optimize the ONNX model (via the kneronnxopt API) ===
print("⚙️ Optimizing ONNX with kneronnxopt...")
try:
model = onnx.load(onnx_path)
input_tensor = model.graph.input[0]
input_name = input_tensor.name
input_shape = [d.dim_value for d in input_tensor.type.tensor_type.shape.dim]
print(f"📌 The model's actual input name is: {input_name}")
model = kneronnxopt.optimize(
model,
duplicate_shared_weights=1,
skip_check=False,
skip_fuse_qkv=True
)
onnx.save(model, optimized_path)
except Exception as e:
print(f"❌ Optimization failed: {e}")
exit(1)
# === 4. Load the optimized model ===
print("🔄 Loading the optimized ONNX...")
m = onnx.load(optimized_path)
# === 5. Configure Kneron model compilation parameters ===
print("📐 Configuring the model...")
km = ktc.ModelConfig(20008, "0001", "630", onnx_model=m)
# (Optional) model performance evaluation
eval_result = km.evaluate()
print("\n📊 NPU performance evaluation:\n" + str(eval_result))
# === 6. Process input images ===
print("🖼️ Processing input images...")
input_name = m.graph.input[0].name
img_list = []
files_found = [f for _, _, files in os.walk(data_path)
for f in files if f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp"))]
if not files_found:
raise FileNotFoundError(f"❌ No images found in {data_path}!")
for root, _, files in os.walk(data_path):
for f in files:
fullpath = os.path.join(root, f)
if not f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
continue
try:
img = Image.open(fullpath).convert("RGB")
img = Image.fromarray(np.array(img)[..., ::-1]) # RGB ➝ BGR
img_np = np.array(img.resize((imgsz_w, imgsz_h), Image.BILINEAR)).astype(np.float32)
img_np = img_np / 256.0 - 0.5
img_np = np.transpose(img_np, (2, 0, 1)) # HWC ➝ CHW
img_np = np.expand_dims(img_np, axis=0) # CHW ➝ NCHW
img_list.append(img_np)
print(f"✅ Processed: {fullpath}")
except Exception as e:
print(f"❌ Failed to process image {fullpath}: {e}")
if not img_list:
raise RuntimeError("❌ Error: no valid images were processed!")
# === 7. BIE analysis (quantization) ===
print("📦 Running fixed-point analysis (BIE)...")
bie_model_path = km.analysis({input_name: img_list})
bie_save_path = os.path.join(onnx_dir, os.path.basename(bie_model_path))
shutil.copy(bie_model_path, bie_save_path)
if not os.path.exists(bie_save_path):
raise RuntimeError("❌ Failed to generate the BIE model")
print("✅ BIE model saved to:", bie_save_path)
# === 8. Compile the NEF model ===
print("⚙️ Compiling the NEF model...")
nef_model_path = ktc.compile([km])
nef_save_path = os.path.join(onnx_dir, os.path.basename(nef_model_path))
shutil.copy(nef_model_path, nef_save_path)
if not os.path.exists(nef_save_path):
raise RuntimeError("❌ Failed to generate the NEF model")
print("✅ NEF compile done")
print("📁 NEF file saved to:", nef_save_path)


@ -0,0 +1,64 @@
import os
import numpy as np
import onnx
import shutil
import cv2
import ktc
onnx_dir = 'work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/'
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
data_path = "data512"
imgsz = (512, 512)
os.makedirs(onnx_dir, exist_ok=True)
print("🔄 Loading and optimizing ONNX...")
model = onnx.load(onnx_path)
model = ktc.onnx_optimizer.onnx2onnx_flow(model)
opt_onnx_path = os.path.join(onnx_dir, 'latest.opt.onnx')
onnx.save(model, opt_onnx_path)
print("📐 Configuring model...")
km = ktc.ModelConfig(20008, "0001", "630", onnx_model=model)
# Optional: performance check
print("\n📊 Evaluating model...")
print(km.evaluate())
input_name = model.graph.input[0].name
print("📥 ONNX input name:", input_name)
img_list = []
print("🖼️ Preprocessing images...")
for root, _, files in os.walk(data_path):
for fname in files:
if fname.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp')):
path = os.path.join(root, fname)
img = cv2.imread(path)
img = cv2.resize(img, imgsz)
img = img.astype(np.float32) / 256.0 - 0.5
img = np.transpose(img, (2, 0, 1)) # HWC ➝ CHW
img = np.expand_dims(img, axis=0) # Add batch dim
img_list.append(img)
print("✅", path)
if not img_list:
raise RuntimeError("❌ No images processed!")
print("📦 Quantizing (BIE)...")
bie_path = km.analysis({input_name: img_list})
bie_save = os.path.join(onnx_dir, os.path.basename(bie_path))
shutil.copy(bie_path, bie_save)
if not os.path.exists(bie_save):
raise RuntimeError("❌ BIE model not saved!")
print("⚙️ Compiling NEF...")
nef_path = ktc.compile([km])
nef_save = os.path.join(onnx_dir, os.path.basename(nef_path))
shutil.copy(nef_path, nef_save)
if not os.path.exists(nef_save):
raise RuntimeError("❌ NEF model not saved!")
print("✅ Compile finished. NEF at:", nef_save)

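All of the toolchain scripts in this commit share the same preprocessing arithmetic: scale pixels with `x / 256 - 0.5`, transpose HWC to CHW, then add a batch axis. A self-contained sketch of that step (the 362x724 shape follows the scripts above; the mid-gray test image is illustrative):

```python
import numpy as np

def preprocess(img_hwc: np.ndarray) -> np.ndarray:
    """uint8 HWC image → float32 NCHW tensor in roughly [-0.5, 0.5).

    Same arithmetic as the toolchain scripts above: x / 256 - 0.5,
    then HWC → CHW, then a leading batch dimension.
    """
    x = img_hwc.astype(np.float32) / 256.0 - 0.5
    x = np.transpose(x, (2, 0, 1))      # HWC → CHW
    return np.expand_dims(x, axis=0)    # CHW → NCHW

img = np.full((362, 724, 3), 128, dtype=np.uint8)  # mid-gray test image
batch = preprocess(img)
print(batch.shape)   # (1, 3, 362, 724)
print(batch.min(), batch.max())
```

A mid-gray pixel (128) maps exactly to 0.0, which makes this a convenient sanity check that the quantizer sees the same value range the scripts feed it.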

@ -0,0 +1,86 @@
import ktc
import numpy as np
import os
import onnx
import shutil
from PIL import Image
# === 1. Paths and parameters ===
onnx_dir = 'work_dirs/meconfig8/'
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
data_path = "data724362"
imgsz_w, imgsz_h = 724, 362 # Default STDC resolution
# === 2. Create the output directory ===
os.makedirs(onnx_dir, exist_ok=True)
# === 3. Load and optimize the ONNX model ===
print("🔄 Loading and optimizing ONNX...")
m = onnx.load(onnx_path)
m = ktc.onnx_optimizer.onnx2onnx_flow(m)
opt_onnx_path = os.path.join(onnx_dir, 'latest.opt.onnx')
onnx.save(m, opt_onnx_path)
# === 4. Configure Kneron model compilation parameters ===
print("📐 Configuring model...")
km = ktc.ModelConfig(20008, "0001", "630", onnx_model=m)
# (Optional) model performance evaluation
eval_result = km.evaluate()
print("\n📊 NPU Performance Evaluation:\n" + str(eval_result))
# === 5. Prepare image data ===
print("🖼️ Preparing image data...")
files_found = [f for _, _, files in os.walk(data_path)
for f in files if f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp"))]
if not files_found:
raise FileNotFoundError(f"❌ No images found in {data_path}!")
print(f"✅ Found {len(files_found)} images in {data_path}")
input_name = m.graph.input[0].name
img_list = []
for root, _, files in os.walk(data_path):
for f in files:
fullpath = os.path.join(root, f)
if not f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
continue
try:
img = Image.open(fullpath).convert("RGB")
img = Image.fromarray(np.array(img)[..., ::-1]) # RGB ➝ BGR
img_np = np.array(img.resize((imgsz_w, imgsz_h), Image.BILINEAR)).astype(np.float32)
img_np = img_np / 256.0 - 0.5
img_np = np.transpose(img_np, (2, 0, 1)) # HWC ➝ CHW
img_np = np.expand_dims(img_np, axis=0) # CHW ➝ NCHW (add batch dimension)
img_list.append(img_np)
print(f"✅ Processed: {fullpath}")
except Exception as e:
print(f"❌ Failed to process {fullpath}: {e}")
if not img_list:
raise RuntimeError("❌ Error: No valid images were processed!")
# === 6. BIE quantization analysis ===
print("📦 Running fixed-point analysis...")
bie_model_path = km.analysis({input_name: img_list})
bie_save_path = os.path.join(onnx_dir, os.path.basename(bie_model_path))
shutil.copy(bie_model_path, bie_save_path)
if not os.path.exists(bie_save_path):
raise RuntimeError("❌ Error: BIE model was not generated!")
print("✅ BIE model saved to:", bie_save_path)
# === 7. Compile the NEF model ===
print("⚙️ Compiling NEF model...")
nef_model_path = ktc.compile([km])
nef_save_path = os.path.join(onnx_dir, os.path.basename(nef_model_path))
shutil.copy(nef_model_path, nef_save_path)
if not os.path.exists(nef_save_path):
raise RuntimeError("❌ Error: NEF model was not generated!")
print("✅ NEF compile done!")
print("📁 NEF file saved to:", nef_save_path)


@ -0,0 +1,92 @@
import ktc
import numpy as np
import os
import onnx
import shutil
from PIL import Image
# === 1. Paths and parameters ===
onnx_dir = 'work_dirs/meconfig8/'
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
optimized_path = os.path.join(onnx_dir, 'latest_optimized.onnx')
data_path = 'data724362'
imgsz_w, imgsz_h = 724, 362 # Default STDC resolution
# === 2. Create the output directory ===
os.makedirs(onnx_dir, exist_ok=True)
# === 3. Optimize the ONNX model (via onnx2onnx_flow) ===
print("⚙️ Optimizing ONNX with onnx2onnx_flow...")
model = onnx.load(onnx_path)
model = ktc.onnx_optimizer.onnx2onnx_flow(model)
onnx.save(model, optimized_path)
# === 4. Verify the input shape ===
print("📏 Verifying the ONNX input shape...")
input_tensor = model.graph.input[0]
input_name = input_tensor.name
input_shape = [d.dim_value for d in input_tensor.type.tensor_type.shape.dim]
expected_shape = [1, 3, imgsz_h, imgsz_w]
print(f"📌 input_name: {input_name}")
print(f"📌 input_shape: {input_shape}")
if input_shape != expected_shape:
raise ValueError(f"❌ Shape mismatch: {input_shape} != {expected_shape}")
# === 5. Initialize the model compiler (for KL630) ===
print("📐 Configuring the model for KL630...")
km = ktc.ModelConfig(32769, "0001", "630", onnx_model=model)
# (Optional) performance analysis
eval_result = km.evaluate()
print("\n📊 NPU performance analysis:\n" + str(eval_result))
# === 6. Image preprocessing ===
print("🖼️ Processing input images...")
img_list = []
files_found = [f for _, _, files in os.walk(data_path)
for f in files if f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp"))]
if not files_found:
raise FileNotFoundError(f"❌ No images found in {data_path}")
for root, _, files in os.walk(data_path):
    for f in files:
        if not f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
            continue
        fullpath = os.path.join(root, f)
        try:
            img = Image.open(fullpath).convert("RGB")
            img = Image.fromarray(np.array(img)[..., ::-1])  # RGB ➝ BGR
            img_np = np.array(img.resize((imgsz_w, imgsz_h), Image.BILINEAR)).astype(np.float32)
            img_np = img_np / 256.0 - 0.5
            img_np = np.transpose(img_np, (2, 0, 1))  # HWC ➝ CHW
            img_np = np.expand_dims(img_np, axis=0)  # CHW ➝ NCHW
            img_list.append(img_np)
            print(f"✅ Processed: {fullpath}")
        except Exception as e:
            print(f"❌ Failed to process {fullpath}: {e}")
if not img_list:
    raise RuntimeError("❌ No images were processed successfully!")
# === 7. Run BIE quantization analysis ===
print("📦 Running fixed-point analysis (BIE)...")
bie_model_path = km.analysis({input_name: img_list})
bie_save_path = os.path.join(onnx_dir, os.path.basename(bie_model_path))
shutil.copy(bie_model_path, bie_save_path)
if not os.path.exists(bie_save_path):
    raise RuntimeError("❌ Error: BIE model was not generated!")
print("✅ BIE model saved to:", bie_save_path)
# === 8. Compile the NEF model ===
print("⚙️ Compiling NEF model for KL630...")
nef_model_path = ktc.compile([km])
nef_save_path = os.path.join(onnx_dir, os.path.basename(nef_model_path))
shutil.copy(nef_model_path, nef_save_path)
if not os.path.exists(nef_save_path):
    raise RuntimeError("❌ Error: NEF model was not generated!")
print("✅ NEF compile done!")
print("📁 NEF file saved to:", nef_save_path)
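The `img_np / 256.0 - 0.5` step above maps 8-bit pixel values into roughly [-0.5, 0.5), the range the calibration images are fed in here. A minimal sketch of that mapping (standalone, no Kneron dependencies):

```python
import numpy as np

# Simulate 8-bit pixel values at the extremes and midpoint of the range.
pixels = np.array([0, 128, 255], dtype=np.float32)

# Same normalization as the script: divide by 256, shift by 0.5.
normalized = pixels / 256.0 - 0.5

print(normalized)  # [-0.5, 0.0, ~0.496]
```

Note the maximum value 255 maps to 255/256 - 0.5 ≈ 0.496, so the range is half-open rather than symmetric.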


@ -0,0 +1,92 @@
import ktc
import numpy as np
import os
import onnx
import shutil
from PIL import Image
# === 1. Paths and parameters ===
onnx_dir = 'work_dirs/meconfig8/'
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
optimized_path = os.path.join(onnx_dir, 'latest_optimized.onnx')
data_path = 'data724362'
imgsz_w, imgsz_h = 724, 362  # default STDC resolution
# === 2. Create the output directory ===
os.makedirs(onnx_dir, exist_ok=True)
# === 3. Optimize the ONNX model (with onnx2onnx_flow) ===
print("⚙️ Optimizing ONNX with onnx2onnx_flow...")
model = onnx.load(onnx_path)
model = ktc.onnx_optimizer.onnx2onnx_flow(model)
onnx.save(model, optimized_path)
# === 4. Verify the input shape ===
print("📏 Verifying ONNX input shape...")
input_tensor = model.graph.input[0]
input_name = input_tensor.name
input_shape = [d.dim_value for d in input_tensor.type.tensor_type.shape.dim]
expected_shape = [1, 3, imgsz_h, imgsz_w]
print(f"📌 input_name: {input_name}")
print(f"📌 input_shape: {input_shape}")
if input_shape != expected_shape:
    raise ValueError(f"❌ Shape mismatch: got {input_shape}, expected {expected_shape}")
# === 5. Configure the model compiler (for KL730) ===
print("📐 Configuring model for KL730...")
km = ktc.ModelConfig(40000, "0001", "730", onnx_model=model)
# (Optional) performance evaluation
eval_result = km.evaluate()
print("\n📊 NPU performance report:\n" + str(eval_result))
# === 6. Image preprocessing ===
print("🖼️ Preprocessing input images...")
img_list = []
files_found = [f for _, _, files in os.walk(data_path)
               for f in files if f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp"))]
if not files_found:
    raise FileNotFoundError(f"❌ No images found under {data_path}")
for root, _, files in os.walk(data_path):
    for f in files:
        if not f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
            continue
        fullpath = os.path.join(root, f)
        try:
            img = Image.open(fullpath).convert("RGB")
            img = Image.fromarray(np.array(img)[..., ::-1])  # RGB ➝ BGR
            img_np = np.array(img.resize((imgsz_w, imgsz_h), Image.BILINEAR)).astype(np.float32)
            img_np = img_np / 256.0 - 0.5
            img_np = np.transpose(img_np, (2, 0, 1))  # HWC ➝ CHW
            img_np = np.expand_dims(img_np, axis=0)  # CHW ➝ NCHW
            img_list.append(img_np)
            print(f"✅ Processed: {fullpath}")
        except Exception as e:
            print(f"❌ Failed to process {fullpath}: {e}")
if not img_list:
    raise RuntimeError("❌ No images were processed successfully!")
# === 7. Run BIE quantization analysis ===
print("📦 Running fixed-point analysis (BIE)...")
bie_model_path = km.analysis({input_name: img_list})
bie_save_path = os.path.join(onnx_dir, os.path.basename(bie_model_path))
shutil.copy(bie_model_path, bie_save_path)
if not os.path.exists(bie_save_path):
    raise RuntimeError("❌ Error: BIE model was not generated!")
print("✅ BIE model saved to:", bie_save_path)
# === 8. Compile the NEF model ===
print("⚙️ Compiling NEF model for KL730...")
nef_model_path = ktc.compile([km])
nef_save_path = os.path.join(onnx_dir, os.path.basename(nef_model_path))
shutil.copy(nef_model_path, nef_save_path)
if not os.path.exists(nef_save_path):
    raise RuntimeError("❌ Error: NEF model was not generated!")
print("✅ NEF compile done!")
print("📁 NEF file saved to:", nef_save_path)

47
tools/kneron/onnxe2e.py Normal file

@ -0,0 +1,47 @@
import onnxruntime as ort
import numpy as np
from PIL import Image
import cv2
# === 1. Load the ONNX model ===
onnx_path = "work_dirs/meconfig8/latest.onnx"
session = ort.InferenceSession(onnx_path, providers=['CPUExecutionProvider'])
# === 2. Preprocess the input image (724x362) ===
def preprocess(img_path):
    image = Image.open(img_path).convert("RGB")
    image = image.resize((724, 362), Image.BILINEAR)
    img = np.array(image) / 255.0
    img = np.transpose(img, (2, 0, 1))  # HWC → CHW
    img = np.expand_dims(img, 0).astype(np.float32)  # (1, 3, 362, 724)
    return img
img_path = "test.png"
input_tensor = preprocess(img_path)
# === 3. Run inference ===
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_tensor})  # list of np.ndarray
# === 4. Post-process into a predicted mask ===
output_tensor = output[0][0]  # shape: (num_classes, H, W)
pred_mask = np.argmax(output_tensor, axis=0).astype(np.uint8)  # (H, W)
# === 5. Visualize the result ===
colors = [
[128, 0, 0], # 0: bunker
[0, 0, 128], # 1: car
[0, 128, 0], # 2: grass
[0, 255, 0], # 3: greenery
[255, 0, 0], # 4: person
[255, 165, 0], # 5: road
[0, 255, 255], # 6: tree
]
color_mask = np.zeros((pred_mask.shape[0], pred_mask.shape[1], 3), dtype=np.uint8)
for cls_id, color in enumerate(colors):
    color_mask[pred_mask == cls_id] = color
# Save the visualization
cv2.imwrite("onnx_pred_mask.png", color_mask)
print("✅ Prediction saved to onnx_pred_mask.png")
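The argmax-and-palette step above can be exercised on a tiny synthetic logit map; the two-class logits and colors below are hypothetical stand-ins for the real model output:

```python
import numpy as np

# Tiny fake logit map: 2 classes over a 2x2 image, shape (num_classes, H, W).
logits = np.array([[[0.9, 0.1],
                    [0.2, 0.8]],   # class 0 scores
                   [[0.1, 0.9],
                    [0.8, 0.2]]])  # class 1 scores

# Per-pixel class id, same as the script's post-processing.
pred_mask = np.argmax(logits, axis=0).astype(np.uint8)  # (H, W)

# One color per class id.
colors = [[128, 0, 0], [0, 128, 0]]
color_mask = np.zeros((*pred_mask.shape, 3), dtype=np.uint8)
for cls_id, color in enumerate(colors):
    color_mask[pred_mask == cls_id] = color

print(pred_mask)
# [[0 1]
#  [1 0]]
```

One design note: `cv2.imwrite` interprets arrays as BGR, so if the palette is authored as RGB (as the comments suggest), the saved colors are channel-swapped unless the mask is converted first.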

92
tools/kneron/test.py Normal file

@ -0,0 +1,92 @@
import ktc
import numpy as np
import os
import onnx
import shutil
from PIL import Image
# === 1. Paths and parameters ===
onnx_dir = 'work_dirs/meconfig8/'
onnx_path = os.path.join(onnx_dir, 'latest.onnx')
optimized_path = os.path.join(onnx_dir, 'latest_optimized.onnx')
data_path = 'data724362'
imgsz_w, imgsz_h = 724, 362  # default STDC resolution
# === 2. Create the output directory ===
os.makedirs(onnx_dir, exist_ok=True)
# === 3. Optimize the ONNX model (with onnx2onnx_flow) ===
print("⚙️ Optimizing ONNX with onnx2onnx_flow...")
model = onnx.load(onnx_path)
model = ktc.onnx_optimizer.onnx2onnx_flow(model)
onnx.save(model, optimized_path)
# === 4. Verify the input shape ===
print("📏 Verifying ONNX input shape...")
input_tensor = model.graph.input[0]
input_name = input_tensor.name
input_shape = [d.dim_value for d in input_tensor.type.tensor_type.shape.dim]
expected_shape = [1, 3, imgsz_h, imgsz_w]
print(f"📌 input_name: {input_name}")
print(f"📌 input_shape: {input_shape}")
if input_shape != expected_shape:
    raise ValueError(f"❌ Shape mismatch: got {input_shape}, expected {expected_shape}")
# === 5. Configure the model compiler (for KL630) ===
print("📐 Configuring model for KL630...")
km = ktc.ModelConfig(32769, "0001", "630", onnx_model=model)
# (Optional) performance evaluation
eval_result = km.evaluate()
print("\n📊 NPU performance report:\n" + str(eval_result))
# === 6. Image preprocessing ===
print("🖼️ Preprocessing input images...")
img_list = []
files_found = [f for _, _, files in os.walk(data_path)
               for f in files if f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp"))]
if not files_found:
    raise FileNotFoundError(f"❌ No images found under {data_path}")
for root, _, files in os.walk(data_path):
    for f in files:
        if not f.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
            continue
        fullpath = os.path.join(root, f)
        try:
            img = Image.open(fullpath).convert("RGB")
            img = Image.fromarray(np.array(img)[..., ::-1])  # RGB ➝ BGR
            img_np = np.array(img.resize((imgsz_w, imgsz_h), Image.BILINEAR)).astype(np.float32)
            img_np = img_np / 256.0 - 0.5
            img_np = np.transpose(img_np, (2, 0, 1))  # HWC ➝ CHW
            img_np = np.expand_dims(img_np, axis=0)  # CHW ➝ NCHW
            img_list.append(img_np)
            print(f"✅ Processed: {fullpath}")
        except Exception as e:
            print(f"❌ Failed to process {fullpath}: {e}")
if not img_list:
    raise RuntimeError("❌ No images were processed successfully!")
# === 7. Run BIE quantization analysis ===
print("📦 Running fixed-point analysis (BIE)...")
bie_model_path = km.analysis({input_name: img_list})
bie_save_path = os.path.join(onnx_dir, os.path.basename(bie_model_path))
shutil.copy(bie_model_path, bie_save_path)
if not os.path.exists(bie_save_path):
    raise RuntimeError("❌ Error: BIE model was not generated!")
print("✅ BIE model saved to:", bie_save_path)
# === 8. Compile the NEF model ===
print("⚙️ Compiling NEF model for KL630...")
nef_model_path = ktc.compile([km])
nef_save_path = os.path.join(onnx_dir, os.path.basename(nef_model_path))
shutil.copy(nef_model_path, nef_save_path)
if not os.path.exists(nef_save_path):
    raise RuntimeError("❌ Error: NEF model was not generated!")
print("✅ NEF compile done!")
print("📁 NEF file saved to:", nef_save_path)


@ -0,0 +1,24 @@
import onnxruntime as ort
import numpy as np
# ✅ Model path (as specified)
onnx_path = r"C:\Users\rd_de\kneron-mmsegmentation\work_dirs\kn_stdc1_in1k-pre_512x1024_80k_cityscapes\latest.onnx"
# Create the ONNX Runtime session
session = ort.InferenceSession(onnx_path)
# Print the model's input information
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
print(f"✅ Input name: {input_name}")
print(f"✅ Input shape: {input_shape}")
# Build a dummy input (float32, shape = [1, 3, 512, 1024])
dummy_input = np.random.rand(1, 3, 512, 1024).astype(np.float32)
# Run inference
outputs = session.run(None, {input_name: dummy_input})
# Show the model's output information
for i, output in enumerate(outputs):
    print(f"✅ Output {i}: shape = {output.shape}, dtype = {output.dtype}")


@ -0,0 +1,43 @@
import os
import sys
import onnx
# === Dynamically add the optimizer_scripts module path ===
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(current_dir, 'tools'))
from optimizer_scripts.pytorch_exported_onnx_preprocess import torch_exported_onnx_flow
def main():
    # === Paths ===
    onnx_path = r'C:\Users\rd_de\kneronstdc\work_dirs\meconfig8\latest.onnx'
    optimized_path = r'C:\Users\rd_de\kneronstdc\work_dirs\meconfig8\latest_optimized.onnx'
    if not os.path.exists(onnx_path):
        print(f'❌ ONNX file not found: {onnx_path}')
        return
    # === Load the ONNX model ===
    print(f'🔄 Loading ONNX: {onnx_path}')
    m = onnx.load(onnx_path)
    # === Adjust ir_version (avoids errors with opset 11) ===
    if m.ir_version == 7:
        print('⚠️ Adjusting ir_version 7 → 6 (compatibility fix)')
        m.ir_version = 6
    # === Run the Kneron optimization flow ===
    print('⚙️ Running the Kneron optimization flow...')
    try:
        m = torch_exported_onnx_flow(m, disable_fuse_bn=False)
    except Exception as e:
        print(f'❌ Optimization failed: {type(e).__name__}: {e}')
        return
    # === Save the result ===
    os.makedirs(os.path.dirname(optimized_path), exist_ok=True)
    onnx.save(m, optimized_path)
    print(f'✅ Optimized ONNX saved to: {optimized_path}')
if __name__ == '__main__':
    main()


@ -328,6 +328,15 @@ def topological_sort(g):
            if in_degree[node_name] == 0:
                to_add.append(node_name)
                del in_degree[node_name]
    # deal with initializers (weights/biases)
    for initializer in g.initializer:
        init_name = initializer.name
        for node_name in output_nodes[init_name]:
            if node_name in in_degree:
                in_degree[node_name] -= 1
                if in_degree[node_name] == 0:
                    to_add.append(node_name)
                    del in_degree[node_name]
    # main sort loop
    sorted_nodes = []
    while to_add:

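The added hunk gives initializer tensors the same in-degree treatment as graph inputs, so nodes whose remaining inputs are all weights become ready immediately. A self-contained sketch of that Kahn-style pattern, using plain dicts as hypothetical stand-ins for the ONNX graph structures (`consumers` plays the role of `output_nodes`):

```python
from collections import deque

# Tensor name -> nodes that consume it.
consumers = {"w": ["conv"], "x": ["conv"], "y": ["relu"]}
# Node -> number of input tensors it still waits on.
in_degree = {"conv": 2, "relu": 1}
# Tensor name -> node that produces it.
producers = {"y": "conv"}
ready = deque()

# Graph inputs AND initializers both reduce in-degrees up front;
# here "x" is a graph input and "w" plays the role of an initializer.
for tensor in ["x", "w"]:
    for node in consumers.get(tensor, []):
        in_degree[node] -= 1
        if in_degree[node] == 0:
            ready.append(node)
            del in_degree[node]

# Main sort loop: pop ready nodes and release their outputs.
order = []
while ready:
    node = ready.popleft()
    order.append(node)
    for tensor, prod in producers.items():
        if prod != node:
            continue
        for consumer in consumers.get(tensor, []):
            in_degree[consumer] -= 1
            if in_degree[consumer] == 0:
                ready.append(consumer)
                del in_degree[consumer]

print(order)  # ['conv', 'relu']
```

Without the initializer pass, `conv` would keep in-degree 1 forever and the sort would silently drop it — which is the bug the hunk fixes.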

@ -0,0 +1,242 @@
# All modification made by Kneron Corp.: Copyright (c) 2022 Kneron Corp.
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
import warnings
import os
import onnx
import mmcv
import numpy as np
import onnxruntime as rt
import torch
from mmcv import DictAction
from mmcv.onnx import register_extra_symbolics
from mmcv.runner import load_checkpoint
from torch import nn
from mmseg.apis import show_result_pyplot
from mmseg.apis.inference import LoadImage
from mmseg.datasets.pipelines import Compose
from mmseg.models import build_segmentor
from optimizer_scripts.tools import other
from optimizer_scripts.pytorch_exported_onnx_preprocess import torch_exported_onnx_flow
torch.manual_seed(3)
def _parse_normalize_cfg(test_pipeline):
transforms = None
for pipeline in test_pipeline:
if 'transforms' in pipeline:
transforms = pipeline['transforms']
break
assert transforms is not None, 'Failed to find `transforms`'
norm_config_li = [_ for _ in transforms if _['type'] == 'Normalize']
assert len(norm_config_li) == 1, '`norm_config` should only have one'
return norm_config_li[0]
def _convert_batchnorm(module):
module_output = module
if isinstance(module, torch.nn.SyncBatchNorm):
module_output = torch.nn.BatchNorm2d(
module.num_features, module.eps,
module.momentum, module.affine, module.track_running_stats)
if module.affine:
module_output.weight.data = module.weight.data.clone().detach()
module_output.bias.data = module.bias.data.clone().detach()
module_output.weight.requires_grad = module.weight.requires_grad
module_output.bias.requires_grad = module.bias.requires_grad
module_output.running_mean = module.running_mean
module_output.running_var = module.running_var
module_output.num_batches_tracked = module.num_batches_tracked
for name, child in module.named_children():
module_output.add_module(name, _convert_batchnorm(child))
del module
return module_output
def _demo_mm_inputs(input_shape):
(N, C, H, W) = input_shape
rng = np.random.RandomState(0)
img = torch.FloatTensor(rng.rand(*input_shape))
return img
def _prepare_input_img(img_path, test_pipeline, shape=None):
if shape is not None:
test_pipeline[1]['img_scale'] = (shape[1], shape[0])
test_pipeline[1]['transforms'][0]['keep_ratio'] = False
test_pipeline = [LoadImage()] + test_pipeline[1:]
test_pipeline = Compose(test_pipeline)
data = dict(img=img_path)
data = test_pipeline(data)
img = torch.FloatTensor(data['img']).unsqueeze_(0)
return img
def pytorch2onnx(model, img, norm_cfg=None, opset_version=13, show=False, output_file='tmp.onnx', verify=False):
model.cpu().eval()
if isinstance(model.decode_head, nn.ModuleList):
num_classes = model.decode_head[-1].num_classes
else:
num_classes = model.decode_head.num_classes
model.forward = model.forward_dummy
origin_forward = model.forward
register_extra_symbolics(opset_version)
with torch.no_grad():
torch.onnx.export(
model, img, output_file,
input_names=['input'],
output_names=['output'],
export_params=True,
keep_initializers_as_inputs=False,
verbose=show,
opset_version=opset_version,
dynamic_axes=None)
print(f'Successfully exported ONNX model: {output_file} (opset_version={opset_version})')
model.forward = origin_forward
# NOTE: optimize onnx
m = onnx.load(output_file)
if opset_version == 11:
m.ir_version = 6
m = torch_exported_onnx_flow(m, disable_fuse_bn=False)
onnx.save(m, output_file)
print(f'{output_file} optimized by KNERON successfully.')
if verify:
onnx_model = onnx.load(output_file)
onnx.checker.check_model(onnx_model)
with torch.no_grad():
pytorch_result = model(img).numpy()
input_all = [node.name for node in onnx_model.graph.input]
input_initializer = [node.name for node in onnx_model.graph.initializer]
net_feed_input = list(set(input_all) - set(input_initializer))
assert len(net_feed_input) == 1
sess = rt.InferenceSession(output_file, providers=['CPUExecutionProvider'])
onnx_result = sess.run(None, {net_feed_input[0]: img.detach().numpy()})[0]
if show:
import cv2
img_show = img[0][:3, ...].permute(1, 2, 0) * 255
img_show = img_show.detach().numpy().astype(np.uint8)
ori_shape = img_show.shape[:2]
onnx_result_ = onnx_result[0].argmax(0)
onnx_result_ = cv2.resize(onnx_result_.astype(np.uint8), (ori_shape[1], ori_shape[0]))
show_result_pyplot(model, img_show, (onnx_result_, ), palette=model.PALETTE,
block=False, title='ONNXRuntime', opacity=0.5)
pytorch_result_ = pytorch_result.squeeze().argmax(0)
pytorch_result_ = cv2.resize(pytorch_result_.astype(np.uint8), (ori_shape[1], ori_shape[0]))
show_result_pyplot(model, img_show, (pytorch_result_, ), title='PyTorch',
palette=model.PALETTE, opacity=0.5)
np.testing.assert_allclose(
pytorch_result.astype(np.float32) / num_classes,
onnx_result.astype(np.float32) / num_classes,
rtol=1e-5,
atol=1e-5,
err_msg='The outputs are different between Pytorch and ONNX')
print('The outputs are same between Pytorch and ONNX.')
if norm_cfg is not None:
print("Prepending BatchNorm layer to ONNX as data normalization...")
mean = norm_cfg['mean']
std = norm_cfg['std']
i_n = m.graph.input[0]
if (i_n.type.tensor_type.shape.dim[1].dim_value != len(mean) or
i_n.type.tensor_type.shape.dim[1].dim_value != len(std)):
raise ValueError(f"--pixel-bias-value ({mean}) and --pixel-scale-value ({std}) should match input dimension.")
norm_bn_bias = [-1 * cm / cs + 128. / cs for cm, cs in zip(mean, std)]
norm_bn_scale = [1 / cs for cs in std]
other.add_bias_scale_bn_after(m.graph, i_n.name, norm_bn_bias, norm_bn_scale)
m = other.polish_model(m)
bn_outf = os.path.splitext(output_file)[0] + "_bn_prepended.onnx"
onnx.save(m, bn_outf)
print(f"BN-Prepended ONNX saved to {bn_outf}")
return
def parse_args():
parser = argparse.ArgumentParser(description='Convert MMSeg to ONNX')
parser.add_argument('config', help='test config file path')
parser.add_argument('--checkpoint', help='checkpoint file', default=None)
parser.add_argument('--input-img', type=str, help='Images for input', default=None)
parser.add_argument('--show', action='store_true', help='show onnx graph and segmentation results')
parser.add_argument('--verify', action='store_true', help='verify the onnx model')
parser.add_argument('--output-file', type=str, default='tmp.onnx')
parser.add_argument('--opset-version', type=int, default=13) # default opset=13
parser.add_argument('--shape', type=int, nargs='+', default=None, help='input image height and width.')
parser.add_argument('--cfg-options', nargs='+', action=DictAction, help='Override config options.')
parser.add_argument('--normalization-in-onnx', action='store_true', help='Prepend BN for normalization.')
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
if args.opset_version < 11:
raise ValueError(f"Only opset_version >=11 is supported (got {args.opset_version}).")
cfg = mmcv.Config.fromfile(args.config)
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
cfg.model.pretrained = None
test_mode = cfg.model.test_cfg.mode
if args.shape is None:
if test_mode == 'slide':
crop_size = cfg.model.test_cfg['crop_size']
input_shape = (1, 3, crop_size[1], crop_size[0])
else:
img_scale = cfg.test_pipeline[1]['img_scale']
input_shape = (1, 3, img_scale[1], img_scale[0])
else:
if test_mode == 'slide':
warnings.warn("Shape assignment for slide-mode models may cause unexpected results.")
if len(args.shape) == 1:
input_shape = (1, 3, args.shape[0], args.shape[0])
elif len(args.shape) == 2:
input_shape = (1, 3) + tuple(args.shape)
else:
raise ValueError('Invalid input shape')
cfg.model.train_cfg = None
segmentor = build_segmentor(cfg.model, train_cfg=None, test_cfg=cfg.get('test_cfg'))
segmentor = _convert_batchnorm(segmentor)
if args.checkpoint:
checkpoint = load_checkpoint(segmentor, args.checkpoint, map_location='cpu')
segmentor.CLASSES = checkpoint['meta']['CLASSES']
segmentor.PALETTE = checkpoint['meta']['PALETTE']
if args.input_img is not None:
preprocess_shape = (input_shape[2], input_shape[3])
img = _prepare_input_img(args.input_img, cfg.data.test.pipeline, shape=preprocess_shape)
else:
img = _demo_mm_inputs(input_shape)
if args.normalization_in_onnx:
norm_cfg = _parse_normalize_cfg(cfg.test_pipeline)
else:
norm_cfg = None
pytorch2onnx(
segmentor,
img,
norm_cfg=norm_cfg,
opset_version=args.opset_version,
show=args.show,
output_file=args.output_file,
verify=args.verify,
)
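The prepended BN above uses bias `-mean/std + 128/std` and scale `1/std`; assuming the device feeds the network pixel values shifted by 128 (i.e. `x - 128`), that single affine op reproduces the usual `(x - mean) / std` normalization. A quick numeric check of the identity, with the common mmseg ImageNet mean/std as example values:

```python
import numpy as np

mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])

# BN parameters exactly as built in pytorch2onnx above.
bn_bias = -mean / std + 128.0 / std
bn_scale = 1.0 / std

x = np.array([0.0, 128.0, 255.0])[:, None]  # raw pixel values, one row per value
shifted = x - 128.0                          # assumed on-device shift
bn_out = bn_scale * shifted + bn_bias        # what the prepended BN computes
expected = (x - mean) / std                  # standard normalization

print(np.allclose(bn_out, expected))  # True
```

Algebraically, `(x - 128)/std + (-mean + 128)/std = (x - mean)/std`, so the two `128` terms cancel and the BN layer is an exact drop-in for the pipeline's `Normalize` step.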

161
tools/yolov5_preprocess.py Normal file

@ -0,0 +1,161 @@
# coding: utf-8
import torch
import cv2
import numpy as np
import math
import time
import kneron_preprocessing
kneron_preprocessing.API.set_default_as_520()
torch.backends.cudnn.deterministic = True
img_formats = ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.tiff', '.dng']
def make_divisible(x, divisor):
# Returns x evenly divisible by divisor
return math.ceil(x / divisor) * divisor
def check_img_size(img_size, s=32):
# Verify img_size is a multiple of stride s
new_size = make_divisible(img_size, int(s)) # ceil gs-multiple
if new_size != img_size:
print('WARNING: --img-size %g must be multiple of max stride %g, updating to %g' % (img_size, s, new_size))
return new_size
def letterbox_ori(img, new_shape=(640, 640), color=(0, 0, 0), auto=True, scaleFill=False, scaleup=True):
# Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
shape = img.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better test mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) # width, height
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
dw /= 2 # divide padding into 2 sides
dh /= 2
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
#img = kneron_preprocessing.API.resize(img,size=new_unpad, keep_ratio = False)
top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
# top, bottom = int(0), int(round(dh + 0.1))
# left, right = int(0), int(round(dw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
#img = kneron_preprocessing.API.pad(img, left, right, top, bottom, 0)
return img, ratio, (dw, dh)
def letterbox(img, new_shape=(640, 640), color=(0, 0, 0), auto=True, scaleFill=False, scaleup=True):
# Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
shape = img.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better test mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) # width, height
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
# dw /= 2 # divide padding into 2 sides
# dh /= 2
if shape[::-1] != new_unpad: # resize
#img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
img = kneron_preprocessing.API.resize(img,size=new_unpad, keep_ratio = False)
# top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
# left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
top, bottom = int(0), int(round(dh + 0.1))
left, right = int(0), int(round(dw + 0.1))
#img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
img = kneron_preprocessing.API.pad(img, left, right, top, bottom, 0)
return img, ratio, (dw, dh)
def letterbox_test(img, new_shape=(640, 640), color=(0, 0, 0), auto=True, scaleFill=False, scaleup=True):
ratio = 1.0, 1.0
dw, dh = 0, 0
img = kneron_preprocessing.API.resize(img, size=(480, 256), keep_ratio=False, type='bilinear')
return img, ratio, (dw, dh)
def LoadImages(path,img_size): #_rgb # for inference
if isinstance(path, str):
img0 = cv2.imread(path) # BGR
else:
img0 = path # BGR
# Padded resize
img = letterbox(img0, new_shape=img_size)[0]
# Convert
img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416
img = np.ascontiguousarray(img)
return img, img0
def LoadImages_yyy(path,img_size): #_yyy # for inference
if isinstance(path, str):
img0 = cv2.imread(path) # BGR
else:
img0 = path # BGR
yvu = cv2.cvtColor(img0, cv2.COLOR_BGR2YCrCb)
y, v, u = cv2.split(yvu)
img0 = np.stack((y,)*3, axis=-1)
# Padded resize
img = letterbox(img0, new_shape=img_size)[0]
# Convert
img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416
img = np.ascontiguousarray(img)
return img, img0
def LoadImages_yuv420(path,img_size): #_yuv420 # for inference
if isinstance(path, str):
img0 = cv2.imread(path) # BGR
else:
img0 = path # BGR
img_h, img_w = img0.shape[:2]
img_h = (img_h // 2) * 2
img_w = (img_w // 2) * 2
img = img0[:img_h,:img_w,:]
yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV_I420)
img0= cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR_I420) #yuv420
# Padded resize
img = letterbox(img0, new_shape=img_size)[0]
# Convert
img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416
img = np.ascontiguousarray(img)
return img, img0
def Yolov5_preprocess(image_path, device, imgsz_h, imgsz_w):
model_stride_max = 32
imgsz_h = check_img_size(imgsz_h, s=model_stride_max) # check img_size
imgsz_w = check_img_size(imgsz_w, s=model_stride_max) # check img_size
img, im0 = LoadImages(image_path, img_size=(imgsz_h,imgsz_w))
img = kneron_preprocessing.API.norm(img) #path1
#print('img',img.shape)
img = torch.from_numpy(img).to(device) #path1,path2
# img = img.float() # uint8 to fp16/32 #path2
# img /= 255.0#256.0 - 0.5 # 0 - 255 to -0.5 - 0.5 #path2
if img.ndimension() == 3:
img = img.unsqueeze(0)
return img, im0
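`check_img_size` relies on `make_divisible` to round a requested size up to the next multiple of the model's maximum stride (32 here). For example:

```python
import math

def make_divisible(x, divisor):
    # Round x up to the nearest multiple of divisor.
    return math.ceil(x / divisor) * divisor

# 362 is not a multiple of 32, so check_img_size would bump it to 384;
# 724 becomes 736; sizes already divisible by 32 pass through unchanged.
print(make_divisible(362, 32))  # 384
print(make_divisible(724, 32))  # 736
print(make_divisible(640, 32))  # 640
```

This is why a 724x362 request is only stride-safe after rounding — the letterbox padding then fills the difference between the requested and rounded shapes.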

57
使用手冊.txt Normal file

@ -0,0 +1,57 @@
Environment setup:
# Create and activate the conda environment
conda create -n stdc_golface python=3.8 -y
conda activate stdc_golface
# Install PyTorch with the matching CUDA 11.3 builds
conda install pytorch=1.11.0 torchvision=0.12.0 torchaudio cudatoolkit=11.3 -c pytorch -y
# Install the matching mmcv-full build
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
# Install the kneronstdc project
cd kneronstdc
pip install -e .
# Install common utility packages
pip install opencv-python tqdm matplotlib cityscapesscripts
# Install the yapf formatter (pinned version)
pip install yapf==0.31.0
--------------------------------------------------------------------------------------
Data:
When exporting the dataset from Roboflow, choose the format:
Semantic Segmentation Masks
Use the seg2city.py script to convert the Roboflow format into the Cityscapes format
(the Cityscapes sample data can be used as a reference).
Place the converted data under the data/cityscapes folder
(cityscapes is the default dataset name for training).
--------------------------------------------------------------------------------------
Training:
Activate the freshly created env, open cmd, and cd into the kneronstdc directory.
Training command:
python tools/train.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py
Test command:
python tools/test.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth --show-dir work_dirs/vis_results
------------------------------------------------------------------------------------
Mount the project folder into the toolchain container:
docker run --rm -it -v $(wslpath -u 'C:\Users\rd_de\kneronstdc'):/workspace/kneronstdc kneron/toolchain:latest
ONNX conversion command:
python tools/pytorch2onnx_kneron.py configs/stdc/kn_stdc1_in1k-pre_512x1024_80k_cityscapes.py --checkpoint work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.pth --output-file work_dirs/kn_stdc1_in1k-pre_512x1024_80k_cityscapes/latest.onnx --verify
Copy the NEF out of the container to the host:
docker cp f78594411e1b:/data1/kneron_flow/models_630.nef "C:\Users\rd_de\kneronstdc\work_dirs\nef\models_630.nef"
---------------------------------------------------------------------------------------
Inside the container, OpenCV also needs libGL:
pip install opencv-python
RUN apt update && apt install -y libgl1