update files and add new datasets

charlie880624 2026-03-18 17:44:11 +08:00
parent 048860986f
commit 65a56ec6a6
9270 changed files with 612087 additions and 252 deletions

README.md

@ -1,270 +1,84 @@

**Removed:**

<h1 align="center"> Object Detection </h1>

Object Detection task with the YOLOv5 model.
This document contains the explanations of the arguments of each script.

You can find the tutorial document for finetuning a pretrained model on the COCO128 dataset under the `tutorial` folder, `tutorial/README.md`.

The ipython notebook tutorial is also prepared under the `tutorial` folder as `tutorial/tutorial.ipynb`. You may upload and run this ipython notebook on Google Colab.

# Prerequisites
- Python 3.8 or above

# Installation
```bash
$ pip install -U pip
$ pip install -r requirements.txt
```

# Dataset & Preparation
The image data, annotations and dataset.yaml are required.

## MS COCO
Our training script accepts the MS COCO dataset. You may download the dataset using the following link:
- Download [2017 MS COCO Dataset](https://cocodataset.org/#download)

## Custom Datasets
You can also train the model on a custom dataset.

### Annotations Format
After using a tool like [CVAT](https://github.com/openvinotoolkit/cvat), [makesense.ai](https://www.makesense.ai) or [Labelbox](https://labelbox.com) to label your images, export your labels to YOLO format, with one `*.txt` file per image (if there are no objects in an image, no `*.txt` file is required). The `*.txt` file specifications are:
- One row per object
- Each row is in `class x_center y_center width height` format.
- Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide `x_center` and `width` by the image width, and `y_center` and `height` by the image height.
- Class numbers are zero-indexed (start from 0).

<div align="center">
<img src="./tutorial/screenshots/readme_img.jpg" width="50%" />
</div>

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):

<div align="center">
<img src="./tutorial/screenshots/readme_img2.png" width="40%" />
</div>

### Directory Organization
Your own datasets are expected to have the following structure. We assume `/dataset` is next to the `/yolov5` directory. YOLOv5 locates labels automatically for each image by replacing the last instance of `/images/` in each image path with `/labels/`.
```bash
- Dataset name
-- images
--- train
---- img001.jpg
---- ...
--- val
---- img002.jpg
---- ...
-- labels
--- train
---- img001.txt
---- ...
--- val
---- img002.txt
---- ...
- yolov5
- generate_npy
- exporting
```

### dataset.yaml
The yaml file for the COCO dataset has been prepared in `./data/coco.yaml`. For a custom dataset, you need to prepare the yaml file and save it under `./data/`. The yaml file is expected to have the following format:
```yaml
# train and val datasets (image directory or *.txt file with image paths)
train: ./datasets/images/train/
val: ./datasets/images/val/

# number of classes
nc: number of classes

# class names
names: list of class names
```

# Train
For training on MS COCO, execute commands in the folder `yolov5`:
```shell
CUDA_VISIBLE_DEVICES='0' python train.py --data coco.yaml --cfg yolov5s-noupsample.yaml --weights '' --batch-size 64
```
`CUDA_VISIBLE_DEVICES='0'` specifies the GPU ids.
`--data` the yaml file. (located under `./data/`)
`--cfg` the model configuration. (located under `./model/`) (`yolov5s-noupsample.yaml` for 520, `yolov5s.yaml` for 720)
`--hyp` the path to the hyperparameters file. (located under `./data/`)
`--weights` the path to pretrained model weights. ('' if training from scratch)
`--epochs` the number of epochs to train. (Default: 300)
`--batch-size` batch size. (Default: 16)
`--img-size` the input size of the model. (Default: (640, 640))
`--workers` the maximum number of dataloader workers. (Default: 8)

By default, the trained models are saved under `./runs/train/`.

## Generating .npy for different model input
We can generate `.npy` files for different model inputs using `yolov5_generate_npy.py`. Execute commands in the folder `generate_npy`:
```shell
python yolov5_generate_npy.py --input-h 640 --input-w 640
```
`--input-h` the input height. (Default: 640)
`--input-w` the input width. (Default: 640)

This produces the `*.npy` files.

# Configure the paths yaml file
You are expected to create a yaml file which stores all the paths related to the trained models. This yaml file will be used in the following sections. You can check and modify the `pretrained_paths_520.yaml` and `pretrained_paths_720.yaml` under `/yolov5/data/`. The yaml file is expected to contain the following information:
```yaml
grid_dir: path_to_npy_file_directory
grid20_path: path_to_grid20_npy_file
grid40_path: path_to_grid40_npy_file
grid80_path: path_to_grid80_npy_file
yolov5_dir: path_to_yolov5_directory
path: path_to_pretrained_yolov5_model_weights_pt_file
yaml_path: path_to_the_model_configuration_yaml_file
pt_path: path_to_export_yolov5_model_weights_kneron_supported_file
onnx_export_file: path_to_export_yolov5_onnx_model_file
input_w: model_input_width
input_h: model_input_height
nc: number_of_classes
names: list_of_class_names
```

# Save and Convert to ONNX
This section introduces how to save the trained model in a PyTorch 1.4-supported format and convert it to ONNX.

## Exporting the ONNX model in the PyTorch 1.7 environment
We can convert the model to ONNX using `yolov5_export.py`. Execute commands in the folder `yolov5`:
```shell
python ../exporting/yolov5_export.py --data path_to_pretrained_path_yaml_file
```
`--data` the path to the pretrained model paths yaml file. (Default: ../yolov5/data/pretrained_paths_520.yaml)

This produces the ONNX model.

## Converting ONNX with the toolchain
Pull the latest [ONNX converter](https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts) from GitHub. You may read the latest document on GitHub for converting the ONNX model. Execute commands in the folder `ONNX_Convertor/optimizer_scripts`:
(reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts)
```shell
python -m onnxsim input_onnx_model output_onnx_model
python pytorch2onnx.py input.pth output.onnx
```
This produces the converted ONNX model.

# Inference
Before model inference, we assume that the model has been converted to an ONNX model as in the previous section (even if you only run inference with the .pth model). Create a yaml file containing the path information. For model inference on a single image, execute commands in the folder `yolov5`:
```shell
python inference.py --data path_to_pretrained_path_yaml_file --img-path path_to_image --save-path path_to_saved_image
```
`--img-path` the path to the image.
`--save-path` the path to draw and save the image with bounding boxes.
`--data` the path to the pretrained model paths yaml file. (Default: data/pretrained_paths_520.yaml)
`--conf_thres` the score threshold of bounding boxes. (Default: 0.3)
`--iou_thres` the IoU threshold for NMS. (Default: 0.3)
`--onnx` whether to run ONNX model inference.

You can find the preprocessing and postprocessing steps under the folder `exporting/yolov5/`.

# Evaluation
## Evaluation Metric
We will use mean Average Precision (mAP) for evaluation. You can find the script for computing mAP in `test.py`.
`mAP`: mAP is the average of Average Precision (AP). AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:
<img src="https://latex.codecogs.com/svg.image?AP&space;=&space;\sum_n&space;(R_n-R_{n-1})P_n&space;" title="AP = \sum_n (R_n-R_{n-1})P_n " />
where <img src="https://latex.codecogs.com/svg.image?P_n" title="P_n" /> and <img src="https://latex.codecogs.com/svg.image?R_n" title="R_n" /> are the precision and recall at the n-th threshold. The mAP compares the ground-truth bounding box to the detected box and returns a score. The higher the score, the more accurate the model's detections are.

## Evaluation on a Dataset
For evaluating the trained model on a dataset:
```shell
python test.py --weights path_to_pth_model_weight --data path_to_data_yaml_file
```
`--weights` the path to the pretrained model weight. (Default: best.pt)
`--data` the path to the data yaml file. (Default: data/coco128.yaml)
`--img-size` input shape of the model. (Default: (640, 640))
`--conf-thres` object confidence threshold. (Default: 0.001)
`--device` CUDA device, i.e. 0 or 0,1,2,3 or cpu. (Default: cpu)
`--verbose` whether to report mAP by class.

## End-to-End Evaluation
If you would like to perform an end-to-end test with an image dataset, you can use `inference_e2e.py` under the directory `yolov5` to obtain the prediction results.
You have to prepare an initial parameter yaml file for the inference runner. You may check `utils/init_params.yaml` for the format.
```shell
python inference_e2e.py --img-path path_to_dataset_folder --params path_to_init_params_file --save-path path_to_save_json_file
```
`--img-path` path to the dataset directory
`--params` path to the initial parameter yaml file for the inference runner
`--save-path` path to save the predictions to a json file
`--gpu` GPU id. (-1 if cpu) (Default: -1)

The predictions will be saved into a json file that has the following structure:
```bash
[
  {'img_path': image_path_1,
   'bbox': [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id]]
  },
  {'img_path': image_path_2,
   'bbox': [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id]]
  },
  ...
]
```

# Model
Backbone | Input Size | FPS on 520 | FPS on 720 | Model Size | mAP
--- | --- |:---:|:---:|:---:|:---:
[YOLOv5s (no upsample)](https://github.com/kneron/Model_Zoo/tree/main/detection/yolov5/yolov5s-noupsample) | 640x640 | 4.91429 | - | 13.1M | 40.4%
[YOLOv5s (with upsample)](https://github.com/kneron/Model_Zoo/tree/main/detection/yolov5/yolov5s) | 640x640 | - | 24.4114 | 14.6M | 50.9%

[YOLOv5s (no upsample)](https://github.com/kneron/Model_Zoo/tree/main/detection/yolov5/yolov5s-noupsample) is the yolov5s model backbone without the upsampling operation, since the 520 hardware does not support upsampling.

**Added:**

# YOLOv5 Training and Deployment Workflow

## Environment Setup
Work from CMD. Create a conda environment that can run YOLOv5.

## Dataset Preparation
1. Download the dataset in **YOLOv8 format** from Roboflow and place it in the project directory (e.g. under `data/`).
2. Edit the dataset's `data.yaml` and adjust the paths to the following format:
```yaml
path: C:/Users/rd_de/yolov5git/data/your-dataset
train: train/images
val: valid/images
test: test/images
nc: 3  # number of classes
names: ['class1', 'class2', 'class3']
```

## Training the Model
`cd` into the `yolov5/` directory, then run:
```bash
python train.py \
  --data C:/Users/rd_de/yolov5git/data/10-02+10-01+10-038class/data.yaml \
  --weights for720best.pt \
  --img 640 \
  --batch-size 8 \
  --epochs 300 \
  --device 0
```
After training completes, the results and weight files are located at:
```
runs/train/expX/weights/best.pt
```

## Inference Testing
```bash
python detect.py \
  --weights runs/train/exp9/weights/best.pt \
  --source test14data/test/images \
  --img 640 \
  --conf 0.25 \
  --device 0
```

## Converting to ONNX
```bash
python exporting/yolov5_export.py --data data/mepretrained_paths_720.yaml
```

Simplify the ONNX model:
```bash
python -m onnxsim \
  runs/train/exp24/weights/best.onnx \
  runs/train/exp24/weights/best_simplified.onnx
```

## Kneron Toolchain Docker
Start the Kneron Toolchain container (run inside WSL):
```bash
docker run --rm -it \
  -v $(wslpath -u 'C:\Users\rd_de\golfaceyolov5\yolov5'):/workspace/yolov5 \
  kneron/toolchain:latest
```
Copy the compiled `.nef` model from the container to the host:
```bash
docker cp <container_id>:/data1/kneron_flow/runs/train/exp6/weights/models_630.nef \
  C:\Users\rd_de\golfaceyolov5\yolov5\runs\train\exp6\weights
```

ai_training/.gitmodules vendored Normal file

@ -0,0 +1,7 @@
[submodule "evaluation/kneron_eval/utils/kneron_globalconstant"]
path = evaluation/kneron_eval/utils/kneron_globalconstant
url = git@59.125.118.185:jenna/kneron_globalconstant.git
[submodule "mmdetection"]
path = mmdetection
url = git@59.125.118.185:eric_wu/mmdetection.git

ai_training/README.md Normal file


@ -0,0 +1 @@
d840d94c7201cd6a7596bb8f5dc54d7866cd16c3


@ -0,0 +1,234 @@
<h1 align="center"> Image Classification </h1>
The tutorial explores the basis of image classification task. This document contains the explanations of arguments of each script.
You can find the tutorial for finetuning a pretrained model on custom dataset under the `tutorial` folder, `tutorial/README.md`.
The ipython notebook tutorial is also prepared under the `tutorial` folder as `tutorial/tutorial.ipynb`. You may upload and run this ipython notebook on Google colab.
Image Classification is a fundamental task that attempts to classify an image by assigning it a specific label. Our AI training platform provides the training script for training an image classification model.
# Prerequisites
First, install the required libraries. Python 3.6 or above is required; the remaining dependencies are listed in the `requirements.txt` file. Install them by running:
```
pip install -r requirements.txt
```
# Dataset & Preparation
Next, we need a dataset for the training model.
## Custom Datasets
You can train the model on a custom dataset. Your own datasets are expected to have the following structure:
```shell
- Dataset name
-- train
--- Class1
--- Class2
-- val
--- Class1
--- Class2
```
## Example
Let's go through a toy example for preparing a custom dataset. Suppose we are going to classify bees and ants.
<div align="center">
<img src="./image_data/train/ants/0013035.jpg" width="33%" /> <img src="./image_data/train/bees/1092977343_cb42b38d62.jpg" width="33%" />
</div>
First of all, we have to split the images of bees and ants into train and validation sets respectively (an 8:2 split is recommended). Then, we can move the images into different folders named by their classes. The dataset folder will have the following structure.
```shell
- image data
-- train
--- ants
--- bees
-- val
--- ants
--- bees
```
Now, we have finished preparing the dataset.
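The recommended 8:2 split above can be sketched with the standard library; `split_dataset` and the generated file names are illustrative, not part of the repo:

```python
import random

def split_dataset(image_paths, val_ratio=0.2, seed=0):
    """Shuffle image paths and split them into train/val lists (recommend 8:2)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a reproducible split
    n_val = int(len(paths) * val_ratio)
    return paths[n_val:], paths[:n_val]  # (train, val)

# A hypothetical list of 100 ant images:
ants = [f"ants_{i:04d}.jpg" for i in range(100)]
train, val = split_dataset(ants)
print(len(train), len(val))  # 80 20
```

The resulting lists can then be copied into the `train/ants` and `val/ants` folders shown above.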
# Train
Let's look at how to train or finetune a model. There are several backbone models and arguments to choose from. You can find the FPS results of these backbone models evaluated on 520 and 720 in the next section.
For training on a custom dataset, run:
```shell
python train.py --gpu -1 --backbone backbone_name --model-def-path path_to_model_definition_folder --snapshot path_to_pretrained_model_weights path_to_dataset_folder
```
`--gpu` which gpu to run. (-1 if cpu)
`--workers` the number of dataloader workers. (Default: 1)
`--backbone` which backbone model to use. Options: see [Models](#models).
`--freeze-backbone` whether to freeze the backbone when a pretrained model is used. (Default: 0)
`--early-stop` whether to stop early when the validation metric stops improving. (Default: 1)
`--patience` patience for early stopping. (Default: 7)
`--model-name` name of your model.
`--lr` learning rate. (Default: 1e-3)
`--model-def-path` path to pretrained model definition folder. (Default: './models/')
`--snapshot` path to the pretrained model. (Default: None)
`--epochs` number of epochs to train. (Default: 100)
`--batch-size` size of the batches. (Default: 64)
`--snapshot-path` path to store snapshots of models during training. (Default: 'snapshots/{}'.format(today))
`--optimizer` optimizer for training. Options: SGD, ASGD, ADAM. (Default: SGD)
`--loss` loss function. Options: cross_entropy. (Default: cross_entropy)
# Converting to ONNX
You may check the [Toolchain manual](http://doc.kneron.com/docs/#toolchain/manual/) for converting a PyTorch model to an ONNX model. Let's go through an example of converting the FP_classifier PyTorch model to an ONNX model.
Execute commands in the folder `classification`:
```shell
python pytorch2onnx.py --backbone backbone_name --num_classes the_number_of_classes --snapshot pytorch_model_path --save-path onnx_model_path
```
`--save-path` path to save the onnx model.
`--backbone` which backbone model to use. Options: see [Models](#models).
`--num_classes` the number of classes.
`--model-def-path` path to pretrained model definition
`--snapshot` path to the pretrained model.
This converts the PyTorch model to an ONNX model.
Then, execute commands in the folder `ONNX_Convertor/optimizer_scripts`:
(reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts)
```shell
python pytorch2onnx.py onnx_model_path onnx_model_convert_path
```
This produces the converted ONNX model.
# Inference
In this section, we will go through using a trained network for inference. That is, we will use `inference.py`, which takes an image and predicts its class label. `inference.py` returns the top-K most likely classes along with their probabilities.
For inference on an image, run:
```shell
python inference.py --gpu -1 --backbone backbone_name --model-def-path path_to_model_definition_folder --snapshot path_to_pretrained_model_weights --img-path path_to_image
```
`--gpu` which gpu to run. (-1 if cpu)
`--backbone` which backbone model to use. Options: see [Models](#models).
`--model-def-path` path to pretrained model definition folder. (Default: './models/')
`--snapshot` path to the pretrained model. (Default: None)
`--img-path` Path to the image.
`--class_id_path` path to the class id mapping file. (Default: './eval_utils/class_id.json')
`--save-path` path to save the classification result. (Default: 'inference_result.json')
`--onnx` whether to run inference with the ONNX model.
You could find preprocessing and postprocessing processes in `inference.py`.
# Evaluation
## Evaluation Metric
We will consider `top-K score`, `precision`, `recall` and `F1 score` for evaluating our model. You can find the script for computing these metrics in `eval_utils/eval.py`.
`top-K score`: This metric computes the number of times the correct label is among the top k labels predicted (ranked by predicted scores). Note that the multilabel case isn't covered here.
`precision`: The precision is the ratio `tp / (tp + fp)` where `tp` is the number of true positives and `fp` the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0.
`recall`: The recall is the ratio `tp / (tp + fn)` where `tp` is the number of true positives and `fn` the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples. The best value is 1 and the worst value is 0.
`F1 score`: The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The relative contribution of precision and recall to the F1 score are equal. The formula for the F1 score is:
`F1 = 2 * (precision * recall) / (precision + recall)`.
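The three formulas above can be written out directly from raw true/false positive and false negative counts; the counts in this sketch are invented for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positive (tp),
    false positive (fp) and false negative (fn) counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))
```

This mirrors what `eval_utils/eval.py` obtains per class from scikit-learn's `precision_score`, `recall_score` and `f1_score`.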
## Evaluation on a dataset
In this section, we will go through evaluating a trained network on a dataset. Here, we are going to evaluate a pretrained model on the validation set of the custom dataset. The `./eval_utils/eval.py` will report the top-K score, precision, recall and F1 score for the model evaluated on a testing dataset. The evaluation statistics will be saved to `eval_results.txt`.
```shell
python eval_utils/eval.py --gpu -1 --backbone backbone_name --snapshot path_to_pretrained_model_weights --model-def-path path_to_model_definition_folder --data-dir path_to_dataset_folder
```
`--gpu` which gpu to run. (-1 if cpu)
`--backbone` which backbone model to use. Options: see [Models](#models).
`--model-def-path` path to pretrained model definition folder. (Default: './models/')
`--snapshot` path to the pretrained model weight. (Default: None)
`--data-dir` path to dataset folder. (Default: None)
## End-to-End Evaluation
For end-to-end testing, we expect that the prediction results are saved into json files, one json file for one image, with the following format:
```bash
{"img_path": image_path,
"0_0":[[score, label], [score, label], ...]
}
```
The prediction json files for all images are expected to be saved under the same folder. The ground truth json file is expected to have the following format:
```bash
{image1_path: label,
image2_path: label,
...
}
```
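Given predictions and ground truth in the two formats above, top-1 accuracy can be computed as a small sketch (this is an illustration of the data layout, not the repo's evaluation code; the paths and labels are invented):

```python
def top1_accuracy(preds, gts):
    """preds: list of {'img_path': ..., '0_0': [[score, label], ...]} dicts.
    gts: {img_path: label}. Returns the fraction of images whose
    highest-scoring predicted label matches the ground truth."""
    correct = 0
    for p in preds:
        best_label = max(p["0_0"], key=lambda sl: sl[0])[1]  # label with top score
        correct += best_label == gts[p["img_path"]]
    return correct / len(preds)

# Two hypothetical images: the first is classified correctly, the second is not.
preds = [
    {"img_path": "img1.jpg", "0_0": [[0.9, 1], [0.1, 0]]},
    {"img_path": "img2.jpg", "0_0": [[0.6, 0], [0.4, 1]]},
]
gts = {"img1.jpg": 1, "img2.jpg": 1}
print(top1_accuracy(preds, gts))  # 0.5
```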
To compute the evaluation statistics, execute commands in the folder `classification`:
```shell
python eval_utils/eval.py --preds path_to_predicted_results --gts path_to_ground_truth
```
`--preds` path to predicted results. (e2e eval)
`--gts` path to ground truth. (e2e eval)
The evaluation statistics will be saved to `eval_results.txt`.
# Models
Model | Input Size | FPS on 520 | FPS on 720 | Model Size
--- | :---: |:---:|:---:|:---:
[FP_classifier](https://github.com/kneron/Model_Zoo/tree/main/classification/FP_classifier)| 56x32 | 323.471 | 3370.47 | 5.1M
[mobilenetv2](https://github.com/kneron/Model_Zoo/tree/main/classification/MobileNetV2)| 224x224 | 58.9418 | 620.677 | 14M
[resnet18](https://github.com/kneron/Model_Zoo/tree/main/classification/ResNet18)| 224x224 | 20.4376 | 141.371 | 46.9M
[resnet50](https://github.com/kneron/Model_Zoo/tree/main/classification/ResNet50)| 224x224 | 6.32576 | 49.0828 | 102.9M
efficientnet-b0| 224x224 | 42.3118 | 157.482 | 18.6M
efficientnet-b1| 224x224 | 28.0051 | 110.907 | 26.7M
efficientnet-b2| 224x224 | 24.164 | 101.598 | 31.1M
efficientnet-b3| 224x224 | 18.4925 | 71.9006 | 41.4M
efficientnet-b4| 224x224 | 12.1506 | 52.3374 | 64.7M
efficientnet-b5| 224x224 | 7.7483 | 35.4869 | 100.7M
efficientnet-b6| 224x224 | 4.96453 | 26.5797 | 141.9M
efficientnet-b7| 224x224 | 3.35853 | 17.9795 | 217.4M
Note that for EfficientNet, Squeeze-and-Excitation layers are removed and Swish function is replaced by ReLU.
FP_classifier is a pretrained model for classifying person and background images. The class id label mapping file is saved as `./eval_utils/person_class_id.json`.
\ | FP_classifier | mobilenetv2 | resnet18 | resnet50
--- | :---: | :---: | :---: | :---:
Rank 1 | 94.13% | 69.82% | 66.46% | 72.80%
Rank 5 | - | 89.29% | 87.09% | 90.91%
Resnet50 is currently under training for Kneron preprocessing.

View File

@ -0,0 +1,56 @@
import numpy as np
import torch
import os
class EarlyStopping:
"""Early stops the training if validation loss doesn't improve after a given patience."""
def __init__(self, model_name = 'model_ft', patience=7, verbose=False, delta=0, path='./snapshots/'):
"""
Args:
patience (int): How long to wait after last time validation loss improved.
Default: 7
verbose (bool): If True, prints a message for each validation loss improvement.
Default: False
delta (float): Minimum change in the monitored quantity to qualify as an improvement.
Default: 0
path (str): Path for the checkpoint to be saved to.
                Default: './snapshots/'
"""
self.model_name = model_name
self.patience = patience
self.verbose = verbose
self.counter = 0
self.best_score = None
self.early_stop = False
        self.val_loss_min = np.inf
self.delta = delta
self.path = path
def __call__(self, val_loss, model, epoch_label):
score = -val_loss
if self.best_score is None:
self.best_score = score
self.save_checkpoint(val_loss, model, epoch_label)
elif score < self.best_score + self.delta:
self.counter += 1
print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
if self.counter >= self.patience:
self.early_stop = True
else:
self.best_score = score
self.save_checkpoint(val_loss, model, epoch_label)
self.counter = 0
def save_checkpoint(self, val_loss, model, epoch_label):
        '''Saves the model when validation loss decreases.'''
if self.verbose:
print(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}). Saving model ...')
save_filename = self.model_name + '_%s.pth'% epoch_label
save_path = os.path.join(self.path,save_filename)
if not os.path.isdir(self.path):
os.makedirs(self.path)
torch.save(model.state_dict(), save_path)
self.val_loss_min = val_loss
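The patience counter in `EarlyStopping` above can be illustrated standalone. This sketch mimics only the counter logic (checkpoint saving omitted) on a made-up validation-loss sequence; `epochs_until_stop` is a hypothetical helper, not part of the repo:

```python
def epochs_until_stop(val_losses, patience=7, delta=0):
    """Return the epoch index at which early stopping would trigger,
    or None if it never does. Follows the score = -val_loss convention."""
    best_score, counter = None, 0
    for epoch, loss in enumerate(val_losses):
        score = -loss
        if best_score is None or score >= best_score + delta:
            best_score, counter = score, 0  # improvement: reset the counter
        else:
            counter += 1  # no improvement this epoch
            if counter >= patience:
                return epoch
    return None

# Loss improves once, then plateaus; with patience=3, stopping triggers at epoch 4.
print(epochs_until_stop([1.0, 0.8, 0.9, 0.9, 0.9], patience=3))  # 4
```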


@ -0,0 +1,6 @@
top 1 accuracy: 1.0
Label Precision Recall F1 score
0 1.000 1.000 1.000
1 1.000 1.000 1.000
2 1.000 1.000 1.000


@ -0,0 +1 @@
{"0": "ants", "1": "bees"}


@ -0,0 +1,245 @@
import argparse
import os
import sys
import json
sys.path.append(os.getcwd())
import numpy as np
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, models, transforms
from load_model import initialize_model
from sklearn.metrics import f1_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
def accuracy(output, target, topk=(1,), e2e=False):
"""
Computes the accuracy over the k top predictions for the specified values of k
In top-5 accuracy you give yourself credit for having the right answer
if the right answer appears in your top five guesses.
ref:
- https://pytorch.org/docs/stable/generated/torch.topk.html
- https://discuss.pytorch.org/t/imagenet-example-accuracy-calculation/7840
- https://gist.github.com/weiaicunzai/2a5ae6eac6712c70bde0630f3e76b77b
- https://discuss.pytorch.org/t/top-k-error-calculation/48815/2
- https://stackoverflow.com/questions/59474987/how-to-get-top-k-accuracy-in-semantic-segmentation-using-pytorch
:param output: output is the prediction of the model e.g. scores, logits, raw y_pred before normalization or getting classes
:param target: target is the truth
:param topk: tuple of topk's to compute e.g. (1, 2, 5) computes top 1, top 2 and top 5.
    e.g. in top 2 it means you get a +1 if your model's top 2 predictions include the right label.
So if your model predicts cat, dog (0, 1) and the true label was bird (3) you get zero
but if it were either cat or dog you'd accumulate +1 for that example.
:return: list of topk accuracy [top1st, top2nd, ...] depending on your topk input
"""
with torch.no_grad():
# ---- get the topk most likely labels according to your model
# get the largest k \in [n_classes] (i.e. the number of most likely probabilities we will use)
        maxk = max(topk)  # max number of labels we will consider in the right choices for our model
batch_size = target.size(0)
        # get top maxk indices that correspond to the most likely probability scores
        # (note _ means we don't care about the actual top maxk scores, just their corresponding indices/labels)
if e2e:
y_pred = output
else:
_, y_pred = output.topk(k=maxk, dim=1) # _, [B, n_classes] -> [B, maxk]
y_pred = y_pred.t() # [B, maxk] -> [maxk, B] Expects input to be <= 2-D tensor and transposes dimensions 0 and 1.
# - get the credit for each example if the models predictions is in maxk values (main crux of code)
# for any example, the model will get credit if it's prediction matches the ground truth
# for each example we compare if the model's best prediction matches the truth. If yes we get an entry of 1.
# if the k'th top answer of the model matches the truth we get 1.
        # Note: for any example in the batch we can only ever get 1 match (so we never overestimate accuracy above 1)
        target_reshaped = target.view(1, -1).expand_as(y_pred)  # [B] -> [1, B] -> [maxk, B]
        # compare every topk's model prediction with the ground truth & give credit if any matches the ground truth
        correct = (y_pred == target_reshaped)  # [maxk, B] where for each example we know which topk prediction matched truth
# original: correct = pred.eq(target.view(1, -1).expand_as(pred))
# -- get topk accuracy
list_topk_accs = [] # idx is topk1, topk2, ... etc
for k in topk:
# get tensor of which topk answer was right
ind_which_topk_matched_truth = correct[:k] # [maxk, B] -> [k, B]
# flatten it to help compute if we got it correct for each example in batch
flattened_indicator_which_topk_matched_truth = ind_which_topk_matched_truth.reshape(-1).float() # [k, B] -> [kB]
# get if we got it right for any of our top k prediction for each example in batch
tot_correct_topk = flattened_indicator_which_topk_matched_truth.float().sum(dim=0, keepdim=True) # [kB] -> [1]
            # compute topk accuracy - the accuracy of the model's ability to get it right within its top k guesses/preds
topk_acc = tot_correct_topk / batch_size # topk accuracy for entire batch
list_topk_accs.append(topk_acc.cpu().numpy()[0])
return np.array(list_topk_accs) # array of topk accuracies for entire batch [topk1, topk2, ... etc]
def evaluate(data_dir, backbone, model_def_path, pretrained_path, device, topk=(1,)):
num_classes = len([f for f in os.listdir(data_dir) if not f.startswith('.')])
if max(topk) > num_classes:
topk = np.array(topk)
topk = topk[topk<=num_classes].tolist()
model_structure, input_size = initialize_model(backbone, num_classes, False, model_def_path)
model_structure.load_state_dict(torch.load(pretrained_path))
model = model_structure.eval()
model = model.to(device)
data_transforms = transforms.Compose([
transforms.Resize(input_size),
transforms.ToTensor(),
transforms.Normalize([0, 0, 0], [1/255.0, 1/255.0, 1/255.0]),
transforms.Normalize([0.5*256, 0.5*256, 0.5*256], [256.0, 256.0, 256.0])
#transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image_datasets = datasets.ImageFolder(data_dir, data_transforms)
batch_size = 32
dataloaders = torch.utils.data.DataLoader(image_datasets, shuffle=False, batch_size=batch_size, num_workers=4)
list_topk_accs = np.zeros(len(topk))
y_preds = []
y_labels = []
for inputs, labels in dataloaders:
with torch.no_grad():
inputs = inputs.to(device)
labels = labels.to(device)
outputs = model(inputs)
list_topk_accs += accuracy(outputs, labels, topk)* len(labels)
_, y_pred = outputs.topk(k=1)
y_preds += y_pred.cpu().numpy().tolist()
y_labels += labels.cpu().numpy().tolist()
print()
list_topk_accs = list_topk_accs/len(image_datasets)
with open('eval_results.txt', 'w') as writefile:
for i, k in enumerate(topk):
if k is None:
break
acc_str = 'top '+ str(k) + ' accuracy: ' + str(list_topk_accs[i])
print(acc_str)
            writefile.write(acc_str + '\n')
print()
writefile.write('\n')
class_id = image_datasets.class_to_idx
class_id = dict([(value, key) for key, value in class_id.items()])
f1 = f1_score(y_labels, y_preds, average=None)
recall = recall_score(y_labels, y_preds, average=None)
precision = precision_score(y_labels, y_preds, average=None)
header = 'Label Precision Recall F1 score'
itn_line = '{:10} {:8.3f} {:8.3f} {:8.3f}'
        writefile.write(header + '\n')
print(header )
for i, score in enumerate(f1):
res_str = itn_line.format(class_id[i], precision[i], recall[i], score)
print( res_str )
            writefile.write(res_str + '\n')
return list_topk_accs, f1, recall,precision
def evaluate_e2e(gt_path, classification_path, topk=[1,5,10]):
preds = {}
for file in os.listdir(classification_path):
if file.split('.')[-1] != 'json':
continue
full_filename = os.path.join(classification_path, file)
with open(full_filename,'r') as fi:
dic = json.load(fi)
preds[dic['img_path'] ] = dic["0_0"] # {img_id: [[score1,label1], [score2,label2]]}
preds[dic['img_path'] ].sort(reverse=True)
with open(gt_path, 'r') as json_file2:
gts = json.load(json_file2) # {img_id: label}
pred_scores = []
pred_labels = []
pred_labels_ = []
y_true = []
for img_name in preds:
res = preds[img_name]
res0 = list(zip(*res))
pred_scores.append(list(res0[0]))
pred_labels.append(res0[1][0])
pred_labels_.append(res0[1])
y_true.append(gts[img_name])
nc = len(set(y_true))
if max(topk) > nc:
topk = np.array(topk)
topk = topk[topk<=nc].tolist()
list_topk_accs = accuracy(torch.FloatTensor(pred_labels_), torch.FloatTensor(y_true), topk=topk,e2e=True)
print()
with open('eval_results.txt', 'w') as writefile:
for i, k in enumerate(topk):
if k is None:
break
acc_str = 'top '+ str(k) + ' accuracy: ' + str(list_topk_accs[i])
print(acc_str)
writefile.write(acc_str+'\n')
print()
writefile.write('\n')
f1 = f1_score(y_true, pred_labels, average=None)
recall = recall_score(y_true, pred_labels, average=None)
precision = precision_score(y_true, pred_labels, average=None)
header = 'Label Precision Recall F1 score'
itn_line = '{:10} {:8.3f} {:8.3f} {:8.3f}'
writefile.write(header+'\n')
print(header )
for i, score in enumerate(f1):
res_str = itn_line.format(str(i), precision[i], recall[i], score)
print( res_str )
writefile.write(res_str+'\n')
return list_topk_accs, f1, recall,precision
def check_args(parsed_args):
""" Function to check for inherent contradictions within parsed arguments.
Args
parsed_args: parser.parse_args()
Returns
parsed_args
"""
if parsed_args.gpu >= 0 and not torch.cuda.is_available():
raise ValueError("No GPU is available")
return parsed_args
def parse_args(args):
"""
Parse the arguments.
"""
parser = argparse.ArgumentParser(description='Evaluation script for an image classification network.')
parser.add_argument('--data-dir', type=str, help='Path to the image directory')
parser.add_argument('--model-def-path', type=str, help='Path to pretrained model definition', default=None )
parser.add_argument('--backbone', help='Backbone model.', default='resnet18', type=str)
parser.add_argument('--snapshot', help='Path to the pretrained models.', default=None)
parser.add_argument('--gpu', help='Id of the GPU to use (as reported by nvidia-smi). (-1 for cpu)',type=int,default=-1)
parser.add_argument('--preds', help='path to predicted results',type=str,default=None)
parser.add_argument('--gts', help='path to ground truth',type=str,default=None)
parsed_args = parser.parse_args(args)
print(vars(parsed_args))
return check_args(parsed_args)
def main(args=None):
# parse arguments
if args is None:
args = sys.argv[1:]
args = parse_args(args)
device = "cuda:"+str(args.gpu) if args.gpu >= 0 else "cpu"
if args.preds is not None:
list_topk_accs, f1, recall,precision = evaluate_e2e(args.gts, args.preds)
else:
list_topk_accs, f1, recall,precision = evaluate(args.data_dir, args.backbone, args.model_def_path, args.snapshot, device, [1,5,10])
if __name__ == '__main__':
main()

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1 @@
{"0": "background", "1": "person"}

View File

@ -0,0 +1,160 @@
import os
import numpy as np
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, models, transforms
from load_model import initialize_model
from PIL import Image
import json
import argparse
import sys
from datetime import date
import onnxruntime
def preprocess(image_path, input_size):
data_transforms = transforms.Compose([
transforms.Resize(input_size),
transforms.ToTensor(),
transforms.Normalize([0, 0, 0], [1/255.0, 1/255.0, 1/255.0]),
transforms.Normalize([0.5*256, 0.5*256, 0.5*256], [256.0, 256.0, 256.0])
])
with torch.no_grad():
img_data_pytorch = data_transforms(Image.open(image_path))
img_data_pytorch = img_data_pytorch.unsqueeze(0)
return img_data_pytorch.numpy()
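The two chained `Normalize` calls look odd but compose to a single affine map: the first (mean 0, std 1/255) multiplies the `ToTensor` output by 255, and the second then computes `(x*255 - 128)/256`, landing in roughly [-0.5, 0.5]. A quick numeric check of that composition:

```python
# ToTensor yields pixel values in [0, 1]; the two Normalize stages compose
# to (255 * x - 128) / 256, i.e. an output range of roughly [-0.5, 0.5]
x = 1.0                                   # a white pixel after ToTensor
stage1 = (x - 0.0) / (1.0 / 255.0)        # Normalize([0]*3, [1/255]*3)
stage2 = (stage1 - 0.5 * 256) / 256.0     # Normalize([128]*3, [256]*3)
assert abs(stage2 - (255.0 * x - 128.0) / 256.0) < 1e-9
assert abs(stage2 - 0.49609375) < 1e-9    # (255 - 128) / 256
```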
def postprocess(pre_output):
score = softmax(pre_output)
labels = list(range(len(pre_output)))
score_labels = list(zip(score, labels))
score_labels.sort(reverse=True)
score, labels = list(zip(*score_labels))
return score, labels
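The sort in `postprocess` pairs each score with its class index and orders the pairs by descending score; a pure-Python sketch of that step:

```python
# pair each score with its class index, sort descending by score, unzip:
# the same zip/sort/unzip dance postprocess performs on the model output
score = [0.1, 0.7, 0.2]
labels = [0, 1, 2]
score_labels = sorted(zip(score, labels), reverse=True)
score, labels = zip(*score_labels)
assert labels == (1, 2, 0)        # class 1 had the highest score
assert score == (0.7, 0.2, 0.1)
```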
def onnx_runner(image_path, model_path, class_id):
sess = onnxruntime.InferenceSession(model_path)
onnx_img_size_h = sess.get_inputs()[0].shape[2]
onnx_img_size_w = sess.get_inputs()[0].shape[3]
input_name = sess.get_inputs()[0].name
input_size = (onnx_img_size_h, onnx_img_size_w)
np_images = preprocess(image_path, input_size)
np_images = np_images.astype(np.float32)
pred_onnx = sess.run(None, {input_name: np_images })[0][0]
score, labels = postprocess(pred_onnx)
header = 'Label Probability'
itn_line = '{:10} {:8.3f} '
print(header)
for i in range(len(score)):
#print(itn_line.format( class_id[str(labels[i])], score[i]) )
print(itn_line.format( str(labels[i]), score[i]) )
return score, labels
def inference(backbone, image_path, class_id, device, model_def_path, pretrained_path, topk = None):
num_classes = len(class_id)
model_structure, input_size = initialize_model(backbone, num_classes, False, model_def_path)
model_structure.load_state_dict(torch.load(pretrained_path, map_location=device))
model = model_structure.eval()
model = model.to(device)
data_transforms = transforms.Compose([
transforms.Resize(input_size),
transforms.ToTensor(),
transforms.Normalize([0, 0, 0], [1/255.0, 1/255.0, 1/255.0]),
transforms.Normalize([0.5*256, 0.5*256, 0.5*256], [256.0, 256.0, 256.0])
])
img_data_pytorch = data_transforms(Image.open(image_path))
img_data_pytorch = img_data_pytorch.to(device)
with torch.no_grad():
if topk is None or topk > num_classes:
topk = num_classes
outputs = model(img_data_pytorch[None, ...]).topk(topk)
scores = outputs[0].cpu().numpy()[0]
probs = softmax(scores)
preds = outputs[1].cpu().numpy()[0]
header = 'Label Probability'
itn_line = '{:10} {:8.3f} '
print(header)
for i in range(len(preds)):
print(itn_line.format( class_id[str(preds[i])], probs[i]) )
return probs, preds
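The `.topk(k)` call above returns the k largest scores together with their class indices; a numpy sketch of the same selection (the scores here are illustrative):

```python
import numpy as np

# numpy equivalent of tensor.topk(k): indices of the k largest scores,
# highest first, plus the scores themselves
scores = np.array([0.1, 0.7, 0.05, 0.15])
k = 2
idx = np.argsort(scores)[::-1][:k]
assert idx.tolist() == [1, 3]
assert np.allclose(scores[idx], [0.7, 0.15])
```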
def softmax(A):
# numerically stable softmax over a 1-D score vector
e = np.exp(A - np.max(A))
return e / np.sum(e, keepdims=True)
def parse_args(args):
"""
Parse the arguments.
"""
today = str(date.today())
parser = argparse.ArgumentParser(description='Inference script for an image classification network.')
parser.add_argument('--img-path', type=str, help='Path to the image.')
parser.add_argument('--backbone', help='Backbone model.', default='resnet18', type=str)
parser.add_argument('--class_id_path', help='Path to the class id mapping file.', default='./eval_utils/class_id.json')
parser.add_argument('--gpu', help='Id of the GPU to use (as reported by nvidia-smi). (-1 for cpu)',type=int,default=-1)
parser.add_argument('--model-def-path', type=str, help='Path to pretrained model definition', default=None )
parser.add_argument('--snapshot', help='Path to the pretrained models.')
parser.add_argument('--save-path', help='Path to the classification result.', default='inference_result.json')
parser.add_argument('--onnx', help='inference onnx model',action='store_true')
parsed_args = parser.parse_args(args)
print(vars(parsed_args))
return check_args(parsed_args)
def check_args(parsed_args):
""" Function to check for inherent contradictions within parsed arguments.
Args
parsed_args: parser.parse_args()
Returns
parsed_args
"""
if parsed_args.gpu >= 0 and torch.cuda.is_available() == False:
raise ValueError("No gpu is available")
return parsed_args
def main(args=None):
# parse arguments
if args is None:
args = sys.argv[1:]
args = parse_args(args)
device = "cuda:"+str(args.gpu) if args.gpu >= 0 else "cpu"
with open(args.class_id_path,'r') as fp:
class_id = json.load(fp)
# Inference
if args.onnx:
probs, preds = onnx_runner(args.img_path, args.snapshot, class_id)
else:
probs, preds = inference(args.backbone, args.img_path, class_id, device, args.model_def_path, args.snapshot)
res = {}
res['img_path'] = os.path.abspath(args.img_path)
res['0_0'] = []
for i in range(len(probs)):
res['0_0'].append([ float(probs[i]), int(preds[i]) ])
with open(args.save_path, 'w') as fp:
json.dump(res, fp)
if __name__ == '__main__':
main()

View File

@ -0,0 +1 @@
{"img_path": "/home/ziyan/ai_training/classification/tutorial/image_data/val/bees/10870992_eebeeb3a12.jpg", "0_0": [[0.8352504968643188, 1], [0.16474944353103638, 0]]}

View File

@ -0,0 +1,41 @@
import os
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, models, transforms
import json
def load_data(data_dir, batch_size, input_size, worker):
transform_train_list = [
transforms.RandomResizedCrop(input_size),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0, 0, 0], [1/255.0, 1/255.0, 1/255.0]),
transforms.Normalize([0.5*256, 0.5*256, 0.5*256], [256.0, 256.0, 256.0])
]
transform_val_list = [
transforms.Resize(input_size),
transforms.ToTensor(),
transforms.Normalize([0, 0, 0], [1/255.0, 1/255.0, 1/255.0]),
transforms.Normalize([0.5*256, 0.5*256, 0.5*256], [256.0, 256.0, 256.0])
]
data_transforms = {
'train': transforms.Compose(transform_train_list),
'val': transforms.Compose(transform_val_list)
}
print("Initializing Datasets and Dataloaders...")
# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=(x == 'train'), num_workers=worker, pin_memory=True) for x in ['train', 'val']}
print('-------------Label mapping to Idx:--------------')
class_id = image_datasets['train'].class_to_idx
class_id = dict([(value, key) for key, value in class_id.items()])
print(class_id)
print('------------------------------------------------')
with open("./eval_utils/class_id.json", "w") as outfile:
json.dump(class_id, outfile)
return dataloaders
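`load_data` inverts the mapping that `ImageFolder` builds: `class_to_idx` goes from class name to index, while inference needs index to class name. The inversion in isolation:

```python
# ImageFolder maps class name -> index; the dict comprehension in load_data
# flips it to index -> class name before dumping it to class_id.json
class_to_idx = {'ants': 0, 'bees': 1}
class_id = {value: key for key, value in class_to_idx.items()}
assert class_id == {0: 'ants', 1: 'bees'}
```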

View File

@ -0,0 +1,10 @@
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
def load_lr_scheduler(optimizer_ft, mode = 'max', patience=5):
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer_ft, mode = mode, patience=patience)
return scheduler

View File

@ -0,0 +1,139 @@
import sys
import os
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, models, transforms
def set_parameter_requires_grad(model, feature_extracting):
if feature_extracting:
for param in model.parameters():
param.requires_grad = False
def intersect_dicts(da, db, exclude=()):
# Dictionary intersection of matching keys and shapes, omitting 'exclude' keys, using da values
return {k: v for k, v in da.items() if k in db and not any(x in k for x in exclude) and v.shape == db[k].shape}
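A small numpy stand-in for the shape check in `intersect_dicts`: this is why fine-tuning with a different class count still transfers the backbone weights, since any mismatched head tensor is simply skipped. The layer names below are made up for illustration.

```python
import numpy as np

# intersect_dicts keeps only entries whose key exists in both state dicts
# AND whose tensor shape matches, so partially compatible weights transfer
da = {'conv.weight': np.zeros((16, 3, 3, 3)),   # matching shape -> kept
      'fc.weight': np.zeros((1000, 512)),       # shape mismatch -> dropped
      'extra.bias': np.zeros(8)}                # missing in db  -> dropped
db = {'conv.weight': np.zeros((16, 3, 3, 3)),
      'fc.weight': np.zeros((10, 512))}
kept = {k: v for k, v in da.items() if k in db and v.shape == db[k].shape}
assert list(kept) == ['conv.weight']
```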
def initialize_weights(model_ft, pretrained=''):
state_dict = torch.load(pretrained) # load checkpoint
state_dict = intersect_dicts(state_dict, model_ft.state_dict()) # intersect
model_ft.load_state_dict(state_dict, strict=False) # load
print('Transferred %g/%g items from %s' % (len(state_dict), len(model_ft.state_dict()), pretrained)) # report
def initialize_model(model_name, num_classes, feature_extract, model_def_path=None, use_pretrained=None):
# Initialize these variables which will be set in this if statement. Each of these
# variables is model specific.
model_ft = None
input_size = 0
current_path=os.getcwd()
if model_name == 'FP_classifier':
if num_classes != 2:
print("Number of classes should be two, exiting...")
exit()
if model_def_path is None:
model_def_path = './models/FP_classifier/'
sys.path.append(model_def_path)
from Mobilenet_v2_small import mobile_net_v2
if use_pretrained:
model_ft = mobile_net_v2(num_classes)
model_ft.load_state_dict(torch.load(use_pretrained))
set_parameter_requires_grad(model_ft, feature_extract)
if feature_extract:
for param in model_ft.model.classifier[1].parameters():
param.requires_grad = True
else:
model_ft = mobile_net_v2(num_classes)
input_size = (56,32)
elif model_name == 'mobilenetv2':
""" Mobilenetv2
"""
if model_def_path is None:
model_def_path = './models/MobileNetV2/'
sys.path.append(model_def_path)
from Mobilenet_v2 import mobilenet_v2
if use_pretrained is not None and len(use_pretrained)>0:
model_ft = mobilenet_v2(num_classes)
initialize_weights(model_ft, use_pretrained)
set_parameter_requires_grad(model_ft, feature_extract)
if feature_extract:
for param in model_ft.model.classifier[1].parameters():
param.requires_grad = True
else:
model_ft = mobilenet_v2(num_classes)
input_size = (224,224)
elif model_name == 'resnet18':
""" ResNet18
"""
if model_def_path is None:
model_def_path = './models/ResNet18/'
sys.path.append(model_def_path)
from ResNet18 import resnet18
if use_pretrained is not None and len(use_pretrained)>0:
model_ft = resnet18(num_classes)
initialize_weights(model_ft, use_pretrained)
set_parameter_requires_grad(model_ft, feature_extract)
if feature_extract:
for param in model_ft.model.fc.parameters():
param.requires_grad = True
else:
model_ft = resnet18(num_classes)
input_size = (224,224)
elif model_name == 'resnet50':
""" ResNet50
"""
if model_def_path is None:
model_def_path = './models/ResNet50/'
sys.path.append(model_def_path)
from ResNet50 import resnet50
if use_pretrained is not None and len(use_pretrained)>0:
model_ft = resnet50(num_classes)
initialize_weights(model_ft, use_pretrained)
set_parameter_requires_grad(model_ft, feature_extract)
if feature_extract:
for param in model_ft.model.fc.parameters():
param.requires_grad = True
else:
model_ft = resnet50(num_classes)
input_size = (224,224)
elif model_name in [ 'efficientnet-b0', 'efficientnet-b1', 'efficientnet-b2', 'efficientnet-b3', 'efficientnet-b4', 'efficientnet-b5', 'efficientnet-b6', 'efficientnet-b7']:
""" EfficientNet
"""
if model_def_path is None:
model_def_path = './models/EfficientNet/'
sys.path.append(model_def_path)
from EfficientNet_520 import EfficientNet
if use_pretrained is not None and len(use_pretrained)>0:
model_ft = EfficientNet.from_name(model_name)
model_ft.set_swish(memory_efficient=False)
model_ft.load_state_dict(torch.load(use_pretrained) )
set_parameter_requires_grad(model_ft, feature_extract)
# replace the classification head to match the target number of classes
num_ftrs = model_ft._fc.in_features
model_ft._fc = nn.Linear(num_ftrs, num_classes, bias=True)
if feature_extract:
for param in model_ft._fc.parameters():
param.requires_grad = True
else:
model_ft = EfficientNet.from_name(model_name,num_classes=num_classes)
input_size = (224,224)
else:
print("Invalid model name, exiting...")
exit()
return model_ft, input_size
if __name__ == '__main__':
model_ft, input_size = initialize_model('resnet18', 1000, False, model_def_path=None, use_pretrained='ResNet18.pth')
print(model_ft)
#from save_model import save_model
#save_model(model_ft, 'mobilenetv2', 'exp/', 0, 'cpu')

View File

@ -0,0 +1,33 @@
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
def load_optimizer(model_ft, lr=0.001, momentum=0.9, freeze_backbone = True, op_type='SGD'):
params_to_update = model_ft.parameters()
print("Params to learn:")
if freeze_backbone:
params_to_update = []
for name,param in model_ft.named_parameters():
if param.requires_grad == True:
params_to_update.append(param)
print("\t",name)
else:
for name,param in model_ft.named_parameters():
if param.requires_grad == True:
print("\t",name)
if op_type == 'SGD':
optimizer_ft = optim.SGD(params_to_update, lr=lr, momentum=momentum)
elif op_type == 'ASGD':
optimizer_ft = optim.ASGD(params_to_update, lr=lr)
elif op_type == 'ADAM':
optimizer_ft = optim.Adam(params_to_update, lr=lr)
else:
print("Invalid optimizer name, exiting...")
exit()
return optimizer_ft

View File

@ -0,0 +1,10 @@
import torch
import torch.nn as nn
def load_loss_functions(loss_func = 'cross_entropy'):
if loss_func == 'cross_entropy':
criterion = nn.CrossEntropyLoss()
else:
print("Invalid loss function name, exiting...")
exit()
return criterion

View File

@ -0,0 +1,966 @@
import re
import math
import collections
from functools import partial
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils import model_zoo
# Parameters for the entire model (stem, all blocks, and head)
GlobalParams = collections.namedtuple('GlobalParams', [
'width_coefficient', 'depth_coefficient', 'image_size', 'dropout_rate',
'num_classes', 'batch_norm_momentum', 'batch_norm_epsilon',
'drop_connect_rate', 'depth_divisor', 'min_depth', 'include_top'])
# Parameters for an individual model block
BlockArgs = collections.namedtuple('BlockArgs', [
'num_repeat', 'kernel_size', 'stride', 'expand_ratio',
'input_filters', 'output_filters', 'se_ratio', 'id_skip'])
# Set GlobalParams and BlockArgs's defaults
GlobalParams.__new__.__defaults__ = (None,) * len(GlobalParams._fields)
BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields)
# An ordinary implementation of the Swish slot
# (replaced here with ReLU; the original x * sigmoid(x) form is kept commented out)
class Swish(nn.Module):
def __init__(self):
super().__init__()
self.relu = nn.ReLU(inplace=False)
def forward(self, x):
#return x * torch.sigmoid(x)
return self.relu(x)
# A memory-efficient implementation of Swish function
class SwishImplementation(torch.autograd.Function):
@staticmethod
def forward(ctx, i):
result = i * torch.sigmoid(i)
ctx.save_for_backward(i)
return result
@staticmethod
def backward(ctx, grad_output):
i = ctx.saved_tensors[0]
sigmoid_i = torch.sigmoid(i)
return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i)))
class MemoryEfficientSwish(nn.Module):
def forward(self, x):
return SwishImplementation.apply(x)
def round_filters(filters, global_params):
"""Calculate and round number of filters based on width multiplier.
Use width_coefficient, depth_divisor and min_depth of global_params.
Args:
filters (int): Filters number to be calculated.
global_params (namedtuple): Global params of the model.
Returns:
new_filters: New filters number after calculating.
"""
multiplier = global_params.width_coefficient
if not multiplier:
return filters
# TODO: modify the params names.
# maybe the names (width_divisor,min_width)
# are more suitable than (depth_divisor,min_depth).
divisor = global_params.depth_divisor
min_depth = global_params.min_depth
filters *= multiplier
min_depth = min_depth or divisor # pay attention to this line when using min_depth
# follow the formula transferred from official TensorFlow implementation
new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor)
if new_filters < 0.9 * filters: # prevent rounding by more than 10%
new_filters += divisor
return int(new_filters)
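The arithmetic in `round_filters` is easier to see with concrete numbers. This stand-alone copy of the same formula (assuming the defaults `depth_divisor=8`, `min_depth=None` set in `efficientnet()` below) shows how channel counts snap to multiples of 8:

```python
def round_filters_demo(filters, multiplier, divisor=8, min_depth=None):
    # same arithmetic as round_filters above, without the namedtuple plumbing
    if not multiplier:
        return filters
    min_depth = min_depth or divisor
    filters *= multiplier
    new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # never round down by more than 10%
        new_filters += divisor
    return int(new_filters)

assert round_filters_demo(32, 1.0) == 32   # b0-style width: unchanged
assert round_filters_demo(32, 1.4) == 48   # b4-style width: 44.8 -> 48
```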
def round_repeats(repeats, global_params):
"""Calculate module's repeat number of a block based on depth multiplier.
Use depth_coefficient of global_params.
Args:
repeats (int): num_repeat to be calculated.
global_params (namedtuple): Global params of the model.
Returns:
new repeat: New repeat number after calculating.
"""
multiplier = global_params.depth_coefficient
if not multiplier:
return repeats
# follow the formula transferred from official TensorFlow implementation
return int(math.ceil(multiplier * repeats))
def drop_connect(inputs, p, training):
"""Drop connect.
Args:
input (tensor: BCWH): Input of this structure.
p (float: 0.0~1.0): Probability of drop connection.
training (bool): The running mode.
Returns:
output: Output after drop connection.
"""
assert 0 <= p <= 1, 'p must be in range of [0,1]'
if not training:
return inputs
batch_size = inputs.shape[0]
keep_prob = 1 - p
# generate binary_tensor mask according to probability (p for 0, 1-p for 1)
random_tensor = keep_prob
random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device)
binary_tensor = torch.floor(random_tensor)
output = inputs / keep_prob * binary_tensor
return output
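The mask construction in `drop_connect` relies on `floor(keep_prob + U[0,1))` being 1 with probability `keep_prob` and 0 otherwise; a quick numpy check of that trick:

```python
import numpy as np

# floor(keep_prob + u) with u ~ U[0,1) is 1 iff u >= p, i.e. with
# probability keep_prob; surviving samples are later scaled by 1/keep_prob
# so the expected activation is unchanged
rng = np.random.default_rng(0)
p = 0.2
keep_prob = 1.0 - p
mask = np.floor(keep_prob + rng.random(100000))
assert set(np.unique(mask)) <= {0.0, 1.0}        # a binary mask
assert abs(mask.mean() - keep_prob) < 0.01        # ~keep_prob ones
```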
def get_width_and_height_from_size(x):
"""Obtain height and width from x.
Args:
x (int, tuple or list): Data size.
Returns:
size: A tuple or list (H,W).
"""
if isinstance(x, int):
return x, x
if isinstance(x, list) or isinstance(x, tuple):
return x
else:
raise TypeError()
def calculate_output_image_size(input_image_size, stride):
"""Calculates the output image size when using Conv2dSamePadding with a stride.
Necessary for static padding. Thanks to mannatsingh for pointing this out.
Args:
input_image_size (int, tuple or list): Size of input image.
stride (int, tuple or list): Conv2d operation's stride.
Returns:
output_image_size: A list [H,W].
"""
if input_image_size is None:
return None
image_height, image_width = get_width_and_height_from_size(input_image_size)
stride = stride if isinstance(stride, int) else stride[0]
image_height = int(math.ceil(image_height / stride))
image_width = int(math.ceil(image_width / stride))
return [image_height, image_width]
# Note:
# The following 'SamePadding' functions make output size equal ceil(input size/stride).
# Only when stride equals 1, can the output size be the same as input size.
# Don't be confused by their function names ! ! !
def get_same_padding_conv2d(image_size=None):
"""Chooses static padding if you have specified an image size, and dynamic padding otherwise.
Static padding is necessary for ONNX exporting of models.
Args:
image_size (int or tuple): Size of the image.
Returns:
Conv2dDynamicSamePadding or Conv2dStaticSamePadding.
"""
if image_size is None:
return Conv2dDynamicSamePadding
else:
return partial(Conv2dStaticSamePadding, image_size=image_size)
class Conv2dDynamicSamePadding(nn.Conv2d):
"""2D Convolutions like TensorFlow, for a dynamic image size.
The padding is operated in forward function by calculating dynamically.
"""
# Tips for 'SAME' mode padding.
# Given the following:
# i: width or height
# s: stride
# k: kernel size
# d: dilation
# p: padding
# Output after Conv2d:
# o = floor((i+p-((k-1)*d+1))/s+1)
# If o equals i, i = floor((i+p-((k-1)*d+1))/s+1),
# => p = (i-1)*s+((k-1)*d+1)-i
def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True):
super().__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias)
self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2
def forward(self, x):
ih, iw = x.size()[-2:]
kh, kw = self.weight.size()[-2:]
sh, sw = self.stride
oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) # change the output size according to stride ! ! !
pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0)
pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0)
if pad_h > 0 or pad_w > 0:
x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2])
return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
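The 'SAME' padding formula from the comment above can be checked numerically; this sketch computes the total padding for a few input/kernel/stride combinations:

```python
import math

# 'SAME' padding recipe from the comment above: output o = ceil(i / s),
# total padding p = max((o - 1) * s + (k - 1) * d + 1 - i, 0)
def same_pad(i, k, s, d=1):
    o = math.ceil(i / s)
    return max((o - 1) * s + (k - 1) * d + 1 - i, 0)

assert same_pad(224, 3, 1) == 2   # stride 1: pad 1 on each side, output 224
assert same_pad(224, 3, 2) == 1   # stride 2: output ceil(224/2) = 112
assert same_pad(112, 5, 1) == 4   # 5x5 kernel: pad 2 on each side
```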
class Conv2dStaticSamePadding(nn.Conv2d):
"""2D Convolutions like TensorFlow's 'SAME' mode, with the given input image size.
The padding module is calculated in the construction function, then used in forward.
"""
# With the same calculation as Conv2dDynamicSamePadding
def __init__(self, in_channels, out_channels, kernel_size, stride=1, image_size=None, **kwargs):
super().__init__(in_channels, out_channels, kernel_size, stride, **kwargs)
self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2
# Calculate padding based on image size and save it
assert image_size is not None
ih, iw = (image_size, image_size) if isinstance(image_size, int) else image_size
kh, kw = self.weight.size()[-2:]
sh, sw = self.stride
oh, ow = math.ceil(ih / sh), math.ceil(iw / sw)
pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0)
pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0)
if pad_h > 0 or pad_w > 0:
self.static_padding = nn.ZeroPad2d((pad_w // 2, pad_w - pad_w // 2,
pad_h // 2, pad_h - pad_h // 2))
else:
self.static_padding = nn.Identity()
def forward(self, x):
x = self.static_padding(x)
x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
return x
def get_same_padding_maxPool2d(image_size=None):
"""Chooses static padding if you have specified an image size, and dynamic padding otherwise.
Static padding is necessary for ONNX exporting of models.
Args:
image_size (int or tuple): Size of the image.
Returns:
MaxPool2dDynamicSamePadding or MaxPool2dStaticSamePadding.
"""
if image_size is None:
return MaxPool2dDynamicSamePadding
else:
return partial(MaxPool2dStaticSamePadding, image_size=image_size)
class MaxPool2dDynamicSamePadding(nn.MaxPool2d):
"""2D MaxPooling like TensorFlow's 'SAME' mode, with a dynamic image size.
The padding is operated in forward function by calculating dynamically.
"""
def __init__(self, kernel_size, stride, padding=0, dilation=1, return_indices=False, ceil_mode=False):
super().__init__(kernel_size, stride, padding, dilation, return_indices, ceil_mode)
self.stride = [self.stride] * 2 if isinstance(self.stride, int) else self.stride
self.kernel_size = [self.kernel_size] * 2 if isinstance(self.kernel_size, int) else self.kernel_size
self.dilation = [self.dilation] * 2 if isinstance(self.dilation, int) else self.dilation
def forward(self, x):
ih, iw = x.size()[-2:]
kh, kw = self.kernel_size
sh, sw = self.stride
oh, ow = math.ceil(ih / sh), math.ceil(iw / sw)
pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0)
pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0)
if pad_h > 0 or pad_w > 0:
x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2])
return F.max_pool2d(x, self.kernel_size, self.stride, self.padding,
self.dilation, self.ceil_mode, self.return_indices)
class MaxPool2dStaticSamePadding(nn.MaxPool2d):
"""2D MaxPooling like TensorFlow's 'SAME' mode, with the given input image size.
The padding module is calculated in the construction function, then used in forward.
"""
def __init__(self, kernel_size, stride, image_size=None, **kwargs):
super().__init__(kernel_size, stride, **kwargs)
self.stride = [self.stride] * 2 if isinstance(self.stride, int) else self.stride
self.kernel_size = [self.kernel_size] * 2 if isinstance(self.kernel_size, int) else self.kernel_size
self.dilation = [self.dilation] * 2 if isinstance(self.dilation, int) else self.dilation
# Calculate padding based on image size and save it
assert image_size is not None
ih, iw = (image_size, image_size) if isinstance(image_size, int) else image_size
kh, kw = self.kernel_size
sh, sw = self.stride
oh, ow = math.ceil(ih / sh), math.ceil(iw / sw)
pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0)
pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0)
if pad_h > 0 or pad_w > 0:
self.static_padding = nn.ZeroPad2d((pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2))
else:
self.static_padding = nn.Identity()
def forward(self, x):
x = self.static_padding(x)
x = F.max_pool2d(x, self.kernel_size, self.stride, self.padding,
self.dilation, self.ceil_mode, self.return_indices)
return x
class BlockDecoder(object):
"""Block Decoder for readability,
straight from the official TensorFlow repository.
"""
@staticmethod
def _decode_block_string(block_string):
"""Get a block through a string notation of arguments.
Args:
block_string (str): A string notation of arguments.
Examples: 'r1_k3_s11_e1_i32_o16_se0.25_noskip'.
Returns:
BlockArgs: The namedtuple defined at the top of this file.
"""
assert isinstance(block_string, str)
ops = block_string.split('_')
options = {}
for op in ops:
splits = re.split(r'(\d.*)', op)
if len(splits) >= 2:
key, value = splits[:2]
options[key] = value
# Check stride
assert (('s' in options and len(options['s']) == 1) or
(len(options['s']) == 2 and options['s'][0] == options['s'][1]))
return BlockArgs(
num_repeat=int(options['r']),
kernel_size=int(options['k']),
stride=[int(options['s'][0])],
expand_ratio=int(options['e']),
input_filters=int(options['i']),
output_filters=int(options['o']),
se_ratio=float(options['se']) if 'se' in options else None,
id_skip=('noskip' not in block_string))
@staticmethod
def _encode_block_string(block):
"""Encode a block to a string.
Args:
block (namedtuple): A BlockArgs type argument.
Returns:
block_string: A String form of BlockArgs.
"""
args = [
'r%d' % block.num_repeat,
'k%d' % block.kernel_size,
's%d%d' % (block.stride[0], block.stride[0]),
'e%s' % block.expand_ratio,
'i%d' % block.input_filters,
'o%d' % block.output_filters
]
if block.se_ratio is not None and 0 < block.se_ratio <= 1:
args.append('se%s' % block.se_ratio)
if block.id_skip is False:
args.append('noskip')
return '_'.join(args)
@staticmethod
def decode(string_list):
"""Decode a list of string notations to specify blocks inside the network.
Args:
string_list (list[str]): A list of strings, each string is a notation of block.
Returns:
blocks_args: A list of BlockArgs namedtuples of block args.
"""
assert isinstance(string_list, list)
blocks_args = []
for block_string in string_list:
blocks_args.append(BlockDecoder._decode_block_string(block_string))
return blocks_args
@staticmethod
def encode(blocks_args):
"""Encode a list of BlockArgs to a list of strings.
Args:
blocks_args (list[namedtuples]): A list of BlockArgs namedtuples of block args.
Returns:
block_strings: A list of strings, each string is a notation of block.
"""
block_strings = []
for block in blocks_args:
block_strings.append(BlockDecoder._encode_block_string(block))
return block_strings
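The string notation handled by `BlockDecoder` packs each block hyperparameter into a letter key followed by a number; the regex split used by `_decode_block_string` can be exercised on its own:

```python
import re

# split each '_'-separated token of a block string into key + numeric value,
# exactly as _decode_block_string does
block_string = 'r1_k3_s11_e1_i32_o16_se0.25'
options = {}
for op in block_string.split('_'):
    splits = re.split(r'(\d.*)', op)
    if len(splits) >= 2:
        key, value = splits[:2]
        options[key] = value
assert options == {'r': '1', 'k': '3', 's': '11', 'e': '1',
                   'i': '32', 'o': '16', 'se': '0.25'}
```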
def efficientnet_params(model_name):
"""Map EfficientNet model name to parameter coefficients.
Args:
model_name (str): Model name to be queried.
Returns:
params_dict[model_name]: A (width,depth,res,dropout) tuple.
"""
params_dict = {
# Coefficients: width,depth,res,dropout
'efficientnet-b0': (1.0, 1.0, 224, 0.2),
'efficientnet-b1': (1.0, 1.1, 240, 0.2),
'efficientnet-b2': (1.1, 1.2, 260, 0.3),
'efficientnet-b3': (1.2, 1.4, 300, 0.3),
'efficientnet-b4': (1.4, 1.8, 380, 0.4),
'efficientnet-b5': (1.6, 2.2, 456, 0.4),
'efficientnet-b6': (1.8, 2.6, 528, 0.5),
'efficientnet-b7': (2.0, 3.1, 600, 0.5),
'efficientnet-b8': (2.2, 3.6, 672, 0.5),
'efficientnet-l2': (4.3, 5.3, 800, 0.5),
}
return params_dict[model_name]
def efficientnet(width_coefficient=None, depth_coefficient=None, image_size=None,
dropout_rate=0.2, drop_connect_rate=0.2, num_classes=1000, include_top=True):
"""Create BlockArgs and GlobalParams for efficientnet model.
Args:
width_coefficient (float)
depth_coefficient (float)
image_size (int)
dropout_rate (float)
drop_connect_rate (float)
num_classes (int)
Meaning as the name suggests.
Returns:
blocks_args, global_params.
"""
# Blocks args for the whole model(efficientnet-b0 by default)
# It will be modified in the construction of EfficientNet Class according to model
blocks_args = [
'r1_k3_s11_e1_i32_o16_se0.25',
'r2_k3_s22_e6_i16_o24_se0.25',
'r2_k5_s22_e6_i24_o40_se0.25',
'r3_k3_s22_e6_i40_o80_se0.25',
'r3_k5_s11_e6_i80_o112_se0.25',
'r4_k5_s22_e6_i112_o192_se0.25',
'r1_k3_s11_e6_i192_o320_se0.25',
]
blocks_args = BlockDecoder.decode(blocks_args)
global_params = GlobalParams(
width_coefficient=width_coefficient,
depth_coefficient=depth_coefficient,
image_size=image_size,
dropout_rate=dropout_rate,
num_classes=num_classes,
batch_norm_momentum=0.99,
batch_norm_epsilon=1e-3,
drop_connect_rate=drop_connect_rate,
depth_divisor=8,
min_depth=None,
include_top=include_top,
)
return blocks_args, global_params
def get_model_params(model_name, override_params):
"""Get the block args and global params for a given model name.
Args:
model_name (str): Model's name.
override_params (dict): A dict to modify global_params.
Returns:
blocks_args, global_params
"""
if model_name.startswith('efficientnet'):
w, d, s, p = efficientnet_params(model_name)
# note: all models have drop connect rate = 0.2
blocks_args, global_params = efficientnet(
width_coefficient=w, depth_coefficient=d, dropout_rate=p, image_size=s)
else:
raise NotImplementedError('model name is not pre-defined: {}'.format(model_name))
if override_params:
# ValueError will be raised here if override_params has fields not included in global_params.
global_params = global_params._replace(**override_params)
return blocks_args, global_params
# train with Standard methods
# check more details in paper(EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks)
url_map = {
'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b0-355c32eb.pth',
'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b1-f1951068.pth',
'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b2-8bb594d6.pth',
'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b3-5fb5a3c3.pth',
'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b4-6ed6700e.pth',
'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b5-b6417697.pth',
'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b6-c76e70fd.pth',
'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b7-dcc49843.pth',
}
# train with Adversarial Examples(AdvProp)
# check more details in paper(Adversarial Examples Improve Image Recognition)
url_map_advprop = {
'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b0-b64d5a18.pth',
'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b1-0f3ce85a.pth',
'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b2-6e9d97e5.pth',
'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b3-cdd7c0f4.pth',
'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b4-44fb3a87.pth',
'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b5-86493f6b.pth',
'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b6-ac80338e.pth',
'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b7-4652b6dd.pth',
'efficientnet-b8': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b8-22a8fe65.pth',
}
# TODO: add the pretrained weights url map of 'efficientnet-l2'
def load_pretrained_weights(model, model_name, weights_path=None, load_fc=True, advprop=False):
"""Loads pretrained weights from weights path or download using url.
Args:
model (Module): The whole model of efficientnet.
model_name (str): Model name of efficientnet.
weights_path (None or str):
str: path to pretrained weights file on the local disk.
None: use pretrained weights downloaded from the Internet.
load_fc (bool): Whether to load pretrained weights for fc layer at the end of the model.
advprop (bool): Whether to load pretrained weights
trained with advprop (valid when weights_path is None).
"""
if isinstance(weights_path, str):
state_dict = torch.load(weights_path)
else:
# AutoAugment or Advprop (different preprocessing)
url_map_ = url_map_advprop if advprop else url_map
state_dict = model_zoo.load_url(url_map_[model_name])
if load_fc:
ret = model.load_state_dict(state_dict, strict=False)
assert not ret.missing_keys, 'Missing keys when loading pretrained weights: {}'.format(ret.missing_keys)
else:
state_dict.pop('_fc.weight')
state_dict.pop('_fc.bias')
ret = model.load_state_dict(state_dict, strict=False)
assert set(ret.missing_keys) == set(
['_fc.weight', '_fc.bias']), 'Missing keys when loading pretrained weights: {}'.format(ret.missing_keys)
assert not ret.unexpected_keys, 'Unexpected keys when loading pretrained weights: {}'.format(ret.unexpected_keys)
print('Loaded pretrained weights for {}'.format(model_name))
VALID_MODELS = (
'efficientnet-b0', 'efficientnet-b1', 'efficientnet-b2', 'efficientnet-b3',
'efficientnet-b4', 'efficientnet-b5', 'efficientnet-b6', 'efficientnet-b7',
'efficientnet-b8',
# Support the construction of 'efficientnet-l2' without pretrained weights
'efficientnet-l2'
)
class MBConvBlock(nn.Module):
"""Mobile Inverted Residual Bottleneck Block.
Args:
block_args (namedtuple): BlockArgs, defined in utils.py.
global_params (namedtuple): GlobalParam, defined in utils.py.
image_size (tuple or list): [image_height, image_width].
References:
[1] https://arxiv.org/abs/1704.04861 (MobileNet v1)
[2] https://arxiv.org/abs/1801.04381 (MobileNet v2)
[3] https://arxiv.org/abs/1905.02244 (MobileNet v3)
"""
def __init__(self, block_args, global_params, image_size=None):
super().__init__()
self._block_args = block_args
self._bn_mom = 1 - global_params.batch_norm_momentum # pytorch's difference from tensorflow
self._bn_eps = global_params.batch_norm_epsilon
#self.has_se = (self._block_args.se_ratio is not None) and (0 < self._block_args.se_ratio <= 1)
self.has_se = False  # Squeeze-and-Excitation is intentionally disabled in this variant
self.id_skip = block_args.id_skip # whether to use skip connection and drop connect
# Expansion phase (Inverted Bottleneck)
inp = self._block_args.input_filters # number of input channels
oup = self._block_args.input_filters * self._block_args.expand_ratio # number of output channels
if self._block_args.expand_ratio != 1:
Conv2d = get_same_padding_conv2d(image_size=image_size)
self._expand_conv = Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False)
self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps)
# image_size = calculate_output_image_size(image_size, 1) <-- this wouldn't modify image_size
# Depthwise convolution phase
k = self._block_args.kernel_size
s = self._block_args.stride
Conv2d = get_same_padding_conv2d(image_size=image_size)
self._depthwise_conv = Conv2d(
in_channels=oup, out_channels=oup, groups=oup, # groups makes it depthwise
kernel_size=k, stride=s, bias=False)
self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps)
image_size = calculate_output_image_size(image_size, s)
# Squeeze and Excitation layer, if desired
if self.has_se:
Conv2d = get_same_padding_conv2d(image_size=(1, 1))
num_squeezed_channels = max(1, int(self._block_args.input_filters * self._block_args.se_ratio))
self._se_reduce = Conv2d(in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1)
self._se_expand = Conv2d(in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1)
# Pointwise convolution phase
final_oup = self._block_args.output_filters
Conv2d = get_same_padding_conv2d(image_size=image_size)
self._project_conv = Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False)
self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._bn_mom, eps=self._bn_eps)
self._swish = MemoryEfficientSwish()
def forward(self, inputs, drop_connect_rate=None):
"""MBConvBlock's forward function.
Args:
inputs (tensor): Input tensor.
drop_connect_rate (float): Drop connect rate, between 0 and 1.
Returns:
Output of this block after processing.
"""
# Expansion and Depthwise Convolution
x = inputs
if self._block_args.expand_ratio != 1:
x = self._expand_conv(inputs)
x = self._bn0(x)
x = self._swish(x)
x = self._depthwise_conv(x)
x = self._bn1(x)
x = self._swish(x)
# Squeeze and Excitation
if self.has_se:
x_squeezed = F.adaptive_avg_pool2d(x, 1)
x_squeezed = self._se_reduce(x_squeezed)
x_squeezed = self._swish(x_squeezed)
x_squeezed = self._se_expand(x_squeezed)
x = torch.sigmoid(x_squeezed) * x
# Pointwise Convolution
x = self._project_conv(x)
x = self._bn2(x)
# Skip connection and drop connect
input_filters, output_filters = self._block_args.input_filters, self._block_args.output_filters
if self.id_skip and self._block_args.stride == 1 and input_filters == output_filters:
# The combination of skip connection and drop connect brings about stochastic depth.
if drop_connect_rate:
x = drop_connect(x, p=drop_connect_rate, training=self.training)
x = x + inputs # skip connection
return x
def set_swish(self, memory_efficient=True):
"""Sets swish function as memory efficient (for training) or standard (for export).
Args:
memory_efficient (bool): Whether to use memory-efficient version of swish.
"""
self._swish = MemoryEfficientSwish() if memory_efficient else Swish()
class EfficientNet(nn.Module):
"""EfficientNet model.
Most easily loaded with the .from_name or .from_pretrained methods.
Args:
blocks_args (list[namedtuple]): A list of BlockArgs to construct blocks.
global_params (namedtuple): A set of GlobalParams shared between blocks.
References:
[1] https://arxiv.org/abs/1905.11946 (EfficientNet)
Example:
>>> import torch
>>> from efficientnet.model import EfficientNet
>>> inputs = torch.rand(1, 3, 224, 224)
>>> model = EfficientNet.from_pretrained('efficientnet-b0')
>>> model.eval()
>>> outputs = model(inputs)
"""
def __init__(self, blocks_args=None, global_params=None):
super().__init__()
assert isinstance(blocks_args, list), 'blocks_args should be a list'
assert len(blocks_args) > 0, 'blocks_args must not be empty'
self._global_params = global_params
self._blocks_args = blocks_args
# Batch norm parameters
bn_mom = 1 - self._global_params.batch_norm_momentum
bn_eps = self._global_params.batch_norm_epsilon
# Get stem static or dynamic convolution depending on image size
image_size = global_params.image_size
Conv2d = get_same_padding_conv2d(image_size=image_size)
# Stem
in_channels = 3 # rgb
out_channels = round_filters(32, self._global_params) # number of output channels
self._conv_stem = Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False)
self._bn0 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps)
image_size = calculate_output_image_size(image_size, 2)
# Build blocks
self._blocks = nn.ModuleList([])
for block_args in self._blocks_args:
# Update block input and output filters based on depth multiplier.
block_args = block_args._replace(
input_filters=round_filters(block_args.input_filters, self._global_params),
output_filters=round_filters(block_args.output_filters, self._global_params),
num_repeat=round_repeats(block_args.num_repeat, self._global_params)
)
# The first block needs to take care of stride and filter size increase.
self._blocks.append(MBConvBlock(block_args, self._global_params, image_size=image_size))
image_size = calculate_output_image_size(image_size, block_args.stride)
if block_args.num_repeat > 1: # modify block_args to keep same output size
block_args = block_args._replace(input_filters=block_args.output_filters, stride=1)
for _ in range(block_args.num_repeat - 1):
self._blocks.append(MBConvBlock(block_args, self._global_params, image_size=image_size))
# image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1
# Head
in_channels = block_args.output_filters # output of final block
out_channels = round_filters(1280, self._global_params)
Conv2d = get_same_padding_conv2d(image_size=image_size)
self._conv_head = Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
self._bn1 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps)
# Final linear layer
self._avg_pooling = nn.AdaptiveAvgPool2d(1)
self._dropout = nn.Dropout(self._global_params.dropout_rate)
self._fc = nn.Linear(out_channels, self._global_params.num_classes)
#self._swish = MemoryEfficientSwish()
self._swish = Swish()  # standard Swish by default; call set_swish(True) for the memory-efficient version
def set_swish(self, memory_efficient=True):
"""Sets swish function as memory efficient (for training) or standard (for export).
Args:
memory_efficient (bool): Whether to use memory-efficient version of swish.
"""
self._swish = MemoryEfficientSwish() if memory_efficient else Swish()
for block in self._blocks:
block.set_swish(memory_efficient)
def extract_endpoints(self, inputs):
"""Use convolution layer to extract features
from reduction levels i in [1, 2, 3, 4, 5].
Args:
inputs (tensor): Input tensor.
Returns:
Dictionary of last intermediate features
with reduction levels i in [1, 2, 3, 4, 5].
Example:
>>> import torch
>>> from efficientnet.model import EfficientNet
>>> inputs = torch.rand(1, 3, 224, 224)
>>> model = EfficientNet.from_pretrained('efficientnet-b0')
>>> endpoints = model.extract_endpoints(inputs)
>>> print(endpoints['reduction_1'].shape) # torch.Size([1, 16, 112, 112])
>>> print(endpoints['reduction_2'].shape) # torch.Size([1, 24, 56, 56])
>>> print(endpoints['reduction_3'].shape) # torch.Size([1, 40, 28, 28])
>>> print(endpoints['reduction_4'].shape) # torch.Size([1, 112, 14, 14])
>>> print(endpoints['reduction_5'].shape) # torch.Size([1, 1280, 7, 7])
"""
endpoints = dict()
# Stem
x = self._swish(self._bn0(self._conv_stem(inputs)))
prev_x = x
# Blocks
for idx, block in enumerate(self._blocks):
drop_connect_rate = self._global_params.drop_connect_rate
if drop_connect_rate:
drop_connect_rate *= float(idx) / len(self._blocks) # scale drop_connect_rate linearly with depth
x = block(x, drop_connect_rate=drop_connect_rate)
if prev_x.size(2) > x.size(2):
endpoints['reduction_{}'.format(len(endpoints)+1)] = prev_x
prev_x = x
# Head
x = self._swish(self._bn1(self._conv_head(x)))
endpoints['reduction_{}'.format(len(endpoints)+1)] = x
return endpoints
def extract_features(self, inputs):
"""use convolution layer to extract feature .
Args:
inputs (tensor): Input tensor.
Returns:
Output of the final convolution
layer in the efficientnet model.
"""
# Stem
x = self._swish(self._bn0(self._conv_stem(inputs)))
# Blocks
for idx, block in enumerate(self._blocks):
drop_connect_rate = self._global_params.drop_connect_rate
if drop_connect_rate:
drop_connect_rate *= float(idx) / len(self._blocks) # scale drop_connect_rate linearly with depth
x = block(x, drop_connect_rate=drop_connect_rate)
# Head
x = self._swish(self._bn1(self._conv_head(x)))
return x
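The two feature extractors above scale `drop_connect_rate` linearly with block index, so early blocks are almost never dropped and the last block is dropped at nearly the full base rate. A minimal torch-free sketch of that schedule (the helper name is hypothetical; 0.2 is assumed as a typical base rate):

```python
def drop_connect_schedule(base_rate, num_blocks):
    # Per-block drop-connect rates, scaled linearly with block index,
    # mirroring `drop_connect_rate *= float(idx) / len(self._blocks)` above.
    return [base_rate * idx / num_blocks for idx in range(num_blocks)]

# Assuming a base rate of 0.2 over 16 blocks: the first block never
# drops, and the last drops at 0.2 * 15 / 16.
rates = drop_connect_schedule(0.2, 16)
```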
def forward(self, inputs):
"""EfficientNet's forward function.
Calls extract_features to extract features, applies final linear layer, and returns logits.
Args:
inputs (tensor): Input tensor.
Returns:
Output of this model after processing.
"""
# Convolution layers
x = self.extract_features(inputs)
# Pooling and final linear layer
x = self._avg_pooling(x)
if self._global_params.include_top:
x = x.flatten(start_dim=1)
x = self._dropout(x)
x = self._fc(x)
return x
@classmethod
def from_name(cls, model_name, in_channels=3, **override_params):
"""create an efficientnet model according to name.
Args:
model_name (str): Name for efficientnet.
in_channels (int): Input data's channel number.
override_params (other key word params):
Params to override model's global_params.
Optional key:
'width_coefficient', 'depth_coefficient',
'image_size', 'dropout_rate',
'num_classes', 'batch_norm_momentum',
'batch_norm_epsilon', 'drop_connect_rate',
'depth_divisor', 'min_depth'
Returns:
An efficientnet model.
"""
cls._check_model_name_is_valid(model_name)
blocks_args, global_params = get_model_params(model_name, override_params)
model = cls(blocks_args, global_params)
model._change_in_channels(in_channels)
return model
@classmethod
def from_pretrained(cls, model_name, weights_path=None, advprop=False,
in_channels=3, num_classes=1000, **override_params):
"""create an efficientnet model according to name.
Args:
model_name (str): Name for efficientnet.
weights_path (None or str):
str: path to pretrained weights file on the local disk.
None: use pretrained weights downloaded from the Internet.
advprop (bool):
Whether to load pretrained weights
trained with advprop (valid when weights_path is None).
in_channels (int): Input data's channel number.
num_classes (int):
Number of categories for classification.
It controls the output size for final linear layer.
override_params (other key word params):
Params to override model's global_params.
Optional key:
'width_coefficient', 'depth_coefficient',
'image_size', 'dropout_rate',
'batch_norm_momentum',
'batch_norm_epsilon', 'drop_connect_rate',
'depth_divisor', 'min_depth'
Returns:
A pretrained efficientnet model.
"""
model = cls.from_name(model_name, num_classes=num_classes, **override_params)
load_pretrained_weights(model, model_name, weights_path=weights_path, load_fc=(num_classes == 1000), advprop=advprop)
model._change_in_channels(in_channels)
return model
@classmethod
def get_image_size(cls, model_name):
"""Get the input image size for a given efficientnet model.
Args:
model_name (str): Name for efficientnet.
Returns:
Input image size (resolution).
"""
cls._check_model_name_is_valid(model_name)
_, _, res, _ = efficientnet_params(model_name)
return res
@classmethod
def _check_model_name_is_valid(cls, model_name):
"""Validates model name.
Args:
model_name (str): Name for efficientnet.
Raises:
ValueError: If model_name is not one of VALID_MODELS.
"""
if model_name not in VALID_MODELS:
raise ValueError('model_name should be one of: ' + ', '.join(VALID_MODELS))
def _change_in_channels(self, in_channels):
"""Adjust model's first convolution layer to in_channels, if in_channels not equals 3.
Args:
in_channels (int): Input data's channel number.
"""
if in_channels != 3:
Conv2d = get_same_padding_conv2d(image_size=self._global_params.image_size)
out_channels = round_filters(32, self._global_params)
self._conv_stem = Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False)
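`round_filters` is imported from utils.py and not shown in this file; a sketch of the rounding rule it is assumed to implement (scale the channel count by the width coefficient, snap to a multiple of the depth divisor, and never reduce by more than 10%):

```python
def round_filters_sketch(filters, width_coefficient=None, divisor=8, min_depth=None):
    # Scale the channel count by the width coefficient, then snap to the
    # nearest multiple of `divisor` (rounding half up).
    if not width_coefficient:
        return filters
    filters *= width_coefficient
    min_depth = min_depth or divisor
    new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # guard against rounding down too far
        new_filters += divisor
    return int(new_filters)

# With width 1.0 the 32-channel stem stays at 32; with width 1.1 the
# 1280-channel head widens to 1408.
```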


@@ -0,0 +1,143 @@
import torch
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
class mobile_net_v2(nn.Module):
def __init__(self, num_classes=2):
super(mobile_net_v2, self).__init__()
self.model = models.mobilenet_v2(pretrained=False)
# Replace the last FC layer with one sized for our classes; every `//4*3` below scales the stock MobileNetV2 channel counts by a 0.75 width multiplier
#num_ftrs = self.mobile_model.classifier.in_features
num_ftrs = self.model.classifier[-1].in_features
#self.mobile_model.reset_classifier(0)
self.model.classifier[1] = nn.Linear(num_ftrs//4*3, num_classes, bias=True)
self.model.features[0][0] = nn.Conv2d(3, 32//4*3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1),bias=False)
self.model.features[0][1] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[1].conv[0][0] = nn.Conv2d(32//4*3, 32//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32//4*3, bias=False)
self.model.features[1].conv[0][1] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[1].conv[1] = nn.Conv2d(32//4*3, 16//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[1].conv[2] = nn.BatchNorm2d(16//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[2].conv[0][0] = nn.Conv2d(16//4*3, 96//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[2].conv[0][1] = nn.BatchNorm2d(96//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[2].conv[1][0] = nn.Conv2d(96//4*3, 96//4*3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96//4*3, bias=False)
self.model.features[2].conv[1][1] = nn.BatchNorm2d(96//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[2].conv[2] = nn.Conv2d(96//4*3, 24//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[2].conv[3] = nn.BatchNorm2d(24//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[3].conv[0][0] = nn.Conv2d(24//4*3, 128//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[3].conv[0][1] = nn.BatchNorm2d(128//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[3].conv[1][0] = nn.Conv2d(128//4*3, 128//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128//4*3, bias=False)
self.model.features[3].conv[1][1] = nn.BatchNorm2d(128//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[3].conv[2] = nn.Conv2d(128//4*3, 24//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[3].conv[3] = nn.BatchNorm2d(24//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[4].conv[0][0] = nn.Conv2d(24//4*3, 144//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[4].conv[0][1] = nn.BatchNorm2d(144//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[4].conv[1][0] = nn.Conv2d(144//4*3, 144//4*3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144//4*3, bias=False)
self.model.features[4].conv[1][1] = nn.BatchNorm2d(144//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[4].conv[2] = nn.Conv2d(144//4*3, 32//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[4].conv[3] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[5].conv[0][0] = nn.Conv2d(32//4*3, 176//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[5].conv[0][1] = nn.BatchNorm2d(176//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[5].conv[1][0] = nn.Conv2d(176//4*3, 176//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=176//4*3, bias=False)
self.model.features[5].conv[1][1] = nn.BatchNorm2d(176//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[5].conv[2] = nn.Conv2d(176//4*3, 32//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[5].conv[3] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[6].conv[0][0] = nn.Conv2d(32//4*3, 192//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[6].conv[0][1] = nn.BatchNorm2d(192//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[6].conv[1][0] = nn.Conv2d(192//4*3, 192//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192//4*3, bias=False)
self.model.features[6].conv[1][1] = nn.BatchNorm2d(192//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[6].conv[2] = nn.Conv2d(192//4*3, 32//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[6].conv[3] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[7].conv[0][0] = nn.Conv2d(32//4*3, 192//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[7].conv[0][1] = nn.BatchNorm2d(192//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[7].conv[1][0] = nn.Conv2d(192//4*3, 192//4*3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192//4*3, bias=False)
self.model.features[7].conv[1][1] = nn.BatchNorm2d(192//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[7].conv[2] = nn.Conv2d(192//4*3, 64//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[7].conv[3] = nn.BatchNorm2d(64//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[8].conv[0][0] = nn.Conv2d(64//4*3, 368//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[8].conv[0][1] = nn.BatchNorm2d(368//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[8].conv[1][0] = nn.Conv2d(368//4*3, 368//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=368//4*3, bias=False)
self.model.features[8].conv[1][1] = nn.BatchNorm2d(368//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[8].conv[2] = nn.Conv2d(368//4*3, 64//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[8].conv[3] = nn.BatchNorm2d(64//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[9].conv[0][0] = nn.Conv2d(64//4*3, 384//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[9].conv[0][1] = nn.BatchNorm2d(384//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[9].conv[1][0] = nn.Conv2d(384//4*3, 384//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384//4*3, bias=False)
self.model.features[9].conv[1][1] = nn.BatchNorm2d(384//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[9].conv[2] = nn.Conv2d(384//4*3, 64//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[9].conv[3] = nn.BatchNorm2d(64//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[10].conv[0][0] = nn.Conv2d(64//4*3, 384//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[10].conv[0][1] = nn.BatchNorm2d(384//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[10].conv[1][0] = nn.Conv2d(384//4*3, 384//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384//4*3, bias=False)
self.model.features[10].conv[1][1] = nn.BatchNorm2d(384//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[10].conv[2] = nn.Conv2d(384//4*3, 64//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[10].conv[3] = nn.BatchNorm2d(64//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[11].conv[0][0] = nn.Conv2d(64//4*3, 384//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[11].conv[0][1] = nn.BatchNorm2d(384//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[11].conv[1][0] = nn.Conv2d(384//4*3, 384//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384//4*3, bias=False)
self.model.features[11].conv[1][1] = nn.BatchNorm2d(384//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[11].conv[2] = nn.Conv2d(384//4*3, 96//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[11].conv[3] = nn.BatchNorm2d(96//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[12].conv[0][0] = nn.Conv2d(96//4*3, 560//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[12].conv[0][1] = nn.BatchNorm2d(560//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[12].conv[1][0] = nn.Conv2d(560//4*3, 560//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=560//4*3, bias=False)
self.model.features[12].conv[1][1] = nn.BatchNorm2d(560//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[12].conv[2] = nn.Conv2d(560//4*3, 96//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[12].conv[3] = nn.BatchNorm2d(96//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[13].conv[0][0] = nn.Conv2d(96//4*3, 576//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[13].conv[0][1] = nn.BatchNorm2d(576//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[13].conv[1][0] = nn.Conv2d(576//4*3, 576//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576//4*3, bias=False)
self.model.features[13].conv[1][1] = nn.BatchNorm2d(576//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[13].conv[2] = nn.Conv2d(576//4*3, 96//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[13].conv[3] = nn.BatchNorm2d(96//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[14].conv[0][0] = nn.Conv2d(96//4*3, 576//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[14].conv[0][1] = nn.BatchNorm2d(576//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[14].conv[1][0] = nn.Conv2d(576//4*3, 576//4*3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576//4*3, bias=False)
self.model.features[14].conv[1][1] = nn.BatchNorm2d(576//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[14].conv[2] = nn.Conv2d(576//4*3, 160//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[14].conv[3] = nn.BatchNorm2d(160//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[15].conv[0][0] = nn.Conv2d(160//4*3, 960//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[15].conv[0][1] = nn.BatchNorm2d(960//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[15].conv[1][0] = nn.Conv2d(960//4*3, 960//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960//4*3, bias=False)
self.model.features[15].conv[1][1] = nn.BatchNorm2d(960//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[15].conv[2] = nn.Conv2d(960//4*3, 160//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[15].conv[3] = nn.BatchNorm2d(160//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[16].conv[0][0] = nn.Conv2d(160//4*3, 960//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[16].conv[0][1] = nn.BatchNorm2d(960//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[16].conv[1][0] = nn.Conv2d(960//4*3, 960//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960//4*3, bias=False)
self.model.features[16].conv[1][1] = nn.BatchNorm2d(960//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[16].conv[2] = nn.Conv2d(960//4*3, 160//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[16].conv[3] = nn.BatchNorm2d(160//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[17].conv[0][0] = nn.Conv2d(160//4*3, 960//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[17].conv[0][1] = nn.BatchNorm2d(960//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[17].conv[1][0] = nn.Conv2d(960//4*3, 960//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960//4*3, bias=False)
self.model.features[17].conv[1][1] = nn.BatchNorm2d(960//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[17].conv[2] = nn.Conv2d(960//4*3, 320//4*3, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.model.features[17].conv[3] = nn.BatchNorm2d(320//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[18][0] = nn.Conv2d(320//4*3, 1280//4*3, kernel_size=(1, 1), stride=(1, 1),bias=False)
self.model.features[18][1] = nn.BatchNorm2d(1280//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
def forward(self, x):
f = self.model(x)
#y = self.classifier(f)
return f
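Every `//4*3` in the constructor above is the integer form of a 0.75 width multiplier applied to the stock MobileNetV2 channel counts. A minimal sketch of that mapping (the helper name is hypothetical):

```python
def scale_width(channels):
    # Integer form of a 0.75x width multiplier, as used above (c // 4 * 3).
    return channels // 4 * 3

# Stock MobileNetV2 stage widths and their 0.75x counterparts:
# 32 -> 24, 16 -> 12, 24 -> 18, 64 -> 48, 96 -> 72, 160 -> 120,
# 320 -> 240, 1280 -> 960.
stock = [32, 16, 24, 64, 96, 160, 320, 1280]
scaled = [scale_width(c) for c in stock]
```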


@@ -0,0 +1,145 @@
import torch
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib
import time
class mobile_net_v2(nn.Module):
def __init__(self, num_classes=2):
super(mobile_net_v2, self).__init__()
self.model = models.mobilenet_v2(pretrained=False)
# Replace the last FC layer with one sized for our classes; every `//4*3` below scales the stock MobileNetV2 channel counts by a 0.75 width multiplier
#num_ftrs = self.mobile_model.classifier.in_features
num_ftrs = self.model.classifier[-1].in_features
#self.mobile_model.reset_classifier(0)
self.model.classifier[1] = nn.Linear(num_ftrs//4*3, num_classes, bias=True)
self.model.features[0][0] = nn.Conv2d(3, 32//4*3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1),bias=False)
self.model.features[0][1] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.model.features[1].conv[0][0] = nn.Conv2d(32//4*3, 32//4*3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32//4*3, bias=False)
self.model.features[1].conv[0][1] = nn.BatchNorm2d(32//4*3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        self.model.features[1].conv[1] = nn.Conv2d(32 // 4 * 3, 16 // 4 * 3, kernel_size=(1, 1), stride=(1, 1), bias=False)
        self.model.features[1].conv[2] = nn.BatchNorm2d(16 // 4 * 3)
        # Inverted-residual blocks 2-17, each slimmed to 3/4 of the listed width
        # (//4*3). Entries: (block index, in, expansion, out channels, depthwise stride).
        # BatchNorm2d keyword arguments are omitted where they match the defaults
        # (eps=1e-05, momentum=0.1, affine=True, track_running_stats=True).
        block_cfg = [
            (2, 16, 96, 24, 2), (3, 24, 128, 24, 1), (4, 24, 144, 32, 2),
            (5, 32, 176, 32, 1), (6, 32, 192, 32, 1), (7, 32, 192, 64, 2),
            (8, 64, 368, 64, 1), (9, 64, 384, 64, 1), (10, 64, 384, 64, 1),
            (11, 64, 384, 96, 1), (12, 96, 560, 96, 1), (13, 96, 576, 96, 1),
            (14, 96, 576, 160, 2), (15, 160, 960, 160, 1), (16, 160, 960, 160, 1),
            (17, 160, 960, 320, 1),
        ]
        for i, c_in, c_exp, c_out, stride in block_cfg:
            c_in, c_exp, c_out = c_in // 4 * 3, c_exp // 4 * 3, c_out // 4 * 3
            conv = self.model.features[i].conv
            conv[0][0] = nn.Conv2d(c_in, c_exp, kernel_size=(1, 1), stride=(1, 1), bias=False)  # pointwise expansion
            conv[0][1] = nn.BatchNorm2d(c_exp)
            conv[1][0] = nn.Conv2d(c_exp, c_exp, kernel_size=(3, 3), stride=(stride, stride), padding=(1, 1), groups=c_exp, bias=False)  # depthwise
            conv[1][1] = nn.BatchNorm2d(c_exp)
            conv[2] = nn.Conv2d(c_exp, c_out, kernel_size=(1, 1), stride=(1, 1), bias=False)  # pointwise projection
            conv[3] = nn.BatchNorm2d(c_out)
        # Final 1x1 conv, also slimmed to 3/4 width
        self.model.features[18][0] = nn.Conv2d(320 // 4 * 3, 1280 // 4 * 3, kernel_size=(1, 1), stride=(1, 1), bias=False)
        self.model.features[18][1] = nn.BatchNorm2d(1280 // 4 * 3)

    def forward(self, x):
        f = self.model(x)
        # y = self.classifier(f)
        return f


@@ -0,0 +1,19 @@
import torch
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms


class mobilenet_v2(nn.Module):
    def __init__(self, num_classes):
        super(mobilenet_v2, self).__init__()
        self.model = models.mobilenet_v2(pretrained=False)
        # Replace the last FC layer with one sized for our number of classes
        num_ftrs = self.model.classifier[-1].in_features
        self.model.classifier[1] = nn.Linear(num_ftrs, num_classes, bias=True)
        nn.init.xavier_uniform_(self.model.classifier[1].weight)
        self.model.classifier[1].bias.data.fill_(0.01)

    def forward(self, x):
        f = self.model(x)
        return f



@@ -0,0 +1,19 @@
import torch
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms


class resnet18(nn.Module):
    def __init__(self, num_classes):
        super(resnet18, self).__init__()
        self.model = models.resnet18(pretrained=False)
        # Replace the last FC layer with one sized for our number of classes
        num_ftrs = self.model.fc.in_features
        self.model.fc = nn.Linear(num_ftrs, num_classes, bias=True)
        nn.init.xavier_uniform_(self.model.fc.weight)
        self.model.fc.bias.data.fill_(0.01)

    def forward(self, x):
        f = self.model(x)
        return f


@@ -0,0 +1,20 @@
import torch
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms


class resnet50(nn.Module):
    def __init__(self, num_classes):
        super(resnet50, self).__init__()
        self.model = models.resnet50(pretrained=False)
        # Replace the last FC layer with one sized for our number of classes
        num_ftrs = self.model.fc.in_features
        self.model.fc = nn.Linear(num_ftrs, num_classes, bias=True)
        nn.init.xavier_uniform_(self.model.fc.weight)
        self.model.fc.bias.data.fill_(0.01)

    def forward(self, x):
        f = self.model(x)
        return f


@@ -0,0 +1,39 @@
import argparse
import os
import sys

import torch
import torch.onnx

from load_model import initialize_model


def main(args=None):
    parser = argparse.ArgumentParser(description='Convert a trained PyTorch model to ONNX.')
    parser.add_argument('--save-path', type=str, help='Path to the output ONNX model.', default=None)
    parser.add_argument('--backbone', help='Backbone model.', default='resnet18', type=str)
    parser.add_argument('--num_classes', help='The number of classes.', type=int, default=0)
    parser.add_argument('--model-def-path', type=str, help='Path to the pretrained model definition.', default=None)
    parser.add_argument('--snapshot', help='Path to the pretrained model weights.')
    args = parser.parse_args()
    print(vars(args))

    # Rebuild the model structure and load the trained weights
    model_structure, input_size = initialize_model(args.backbone, args.num_classes, False, args.model_def_path)
    model_structure.load_state_dict(torch.load(args.snapshot))
    model = model_structure.eval()

    dummy_input = torch.randn(1, 3, input_size[0], input_size[1])
    save_path = args.save_path
    if save_path is None:
        save_path = args.backbone + '.onnx'
    torch.onnx.export(model, dummy_input, save_path, keep_initializers_as_inputs=True, opset_version=11)


if __name__ == '__main__':
    main()


@@ -0,0 +1,6 @@
numpy>=1.18.5
torch>=1.4.0
torchvision>=0.5.0
scikit-learn
onnx==1.6.0
onnxruntime


@@ -0,0 +1,12 @@
import os

import torch


def save_model(network, model_name, snapshot_path, epoch_label, device):
    save_filename = model_name + '_%s.pth' % epoch_label
    save_path = os.path.join(snapshot_path, save_filename)
    if not os.path.isdir(snapshot_path):
        os.makedirs(snapshot_path)
    print('saving model ', save_path)
    # Save on CPU so the snapshot can be loaded on any device, then move back
    torch.save(network.cpu().state_dict(), save_path)
    network = network.to(device)
    return network


@@ -0,0 +1,87 @@
import argparse
import os
import sys
from datetime import date

import torch

from load_data import load_data
from loss_functions import load_loss_functions
from load_optimizer import load_optimizer
from load_lr_scheduler import load_lr_scheduler
from train_model import train_model
from load_model import initialize_model
from save_model import save_model


def makedirs(path):
    # Intended behavior: try to create the directory,
    # pass if the directory already exists, fail otherwise.
    try:
        os.makedirs(path)
    except OSError:
        if not os.path.isdir(path):
            raise


def check_args(parsed_args):
    """Check for inherent contradictions within parsed arguments.

    Args
        parsed_args: parser.parse_args()
    Returns
        parsed_args
    """
    if parsed_args.gpu >= 0 and not torch.cuda.is_available():
        raise ValueError("No GPU is available")
    return parsed_args


def parse_args(args):
    """Parse the arguments."""
    today = str(date.today())
    parser = argparse.ArgumentParser(description='Simple training script for training an image classification network.')
    parser.add_argument('data_dir', type=str, help='Path to your dataset')
    parser.add_argument('--model-name', type=str, help='Name of your model', default='model_ft')
    parser.add_argument('--model-def-path', type=str, help='Path to pretrained model definition', default=None)
    parser.add_argument('--lr', type=float, help='Learning rate', default=5e-3)
    parser.add_argument('--backbone', help='Backbone model.', default='resnet18', type=str)
    parser.add_argument('--gpu', help='Id of the GPU to use (as reported by nvidia-smi). (-1 for CPU)', type=int, default=-1)
    parser.add_argument('--workers', help='The number of dataloader workers', type=int, default=1)
    parser.add_argument('--epochs', help='Number of epochs to train.', type=int, default=100)
    parser.add_argument('--freeze-backbone', help='Freeze training of backbone layers.', type=int, default=0)
    parser.add_argument('--batch-size', help='Size of the batches.', default=128, type=int)
    parser.add_argument('--snapshot', help='Path to the pretrained models.')
    parser.add_argument('--snapshot-path', help="Path to store snapshots of models during training (defaults to 'snapshots')", default='./snapshots/{}'.format(today))
    parser.add_argument('--optimizer', help='Choose an optimizer from SGD, ASGD and ADAM', type=str, default='SGD')
    parser.add_argument('--loss', help='Choose a loss function', type=str, default='cross_entropy')
    parser.add_argument('--early-stop', help='Whether to use early stopping', type=int, default=1)
    parser.add_argument('--patience', help='Patience for early stopping', type=int, default=7)
    print(vars(parser.parse_args(args)))
    return check_args(parser.parse_args(args))


def main(args=None):
    # parse arguments
    if args is None:
        args = sys.argv[1:]
    args = parse_args(args)

    device = "cuda:" + str(args.gpu) if args.gpu >= 0 else "cpu"
    # Each class is a sub-directory of <data_dir>/train
    num_classes = len([f for f in os.listdir(os.path.join(args.data_dir, 'train')) if not f.startswith('.')])

    model_ft, input_size = initialize_model(args.backbone, num_classes, args.freeze_backbone, model_def_path=args.model_def_path, use_pretrained=args.snapshot)
    dataloaders_dict = load_data(args.data_dir, args.batch_size, input_size, args.workers)
    optimizer_ft = load_optimizer(model_ft, lr=args.lr, freeze_backbone=args.freeze_backbone, op_type=args.optimizer)
    lr_scheduler_ft = load_lr_scheduler(optimizer_ft)
    criterion = load_loss_functions(loss_func=args.loss)

    # Train
    model_ft, _ = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, lr_scheduler_ft, device, args.snapshot_path, model_name=args.model_name, num_epochs=args.epochs, early_stop=args.early_stop, patience=args.patience)
    save_model(model_ft, args.model_name, args.snapshot_path, 'best', device)
    return model_ft


if __name__ == '__main__':
    main()


@@ -0,0 +1,92 @@
import copy
import time

import torch

from early_stopping import EarlyStopping
from save_model import save_model


def train_model(model, dataloaders, criterion, optimizer, lr_scheduler, device, snapshot_path, model_name='model_ft', num_epochs=25, early_stop=False, patience=7):
    since = time.time()
    val_acc_history = []
    model = model.to(device)
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    # initialize the early_stopping object
    early_stopping = EarlyStopping(model_name, patience=patience, verbose=True, path=snapshot_path)

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluation mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward; track history only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc >= best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
                lr_scheduler.step(epoch_acc)

        print()
        if early_stop:
            early_stopping(epoch_loss, model, epoch)
            if early_stopping.early_stop:
                print("Early stopping")
                break
        elif epoch % 10 == 9:
            save_model(model, model_name, snapshot_path, epoch, device)

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history


@@ -0,0 +1,186 @@
<h1 align="center"> Image Classification </h1>
This tutorial covers the basics of the image classification task. In this document, we will go through a concrete example of how to train an image classification model via our AI training platform. A dataset containing bees and ants is provided.
Image classification is a fundamental task that assigns a specific label to an image. Our AI training platform provides the training script to train a model for the image classification task.
# Prerequisites
First of all, we have to install the required libraries. Python 3.6 or above is required; the remaining dependencies are listed in the `requirements.txt` file. You can install them by running:
```shell
pip install -r requirements.txt
```
# Dataset & Preparation
Next, we need a dataset for training the model.
## Custom Datasets
You can train the model on a custom dataset. Your own datasets are expected to have the following structure:
```shell
- Dataset name
-- train
--- Class1
--- Class2
-- val
--- Class1
--- Class2
```
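The training script later derives the class list from this layout (one class per subfolder of `train`). A minimal sketch of that discovery step, building a throwaway skeleton just to demonstrate (the folder names match the diagram above):

```python
import tempfile
from pathlib import Path

def list_classes(data_dir):
    """Return the sorted class names found under data_dir/train."""
    train_dir = Path(data_dir) / "train"
    return sorted(p.name for p in train_dir.iterdir() if p.is_dir())

# Build a temporary dataset skeleton matching the expected layout.
root = Path(tempfile.mkdtemp())
for split in ("train", "val"):
    for cls in ("Class1", "Class2"):
        (root / split / cls).mkdir(parents=True)

print(list_classes(root))  # ['Class1', 'Class2']
```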
## Example
Let's go through a toy example for preparing a custom dataset. Suppose we are going to classify bees and ants.
<div align="center">
<img src="./image_data/train/ants/0013035.jpg" width="33%" /> <img src="./image_data/train/bees/1092977343_cb42b38d62.jpg" width="33%" />
</div>
First of all, we have to split the images of bees and ants into training and validation sets (an 8:2 ratio is recommended). Then, we move the images into different folders named after their classes. The dataset folder will have the following structure.
```shell
- image data
-- train
--- ants
--- bees
-- val
--- ants
--- bees
```
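The 8:2 split can be scripted; a minimal standard-library sketch (the filenames are hypothetical, and actually moving files into the folder layout is left out):

```python
import random

def split_dataset(filenames, val_ratio=0.2, seed=0):
    """Shuffle filenames deterministically and split into (train, val) lists."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_ratio)
    return files[n_val:], files[:n_val]

images = [f"ant_{i:03d}.jpg" for i in range(10)]
train, val = split_dataset(images)
print(len(train), len(val))  # 8 2
```

Fixing the seed keeps the split reproducible across runs.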
Now, we have finished preparing the dataset.
# Train
Following the examples in the previous section, let's finetune a pretrained model on our custom dataset. The pretrained model used here is MobileNetV2. We download the pretrained weights from [Model_Zoo](https://github.com/kneron/Model_Zoo/tree/main/classification/MobileNetV2) by:
```shell
wget https://raw.githubusercontent.com/kneron/Model_Zoo/main/classification/MobileNetV2/MobileNetV2.pth
```
Since our dataset is quite small, we choose to freeze the backbone and only finetune the last layer. Following the instructions above, run:
```shell
python train.py --gpu -1 --freeze-backbone 1 --backbone mobilenetv2 --early-stop 1 --snapshot MobileNetV2.pth --snapshot-path snapshots/exp/ ./tutorial/image_data
```
The following training messages will be printed:
```shell
{'data_dir': './tutorial/image_data', 'model_name': 'model_ft', 'model_def_path': None, 'lr': 0.001, 'backbone': 'mobilenetv2', 'gpu': -1, 'epochs': 100, 'freeze_backbone': 1, 'batch_size': 64, 'snapshot': 'MobileNetV2.pth', 'snapshot_path': 'snapshots/exp/', 'optimizer': 'SGD', 'loss': 'cross_entropy', 'early_stop': 1, 'patience': 7}
Initializing Datasets and Dataloaders...
-------------Label mapping to Idx:--------------
{0: 'ants', 1: 'bees'}
------------------------------------------------
Params to learn:
model.classifier.1.weight
model.classifier.1.bias
Epoch 0/99
----------
train Loss: 0.7786 Acc: 0.4303
val Loss: 0.6739 Acc: 0.6056
Validation loss decreased (inf --> 0.673929). Saving model ...
...
```
When the validation loss stops decreasing for 7 consecutive epochs, early stopping is triggered and the training process terminates. The trained model is saved under the `./snapshots/exp` folder. In addition, the class-label-to-index mapping is printed and automatically saved to `./eval_utils/class_id.json`.
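The patience mechanism behind this behavior can be sketched in a few lines. This is only an illustration of the idea, not the project's `early_stopping.EarlyStopping` class, which additionally saves checkpoints and logs each improvement:

```python
class EarlyStoppingSketch:
    """Stop when validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=7):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0
        self.early_stop = False

    def __call__(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss  # improvement: remember it and reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
            if self.counter >= self.patience:
                self.early_stop = True

stopper = EarlyStoppingSketch(patience=3)
for loss in [0.8, 0.7, 0.7, 0.7, 0.7]:
    stopper(loss)
print(stopper.early_stop)  # True
```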
# Converting to ONNX
You may check the [Toolchain manual](http://doc.kneron.com/docs/#toolchain/manual/) for converting a PyTorch model to an ONNX model. Let's go through an example of converting our fine-tuned classifier to ONNX.
Execute the following command in the `classification` folder:
```shell
python pytorch2onnx.py --backbone mobilenetv2 --num_classes 2 --snapshot snapshots/exp/model_ft_best.pth --save-path snapshots/exp/model_ft_best.onnx
```
We could get `model_ft_best.onnx`.
Next, clone the [ONNX_Convertor](https://github.com/kneron/ONNX_Convertor) repository and run its optimizer script on the exported model (reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts):
```shell
git clone https://github.com/kneron/ONNX_Convertor.git
python ONNX_Convertor/optimizer_scripts/pytorch2onnx.py snapshots/exp/model_ft_best.onnx snapshots/exp/model_ft_best_convert.onnx
```
We could get `model_ft_best_convert.onnx`.
# Inference
In this section, we will go through an example of using a trained network for inference. That is, we will use the script `inference.py`, which takes an image and predicts its class label. `inference.py` returns the top-$K$ most likely classes along with their probabilities. Let's run our network on the following image, a bee image from our custom dataset:
<div align="center">
<img src="./image_data/val/bees/10870992_eebeeb3a12.jpg" width="30%" />
</div>
```shell
python inference.py --gpu -1 --backbone mobilenetv2 --snapshot snapshots/exp/model_ft_best.pth --model-def-path models/MobileNetV2/ --class_id_path eval_utils/class_id.json --img-path tutorial/image_data/val/bees/10870992_eebeeb3a12.jpg
{'img_path': 'tutorial/image_data/val/bees/10870992_eebeeb3a12.jpg', 'backbone': 'mobilenetv2', 'class_id_path': 'eval_utils/class_id.json', 'gpu': -1, 'model_def_path': 'models/MobileNetV2/', 'snapshot': 'snapshots/exp/model_ft_best.pth', 'save_path': 'inference_result.json', 'onnx': False}
Label Probability
bees 0.836
ants 0.164
```
Note that the class ID mapping file `eval_utils/class_id.json` was created during the training process. After inference, we get `inference_result.json`, which contains the following information:
```bash
{"img_path": "/home/ziyan/git_repo/ai_training/ai_training/classification/tutorial/image_data/val/bees/10870992_eebeeb3a12.jpg", "0_0": [[0.8359974026679993, 1], [0.16400262713432312, 0]]}
```
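The `[[score, label_idx], ...]` pairs under `"0_0"` can be turned back into the printed label/probability table using the class-ID mapping. A small sketch with hardcoded example values, assuming `class_id.json` stores string keys as JSON requires:

```python
class_id = {"0": "ants", "1": "bees"}          # shape of eval_utils/class_id.json
result = {"img_path": "10870992_eebeeb3a12.jpg",
          "0_0": [[0.836, 1], [0.164, 0]]}     # shape of inference_result.json

def format_topk(result, class_id):
    """Return (label, probability) rows, highest probability first."""
    rows = []
    for score, idx in sorted(result["0_0"], reverse=True):
        rows.append((class_id[str(idx)], round(score, 3)))
    return rows

for label, prob in format_topk(result, class_id):
    print(f"{label:>6} {prob:.3f}")
```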
For ONNX inference, add the `--onnx` argument when executing `inference.py`:
```shell
python inference.py --img-path tutorial/image_data/val/bees/10870992_eebeeb3a12.jpg --snapshot snapshots/exp/model_ft_best_convert.onnx --onnx
{'img_path': 'tutorial/image_data/val/bees/10870992_eebeeb3a12.jpg', 'backbone': 'resnet18', 'class_id_path': './eval_utils/class_id.json', 'gpu': -1, 'model_def_path': None, 'snapshot': 'snapshots/exp/model_ft_best_convert.onnx', 'save_path': 'inference_result.json', 'onnx': True}
Label Probability
bees 0.836
ants 0.164
```
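Whichever backend produces the raw class scores, the post-processing is the same: a softmax over the logits followed by a top-$K$ ranking. A pure-Python sketch (the logits below are hypothetical values for the two-class bees/ants model):

```python
import math

def softmax_topk(logits, k=2):
    """Convert raw logits to probabilities and return the top-k (prob, index) pairs."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda p: p[1], reverse=True)
    return [(round(p, 3), i) for i, p in ranked[:k]]

print(softmax_topk([-0.8, 0.83]))  # [(0.836, 1), (0.164, 0)]
```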
# Evaluation
## Evaluation on a dataset
In this section, we will go through an example of evaluating a trained network on a dataset. Here, we evaluate the fine-tuned model on the validation set of our custom dataset. The script `./eval_utils/eval.py` reports the top-K accuracy and per-class F1 scores for the model on a test dataset. The evaluation statistics will be saved to `eval_results.txt`.
```shell
python eval_utils/eval.py --gpu -1 --backbone mobilenetv2 --snapshot snapshots/exp/model_ft_best.pth --data-dir ./tutorial/image_data/val/
{'data_dir': './tutorial/image_data/val/', 'model_def_path': None, 'backbone': 'mobilenetv2', 'snapshot': 'snapshots/exp/model_ft_best.pth', 'gpu': -1, 'preds': None, 'gts': None}
top 1 accuracy: 0.9225352112676056
Label Precision Recall F1 score
ants 0.887 0.932 0.909
bees 0.950 0.916 0.933
```
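The per-class precision, recall, and F1 printed above follow the standard definitions (TP, FP, FN counted per label). A small self-contained sketch with made-up labels:

```python
def per_class_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, from paired label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["ants", "ants", "bees", "bees"]
y_pred = ["ants", "bees", "bees", "bees"]
print(per_class_f1(y_true, y_pred, "bees"))  # precision 2/3, recall 1.0, F1 0.8
```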
## End-to-End Evaluation
For end-to-end testing, the prediction results are expected to be saved as JSON files, one file per image, with the following format:
```bash
{"img_path": image_path,
"0_0":[[score, label], [score, label], ...]
}
```
The prediction JSON files for all images are expected to be saved under the same folder. The ground truth JSON file is expected to have the following format:
```bash
{image1_path: label,
image2_path: label,
...
}
```
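A record in the prediction format above can be built and serialized in a few lines; a sketch (the probabilities are hypothetical):

```python
import json

def prediction_record(img_path, probs):
    """Build the expected record: [score, label_idx] pairs, best score first."""
    ranked = sorted(((p, i) for i, p in enumerate(probs)), reverse=True)
    return {"img_path": img_path, "0_0": [[round(p, 3), i] for p, i in ranked]}

record = prediction_record("1.jpg", [0.1, 0.8, 0.1])
print(json.dumps(record))
```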
For this tutorial, we generated some random prediction data saved under the folder `tutorial/eval_data/preds/`, and the ground truth is saved in `tutorial/eval_data/gts.json`. You may check these files for the format. To compute the evaluation statistics, execute the command in the `classification` folder:
```shell
python eval_utils/eval.py --preds tutorial/eval_data/preds/ --gts tutorial/eval_data/gts.json
{'model_def_path': None, 'data_dir': None, 'backbone': 'resnet18', 'preds': 'tutorial/eval_data/preds/', 'gts': 'tutorial/eval_data/gts.json', 'snapshot': None, 'gpu': -1}
top 1 accuracy: 1.0
Label Precision Recall F1 score
0 1.000 1.000 1.000
1 1.000 1.000 1.000
2 1.000 1.000 1.000
```
The evaluation statistics will be saved to `eval_results.txt`.
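The top-1 computation over these files can be sketched in a few lines; the records below mirror the tutorial's `eval_data` files, with file I/O omitted for brevity:

```python
def top1_accuracy(preds, gts):
    """preds: list of prediction records; gts: {img_path: true_label_idx}."""
    correct = 0
    for record in preds:
        top_label = record["0_0"][0][1]  # highest-scoring [score, label] pair
        if top_label == gts[record["img_path"]]:
            correct += 1
    return correct / len(preds)

preds = [{"img_path": "1.jpg", "0_0": [[0.8, 1], [0.1, 0], [0.1, 2]]},
         {"img_path": "2.jpg", "0_0": [[0.8, 0], [0.1, 1], [0.1, 2]]},
         {"img_path": "3.jpg", "0_0": [[0.8, 2], [0.1, 1], [0.1, 0]]}]
gts = {"1.jpg": 1, "2.jpg": 0, "3.jpg": 2}
print(top1_accuracy(preds, gts))  # 1.0
```

This matches the `top 1 accuracy: 1.0` reported above for the random tutorial data.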


@@ -0,0 +1 @@
{"1.jpg": 1, "2.jpg": 0, "3.jpg": 2}


@@ -0,0 +1,3 @@
{"img_path": "1.jpg",
"0_0":[[0.8, 1], [0.1, 0], [0.1, 2]]
}


@@ -0,0 +1,2 @@
{"img_path": "2.jpg",
"0_0": [[0.8, 0], [0.1, 1], [0.1, 2]]}


@@ -0,0 +1 @@
{"img_path": "3.jpg", "0_0": [[0.8, 2], [0.1, 1], [0.1, 0]]}

Binary files not shown (tutorial dataset images added in this commit).