Image Classification
This tutorial explores the basics of the image classification task. This document explains the arguments of each script.
You can find the tutorial for finetuning a pretrained model on a custom dataset under the tutorial folder, tutorial/README.md.
An IPython notebook version of the tutorial is also provided under the tutorial folder as tutorial/tutorial.ipynb. You may upload and run this notebook on Google Colab.
Image classification is a fundamental task that assigns an image to a specific label. Our AI training platform provides a training script for training an image classification model.
Prerequisites
First, install the required libraries. Python 3.6 or above is required; the other dependencies are listed in the requirements.txt file. You can install them by running:
pip install -r requirements.txt
Dataset & Preparation
Next, we need a dataset for training the model.
Custom Datasets
You can train the model on a custom dataset. Your own datasets are expected to have the following structure:
- Dataset name
-- train
--- Class1
--- Class2
-- val
--- Class1
--- Class2
Example
Let's go through a toy example for preparing a custom dataset. Suppose we are going to classify bees and ants.
First, split the images of bees and ants into a training set and a validation set (we recommend an 8:2 ratio). Then, move the images into different folders named after their classes. The dataset folder will have the following structure:
- image data
-- train
--- ants
--- bees
-- val
--- ants
--- bees
Now, we have finished preparing the dataset.
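The split described above can also be scripted. The following is a minimal sketch, not part of the provided scripts: it assumes the raw images start out in a hypothetical source folder with one sub-folder per class (for example raw_images/ants and raw_images/bees), and the function name split_dataset is made up for illustration.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, dataset_dir, val_ratio=0.2, seed=0):
    """Copy images from source_dir/<class>/ into dataset_dir/{train,val}/<class>/.

    Assumption: source_dir holds one sub-folder per class (hypothetical layout).
    Returns a dict mapping class name -> (number of train images, number of val images).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    class_dirs = sorted(p for p in Path(source_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(images)
        n_val = int(round(len(images) * val_ratio))  # 8:2 split by default
        splits = {"val": images[:n_val], "train": images[n_val:]}
        for split_name, files in splits.items():
            target = Path(dataset_dir) / split_name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for image in files:
                shutil.copy2(image, target / image.name)
        counts[class_dir.name] = (len(splits["train"]), len(splits["val"]))
    return counts
```

For example, split_dataset("raw_images", "image data") would produce the train and val folders shown above, copying the files and leaving the originals untouched.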
Train
Let's look at how to train or finetune a model. There are several backbone models and arguments to choose from. You can find the FPS results of these backbone models evaluated on 520 and 720 in the Models section.
For training on a custom dataset, run:
python train.py --gpu -1 --backbone backbone_name --model-def-path path_to_model_definition_folder --snapshot path_to_pretrained_model_weights path_to_dataset_folder
--gpu which gpu to run. (-1 if cpu)
--workers the number of dataloader workers. (Default: 1)
--backbone which backbone model to use. Options: see Models(#Models).
--freeze-backbone whether to freeze the backbone when the pretrained model is used. (Default: 0)
--early-stop whether to stop training early when validation accuracy stops improving. (Default: 1)
--patience patience for early stopping. (Default: 7)
--model-name name of your model.
--lr learning rate. (Default: 1e-3)
--model-def-path path to pretrained model definition folder. (Default: './models/')
--snapshot path to the pretrained model. (Default: None)
--epochs number of epochs to train. (Default: 100)
--batch-size size of the batches. (Default: 64)
--snapshot-path path to store snapshots of models during training. (Default: 'snapshots/{}'.format(today))
--optimizer optimizer for training. Options: SGD, ASGD, ADAM. (Default: SGD)
--loss loss function. Options: cross_entropy. (Default: cross_entropy)
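To illustrate how --early-stop and --patience typically interact, here is a minimal sketch of patience-based early stopping. It is an assumption about the general technique, not the code in train.py; the class and function names are hypothetical.

```python
class EarlyStopping:
    """Signal a stop once validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=7):
        self.patience = patience
        self.best_accuracy = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy; return True if training should stop."""
        if val_accuracy > self.best_accuracy:
            # New best result: remember it and reset the counter.
            self.best_accuracy = val_accuracy
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(accuracies, patience):
    """Return how many epochs would run for a given sequence of validation accuracies."""
    stopper = EarlyStopping(patience)
    for epoch, accuracy in enumerate(accuracies, start=1):
        if stopper.step(accuracy):
            return epoch
    return len(accuracies)
```

A larger patience tolerates longer plateaus in validation accuracy at the cost of extra training epochs.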
Converting to ONNX
You may check the Toolchain manual for details on converting a PyTorch model to an ONNX model. Let's go through an example of converting the FP_classifier PyTorch model to an ONNX model.
Execute commands in the folder classification:
python pytorch2onnx.py --backbone backbone_name --num_classes the_number_of_classes --snapshot pytorch_model_path --save-path onnx_model_path
--save-path path to save the onnx model.
--backbone which backbone model to use. Options: see Models(#Models).
--num_classes the number of classes.
--model-def-path path to pretrained model definition
--snapshot path to the pretrained model.
This produces the ONNX model converted from the PyTorch model.
Then, execute commands in the folder ONNX_Convertor/optimizer_scripts:
(reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts)
python pytorch2onnx.py onnx_model_path onnx_model_convert_path
This produces the converted ONNX model.
Inference
In this section, we will use a trained network for inference. The script inference.py takes an image and predicts its class label, returning the top K most likely classes along with their probabilities.
For inference on an image, run:
python inference.py --gpu -1 --backbone backbone_name --model-def-path path_to_model_definition_folder --snapshot path_to_pretrained_model_weights --img-path path_to_image
--gpu which gpu to run. (-1 if cpu)
--backbone which backbone model to use. Options: see Models(#Models).
--model-def-path path to pretrained model definition folder. (Default: './models/')
--snapshot path to the pretrained model. (Default: None)
--img-path Path to the image.
--class_id_path path to the class id mapping file. (Default: './eval_utils/class_id.json')
--save-path path to save the classification result. (Default: 'inference_result.json')
--onnx whether to run inference with an ONNX model.
You can find the preprocessing and postprocessing steps in inference.py.
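As an illustration of the postprocessing idea, the sketch below turns raw model outputs into the top K classes with probabilities. It assumes the model emits one logit per class and that probabilities come from a softmax; the actual steps in inference.py may differ, and the function names here are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, k=5):
    """Return the k most likely (class_name, probability) pairs, most likely first."""
    probabilities = softmax(logits)
    ranked = sorted(zip(class_names, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For the bees and ants example, top_k(logits, ["ants", "bees"], k=1) would return the single most likely class and its probability.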
Evaluation
Evaluation Metric
We will consider top-K score, precision, recall and F1 score for evaluating our model. You can find the script for computing these metrics in eval_utils/eval.py.
top-K score: This metric computes the number of times where the correct label is among the top k labels predicted (ranked by predicted scores). Note that the multilabel case isn’t covered here.
precision: The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0.
recall: The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples. The best value is 1 and the worst value is 0.
F1 score: The F1 score is the harmonic mean of the precision and recall; it reaches its best value at 1 and its worst at 0. Precision and recall contribute equally to the F1 score. The formula for the F1 score is:
F1 = 2 * (precision * recall) / (precision + recall).
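The metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the code in eval_utils/eval.py: the function names are hypothetical, the top-K score is reported as a fraction of samples rather than a raw count, and precision, recall and F1 are computed for a single chosen positive class.

```python
def top_k_score(true_labels, ranked_predictions, k=1):
    """Fraction of samples whose true label is among the first k predicted labels.

    ranked_predictions[i] lists the predicted labels for sample i, best first.
    """
    hits = sum(1 for truth, ranked in zip(true_labels, ranked_predictions)
               if truth in ranked[:k])
    return hits / len(true_labels)


def precision_recall_f1(true_labels, predicted_labels, positive_class):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive_class and p == positive_class)
    fp = sum(1 for t, p in pairs if t != positive_class and p == positive_class)
    fn = sum(1 for t, p in pairs if t == positive_class and p != positive_class)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

With four samples, two true positives, one false positive and no false negatives for the class bees, this gives a precision of 2/3, a recall of 1 and an F1 score of 0.8.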
Evaluation on a dataset
In this section, we will go through evaluating a trained network on a dataset. Here, we are going to evaluate a pretrained model on the validation set of the custom dataset. The ./eval_utils/eval.py will report the top-K score, precision, recall and F1 score for the model evaluated on a testing dataset. The evaluation statistics will be saved to eval_results.txt.
python eval_utils/eval.py --gpu -1 --backbone backbone_name --snapshot path_to_pretrained_model_weights --model-def-path path_to_model_definition_folder --data-dir path_to_dataset_folder
--gpu which gpu to run. (-1 if cpu)
--backbone which backbone model to use. Options: see Models(#Models).
--model-def-path path to pretrained model definition folder. (Default: './models/')
--snapshot path to the pretrained model weight. (Default: None)
--data-dir path to dataset folder. (Default: None)
End-to-End Evaluation
For end-to-end testing, we expect that the prediction results are saved into json files, one json file for one image, with the following format:
{"img_path": image_path,
"0_0":[[score, label], [score, label], ...]
}
The prediction json files for all images are expected to be saved under the same folder. The ground truth json file is expected to have the following format:
{image1_path: label,
image2_path: label,
...
}
To compute the evaluation statistics, execute commands in the folder classification:
python eval_utils/eval.py --preds path_to_predicted_results --gts path_to_ground_truth
--preds path to predicted results. (e2e eval)
--gts path to ground truth. (e2e eval)
The evaluation statistics will be saved to eval_results.txt.
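To make the two file formats concrete, the following sketch reads a folder of prediction json files and a ground truth json file in the formats above and computes a top-K accuracy. It is not the implementation in eval_utils/eval.py. It assumes the prediction files end in .json, that the ground truth file is stored outside the prediction folder, and that the "0_0" key holds the [score, label] pairs as shown; the function names are hypothetical.

```python
import json
from pathlib import Path


def load_predictions(preds_dir):
    """Map each image path to its predicted labels, highest score first."""
    predictions = {}
    for json_file in sorted(Path(preds_dir).glob("*.json")):
        with open(json_file) as handle:
            record = json.load(handle)
        # Each entry under "0_0" is a [score, label] pair.
        ranked = sorted(record["0_0"], key=lambda pair: pair[0], reverse=True)
        predictions[record["img_path"]] = [label for _score, label in ranked]
    return predictions


def top_k_accuracy(preds_dir, gts_path, k=1):
    """Fraction of ground truth images whose label is in the top k predictions."""
    with open(gts_path) as handle:
        ground_truth = json.load(handle)
    predictions = load_predictions(preds_dir)
    # Images without a prediction file count as misses.
    hits = sum(1 for image_path, label in ground_truth.items()
               if label in predictions.get(image_path, [])[:k])
    return hits / len(ground_truth)
```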
Models
| Model | Input Size | FPS on 520 | FPS on 720 | Model Size |
|---|---|---|---|---|
| FP_classifier | 56x32 | 323.471 | 3370.47 | 5.1M |
| mobilenetv2 | 224x224 | 58.9418 | 620.677 | 14M |
| resnet18 | 224x224 | 20.4376 | 141.371 | 46.9M |
| resnet50 | 224x224 | 6.32576 | 49.0828 | 102.9M |
| efficientnet-b0 | 224x224 | 42.3118 | 157.482 | 18.6M |
| efficientnet-b1 | 224x224 | 28.0051 | 110.907 | 26.7M |
| efficientnet-b2 | 224x224 | 24.164 | 101.598 | 31.1M |
| efficientnet-b3 | 224x224 | 18.4925 | 71.9006 | 41.4M |
| efficientnet-b4 | 224x224 | 12.1506 | 52.3374 | 64.7M |
| efficientnet-b5 | 224x224 | 7.7483 | 35.4869 | 100.7M |
| efficientnet-b6 | 224x224 | 4.96453 | 26.5797 | 141.9M |
| efficientnet-b7 | 224x224 | 3.35853 | 17.9795 | 217.4M |
Note that for EfficientNet, Squeeze-and-Excitation layers are removed and Swish function is replaced by ReLU.
FP_classifier is a pretrained model for classifying person and background images. The class id label mapping file is saved as ./eval_utils/person_class_id.json.
| Accuracy | FP_classifier | mobilenetv2 | resnet18 | resnet50 |
|---|---|---|---|---|
| Rank 1 | 94.13% | 69.82% | 66.46% | 72.80% |
| Rank 5 | - | 89.29% | 87.09% | 90.91% |
Resnet50 is currently under training for Kneron preprocessing.

