Image Classification

The tutorial explores the basis of image classification task. This document contains the explanations of arguments of each script. You can find the tutorial for finetuning a pretrained model on custom dataset under the `tutorial` folder, `tutorial/README.md`. The ipython notebook tutorial is also prepared under the `tutorial` folder as `tutorial/tutorial.ipynb`. You may upload and run this ipython notebook on Google colab. Image Classification is a fundamental task that attempts to classify the image by assigning it to a specific label. Our AI training platform provides the training script to train a classification model for image classification task. # Prerequisites First of all, we have to install the libraries. Python 3.6 or above is required. For other libraries, you can check the `requirements.txt` file. Installing these packages is simple. You can install them by running: ``` pip install -r requirements.txt ``` # Dataset & Preparation Next, we need a dataset for the training model. ## Custom Datasets You can train the model on a custom dataset. Your own datasets are expected to have the following structure: ```shell - Dataset name -- train --- Class1 --- Class2 -- val --- Class1 --- Class2 ``` ## Example Let's go through a toy example for preparing a custom dataset. Suppose we are going to classify bees and ants.
First of all, we have to split the images for bees and ants into train and validation set respectively (recommend 8:2). Then, we can move the images into difference folders with their class names. The dataset folder will have the following structure. ```shell - image data -- train --- ants --- bees -- val --- ants --- bees ``` Now, we have finished preparing the dataset. # Train Let's look at how to train or finetune a model. There are several backbone models and arguments to choose. You can find the FPS results of these backbone models evaluated on 520 and 720 in the next section. For training on a custom dataset, run: ```shell python train.py --gpu -1 --backbone backbone_name --model-def-path path_to_model_definition_folder --snapshot path_to_pretrained_model_weights path_to_dataset_folder ``` `--gpu` which gpu to run. (-1 if cpu) `--workers` the number of dataloader workers. (Default: 1) `--backbone` which backbone model to use. Options: see Models(#Models). `--freeze-backbone` whether freeze the backbone when the pretrained model is used. (Default: 0) `--early-stop` whether early stopping when validation accuracy increases. (Default: 1) `--patience` patience for early stopping. (Default: 7) `--model-name` name of your model. `--lr` learning rate. (Default: 1e-3) `--model-def-path` path to pretrained model definition folder. (Default: './models/') `--snapshot` path to the pretrained model. (Default: None) `--epochs` number of epochs to train. (Default: 100) `--batch-size` size of the batches. (Default: 64) `--snapshot-path` path to store snapshots of models during training. (Default: 'snapshots/{}'.format(today)) `--optimizer` optimizer for training. Options: SGD, ASGD, ADAM. (Default: SGD) `--loss` loss function. Options: cross_entropy. (Default: cross_entropy) # Converting to ONNX You may check the [Toolchain manual](http://doc.kneron.com/docs/#toolchain/manual/) for converting PyTorch model to ONNX model. Let's go through an example for converting FP_classifier PyTorch model to ONNX model. Execute commands in the folder `classification`: ```shell python pytorch2onnx.py --backbone backbone_name --num_classes the_number_of_classes --snapshot pytorch_model_path --save-path onnx_model_path ``` `--save-path` path to save the onnx model. `--backbone` which backbone model to use. Options: see Models(#Models). `--num_classes` the number of classes. `--model-def-path` path to pretrained model definition `--snapshot` path to the pretrained model. We could get pytorch to onnx model. Then, execute commands in the folder `ONNX_Convertor/optimizer_scripts`: (reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts) ```shell python pytorch2onnx.py onnx_model_path onnx_model_convert_path ``` We could get converted onnx model. # Inference In this section, we will go through using a trained network for inference. That is, we will use the function `inference.py` that takes an image and predict the class label for the image. `inference.py` returns the top $K$ most likely classes along with the probabilities. For inference on a image, run: ```shell python train.py --gpu -1 --backbone backbone_name --model-def-path path_to_model_definition_folder --snapshot path_to_pretrained_model_weights path_to_dataset_folder ``` `--gpu` which gpu to run. (-1 if cpu) `--backbone` which backbone model to use. Options: see Models(#Models). `--model-def-path` path to pretrained model definition folder. (Default: './models/') `--snapshot` path to the pretrained model. (Default: None) `--img-path` Path to the image. `--class_id_path` path to the class id mapping file. (Default: './eval_utils/class_id.json') `--save-path` path to save the classification result. (Default: 'inference_result.json') `--onnx` whether inference onnx model You could find preprocessing and postprocessing processes in `inference.py`. # Evaluation ## Evaluation Metric We will consider `top-K score`, `precision`, `recall` and `F1 score` for evaluating our model. You can find the script for computing these metrics in `eval_utils/eval.py`. `top-K score`: This metric computes the number of times where the correct label is among the top k labels predicted (ranked by predicted scores). Note that the multilabel case isn’t covered here. `precision`: The precision is the ratio `tp / (tp + fp)` where `tp` is the number of true positives and `fp` the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0. `recall`: The recall is the ratio `tp / (tp + fn)` where `tp` is the number of true positives and `fn` the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples. The best value is 1 and the worst value is 0. `F1 score`: The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The relative contribution of precision and recall to the F1 score are equal. The formula for the F1 score is: `F1 = 2 * (precision * recall) / (precision + recall)`. ## Evaluation on a dataset In this section, we will go through evaluating a trained network on a dataset. Here, we are going to evaluate a pretrained model on the validation set of the custom dataset. The `./eval_utils/eval.py` will report the top-K score, precision, recall and F1 score for the model evaluated on a testing dataset. The evaluation statistics will be saved to `eval_results.txt`. ```shell python eval_utils/eval.py --gpu -1 --backbone backbone_name --snapshot path_to_pretrained_model_weights --model-def-path path_to_model_definition_folder --data-dir path_to_dataset_folder ``` `--gpu` which gpu to run. (-1 if cpu) `--backbone` which backbone model to use. Options: see Models(#Models). `--model-def-path` path to pretrained model definition folder. (Default: './models/') `--snapshot` path to the pretrained model weight. (Default: None) `--data-dir` path to dataset folder. (Default: None) ## End-to-End Evaluation For end-to-end testing, we expect that the prediction results are saved into json files, one json file for one image, with the following format: ```bash {"img_path": image_path, "0_0":[[score, label], [score, label], ...] } ``` The prediction json files for all images are expected to saved under the same folder. The ground truth json file is expected to have the following format: ```bash {image1_path: label, image2_path: label, ... } ``` To compute the evaluation statistics, execute commands in the folder `classification`: ```shell python eval_utils/eval.py --preds path_to_predicted_results --gts path_to_ground_truth ``` `--preds` path to predicted results. (e2e eval) `--gts` path to ground truth. (e2e eval) The evaluation statistics will be saved to `eval_results.txt`. # Models Model | Input Size | FPS on 520 | FPS on 720 | Model Size --- | :---: |:---:|:---:|:---: [FP_classifier](https://github.com/kneron/Model_Zoo/tree/main/classification/FP_classifier)| 56x32 | 323.471 | 3370.47 | 5.1M [mobilenetv2](https://github.com/kneron/Model_Zoo/tree/main/classification/MobileNetV2)| 224x224 | 58.9418 | 620.677 | 14M [resnet18](https://github.com/kneron/Model_Zoo/tree/main/classification/ResNet18)| 224x224 | 20.4376 | 141.371 | 46.9M [resnet50](https://github.com/kneron/Model_Zoo/tree/main/classification/ResNet50)| 224x224 | 6.32576 | 49.0828 | 102.9M efficientnet-b0| 224x224 | 42.3118 | 157.482 | 18.6M efficientnet-b1| 224x224 | 28.0051 | 110.907 | 26.7M efficientnet-b2| 224x224 | 24.164 | 101.598 | 31.1M efficientnet-b3| 224x224 | 18.4925 | 71.9006 | 41.4M efficientnet-b4| 224x224 | 12.1506 | 52.3374 | 64.7M efficientnet-b5| 224x224 | 7.7483 | 35.4869 | 100.7M efficientnet-b6| 224x224 | 4.96453 | 26.5797 | 141.9M efficientnet-b7| 224x224 | 3.35853 | 17.9795 | 217.4M Note that for EfficientNet, Squeeze-and-Excitation layers are removed and Swish function is replaced by ReLU. FP_classifier is a pretrained model for classifying person and background images. The class id label mapping file is saved as `./eval_utils/person_class_id.json`. \ | FP_classifier | mobilenetv2 | resnet18 | resnet50 --- | :---: | :---: | :---: | :---: Rank 1 | 94.13% | 69.82% | 66.46% | 72.80% Rank 5 | - | 89.29% | 87.09% | 90.91% Resnet50 is currently under training for Kneron preprocessing.