355 lines
21 KiB
Markdown
355 lines
21 KiB
Markdown
<h1 align="center"> Object Detection </h1>
|
|
|
|
The tutorial explores the basis of object detection task. In this document, we will go through a concrete example of how to train an image detection model via our AI training platform. The COCO128 dataset is provided.
|
|
|
|
Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Specifically, object detection algorithm predicts a bounding box with a category label for each instance of interest in an image. Our AI training platform provides the training script to train a FCOS model for object detection task.
|
|
|
|
# Prerequisites
|
|
Firstly, we have to install the libraries. Python 3.6 or above is required. For other libraries, you can check the `requirements.txt` file. Installing these packages is simple. You can install them by running:
|
|
```
|
|
$ pip install -U pip
|
|
$ pip install -r requirements.txt
|
|
$ python setup.py build_ext --inplace
|
|
```
|
|
|
|
# Dataset & Preparation
|
|
Next, we need a dataset for the training model. For this tutorial, we use COCO128 dataset.
|
|
|
|
## Custom Datasets
|
|
|
|
Let's go through a toy example for preparing a custom dataset. Please note that it is for pedagogical use only. The data preparation for FCOS is similar to preparation for YOLOv5. The images are stored under the `yolov5/coco128` folder. We used [makesense.ai](https://www.makesense.ai) to annotate the images.
|
|
|
|
(1) Upload images to [makesense.ai](https://www.makesense.ai) and select Object Detection option.
|
|
<div align="center">
|
|
<img src="./screenshots/make_sense_upload.jpg" width="40%" /> <img src="./screenshots/make_sense_det.jpg" width="40%" />
|
|
</div>
|
|
|
|
(2) Create labels, and then draw the bounding boxes and choose labels for each image.
|
|
<div align="center">
|
|
<img src="./screenshots/make_sense_label.jpg" width="30%" /> <img src="./screenshots/make_sense_img001.jpg" width="30%" /> <img src="./screenshots/make_sense_img002.jpg" width="30%" />
|
|
</div>
|
|
|
|
(3) Export the annotations with YOLO format.
|
|
<div align="center">
|
|
<img src="./screenshots/make_sense_out.jpg" width="40%" /> <img src="./screenshots/make_sense_export.jpg" width="40%" />
|
|
</div>
|
|
|
|
(4) Eventually, you should get *.txt file for each image. (if no objects in image, no *.txt file is created)
|
|
<div align="center">
|
|
<img src="./screenshots/make_sense_final.jpg" width="40%" />
|
|
</div>
|
|
|
|
## Directory Organization
|
|
The COCO128 datasets are expected to have the following structure.
|
|
|
|
```shell
|
|
- coco128
|
|
|
|
-- images
|
|
--- train2017
|
|
---- img001.jpg
|
|
---- img002.jpg
|
|
---- ...
|
|
--- val
|
|
---- img003.jpg
|
|
---- ...
|
|
|
|
-- labels2017
|
|
--- train
|
|
---- img001.txt
|
|
---- img002.txt
|
|
---- ...
|
|
--- val
|
|
---- img003.txt
|
|
---- ...
|
|
```
|
|
## dataset.yaml
|
|
You need to prepare the yaml file and save it under `./data/`. The yaml file for this coco128 dataset is expected to have the following format:
|
|
```shell
|
|
train: ../yolov5/coco128/images/train2017 # 128 images
|
|
val: ../yolov5/coco128/images/train2017 # 128 images
|
|
|
|
nc: 80
|
|
|
|
# class names
|
|
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
|
|
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
|
|
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
|
|
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
|
|
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
|
|
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
|
|
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
|
|
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
|
|
'hair drier', 'toothbrush']
|
|
```
|
|
|
|
# Train
|
|
Let's look at how to train or finetune a model. There are several options and arguments to choose. We provided four types of backbone models with three kinds of input size and two types of FPN layers. You can find the FPS results of these models evaluated on 520 and 720 in the Models section.
|
|
|
|
let's finetune a pretrained model on COCO128 custom dataset (located under `yolov5` folder). The data yaml file is also prepared under `data` folder as `coco128.yaml`.
|
|
|
|
The pretrained model we used here is the model with backbone Darknet53s, pan FPN type, trained on COCO dataset. We download the pretrained model from [Model_Zoo](https://github.com/kneron/Model_Zoo/tree/main/detection/fcos). Since COCO128 is small, we choose to freeze the pretrained model backbone. Execute commands:
|
|
```shell
|
|
wget https://raw.githubusercontent.com/kneron/Model_Zoo/main/detection/fcos/coco_yolov5_pan_3_11_1.9920_0.4832.h5
|
|
```
|
|
|
|
```shell
|
|
python train.py --backbone darknet53s --fpn pan --snapshot coco_yolov5_pan_3_11_1.9920_0.4832.h5 --batch-size 4 --gpu 0 --steps 5 --epochs 2 --freeze-backbone --snapshot-path snapshots/exp --data data/coco128.yaml
|
|
```
|
|
|
|
The following training messages will be printed:
|
|
```shell
|
|
...
|
|
|
|
{'data': 'data/coco128.yaml', 'snapshot': 'coco_yolov5_pan_3_11_1.9920_0.4832.h5', 'backbone': 'darknet53s', 'fpn': 'pan', 'reg_func': 'linear', 'stage': 3, 'head_type': 'simple', 'centerness_pos': 'reg', 'batch_size': 4, 'gpu': '0', 'epochs': 2, 'steps': 5, 'lr': 0.0001, 'snapshot_path': 'snapshots/exp', 'freeze_backbone': True, 'input_size': 512, 'compute_val_loss': False}
|
|
|
|
...
|
|
|
|
Epoch 1/2
|
|
5/5 [==============================] - 15s 3s/step - loss: 2.9818 - regression_loss: 0.6292 - classification_loss: 1.1851 - centerness_loss: 0.6721
|
|
100% (126 of 126) |##################################################################################################################################################| Elapsed Time: 0:00:10 Time: 0:00:10
|
|
100% (126 of 126) |##################################################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
|
|
100% (80 of 80) |####################################################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
|
|
254 instances of class person with average precision: 0.6755
|
|
6 instances of class bicycle with average precision: 0.2104
|
|
46 instances of class car with average precision: 0.1860
|
|
5 instances of class motorcycle with average precision: 1.0000
|
|
6 instances of class airplane with average precision: 0.8333
|
|
7 instances of class bus with average precision: 0.6078
|
|
3 instances of class train with average precision: 1.0000
|
|
12 instances of class truck with average precision: 0.1598
|
|
6 instances of class boat with average precision: 0.3508
|
|
14 instances of class traffic light with average precision: 0.2176
|
|
0 instances of class fire hydrant with average precision: 0.0000
|
|
2 instances of class stop sign with average precision: 0.8333
|
|
0 instances of class parking meter with average precision: 0.0000
|
|
9 instances of class bench with average precision: 0.3277
|
|
16 instances of class bird with average precision: 0.7248
|
|
4 instances of class cat with average precision: 0.9500
|
|
9 instances of class dog with average precision: 0.5672
|
|
2 instances of class horse with average precision: 1.0000
|
|
0 instances of class sheep with average precision: 0.0000
|
|
0 instances of class cow with average precision: 0.0000
|
|
17 instances of class elephant with average precision: 0.8676
|
|
1 instances of class bear with average precision: 1.0000
|
|
4 instances of class zebra with average precision: 1.0000
|
|
9 instances of class giraffe with average precision: 0.8176
|
|
6 instances of class backpack with average precision: 0.4042
|
|
18 instances of class umbrella with average precision: 0.6162
|
|
19 instances of class handbag with average precision: 0.0699
|
|
7 instances of class tie with average precision: 0.4048
|
|
4 instances of class suitcase with average precision: 0.7750
|
|
5 instances of class frisbee with average precision: 0.7600
|
|
1 instances of class skis with average precision: 1.0000
|
|
7 instances of class snowboard with average precision: 0.8101
|
|
6 instances of class sports ball with average precision: 0.1667
|
|
10 instances of class kite with average precision: 0.0400
|
|
4 instances of class baseball bat with average precision: 0.1750
|
|
7 instances of class baseball glove with average precision: 0.0714
|
|
5 instances of class skateboard with average precision: 0.4107
|
|
0 instances of class surfboard with average precision: 0.0000
|
|
7 instances of class tennis racket with average precision: 0.3263
|
|
18 instances of class bottle with average precision: 0.1790
|
|
16 instances of class wine glass with average precision: 0.5150
|
|
36 instances of class cup with average precision: 0.3527
|
|
6 instances of class fork with average precision: 0.1807
|
|
16 instances of class knife with average precision: 0.4001
|
|
22 instances of class spoon with average precision: 0.2750
|
|
28 instances of class bowl with average precision: 0.5735
|
|
1 instances of class banana with average precision: 0.1111
|
|
0 instances of class apple with average precision: 0.0000
|
|
2 instances of class sandwich with average precision: 0.3333
|
|
4 instances of class orange with average precision: 0.4333
|
|
11 instances of class broccoli with average precision: 0.1912
|
|
24 instances of class carrot with average precision: 0.5854
|
|
2 instances of class hot dog with average precision: 0.7500
|
|
5 instances of class pizza with average precision: 0.9250
|
|
14 instances of class donut with average precision: 0.7913
|
|
4 instances of class cake with average precision: 0.9500
|
|
35 instances of class chair with average precision: 0.4005
|
|
6 instances of class couch with average precision: 0.4900
|
|
14 instances of class potted plant with average precision: 0.3834
|
|
3 instances of class bed with average precision: 0.7556
|
|
13 instances of class dining table with average precision: 0.4553
|
|
2 instances of class toilet with average precision: 0.5238
|
|
2 instances of class tv with average precision: 0.6667
|
|
3 instances of class laptop with average precision: 0.5556
|
|
2 instances of class mouse with average precision: 0.0000
|
|
8 instances of class remote with average precision: 0.5312
|
|
0 instances of class keyboard with average precision: 0.0000
|
|
8 instances of class cell phone with average precision: 0.0893
|
|
3 instances of class microwave with average precision: 0.8333
|
|
5 instances of class oven with average precision: 0.2833
|
|
0 instances of class toaster with average precision: 0.0000
|
|
6 instances of class sink with average precision: 0.1250
|
|
5 instances of class refrigerator with average precision: 0.8500
|
|
29 instances of class book with average precision: 0.1549
|
|
9 instances of class clock with average precision: 0.7222
|
|
2 instances of class vase with average precision: 0.4000
|
|
1 instances of class scissors with average precision: 0.1429
|
|
21 instances of class teddy bear with average precision: 0.5500
|
|
0 instances of class hair drier with average precision: 0.0000
|
|
5 instances of class toothbrush with average precision: 0.4310
|
|
mAP: 0.5106
|
|
|
|
Epoch 00001: mAP improved from -inf to 0.51057, saving model to snapshots/exp/csv_darknet53s_pan_3_01.h5
|
|
|
|
...
|
|
|
|
```
|
|
|
|
As we can see from the messages, the traininng losses and time are printed and the validation mAP were reported in each epoch. The trained model will be saved to `snapshots/exp` folder.
|
|
|
|
# Inference
|
|
In this section, we will go through an example of using a trained network for inference. That is, we'll pass an image into the network and detect and classify the object in the image. We will use the function `inference.py` that takes an image and a model, then returns the detection information. The output format is a list of list, [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id] ...]. We can also draw the bbox on the image if the save path is given.
|
|
|
|
Let's run our network on the following image, a screenshot from a movie, with the following code:
|
|
```shell
|
|
python inference.py --snapshot snapshots/exp/csv_darknet53s_pan_3_01.h5 --score-thres 0.6 --gpu 0 --class-id-path utils/coco_id_class_map.json --img-path tutorial/demo/fcos_demo.jpg --save-path tutorial/demo/out.jpg
|
|
|
|
...
|
|
|
|
{'img_path': 'tutorial/demo/fcos_demo.jpg', 'class_id_path': 'utils/coco_id_class_map.json', 'gpu': 0, 'snapshot': '/mnt/models/Model_Zoo/Backbone/FCOS/coco_yolov5_pan_3_11_1.9920_0.4832.h5', 'input_shape': [416, 416], 'max_objects': 100, 'score_thres': 0.6, 'iou_thres': 0.5, 'nms': 1, 'save_path': 'tutorial/demo/out.jpg'}
|
|
|
|
...
|
|
|
|
[[612.0043802261353, 158.87532234191895, 237.44566440582275, 855.7113647460938, 0.685798168182373, 0.0], [861.6132688522339, 120.55203437805176, 290.30023097991943, 876.4039421081543, 0.624030590057373, 0.0]]
|
|
|
|
```
|
|
Here we choose a model trained on COCO dataset and provide our class id mapping file. The original image and processed image are shown below.
|
|
<div align="center">
|
|
<img src="./demo/fcos_demo.jpg" width="45%" /> <img src="./demo/out.jpg" width="45%" />
|
|
</div>
|
|
|
|
We also get the inference result saved in a json file located under the same folder as the input image, `./tutorial/demo/fcos_demo_preds.json`.
|
|
|
|
Note that the class id mapping file is either a json or a csv file. The choice of class id mapping file depends on the dataset on which model was training. If the model was trained on COCO, it is ok to use `./utils/coco_id_class_map.json`. If the model was train on custom dataset, we shall use the csv file which was created during the dataset preparation process. You may check `./utils/coco_id_class_map.json` or `./tutorial/class_id.csv` for the format.
|
|
|
|
# Convert to ONNX
|
|
|
|
Pull the latest [ONNX converter](https://github.com/kneron/ONNX_Convertor/tree/master/keras-onnx) from github. You may read the latest document from Github for converting ONNX model. Execute commands in the folder `ONNX_Convertor/keras-onnx`:
|
|
(reference: https://github.com/kneron/ONNX_Convertor/tree/master/keras-onnx)
|
|
```shell
|
|
!git clone https://github.com/kneron/ONNX_Convertor.git
|
|
```
|
|
|
|
```shell
|
|
python ONNX_Convertor/keras-onnx/generate_onnx.py -o snapshots/exp/csv_darknet53s_pan_3_01_converted.onnx snapshots/exp/csv_darknet53s_pan_3_01.h5
|
|
```
|
|
|
|
The converted onnx model is `snapshots/exp/csv_darknet53s_pan_3_01_converted.onnx`.
|
|
|
|
# Evaluation
|
|
|
|
## Evaluation on COCO128
|
|
In this section, we will go through an example of evaluating a trained network on a dataset. Here, we will evaluate a pretrained model on MS COCO128 dataset. The `utils/eval.py` will report the mAP score for the model evaluated on the testing dataset.
|
|
|
|
```shell
|
|
python utils/eval.py --snapshot snapshots/exp/csv_darknet53s_pan_3_01.h5 --gpu 1 --conf-thres 0.15 --save-path bboxes_result.json --data data/coco128.yaml
|
|
|
|
{'data': 'data/coco128.yaml', 'gpu': 1, 'snapshot': 'snapshots/exp/csv_darknet53s_pan_3_01.h5', 'input_shape': [512, 512], 'save_path': 'bboxes_result.json', 'detections_path': None, 'conf_thres': 0.15, 'iou_thres': 0.35}
|
|
|
|
100% (126 of 126) |##################################################################################################################################################| Elapsed Time: 0:00:10 Time: 0:00:10
|
|
100% (126 of 126) |##################################################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
|
|
100% (80 of 80) |####################################################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
|
|
254 instances of class person with average precision: 0.7489
|
|
6 instances of class bicycle with average precision: 0.1667
|
|
46 instances of class car with average precision: 0.2072
|
|
5 instances of class motorcycle with average precision: 0.6833
|
|
6 instances of class airplane with average precision: 0.9048
|
|
7 instances of class bus with average precision: 0.6321
|
|
3 instances of class train with average precision: 0.6667
|
|
12 instances of class truck with average precision: 0.0278
|
|
6 instances of class boat with average precision: 0.4184
|
|
14 instances of class traffic light with average precision: 0.2823
|
|
0 instances of class fire hydrant with average precision: 0.0000
|
|
2 instances of class stop sign with average precision: 0.8333
|
|
0 instances of class parking meter with average precision: 0.0000
|
|
9 instances of class bench with average precision: 0.2108
|
|
16 instances of class bird with average precision: 0.5609
|
|
4 instances of class cat with average precision: 0.5000
|
|
9 instances of class dog with average precision: 0.3657
|
|
2 instances of class horse with average precision: 0.5000
|
|
0 instances of class sheep with average precision: 0.0000
|
|
0 instances of class cow with average precision: 0.0000
|
|
17 instances of class elephant with average precision: 0.9606
|
|
1 instances of class bear with average precision: 1.0000
|
|
4 instances of class zebra with average precision: 1.0000
|
|
9 instances of class giraffe with average precision: 0.9181
|
|
6 instances of class backpack with average precision: 0.0333
|
|
18 instances of class umbrella with average precision: 0.7202
|
|
19 instances of class handbag with average precision: 0.1372
|
|
7 instances of class tie with average precision: 0.6458
|
|
4 instances of class suitcase with average precision: 0.6929
|
|
5 instances of class frisbee with average precision: 0.8000
|
|
1 instances of class skis with average precision: 1.0000
|
|
7 instances of class snowboard with average precision: 0.6264
|
|
6 instances of class sports ball with average precision: 0.1790
|
|
10 instances of class kite with average precision: 0.2250
|
|
4 instances of class baseball bat with average precision: 0.4076
|
|
7 instances of class baseball glove with average precision: 0.2857
|
|
5 instances of class skateboard with average precision: 0.4200
|
|
0 instances of class surfboard with average precision: 0.0000
|
|
7 instances of class tennis racket with average precision: 0.4389
|
|
18 instances of class bottle with average precision: 0.1204
|
|
16 instances of class wine glass with average precision: 0.6377
|
|
36 instances of class cup with average precision: 0.4123
|
|
6 instances of class fork with average precision: 0.0972
|
|
16 instances of class knife with average precision: 0.4170
|
|
22 instances of class spoon with average precision: 0.4226
|
|
28 instances of class bowl with average precision: 0.5763
|
|
1 instances of class banana with average precision: 0.0909
|
|
0 instances of class apple with average precision: 0.0000
|
|
2 instances of class sandwich with average precision: 0.0000
|
|
4 instances of class orange with average precision: 0.4688
|
|
11 instances of class broccoli with average precision: 0.2327
|
|
24 instances of class carrot with average precision: 0.6571
|
|
2 instances of class hot dog with average precision: 0.0000
|
|
5 instances of class pizza with average precision: 0.8000
|
|
14 instances of class donut with average precision: 0.7384
|
|
4 instances of class cake with average precision: 0.6250
|
|
35 instances of class chair with average precision: 0.4219
|
|
6 instances of class couch with average precision: 0.1056
|
|
14 instances of class potted plant with average precision: 0.4855
|
|
3 instances of class bed with average precision: 0.6667
|
|
13 instances of class dining table with average precision: 0.5797
|
|
2 instances of class toilet with average precision: 0.6667
|
|
2 instances of class tv with average precision: 0.6667
|
|
3 instances of class laptop with average precision: 0.3333
|
|
2 instances of class mouse with average precision: 0.0714
|
|
8 instances of class remote with average precision: 0.4332
|
|
0 instances of class keyboard with average precision: 0.0000
|
|
8 instances of class cell phone with average precision: 0.0774
|
|
3 instances of class microwave with average precision: 0.8333
|
|
5 instances of class oven with average precision: 0.4594
|
|
0 instances of class toaster with average precision: 0.0000
|
|
6 instances of class sink with average precision: 0.0000
|
|
5 instances of class refrigerator with average precision: 0.8000
|
|
29 instances of class book with average precision: 0.1919
|
|
9 instances of class clock with average precision: 0.8765
|
|
2 instances of class vase with average precision: 0.3333
|
|
1 instances of class scissors with average precision: 0.0000
|
|
21 instances of class teddy bear with average precision: 0.6512
|
|
0 instances of class hair drier with average precision: 0.0000
|
|
5 instances of class toothbrush with average precision: 0.4462
|
|
mAP: 0.4732
|
|
|
|
```
|
|
|
|
## End-to-End Evaluation
|
|
Let's look at an example of end-to-end inference on COCO128. First, we need an initial parameter yaml file for the inference runner:
|
|
```bash
|
|
checkpoint: snapshots/exp/csv_darknet53s_pan_3_01.h5 # trained model
|
|
|
|
input_h: 512
|
|
input_w: 512
|
|
max_objects: 100
|
|
score_thres: 0.6
|
|
iou_thres: 0.5
|
|
```
|
|
This initial parameter yaml file is saved as `utils/init_params.yaml`. Then, execute the following command under `fcos` directory:
|
|
```shell
|
|
python inference_e2e.py --img-path ../yolov5/coco128/images/train2017/ --params utils/init_params.yaml --save-path coco128_preds.json
|
|
```
|
|
The predictions are stored in `coco128_preds.json`.
|