
<h1 align="center"> Object Detection </h1>
Object detection with the FCOS model.
This document explains the arguments of each script.
You can find a tutorial for fine-tuning a pretrained model on a custom dataset in the `tutorial` folder, `tutorial/README.md`.
A Jupyter notebook version of the tutorial is also provided as `tutorial/tutorial.ipynb`; you may upload and run it on Google Colab.
# Prerequisites
- Python 3.6 or 3.7
# Installation
To install the dependencies, run
```shell
$ pip install -U pip
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace
```
# Dataset & Preparation
## Standard Datasets
Our training script accepts the standard PASCAL VOC and MS COCO datasets. You may download them using the following links:
- Download [2012 PASCAL VOC Dataset](http://host.robots.ox.ac.uk/pascal/VOC/)
- Download [2017 MS COCO Dataset](https://cocodataset.org/#download)
## Custom Datasets
You can also train the model on a custom dataset. The custom dataset is expected to follow the YOLO format; see the YOLOv5 documentation for more details.
### Annotation Tools
You can use [makesense.ai](https://www.makesense.ai) to create bounding boxes and labels for your images; see its documentation for more details. An example of annotating custom data with makesense.ai is also provided in the tutorial document.
### dataset.yaml
For the COCO dataset, you need to prepare the yaml file and save it as `./data/coco.yaml`. The yaml file is expected to have the following format:
```yaml
data_root: path to coco dataset directory
# type of dataset
dataset_type: coco
val_set_name: val2017
train_set_name: train2017
train_annotations_path: path to coco training annotations
val_annotations_path: path to coco validation annotations
```
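For example, assuming COCO 2017 is unpacked under `/data/coco` with the standard annotation layout (all paths below are illustrative, and they assume the annotation entries point at the instance json files):
```yaml
data_root: /data/coco
dataset_type: coco
val_set_name: val2017
train_set_name: train2017
train_annotations_path: /data/coco/annotations/instances_train2017.json
val_annotations_path: /data/coco/annotations/instances_val2017.json
```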
For the Pascal VOC dataset, you need to prepare the yaml file and save it as `./data/pascal.yaml`. The yaml file is expected to have the following format:
```yaml
data_root: path_to_voc_dataset/VOCdevkit/VOC2012
train: 'trainval'
val: 'val'
# type of dataset
dataset_type: pascal
```
For a custom dataset, you need to prepare the yaml file and save it under `./data/`. The yaml file is expected to have the following format (same as YOLOv5):
```yaml
train: path to training dataset directory
val: path to validation dataset directory
nc: number of classes
names: list of class names
```
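For instance, a minimal yaml for a hypothetical two-class dataset (all paths and class names below are illustrative):
```yaml
train: /data/my_dataset/images/train
val: /data/my_dataset/images/val
nc: 2
names: ['cat', 'dog']
```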
# Train
All outputs (log files and checkpoints) will be saved to the snapshot directory,
which is specified by `--snapshot-path`. For training, execute the following command in the `fcos` directory:
```shell
python train.py --backbone backbone_model_name --snapshot path_to_pretrained_model --freeze-backbone --batch-size 4 --gpu 0 --data path_to_data_yaml_file
```
- `--backbone` Which backbone model to use.
- `--snapshot` Path to the pretrained model.
- `--freeze-backbone` Whether to freeze the backbone when a pretrained model is used. (True/False)
- `--gpu` Which GPU to run on. (-1 for CPU)
- `--batch-size` Batch size. (Default: 4)
- `--epochs` Number of epochs to train. (Default: 100)
- `--steps` Number of steps per epoch. (Default: 5000)
- `--lr` Learning rate. (Default: 1e-4)
- `--fpn` The type of FPN module. Options: bifpn, dla, fpn, pan, simple. (Default: simple; simple or pan recommended)
- `--reg-func` The type of regression function. Options: exp, simple. (Default: simple)
- `--stage` The number of stages. Options: 3, 5. (Default: 3)
- `--head-type` The type of head. Options: ori, simple. (Default: simple)
- `--centerness-pos` Position of the centerness branch. Options: cls, reg. (Default: reg)
- `--snapshot-path` Path to store model snapshots during training. (Default: 'snapshots/{}'.format(today))
- `--input-size` Input size of the model. (Default: (512, 512))
- `--data` Path to the data yaml file.
When the validation mAP stops improving for 5 epochs, early stopping is triggered and training terminates.
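For example, to fine-tune a model with a resnet18 backbone on a custom dataset (the snapshot and yaml paths below are illustrative):
```shell
python train.py --backbone resnet18 --snapshot snapshots/pretrained_resnet18.h5 --freeze-backbone --batch-size 4 --gpu 0 --data data/my_dataset.yaml
```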
# Inference
For model inference on a single image:
```shell
python inference.py --snapshot path_to_pretrained_model --input-shape model_input_size --gpu 0 --class-id-path path_to_class_id_mapping_file --img-path path_to_image --save-path path_to_saved_image
```
- `--snapshot` Path to the pretrained model.
- `--gpu` Which GPU to run on. (-1 for CPU) (Default: -1)
- `--input-shape` Input shape of the model. (Default: (512, 512))
- `--class-id-path` Path to the class id mapping file. (Default: COCO class id mapping)
- `--img-path` Path to the image.
- `--save-path` Path to draw and save the image with bounding boxes.
- `--save-preds-path` Path to save the inference bbox results.
- `--max-objects` The maximum number of objects in the image. (Default: 100)
- `--score-thres` Score threshold for bounding boxes. (Default: 0.6)
- `--iou-thres` IoU threshold for NMS. (Default: 0.5)
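For example, running on one image with the default input shape (all paths below are illustrative):
```shell
python inference.py --snapshot snapshots/model.h5 --gpu 0 --class-id-path data/class_ids.txt --img-path samples/test.jpg --save-path samples/test_out.jpg
```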
You can find the preprocessing and postprocessing code in `fcos/utils/fcos_det_preprocess.py` and `fcos/utils/fcos_det_postprocess.py`.
# Convert to ONNX
Pull the latest [ONNX converter](https://github.com/kneron/ONNX_Convertor/tree/master/keras-onnx) from GitHub; you may read its latest documentation for details on converting models to ONNX. Execute the following command in the folder `ONNX_Convertor/keras-onnx`:
```shell
python generate_onnx.py -o outputfile.onnx inputfile.h5
```
# Evaluation
## Evaluation Metric
We will use mean Average Precision (mAP) for evaluation. You can find the script for computing mAP in `utils/eval.py`.
`mAP`: mAP is the average of Average Precision (AP). AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:
<img src="https://latex.codecogs.com/svg.image?AP&space;=&space;\sum_n&space;(R_n-R_{n-1})P_n&space;" title="AP = \sum_n (R_n-R_{n-1})P_n " />
where <img src="https://latex.codecogs.com/svg.image?P_n" title="P_n" /> and <img src="https://latex.codecogs.com/svg.image?R_n" title="R_n" /> are the precision and recall at the nth threshold. The mAP compares the ground-truth bounding boxes to the detected boxes and returns a score; the higher the score, the more accurate the model's detections.
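As a minimal sketch of this formula (illustrative only; the project's actual evaluation code lives in `utils/eval.py`):
```python
# Minimal sketch of AP = sum_n (R_n - R_{n-1}) * P_n over a precision-recall curve.

def average_precision(precisions, recalls):
    """Compute AP from precision/recall pairs ordered by increasing recall."""
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # weight each precision by the recall gain
        prev_recall = r
    return ap

# Example with a toy PR curve:
print(average_precision([1.0, 0.8, 0.6], [0.2, 0.5, 1.0]))  # 0.74
```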
## Evaluation on a Dataset
For evaluating the trained model on a dataset:
```shell
python utils/eval.py --snapshot path_to_pretrained_model --gpu 0 --input-shape model_input_size --data path_to_data_yaml_file
```
- `--snapshot` Path to the pretrained model.
- `--gpu` Which GPU to run on. (-1 for CPU) (Default: -1)
- `--input-shape` Input shape of the model. (Default: (512, 512))
- `--class-id-path` Path to the class id mapping file.
- `--data` Path to the data yaml file.
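For example, evaluating with the default input shape (paths below are illustrative):
```shell
python utils/eval.py --snapshot snapshots/model.h5 --gpu 0 --data data/my_dataset.yaml
```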
## End-to-End Evaluation
If you would like to perform an end-to-end test on an image dataset, you can use `inference_e2e.py` under the `fcos` directory to obtain the prediction results.
You have to prepare an initial parameter yaml file for the inference runner; you may check `utils/init_params.json` for the format.
```shell
python inference_e2e.py --img-path path_to_dataset_folder --params path_to_init_params_file --save-path path_to_save_json_file
```
- `--img-path` Path to the dataset directory.
- `--params` Path to the initial parameter yaml file for the inference runner.
- `--save-path` Path to save the predictions as a json file.
- `--gpu` GPU id. (-1 for CPU) (Default: -1)
The predictions will be saved into a json file with the following structure:
```json
[
  {
    "img_path": "image_path_1",
    "bbox": [[l, t, w, h, score, class_id], [l, t, w, h, score, class_id]]
  },
  {
    "img_path": "image_path_2",
    "bbox": [[l, t, w, h, score, class_id], [l, t, w, h, score, class_id]]
  },
  ...
]
```
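A short sketch of consuming this file (the file name and score threshold below are illustrative):
```python
import json

# Load the predictions produced by inference_e2e.py (file name is illustrative).
with open("predictions.json") as f:
    predictions = json.load(f)

# Each bbox is [l, t, w, h, score, class_id]; keep boxes above a score threshold.
for entry in predictions:
    strong = [b for b in entry["bbox"] if b[4] >= 0.6]
    print(entry["img_path"], len(strong), "boxes above threshold")
```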
# Models
Backbone | Input Size | FPN Type | FPS on KL520 | FPS on KL720 | Model Size
--- | --- | --- |:---:|:---: |:---:
darknet53s | 512 | simple | 5.96303 | 36.6844 | 25.3M
[darknet53s](https://github.com/kneron/Model_Zoo/tree/main/detection/fcos) | 416 | pan | 7.27369 | 48.8437 | 33.9M
darknet53ss | 416 | simple | 20.6361 | 136.093 | 6.9M
darknet53ss | 320 | simple | 33.9502 | 252.713 | 6.9M
resnet18 | 512 | simple | 5.75156 | 33.9144 | 25.2M
resnet18 | 416 | simple | 8.04252 | 52.9392 | 25.2M
resnet18 | 320 | simple | 13.0232 | 94.5782 | 25.2M
resnet18 | 512 | pan | 4.88634 | 30.1866 | 33.8M
resnet18 | 416 | pan | 6.8977 | 46.9993 | 33.8M
resnet18 | 320 | pan | 10.9281 | 82.4277 | 33.8M
Model | mAP
--- |:---:
darknet53s | 44.8%