Object Detection
Object detection with the FCOS model. This document explains the arguments of each script.
You can find the tutorial for finetuning a pretrained model on a custom dataset in the tutorial folder, tutorial/README.md.
An IPython notebook version is also provided in the tutorial folder as tutorial/tutorial.ipynb. You may upload and run this notebook on Google Colab.
Prerequisites
- Python 3.6 or 3.7
Installation
To install the dependencies, run
$ pip install -U pip
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace
Dataset & Preparation
Standard Datasets
Our training script accepts the standard PASCAL VOC and MS COCO datasets. You may download them using the following links:
- Download 2012 PASCAL VOC Dataset
- Download 2017 MS COCO Dataset
Custom Datasets
You can also train the model on a custom dataset. The custom dataset is expected to follow the YOLO format; you may visit the yolov5 documentation for more details.
Annotation Tools
You can use makesense.ai to create bounding boxes and labels for your images. For more details, you may visit makesense.ai and check its documentation. An example of using makesense.ai to annotate custom data is also provided in the tutorial document.
dataset.yaml
For the COCO dataset, you need to prepare the yaml file and save it as ./data/coco.yaml. The yaml file is expected to have the following format:
data_root: path to coco dataset directory
# type of dataset
dataset_type: coco
val_set_name: val2017
train_set_name: train2017
train_annotations_path: path to coco training annotations
val_annotations_path: path to coco validation annotations
For the Pascal VOC dataset, you need to prepare the yaml file and save it as ./data/pascal.yaml. The yaml file is expected to have the following format:
data_root: path_to_voc_dataset/VOCdevkit/VOC2012
train: 'trainval'
val: 'val'
# type of dataset
dataset_type: pascal
For a custom dataset, you need to prepare the yaml file and save it under ./data/. The yaml file is expected to have the following format (same as yolov5):
train: path to training dataset directory
val: path to validation dataset directory
nc: number of classes
names: list of class names
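As an illustration, a filled-in custom dataset yaml might look like the following (the paths, class count, and class names here are placeholders, not part of the repository):

```yaml
# ./data/custom.yaml  (hypothetical example)
train: /datasets/my_data/images/train   # directory of training images
val: /datasets/my_data/images/val       # directory of validation images
nc: 3                                   # number of classes
names: ['person', 'car', 'dog']         # one name per class id
```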
Train
All outputs (log files and checkpoints) will be saved to the snapshot directory,
which is specified by --snapshot-path. For training, execute the following command in fcos directory:
python train.py --backbone backbone_model_name --snapshot path_to_pretrained_model --freeze-backbone --batch-size 4 --gpu 0 --data path_to_data_yaml_file
--backbone Which backbone model to use.
--snapshot The path to the pretrained model.
--freeze-backbone Whether to freeze the backbone when a pretrained model is used. (True/False)
--gpu Which gpu to run. (-1 if cpu)
--batch-size Batch size. (Default: 4)
--epochs Number of epochs to train. (Default: 100)
--steps Number of steps per epoch. (Default: 5000)
--lr Learning rate. (Default: 1e-4)
--fpn The type of fpn model. Options: bifpn, dla, fpn, pan, simple (Default: simple) (Recommend: simple or pan)
--reg-func The type of regression function. Options: exp, simple (Default: simple)
--stage The num of stages. Options: 3, 5 (Default: 3)
--head-type The type of head. Options: ori, simple (Default: simple)
--centerness-pos Centerness branch position. Options: cls, reg (Default: reg)
--snapshot-path Path to store snapshots of models during training (Default: 'snapshots/{}'.format(today))
--input-size Input size of the model (Default: (512, 512))
--data The path to data yaml file
When the validation mAP stops increasing for 5 epochs, early stopping is triggered and the training process is terminated.
Inference
For model inference on a single image:
python inference.py --snapshot path_to_pretrained_model --input-shape model_input_size --gpu 0 --class-id-path path_to_class_id_mapping_file --img-path path_to_image --save-path path_to_saved_image
--snapshot The path to the pretrained model.
--gpu Which gpu to run. (-1 if cpu) (Default: -1)
--input-shape Input shape of the model (Default: (512, 512))
--class-id-path Path to the class id mapping file. (Default: COCO class id mapping)
--img-path Path to the image.
--save-path Path to draw and save the image with bbox.
--save-preds-path Path to save the inference bbox results.
--max-objects The maximum number of objects in the image. (Default: 100)
--score-thres The score threshold of bounding boxes. (Default: 0.6)
--iou-thres The IoU threshold for NMS. (Default: 0.5)
--nms Whether to use Non-Maximum Suppression. (Default: 1)
You can find the preprocessing and postprocessing code in fcos/utils/fcos_det_preprocess.py and fcos/utils/fcos_det_postprocess.py.
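The actual routines live in the files above; as a rough, self-contained sketch of the core postprocessing step (greedy NMS over score-sorted boxes, shown here with boxes as (x1, y1, x2, y2) corners, which may differ from the repository's internal format):

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.5):
    """Greedy NMS. boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,).
    Returns indices of the boxes to keep, highest score first."""
    order = scores.argsort()[::-1]          # sort by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the current box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # drop boxes that overlap the kept box too much
        order = rest[iou <= iou_thres]
    return np.array(keep)

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, iou_thres=0.5))  # box 1 suppressed (IoU 0.81 with box 0)
```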
Convert to ONNX
Pull the latest ONNX converter from GitHub; you may read its latest documentation there for converting models to ONNX. Execute the following command in the folder ONNX_Convertor/keras-onnx:
python generated_onnx.py -o outputfile.onnx inputfile.h5
Evaluation
Evaluation Metric
We use mean Average Precision (mAP) for evaluation. You can find the script for computing mAP in utils/eval.py.
mAP: mAP is the average of Average Precision (AP). AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

AP = sum_n (R_n - R_{n-1}) * P_n

where P_n and R_n are the precision and recall at the nth threshold. mAP compares the ground-truth bounding boxes to the detected boxes and returns a score. The higher the score, the more accurate the model is in its detections.
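As a sketch, this recall-weighted sum of precisions can be computed directly from sorted precision/recall arrays (the curve values below are a made-up toy example):

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP = sum_n (R_n - R_{n-1}) * P_n, with recalls sorted in increasing order."""
    recalls = np.concatenate(([0.0], recalls))  # R_{-1} = 0 for the first step
    return float(np.sum((recalls[1:] - recalls[:-1]) * precisions))

# toy precision-recall curve
r = np.array([0.2, 0.5, 1.0])
p = np.array([1.0, 0.8, 0.6])
print(average_precision(r, p))  # 0.2*1.0 + 0.3*0.8 + 0.5*0.6 ~= 0.74
```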
Evaluation on a Dataset
To evaluate the trained model on a dataset:
python utils/eval.py --snapshot path_to_pretrained_model --gpu 0 --input-shape model_input_size --data path_to_data_yaml_file
--snapshot Path to pretrained model
--gpu Which gpu to run. (-1 if cpu) (Default: -1)
--input-shape Input shape of the model (Default: (512, 512))
--class-id-path Path to the class id mapping file.
--data The path to data yaml file
End-to-End Evaluation
If you would like to perform an end-to-end test with an image dataset, you can use inference_e2e.py under the fcos directory to obtain the prediction results.
You have to prepare an initial parameter yaml file for the inference runner; you may check utils/init_params.json for the format.
python inference_e2e.py --img-path path_to_dataset_folder --params path_to_init_params_file --save-path path_to_save_json_file
--img-path Path to the dataset directory
--params Path to initial parameter yaml file for the inference runner
--save-path Path to save the prediction to a json file
--gpu GPU id (-1 if cpu) (Default: -1)
The predictions will be saved into a json file that has the following structure:
[
    {
        "img_path": "image_path_1",
        "bbox": [[l, t, w, h, score, class_id], [l, t, w, h, score, class_id]]
    },
    {
        "img_path": "image_path_2",
        "bbox": [[l, t, w, h, score, class_id], [l, t, w, h, score, class_id]]
    },
    ...
]
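A minimal sketch of reading predictions in this structure (the file name, image paths, and box values below are hypothetical, not produced by the repository):

```python
import json

# hypothetical prediction records in the structure described above
preds = [
    {"img_path": "images/0001.jpg",
     "bbox": [[10, 20, 50, 80, 0.91, 0], [200, 40, 30, 60, 0.72, 2]]},
]

with open("preds.json", "w") as f:
    json.dump(preds, f)

# each bbox row unpacks as (left, top, width, height, score, class_id)
with open("preds.json") as f:
    for rec in json.load(f):
        for l, t, w, h, score, cls in rec["bbox"]:
            print(rec["img_path"], l, t, w, h, score, cls)
```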
Models
| Backbone | Input Size | FPN Type | FPS on 520 | FPS on 720 | Model Size |
|---|---|---|---|---|---|
| darknet53s | 512 | simple | 5.96303 | 36.6844 | 25.3M |
| darknet53s | 416 | pan | 7.27369 | 48.8437 | 33.9M |
| darknet53ss | 416 | simple | 20.6361 | 136.093 | 6.9M |
| darknet53ss | 320 | simple | 33.9502 | 252.713 | 6.9M |
| resnet18 | 512 | simple | 5.75156 | 33.9144 | 25.2M |
| resnet18 | 416 | simple | 8.04252 | 52.9392 | 25.2M |
| resnet18 | 320 | simple | 13.0232 | 94.5782 | 25.2M |
| resnet18 | 512 | pan | 4.88634 | 30.1866 | 33.8M |
| resnet18 | 416 | pan | 6.8977 | 46.9993 | 33.8M |
| resnet18 | 320 | pan | 10.9281 | 82.4277 | 33.8M |
| \ | darknet53s |
|---|---|
| mAP | 44.8% |