Object Detection with YOLOv5

This tutorial walks through a concrete example of training a YOLOv5 object detection model on our AI training platform, using the provided COCO128 dataset.

Prerequisites

First, install the required libraries. Python>=3.8 is required; the other dependencies are listed in the requirements.txt file. Install them by running:

$ pip install -U pip
$ pip install -r requirements.txt

Dataset & Preparation

Next, we need a dataset to train the model. This tutorial uses the COCO128 dataset.

Annotations Format

After using a tool like CVAT, makesense.ai or Labelbox to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are:

  • One row per object.
  • Each row is in class x_center y_center width height format.
  • Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
  • Class numbers are zero-indexed (start from 0).
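
The normalization rule above can be sketched as follows. This is a minimal illustration, assuming the labeling tool gives you pixel-space corner coordinates; the function name to_yolo_row and the sample box are hypothetical and not part of the repository.

```python
def to_yolo_row(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into one YOLO label row."""
    # Box centre and size, still in pixels.
    x_center = (x_min + x_max) / 2.0
    y_center = (y_min + y_max) / 2.0
    width = x_max - x_min
    height = y_max - y_min
    # Normalize: x values by image width, y values by image height.
    values = (x_center / img_w, y_center / img_h, width / img_w, height / img_h)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("box lies outside the image")
    return "%d %.6f %.6f %.6f %.6f" % ((class_id,) + values)


# Hypothetical box on a 640x480 image, class 0.
row = to_yolo_row(0, 160, 120, 480, 360, img_w=640, img_h=480)
print(row)
```

Each returned string is one line of the *.txt file for that image.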

Here, let's go through a toy example for preparing the annotation files via makesense.ai.

(1) Upload images to makesense.ai and select the Object Detection option.

(2) Create labels, then draw the bounding boxes and choose a label for each box in every image.

(3) Export the annotations in YOLO format.

(4) You should end up with one *.txt file per image (no *.txt file is created for an image that contains no objects).

Directory Organization

Your own datasets are expected to have the following structure. We assume /coco128 is next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/.
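
The label lookup rule can be sketched in a few lines. This illustrates the rule as described and is not the code used by the repository; the function name img2label_path and the example paths are assumptions, and the sketch assumes forward-slash paths.

```python
import os


def img2label_path(img_path):
    """Map an image path to its label path by swapping the last /images/ for /labels/."""
    # rpartition splits on the LAST occurrence of the separator.
    head, sep, tail = img_path.rpartition("/images/")
    if not sep:
        raise ValueError("no /images/ component in %r" % img_path)
    # Replace the image extension with .txt.
    return head + "/labels/" + os.path.splitext(tail)[0] + ".txt"


print(img2label_path("../coco128/images/train2017/000000000009.jpg"))
# Only the last /images/ is replaced:
print(img2label_path("/data/images/set/images/a.png"))
```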

dataset.yaml

The yaml file for the COCO dataset has been prepared in ./data/coco.yaml. For a custom dataset, you need to prepare your own yaml file and save it under ./data/. The yaml file is expected to have the following format:
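
As a minimal sketch, assuming the file uses the same keys as upstream YOLOv5 dataset files (train, val, nc, names), the text of such a file could be generated as below. The helper name dataset_yaml_text, the paths, and the class names are hypothetical placeholders; check ./data/coco.yaml for the exact layout expected by this repository.

```python
def dataset_yaml_text(train, val, names):
    """Render dataset yaml text with the keys assumed from upstream YOLOv5."""
    # Simple quoting; class names containing a single quote are not handled.
    quoted = ", ".join("'%s'" % n for n in names)
    lines = [
        "train: %s" % train,
        "val: %s" % val,
        "",
        "# number of classes",
        "nc: %d" % len(names),  # derived from names so the two cannot disagree
        "",
        "# class names",
        "names: [%s]" % quoted,
    ]
    return "\n".join(lines) + "\n"


text = dataset_yaml_text(
    train="../my_dataset/images/train/",  # hypothetical path
    val="../my_dataset/images/val/",      # hypothetical path
    names=["cat", "dog"],                 # hypothetical classes
)
print(text)
```

Deriving nc from the names list avoids a mismatch between the class count and the class names.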

Train

Let's look at how to train or finetune a model. There are several options and arguments to choose from. We provide two types of backbone models: one for 520 (without upsampling) and one for 720 (with upsampling).

To train on a custom dataset, let's use the COCO128 dataset. Following the instructions in the dataset preparation section, we put the data folder /coco128 next to the /yolov5 directory and save coco128.yaml under the folder /yolov5/data/. We download the pretrained model from Model_Zoo. Suppose we would like to finetune a pretrained model for 520 and run just 2 epochs. Execute the following commands in the folder yolov5:

wget https://raw.githubusercontent.com/kneron/Model_Zoo/main/detection/yolov5/yolov5s-noupsample/best.pt
CUDA_VISIBLE_DEVICES='0' python train.py --data coco128.yaml --cfg yolov5s-noupsample.yaml --weights 'best.pt' --batch-size 2 --epochs 2

The trained model weights are saved to ./runs/train/exp/weights/best.pt.

Note that for video we use a 640w x 352h input so the model runs faster. COCO contains both tall and wide images, so a 640w x 640h input is a better choice for it.

Generating .npy for different model input

We can generate the .npy files for different model inputs by using yolov5_generate_npy.py. Execute the following command in the folder generate_npy:

python yolov5_generate_npy.py

This produces the *.npy files.
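
The grid file names used later in this tutorial (20_640x640.npy, 40_640x640.npy, 80_640x640.npy) are consistent with one grid per detection scale, where the grid size is the input size divided by the stride. The sketch below only works out those sizes; it assumes the usual YOLOv5 strides of 32, 16 and 8, which the tutorial does not state, and it does not reproduce what yolov5_generate_npy.py writes into the files. The function name grid_sizes is hypothetical.

```python
def grid_sizes(input_w, input_h, strides=(32, 16, 8)):
    """Return {stride: (grid_w, grid_h)} for each detection scale."""
    sizes = {}
    for s in strides:
        # Each input dimension must be a whole number of stride-sized cells.
        if input_w % s or input_h % s:
            raise ValueError(
                "input %dx%d is not divisible by stride %d" % (input_w, input_h, s)
            )
        sizes[s] = (input_w // s, input_h // s)
    return sizes


print(grid_sizes(640, 640))  # square input used for COCO128 in this tutorial
print(grid_sizes(640, 352))  # wider input mentioned for video
```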

Configure the paths yaml file

You need to create a yaml file that stores all the paths related to the trained models. You can use pretrained_paths_520.yaml and pretrained_paths_720.yaml under /yolov5/data/ as starting points and modify them. Here is the config for our model trained on COCO128, model_paths_520_coco128.yaml:

grid_dir: ../generate_npy/
grid20_path: ../generate_npy/20_640x640.npy
grid40_path: ../generate_npy/40_640x640.npy
grid80_path: ../generate_npy/80_640x640.npy

yolov5_dir: ./
path: ./runs/train/exp/weights/best.pt
yaml_path: ./models/yolov5s-noupsample.yaml
pt_path: ./yolov5s-noupsample-coco128.pt   # pytorch 1.4
onnx_export_file: ./yolov5s-noupsample-coco128.onnx

input_w: 640
input_h: 640
# number of classes
nc: 80
# class names
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
        'hair drier', 'toothbrush']

Save and Convert to ONNX

We have now trained the YOLOv5 model. This section walks through saving the trained model in a format supported by the ONNX converter and converting it to ONNX.

Exporting the ONNX model in the PyTorch 1.7 environment

We can convert the model to ONNX by using yolov5_export.py. Execute the following command in the folder yolov5:

python ../exporting/yolov5_export.py --data ../yolov5/data/model_paths_520_coco128.yaml

This produces yolov5s-noupsample-coco128.onnx under the folder yolov5.

Converting the ONNX model with the toolchain

Pull the latest ONNX converter from GitHub, then simplify the exported model and convert it with the pytorch2onnx.py script in ONNX_Convertor/optimizer_scripts. The commands below reference the script by its path inside the clone, so run them from the folder that contains both the clone and the exported .onnx file (reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts):

git clone https://github.com/kneron/ONNX_Convertor.git

python -m onnxsim yolov5s-noupsample-coco128.onnx yolov5s-noupsample-coco128.onnx

python ONNX_Convertor/optimizer_scripts/pytorch2onnx.py yolov5s-noupsample-coco128.onnx yolov5s-noupsample-coco128_convert.onnx

This produces yolov5s-noupsample-coco128_convert.onnx.

Inference

In this section, we go through an example of using a trained network for inference: we pass an image into the network, which detects and classifies the objects in the image. We assume that the model has already been converted to an ONNX model as in the previous section. We use the script inference.py, which takes an image and a model and returns the detection information. The output format is a list of lists, [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id] ...]. The bounding boxes can also be drawn on the image if a save path is given. The preprocessing and postprocessing code is under the folder exporting/yolov5/.

In this tutorial, we run our YOLOv5 model on 520. The model path information must first be saved in a yaml file, called pretrained_paths_520.yaml, under the folder data. Here, we can reuse the yaml file that was created when we converted the PyTorch model to ONNX.

For model inference on a single image, execute the following command in the folder yolov5. The output is as follows:

python inference.py --data data/model_paths_520_coco128.yaml --conf_thres 0.6 --img-path tutorial/demo/yolo_demo.jpg --save-path tutorial/demo/out.jpg

[[934.0, 183.0, 284.0, 751.0, 0.8913591504096985, 0.0], [670.0, 225.0, 224.0, 696.0, 0.8750525712966919, 0.0]]

Here we use a model trained on the COCO128 dataset; the class labels and pretrained model paths are defined in the yaml file data/model_paths_520_coco128.yaml. The original image and processed image are shown below.

Note that if the model was trained on a custom dataset, you have to modify the yaml file accordingly.

If you would like to use the ONNX model for inference, add the --onnx argument when you execute inference.py.
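
The detection list printed above can be post-processed with a few lines of Python. The sketch below assumes that l and t are the left and top pixel coordinates of the box and that w and h are its width and height, which the tutorial implies but does not spell out. The helper name ltwh_to_corners is hypothetical, and the short names list is only the start of the class list from the config.

```python
def ltwh_to_corners(det):
    """Convert [l, t, w, h, score, class_id] to (x1, y1, x2, y2, score, class_id)."""
    l, t, w, h, score, class_id = det
    # Bottom-right corner = top-left corner + size (assumed meaning of l, t, w, h).
    return (l, t, l + w, t + h, score, int(class_id))


# Output copied from the inference run above.
detections = [
    [934.0, 183.0, 284.0, 751.0, 0.8913591504096985, 0.0],
    [670.0, 225.0, 224.0, 696.0, 0.8750525712966919, 0.0],
]
names = ["person", "bicycle", "car"]  # first entries of the names list in the config

boxes = [ltwh_to_corners(d) for d in detections]
for x1, y1, x2, y2, score, class_id in boxes:
    print("%s %.2f (%.0f, %.0f)-(%.0f, %.0f)" % (names[class_id], score, x1, y1, x2, y2))
```

With the class list from the config, class_id 0 maps to person for both detections.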

Evaluation

In this section, we evaluate our trained model on the COCO128 dataset. Execute the following command in the folder yolov5. The output is as follows:

python test.py --weights runs/train/exp/weights/best.pt --verbose

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='data/coco128.yaml', device='cpu', exist_ok=False, img_size=640, iou_thres=0.65, name='exp', project='runs/test', save_conf=False, save_json=False, save_txt=False, single_cls=False, task='val', verbose=True, weights=['runs/train/exp/weights/best.pt'])
Using torch 1.7.0 CPU

Fusing layers... 
Model Summary: 164 layers, 6772285 parameters, 0 gradients
***cache_path ../coco128/labels/train2017.cache
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 9335.42it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|███████████████████████████████████████████████████████████████████████████| 4/4 [01:07<00:00, 16.95s/it]
                 all         128         929       0.284       0.562       0.492       0.307
              person         128         254        0.37       0.764       0.718       0.437
             bicycle         128           6       0.373         0.5        0.36       0.217
                 car         128          46       0.286       0.326       0.275       0.145
          motorcycle         128           5       0.433           1       0.962       0.701
            airplane         128           6       0.559       0.833       0.824       0.539
                 bus         128           7       0.412       0.714         0.7       0.588
               train         128           3       0.209       0.667       0.552       0.269
               truck         128          12       0.452       0.412       0.376       0.135
                boat         128           6       0.109       0.333       0.229      0.0458
       traffic light         128          14      0.0488      0.0714       0.096      0.0599
           stop sign         128           2       0.636           1       0.995       0.747
               bench         128           9       0.152       0.222       0.171      0.0814
                bird         128          16       0.459       0.562       0.538        0.28
                 cat         128           4       0.353           1       0.725       0.548
                 dog         128           9       0.532       0.667       0.632       0.422
               horse         128           2        0.31           1       0.995       0.473
            elephant         128          17       0.666       0.824        0.84       0.606
                bear         128           1       0.323           1       0.995       0.896
               zebra         128           4       0.721           1       0.995       0.921
             giraffe         128           9       0.459       0.889       0.928       0.551
            backpack         128           6       0.291       0.333       0.386       0.193
            umbrella         128          18       0.394         0.5       0.458       0.208
             handbag         128          19       0.101       0.105       0.112      0.0483
                 tie         128           7         0.3       0.714         0.6       0.355
            suitcase         128           4       0.672         0.5       0.697       0.193
             frisbee         128           5       0.315         0.8       0.665       0.416
                skis         128           1       0.103           1       0.498      0.0498
           snowboard         128           7       0.534       0.821       0.674        0.36
         sports ball         128           6       0.165         0.5       0.258       0.155
                kite         128          10       0.225         0.2       0.133      0.0334
        baseball bat         128           4       0.016       0.052       0.055      0.0275
      baseball glove         128           7      0.0989       0.286       0.292       0.146
          skateboard         128           5       0.323         0.4       0.376       0.259
       tennis racket         128           7       0.105       0.429       0.327       0.164
              bottle         128          18       0.202       0.611       0.372       0.214
          wine glass         128          16        0.22       0.438       0.397       0.252
                 cup         128          36       0.297       0.389       0.345       0.206
                fork         128           6      0.0841       0.167       0.177       0.135
               knife         128          16       0.301         0.5       0.408       0.143
               spoon         128          22       0.232       0.273        0.31        0.12
                bowl         128          28       0.393       0.714       0.591       0.393
              banana         128           1        0.13           1       0.332      0.0332
            sandwich         128           2       0.183       0.459       0.115       0.103
              orange         128           4       0.096        0.25       0.125      0.0856
            broccoli         128          11       0.107      0.0909       0.116      0.0998
              carrot         128          24       0.198       0.708       0.409       0.231
             hot dog         128           2       0.274           1       0.828       0.746
               pizza         128           5       0.588         0.6        0.66       0.473
               donut         128          14       0.249           1       0.858        0.66
                cake         128           4       0.388           1       0.788       0.547
               chair         128          35       0.174         0.6       0.331       0.156
               couch         128           6       0.367       0.667       0.678       0.403
        potted plant         128          14       0.249       0.571        0.49         0.3
                 bed         128           3       0.623       0.667       0.677       0.224
        dining table         128          13        0.26       0.538       0.449       0.289
              toilet         128           2      0.0943         0.5       0.497       0.397
                  tv         128           2       0.198           1       0.995       0.696
              laptop         128           3           0           0      0.0184      0.0111
               mouse         128           2           0           0           0           0
              remote         128           8       0.339         0.5       0.512        0.33
          cell phone         128           8      0.0833       0.125      0.0382      0.0208
           microwave         128           3       0.248           1       0.995       0.502
                oven         128           5       0.143         0.4       0.336       0.222
                sink         128           6       0.106       0.167      0.0876       0.078
        refrigerator         128           5        0.35         0.6       0.564       0.403
                book         128          29       0.143       0.138       0.139      0.0655
               clock         128           9       0.435       0.889       0.848       0.679
                vase         128           2      0.0816           1       0.995       0.846
            scissors         128           1           0           0      0.0524     0.00524
          teddy bear         128          21       0.495       0.514       0.522       0.249
          toothbrush         128           5         0.3         0.4        0.44       0.186
Speed: 243.3/124.9/368.3 ms inference/NMS/total per 640x640 image at batch-size 32
Results saved to runs/test/exp
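
The mAP@.5 column counts a prediction as correct when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5, and mAP@.5:.95 averages the result over IoU thresholds from 0.5 to 0.95. As a minimal sketch of the underlying measure (not the implementation used by test.py), IoU for two corner-format boxes can be computed like this; the function name box_iou and the sample boxes are illustrative.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that overlap by half: intersection 50, union 150.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```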