Lite-HRNet: A Lightweight High-Resolution Network
Introduction
In this tutorial, we provide an example of training and evaluating the Lite-HRNet model on the COCO dataset for the pose estimation task.
Installation
First, install the necessary libraries. Python 3.6+, CUDA 9.2+, GCC 5+, and PyTorch 1.3+ are required.
To install PyTorch, check your CUDA version and select the matching PyTorch build.
You can check your CUDA version by executing nvidia-smi in your terminal. For this tutorial, we install PyTorch 1.7.0 by executing:
$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
For the other libraries, see the requirements.txt file. You can install them all by running:
$ pip install -r requirements.txt
Moreover, we have to install mmcv-full 1.3.3 built for PyTorch 1.7.0 and CUDA 11.0:
$ pip install mmcv-full==1.3.3 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
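The find-links URL encodes your CUDA and PyTorch versions. As a small sketch, the helper below builds that URL from the two version strings; the URL pattern is an assumption generalized from the single command above, so verify it against the OpenMMLab install docs before relying on it.

```python
def mmcv_find_links(cuda_version: str, torch_version: str) -> str:
    """Build the mmcv-full find-links URL, e.g. for CUDA '11.0' and torch '1.7.0'.

    Assumes the download.openmmlab.com layout shown in the command above.
    """
    cu = "cu" + cuda_version.replace(".", "")  # '11.0' -> 'cu110'
    return (f"https://download.openmmlab.com/mmcv/dist/"
            f"{cu}/torch{torch_version}/index.html")

print(mmcv_find_links("11.0", "1.7.0"))
# -> https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```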
Dataset & Preparation
For this tutorial, we will use the MS COCO dataset.
You can download it from the COCO download page; the 2017 Train/Val images are needed for COCO keypoints training and validation.
HRNet-Human-Pose-Estimation provides person detection results on COCO val2017 to reproduce the multi-person pose estimation results. Please download them from OneDrive.
Download and extract everything under litehrnet/data so that the layout looks like this:
lite_hrnet
├── configs
├── models
├── tools
└── data
    └── coco
        ├── annotations
        │   ├── person_keypoints_train2017.json
        │   └── person_keypoints_val2017.json
        ├── person_detection_results
        │   └── COCO_val2017_detections_AP_H_56_person.json
        ├── train2017
        │   ├── 000000000009.jpg
        │   ├── 000000000025.jpg
        │   ├── 000000000030.jpg
        │   └── ...
        └── val2017
            ├── 000000000139.jpg
            ├── 000000000285.jpg
            ├── 000000000632.jpg
            └── ...
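A quick way to catch a misplaced download before training starts is to check the layout above programmatically. This is a minimal sketch (the path list mirrors the tree, and it assumes you run it from the lite_hrnet root):

```python
import os

# Paths mirror the directory tree above.
EXPECTED = [
    "data/coco/annotations/person_keypoints_train2017.json",
    "data/coco/annotations/person_keypoints_val2017.json",
    "data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json",
    "data/coco/train2017",
    "data/coco/val2017",
]

def check_layout(root="."):
    """Return the list of expected paths that are missing under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    for p in check_layout():
        print("missing:", p)
```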
Modify MMPose for Kneron PPP
To use Kneron pre/post-processing during training and testing, we replace some files in the mmpose package in the Python/Anaconda environment with the files in the mmpose_replacement folder, by executing:
$ cp mmpose_replacement/post_transforms.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/core/post_processing/post_transforms.py
$ cp mmpose_replacement/top_down_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/top_down_transform.py
$ cp mmpose_replacement/loading.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/loading.py
$ cp mmpose_replacement/shared_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/shared_transform.py
Moreover, we copy prepostprocess/kneron_preprocessing/ into our Python/Anaconda environment:
$ cp -r prepostprocess/kneron_preprocessing/ .
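The cp commands above hardcode one particular Anaconda path; your environment will live somewhere else. You can find the site-packages directory of whichever environment is currently active with the standard-library sysconfig module, then substitute it into the commands:

```python
import sysconfig

# Location of pure-Python packages (site-packages) for the active environment.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# The first mmpose file to overwrite would then live under, for example:
print(site_packages + "/mmpose/core/post_processing/post_transforms.py")
```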
Train
For this tutorial, we use the config file /litehrnet/configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py.
Before training our model, let's take a quick look at the basic settings in the config file:
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP')
optimizer = dict(
    type='Adam',
    lr=2e-3,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    # warmup=None,
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
Here, we train the model from scratch. If you would like to fine-tune a pretrained model, assign the checkpoint path to load_from or resume_from.
The difference between resume_from and load_from in the config file:
resume_from loads both the model weights and the optimizer state, and the epoch is also inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
load_from loads only the model weights, and training starts from epoch 0. It is usually used for fine-tuning.
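The distinction can be sketched with plain dictionaries standing in for the real torch state_dicts (the helper names and checkpoint layout here are hypothetical, purely for illustration):

```python
def load_from(checkpoint, model):
    """Fine-tuning: restore weights only; training restarts at epoch 0."""
    model["weights"] = checkpoint["state_dict"]
    return 0

def resume_from(checkpoint, model, optimizer):
    """Resuming: restore weights, optimizer state, and the saved epoch."""
    model["weights"] = checkpoint["state_dict"]
    optimizer["state"] = checkpoint["optimizer"]
    return checkpoint["meta"]["epoch"]

# Toy checkpoint with the three pieces a real one carries.
ckpt = {"state_dict": {"w": 1.0}, "optimizer": {"m": 0.9}, "meta": {"epoch": 120}}
model, opt = {}, {}
print(load_from(ckpt, model))          # epoch 0: fine-tuning from scratch epochs
print(resume_from(ckpt, model, opt))   # epoch 120: continue where training stopped
```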
We will save the model every 10 epochs (as stated in checkpoint_config), and validate the model every 10 epochs with metric mAP (as stated in evaluation).
The optimizer used here is Adam with a base learning rate of 2e-3, and a learning rate schedule is set up in lr_config.
The total number of epochs is 210.
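The schedule in lr_config means: ramp the learning rate linearly over the first 500 iterations, then decay it in steps at epochs 170 and 200. The sketch below reproduces that shape; the decay factor 0.1 is mmcv's default for the 'step' policy and the warmup formula follows mmcv's linear warmup, but both are assumptions here since the config does not spell them out.

```python
BASE_LR, WARMUP_ITERS, WARMUP_RATIO = 2e-3, 500, 0.001
STEPS, GAMMA = [170, 200], 0.1  # GAMMA assumed (mmcv default)

def lr_at(epoch, cur_iter):
    """Learning rate for a given epoch / global iteration under the config above."""
    if cur_iter < WARMUP_ITERS:
        # mmcv-style linear warmup from BASE_LR * WARMUP_RATIO up to BASE_LR
        k = (1 - cur_iter / WARMUP_ITERS) * (1 - WARMUP_RATIO)
        return BASE_LR * (1 - k)
    # step decay: multiply by GAMMA for each milestone already passed
    return BASE_LR * GAMMA ** sum(epoch >= s for s in STEPS)

print(lr_at(0, 0))         # start of warmup (BASE_LR * WARMUP_RATIO)
print(lr_at(100, 10_000))  # plateau at the base learning rate
print(lr_at(180, 10_000))  # after the first step at epoch 170
print(lr_at(205, 10_000))  # after the second step at epoch 200
```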
Here are the settings for the datasets in the config file:
data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    bbox_thr=1.0,
    use_gt_bbox=False,
    image_thr=0.0,
    bbox_file='data/coco/person_detection_results/'
    'COCO_val2017_detections_AP_H_56_person.json',
)
...
data_root = 'data/coco'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=4,
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=val_data_cfg,
        pipeline=val_pipeline),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
)
The input image size of the model is 192x256, and the corresponding hyper-parameters are specified in the data_cfg dictionary.
The data root is set in data_root as 'data/coco'. The batch size per GPU is 64 and the number of workers per GPU is 4 (in the data dictionary).
The annotation file paths are also stored in the data dictionary.
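Note the relation between image_size and heatmap_size above: the heatmap is 1/4 of the input resolution in each dimension, the usual downsampling ratio for top-down heatmap heads. A one-line check:

```python
image_size = [192, 256]                     # width, height, as in data_cfg
heatmap_size = [s // 4 for s in image_size]  # 4x downsampling in each dimension
print(heatmap_size)  # [48, 64], matching heatmap_size in data_cfg
```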
After setting up all the configurations, we are ready to train our model:
# train with a single GPU
python train.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py
All outputs (log files and checkpoints) will be saved to the working directory (default: /litehrnet/work_dirs/).
Convert to ONNX
To export an ONNX model, we have to modify the forward function in the mmpose package.
The specific file is site-packages/mmpose/models/detectors/top_down.py in your Python/Anaconda environment. You can run python -m site to locate your environment.
Change the forward function at line 81 from:
def forward(self,
            img,
            target=None,
            target_weight=None,
            img_metas=None,
            return_loss=True,
            return_heatmap=False,
            **kwargs):
    """Calls either forward_train or forward_test depending on whether
    return_loss=True. Note this setting will change the expected inputs.
    When `return_loss=True`, img and img_meta are single-nested (i.e.
    Tensor and List[dict]), and when `return_loss=False`, img and img_meta
    should be double nested (i.e. List[Tensor], List[List[dict]]), with
    the outer list indicating test time augmentations.

    Note:
        batch_size: N
        num_keypoints: K
        num_img_channel: C (Default: 3)
        img height: imgH
        img width: imgW
        heatmaps height: H
        heatmaps width: W

    Args:
        img (torch.Tensor[NxCximgHximgW]): Input images.
        target (torch.Tensor[NxKxHxW]): Target heatmaps.
        target_weight (torch.Tensor[NxKx1]): Weights across
            different joint types.
        img_metas (list(dict)): Information about data augmentation
            By default this includes:
            - "image_file: path to the image file
            - "center": center of the bbox
            - "scale": scale of the bbox
            - "rotation": rotation of the bbox
            - "bbox_score": score of bbox
        return_loss (bool): Option to `return loss`. `return loss=True`
            for training, `return loss=False` for validation & test.
        return_heatmap (bool): Option to return heatmap.

    Returns:
        dict|tuple: if `return loss` is true, then return losses.
            Otherwise, return predicted poses, boxes, image paths
            and heatmaps.
    """
    if return_loss:
        return self.forward_train(img, target, target_weight, img_metas,
                                  **kwargs)
    return self.forward_test(
        img, img_metas, return_heatmap=return_heatmap, **kwargs)
to (docstring unchanged, body replaced):
def forward(self,
            img,
            target=None,
            target_weight=None,
            img_metas=None,
            return_loss=True,
            return_heatmap=False,
            **kwargs):
    """(docstring unchanged)"""
    return self.forward_dummy(img)
Then, execute the following command under the directory litehrnet:
python export2onnx.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py work_dirs/litehrnet_30_coco_256x192/epoch_210.pth
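The reason the forward body must be replaced: ONNX export traces forward as a pure tensor-in, tensor-out computation, so the train/test dispatch (which depends on img_metas and the return_loss flag, i.e. Python-side control flow and non-tensor inputs) has to be bypassed in favor of the plain forward_dummy path. A toy illustration of the same pattern, with no torch dependency (the class and its toy op are purely hypothetical):

```python
class ToyTopDown:
    """Toy stand-in for a top-down pose model, to show the dispatch pattern."""

    def forward_dummy(self, img):
        # Backbone + head collapsed into one toy op: the traceable path.
        return [x * 2 for x in img]

    def forward(self, img, img_metas=None, return_loss=True):
        # The original mmpose forward branches on return_loss and needs
        # img_metas -- exactly what a tracer cannot capture.
        if return_loss:
            raise RuntimeError("training path: needs targets, not traceable")
        return self.forward_dummy(img)

model = ToyTopDown()
print(model.forward([1, 2, 3], return_loss=False))  # [2, 4, 6]
```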
Inference
We created initial parameter yaml files in the utils folder for each model runner. For model inference on a single image, execute the following command under the folder litehrnet; the results are printed below:
python inference.py --img-path ../../detection/fcos/tutorial/demo/fcos_demo.jpg --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'yolov5_params': 'utils/yolov5_init_params.yaml', 'rsn_affine_params': 'utils/rsn_affine_init_params.yaml', 'lite_hrnet_params': 'utils/lite_hrnet_init_params.yaml'}
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'lmk_coco_body_17pts': [[963.994140625, 270.521484375, 963.994140625, 255.560546875, 971.474609375, 248.080078125, 978.955078125, 255.560546875, 1016.357421875, 248.080078125, 993.916015625, 330.365234375, 1098.642578125, 315.404296875, 986.435546875, 442.572265625, 1165.966796875, 412.650390625, 956.513671875, 517.376953125, 1180.927734375, 532.337890625, 1016.357421875, 509.896484375, 1076.201171875, 517.376953125, 971.474609375, 689.427734375, 1068.720703125, 696.908203125, 1031.318359375, 861.478515625, 1106.123046875, 876.439453125], [828.330078125, 293.443359375, 821.279296875, 272.291015625, 828.330078125, 279.341796875, 786.025390625, 286.392578125, 771.923828125, 279.341796875, 757.822265625, 349.849609375, 750.771484375, 342.798828125, 764.873046875, 455.611328125, 764.873046875, 441.509765625, 835.380859375, 413.306640625, 828.330078125, 420.357421875, 757.822265625, 568.423828125, 736.669921875, 568.423828125, 786.025390625, 723.541015625, 764.873046875, 723.541015625, 743.720703125, 878.658203125, 743.720703125, 871.607421875]]}
End-to-End Evaluation
To perform an end-to-end test with an image dataset, use inference_e2e.py under the directory litehrnet to obtain the prediction results.
Here, yolov5 is used to detect person bounding boxes.
python inference_e2e.py --img-path /mnt/testdata/coco_val2017_resize_1280_720/ --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml --save-path coco_preds.json
The predictions will be saved into coco_preds.json with the following structure:
[
    {
        'img_path': image_path_1,
        'lmk_coco_body_17pts': [...]
    },
    {
        'img_path': image_path_2,
        'lmk_coco_body_17pts': [...]
    },
    ...
]
Note that your image paths have to match the image paths in the ground-truth JSON exactly.
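This matters because evaluation joins predictions to ground truth by image path; any mismatch silently drops those predictions. A minimal pre-evaluation sanity check (the prediction structure follows coco_preds.json above; the ground-truth path set is whatever you extract from your annotation file):

```python
def unmatched_paths(preds, gt_paths):
    """Return predicted img_paths that do not appear in the ground-truth set."""
    gt = set(gt_paths)
    return [p["img_path"] for p in preds if p["img_path"] not in gt]

# Toy example mirroring the coco_preds.json structure:
preds = [{"img_path": "val2017/000000000139.jpg", "lmk_coco_body_17pts": []}]
print(unmatched_paths(preds, {"val2017/000000000139.jpg"}))  # [] -> all matched
```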