Lite-HRNet: A Lightweight High-Resolution Network
Introduction
In this tutorial, we provide an example of training and evaluating the Lite-HRNet model on the COCO dataset for the pose estimation task.
Installation
First, install the necessary libraries. Python 3.6+, CUDA 9.2+, GCC 5+, and PyTorch 1.3+ are required.
To install PyTorch, check your CUDA version and select the matching PyTorch build.
You can check your CUDA version by executing nvidia-smi in your terminal. For this tutorial, we install PyTorch 1.7.0 by executing:
$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
For the other libraries, see the requirements.txt file. You can install them all by running:
$ pip install -r requirements.txt
Moreover, we have to install mmcv-full 1.3.3 built for PyTorch 1.7.0 and CUDA 11.0:
$ pip install mmcv-full==1.3.3 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
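The find-links URL encodes your CUDA and PyTorch versions. As a small sketch, the helper below builds that URL from the two version strings; the URL pattern is an assumption generalized from the single command above, so verify it against the OpenMMLab install docs before relying on it.

```python
def mmcv_find_links(cuda_version: str, torch_version: str) -> str:
    """Build the mmcv-full find-links URL, e.g. for CUDA '11.0' and torch '1.7.0'.

    Assumes the download.openmmlab.com layout shown in the command above.
    """
    cu = "cu" + cuda_version.replace(".", "")  # '11.0' -> 'cu110'
    return (f"https://download.openmmlab.com/mmcv/dist/"
            f"{cu}/torch{torch_version}/index.html")

print(mmcv_find_links("11.0", "1.7.0"))
# -> https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```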
Dataset & Preparation
For this tutorial, we will use the MS COCO dataset.
You can download it from the COCO download page; the 2017 Train/Val images are needed for COCO keypoints training and validation.
HRNet-Human-Pose-Estimation provides person detection results on COCO val2017 to reproduce the multi-person pose estimation results. Please download them from OneDrive.
Download and extract everything under litehrnet/data so that the layout looks like this:
lite_hrnet
├── configs
├── models
├── tools
└── data
    └── coco
        ├── annotations
        │   ├── person_keypoints_train2017.json
        │   └── person_keypoints_val2017.json
        ├── person_detection_results
        │   └── COCO_val2017_detections_AP_H_56_person.json
        ├── train2017
        │   ├── 000000000009.jpg
        │   ├── 000000000025.jpg
        │   ├── 000000000030.jpg
        │   └── ...
        └── val2017
            ├── 000000000139.jpg
            ├── 000000000285.jpg
            ├── 000000000632.jpg
            └── ...
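A quick way to catch a misplaced download before training starts is to check the layout above programmatically. This is a minimal sketch (the path list mirrors the tree, and it assumes you run it from the lite_hrnet root):

```python
import os

# Paths mirror the directory tree above.
EXPECTED = [
    "data/coco/annotations/person_keypoints_train2017.json",
    "data/coco/annotations/person_keypoints_val2017.json",
    "data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json",
    "data/coco/train2017",
    "data/coco/val2017",
]

def check_layout(root="."):
    """Return the list of expected paths that are missing under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    for p in check_layout():
        print("missing:", p)
```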
Modify MMPose for Kneron PPP
To use Kneron pre/post-processing during training and testing, we replace some files in the mmpose package in the Python/Anaconda environment with the files in the mmpose_replacement folder, by executing:
$ cp mmpose_replacement/post_transforms.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/core/post_processing/post_transforms.py
$ cp mmpose_replacement/top_down_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/top_down_transform.py
$ cp mmpose_replacement/loading.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/loading.py
$ cp mmpose_replacement/shared_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/shared_transform.py
Moreover, we copy prepostprocess/kneron_preprocessing/ into our Python/Anaconda environment:
$ cp -r prepostprocess/kneron_preprocessing/ .
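The cp commands above hardcode one particular Anaconda path; your environment will live somewhere else. You can find the site-packages directory of whichever environment is currently active with the standard-library sysconfig module, then substitute it into the commands:

```python
import sysconfig

# Location of pure-Python packages (site-packages) for the active environment.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# The first mmpose file to overwrite would then live under, for example:
print(site_packages + "/mmpose/core/post_processing/post_transforms.py")
```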
Train
For this tutorial, we use the config file /litehrnet/configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py.
Before training our model, let's take a quick look at the basic settings in the config file:
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP')
optimizer = dict(
    type='Adam',
    lr=2e-3,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    # warmup=None,
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
Here, we train the model from scratch. If you would like to fine-tune a pretrained model, assign the checkpoint path to load_from or resume_from.
The difference between resume_from and load_from in the config file:
resume_from loads both the model weights and the optimizer state, and the epoch is also inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
load_from loads only the model weights, and training starts from epoch 0. It is usually used for fine-tuning.
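The distinction can be sketched with plain dictionaries standing in for the real torch state_dicts (the helper names and checkpoint layout here are hypothetical, purely for illustration):

```python
def load_from(checkpoint, model):
    """Fine-tuning: restore weights only; training restarts at epoch 0."""
    model["weights"] = checkpoint["state_dict"]
    return 0

def resume_from(checkpoint, model, optimizer):
    """Resuming: restore weights, optimizer state, and the saved epoch."""
    model["weights"] = checkpoint["state_dict"]
    optimizer["state"] = checkpoint["optimizer"]
    return checkpoint["meta"]["epoch"]

# Toy checkpoint with the three pieces a real one carries.
ckpt = {"state_dict": {"w": 1.0}, "optimizer": {"m": 0.9}, "meta": {"epoch": 120}}
model, opt = {}, {}
print(load_from(ckpt, model))          # epoch 0: fine-tuning from scratch epochs
print(resume_from(ckpt, model, opt))   # epoch 120: continue where training stopped
```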
We will save the model every 10 epochs (as stated in checkpoint_config), and validate the model every 10 epochs with metric mAP (as stated in evaluation).
The optimizer used here is Adam with a base learning rate of 2e-3, and a learning rate schedule is set up in lr_config.
The total number of epochs is 210.
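The schedule in lr_config means: ramp the learning rate linearly over the first 500 iterations, then decay it in steps at epochs 170 and 200. The sketch below reproduces that shape; the decay factor 0.1 is mmcv's default for the 'step' policy and the warmup formula follows mmcv's linear warmup, but both are assumptions here since the config does not spell them out.

```python
BASE_LR, WARMUP_ITERS, WARMUP_RATIO = 2e-3, 500, 0.001
STEPS, GAMMA = [170, 200], 0.1  # GAMMA assumed (mmcv default)

def lr_at(epoch, cur_iter):
    """Learning rate for a given epoch / global iteration under the config above."""
    if cur_iter < WARMUP_ITERS:
        # mmcv-style linear warmup from BASE_LR * WARMUP_RATIO up to BASE_LR
        k = (1 - cur_iter / WARMUP_ITERS) * (1 - WARMUP_RATIO)
        return BASE_LR * (1 - k)
    # step decay: multiply by GAMMA for each milestone already passed
    return BASE_LR * GAMMA ** sum(epoch >= s for s in STEPS)

print(lr_at(0, 0))         # start of warmup (BASE_LR * WARMUP_RATIO)
print(lr_at(100, 10_000))  # plateau at the base learning rate
print(lr_at(180, 10_000))  # after the first step at epoch 170
print(lr_at(205, 10_000))  # after the second step at epoch 200
```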
Here are the settings for the datasets in the config file:
data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    bbox_thr=1.0,
    use_gt_bbox=False,
    image_thr=0.0,
    bbox_file='data/coco/person_detection_results/'
    'COCO_val2017_detections_AP_H_56_person.json',
)
...
data_root = 'data/coco'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=4,
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=val_data_cfg,
        pipeline=val_pipeline),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
)
The input image size of the model is 192x256, and the corresponding hyper-parameters are specified in the data_cfg dictionary.
The data root is set in data_root as 'data/coco'. The batch size per GPU is 64 and the number of workers per GPU is 4 (in the data dictionary).
The annotation file paths are also stored in the data dictionary.
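Note the relation between image_size and heatmap_size above: the heatmap is 1/4 of the input resolution in each dimension, the usual downsampling ratio for top-down heatmap heads. A one-line check:

```python
image_size = [192, 256]                     # width, height, as in data_cfg
heatmap_size = [s // 4 for s in image_size]  # 4x downsampling in each dimension
print(heatmap_size)  # [48, 64], matching heatmap_size in data_cfg
```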
After setting up all the configurations, we are ready to train our model:
# train with a single GPU
python train.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py
All outputs (log files and checkpoints) will be saved to the working directory (default: /litehrnet/work_dirs/).
Convert to ONNX
To export an ONNX model, we have to modify the forward function in the mmpose package.
The specific file is site-packages/mmpose/models/detectors/top_down.py in your Python/Anaconda environment. You can run python -m site to locate your environment.
Change the forward function at line 81 from:
def forward(self,
            img,
            target=None,
            target_weight=None,
            img_metas=None,
            return_loss=True,
            return_heatmap=False,
            **kwargs):
    """Calls either forward_train or forward_test depending on whether
    return_loss=True. Note this setting will change the expected inputs.
    When `return_loss=True`, img and img_meta are single-nested (i.e.
    Tensor and List[dict]), and when `return_loss=False`, img and img_meta
    should be double nested (i.e. List[Tensor], List[List[dict]]), with
    the outer list indicating test time augmentations.

    Note:
        batch_size: N
        num_keypoints: K
        num_img_channel: C (Default: 3)
        img height: imgH
        img width: imgW
        heatmaps height: H
        heatmaps width: W

    Args:
        img (torch.Tensor[NxCximgHximgW]): Input images.
        target (torch.Tensor[NxKxHxW]): Target heatmaps.
        target_weight (torch.Tensor[NxKx1]): Weights across
            different joint types.
        img_metas (list(dict)): Information about data augmentation
            By default this includes:
            - "image_file: path to the image file
            - "center": center of the bbox
            - "scale": scale of the bbox
            - "rotation": rotation of the bbox
            - "bbox_score": score of bbox
        return_loss (bool): Option to `return loss`. `return loss=True`
            for training, `return loss=False` for validation & test.
        return_heatmap (bool): Option to return heatmap.

    Returns:
        dict|tuple: if `return loss` is true, then return losses.
            Otherwise, return predicted poses, boxes, image paths
            and heatmaps.
    """
    if return_loss:
        return self.forward_train(img, target, target_weight, img_metas,
                                  **kwargs)
    return self.forward_test(
        img, img_metas, return_heatmap=return_heatmap, **kwargs)
to (docstring unchanged, body replaced):
def forward(self,
            img,
            target=None,
            target_weight=None,
            img_metas=None,
            return_loss=True,
            return_heatmap=False,
            **kwargs):
    """(docstring unchanged)"""
    return self.forward_dummy(img)
Then, execute the following command under the directory litehrnet:
python export2onnx.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py work_dirs/litehrnet_30_coco_256x192/epoch_210.pth
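The reason the forward body must be replaced: ONNX export traces forward as a pure tensor-in, tensor-out computation, so the train/test dispatch (which depends on img_metas and the return_loss flag, i.e. Python-side control flow and non-tensor inputs) has to be bypassed in favor of the plain forward_dummy path. A toy illustration of the same pattern, with no torch dependency (the class and its toy op are purely hypothetical):

```python
class ToyTopDown:
    """Toy stand-in for a top-down pose model, to show the dispatch pattern."""

    def forward_dummy(self, img):
        # Backbone + head collapsed into one toy op: the traceable path.
        return [x * 2 for x in img]

    def forward(self, img, img_metas=None, return_loss=True):
        # The original mmpose forward branches on return_loss and needs
        # img_metas -- exactly what a tracer cannot capture.
        if return_loss:
            raise RuntimeError("training path: needs targets, not traceable")
        return self.forward_dummy(img)

model = ToyTopDown()
print(model.forward([1, 2, 3], return_loss=False))  # [2, 4, 6]
```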
Inference
We created initial parameter yaml files in the utils folder for each model runner. For model inference on a single image, execute the following command under the folder litehrnet; the results are printed below:
python inference.py --img-path ../../detection/fcos/tutorial/demo/fcos_demo.jpg --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'yolov5_params': 'utils/yolov5_init_params.yaml', 'rsn_affine_params': 'utils/rsn_affine_init_params.yaml', 'lite_hrnet_params': 'utils/lite_hrnet_init_params.yaml'}
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'lmk_coco_body_17pts': [[963.994140625, 270.521484375, 963.994140625, 255.560546875, 971.474609375, 248.080078125, 978.955078125, 255.560546875, 1016.357421875, 248.080078125, 993.916015625, 330.365234375, 1098.642578125, 315.404296875, 986.435546875, 442.572265625, 1165.966796875, 412.650390625, 956.513671875, 517.376953125, 1180.927734375, 532.337890625, 1016.357421875, 509.896484375, 1076.201171875, 517.376953125, 971.474609375, 689.427734375, 1068.720703125, 696.908203125, 1031.318359375, 861.478515625, 1106.123046875, 876.439453125], [828.330078125, 293.443359375, 821.279296875, 272.291015625, 828.330078125, 279.341796875, 786.025390625, 286.392578125, 771.923828125, 279.341796875, 757.822265625, 349.849609375, 750.771484375, 342.798828125, 764.873046875, 455.611328125, 764.873046875, 441.509765625, 835.380859375, 413.306640625, 828.330078125, 420.357421875, 757.822265625, 568.423828125, 736.669921875, 568.423828125, 786.025390625, 723.541015625, 764.873046875, 723.541015625, 743.720703125, 878.658203125, 743.720703125, 871.607421875]]}
End-to-End Evaluation
To perform an end-to-end test with an image dataset, use inference_e2e.py under the directory litehrnet to obtain the prediction results.
Here, yolov5 is used to detect person bounding boxes.
python inference_e2e.py --img-path /mnt/testdata/coco_val2017_resize_1280_720/ --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml --save-path coco_preds.json
The predictions will be saved into coco_preds.json with the following structure:
[
    {
        'img_path': image_path_1,
        'lmk_coco_body_17pts': [...]
    },
    {
        'img_path': image_path_2,
        'lmk_coco_body_17pts': [...]
    },
    ...
]
Note that your image paths have to match the image paths in the ground-truth JSON exactly.
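This matters because evaluation joins predictions to ground truth by image path; any mismatch silently drops those predictions. A minimal pre-evaluation sanity check (the prediction structure follows coco_preds.json above; the ground-truth path set is whatever you extract from your annotation file):

```python
def unmatched_paths(preds, gt_paths):
    """Return predicted img_paths that do not appear in the ground-truth set."""
    gt = set(gt_paths)
    return [p["img_path"] for p in preds if p["img_path"] not in gt]

# Toy example mirroring the coco_preds.json structure:
preds = [{"img_path": "val2017/000000000139.jpg", "lmk_coco_body_17pts": []}]
print(unmatched_paths(preds, {"val2017/000000000139.jpg"}))  # [] -> all matched
```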