# Lite-HRNet: A Lightweight High-Resolution Network

## Introduction

In this tutorial, we provide an example of training and evaluating the Lite-HRNet model on the COCO dataset for the pose estimation task.

# Installation

First, we have to install the necessary libraries. Python 3.6+, CUDA 9.2+, GCC 5+, and PyTorch 1.3+ are required. To install [PyTorch](https://pytorch.org), check your CUDA version and select the matching [PyTorch version](https://pytorch.org/get-started/previous-versions/). You can check your CUDA version by executing `nvidia-smi` in your terminal. For this tutorial, we install PyTorch 1.7.0 by executing:

```shell
$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
```

For the other libraries, check the `requirements.txt` file. You can install them by running:

```shell
$ pip install -r requirements.txt
```

Moreover, we have to install [mmcv](https://github.com/open-mmlab/mmcv) version 1.3.3, built for PyTorch 1.7.0 and CUDA 11.0:

```shell
$ pip install mmcv-full==1.3.3 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```

# Dataset & Preparation

For this tutorial, we use the [MS COCO](http://cocodataset.org/#download) dataset. You can download it from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. [HRNet-Human-Pose-Estimation](https://github.com/HRNet/HRNet-Human-Pose-Estimation) provides person detection results on COCO val2017 to reproduce the multi-person pose estimation results.
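These detection results are stored in the standard COCO detection-results format: a JSON list of `{image_id, category_id, bbox, score}` entries. As a sketch (the entries below are made up; the real data is in `COCO_val2017_detections_AP_H_56_person.json`), filtering them down to confident person boxes looks like:

```python
import json

# Made-up example entries in the standard COCO detection-results format
# (bbox is [x, y, w, h]; category 1 is "person").
detections = [
    {"image_id": 139, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 100.0], "score": 0.92},
    {"image_id": 139, "category_id": 1, "bbox": [5.0, 5.0, 20.0, 40.0], "score": 0.31},
]

def keep_person_boxes(dets, score_thr=0.5):
    """Keep person detections (COCO category 1) above a score threshold."""
    return [d for d in dets if d["category_id"] == 1 and d["score"] >= score_thr]

print(json.dumps(keep_person_boxes(detections)))
```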
Please download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-). Download and extract them under `litehrnet/data`, and make them look like this:

```
lite_hrnet
├── configs
├── models
├── tools
└── data
    └── coco
        ├── annotations
        │   ├── person_keypoints_train2017.json
        │   └── person_keypoints_val2017.json
        ├── person_detection_results
        │   └── COCO_val2017_detections_AP_H_56_person.json
        ├── train2017
        │   ├── 000000000009.jpg
        │   ├── 000000000025.jpg
        │   ├── 000000000030.jpg
        │   └── ...
        └── val2017
            ├── 000000000139.jpg
            ├── 000000000285.jpg
            ├── 000000000632.jpg
            └── ...
```

## Modify MMPose for Kneron PPP

To use Kneron pre-/post-processing during training and testing, we replace some files of the `mmpose` package in the python/anaconda env with the files in the `mmpose_replacement` folder, by executing (adjust the destination paths to your own environment):

```shell
$ cp mmpose_replacement/post_transforms.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/core/post_processing/post_transforms.py
$ cp mmpose_replacement/top_down_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/top_down_transform.py
$ cp mmpose_replacement/loading.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/loading.py
$ cp mmpose_replacement/shared_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/shared_transform.py
```

Moreover, we copy `prepostprocess/kneron_preprocessing/` to our python/anaconda env:

```shell
$ cp -r prepostprocess/kneron_preprocessing/ .
```

# Train

For this tutorial, we use the config file `litehrnet/configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py`.
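Config files like this one are plain Python files that mmcv parses into a dict-like `Config` object. A minimal sketch of the same idea (an illustration only, not the actual `mmcv.Config` implementation, which also handles inheritance and merging):

```python
# Minimal illustration of how a Python-file config can be read into a dict.
def load_py_config(path):
    namespace = {}
    with open(path) as f:
        # execute the config file and collect its top-level variables
        exec(compile(f.read(), path, "exec"), namespace)
    # drop the builtins that exec injects
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

With the real package you would write `from mmcv import Config; cfg = Config.fromfile(path)` instead.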
Before training our model, let's take a quick look at the basic settings in the config file:

```
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP')

optimizer = dict(
    type='Adam',
    lr=2e-3,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    # warmup=None,
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
```

Here, we train the model from scratch. If you would like to fine-tune a pretrained model, you can assign the pretrained model path to `load_from` or `resume_from`. The difference between `resume_from` and `load_from` in `CONFIG_FILE`:

- `resume_from` loads both the model weights and the optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
- `load_from` only loads the model weights, and the training epoch starts from 0. It is usually used for fine-tuning.

We save the model every 10 epochs (as stated in `checkpoint_config`) and validate it every 10 epochs with the mAP metric (as stated in `evaluation`). The optimizer is Adam with a base learning rate of 2e-3, and a step learning rate schedule with linear warmup is set up in `lr_config`. The total number of epochs is 210.

Here are the settings for the datasets in the config file.
```
data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    bbox_thr=1.0,
    use_gt_bbox=False,
    image_thr=0.0,
    bbox_file='data/coco/person_detection_results/'
    'COCO_val2017_detections_AP_H_56_person.json',
)
...
data_root = 'data/coco'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=4,
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=val_data_cfg,
        pipeline=val_pipeline),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
)
```

The input image size of the model is `192x256`, and the corresponding hyper-parameters are specified in the `data_cfg` dictionary; the heatmap size (`48x64`) is a quarter of the input size in each dimension. The data root is given in `data_root` as `data/coco`. The batch size per GPU is 64 and the number of workers per GPU is 4 (in the `data` dictionary). The annotation file paths are also stored in the `data` dictionary. After setting up all the configurations, we are ready to train our model:

```shell
# train with a single GPU
$ python train.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py
```

All outputs (log files and checkpoints) will be saved to the working directory (default: `litehrnet/work_dirs/`).

# Convert to ONNX

To export the ONNX model, we have to modify a forward function in the `mmpose` package. The specific file is `site-packages/mmpose/models/detectors/top_down.py` in your python/anaconda env. You can use `python -m site` to check your env.
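As a quick way to locate that file programmatically, you can build its path from the environment's site-packages directory. This is a standard-library-only sketch; `python -m site` prints the same information:

```python
import os
import sysconfig

# site-packages of the currently active environment
site_packages = sysconfig.get_paths()["purelib"]

# the file we need to edit before exporting to ONNX
top_down_py = os.path.join(
    site_packages, "mmpose", "models", "detectors", "top_down.py")
print(top_down_py)
```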
Change the `forward` function at line 81 from:

```python
    def forward(self,
                img,
                target=None,
                target_weight=None,
                img_metas=None,
                return_loss=True,
                return_heatmap=False,
                **kwargs):
        """Calls either forward_train or forward_test depending on whether
        return_loss=True. Note this setting will change the expected inputs.
        When `return_loss=True`, img and img_meta are single-nested (i.e.
        Tensor and List[dict]), and when `return_loss=False`, img and img_meta
        should be double nested (i.e. List[Tensor], List[List[dict]]), with
        the outer list indicating test time augmentations.

        Note:
            batch_size: N
            num_keypoints: K
            num_img_channel: C (Default: 3)
            img height: imgH
            img width: imgW
            heatmaps height: H
            heatmaps width: W

        Args:
            img (torch.Tensor[NxCximgHximgW]): Input images.
            target (torch.Tensor[NxKxHxW]): Target heatmaps.
            target_weight (torch.Tensor[NxKx1]): Weights across
                different joint types.
            img_metas (list(dict)): Information about data augmentation
                By default this includes:
                - "image_file": path to the image file
                - "center": center of the bbox
                - "scale": scale of the bbox
                - "rotation": rotation of the bbox
                - "bbox_score": score of bbox
            return_loss (bool): Option to `return loss`. `return loss=True`
                for training, `return loss=False` for validation & test.
            return_heatmap (bool): Option to return heatmap.

        Returns:
            dict|tuple: if `return loss` is true, then return losses.
                Otherwise, return predicted poses, boxes, image paths
                and heatmaps.
        """
        if return_loss:
            return self.forward_train(img, target, target_weight, img_metas,
                                      **kwargs)
        return self.forward_test(
            img, img_metas, return_heatmap=return_heatmap, **kwargs)
```

to the following, where the signature and docstring stay the same and only the body changes:

```python
    def forward(self,
                img,
                target=None,
                target_weight=None,
                img_metas=None,
                return_loss=True,
                return_heatmap=False,
                **kwargs):
        """(docstring unchanged)"""
        return self.forward_dummy(img)
```

Then, execute the following command under the directory `litehrnet`:

```shell
$ python export2onnx.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py work_dirs/litehrnet_30_coco_256x192/epoch_210.pth
```

# Inference

We created initial parameter yaml files in the `utils` folder for each model runner.
For model inference on a single image, execute the following command under the folder `litehrnet`:

```shell
$ python inference.py --img-path ../../detection/fcos/tutorial/demo/fcos_demo.jpg --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml
```

The results are:

```shell
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'yolov5_params': 'utils/yolov5_init_params.yaml', 'rsn_affine_params': 'utils/rsn_affine_init_params.yaml', 'lite_hrnet_params': 'utils/lite_hrnet_init_params.yaml'}
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'lmk_coco_body_17pts': [[963.994140625, 270.521484375, 963.994140625, 255.560546875, 971.474609375, 248.080078125, 978.955078125, 255.560546875, 1016.357421875, 248.080078125, 993.916015625, 330.365234375, 1098.642578125, 315.404296875, 986.435546875, 442.572265625, 1165.966796875, 412.650390625, 956.513671875, 517.376953125, 1180.927734375, 532.337890625, 1016.357421875, 509.896484375, 1076.201171875, 517.376953125, 971.474609375, 689.427734375, 1068.720703125, 696.908203125, 1031.318359375, 861.478515625, 1106.123046875, 876.439453125], [828.330078125, 293.443359375, 821.279296875, 272.291015625, 828.330078125, 279.341796875, 786.025390625, 286.392578125, 771.923828125, 279.341796875, 757.822265625, 349.849609375, 750.771484375, 342.798828125, 764.873046875, 455.611328125, 764.873046875, 441.509765625, 835.380859375, 413.306640625, 828.330078125, 420.357421875, 757.822265625, 568.423828125, 736.669921875, 568.423828125, 786.025390625, 723.541015625, 764.873046875, 723.541015625, 743.720703125, 878.658203125, 743.720703125, 871.607421875]]}
```

Each entry in `lmk_coco_body_17pts` is one detected person, given as a flat list of 34 values (17 `(x, y)` keypoint pairs in COCO order).

## End-to-End Evaluation

To perform an end-to-end test on an image dataset, use `inference_e2e.py` under the directory `litehrnet` to obtain the prediction results. Here, yolov5 is used for detecting person bounding boxes.
```shell
$ python inference_e2e.py --img-path /mnt/testdata/coco_val2017_resize_1280_720/ --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml --save-path coco_preds.json
```

The predictions will be saved into `coco_preds.json`, which has the following structure:

```bash
[
    {'img_path': image_path_1,
     'lmk_coco_body_17pts': [...]
    },
    {'img_path': image_path_2,
     'lmk_coco_body_17pts': [...]
    },
    ...
]
```

Note that your image paths have to match the image paths in the ground-truth JSON.
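As a sketch of how such a predictions file can be post-processed for evaluation, the snippet below indexes people by COCO image id and names each keypoint. The COCO keypoint order and the val2017 file-naming convention (file name = zero-padded image id) are standard; the helper names themselves are our own:

```python
import json
from pathlib import Path

# The 17 COCO body keypoints, in order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def image_id_from_path(img_path):
    """COCO val2017 files are named after the image id, e.g. 000000000139.jpg -> 139."""
    return int(Path(img_path).stem)

def parse_person(flat):
    """Turn a flat [x1, y1, ..., x17, y17] list into {keypoint_name: (x, y)}."""
    assert len(flat) == 2 * len(COCO_KEYPOINTS)
    return {name: (flat[2 * i], flat[2 * i + 1])
            for i, name in enumerate(COCO_KEYPOINTS)}

def load_predictions(path):
    """Load a coco_preds.json-style file and index people by image id."""
    with open(path) as f:
        preds = json.load(f)
    return {
        image_id_from_path(p["img_path"]):
            [parse_person(person) for person in p["lmk_coco_body_17pts"]]
        for p in preds
    }
```

For example, `image_id_from_path('val2017/000000000139.jpg')` returns `139`, which is how the predictions can be aligned with the ground-truth annotations when the image paths match.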