# Lite-HRNet: A Lightweight High-Resolution Network

## Introduction

In this tutorial, we provide an example of training and evaluating the Lite-HRNet model on the COCO dataset for the pose estimation task.

# Installation

First, we have to install the necessary libraries. Python 3.6+, CUDA 9.2+, GCC 5+, and PyTorch 1.3+ are required. To install [PyTorch](https://pytorch.org), check your CUDA version and select the matching [PyTorch version](https://pytorch.org/get-started/previous-versions/). You can check your CUDA version by executing `nvidia-smi` in your terminal. For this tutorial, we install PyTorch 1.7.0 by executing:

```shell
$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
```

For the other libraries, check the `requirements.txt` file. You can install them by running:

```shell
$ pip install -r requirements.txt
```

Moreover, we have to install [mmcv](https://github.com/open-mmlab/mmcv) version 1.3.3, built for PyTorch 1.7.0 and CUDA 11.0:

```shell
$ pip install mmcv-full==1.3.3 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```

# Dataset & Preparation

For this tutorial, we use the [MS COCO](http://cocodataset.org/#download) dataset. You can download it from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. [HRNet-Human-Pose-Estimation](https://github.com/HRNet/HRNet-Human-Pose-Estimation) provides person detection results on COCO val2017 to reproduce the multi-person pose estimation results.
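These detection results are stored in the standard COCO detection-results format: a JSON list of `{image_id, category_id, bbox, score}` entries. As a sketch (the entries below are made up; the real data is in `COCO_val2017_detections_AP_H_56_person.json`), filtering them down to confident person boxes looks like:

```python
import json

# Made-up example entries in the standard COCO detection-results format
# (bbox is [x, y, w, h]; category 1 is "person").
detections = [
    {"image_id": 139, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 100.0], "score": 0.92},
    {"image_id": 139, "category_id": 1, "bbox": [5.0, 5.0, 20.0, 40.0], "score": 0.31},
]

def keep_person_boxes(dets, score_thr=0.5):
    """Keep person detections (COCO category 1) above a score threshold."""
    return [d for d in dets if d["category_id"] == 1 and d["score"] >= score_thr]

print(json.dumps(keep_person_boxes(detections)))
```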
Please download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-). Download and extract them under `litehrnet/data`, and make them look like this:

```
lite_hrnet
├── configs
├── models
├── tools
└── data
    └── coco
        ├── annotations
        │   ├── person_keypoints_train2017.json
        │   └── person_keypoints_val2017.json
        ├── person_detection_results
        │   └── COCO_val2017_detections_AP_H_56_person.json
        ├── train2017
        │   ├── 000000000009.jpg
        │   ├── 000000000025.jpg
        │   ├── 000000000030.jpg
        │   └── ...
        └── val2017
            ├── 000000000139.jpg
            ├── 000000000285.jpg
            ├── 000000000632.jpg
            └── ...
```

## Modify MMPose for Kneron PPP

To use Kneron pre-/post-processing during training and testing, we replace some files of the `mmpose` package in the python/anaconda env with the files in the `mmpose_replacement` folder, by executing (adjust the destination paths to your own environment):

```shell
$ cp mmpose_replacement/post_transforms.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/core/post_processing/post_transforms.py
$ cp mmpose_replacement/top_down_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/top_down_transform.py
$ cp mmpose_replacement/loading.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/loading.py
$ cp mmpose_replacement/shared_transform.py /home/ziyan_zhu/anaconda3/envs/litehrnet/lib/python3.8/site-packages/mmpose/datasets/pipelines/shared_transform.py
```

Moreover, we copy `prepostprocess/kneron_preprocessing/` to our python/anaconda env:

```shell
$ cp -r prepostprocess/kneron_preprocessing/ .
```

# Train

For this tutorial, we use the config file `litehrnet/configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py`.
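Config files like this one are plain Python files that mmcv parses into a dict-like `Config` object. A minimal sketch of the same idea (an illustration only, not the actual `mmcv.Config` implementation, which also handles inheritance and merging):

```python
# Minimal illustration of how a Python-file config can be read into a dict.
def load_py_config(path):
    namespace = {}
    with open(path) as f:
        # execute the config file and collect its top-level variables
        exec(compile(f.read(), path, "exec"), namespace)
    # drop the builtins that exec injects
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

With the real package you would write `from mmcv import Config; cfg = Config.fromfile(path)` instead.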
Before training our model, let's take a quick look at the basic settings in the config file:

```
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP')

optimizer = dict(
    type='Adam',
    lr=2e-3,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    # warmup=None,
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
```

Here, we train the model from scratch. If you would like to fine-tune a pretrained model, you can assign the pretrained model path to `load_from` or `resume_from`. The difference between `resume_from` and `load_from` in `CONFIG_FILE`:

- `resume_from` loads both the model weights and the optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
- `load_from` only loads the model weights, and the training epoch starts from 0. It is usually used for fine-tuning.

We save the model every 10 epochs (as stated in `checkpoint_config`) and validate it every 10 epochs with the mAP metric (as stated in `evaluation`). The optimizer is Adam with a base learning rate of 2e-3, and a step learning rate schedule with linear warmup is set up in `lr_config`. The total number of epochs is 210.

Here are the settings for the datasets in the config file.
```
data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    bbox_thr=1.0,
    use_gt_bbox=False,
    image_thr=0.0,
    bbox_file='data/coco/person_detection_results/'
    'COCO_val2017_detections_AP_H_56_person.json',
)
...
data_root = 'data/coco'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=4,
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=val_data_cfg,
        pipeline=val_pipeline),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
)
```

The input image size of the model is `192x256`, and the corresponding hyper-parameters are specified in the `data_cfg` dictionary; the heatmap size (`48x64`) is a quarter of the input size in each dimension. The data root is given in `data_root` as `data/coco`. The batch size per GPU is 64 and the number of workers per GPU is 4 (in the `data` dictionary). The annotation file paths are also stored in the `data` dictionary. After setting up all the configurations, we are ready to train our model:

```shell
# train with a single GPU
$ python train.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py
```

All outputs (log files and checkpoints) will be saved to the working directory (default: `litehrnet/work_dirs/`).

# Convert to ONNX

To export the ONNX model, we have to modify a forward function in the `mmpose` package. The specific file is `site-packages/mmpose/models/detectors/top_down.py` in your python/anaconda env. You can use `python -m site` to check your env.
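As a quick way to locate that file programmatically, you can build its path from the environment's site-packages directory. This is a standard-library-only sketch; `python -m site` prints the same information:

```python
import os
import sysconfig

# site-packages of the currently active environment
site_packages = sysconfig.get_paths()["purelib"]

# the file we need to edit before exporting to ONNX
top_down_py = os.path.join(
    site_packages, "mmpose", "models", "detectors", "top_down.py")
print(top_down_py)
```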
Change the `forward` function at line 81 from:

```python
    def forward(self,
                img,
                target=None,
                target_weight=None,
                img_metas=None,
                return_loss=True,
                return_heatmap=False,
                **kwargs):
        """Calls either forward_train or forward_test depending on whether
        return_loss=True. Note this setting will change the expected inputs.
        When `return_loss=True`, img and img_meta are single-nested (i.e.
        Tensor and List[dict]), and when `return_loss=False`, img and img_meta
        should be double nested (i.e. List[Tensor], List[List[dict]]), with
        the outer list indicating test time augmentations.

        Note:
            batch_size: N
            num_keypoints: K
            num_img_channel: C (Default: 3)
            img height: imgH
            img width: imgW
            heatmaps height: H
            heatmaps width: W

        Args:
            img (torch.Tensor[NxCximgHximgW]): Input images.
            target (torch.Tensor[NxKxHxW]): Target heatmaps.
            target_weight (torch.Tensor[NxKx1]): Weights across
                different joint types.
            img_metas (list(dict)): Information about data augmentation
                By default this includes:
                - "image_file": path to the image file
                - "center": center of the bbox
                - "scale": scale of the bbox
                - "rotation": rotation of the bbox
                - "bbox_score": score of bbox
            return_loss (bool): Option to `return loss`. `return loss=True`
                for training, `return loss=False` for validation & test.
            return_heatmap (bool): Option to return heatmap.

        Returns:
            dict|tuple: if `return loss` is true, then return losses.
                Otherwise, return predicted poses, boxes, image paths
                and heatmaps.
        """
        if return_loss:
            return self.forward_train(img, target, target_weight, img_metas,
                                      **kwargs)
        return self.forward_test(
            img, img_metas, return_heatmap=return_heatmap, **kwargs)
```

to the following, where the signature and docstring stay the same and only the body changes:

```python
    def forward(self,
                img,
                target=None,
                target_weight=None,
                img_metas=None,
                return_loss=True,
                return_heatmap=False,
                **kwargs):
        """(docstring unchanged)"""
        return self.forward_dummy(img)
```

Then, execute the following command under the directory `litehrnet`:

```shell
$ python export2onnx.py configs/top_down/lite_hrnet/coco/litehrnet_30_coco_256x192.py work_dirs/litehrnet_30_coco_256x192/epoch_210.pth
```

# Inference

We created initial parameter yaml files in the `utils` folder for each model runner.
For model inference on a single image, execute the following command under the folder `litehrnet`:

```shell
$ python inference.py --img-path ../../detection/fcos/tutorial/demo/fcos_demo.jpg --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml
```

The results are:

```shell
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'yolov5_params': 'utils/yolov5_init_params.yaml', 'rsn_affine_params': 'utils/rsn_affine_init_params.yaml', 'lite_hrnet_params': 'utils/lite_hrnet_init_params.yaml'}
{'img_path': '../../detection/fcos/tutorial/demo/fcos_demo.jpg', 'lmk_coco_body_17pts': [[963.994140625, 270.521484375, 963.994140625, 255.560546875, 971.474609375, 248.080078125, 978.955078125, 255.560546875, 1016.357421875, 248.080078125, 993.916015625, 330.365234375, 1098.642578125, 315.404296875, 986.435546875, 442.572265625, 1165.966796875, 412.650390625, 956.513671875, 517.376953125, 1180.927734375, 532.337890625, 1016.357421875, 509.896484375, 1076.201171875, 517.376953125, 971.474609375, 689.427734375, 1068.720703125, 696.908203125, 1031.318359375, 861.478515625, 1106.123046875, 876.439453125], [828.330078125, 293.443359375, 821.279296875, 272.291015625, 828.330078125, 279.341796875, 786.025390625, 286.392578125, 771.923828125, 279.341796875, 757.822265625, 349.849609375, 750.771484375, 342.798828125, 764.873046875, 455.611328125, 764.873046875, 441.509765625, 835.380859375, 413.306640625, 828.330078125, 420.357421875, 757.822265625, 568.423828125, 736.669921875, 568.423828125, 786.025390625, 723.541015625, 764.873046875, 723.541015625, 743.720703125, 878.658203125, 743.720703125, 871.607421875]]}
```

Each entry in `lmk_coco_body_17pts` is one detected person, given as a flat list of 34 values (17 `(x, y)` keypoint pairs in COCO order).

## End-to-End Evaluation

To perform an end-to-end test on an image dataset, use `inference_e2e.py` under the directory `litehrnet` to obtain the prediction results. Here, yolov5 is used for detecting person bounding boxes.
```shell
$ python inference_e2e.py --img-path /mnt/testdata/coco_val2017_resize_1280_720/ --yolov5_param utils/yolov5_init_params.yaml --rsn_affine_params utils/rsn_affine_init_params.yaml --lite_hrnet_params utils/lite_hrnet_init_params.yaml --save-path coco_preds.json
```

The predictions will be saved into `coco_preds.json`, which has the following structure:

```bash
[
    {'img_path': image_path_1,
     'lmk_coco_body_17pts': [...]
    },
    {'img_path': image_path_2,
     'lmk_coco_body_17pts': [...]
    },
    ...
]
```

Note that your image paths have to match the image paths in the ground-truth JSON.
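As a sketch of how such a predictions file can be post-processed for evaluation, the snippet below indexes people by COCO image id and names each keypoint. The COCO keypoint order and the val2017 file-naming convention (file name = zero-padded image id) are standard; the helper names themselves are our own:

```python
import json
from pathlib import Path

# The 17 COCO body keypoints, in order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def image_id_from_path(img_path):
    """COCO val2017 files are named after the image id, e.g. 000000000139.jpg -> 139."""
    return int(Path(img_path).stem)

def parse_person(flat):
    """Turn a flat [x1, y1, ..., x17, y17] list into {keypoint_name: (x, y)}."""
    assert len(flat) == 2 * len(COCO_KEYPOINTS)
    return {name: (flat[2 * i], flat[2 * i + 1])
            for i, name in enumerate(COCO_KEYPOINTS)}

def load_predictions(path):
    """Load a coco_preds.json-style file and index people by image id."""
    with open(path) as f:
        preds = json.load(f)
    return {
        image_id_from_path(p["img_path"]):
            [parse_person(person) for person in p["lmk_coco_body_17pts"]]
        for p in preds
    }
```

For example, `image_id_from_path('val2017/000000000139.jpg')` returns `139`, which is how the predictions can be aligned with the ground-truth annotations when the image paths match.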