Pyramid Scene Parsing Network
Introduction
Abstract
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
PSPNet (CVPR'2017)
@inproceedings{zhao2017pspnet,
title={Pyramid Scene Parsing Network},
author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
booktitle={CVPR},
year={2017}
}
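The pyramid pooling module mentioned in the abstract aggregates context from regions of different sizes. The following is a minimal, dependency-free sketch of that idea on a single-channel feature map, not the implementation used in this repository. The function names are hypothetical; the bin sizes (1, 2, 3, 6) follow the paper; nearest-neighbour upsampling is used here for brevity where the paper uses bilinear interpolation, and the 1x1 convolutions applied to each pooled level are omitted.

```python
def adaptive_avg_pool(feature, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Region boundaries: floor for the start index, ceil for the end index.
        r0, r1 = (i * h) // bins, -(-(i + 1) * h // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-(j + 1) * w // bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins grid back to h x w (nearest neighbour, a simplification)."""
    bins = len(pooled)
    return [[pooled[r * bins // h][c * bins // w] for c in range(w)]
            for r in range(h)]


def pyramid_pooling(feature, bin_sizes=(1, 2, 3, 6)):
    """Return the input map followed by one context map per pyramid level."""
    h, w = len(feature), len(feature[0])
    levels = [feature]
    for bins in bin_sizes:
        levels.append(upsample_nearest(adaptive_avg_pool(feature, bins), h, w))
    return levels  # stacked along the channel axis in a real network


# A toy 6x6 feature map whose value encodes its position.
feature = [[r * 6 + c for c in range(6)] for r in range(6)]
context = pyramid_pooling(feature)
```

In this sketch the first pooled level holds the global average at every position, while finer levels keep progressively more local detail; concatenating them with the input gives each pixel access to context at several scales.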
Results and models
Cityscapes
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-50-D8 | 512x1024 | 40000 | 6.1 | 4.07 | 77.85 | 79.18 | config | model | log |
| PSPNet | R-101-D8 | 512x1024 | 40000 | 9.6 | 2.68 | 78.34 | 79.74 | config | model | log |
| PSPNet | R-50-D8 | 769x769 | 40000 | 6.9 | 1.76 | 78.26 | 79.88 | config | model | log |
| PSPNet | R-101-D8 | 769x769 | 40000 | 10.9 | 1.15 | 79.08 | 80.28 | config | model | log |
| PSPNet | R-18-D8 | 512x1024 | 80000 | 1.7 | 15.71 | 74.87 | 76.04 | config | model | log |
| PSPNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.55 | 79.79 | config | model | log |
| PSPNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.76 | 81.01 | config | model | log |
| PSPNet (FP16) | R-101-D8 | 512x1024 | 80000 | 5.34 | 8.77 | 79.46 | - | config | model | log |
| PSPNet | R-18-D8 | 769x769 | 80000 | 1.9 | 6.20 | 75.90 | 77.86 | config | model | log |
| PSPNet | R-50-D8 | 769x769 | 80000 | - | - | 79.59 | 80.69 | config | model | log |
| PSPNet | R-101-D8 | 769x769 | 80000 | - | - | 79.77 | 81.06 | config | model | log |
| PSPNet | R-18b-D8 | 512x1024 | 80000 | 1.5 | 16.28 | 74.23 | 75.79 | config | model | log |
| PSPNet | R-50b-D8 | 512x1024 | 80000 | 6.0 | 4.30 | 78.22 | 79.46 | config | model | log |
| PSPNet | R-101b-D8 | 512x1024 | 80000 | 9.5 | 2.76 | 79.69 | 80.79 | config | model | log |
| PSPNet | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.41 | 74.92 | 76.90 | config | model | log |
| PSPNet | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.88 | 78.50 | 79.96 | config | model | log |
| PSPNet | R-101b-D8 | 769x769 | 80000 | 10.8 | 1.17 | 78.87 | 80.04 | config | model | log |
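The mIoU columns in these tables report mean intersection-over-union as a percentage. As a rough illustration of how such a score is obtained, the sketch below computes mIoU over flattened label lists. The function name and the `ignore_index=255` default are assumptions made for this example, and the evaluation code in this repository may differ in details such as how absent classes are handled.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in the prediction or the target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Five pixels, three classes; the last pixel is unlabeled.
pred = [0, 0, 1, 1, 2]
target = [0, 1, 1, 1, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU: {100 * score:.2f}")
```

Here class 0 has IoU 1/2, class 1 has IoU 2/3, and class 2 never appears among the labeled pixels, so it is left out of the mean.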
ADE20K
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.53 | 41.13 | 41.94 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 12 | 15.30 | 43.57 | 44.35 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 42.48 | 43.44 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 44.39 | 45.35 | config | model | log |
Pascal VOC 2012 + Aug
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-50-D8 | 512x512 | 20000 | 6.1 | 23.59 | 76.78 | 77.61 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 9.6 | 15.02 | 78.47 | 79.25 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 77.29 | 78.48 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 78.52 | 79.57 | config | model | log |
Pascal Context
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-101-D8 | 480x480 | 40000 | 8.8 | 9.68 | 46.60 | 47.78 | config | model | log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 46.03 | 47.15 | config | model | log |
Pascal Context 59
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-101-D8 | 480x480 | 40000 | - | - | 52.02 | 53.54 | config | model | log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 52.47 | 53.99 | config | model | log |
Dark Zurich and Nighttime Driving
We report evaluation results on these two datasets using the models above, which were trained on the Cityscapes training set.
| Method | Backbone | Training Dataset | Test Dataset | mIoU | config | evaluation checkpoint | log |
|---|---|---|---|---|---|---|---|
| PSPNet | R-50-D8 | Cityscapes Training set | Dark Zurich | 10.91 | config | model | log |
| PSPNet | R-50-D8 | Cityscapes Training set | Nighttime Driving | 23.02 | config | model | log |
| PSPNet | R-50-D8 | Cityscapes Training set | Cityscapes Validation set | 77.85 | config | model | log |
| PSPNet | R-101-D8 | Cityscapes Training set | Dark Zurich | 10.16 | config | model | log |
| PSPNet | R-101-D8 | Cityscapes Training set | Nighttime Driving | 20.25 | config | model | log |
| PSPNet | R-101-D8 | Cityscapes Training set | Cityscapes Validation set | 78.34 | config | model | log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Dark Zurich | 15.54 | config | model | log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Nighttime Driving | 22.25 | config | model | log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Cityscapes Validation set | 79.69 | config | model | log |
COCO-Stuff 10k
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-50-D8 | 512x512 | 20000 | 9.6 | 20.5 | 35.69 | 36.62 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 13.2 | 11.1 | 37.26 | 38.52 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 36.33 | 37.24 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 37.76 | 38.86 | config | model | log |
COCO-Stuff 164k
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-50-D8 | 512x512 | 80000 | 9.6 | 20.5 | 38.80 | 39.19 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 13.2 | 11.1 | 40.34 | 40.79 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 39.64 | 39.97 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 41.28 | 41.66 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 320000 | - | - | 40.53 | 40.75 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 320000 | - | - | 41.95 | 42.42 | config | model | log |
LoveDA
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 26.87 | 48.62 | 47.57 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 6.60 | 50.46 | 50.19 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 4.58 | 51.86 | 51.34 | config | model | log |
Potsdam
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | model | log |
|---|---|---|---|---|---|---|---|---|---|---|
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.50 | 85.12 | 77.09 | 78.30 | config | model | log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.21 | 78.12 | 78.98 | config | model | log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.40 | 78.62 | 79.47 | config | model | log |
Note:
`FP16` means mixed precision (FP16) is adopted in training.
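Mixed-precision training is commonly paired with loss scaling, because very small gradients can round to zero in half precision. The sketch below illustrates that effect using only the standard library; it is not the training code of this repository. The helper names and the scale factor of 512 are assumptions chosen for the example.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def fp16_gradient(true_grad, loss_scale=1.0):
    """Store a gradient in FP16 after scaling, then unscale in full precision."""
    stored = to_fp16(true_grad * loss_scale)  # what the FP16 tensor would hold
    return stored / loss_scale                # unscaled before the weight update


tiny_grad = 1e-8  # smaller than the smallest positive FP16 value

# Without scaling the gradient underflows to zero and the update is lost.
unscaled = fp16_gradient(tiny_grad)

# With a (hypothetical) loss scale of 512 the value survives the FP16 round trip.
scaled = fp16_gradient(tiny_grad, loss_scale=512.0)

print(unscaled, scaled)
```

The recovered value is only approximately equal to the original gradient, since FP16 has limited precision, but it is no longer zero.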