
# BiSeNet v2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

## Introduction

```bibtex
@article{yu2021bisenet,
  title={Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation},
  author={Yu, Changqian and Gao, Changxin and Wang, Jingbo and Yu, Gang and Shen, Chunhua and Sang, Nong},
  journal={International Journal of Computer Vision},
  pages={1--18},
  year={2021},
  publisher={Springer}
}
```

## Results and models

### Cityscapes

| Method           | Backbone  | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU  | mIoU(ms+flip) | config | download     |
| ---------------- | --------- | --------- | ------- | -------- | -------------- | ----- | ------------- | ------ | ------------ |
| BiSeNetV2        | BiSeNetV2 | 1024x1024 | 160000  | 7.64     | 31.77          | 73.21 | 75.74         | config | model \| log |
| BiSeNetV2 (OHEM) | BiSeNetV2 | 1024x1024 | 160000  | 7.64     | -              | 73.57 | 75.80         | config | model \| log |
| BiSeNetV2 (4x8)  | BiSeNetV2 | 1024x1024 | 160000  | 15.05    | -              | 75.76 | 77.79         | config | model \| log |
| BiSeNetV2 (FP16) | BiSeNetV2 | 1024x1024 | 160000  | 5.77     | 36.65          | 73.07 | 75.13         | config | model \| log |

Note:

- `OHEM` means Online Hard Example Mining is adopted in training.
- `FP16` means mixed-precision (FP16) training is adopted.
- `4x8` means 4 GPUs with 8 samples per GPU are used in training.
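The three table variants differ only in small training-config overrides. A minimal sketch of how such overrides typically look in an MMSegmentation-style config is below; the exact field values here (`thresh`, `min_kept`, `loss_scale`, worker count) are illustrative assumptions, not copied from the actual config files in this repo.

```python
# Hypothetical config-override sketch (MMSegmentation/MMCV conventions,
# not the actual files behind the table above).

# OHEM variant: attach an online hard example mining pixel sampler
# to the decode head, so the loss focuses on hard pixels.
model = dict(
    decode_head=dict(
        sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000)))

# FP16 variant: enable mixed-precision training via the fp16 hook,
# with a fixed loss scale to avoid gradient underflow.
fp16 = dict(loss_scale=512.0)

# 4x8 variant: 8 samples per GPU; the 4 GPUs come from launching
# distributed training with 4 processes.
data = dict(samples_per_gpu=8, workers_per_gpu=4)
```

In practice these overrides each live in their own config file that inherits the base BiSeNetV2 Cityscapes config, which is why the table lists them as separate entries.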