Model Zoo and Benchmark

Environment

  • Python 2.7.1
  • PaddlePaddle >=1.5
  • CUDA 9.0
  • cuDNN >=7.4
  • NCCL 2.1.2

Common settings

  • All models below were trained on coco_2017_train, and tested on coco_2017_val.
  • Batch Normalization layers in backbones are replaced by Affine Channel layers.
  • Unless otherwise noted, all ResNet backbones adopt the ResNet-B variant.
  • For RCNN and RetinaNet models, only horizontal flipping data augmentation was used in the training phase and no augmentations were used in the testing phase.
  • Inf time (fps): inference speed is measured in fps (images/s) on a single GPU (Tesla V100) with cuDNN 7.5 by running 'tools/eval.py' on the entire validation set. The measurement includes data loading, network forward, and post-processing. The batch size is 1.
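The fps measurement described above can be sketched as follows. This is a minimal illustration, not the code in 'tools/eval.py': `measure_fps` and `process_image` are hypothetical names, and `process_image` is assumed to cover data loading, network forward, and post-processing for one image (batch size 1).

```python
import time


def measure_fps(process_image, images):
    """Return images per second over one full pass with batch size 1.

    process_image is a hypothetical callable that loads one image,
    runs the network forward, and post-processes the result.
    """
    if not images:
        raise ValueError("images must not be empty")
    start = time.perf_counter()
    for image in images:
        process_image(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleep 1 ms per "image" instead of running a model.
    fps = measure_fps(lambda image: time.sleep(0.001), list(range(20)))
    print("fps: %.1f" % fps)
```

Timing the whole loop rather than only the forward pass is what makes the number comparable to the "Inf time (fps)" columns, which include loading and post-processing.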

Training Schedules

  • We adopt exactly the same training schedules as Detectron.
  • 1x indicates the schedule starts at a LR of 0.02 and is decreased by a factor of 10 after 60k and 80k iterations and eventually terminates at 90k iterations for minibatch size 16. For batch size 8, LR is decreased to 0.01, total training iterations are doubled, and the decay milestones are scaled by 2.
  • 2x schedule is twice as long as 1x, with the LR milestones scaled accordingly.
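The schedule arithmetic above can be checked with a short sketch. The function names `detectron_schedule` and `lr_at` are hypothetical and are not part of this repository. The sketch assumes the decay takes effect at the milestone iteration itself, and it leaves out any warmup, which the text above does not describe.

```python
def detectron_schedule(total_batch_size, multiplier=1):
    """Scale the 1x, minibatch-16 reference schedule.

    Returns (base_lr, milestones, max_iters).
    """
    ref_batch, ref_lr = 16, 0.02
    ref_milestones, ref_max_iters = (60000, 80000), 90000
    # A smaller batch lowers the LR and lengthens training by the same factor.
    scale = ref_batch / total_batch_size
    base_lr = ref_lr / scale
    milestones = [round(m * scale * multiplier) for m in ref_milestones]
    max_iters = round(ref_max_iters * scale * multiplier)
    return base_lr, milestones, max_iters


def lr_at(iteration, base_lr, milestones, gamma=0.1):
    """Step decay: multiply the LR by gamma once per milestone reached."""
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    print(detectron_schedule(16))      # 1x, minibatch size 16
    print(detectron_schedule(8))       # 1x, minibatch size 8
    print(detectron_schedule(16, 2))   # 2x, minibatch size 16
```

For minibatch size 8 this gives a base LR of 0.01 with milestones at 120k and 160k and 180k total iterations, matching the rule stated above.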

ImageNet Pretrained Models

The backbone models pretrained on ImageNet are available. All backbone models are pretrained on standard ImageNet-1k dataset and can be downloaded here.

  • Notes: The ResNet50 model was trained with cosine LR decay schedule and can be downloaded here.

Baselines

Faster & Mask R-CNN

Backbone Type Image/gpu Lr schd Inf time (fps) Box AP Mask AP Download Configs
ResNet50 Faster 1 1x 12.747 35.2 - model config
ResNet50 Faster 1 2x 12.686 37.1 - model config
ResNet50 Mask 1 1x 11.615 36.5 32.2 model config
ResNet50 Mask 1 2x 11.494 38.2 33.4 model config
ResNet50-vd Faster 1 1x 12.575 36.4 - model config
ResNet34-FPN Faster 2 1x - 36.7 - model config
ResNet34-vd-FPN Faster 2 1x - 37.4 - model config
ResNet50-FPN Faster 2 1x 22.273 37.2 - model config
ResNet50-FPN Faster 2 2x 22.297 37.7 - model config
ResNet50-FPN Mask 1 1x 15.184 37.9 34.2 model config
ResNet50-FPN Mask 1 2x 15.881 38.7 34.7 model config
ResNet50-FPN Cascade Faster 2 1x 17.507 40.9 - model config
ResNet50-FPN Cascade Mask 1 1x 12.43 41.3 35.5 model config
ResNet50-vd-FPN Faster 2 2x 21.847 38.9 - model config
ResNet50-vd-FPN Mask 1 2x 15.825 39.8 35.4 model config
CBResNet50-vd-FPN Faster 2 1x - 39.7 - model config
ResNet101 Faster 1 1x 9.316 38.3 - model config
ResNet101-FPN Faster 1 1x 17.297 38.7 - model config
ResNet101-FPN Faster 1 2x 17.246 39.1 - model config
ResNet101-FPN Mask 1 1x 12.983 39.5 35.2 model config
ResNet101-vd-FPN Faster 1 1x 17.011 40.5 - model config
ResNet101-vd-FPN Faster 1 2x 16.934 40.8 - model config
ResNet101-vd-FPN Mask 1 1x 13.105 41.4 36.8 model config
CBResNet101-vd-FPN Faster 2 1x - 42.7 - model config
ResNeXt101-vd-64x4d-FPN Faster 1 1x 8.815 42.2 - model config
ResNeXt101-vd-64x4d-FPN Faster 1 2x 8.809 41.7 - model config
ResNeXt101-vd-64x4d-FPN Mask 1 1x 7.689 42.9 37.9 model config
ResNeXt101-vd-64x4d-FPN Mask 1 2x 7.859 42.6 37.6 model config
SENet154-vd-FPN Faster 1 1.44x 3.408 42.9 - model config
SENet154-vd-FPN Mask 1 1.44x 3.233 44.0 38.7 model config
ResNet101-vd-FPN CascadeClsAware Faster 2 1x - 44.7(softnms) - model config
ResNet101-vd-FPN CascadeClsAware Faster 2 1x - 46.5(multi-scale test) - model config

Deformable ConvNets v2

Backbone Type Conv Image/gpu Lr schd Inf time (fps) Box AP Mask AP Download Configs
ResNet50-FPN Faster c3-c5 2 1x 19.978 41.0 - model config
ResNet50-vd-FPN Faster c3-c5 2 2x 19.222 42.4 - model config
ResNet101-vd-FPN Faster c3-c5 2 1x 14.477 44.1 - model config
ResNeXt101-vd-64x4d-FPN Faster c3-c5 1 1x 7.209 45.2 - model config
ResNet50-FPN Mask c3-c5 1 1x 14.53 41.9 37.3 model config
ResNet50-vd-FPN Mask c3-c5 1 2x 14.832 42.9 38.0 model config
ResNet101-vd-FPN Mask c3-c5 1 1x 11.546 44.6 39.2 model config
ResNeXt101-vd-64x4d-FPN Mask c3-c5 1 1x 6.45 46.2 40.4 model config
ResNet50-FPN Cascade Faster c3-c5 2 1x - 44.2 - model config
ResNet101-vd-FPN Cascade Faster c3-c5 2 1x - 46.4 - model config
ResNeXt101-vd-FPN Cascade Faster c3-c5 2 1x - 47.3 - model config
SENet154-vd-FPN Cascade Mask c3-c5 1 1.44x - 51.9 43.9 model config
ResNet200-vd-FPN-Nonlocal CascadeClsAware Faster c3-c5 1 2.5x 3.103 51.7(softnms) - model config
CBResNet200-vd-FPN-Nonlocal Cascade Faster c3-c5 1 2.5x 1.68 53.3(softnms) - model config

HRNet

Res2Net

IOU loss

GCNet

Libra R-CNN

Auto Augmentation

Group Normalization

Backbone Type Image/gpu Lr schd Box AP Mask AP Download Configs
ResNet50-FPN Faster 2 2x 39.7 - model config
ResNet50-FPN Mask 1 2x 40.1 35.8 model config

YOLO v3

Backbone Pretrain dataset Size Deformable Conv Image/gpu Lr schd Inf time (fps) Box AP Download Configs
DarkNet53 (paper) ImageNet 608 False 8 270e - 33.0 - -
DarkNet53 (paper) ImageNet 416 False 8 270e - 31.0 - -
DarkNet53 (paper) ImageNet 320 False 8 270e - 28.2 - -
DarkNet53 ImageNet 608 False 8 270e 45.571 38.9 model config
DarkNet53 ImageNet 416 False 8 270e - 37.5 model config
DarkNet53 ImageNet 320 False 8 270e - 34.8 model config
MobileNet-V1 ImageNet 608 False 8 270e 78.302 29.3 model config
MobileNet-V1 ImageNet 416 False 8 270e - 29.3 model config
MobileNet-V1 ImageNet 320 False 8 270e - 27.1 model config
MobileNet-V3 ImageNet 608 False 8 270e - 31.6 model config
MobileNet-V3 ImageNet 416 False 8 270e - 29.9 model config
MobileNet-V3 ImageNet 320 False 8 270e - 27.1 model config
ResNet34 ImageNet 608 False 8 270e 63.356 36.2 model config
ResNet34 ImageNet 416 False 8 270e - 34.3 model config
ResNet34 ImageNet 320 False 8 270e - 31.4 model config
ResNet50_vd ImageNet 608 True 8 270e - 39.1 model config
ResNet50_vd Object365 608 True 8 270e - 41.4 model config

YOLO v3 on Pascal VOC

Backbone Size Image/gpu Lr schd Inf time (fps) Box AP(0.5) Download Configs
DarkNet53 608 8 270e 54.977 83.5 model config
DarkNet53 416 8 270e - 83.6 model config
DarkNet53 320 8 270e - 82.2 model config
DarkNet53 Diou-Loss 608 8 270e - 83.5 model config
MobileNet-V1 608 8 270e 104.291 76.2 model config
MobileNet-V1 416 8 270e - 76.7 model config
MobileNet-V1 320 8 270e - 75.3 model config
ResNet34 608 8 270e 82.247 82.6 model config
ResNet34 416 8 270e - 81.9 model config
ResNet34 320 8 270e - 80.1 model config

Notes:

  • The YOLOv3-DarkNet53 results reported in the YOLOv3 paper are also listed above. Our implementation improves on them mainly by using L1 loss for bounding box width and height regression, image mixup, and label smoothing.
  • YOLO v3 is trained on 8 GPUs with a total batch size of 64 for 270 epochs. Training data augmentations: mixup, random color distortion, random cropping, random expansion, random interpolation method, and random flipping. Training uses randomly reshaped minibatches, so inference can be run at different image sizes with the same model weights; evaluation results for image sizes 608/416/320 are provided above. Where used, deformable conv is added on stage 5 of the backbone.
  • Compared with YOLOv3-DarkNet53, YOLOv3-DarkNet53 with Diou-Loss improves the average AP by about 2% on the VOC dataset.
  • The enhanced YOLO v3 model improves the precision to 43.6 by using deformable conv, DropBlock, IoU loss, and IoU aware. See YOLOv3_ENHANCEMENT for more details.
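As an illustration of the image mixup augmentation named in the notes above, here is a minimal pure-Python sketch. It is not this repository's implementation. It assumes both images already have the same size and are given as nested lists of pixel values, and the Beta(1.5, 1.5) mixing distribution is an assumed default. The name `mixup` and its parameters are hypothetical.

```python
import random


def mixup(image_a, boxes_a, image_b, boxes_b, lam=None, alpha=1.5, beta=1.5):
    """Blend two same-sized images and keep the boxes of both.

    Returns (mixed_image, weighted_boxes), where each entry of
    weighted_boxes is (box, weight) and the weight is the mixing
    ratio of the image the box came from.
    """
    if lam is None:
        # Assumed default: draw the mixing ratio from a Beta distribution.
        lam = random.betavariate(alpha, beta)
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    weighted_boxes = [(box, lam) for box in boxes_a]
    weighted_boxes += [(box, 1.0 - lam) for box in boxes_b]
    return mixed, weighted_boxes


if __name__ == "__main__":
    img_a, img_b = [[0.0, 2.0]], [[2.0, 0.0]]
    mixed, boxes = mixup(img_a, [(0, 0, 1, 1)], img_b, [(1, 0, 2, 1)], lam=0.5)
    print(mixed, boxes)
```

Unlike mixup for classification, the labels are not averaged here: the ground-truth boxes of both images are kept, each carrying its own mixing weight.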

RetinaNet

Backbone Image/gpu Lr schd Inf time (fps) Box AP Download Configs
ResNet50-FPN 2 1x - 36.0 model config
ResNet101-FPN 2 1x - 37.3 model config
ResNeXt101-vd-FPN 1 1x - 40.5 model config

Notes: In RetinaNet, the base LR is changed to 0.01 for minibatch size 16.

EfficientDet

Scale Image/gpu Lr schd Box AP Download
EfficientDet-D0 16 300 epochs 33.8 model

Notes: base LR is 0.16 for minibatch size 128 (8x16).

SSDLite

Backbone Size Image/gpu Lr schd Inf time (fps) Box AP Download Configs
MobileNet_v1 300 64 Cosine decay(40w) - 23.6 model config
MobileNet_v3 small 320 64 Cosine decay(40w) - 16.2 model config
MobileNet_v3 large 320 64 Cosine decay(40w) - 23.3 model config
MobileNet_v3 small w/ FPN 320 64 Cosine decay(40w) - 18.9 model config
MobileNet_v3 large w/ FPN 320 64 Cosine decay(40w) - 24.3 model config
GhostNet 320 64 Cosine decay(40w) - 23.3 model config

Notes: SSDLite is trained on 8 GPUs with a total batch size of 512 and uses a cosine decay LR schedule. "40w" in the table denotes 400,000 iterations.
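A cosine decay schedule of the kind mentioned above can be sketched as follows. This is the standard half-cosine formula, not code taken from the SSDLite configs. The function name `cosine_lr` and the base LR of 0.4 in the usage example are hypothetical, and any warmup phase is left out.

```python
import math


def cosine_lr(iteration, base_lr, max_iters=400000):
    """Half-cosine decay from base_lr at iteration 0 to 0 at max_iters."""
    progress = min(iteration, max_iters) / max_iters
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for it in (0, 100000, 200000, 300000, 400000):
        print(it, cosine_lr(it, base_lr=0.4))
```

The LR falls slowly at first, fastest around the midpoint, and flattens out near the end, in contrast to the step decay used by the 1x and 2x schedules.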

SSD

Backbone Size Image/gpu Lr schd Inf time (fps) Box AP Download Configs
VGG16 300 8 40w 81.613 25.1 model config
VGG16 512 8 40w 46.007 29.1 model config

Notes: VGG-SSD is trained on 4 GPUs with a total batch size of 32 for 400,000 iterations (40w in the table).

SSD on Pascal VOC

Backbone Size Image/gpu Lr schd Inf time (fps) Box AP(0.5) Download Configs
MobileNet v1 300 32 120e 159.543 73.2 model config
VGG16 300 8 240e 117.279 77.5 model config
VGG16 512 8 240e 65.975 80.2 model config

Notes: MobileNet-SSD is trained on 2 GPUs with a total batch size of 64 for 120 epochs. VGG-SSD is trained on 4 GPUs with a total batch size of 32 for 240 epochs. SSD training data augmentations: random color distortion, random cropping, random expansion, and random flipping.

Face Detection

Please refer to face detection models for details.

Object Detection in Open Images Dataset V5

Please refer to Open Images Dataset V5 Baseline model for details.

Anchor Free Models

Please refer to Anchor Free Models for details.