Available pretrained models

Available pretrained PyTorch models are stored in bonseyes_openpifpaf_wholebody/models/pytorch; a hedged loading sketch follows the tree below. File names encode the release version, backbone, variant, training input size (641x641) and precision (fp32). Models with shufflenetv2k16 and shufflenetv2k30 backbones were trained on the COCO WholeBody dataset (133 keypoints); the remaining models were trained on the COCO keypoints dataset (17 keypoints).

models
`-- pytorch
    |-- mobilenetv2
    |   `-- v3.0_mobilenetv2_default_641x641_fp32.pkl
    |-- mobilenetv3large
    |   `-- v3.0_mobilenetv3large_default_641x641_fp32.pkl
    |-- mobilenetv3small
    |   `-- v3.0_mobilenetv3small_default_641x641_fp32.pkl
    |-- resnet50
    |   `-- v3.0_resnet50_default_641x641_fp32.pkl
    |-- shufflenetv2k16
    |   `-- v3.0_shufflenetv2k16_default_641x641_fp32.pkl
    |-- shufflenetv2k30
    |   `-- v3.0_shufflenetv2k30_default_641x641_fp32.pkl
    |-- swin-b
    |   `-- v3.0_swin-b_default_641x641_fp32.pkl
    |-- swin-s
    |   `-- v3.0_swin-s_default_641x641_fp32.pkl
    `-- swin-t
        `-- v3.0_swin-t_default_641x641_fp32.pkl
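
A minimal loading sketch, assuming the .pkl files unpickle directly to a torch.nn.Module (the classes that define the network, e.g. from openpifpaf, must be importable); this is an assumption on our part, not something the repository guarantees:

  import torch

  # Path taken from the tree above; adjust to your checkout.
  CKPT = ("bonseyes_openpifpaf_wholebody/models/pytorch/"
          "shufflenetv2k16/v3.0_shufflenetv2k16_default_641x641_fp32.pkl")

  # Assumption: the pickle resolves to a full nn.Module rather than a
  # bare state_dict; adjust if the repository ships its own loader.
  model = torch.load(CKPT, map_location="cpu")
  model.eval()

  # Dummy forward pass at the training resolution (RGB, 641x641).
  with torch.no_grad():
      fields = model(torch.zeros(1, 3, 641, 641))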

Pretrained model summaries

For each backbone, these summaries contain detailed architecture information (CSV tables available for download) and tables with per-input-size estimates of the following quantities (a sketch of how such estimates can be reproduced follows the list):

  • Total number of network parameters

  • Theoretical amount of floating-point operations (FLOPs)

  • Theoretical amount of multiply-adds (MAdd)

  • Memory usage
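
These five metric names match the summary printed by the torchstat package (Total params, Total memory, Total MAdd, Total Flops, Total MemR+W), so the tables can plausibly be reproduced along the following lines. This is a sketch under that assumption, not the repository's documented tooling; it also assumes the checkpoint unpickles to a torch.nn.Module and that torchstat is installed (pip install torchstat):

  import torch
  from torchstat import stat

  model = torch.load(
      "bonseyes_openpifpaf_wholebody/models/pytorch/"
      "shufflenetv2k30/v3.0_shufflenetv2k30_default_641x641_fp32.pkl",
      map_location="cpu",
  )
  model.eval()

  # torchstat expects the input size as (channels, height, width);
  # "384x216" in the tables below is assumed to mean width x height.
  stat(model, (3, 216, 384))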

ShuffleNet v2k30

Download layer-by-layer information for the shufflenetv2k30 PyTorch model at input size 384x216: csv

Stats for the pretrained shufflenetv2k30 PyTorch model at different input sizes:

INPUT_SIZE    #PARAMS [M]    GFLOPs    Memory [MB]    MAdd [M]    MemR+W [MB]
128x72        45.3             3.6        97.3           7100        364.6
128x96        45.3             4.6       128.2           9100        426.1
128x128       45.3             6.1       170.9          12130        510.5
256x144       45.3            13.7       384.4          27290        932.7
256x192       45.3            18.2       512.6          36380       1187.8
256x256       45.3            24.3       683.5          48510       1525.8
384x216       45.3            31.2       868.6          62220       1884.2
384x288       45.3            41.0      1153.3          81860       2457.6
384x384       45.3            54.7      1537.8         109150       3215.4
512x288       45.3            54.7      1537.8         109150       3215.4
512x384       45.3            72.9      2050.3         145540       4229.1
512x512       45.3            97.2      2733.8         194050       5580.8
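
Two regularities in this table recur for every backbone below: #PARAMS does not depend on input size, and the compute- and memory-related columns grow roughly linearly with the number of input pixels. That is also why the 384x384 and 512x288 rows are identical: both inputs contain 384*384 = 512*288 = 147,456 pixels. A back-of-the-envelope check of the linear scaling, using the shufflenetv2k30 numbers above:

  # Rough linear-in-pixels extrapolation of GFLOPs from a single row.
  # An approximation only; it ignores resolution-dependent overheads.
  BASE_PIXELS, BASE_GFLOPS = 128 * 128, 6.1  # 128x128 row above

  def estimate_gflops(width: int, height: int) -> float:
      return BASE_GFLOPS * width * height / BASE_PIXELS

  print(estimate_gflops(512, 512))  # ~97.6 vs. 97.2 in the table
  print(estimate_gflops(384, 216))  # ~30.9 vs. 31.2 in the table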

ShuffleNet v2k16

Download layer-by-layer information for the shufflenetv2k16 PyTorch model at input size 384x216: csv

Stats for the pretrained shufflenetv2k16 PyTorch model at different input sizes:

INPUT_SIZE    #PARAMS [M]    GFLOPs    Memory [MB]    MAdd [M]    MemR+W [MB]
128x72        20.5             1.3        40.9           2570        157.0
128x96        20.5             1.6        53.5           3240        181.9
128x128       20.5             2.2        71.3           4330        216.4
256x144       20.5             4.9       160.6           9730        389.1
256x192       20.5             6.5       214.1          12980        492.7
256x256       20.5             8.7       285.4          17310        630.8
384x216       20.5            11.2       363.4          22330        780.6
384x288       20.5            14.6       481.6          29200       1010.7
384x384       20.5            19.5       642.2          38940       1321.0
512x288       20.5            19.5       642.2          38940       1321.0
512x384       20.5            26.0       856.2          51920       1740.8
512x512       20.5            34.7      1141.7          69220       2283.5

ResNet50

Download layer-by-layer information for the resnet50 PyTorch model at input size 384x216: csv

Stats for the pretrained resnet50 PyTorch model at different input sizes:

INPUT_SIZE    #PARAMS [M]    GFLOPs    Memory [MB]    MAdd [M]    MemR+W [MB]
128x72        25.5             3.1        76.3           6170        248.7
128x96        25.5             4.0       101.0           8060        298.0
128x128       25.5             5.4       134.7          10740        365.0
256x144       25.5            12.1       303.1          24170        699.9
256x192       25.5            16.1       404.1          32230        900.8
256x256       25.5            21.5       538.9          42970       1167.4
384x216       25.5            27.4       683.5          54780       1454.1
384x288       25.5            36.3       909.3          72510       1904.6
384x384       25.5            48.4      1212.4          96680       2508.8
512x288       25.5            48.4      1212.4          96680       2508.8
512x384       25.5            64.6      1616.5         128910       3307.5
512x512       25.5            86.1      2155.4         171880       4382.7

MobileNet v2

Download layer-by-layer information for the mobilenetv2 PyTorch model at input size 384x216: csv

Stats for the pretrained mobilenetv2 PyTorch model at different input sizes:

INPUT_SIZE    #PARAMS [M]    GFLOPs    Memory [MB]    MAdd [M]    MemR+W [MB]
128x72        12.1             0.2        15.1          365.3         75.1
128x96        12.1             0.2        19.1          389.5         83.1
128x128       12.1             0.3        25.4          519.4         95.4
256x144       12.1             0.6        57.9         1260.0        157.9
256x192       12.1             0.8        76.2         1560.0        194.0
256x256       12.1             1.1       101.7         2080.0        243.3
384x216       12.1             1.4       130.1         2710.0        298.1
384x288       12.1             1.8       171.6         3510.0        378.9
384x384       12.1             2.4       228.7         4670.0        489.8
512x288       12.1             2.4       228.7         4670.0        489.8
512x384       12.1             3.1       305.0         6230.0        637.7
512x512       12.1             4.2       406.7         8310.0        834.9

MobileNet v3 small

Download layer-by-layer information for the mobilenetv3small PyTorch model at input size 384x216: csv

Stats for the pretrained mobilenetv3small PyTorch model at different input sizes:

INPUT_SIZE    #PARAMS [M]    GFLOPs    Memory [MB]    MAdd [M]    MemR+W [MB]
128x72         1.5             0.1        12.5          130.8         24.7
128x96         1.5             0.1        16.4          164.6         30.7
128x128        1.5             0.1        21.9          219.2         39.0
256x144        1.5             0.2        49.2          492.0         80.8
256x192        1.5             0.3        65.6          655.7        105.9
256x256        1.5             0.4        87.5          873.9        139.3
384x216        1.5             0.6       111.3         1130.0        175.5
384x288        1.5             0.7       147.6         1470.0        231.1
384x384        1.5             1.0       196.8         1970.0        306.3
512x288        1.5             1.0       196.8         1970.0        306.3
512x384        1.5             1.3       262.4         2620.0        406.5
512x512        1.5             1.8       349.8         3490.0        540.1

MobileNet v3 large

Download layer-by-layer information for the mobilenetv3large PyTorch model at input size 384x216: csv

Stats for the pretrained mobilenetv3large PyTorch model at different input sizes:

INPUT_SIZE    #PARAMS [M]    GFLOPs    Memory [MB]    MAdd [M]    MemR+W [MB]
128x72         3.9             0.2        37.9          407.2         77.9
128x96         3.9             0.3        50.0          522.0         98.4
128x128        3.9             0.4        66.7          695.1        126.3
256x144        3.9             0.8       150.1         1560.0        265.5
256x192        3.9             1.1       200.1         2080.0        349.1
256x256        3.9             1.4       266.8         2770.0        460.5
384x216        3.9             1.8       338.7         3550.0        580.0
384x288        3.9             2.4       450.2         4670.0        766.8
384x384        3.9             3.2       600.3         6230.0       1017.5
512x288        3.9             3.2       600.3         6230.0       1017.5
512x384        3.9             4.2       800.4         8310.0       1351.7
512x512        3.9             5.6      1067.2        11080.0       1802.2

When exporting the TensorRT fp16 shufflenetv2k30 model (input size 320x320) with DLA enabled and GPU fallback allowed, we observed the following distribution of layer execution (a sketch of the corresponding builder configuration follows the listing):

Layers running on DLA vs. GPU

Layers running on DLA:

{Conv_0, Relu_1, Conv_2, Conv_5, Conv_3, Relu_6, Relu_4}, {Conv_8, Relu_9}

Layers running on GPU:

(Unnamed Layer* 1479) [Constant], (Unnamed Layer* 1675) [Constant], (Unnamed Layer* 1677) [Constant], Conv_7, 600 copy, 608 copy, Reshape_32 + Transpose_33, Reshape_38, Split_39, Split_39_1, Conv_40 + Relu_41, Conv_42, Conv_43 + Relu_44, Reshape_67 + Transpose_68, Reshape_73, Split_74, Split_74_1, Conv_75 + Relu_76, Conv_77, Conv_78 + Relu_79, Reshape_102 + Transpose_103, Reshape_108, Split_109, Split_109_1, Conv_110 + Relu_111, Conv_112, Conv_113 + Relu_114, Reshape_137 + Transpose_138, Reshape_143, Split_144, Split_144_1, Conv_145 + Relu_146, Conv_147, Conv_148 + Relu_149, Reshape_172 + Transpose_173, Reshape_178, Split_179, Split_179_1, Conv_180 + Relu_181, Conv_182, Conv_183 + Relu_184, Reshape_207 + Transpose_208, Reshape_213, Split_214, Split_214_1, Conv_215 + Relu_216, Conv_217, Conv_218 + Relu_219, Reshape_242 + Transpose_243, Reshape_248, Split_249, Split_249_1, Conv_250 + Relu_251, Conv_252, Conv_253 + Relu_254, Reshape_277 + Transpose_278, Reshape_283, Conv_284, Conv_287 + Relu_288, Conv_285 + Relu_286, Conv_289, Conv_290 + Relu_291, Reshape_314 + Transpose_315, Reshape_320, Split_321, Split_321_1, Conv_322 + Relu_323, Conv_324, Conv_325 + Relu_326, Reshape_349 + Transpose_350, Reshape_355, Split_356, Split_356_1, Conv_357 + Relu_358, Conv_359, Conv_360 + Relu_361, Reshape_384 + Transpose_385, Reshape_390, Split_391, Split_391_1, Conv_392 + Relu_393, Conv_394, Conv_395 + Relu_396, Reshape_419 + Transpose_420, Reshape_425, Split_426, Split_426_1, Conv_427 + Relu_428, Conv_429, Conv_430 + Relu_431, Reshape_454 + Transpose_455, Reshape_460, Split_461, Split_461_1, Conv_462 + Relu_463, Conv_464, Conv_465 + Relu_466, Reshape_489 + Transpose_490, Reshape_495, Split_496, Split_496_1, Conv_497 + Relu_498, Conv_499, Conv_500 + Relu_501, Reshape_524 + Transpose_525, Reshape_530, Split_531, Split_531_1, Conv_532 + Relu_533, Conv_534, Conv_535 + Relu_536, Reshape_559 + Transpose_560, Reshape_565, Split_566, Split_566_1, Conv_567 + Relu_568, Conv_569, Conv_570 + Relu_571, Reshape_594 + Transpose_595, Reshape_600, Split_601, Split_601_1, Conv_602 + Relu_603, Conv_604, Conv_605 + Relu_606, Reshape_629 + Transpose_630, Reshape_635, Split_636, Split_636_1, Conv_637 + Relu_638, Conv_639, Conv_640 + Relu_641, Reshape_664 + Transpose_665, Reshape_670, Split_671, Split_671_1, Conv_672 + Relu_673, Conv_674, Conv_675 + Relu_676, Reshape_699 + Transpose_700, Reshape_705, Split_706, Split_706_1, Conv_707 + Relu_708, Conv_709, Conv_710 + Relu_711, Reshape_734 + Transpose_735, Reshape_740, Split_741, Split_741_1, Conv_742 + Relu_743, Conv_744, Conv_745 + Relu_746, Reshape_769 + Transpose_770, Reshape_775, Split_776, Split_776_1, Conv_777 + Relu_778, Conv_779, Conv_780 + Relu_781, Reshape_804 + Transpose_805, Reshape_810, Split_811, Split_811_1, Conv_812 + Relu_813, Conv_814, Conv_815 + Relu_816, Reshape_839 + Transpose_840, Reshape_845, Conv_846, Conv_849 + Relu_850, Conv_847 + Relu_848, Conv_851, Conv_852 + Relu_853, Reshape_876 + Transpose_877, Reshape_882, Split_883, Split_883_1, Conv_884 + Relu_885, Conv_886, Conv_887 + Relu_888, Reshape_911 + Transpose_912, Reshape_917, Split_918, Split_918_1, Conv_919 + Relu_920, Conv_921, Conv_922 + Relu_923, Reshape_946 + Transpose_947, Reshape_952, Split_953, Split_953_1, Conv_954 + Relu_955, Conv_956, Conv_957 + Relu_958, Reshape_981 + Transpose_982, Reshape_987, Split_988, Split_988_1, Conv_989 + Relu_990, Conv_991, Conv_992 + Relu_993, Reshape_1016 + Transpose_1017, Reshape_1022, Split_1023, Split_1023_1, Conv_1024 + Relu_1025, Conv_1026, Conv_1027 + 
Relu_1028, Reshape_1051 + Transpose_1052, Reshape_1057, Conv_1058 + Relu_1059, Conv_1085 || Conv_1060, Reshape_1062 + Transpose_1063, Reshape_1087 + Transpose_1088, Reshape_1065, Reshape_1090, Slice_1066, Slice_1091, Slice_1067, Slice_1092, Reshape_1073 + Transpose_1074, Reshape_1098 + Transpose_1099, Slice_1075, Slice_1077, Slice_1080, Slice_1081, Slice_1100, Slice_1102, Slice_1103, Slice_1108, Slice_1109, Sigmoid_1076, Add_1079, Softplus_1082, Sigmoid_1101, Add_1105, Add_1107, 1922 copy, 1925 copy, 1926 copy, 1928 copy, Transpose_1084, Softplus_1110, 1955 copy, 1959 copy, 1961 copy, 1962 copy, 1964 copy, Transpose_1112,
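
The ONNX-style layer names above suggest the engine was built from an ONNX export. For reference, a hedged sketch of building such an fp16 DLA engine with the TensorRT Python API; the ONNX and engine file names are hypothetical, and the GPU_FALLBACK flag is what allows the layers listed above to run on the GPU (with trtexec, the equivalent flags are --fp16 --useDLACore=0 --allowGPUFallback):

  import tensorrt as trt

  logger = trt.Logger(trt.Logger.INFO)
  builder = trt.Builder(logger)
  network = builder.create_network(
      1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
  parser = trt.OnnxParser(network, logger)

  # Hypothetical ONNX export of shufflenetv2k30 at 320x320.
  with open("shufflenetv2k30_320x320.onnx", "rb") as f:
      if not parser.parse(f.read()):
          raise RuntimeError(parser.get_error(0))

  config = builder.create_builder_config()
  config.set_flag(trt.BuilderFlag.FP16)
  # Prefer DLA core 0; let unsupported layers fall back to the GPU.
  config.default_device_type = trt.DeviceType.DLA
  config.DLA_core = 0
  config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

  engine = builder.build_serialized_network(network, config)
  with open("shufflenetv2k30_320x320_dla_fp16.engine", "wb") as f:
      f.write(engine)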