modules.resnet.BlockBottleneck2d

Class · nn.Module · Source

layer = mdnc.modules.resnet.BlockBottleneck2d(
    in_planes, out_planes,
    kernel_size=3, stride=1, padding=1, output_size=None,
    normalizer='pinst', activator='prelu', layer_order='new', scaler='down'
)

In the following paper, the authors propose two structures of the residual block.

Deep Residual Learning for Image Recognition

This is the implementation of the bottleneck (second-type) residual block. The residual block can be divided into two branches (input + conv). In this implementation, the convolutional branch is composed of three convolutional layers, as shown in the following chart:

flowchart TB
    in((" ")) --> conv1[Projection<br>convolution] --> conv2[Modern<br>convolution] --> conv3[Projection<br>convolution] --> plus(("+")):::diagramop --> out((" "))
    in --> plus
    classDef diagramop fill:#FFB11B, stroke:#AF811B;

Here, the projection convolutional layer is implemented by a convolution with kernel_size=1. Compared to the plain block, this implementation requires fewer parameters while providing a deeper stack.
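The parameter saving can be checked with simple weight-count arithmetic. The sketch below is illustrative and not taken from the library: `conv_weights` is a hypothetical helper, the bottleneck channel layout (16 → 16 → 16 → 32) is read off the summary printed in Example 1, and the plain-block layout (two 3x3 convolutions, 16 → 16 → 32) is an assumption made only for comparison.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a bias-free 2D convolution with a k x k kernel."""
    return c_in * c_out * k * k


# Bottleneck branch: 1x1 projection -> 3x3 convolution -> 1x1 projection.
bottleneck = (
    conv_weights(16, 16, 1)
    + conv_weights(16, 16, 3)
    + conv_weights(16, 32, 1)
)

# Assumed plain branch for comparison: two 3x3 convolutions.
plain = conv_weights(16, 16, 3) + conv_weights(16, 32, 3)

print(bottleneck, plain)  # 3072 6912
```

Under these assumptions the bottleneck branch uses fewer than half the convolution weights of the plain branch, despite stacking three convolutions instead of two.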

If the number of output channels or the output size differs from that of the input, a projection layer is required to map the input branch to the output space:

flowchart TB
    in((" ")) --> conv1[Projection<br>convolution] --> conv2[Modern<br>convolution] --> conv3[Projection<br>convolution] --> plus(("+")):::diagramop --> out((" "))
    in --> pconv[Projection<br>convolution] --> plus
    classDef diagramop fill:#FFB11B, stroke:#AF811B;
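As a rough illustration of what the projection on the input branch has to do, the following sketch builds a stand-alone skip branch in plain PyTorch. It is not the library's code: the layer choice (a bias-free 1x1 convolution followed by an affine instance normalization) and the name `skip` are assumptions inferred from the parameter counts listed in Example 1.

```python
import torch
import torch.nn as nn

# Assumed skip-branch projection: 16 -> 32 channels,
# down-sampling the last axis by a factor of 2.
skip = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=1, stride=(1, 2), bias=False),
    nn.InstanceNorm2d(32, affine=True),
)

x = torch.randn(2, 16, 4, 255)
y = skip(x)

# Count the trainable parameters of the projection.
n_params = sum(p.numel() for p in skip.parameters())
print(tuple(y.shape), n_params)  # (2, 32, 4, 128) 576
```

The projected input then has the same channel number and size as the output of the convolutional branch, so the two branches can be added.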

In the following paper, a new operator composition order is proposed for building residual blocks:

Identity Mappings in Deep Residual Networks

This implementation, called "pre-activation", changes the order of the sub-layers in the modern convolutional layer (see mdnc.modules.conv.ConvModern2d). We support and recommend this implementation; set layer_order='new' to enable it.
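The two composition orders can be sketched with plain PyTorch layers. This is only an illustration under assumptions, not the code of `ConvModern2d`: `make_sublayers` is a hypothetical helper, and the choice of a bias-free convolution, an affine `InstanceNorm2d` and a per-channel `PReLU` is inferred from the parameter counts in Example 1.

```python
import torch
import torch.nn as nn


def make_sublayers(in_planes, out_planes, kernel_size=3, padding=1, layer_order='new'):
    """Hypothetical helper: stack norm, activation and convolution in the given order."""
    conv = nn.Conv2d(in_planes, out_planes, kernel_size, padding=padding, bias=False)
    if layer_order == 'new':
        # Pre-activation: normalization + activation + convolution.
        return nn.Sequential(
            nn.InstanceNorm2d(in_planes, affine=True),
            nn.PReLU(num_parameters=in_planes),
            conv,
        )
    # Original order: convolution + normalization + activation.
    return nn.Sequential(
        conv,
        nn.InstanceNorm2d(out_planes, affine=True),
        nn.PReLU(num_parameters=out_planes),
    )


x = torch.randn(2, 16, 4, 255)
new_order = make_sublayers(16, 32, layer_order='new')
old_order = make_sublayers(16, 32, layer_order='old')
print(tuple(new_order(x).shape), tuple(old_order(x).shape))
```

In this sketch the `'new'` order normalizes and activates the input channels before the convolution, whereas the `'old'` order applies both to the output channels after it.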

Arguments

Requires

Argument	Type	Description
`in_planes`	`int`	The channel number of the input data.
`out_planes`	`int`	The channel number of the output data.
`kernel_size`	`int` or `(int, int)`	The kernel size of this layer.
`stride`	`int` or `(int, int)`	The stride size of this layer. When `scaler='down'`, this argument serves as the down-sampling factor. When `scaler='up'`, this argument serves as the up-sampling factor.
`padding`	`int` or `(int, int)`	The padding size of this layer. Zero padding is applied to both edges of the input before the convolution.
`output_size`	`int` or `(int, int)`	The size of the output data. This option is only used when `scaler='up'`. When this value is set, the up-sampling size is given explicitly and the argument `stride` is not used.
`normalizer`	`str`	The normalization method, could be: `'batch'`: Batch normalization. `'inst'`: Instance normalization. `'pinst'`: Instance normalization with tunable rescaling parameters. `'null'`: Without normalization, the layer falls back to the "convolution + activation" form. In this case, `layer_order='new'` does not take effect.
`activator`	`str`	The activation method, could be: `'prelu'`, `'relu'`, `'null'`.
`layer_order`	`str`	The sub-layer composition order, could be: `'new'`: normalization + activation + convolution. `'old'`: convolution + normalization + activation.
`scaler`	`str`	The scaling method, could be: `'down'`: the argument `stride` would be used for down-sampling. `'up'`: the argument `stride` would be used for up-sampling (equivalent to transposed convolution).
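The interplay of `kernel_size`, `stride` and `padding` under `scaler='down'` can be checked with the standard convolution size formula. The sketch assumes that the layer follows the usual `floor((L + 2p - k) / s) + 1` rule of strided convolutions, which is consistent with the shapes reported in Example 1; `conv_out_size` is a hypothetical helper, not part of `mdnc`.

```python
def conv_out_size(length, kernel_size=3, stride=1, padding=1):
    """Output length of a strided convolution along one axis."""
    return (length + 2 * padding - kernel_size) // stride + 1


# Input size (4, 255) with stride (1, 2), as in Example 1.
print(conv_out_size(4, stride=1), conv_out_size(255, stride=2))  # 4 128

# Lengths 255 and 256 both shrink to 128 under stride 2.
print(conv_out_size(256, stride=2))  # 128
```

Because two different input lengths map to the same output length, an up-sampling layer cannot recover the original size from `stride` alone, which is the case that `output_size` addresses.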

Operators

`call`

y = layer(x)

The forward operator, implemented by the forward() method. The input is a batch of 2D data, and the output is the final output of this layer.

Requires

Argument	Type	Description
`x`	`torch.Tensor`	A tensor of 2D data with size `(B, C, L1, L2)`, where `B` is the batch size, `C` is the input channel number, and `(L1, L2)` is the input data size.

Returns

Argument	Description
`y`	A tensor of 2D data with size `(B, C, L1, L2)`, where `B` is the batch size, `C` is the output channel number, and `(L1, L2)` is the output data size.

Examples

In the first example, we build a bottleneck residual block with ½ down-sampling along the last axis and same padding.

Example 1

Code

import mdnc

layer = mdnc.modules.resnet.BlockBottleneck2d(16, 32, kernel_size=3, stride=(1, 2), padding=1, scaler='down')
mdnc.contribs.torchsummary.summary(layer, (16, 4, 255), device='cpu')

Output

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
    InstanceNorm2d-1           [-1, 16, 4, 255]              32
             PReLU-2           [-1, 16, 4, 255]              16
            Conv2d-3           [-1, 16, 4, 255]             256
    InstanceNorm2d-4           [-1, 16, 4, 255]              32
             PReLU-5           [-1, 16, 4, 255]              16
            Conv2d-6           [-1, 16, 4, 128]           2,304
    InstanceNorm2d-7           [-1, 16, 4, 128]              32
             PReLU-8           [-1, 16, 4, 128]              16
            Conv2d-9           [-1, 32, 4, 128]             512
           Conv2d-10           [-1, 32, 4, 128]             512
   InstanceNorm2d-11           [-1, 32, 4, 128]              64
BlockBottleneck2d-12           [-1, 32, 4, 128]               0
================================================================
Total params: 3,792
Trainable params: 3,792
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.06
Forward/backward pass size (MB): 1.31
Params size (MB): 0.01
Estimated Total Size (MB): 1.39
----------------------------------------------------------------

Note that the output size is (4, 128) in this example, because the same padding is applied to both axes of the input. In this case, if we want to build a reverse layer, we can specify the output_size for the up-sampling layer, for example:

Example 2