Transposed Convolution#
:label:sec_transposed_conv
The CNN layers we have seen so far,
such as convolutional layers (:numref:sec_conv_layer) and pooling layers (:numref:sec_pooling),
typically reduce (downsample) the spatial dimensions (height and width) of the input,
or keep them unchanged.
In semantic segmentation,
which classifies at the pixel level,
it is convenient if
the spatial dimensions of the
input and output are the same.
For example,
the channel dimension at one output pixel
can hold the classification results
for the input pixel at the same spatial position.
To achieve this, especially after
the spatial dimensions are reduced by CNN layers,
we can use another type
of CNN layer
that increases (upsamples) the spatial dimensions
of intermediate feature maps.
In this section,
we introduce
transposed convolution, which is also called fractionally-strided convolution :cite:Dumoulin.Visin.2016,
for reversing the downsampling of spatial dimensions
performed by the convolution.
import torch
from torch import nn
# from d2l import torch as d2l
Basic Operation#
Ignoring channels for now,
let’s begin with
the basic transposed convolution operation
with stride of 1 and no padding.
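Suppose that
we are given an $n_h \times n_w$ input tensor
and a $k_h \times k_w$ kernel.
Sliding the kernel window with stride of 1
for $n_w$ times in each row
and $n_h$ times in each column
yields a total of $n_h n_w$ intermediate results.
Each intermediate result is
a $(n_h + k_h - 1) \times (n_w + k_w - 1)$ tensor
initialized with zeros.
To compute each intermediate tensor,
each element of the input tensor
is multiplied by the kernel,
and the resulting $k_h \times k_w$ tensor
replaces the portion of the intermediate tensor
whose position corresponds to that input element.
In the end, all the intermediate results
are summed to produce the output.
As an example,
:numref:fig_trans_conv
illustrates
how transposed convolution with a $2 \times 2$ kernel is computed for a $2 \times 2$ input tensor.
:label:fig_trans_conv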
We can implement this basic transposed convolution operation as trans_conv
for an input matrix X
and a kernel matrix K.
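def trans_conv(X, K):
    h, w = K.shape
    # The output is larger than the input by (kernel size - 1) in each dimension
    Y = torch.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Broadcast the input element X[i, j] through the kernel and accumulate
            Y[i: i + h, j: j + w] += X[i, j] * K
    return Y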
In contrast to the regular convolution (in :numref:sec_conv_layer) that reduces input elements
via the kernel,
the transposed convolution
broadcasts input elements
via the kernel, thereby
producing an output
that is larger than the input.
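As a rough illustration of this contrast, the sketch below applies both operations to the same input with the same kernel through `torch.nn.functional`. The $3 \times 3$ input, the $2 \times 2$ kernel of ones, and the names `X_demo` and `K_demo` are arbitrary choices made here for illustration, not values from this section.

```python
import torch
import torch.nn.functional as F

# Arbitrary illustrative input and kernel (assumed values, not from the text).
# Layout is (batch, channels, height, width).
X_demo = torch.arange(9.0).reshape(1, 1, 3, 3)
K_demo = torch.ones(1, 1, 2, 2)

# Regular convolution (cross-correlation): each output element reduces a 2x2 window.
Y_conv = F.conv2d(X_demo, K_demo)
# Transposed convolution: each input element is broadcast through the 2x2 kernel.
Y_tconv = F.conv_transpose2d(X_demo, K_demo)

print(Y_conv.shape)   # torch.Size([1, 1, 2, 2])
print(Y_tconv.shape)  # torch.Size([1, 1, 4, 4])
```

With stride 1 and no padding, the convolution shrinks each spatial dimension by one less than the kernel size, while the transposed convolution grows it by the same amount.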
We can construct the input tensor X
and the kernel tensor K
from :numref:fig_trans_conv
to validate the output of the above implementation of the basic two-dimensional transposed convolution operation.
X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
trans_conv(X, K)
tensor([[ 0., 0., 1.],
[ 0., 4., 6.],
[ 4., 12., 9.]])
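To make the summation of intermediate results explicit, the following sketch keeps one intermediate tensor per input element instead of accumulating in place, and then adds them up. The helper name `trans_conv_intermediates` is hypothetical and introduced only for this illustration; it assumes stride 1 and no padding, as in the basic operation.

```python
import torch

def trans_conv_intermediates(X, K):
    """Return one zero-initialized intermediate tensor per input element."""
    h, w = K.shape
    out_shape = (X.shape[0] + h - 1, X.shape[1] + w - 1)
    intermediates = []
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            T = torch.zeros(out_shape)
            # The scaled kernel replaces the portion of T that starts at (i, j)
            T[i:i + h, j:j + w] = X[i, j] * K
            intermediates.append(T)
    return intermediates

X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
parts = trans_conv_intermediates(X, K)
# Summing all intermediate tensors gives the transposed convolution output
Y_sum = torch.stack(parts).sum(dim=0)
print(len(parts))  # 4
print(Y_sum)
```

For the $2 \times 2$ input there are four intermediate tensors, and their sum should agree with the output of `trans_conv` shown above.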
Alternatively,
when the input X
and kernel K
are both
four-dimensional tensors,
we can use high-level APIs to obtain the same results.
X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, bias=False)
tconv.weight.data = K
tconv(X)
tensor([[[[ 0., 0., 1.],
[ 0., 4., 6.],
[ 4., 12., 9.]]]], grad_fn=<ConvolutionBackward0>)
Padding, Strides, and Multiple Channels#
Unlike in the regular convolution, where padding is applied to the input, in the transposed convolution it is applied to the output. For example, when the padding on either side of the height and width is specified as 1, the first and last rows and columns are removed from the transposed convolution output.
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)
tconv.weight.data = K
tconv(X)
tensor([[[[4.]]]], grad_fn=<ConvolutionBackward0>)
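One way to check this reading of padding is to compare the padded result with a cropped copy of the unpadded result. The sketch below does so with `torch.nn.functional.conv_transpose2d`, using the same values as the example above; the names `X4`, `K4`, `full`, `padded`, and `cropped` are introduced here only for illustration.

```python
import torch
import torch.nn.functional as F

# Same input and kernel values as above, in (batch, channels, height, width) layout
X4 = torch.tensor([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 1, 2, 2)
K4 = torch.tensor([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 1, 2, 2)

full = F.conv_transpose2d(X4, K4)               # no padding: 3x3 output
padded = F.conv_transpose2d(X4, K4, padding=1)  # padding=1: 1x1 output

# Dropping the first and last rows and columns of the unpadded output
cropped = full[:, :, 1:-1, 1:-1]
print(torch.allclose(padded, cropped))  # True
```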
In the transposed convolution,
strides are specified for intermediate results (thus output), not for input.
Using the same input and kernel tensors
from :numref:fig_trans_conv,
changing the stride from 1 to 2
increases both the height and width
of intermediate tensors, hence the output tensor
in :numref:fig_trans_conv_stride2.
:label:fig_trans_conv_stride2
The following code snippet validates the transposed convolution output for a stride of 2 in :numref:fig_trans_conv_stride2.
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv.weight.data = K
tconv(X)
tensor([[[[0., 0., 0., 1.],
[0., 0., 2., 3.],
[0., 2., 0., 3.],
[4., 6., 6., 9.]]]], grad_fn=<ConvolutionBackward0>)
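The effect of the stride can also be sketched by hand: the only change to the basic operation is that the scaled kernel for input element $(i, j)$ is placed at offset $(i \cdot s, j \cdot s)$ in the output, where $s$ is the stride. The function name `trans_conv_stride` below is hypothetical, and the sketch assumes no padding and a single channel.

```python
import torch

def trans_conv_stride(X, K, stride=1):
    """Single-channel transposed convolution with a stride and no padding."""
    h, w = K.shape
    # Output size per dimension: (input size - 1) * stride + kernel size
    Y = torch.zeros(((X.shape[0] - 1) * stride + h,
                     (X.shape[1] - 1) * stride + w))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            r, c = i * stride, j * stride
            # Place the scaled kernel at the strided offset and accumulate
            Y[r:r + h, c:c + w] += X[i, j] * K
    return Y

X2 = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K2 = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
print(trans_conv_stride(X2, K2, stride=2))
```

With a stride of 2 and a $2 \times 2$ kernel the placed kernels no longer overlap, so each $2 \times 2$ block of the output is simply one input element times the kernel.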
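For multiple input and output channels,
the transposed convolution
works in the same way as the regular convolution.
Suppose that
the input has $c_i$ channels,
and that the transposed convolution
assigns a $k_h \times k_w$ kernel tensor
to each input channel.
When multiple output channels are specified,
we will have a $c_i \times k_h \times k_w$ kernel for each output channel.
All in all, if we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$.
This is illustrated in the following example.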
X = torch.rand(size=(1, 10, 16, 16))
conv = nn.Conv2d(10, 20, kernel_size=5, padding=2, stride=3)
tconv = nn.ConvTranspose2d(20, 10, kernel_size=5, padding=2, stride=3)
tconv(conv(X)).shape == X.shape
True
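The shape match above can be followed with a little arithmetic. The sketch below assumes the standard output-size formulas for PyTorch's `nn.Conv2d` and `nn.ConvTranspose2d` with dilation 1; the helper names `conv_out_size` and `tconv_out_size` are hypothetical. It also suggests that the round trip is exact only when the stride divides $n + 2p - k$ evenly. Otherwise the convolution rounds down, and the `output_padding` argument of `nn.ConvTranspose2d` is one way to recover the lost rows and columns.

```python
import torch
from torch import nn

def conv_out_size(n, k, p, s):
    # Convolution output size along one dimension (dilation 1 assumed)
    return (n + 2 * p - k) // s + 1

def tconv_out_size(n, k, p, s, output_padding=0):
    # Transposed convolution output size along one dimension (dilation 1 assumed)
    return (n - 1) * s - 2 * p + k + output_padding

k, p, s = 5, 2, 3
for n in (16, 17):
    m = conv_out_size(n, k, p, s)
    print(n, m, tconv_out_size(m, k, p, s))
# 16 6 16  (round trip is exact)
# 17 6 16  (one row and column are lost to rounding)

# For a 17x17 input, output_padding=1 restores the original shape
X17 = torch.rand(size=(1, 10, 17, 17))
conv17 = nn.Conv2d(10, 20, kernel_size=k, padding=p, stride=s)
tconv17 = nn.ConvTranspose2d(20, 10, kernel_size=k, padding=p, stride=s,
                             output_padding=1)
print(tconv17(conv17(X17)).shape == X17.shape)  # True
```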
Connection to Matrix Transposition#
:label:subsec-connection-to-mat-transposition
The transposed convolution is named after
the matrix transposition.
To explain,
let’s first
see how to implement convolutions
using matrix multiplications.
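In the example below, we define a $3 \times 3$ input X
and a $2 \times 2$ convolution kernel K,
and then compute the convolution output Y.
The corr2d function from d2l is commented out here;
SciPy's signal.correlate2d with mode="valid" is used in its place.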
from scipy import signal
X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
# Y = d2l.corr2d(X, K)
Y = signal.correlate2d(X, K, mode="valid")
print(X)
print(K)
print(Y)
tensor([[0., 1., 2.],
[3., 4., 5.],
[6., 7., 8.]])
tensor([[1., 2.],
[3., 4.]])
[[27. 37.]
[57. 67.]]
import numpy as np
np.linalg.inv(K)
array([[-2. , 1. ],
[ 1.5, -0.5]], dtype=float32)
Next, we rewrite the convolution kernel K
as
a sparse weight matrix W
containing a lot of zeros.
The shape of the weight matrix is $(4, 9)$,
where the non-zero elements come from
the convolution kernel K.
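def kernel2matrix(K):
    # k lays the 2x2 kernel out along a flattened window of the 3x3 input
    k, W = torch.zeros(5), torch.zeros((4, 9))
    k[:2], k[3:5] = K[0, :], K[1, :]
    # Each row of W is k shifted to the position of one output element
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W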
W = kernel2matrix(K)
W
tensor([[1., 2., 0., 3., 4., 0., 0., 0., 0.],
[0., 1., 2., 0., 3., 4., 0., 0., 0.],
[0., 0., 0., 1., 2., 0., 3., 4., 0.],
[0., 0., 0., 0., 1., 2., 0., 3., 4.]])
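The function above hard-codes a $3 \times 3$ input and a $2 \times 2$ kernel. As a minimal sketch of how the same construction might be generalized, the hypothetical helper `kernel_to_matrix` below builds the weight matrix for any input and kernel size, assuming stride 1, no padding, and row-by-row flattening of the input.

```python
import torch

def kernel_to_matrix(K, in_shape):
    """Build W so that W @ X.reshape(-1) equals the flattened cross-correlation."""
    n_h, n_w = in_shape
    k_h, k_w = K.shape
    o_h, o_w = n_h - k_h + 1, n_w - k_w + 1  # output size (stride 1, no padding)
    W = torch.zeros((o_h * o_w, n_h * n_w))
    for i in range(o_h):
        for j in range(o_w):
            row = i * o_w + j  # index of output element (i, j)
            for a in range(k_h):
                for b in range(k_w):
                    # Input element (i + a, j + b) is weighted by K[a, b]
                    W[row, (i + a) * n_w + (j + b)] = K[a, b]
    return W

K_demo = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
W_demo = kernel_to_matrix(K_demo, (3, 3))
print(W_demo.shape)  # torch.Size([4, 9])
```

For the $3 \times 3$ input and $2 \times 2$ kernel used in this section, `W_demo` should coincide with the matrix returned by `kernel2matrix`.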
Concatenate the input X
row by row to get a vector of length 9. Then the matrix multiplication of W
and the vectorized X
gives a vector of length 4.
After reshaping it, we can obtain the same result Y
from the original convolution operation above:
we just implemented convolutions using matrix multiplications.
Y2 = torch.matmul(W, X.reshape(-1)).reshape(2, 2)
print(Y2)
tensor([[27., 37.],
[57., 67.]])
Likewise, we can implement transposed convolutions using
matrix multiplications.
In the following example,
we take the Y
from the above
regular convolution
as input to the transposed convolution.
To implement this operation by multiplying matrices,
we only need to transpose the weight matrix W
with the new shape $(9, 4)$.
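# Y is a NumPy array returned by SciPy; convert it to a float32 tensor first
y = torch.as_tensor(Y, dtype=torch.float32)
Z1 = trans_conv(y, K)
Z2 = torch.matmul(W.T, y.reshape(-1)).reshape(3, 3)
print(Z1)
print(Z2)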
tensor([[ 27., 91., 74.],
[138., 400., 282.],
[171., 429., 268.]])
tensor([[ 27., 91., 74.],
[138., 400., 282.],
[171., 429., 268.]])
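The agreement between Z1 and Z2 can also be viewed through automatic differentiation. If the convolution is written as $\mathbf{y} = \mathbf{W}\mathbf{x}$, then backpropagating an upstream gradient $\mathbf{v}$ to the input yields $\mathbf{W}^\top \mathbf{v}$, which is what multiplying by the transposed weight matrix computes. The sketch below checks this numerically with `torch.autograd.grad` and `torch.nn.functional.conv_transpose2d`; the names `X_in`, `K_in`, `V`, `grad_X`, and `Z` are introduced here for illustration, and the upstream gradient is chosen to be the convolution output itself so that the numbers match the example above.

```python
import torch
import torch.nn.functional as F

# Same values as X and K above, in (batch, channels, height, width) layout
X_in = torch.arange(9.0).reshape(1, 1, 3, 3).requires_grad_(True)
K_in = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).reshape(1, 1, 2, 2)

Y_out = F.conv2d(X_in, K_in)   # forward pass of the convolution, shape (1, 1, 2, 2)
V = Y_out.detach().clone()     # upstream gradient (here: the output values themselves)

# Backpropagate V through the convolution to its input
(grad_X,) = torch.autograd.grad(Y_out, X_in, grad_outputs=V)
# Forward pass of the transposed convolution applied to V with the same kernel
Z = F.conv_transpose2d(V, K_in)

print(torch.allclose(grad_X, Z))  # True
print(Z.reshape(3, 3))
```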
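Consider implementing the convolution
by multiplying matrices.
Given an input vector $\mathbf{x}$
and a weight matrix $\mathbf{W}$,
the forward propagation function of the convolution
can be implemented
by multiplying its input with the weight matrix
and outputting a vector
$\mathbf{y}=\mathbf{W}\mathbf{x}$.
Since backpropagation
follows the chain rule
and $\nabla_{\mathbf{x}}\mathbf{y}=\mathbf{W}^\top$,
the backpropagation function of the convolution
can be implemented
by multiplying its input with the
transposed weight matrix $\mathbf{W}^\top$.
Therefore,
the transposed convolutional layer
can just exchange the forward propagation function
and the backpropagation function of the convolutional layer:
its forward propagation
and backpropagation functions
multiply their input vector with
$\mathbf{W}^\top$ and $\mathbf{W}$, respectively.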
Summary#
In contrast to the regular convolution that reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input.
If we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$.
We can implement convolutions using matrix multiplications. The transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer.
Exercises#
In :numref:subsec-connection-to-mat-transposition, the convolution input X and the transposed convolution output Z have the same shape. Do they have the same value? Why?
Is it efficient to use matrix multiplications to implement convolutions? Why?
This section is borrowed from D2L
Further Reading