Okay, carrying on from last time, let’s have a look at pooling and transposed convolutional layers.

Pooling

Pooling is used to reduce the number of parameters between layers by reducing the shape of the input. It is a pretty simple aggregation: a given number of elements from the input (say a 2x2 block) are combined into a single value. The two most common approaches are to take the average of the elements or the maximum of the elements. The latter, max pooling, is generally the preferred choice.

Max pooling has the advantage over average pooling in that it also performs some noise suppression.

$$n_{out} = \left\lfloor\frac{n_{in} - f}{s}\right\rfloor + 1$$

where: \(n_{out}\) is our output matrix dimension size,
\(n_{in}\) is the input matrix dimension size,
\(f\) is the pooling kernel dimension size, and
\(s\) is the stride size

Quick Example

I am using max pooling with a stride of 2. A 4x4 input, a 2x2 pooling kernel and a stride of 2 yields an output of size 2x2: \(\lfloor(4 - 2)/2\rfloor + 1 = 2\).

Input
 50  77 111  94
 28  80 100  40
 85  79  63  74
 67  55 105 123
Output
 80 111
 85 123
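
Here is a minimal sketch verifying the table above with PyTorch’s nn.MaxPool2d (nothing here beyond the example values and the stock layer):

import torch
from torch import nn

# the 4x4 input from the example above
in_t = torch.tensor([[50., 77., 111., 94.],
                     [28., 80., 100., 40.],
                     [85., 79., 63., 74.],
                     [67., 55., 105., 123.]])
# the pooling layer wants a (batch, channels, height, width) tensor
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(in_t.reshape(1, 1, 4, 4)))
# tensor([[[[ 80., 111.],
#           [ 85., 123.]]]])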

Transposed Convolutional Layer

This is an upsampling layer that generates an output feature map larger than the input feature map. Transposed convolutional layers are particularly useful for tasks that involve upsampling the input data, such as generating an image from a set of noise vectors, which is what I will eventually be using them for.

In our CNN GAN the discriminator will use convolutional layers. The generator will use transposed convolutional layers. This is in keeping with the concept that the generator should mirror the discriminator.

A Quick Example

Input and kernel will both be 2x2. And I will use a stride of 2. It’s enough to get the idea across.

Input
 80 111
 85 105
Kernel
  1 -1
 -2  2
Output (so far: the contribution of the first input element, 80)
  80  -80
-160  160

... ...

Input
 80 111
 85 105
Kernel
  1 -1
 -2  2
Output
  80  -80  111 -111
-160  160 -222  222
  85  -85  105 -105
-170  170 -210  210
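
And a minimal sketch verifying that 4x4 result with PyTorch’s nn.ConvTranspose2d (same values as the tables, a stride of 2, no padding or bias):

import torch
from torch import nn

in_t = torch.tensor([[80., 111.], [85., 105.]]).reshape(1, 1, 2, 2)
k_t = torch.tensor([[1., -1.], [-2., 2.]]).reshape(1, 1, 2, 2)
transpc = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                             kernel_size=2, stride=2, bias=False)
# use the example kernel rather than the random default weights
transpc.weight.data = k_t
print(transpc(in_t))
# tensor([[[[  80.,  -80.,  111., -111.],
#           [-160.,  160., -222.,  222.],
#           [  85.,  -85.,  105., -105.],
#           [-170.,  170., -210.,  210.]]]], grad_fn=<ConvolutionBackward0>)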

Probably should have done a step of 1 as that adds some fun to the process. Perhaps I will add that before it comes time to publish. Maybe a chance to code something in PyTorch to output the result of the above with a step of 1.

Step of Size 1

Update: Okay, added coverage of transposed convolution with a step of 1. Thought it best.

So, after saying I wasn’t, I have decided to look at a transposed convolutional layer using a step of one. But I am going to start by generating the result in code. Then, because what is happening will likely not be obvious, perhaps I will spend the time writing the HTML to display the steps.

And, I have no idea why PyTorch requires a 4-dimensional tensor for the input and the kernel. And, I am not going to try to read the code to figure it out. Or, for that matter, dig into the mathematics, which is likely more important for understanding this need.

Got a couple of runtime errors as I was working on the code.

RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype
-- and again after I changed the kernel tensor to type float64
RuntimeError: expected scalar type Long but found Double
-- so also changed the input tensor to type float64
import torch
from torch import nn

# do transposed convolution on small matrices using PyTorch
# input
in_t = torch.tensor([[80, 111], [85, 105]], dtype=torch.float64)
# kernel
k_t = torch.tensor([[1, -1], [-2, 2]], dtype=torch.float64)
# reshape to 4-d: the input becomes (batch, channels, height, width),
# the kernel becomes (in_channels, out_channels, height, width)
in_t = in_t.reshape(1, 1, 2, 2)
k_t = k_t.reshape(1, 1, 2, 2)
# transposed convolutional layer
transpc = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                             kernel_size=2, stride=1,
                             padding=0, bias=False)
# initialize kernel
transpc.weight.data = k_t
# generate output and print to terminal
out_t = transpc(in_t)
print(out_t)
(mclp-3.12) PS F:\learn\mcl_pytorch\chap4> python cnn_learn.py
tensor([[[[  80.,   31., -111.],
          [ -75.,  -42.,  117.],
          [-170.,  -40.,  210.]]]], dtype=torch.float64,
       grad_fn=<ConvolutionBackward0>)

Like I said, probably not obvious what is happening. But thinking back to the example of the convolutional layer in the previous post may help. There, intermediate values were created and then added together. In this case the intermediate values have a slightly different dimensionality, but…

Input
 80 111
 85 105

Transposed Convolution (Stride 1)

Kernel
  1 -1
 -2  2

=

  80  -80    0
-160  160    0
   0    0    0

+

   0  111 -111
   0 -222  222
   0    0    0

+

   0    0    0
  85  -85    0
-170  170    0

+

   0    0    0
   0  105 -105
   0 -210  210

=

  80         -80 + 111              -111
 -160 + 85    160 - 222 - 85 + 105   222 - 105
 -170         170 - 210              210

And that last little bit of arithmetic gives us the tensor as calculated by PyTorch. So much fun!
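
In case the HTML stepping never materializes, here’s a little sketch of that same bookkeeping (my own loop, not anything from PyTorch’s internals): each input element stamps a scaled copy of the kernel at its own offset, and the stamps are summed.

import torch

in_m = torch.tensor([[80., 111.], [85., 105.]])
k_m = torch.tensor([[1., -1.], [-2., 2.]])
# stride 1 output is 3x3; accumulate each element's contribution
out = torch.zeros(3, 3)
for i in range(2):
    for j in range(2):
        out[i:i + 2, j:j + 2] += in_m[i, j] * k_m
print(out)
# tensor([[  80.,   31., -111.],
#         [ -75.,  -42.,  117.],
#         [-170.,  -40.,  210.]])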

I should also show you an equation for determining the output size given an input size, a kernel size, a stride and an input padding size. In the PyTorch documentation they provide formulae for both the height and the width of the output. I am just going to assume all the bits and pieces are square. I am also ignoring some of the other potential values that can be passed to the method for the transposed convolutional layer, e.g. output padding and dilation. Much more than I wish to deal with at the moment. So do check the documentation.

$$n_{out} = (n_{in} - 1)*s - (2*p) + (k-1) + 1$$

where: \(n_{out}\) is our output matrix dimension size,
\(n_{in}\) is the input matrix dimension size,
\(k\) is the kernel dimension size,
\(p\) is the padding size, and
\(s\) is the stride size

In the case of our example using a stride of 1 above that would be:

$$n_{out} = (2-1)*1 - (2*0) + (2-1) + 1 = 1 + 1 + 1 = 3$$

And for the first example with a stride of 2, we get:

$$n_{out} = (2-1)*2 - (2*0) + (2-1) + 1 = 2 + 1 + 1 = 4$$
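
And, since it is a one-liner, the formula as a small Python helper (just a direct translation of the equation above, still ignoring output padding and dilation):

def transp_conv_out_size(n_in, k, s, p=0):
    # output dimension size for a square transposed convolution
    return (n_in - 1) * s - 2 * p + (k - 1) + 1

print(transp_conv_out_size(2, 2, 1))  # 3, the stride 1 example
print(transp_conv_out_size(2, 2, 2))  # 4, the stride 2 example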

Done

And that’s it for this one. A bit of work, some bugs to help with the learning and, in the end, some fun. Don’t know how much I learned, but at least I have managed to instantiate and use convolutional layers, conventional and transposed.

Until next time, may your days be as satisfying.

Resources

I had thought about repeating the resources from the last post, but decided against it.