CNNs take biological inspiration from the visual cortex, and the way they learn mirrors the way we did. We didn't always know what a cat or dog or bird was; as we grew older, our parents and teachers showed us different pictures and images and gave us a corresponding label. Now, without even thinking twice, we're able to quickly and seamlessly identify the environment we are in as well as the objects that surround us. CNNs are not limited to image recognition, however. They can also perform more banal (and more profitable) business-oriented tasks, such as optical character recognition (OCR), digitizing text to make natural-language processing possible on analog and hand-written documents, where the images are symbols to be transcribed.

Convolutional networks treat images as three-dimensional objects, rather than flat canvases to be measured only by width and height, and they are designed to reduce the dimensionality of images in a variety of ways. Rather than focus on one pixel at a time, a convolutional net takes in square patches of pixels and passes them through a filter. Filters can be hard to visualize, so let's approach them by analogy: imagine two matrices, one small and one large, with the small one sliding over the large one and being compared against it patch by patch. You could, for example, look for 96 different patterns in the pixels, one filter per pattern. When nothing in a region matches a filter's pattern, the filter stays quiet: the top right value in our activation map will be 0 because there wasn't anything in the input volume that caused the filter to activate (or more simply said, there wasn't a curve in that region of the original image). Remember, this is just for one filter.
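To make that concrete, here is a minimal NumPy sketch of a bank of filters responding to one patch of pixels. The 96 is the filter count mentioned above; the 11x11 patch size is an illustrative assumption, not a value from the text:

```python
import numpy as np

# A bank of 96 filters, each looking for a different pattern
# in an 11x11 patch of a 3-channel (RGB) image.
# (The 11x11 patch size is an illustrative assumption.)
filters = np.random.randn(96, 11, 11, 3)

patch = np.random.rand(11, 11, 3)   # one square patch of pixels

# Each filter produces one number: high if the patch matches its
# pattern, low if it doesn't.
responses = np.array([np.sum(f * patch) for f in filters])
print(responses.shape)  # (96,) -- one response per filter
```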

As images move through a convolutional network, we will describe them in terms of input and output volumes, expressing them mathematically as matrices of multiple dimensions in this form: 30x30x3. The first two numbers are the width and height; the third is the depth, and those depth layers are referred to as channels (red, green and blue, for a standard color image).
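As a quick sketch of what such a volume looks like in code (NumPy, with the 30x30x3 shape from above):

```python
import numpy as np

# An input volume: 30 pixels wide, 30 pixels high, 3 channels deep.
input_volume = np.zeros((30, 30, 3))

print(input_volume.shape)  # (30, 30, 3) -- width, height, depth
print(input_volume.ndim)   # 3: a volume, not a flat canvas
```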

A CNN works by extracting features from images. A pooling layer is usually added to speed up computation and to make some of the detected features more robust. Training happens one example at a time: once you finish the parameter update on the last training example, hopefully the network is trained well enough that the weights of the layers are tuned correctly.
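As a rough sketch of what a pooling layer computes, here is a plain 2x2 max-pool in NumPy (max-pooling is one common choice, not the only kind of pooling, and the function name is mine):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by keeping the max of each 2x2 block."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fm))
# [[ 5.  7.]
#  [13. 15.]]  -- a 4x4 map shrinks to 2x2, keeping the strongest responses
```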

How do the filters in each layer know what values to have? How do the filters in the first conv layer know to look for edges and curves? This is the one aspect of neural networks that I purposely haven't mentioned yet, and it is probably the most important part: training. The more training data you can give to a network, the more training iterations you can make, the more weight updates you can make, and the better tuned the network is when it goes to production. When we go through another conv layer, the output of the first conv layer becomes the input of the second. Now that we can detect these high level features, the icing on the cake is attaching a fully connected layer to the end of the network. Finally, to see whether or not our CNN works, we have a different set of images and labels (can't double dip between training and test!); if the network's predictions agree with those held-out labels, it has learned something general, and if they don't, its accuracy will be low. I've also glossed over the nonlinear (ReLU) and pooling layers interspersed between the conv layers. I'd strongly encourage those interested to read up on them and understand their function and effects, but in a general sense, they provide nonlinearities and preservation of dimension that help to improve the robustness of the network and control overfitting.

When I was new to the field, I tried to search around to understand CNNs, and it turned out I wasted a lot of time on subjects I couldn't really understand. So let's talk about what this convolution is actually doing from a high level. You can picture it with an analogy: imagine a tall, narrow bell curve standing in the middle of a graph, and near it a second bell curve that is shorter and wider, drifting slowly from the left side of the graph to the right. The amount of overlap between the two curves at each step along the way traces out their convolution. In the diagram below, we've relabeled the input image, the kernels and the output activation maps to make sure we're clear. What we just described is a convolution, and filter stride, the size of the step the filter takes between stops, is one way to reduce dimensionality.

Here's a 2 x 3 x 2 tensor presented flatly (picture the bottom element of each 2-element array extending along the z-axis to intuitively grasp why it's called a 3-dimensional array). In other words, tensors are formed by arrays nested within arrays, and that nesting can go on infinitely, accounting for an arbitrary number of dimensions far greater than what we can visualize spatially. A 4-D tensor would simply replace each of these scalars with an array nested one level deeper.
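Here is that nesting written out as a NumPy sketch; the values are arbitrary placeholders:

```python
import numpy as np

# A 2 x 3 x 2 tensor: arrays nested within arrays, three levels deep.
t3 = np.array([[[1, 2], [3, 4], [5, 6]],
               [[7, 8], [9, 10], [11, 12]]])
print(t3.shape)  # (2, 3, 2)

# A 4-D tensor replaces each of those scalars with an array
# nested one level deeper.
t4 = np.array([[[[s, -s] for s in pair] for pair in plane] for plane in t3])
print(t4.shape)  # (2, 3, 2, 2)
```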

You can think of convolution as a fancy kind of multiplication used in signal processing. Now picture that we start in the upper lefthand corner of the underlying image, and we move the filter across the image step by step until it reaches the upper righthand corner; then it drops down one row and sweeps left to right again, until it has visited every position in the image.
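A minimal sketch of that sliding motion (single channel, NumPy; the function name and its stride parameter are mine, not anything standard):

```python
import numpy as np

def slide_filter(image, kernel, stride=1):
    """Slide `kernel` over `image`, producing one activation per stop."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):            # top to bottom...
        for j in range(out_w):        # ...left to right, like reading
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            activation_map[i, j] = np.sum(region * kernel)
    return activation_map

image = np.random.rand(7, 7)
kernel = np.random.rand(3, 3)
print(slide_filter(image, kernel).shape)            # (5, 5)
print(slide_filter(image, kernel, stride=2).shape)  # (3, 3)
```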
What we want the computer to do is to be able to differentiate between all the images it's given and figure out the unique features that make a dog a dog or that make a cat a cat. Things not discussed in depth in this post include the nonlinear and pooling layers, as well as hyperparameters of the network such as filter sizes, stride, and padding.
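One standard formula tying those hyperparameters together is worth keeping on hand, since it governs the size of every activation map. With input width W, filter size F, zero-padding P and stride S, the output width is (W - F + 2P)/S + 1:

```python
def output_width(W, F, P, S):
    """Spatial size of a conv layer's output: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

print(output_width(W=32, F=5, P=0, S=1))  # 28 -- the map shrinks
print(output_width(W=32, F=5, P=2, S=1))  # 32 -- padding preserves size
print(output_width(W=32, F=5, P=0, S=2))  # 14 -- stride reduces dimensionality
```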

Convolutional networks deal in 4-D tensors like the one shown earlier (notice the nested array). The width and height of an image are easily understood; depth, and the stacking of many images or filters, account for the remaining dimensions. While RBMs learn to reconstruct and identify the features of each image as a whole, convolutional nets learn images in pieces that we call feature maps, and the filters in the first conv layer detect low level features such as edges and curves. So let's get into the most important layer of all and see exactly what a conv layer computes.
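To see what detecting an edge might look like, here is a hand-built vertical-edge filter slid over a toy image (the kernel values are a classic Sobel edge detector, used here purely as an illustration):

```python
import numpy as np

# A classic vertical-edge detector (Sobel kernel): it responds where
# brightness changes from left to right, and stays silent elsewhere.
kernel = np.array([[-1., 0., 1.],
                   [-2., 0., 2.],
                   [-1., 0., 1.]])

# A toy image: dark on the left, bright on the right.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# Slide the kernel over every 3x3 region and sum the products.
out = np.array([[np.sum(image[i:i+3, j:j+3] * kernel)
                 for j in range(3)] for i in range(3)])
print(out)  # zeros over the flat region, strong responses at the edge
```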
When we have this filter at the top left corner of the input volume, it computes element-wise multiplications between the filter values and the pixel values at that region, and sums the results into a single number.
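Concretely, that computation at the top left corner is just an element-wise multiplication followed by a sum; a tiny sketch with made-up values:

```python
import numpy as np

# A 2x2 filter and the 2x2 region it currently sits on, at the
# top left corner of the input volume (made-up values).
filt = np.array([[0., 1.],
                 [1., 0.]])
top_left_region = np.array([[0., 30.],
                            [30., 0.]])

# Multiply filter values with the pixel values underneath them,
# then sum everything into a single activation value.
activation = np.sum(filt * top_left_region)
print(activation)  # 60.0
```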