Project Description:
This project is about learning convolutional and Transformer neural networks for image classification. This project will implement, design and train different types of deep networks for scene recognition using PyTorch.
What excited me about this project:
In this part, we will implement the 2D convolution operation from scratch in PyTorch, using the fold / unfold functions and the matrix / tensor operations provided by PyTorch.
(1) Forward Propagation
(2) Backward Propagation
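As a rough illustration of the forward pass, here is a minimal sketch of 2D convolution written as unfold plus a matrix multiply. The function name conv2d_unfold and its argument layout are assumptions made for this example, not the project's actual code. Autograd supplies the gradients here; a hand-written backward pass (which would use fold to scatter the column gradients back onto the input) is not shown.

```python
import torch
import torch.nn.functional as F


def conv2d_unfold(x, weight, bias=None, stride=1, padding=0):
    """2D convolution as unfold + matmul (hypothetical helper, square stride/padding)."""
    n, _, h, w = x.shape
    c_out, _, kh, kw = weight.shape

    # Each column of `cols` is one flattened receptive field:
    # shape (N, C_in * kh * kw, L), where L is the number of output positions.
    cols = F.unfold(x, kernel_size=(kh, kw), padding=padding, stride=stride)

    # Flatten the filters to (C_out, C_in * kh * kw) and multiply.
    # Broadcasting over the batch gives (N, C_out, L).
    out = weight.reshape(c_out, -1) @ cols
    if bias is not None:
        out = out + bias.view(1, -1, 1)

    # Reshape the L output positions back into a 2D grid.
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    return out.view(n, c_out, h_out, w_out)
```

A quick sanity check is to compare the result against torch.nn.functional.conv2d on random tensors with the same stride and padding.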
In this part, we compare the loss and accuracy of the Simple Convolution Network, Custom Convolution Network, Designed Convolution Network, Pre-Trained ResNet18, and Simple ViT.
Built the Simple ViT from scratch.
In this part, we will look at attention maps and adversarial samples. They present two critical aspects of deep neural networks, interpretation and robustness, and thus help us gain insight into these networks.
(1) Saliency map: We first compute the gradient of the loss for the predicted label (the most confident prediction) with respect to the input. Next, we take the absolute values of the gradients and pick the maximum value across the three color channels. The magnitude of a pixel’s gradient indicates the importance of that pixel for the decision.
(2) Adversarial Samples: By minimizing the loss of an incorrect label and computing the gradient of the loss with respect to the input, we can create adversarial samples that confuse a model.
(3) Adversarial Training: The key idea is to generate adversarial samples and feed these samples into the network during training.
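The three steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the function names, the signed-gradient update, the step_size / num_steps / epsilon values, and the choice of "next class index" as the incorrect target label are all hypothetical. Clamping to a valid pixel range is omitted because it depends on how the inputs are normalized.

```python
import torch
import torch.nn.functional as F


def saliency_map(model, images):
    """Input-gradient saliency: returns (N, H, W) for images of shape (N, C, H, W)."""
    model.eval()
    images = images.clone().detach().requires_grad_(True)
    logits = model(images)
    pred = logits.argmax(dim=1)  # most confident prediction
    loss = F.cross_entropy(logits, pred)
    (grad,) = torch.autograd.grad(loss, images)
    # Absolute value, then maximum across the color channels.
    return grad.abs().max(dim=1).values


def targeted_attack(model, images, target, step_size=0.01, num_steps=10, epsilon=0.03):
    """Push images toward an incorrect `target` label (assumed hyperparameters)."""
    model.eval()
    images = images.detach()
    adv = images.clone()
    for _ in range(num_steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), target)
        (grad,) = torch.autograd.grad(loss, adv)
        # Step downhill on the loss of the incorrect label.
        adv = adv.detach() - step_size * grad.sign()
        # Keep the perturbation within epsilon of the original image.
        adv = (images + (adv - images).clamp(-epsilon, epsilon)).detach()
    return adv


def adversarial_training_step(model, optimizer, images, labels, num_classes):
    """One training step on adversarial samples; returns the loss as a float."""
    # Hypothetical target choice: the next class index, which is always incorrect.
    target = (labels + 1) % num_classes
    adv = targeted_attack(model, images, target)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(adv), labels)  # train against the true labels
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the adversarial batch is often mixed with the clean batch rather than replacing it; which variant works better is an empirical question for the dataset at hand.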