Convolution / ViT / Adversarial samples

Project Description:

This project explores convolutional and Transformer neural networks for image classification. It designs, implements, and trains several types of deep networks for scene recognition using PyTorch.

Identifying important image regions for classification, using a convolution written from scratch and a Transformer
Generating adversarial samples to confuse the model
Training models to defend against those adversarial samples

What excited me about this project:

1. Understand Convolution:

In this part, we implement the 2D convolution operation from scratch in PyTorch, using the fold / unfold functions and the matrix / tensor operations provided by PyTorch.

(1) Forward Propagation

(2) Backward Propagation
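
As an illustration of how the forward and backward passes can be built from unfold / fold, here is a minimal sketch. It is not the project's actual code: the class name `Conv2dUnfold`, the argument order, and the restriction to a single integer stride and padding are assumptions made for this example. The forward pass turns every receptive field into a column and applies one matrix multiply; the backward pass uses fold to sum the patch gradients back onto the input.

```python
import torch
import torch.nn.functional as F


class Conv2dUnfold(torch.autograd.Function):
    """Hypothetical from-scratch 2D convolution built on unfold / fold."""

    @staticmethod
    def forward(ctx, x, weight, bias, stride, padding):
        n, _, h, w = x.shape
        c_out, _, kh, kw = weight.shape
        # cols: (N, C_in*kh*kw, L), one flattened receptive field per column
        cols = F.unfold(x, kernel_size=(kh, kw), stride=stride, padding=padding)
        # (C_out, K) @ (N, K, L) -> (N, C_out, L)
        out = weight.reshape(c_out, -1) @ cols + bias.view(1, -1, 1)
        h_out = (h + 2 * padding - kh) // stride + 1
        w_out = (w + 2 * padding - kw) // stride + 1
        ctx.save_for_backward(cols, weight)
        ctx.conf = (tuple(x.shape[2:]), stride, padding)
        return out.reshape(n, c_out, h_out, w_out)

    @staticmethod
    def backward(ctx, grad_out):
        cols, weight = ctx.saved_tensors
        input_hw, stride, padding = ctx.conf
        n, c_out = grad_out.shape[:2]
        kh, kw = weight.shape[2:]
        g = grad_out.reshape(n, c_out, -1)  # (N, C_out, L)
        # Weight gradient: sum over batch and positions of g * cols
        grad_w = torch.einsum("nol,nkl->ok", g, cols).reshape(weight.shape)
        # Bias gradient: sum over batch and positions
        grad_b = g.sum(dim=(0, 2))
        # Gradient w.r.t. the columns, then fold sums overlapping patches
        grad_cols = weight.reshape(c_out, -1).t() @ g  # (N, K, L)
        grad_x = F.fold(grad_cols, output_size=input_hw, kernel_size=(kh, kw),
                        stride=stride, padding=padding)
        # No gradients for the integer stride and padding arguments
        return grad_x, grad_w, grad_b, None, None
```

A quick way to check such an implementation is to compare its output and gradients against `torch.nn.functional.conv2d` on random double-precision tensors.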

2. Design and Train a Deep Neural Network

In this part, we compare the loss and accuracy of a Simple Convolution Network, a Custom Convolution Network, a Designed Convolution Network, a pre-trained ResNet18, and a Simple ViT.

The Simple ViT was built from scratch.
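
For readers unfamiliar with the ViT layout, the following is a rough sketch of the overall structure: patch embedding, a class token, position embeddings, a Transformer encoder, and a classification head. It is not the project's model. The name `TinyViT` and all sizes are hypothetical, and unlike a from-scratch build it leans on PyTorch's built-in `nn.TransformerEncoderLayer` to stay short.

```python
import torch
import torch.nn as nn


class TinyViT(nn.Module):
    """Hypothetical minimal ViT; all hyperparameters are illustrative."""

    def __init__(self, image_size=32, patch_size=8, num_classes=10,
                 dim=64, depth=2, heads=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution cuts the image into non-overlapping patches
        # and applies the same linear projection to each of them.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # (N, 3, H, W) -> (N, num_patches, dim)
        x = self.patch_embed(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        # Prepend the class token and add learned position embeddings
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        # Classify from the class token only
        return self.head(self.norm(x[:, 0]))
```

With the defaults above, `TinyViT()(torch.randn(2, 3, 32, 32))` produces one row of 10 class logits per image.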

3. Attention and Adversarial Samples

In this part, we look at attention maps and adversarial samples. They represent two critical aspects of deep neural networks, interpretation and robustness, and thus help us gain insight into these networks.

(1) Saliency map: We first compute the gradient of the loss with respect to the input, using the predicted label (the most confident prediction) as the target. Next, we take the absolute values of the gradients and pick the maximum value across the three color channels. The magnitude of a pixel’s gradient indicates the importance of that pixel for the decision.
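
The saliency computation described above can be sketched roughly as follows. The function name `saliency_map` and the choice of cross-entropy as the loss are assumptions for illustration; `model` stands for any classifier that maps a batch of images to class logits.

```python
import torch
import torch.nn.functional as F


def saliency_map(model, images):
    """Return a (N, H, W) map of input-gradient magnitudes."""
    model.eval()
    images = images.clone().detach().requires_grad_(True)
    logits = model(images)
    # Use the most confident prediction as the label
    pred = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pred)
    # Gradient of the loss with respect to the input pixels
    (grad,) = torch.autograd.grad(loss, images)
    # Absolute value, then maximum across the color channels
    return grad.abs().max(dim=1).values
```

Larger values in the returned map mark pixels whose change would affect the loss most.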

(2) Adversarial Samples: By minimizing the loss of an incorrect label and computing the gradient of that loss with respect to the input, we can create adversarial samples that confuse a model.

Build a proxy label in which the least-confident class has value 1 and all others 0.
Backpropagate to get the gradient of the loss on the proxy label with respect to the input.
Do gradient descent on the input image with a fixed step size, making sure the change to the input image does not exceed epsilon.
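
The three steps above can be sketched as follows. This is one possible reading, not the project's exact code: the function name `least_likely_attack`, the default values, and the use of the gradient's sign for each step are assumptions, and the epsilon bound is enforced per pixel.

```python
import torch
import torch.nn.functional as F


def least_likely_attack(model, images, epsilon=0.1, step_size=0.01,
                        num_steps=10):
    """Push images toward the class the model finds least likely."""
    model.eval()
    with torch.no_grad():
        # Proxy label: the class with the lowest confidence
        target = model(images).argmin(dim=1)
    adv = images.clone().detach()
    for _ in range(num_steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), target)
        # Gradient of the proxy-label loss with respect to the input
        (grad,) = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            # Descent step (sign of the gradient is an assumption here)
            adv = adv - step_size * grad.sign()
            # Keep the perturbation within epsilon of the original image
            adv = images + (adv - images).clamp(-epsilon, epsilon)
    return adv.detach()
```

Cross-entropy against an integer class index is equivalent to using the one-hot proxy label described in the first step.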

(3) Adversarial Training: The key idea is to generate adversarial samples and feed these samples into the network during training.
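
A single adversarial training step might look like the sketch below. The helper names `one_step_attack` and `adversarial_training_step` are hypothetical, and the one-step attack is a simplified stand-in for whatever attack is used in practice. Whether to train on adversarial samples only, as here, or to mix in clean images is a design choice the description leaves open.

```python
import torch
import torch.nn.functional as F


def one_step_attack(model, images, epsilon=0.03):
    """Simplified single-step attack toward the least-likely class."""
    model.eval()
    images = images.clone().detach().requires_grad_(True)
    logits = model(images)
    target = logits.argmin(dim=1)
    loss = F.cross_entropy(logits, target)
    (grad,) = torch.autograd.grad(loss, images)
    return (images - epsilon * grad.sign()).detach()


def adversarial_training_step(model, optimizer, images, labels,
                              attack=one_step_attack):
    """Generate adversarial samples, then train on them with true labels."""
    adv_images = attack(model, images)
    model.train()  # the attack switched the model to eval mode
    optimizer.zero_grad()
    loss = F.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop this step would be called once per mini-batch, so the adversarial samples are always generated against the current weights.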