Compressing Neural Networks with Distillation, Pruning and Quantization
The complete research thesis can be read in the following PDF:
Visualization and Listening
To generate samples, we draw a latent vector of size 131 and pass it to both the teacher and the student. For each model, we use the same two latent vectors and obtain two generations per network. Because the same latent vector is used, each student generation should sound like its corresponding teacher generation.
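This sampling setup can be sketched as follows. The toy linear generators below are placeholders for the real teacher and student networks, whose architectures are not shown here; only the latent size of 131 and the shared latent vector come from the text.

```python
import numpy as np

LATENT_DIM = 131  # latent size used in the thesis

def make_toy_generator(out_dim, seed):
    """Placeholder generator: a fixed random linear map with a tanh output.
    Stands in for the real teacher/student networks (assumption)."""
    w = np.random.default_rng(seed).normal(size=(LATENT_DIM, out_dim))
    return lambda z: np.tanh(z @ w)

teacher = make_toy_generator(out_dim=16000, seed=1)  # hypothetical teacher
student = make_toy_generator(out_dim=16000, seed=2)  # hypothetical student

# One shared latent vector: both networks generate from the same z,
# so each student output can be compared to its teacher counterpart.
z = np.random.default_rng(0).normal(size=LATENT_DIM)
teacher_audio = teacher(z)
student_audio = student(z)
```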
Model 1: Only MSE
Teacher
Student reconstruction
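Model 1's objective can be sketched as a plain mean squared error between the teacher's and the student's generations; any weighting or reduction choices in the thesis are not reproduced here.

```python
import numpy as np

def mse_distillation_loss(teacher_out, student_out):
    """Plain MSE between the teacher's and the student's generations."""
    return float(np.mean((teacher_out - student_out) ** 2))
```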
Model 2 : Add feature loss
Teacher
Student reconstruction
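Model 2 adds a feature loss on top of the MSE. A common formulation, assumed here, matches the teacher's and student's intermediate activations layer by layer:

```python
import numpy as np

def feature_loss(teacher_feats, student_feats):
    """Sum of per-layer MSEs between intermediate activations
    (a generic feature-matching formulation, assumed here)."""
    return float(sum(np.mean((t - s) ** 2)
                     for t, s in zip(teacher_feats, student_feats)))
```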
Model 3: Feature Loss + Classif Loss
A. Feature Loss + Classif Loss with Fake Data
Teacher
Student reconstruction
B. Feature Loss + Classif Loss with Real Data
Teacher
Student reconstruction
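The Classif loss in Model 3 can be sketched as a binary cross-entropy on a classifier's logits; whether the classifier is fed generated data (variant A) or real data (variant B) is the difference between the two variants above. The exact classifier and loss form are assumptions here:

```python
import numpy as np

def classif_loss(logits, labels):
    """Binary cross-entropy between classifier logits and target labels.
    A generic stand-in for the thesis's classification loss (assumption)."""
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12                        # numerical safety in the logs
    return float(-np.mean(labels * np.log(p + eps)
                          + (1.0 - labels) * np.log(1.0 - p + eps)))
```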
Model 4: Feature Loss + Classif Loss + modified LeakyReLU in the last layer
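A minimal sketch of a LeakyReLU used as the last layer's activation; the 0.2 negative slope is an assumed value, not taken from the thesis:

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    """LeakyReLU: identity for x >= 0, a small linear slope for x < 0.
    The 0.2 slope is an assumed default, not the thesis's value."""
    return np.where(x >= 0, x, negative_slope * x)

# Hypothetical final pre-activations of the student's last layer.
pre_activation = np.array([-1.0, 0.0, 0.5])
out = leaky_relu(pre_activation)
```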