Compressing Neural Networks with Distillation, Pruning and Quantization
The complete research thesis can be read in the following PDF:
Visualization and Listening
To generate samples, we draw a latent vector of size 131 and pass it to both the teacher and the student. For each model, we use the same two latent vectors and obtain two generations per network. Because the same latent vector is used, each student generation should sound like its corresponding teacher generation.
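This sampling setup can be sketched as follows. The toy linear generators below are placeholders for the real teacher and student networks, whose architectures are not shown here; only the latent size of 131 and the shared latent vector come from the text.

```python
import numpy as np

LATENT_DIM = 131  # latent size used in the thesis

def make_toy_generator(out_dim, seed):
    """Placeholder generator: a fixed random linear map with a tanh output.
    Stands in for the real teacher/student networks (assumption)."""
    w = np.random.default_rng(seed).normal(size=(LATENT_DIM, out_dim))
    return lambda z: np.tanh(z @ w)

teacher = make_toy_generator(out_dim=16000, seed=1)  # hypothetical teacher
student = make_toy_generator(out_dim=16000, seed=2)  # hypothetical student

# One shared latent vector: both networks generate from the same z,
# so each student output can be compared to its teacher counterpart.
z = np.random.default_rng(0).normal(size=LATENT_DIM)
teacher_audio = teacher(z)
student_audio = student(z)
```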
Model 1: Only MSE
Teacher
Student reconstruction
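Model 1's objective can be sketched as a plain mean squared error between the teacher's and the student's generations; any weighting or reduction choices in the thesis are not reproduced here.

```python
import numpy as np

def mse_distillation_loss(teacher_out, student_out):
    """Plain MSE between the teacher's and the student's generations."""
    return float(np.mean((teacher_out - student_out) ** 2))
```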
Model 2 : Add feature loss
Teacher
Student reconstruction
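Model 2 adds a feature loss on top of the MSE. A common formulation, assumed here, matches the teacher's and student's intermediate activations layer by layer:

```python
import numpy as np

def feature_loss(teacher_feats, student_feats):
    """Sum of per-layer MSEs between intermediate activations
    (a generic feature-matching formulation, assumed here)."""
    return float(sum(np.mean((t - s) ** 2)
                     for t, s in zip(teacher_feats, student_feats)))
```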
Model 3: Feature Loss + Classif Loss
A. Feature Loss + Classif Loss with Fake Data
Teacher
Student reconstruction
B. Feature Loss + Classif Loss with Real Data
Teacher
Student reconstruction
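The Classif loss in Model 3 can be sketched as a binary cross-entropy on a classifier's logits; whether the classifier is fed generated data (variant A) or real data (variant B) is the difference between the two variants above. The exact classifier and loss form are assumptions here:

```python
import numpy as np

def classif_loss(logits, labels):
    """Binary cross-entropy between classifier logits and target labels.
    A generic stand-in for the thesis's classification loss (assumption)."""
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12                        # numerical safety in the logs
    return float(-np.mean(labels * np.log(p + eps)
                          + (1.0 - labels) * np.log(1.0 - p + eps)))
```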
Model 4: Feature Loss + Classif Loss + modified LeakyReLU in the last layer
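A minimal sketch of a LeakyReLU used as the last layer's activation; the 0.2 negative slope is an assumed value, not taken from the thesis:

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    """LeakyReLU: identity for x >= 0, a small linear slope for x < 0.
    The 0.2 slope is an assumed default, not the thesis's value."""
    return np.where(x >= 0, x, negative_slope * x)

# Hypothetical final pre-activations of the student's last layer.
pre_activation = np.array([-1.0, 0.0, 0.5])
out = leaky_relu(pre_activation)
```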