Sunday, May 1, 2016

Tues, May 3: Unsupervised Representation Learning

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015.


  1. In Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al. propose methods and techniques to extend the success of convolutional neural networks from supervised learning to unsupervised learning. One such technique is to cut max pooling layers from traditional deep networks and use only strided convolutional layers, so that the network learns its own spatial downsampling instead of relying on the fixed, hand-set downsampling of max pooling architectures. Other techniques include normalizing layer inputs over a training batch to have zero mean and unit variance, and using alternatives to a traditional fully connected final set of layers. Their networks, trained on unlabelled datasets, were able to extract features such that models trained on these features with labelled data offered competitive performance on datasets such as CIFAR-10, which contains 60 thousand labelled tiny images, and Street View House Numbers (SVHN), house number images captured from street views. Finally, the linearity of these deep features was explored to generate images according to equations such as man with glasses - man without glasses + woman without glasses = woman with glasses.

    Discussion: What exactly are the generative adversarial networks that the authors are improving on? In particular, how do they generate the novel images the authors present in their paper? Also, is the batch normalization the authors use during training a normalization of image pixel values or of network parameters?
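On the batch normalization question: as the paper describes it, batch normalization normalizes the input to each unit to have zero mean and unit variance, so it acts on activations inside the network rather than on the weights themselves. A minimal pure-Python sketch of that idea follows. The function name `batch_norm` and the toy numbers are illustrative assumptions, and the learned scale and shift of full batch normalization are left out.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each unit's activations across the mini-batch.

    `batch` is a list of examples, each a list of activations (one per unit).
    Simplification: the learned scale and shift parameters are omitted.
    """
    n = len(batch)
    num_units = len(batch[0])
    normalized = [[0.0] * num_units for _ in range(n)]
    for j in range(num_units):
        column = [example[j] for example in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        for i in range(n):
            normalized[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return normalized

# Toy mini-batch: three examples, two units on very different scales.
activations = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm(activations)
```

After the call, both units have roughly zero mean and unit variance over the batch, regardless of their original scale.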

  2. The paper proposes a new neural network architecture for unsupervised learning. In addition, the features it learns were used as input for supervised tasks. The proposed architecture allows a deeper model and is more stable than previous approaches. The four modifications are: replacing max pooling layers with strided convolutions, removing fully connected hidden layers (the trend exemplified by global average pooling), batch normalization in the inner layers, and ReLU activation in most generator layers (all except the output layer, which uses tanh), with leaky ReLU in the discriminator. The network was trained on three datasets (LSUN, Imagenet-1k and Faces). To evaluate the quality of the learned features, the trained model was used as a feature extractor for image classification on CIFAR-10 and compared against K-means based baselines. It does not reach state-of-the-art performance, but it demonstrates the effectiveness of the network.
    The paper presents two main experiments with the network. The first demonstrates that the network learns a hierarchy of features. This is shown when objects vanish from the generated images as thresholds are applied to the weights. The second experiment applies arithmetic operations to the internal (Z) representations of the generator.

    The paper describes a series of modifications that adapt a network into a Deep Convolutional GAN. Which architecture did they apply these modifications to?
    How is the training performed for the DCGAN?
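The first modification listed above, swapping max pooling for strided convolutions, can be illustrated in one dimension. Both operations below halve the length of the signal, but pooling uses a fixed rule while the strided convolution uses kernel weights that training could adjust. This is only a sketch, not the paper's code: the function names are hypothetical, and the hand-picked kernel stands in for learned weights.

```python
def max_pool1d(signal, size=2):
    """Fixed downsampling: keep the largest value in each non-overlapping window."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

def strided_conv1d(signal, kernel, stride=2):
    """Downsampling with a filter whose weights could be learned (no padding)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(0, len(signal) - k + 1, stride)
    ]

signal = [1.0, 3.0, 2.0, 8.0, 5.0, 4.0]
pooled = max_pool1d(signal)                    # [3.0, 8.0, 5.0]
averaged = strided_conv1d(signal, [0.5, 0.5])  # [2.0, 5.0, 4.5]
```

With the kernel `[0.5, 0.5]` the strided convolution behaves like average pooling; a different kernel gives a different downsampling, which is the flexibility the strided version adds.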

  3. This paper analyzes Convolutional Generative Adversarial Networks and their performance on various image understanding and representation tasks. The authors first show that a number of recent techniques from the CNN literature, including all-convolutional nets, global average pooling, batch normalization and a combination of ReLU, tanh and leaky rectified activation functions, together improve the stability and speed of training convolutional generative adversarial networks. The authors trained their network on 3 different datasets: the LSUN bedrooms dataset, the Imagenet-1k dataset and a custom dataset of faces scraped from the internet. The authors then tested the power of the features learned with their strategy by using the discriminator part of their DCGAN as a feature extractor for a linear SVM and testing this model on two further datasets: CIFAR-10 and SVHN digits. In both cases the network gets state-of-the-art or near state-of-the-art accuracy. The authors then demonstrated various properties of the generative network with further experiments. First, the authors chose random points in the latent space of the generator network and interpolated between them to show that the transitions between the corresponding images are smooth. The authors also visualized the activations of the discriminator network to show that activations correspond to object shapes. Finally, the authors showed that manipulations in the latent space can be used to control the generated samples in useful ways.

    What is the architecture of the discriminator network? Why is the 32x32 CIFAR-10 dataset useful compared to larger image datasets and why is unsupervised feature extraction better in this case than normal convolutional networks? How are the Z vectors for different concepts generated in the last section?
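The latent-space interpolation mentioned in this summary can be sketched without any network: pick two random Z vectors and walk along the straight line between them. In the paper each intermediate point would be passed through the trained generator to produce an image; the generator is omitted here. The helper name `interpolate`, the 100-dimensional uniform Z, and the number of steps are assumptions made for illustration.

```python
import random

def interpolate(z_start, z_end, steps):
    """Return `steps` points evenly spaced on the line from z_start to z_end.

    Requires steps >= 2 so that both endpoints are included.
    """
    points = []
    for s in range(steps):
        t = s / (steps - 1)
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points

rng = random.Random(0)
dim = 100  # assumed latent dimensionality
z_a = [rng.uniform(-1, 1) for _ in range(dim)]
z_b = [rng.uniform(-1, 1) for _ in range(dim)]

# Each point on the path would be fed to the generator to render one frame.
path = interpolate(z_a, z_b, steps=9)
```

If the generator has learned a smooth mapping, the images for neighbouring points on `path` should change gradually rather than jump.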

  4. The authors of Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks propose a method to perform unsupervised learning with CNNs. They make the following changes to GANs: they replace pooling layers with strided convolutions, remove fully connected hidden layers, use batch normalization while training, and use ReLU activation in the generator and leaky ReLU activation in the discriminator. The authors trained on three datasets (Large-scale Scene Understanding, Imagenet-1k, and Faces).

    The authors tested their GANs on CIFAR-10 and SVHN. While the GANs were not state-of-the-art on CIFAR-10, they achieve state-of-the-art performance on SVHN (and outperform a supervised CNN with the same architecture).

    The authors manipulate the generator representation in two ways: making the generator forget to draw certain objects (windows) and performing vector arithmetic on face samples. Both manipulations work well, producing images that still look plausible.

    What is a Generative Adversarial Network (GAN)?
    How does vector arithmetic work? Specifically, what are Z representations?
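On the vector arithmetic question: Z is the random input vector that the generator turns into an image. The paper reports that arithmetic on single samples was unstable, so the Z vectors of three exemplars per concept are averaged first, and the arithmetic is done on those averages. The sketch below uses made-up 3-dimensional Z vectors (dimensions loosely standing for "glasses", "gender" and noise) purely to show the arithmetic; real Z vectors have no such labelled axes, and the resulting vector would be fed to the generator, which is omitted here. All names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

# Toy Z vectors, three exemplars per concept (illustrative values only).
man_with_glasses = [[1.0, 1.0, 0.3], [1.0, 1.0, -0.3], [1.0, 1.0, 0.0]]
man_without_glasses = [[0.0, 1.0, 0.1], [0.0, 1.0, -0.1], [0.0, 1.0, 0.0]]
woman_without_glasses = [[0.0, -1.0, 0.2], [0.0, -1.0, -0.2], [0.0, -1.0, 0.0]]

# man with glasses - man without glasses + woman without glasses
z_result = add(
    sub(mean_vector(man_with_glasses), mean_vector(man_without_glasses)),
    mean_vector(woman_without_glasses),
)
```

In this toy setup `z_result` lands near `[1, -1, 0]`, the point that would correspond to "woman with glasses"; averaging the exemplars cancels the noise component.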