Tuesday, March 22, 2016

Thurs, Mar 24: Learning Visual Biases from Human Imagination

Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015.


  1. The paper presents a new technique for learning visual biases and applies it in a classifier. The principle is that, with a small dataset, human biases can boost performance because they act as shortcuts for the classifier. No real images were used to learn these visual biases. Instead, Mechanical Turk workers were asked to classify images generated from white noise (drawn from a zero-mean Gaussian distribution). More precisely, a feature vector is sampled from the distribution and an inversion function maps it back to an image (HOG and CNN features were used). The resulting linear classifier was evaluated on the PASCAL 2011 dataset. It had problems with shapes (bottles and people) and colors (hydrants and anything red), for example, but since it was learned from human biases, this behavior was expected.
    In the second part, these biases were integrated into an SVM image classifier. A constraint that defines a hypercone around the human bias template was added to the SVM optimization problem. This cone limits the maximum angle by which the SVM hyperplane can deviate from the biased template. Although classifiers trained on massive datasets outperform this small-dataset technique, the classifier's AP improved by 10% in some cases.
    1 – According to the paper, the technique contributes most to classifiers trained on a small dataset. How could these "biases" be used in a classifier that uses more information?
    2 – They sampled white noise in feature space. Why does this give a better chance of producing “meaningful images” once the inverse transformation is applied to obtain images?
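
    The cone constraint described in this summary can be written down roughly as follows. This is only a sketch of a soft-margin SVM with an added angle constraint; the symbols are assumptions chosen here for illustration ($w$, $b$ for the hyperplane, $\xi_i$ for slack variables, $c$ for the human bias template, $\theta$ for the maximum allowed angle, $\lambda$ for the regularization weight) and may not match the paper's exact notation.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{\lambda}{2}\, w^\top w + \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \\
& \frac{w^\top c}{\lVert w \rVert \, \lVert c \rVert} \ge \cos\theta
\end{aligned}
\]
```

    The last line keeps the angle between $w$ and $c$ at or below $\theta$: a small $\theta$ forces the classifier to stay close to the template, while a larger $\theta$ loosens the constraint toward an ordinary SVM.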

  2. This paper explores a method for identifying and exploiting biases in the human visual system for object classification. The authors aim to use human trials to build a model of the decision function that humans use to differentiate one class of objects from another. They do this by randomly sampling from the feature space of both HOG features and CNNs trained on ImageNet, then using the HOGgles algorithm to create pictures that match each sampled representation. These images were then given to human test subjects, who were asked whether the image looked like a given object type. A model of the human visual classifier was then constructed from the difference between the means of the two human-selected classes. The authors also visualized the mean of the human selections for several classes and showed that they roughly approximated the appearance of the target classes. Finally, the authors explored using the learned human bias model for image classification with two different approaches. In the first, they simply used the human model as a classifier. In the second, they trained an SVM classifier but constrained the SVM decision boundary to be close to the human model. Their trials showed both that the estimated human model does in fact improve over random guessing and that incorporating human biases in classification could be helpful in scenarios where few positive examples are available.

    What would be the effect of using a different CNN, different layers, or a different distribution for sampling in feature space? It might be interesting to see how a full classification CNN classifies the same set of random images and to compare it to humans. Is there a way that humans could guide the search process for random positive examples so that fewer samples need to be used?
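
    The difference-of-means step described in this summary can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the "human" is replaced by a hypothetical noisy linear rule (`hidden`), and the dimensions, sample count, and names such as `estimate_template` are assumptions made for illustration.

```python
import numpy as np


def estimate_template(features, responses):
    """Mean of accepted noise samples minus mean of rejected ones."""
    features = np.asarray(features, dtype=float)
    responses = np.asarray(responses, dtype=bool)
    return features[responses].mean(axis=0) - features[~responses].mean(axis=0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
dim, n_samples = 20, 5000

# Hypothetical stand-in for the template a person "has in mind".
hidden = rng.normal(size=dim)

# White noise sampled directly in feature space (zero-mean Gaussian).
noise = rng.normal(size=(n_samples, dim))

# Simulated yes/no answers: a noisy linear decision, NOT real human data.
says_yes = noise @ hidden + rng.normal(scale=2.0, size=n_samples) > 0

template = estimate_template(noise, says_yes)
similarity = cosine(template, hidden)
print(f"cosine similarity to hidden template: {similarity:.3f}")
```

    In this toy setting the estimated template points in nearly the same direction as the hidden one, which is the intuition behind averaging the noise samples that people accept and reject. Real responses would be far noisier and would need many more trials.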

  3. The authors attempt to incorporate biases of the human visual system, of the kind identified by contemporary psychology, into computer vision classifiers. Random sampling was performed in HOG and CNN feature space, creating visually rich random images. These images were then shown to users on Mechanical Turk, who were asked to identify the images that most resembled a particular object category. By taking the mean of the resulting images, the authors were able to generate images that resembled their respective object categories. Then, using a human model that captures these biases, the authors trained an SVM that roughly learns them, and found that this human model improved classification accuracy.

    Would this work with a much deeper model and/or on more difficult tasks, such as segmentation?

    Are there additional biases in the human visual system that are not encoded by this model? How could these be better captured?

  4. In Learning Visual Biases from Human Imagination, Vondrick et al. estimate human imagination templates for various image categories such as car, bottle, and fire hydrant. They do this by generating random points in both HOG and CNN feature space, inverting these points into RGB image space, and showing the resulting images to human workers. The workers are asked whether they see a category in the noise image, and a human bias image is obtained by averaging the images that workers labeled as positive. The authors also introduce an SVM that is constrained to lie within a given angle of this discovered human bias template, and train SVM classifiers for these object categories on the Caltech 101 and PASCAL VOC datasets. They found a modest AP performance boost for classifiers constrained by human bias templates.

    Discussion: What does #pos=x mean in Figure 10?