Paper discussion blog for CS 2951t at Brown University. Instructor: Genevieve Patterson
The authors of this paper explore object recognition in sketches. They collect a large dataset of sketches using AMT, covering object categories chosen to be exhaustive, recognizable, and specific: 20,000 sketches across 250 object categories, manually filtered to remove incorrect sketches. They conduct a second AMT task to estimate human performance on sketch recognition and find that humans achieve 73.1% accuracy (without significant learning/improvement over time).

The authors build a visual vocabulary using k-means, representing each sketch as a histogram of visual words. Most categories have either one or two clusters. Qualitatively, the clusters look reasonable - it's interesting that orientation seems to be iconic, with the exception of wristwatches. The SVM model performed better than the kNN model.

Discussion: I wonder how closely sketches would relate to verbal descriptions or captions for an object - in other words, is this iconic representation domain-independent? For example, if asked to describe a bunny, I would probably say, "fluffy tail, two big ears," so those would be the features I would focus on.
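The bag-of-visual-words idea described above (cluster local descriptors with k-means, then describe each sketch as a histogram over the resulting "words") can be illustrated with a toy sketch in pure Python. This is not the paper's implementation - the real pipeline uses SIFT-like local descriptors and a much larger vocabulary - and the function names and 2-D toy descriptors here are my own.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns k centroids fit to the points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for c in range(k):
            if clusters[c]:  # recompute centroid as the cluster mean
                centroids[c] = tuple(sum(xs) / len(xs) for xs in zip(*clusters[c]))
    return centroids

def bag_of_words(descriptors, centroids):
    """Histogram of nearest-centroid ('visual word') assignments, normalized."""
    hist = [0] * len(centroids)
    for d in descriptors:
        i = min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(d, centroids[c])))
        hist[i] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

# Toy example: 2-D stand-ins for local descriptors extracted from a sketch.
descriptors = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.95, 0.85), (0.5, 0.5)]
vocab = kmeans(descriptors, k=2)
print(bag_of_words(descriptors, vocab))
```

Normalizing the histogram lets sketches with different numbers of sampled descriptors be compared directly, which is why the paper can feed these histograms straight into a classifier.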
The paper presents a new sketch dataset comprising 20,000 unique images across 250 object categories. It also defines a classifier for sketches and compares it with human performance. Working with sketches is hard because of scale variation and generally sparse representation. To create the dataset, AMT workers were asked to sketch objects from predefined categories, which were drawn from other datasets such as LabelMe and Caltech-256. For the human evaluation, AMT was used again, and as expected the errors were mainly between semantically similar categories.

To allow sketch classification, a representation similar to SIFT was designed: it computes local histograms of gradient orientation, but restricts the feature to orientation alone. A typical representation samples features from fewer than 1,000 pixels in the image. Two approaches were used for the automatic classification task, kNN and SVM. In the best configuration in terms of accuracy (determined experimentally), the SVM clearly outperforms kNN; however, its performance is still almost 20% below a human being's. The test application retrieves images as the user draws the sketch. No quantitative data were collected for this task, but according to the paper's authors, the testers gave positive feedback.

Discussion: Do you have any ideas about how the feedback for the interactive sketch recognition application could be evaluated? For the human test experiment, a hierarchical structure was built, at least to display the images. In addition, the errors of the classifiers usually did not match the pattern of mistakes made by humans (hierarchical mistakes). Do you think that including this type of information would improve the classifier?
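The "restrict the feature to orientation alone" idea can be sketched with a minimal orientation histogram over stroke segments. This is an illustrative toy, not the paper's descriptor (which works on local image patches): it folds each segment's direction into an orientation in [0, pi), bins it, and weights by segment length. The function name and the 4-bin choice are my own assumptions.

```python
import math

def orientation_histogram(segments, bins=4):
    """Bin stroke-segment orientations into a normalized histogram.

    Orientation (not direction) is used, so angles are folded into [0, pi):
    a stroke drawn left-to-right and one drawn right-to-left land in the same bin.
    """
    hist = [0.0] * bins
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip degenerate zero-length segments
        angle = math.atan2(dy, dx) % math.pi  # fold direction into orientation
        b = min(int(angle / math.pi * bins), bins - 1)
        hist[b] += length  # weight by segment length
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# A horizontal and a vertical stroke segment of equal length.
segs = [((0, 0), (1, 0)), ((0, 0), (0, 1))]
print(orientation_histogram(segs))  # -> [0.5, 0.0, 0.5, 0.0]
```

Length weighting means long dominant strokes shape the histogram more than tiny detail strokes, which matches the intuition that a sketch's overall structure carries most of its identity.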
The authors carry out the first comprehensive computer vision study of human-drawn sketches. Using Mechanical Turk, they first created a dataset consisting of 20,000 unique sketches across 250 object categories and evaluated how accurately humans could identify the object represented in each sketch (73.1% accuracy on average). The authors then set out to encode feature vectors from the sketches and train an SVM and a kNN classifier to perform object classification on the sketch dataset. They used k-means clustering to build a visual vocabulary, transforming each sketch into a histogram of words within this vocabulary. In the end, the SVM outperformed the kNN; however, both were more than 20% below human accuracy, leaving plenty to be addressed in future work.

How well do off-the-shelf classification ANNs do on this dataset? Are there interesting examples where humans fail to identify the sketch, but the learned approach is able to make a correct identification? These cases are actually much more interesting than the clear-cut cases where the sketch was very well drawn.
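The kNN baseline discussed across these posts is simple enough to sketch directly: classify a query histogram by a majority vote among its k nearest training histograms. This is a toy stand-in, not the paper's tuned classifier; the category names and feature values below are invented for illustration.

```python
def knn_predict(train, query, k=1):
    """k-nearest-neighbour vote using squared Euclidean distance on feature histograms.

    `train` is a list of (feature_histogram, label) pairs.
    """
    ranked = sorted(train,
                    key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    votes = {}
    for feats, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy 'sketch histograms' for two made-up categories.
train = [([0.9, 0.1], "rabbit"), ([0.8, 0.2], "rabbit"),
         ([0.1, 0.9], "teapot"), ([0.2, 0.8], "teapot")]
print(knn_predict(train, [0.85, 0.15], k=3))  # -> rabbit
```

One reason an SVM can beat this baseline is that kNN treats every histogram dimension equally, while a learned decision boundary can down-weight visual words that are uninformative for a given category.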