Data-driven Vision, CS 2951t Spring 2016: Thurs, Apr 28: Transient Attributes

Wednesday, April 27, 2016

Thurs, Apr 28: Transient Attributes

Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. SIGGRAPH 2014.

4 comments:

Anonymous, April 27, 2016 at 8:28 PM
The paper presents a new dataset for transient attributes and an application that lets the user edit the transient attributes of an image. The dataset comes from 101 webcams, each contributing 60-120 frames. The authors started with 92 candidate attributes and used AMT workers to keep those that occurred most often, which reduced the list to 40. AMT workers also labeled the images. Workers assigned each attribute a discrete label (0, 0.5, or 1), and the attribute values of each image were then computed from these labels, taking each worker's reliability into account. Reliability is modeled with a Gaussian model and a per-worker variance.
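To make the reliability weighting concrete, here is a minimal sketch rather than the paper's exact procedure. It assumes each worker's label is the true value plus independent Gaussian noise with a known per-worker variance, in which case the maximum-likelihood estimate is the inverse-variance weighted mean. The function name, the worker IDs, and the variance values are all made up for illustration.

```python
def aggregate_labels(labels, worker_variance):
    """Combine worker labels for one attribute of one image.

    labels: list of (worker_id, value) pairs, value in {0.0, 0.5, 1.0}
    worker_variance: dict mapping worker_id to an estimated noise variance
                     (hypothetical input; the paper estimates reliability
                     with its own procedure)
    Returns the inverse-variance weighted mean, which stays in [0, 1].
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for worker_id, value in labels:
        weight = 1.0 / worker_variance[worker_id]  # reliable workers count more
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight


# Two reliable workers say "present", one noisy worker says "absent".
labels = [("a", 1.0), ("b", 1.0), ("c", 0.0)]
variances = {"a": 0.05, "b": 0.05, "c": 0.5}
print(aggregate_labels(labels, variances))
```

With equal variances this reduces to a plain average, so the weighting only matters when workers differ in reliability.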
To predict attributes in new images, three models are trained (SVM, logistic regression, and SVR), all using hand-crafted features (HOG, GIST, and others). As expected, the more subjective the attribute, the lower the precision of the classifier. Additionally, more positive examples lead to better results (the comparison between the random and holdout splits clearly indicates this).
The main application of the dataset, however, is attribute-guided editing. The user selects an image, a set of attributes to modify, and the degree of change. First, the system selects the six closest images in terms of image features, limited to one per webcam (this set is called M). For each image in M, images from the same webcam that have the desired attribute are selected (called T). The next step is to map the changes from M to T, which is possible because they come from the same webcam. These changes are recorded and then applied to the image being transformed.
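The selection of M and T can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature vectors, the Euclidean distance, the 0.5 attribute threshold, and the rule of picking the frame with the strongest attribute are all placeholders, and the dictionary layout is hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_per_webcam(query_features, database, k=6):
    """Return up to k nearest database frames, at most one per webcam (set M)."""
    best = {}
    for entry in database:
        d = distance(query_features, entry["features"])
        cam = entry["webcam"]
        if cam not in best or d < best[cam][0]:
            best[cam] = (d, entry)
    ranked = sorted(best.values(), key=lambda pair: pair[0])
    return [entry for _, entry in ranked[:k]]


def pick_targets(matches, database, attribute, threshold=0.5):
    """Pair each match with a same-webcam frame showing the attribute (set T)."""
    pairs = []
    for match in matches:
        candidates = [
            e for e in database
            if e["webcam"] == match["webcam"]
            and e["attributes"].get(attribute, 0.0) >= threshold
        ]
        if candidates:  # skip webcams that never show the attribute
            target = max(candidates, key=lambda e: e["attributes"][attribute])
            pairs.append((match, target))
    return pairs


# Toy database with made-up features and attribute values.
database = [
    {"id": "cam1_day", "webcam": "cam1", "features": [0.9, 0.1], "attributes": {"snow": 0.0}},
    {"id": "cam1_snow", "webcam": "cam1", "features": [0.2, 0.8], "attributes": {"snow": 0.9}},
    {"id": "cam2_day", "webcam": "cam2", "features": [0.5, 0.5], "attributes": {"snow": 0.1}},
]
matches = closest_per_webcam([1.0, 0.0], database)
pairs = pick_targets(matches, database, "snow")
print([(m["id"], t["id"]) for m, t in pairs])
```

Each resulting (match, target) pair shows the same scene with and without the attribute, which is what makes it possible to record the change and reapply it to the input image.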
Discussion
The paper notes that the probability of each attribute in an image depends on the scene and is not globally defined. Do you have any ideas for how these attributes could be defined universally?
In section 5, appearance transfer is discussed in terms of transferring color statistics. Is that a different approach from the one proposed in section 6, or is it the baseline that the section 6 method is compared with?
How is the degree of change in an attribute defined?

Unknown, April 27, 2016 at 8:59 PM
This paper introduces a new dataset and new techniques for working with "transient attributes" of outdoor images, where transient attributes are high-level image attributes that may vary when a given scene is viewed at different times. Examples include season, temperature, and lighting cues. The authors first collected a dataset of images from 101 static webcams taken at different times. They used Amazon Mechanical Turk workers to narrow a set of candidate transient attributes down to those that were represented in some, but not all, of the collected images, then used workers to rate how strongly each attribute was represented in each image, averaging the ratings to get a value between 0 and 1.
Next, the authors experimented with three methods for predicting the attributes of a given image and found that support vector regressors predict them most accurately.
Finally, the authors developed a system for transforming an image to match a desired attribute label. The system first identifies similar scenes in the database, then selects images from the same webcam that exhibit the desired attribute, and lets the user choose the specific image that best represents the desired transformation. The authors precompute a library of local pixel transformations for each pair of images in the dataset, then apply these local transformations to similar local areas in the input image to produce an image of the scene with the desired attribute. The authors compare their results to previous work and find that their system generally outperforms the previous state of the art on this task.
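As a rough illustration of what one such local transformation could look like, the sketch below fits a per-channel gain and offset between a patch of the matched frame and the same patch in the target frame, then applies it to the corresponding patch of the input image. This is a simplification under assumptions: the gain/offset model, the function names, and the patch values are hypothetical, and the paper's actual transformations may be richer.

```python
def fit_gain_offset(source, target):
    """Least-squares fit of target ~= gain * source + offset for one channel.

    source, target: equal-length lists of pixel intensities in [0, 1]
    taken from the same patch location in two frames of one webcam.
    """
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    var_s = sum((s - mean_s) ** 2 for s in source)
    if var_s == 0.0:
        # Flat patch: the gain is undetermined, so fall back to a pure offset.
        return 1.0, mean_t - mean_s
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(source, target))
    gain = cov / var_s
    return gain, mean_t - gain * mean_s


def apply_gain_offset(values, gain, offset):
    """Apply the fitted transform to another patch, clamping to [0, 1]."""
    return [min(1.0, max(0.0, gain * v + offset)) for v in values]


# Made-up patches: the target frame is half as bright as the matched frame.
match_patch = [0.2, 0.4, 0.6, 0.8]
target_patch = [0.1, 0.2, 0.3, 0.4]
gain, offset = fit_gain_offset(match_patch, target_patch)
edited_patch = apply_gain_offset([0.3, 0.5, 0.9], gain, offset)
print(gain, offset, edited_patch)
```

Fitting the transform on a webcam pair and applying it to a different image is the key idea: the change is learned where the scene is fixed and then transferred to similar local areas of the input.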

Discussion
Are there any examples of results for applying more difficult attributes such as "busy", and could these be applied given enough training data?
Unknown, April 27, 2016 at 9:01 PM
In this paper, the authors detect transient attributes in order to edit outdoor scenes. Using a dataset of images from 101 webcams captured under different conditions, the authors collect labels for 40 transient attributes in four categories: weather, lighting, season, and subjective impressions. They use AMT to collect labels, with control images to determine worker reliability.
The authors train models that recognize the presence of attributes (SVR performs best). They use these models in the next phase, which lets a user edit an image by declaring which attributes should be present and then choosing from candidate images. Finally, the results are evaluated in a user study, in which subjects judged whether an image looked more like a real photograph, and more convincingly exhibited an attribute, than another image.

Discussion:
The subjective transient features seem to be more difficult for the classifiers (SVR has comparatively low accuracy on "soothing" and "gloomy"); is this because the humans performing the labeling disagreed on some images?
What happens if an input image already has an attribute? For example, if I start with a nighttime image and select more "dark", will it overreact, or does it account for what is already in the original image?
Unknown, April 27, 2016 at 11:32 PM
The authors present a novel architecture that learns to hallucinate scene attributes onto real-world images, for example turning a day scene into a night scene. It goes as far as changing not just lighting but also texture, such as turning a summer picture into a winter scene. The authors first put together a dataset compiled from 101 outdoor webcams captured over long periods of time. They then label these images using crowdsourcing to identify which scene attributes emerge. The authors then train regressors that recognize these transient attributes in unseen outdoor scene images. Using these regressors, the authors are able to perform high-level image editing by teaching networks to automatically "hallucinate" various scene attributes onto a source image. The results are strikingly photorealistic, even for complicated transient attributes such as winter.

Discussion:

The next natural step seems to be something with segmentation. How could segmentation aid an application like this? Would it be possible to take, say, an image of a winding river and tell the system to winterize the left side of the river and summerize the right side?