Thursday, January 28, 2016

Thurs, Feb 4: Deep Learning Tutorial

This class will be somewhat unusual in that we won't discuss a particular paper. We'll try to make sure that we understand deep learning well enough to follow the papers for the rest of the semester. 

CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised learning.

CVPR 2014 tutorial

Tues, Feb 2: Crowdsourcing Detectors with Minimal Training

Tropel: Crowdsourcing Detectors with Minimal Training. Genevieve Patterson, Grant Van Horn, James Hays, Serge Belongie, Pietro Perona. Human Computation (HCOMP) 2015.

Tues, Feb 2: MS COCO

Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014.

COCO website

This is the first paper for which you'll post reading summaries. Here is the description of these summaries from the class website: Students will be expected to read one paper for each class. For each assigned paper, students must write a two- or three-sentence summary and identify at least one question or topic of interest for class discussion. Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions, connections to other research, uncertainty about the conclusions of the experiments, etc. Reading summaries must be posted to the class blog http://cs7476.blogspot.com/ by 11:59pm the day before each class. Feel free to reply to other comments on the blog and help each other understand confusing aspects of the papers. The blog discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a summary to the blog.

Simply click on the comment link below to post your short summary and one or more questions / discussion topics.

Partner Search

Hi class. I forgot to mention that you can work on your semester project with a partner. If you don't know who you want to work with, feel free to reply to this thread and perhaps say a bit about what project topics you had in mind, if any. E.g. "I'm James and I'm very interested in object proposals or crowdsourcing strategies. Let me know if you want to chat about working together on a project."

Saturday, January 16, 2016

Example discussion -- Rich Intrinsic Image Decomposition of Outdoor Scenes from Multiple Views

I've left this discussion of a paper from Spring 2013 as an example. This paper is not on the syllabus for this year. Notice that both example responses contain a Summary of the paper and several Discussion questions that the student would like to ask in class.

Rich Intrinsic Image Decomposition of Outdoor Scenes from Multiple Views. Pierre-Yves Laffont, Adrien Bousseau, George Drettakis. TVCG 2013.

Project page.

Example Student Summary #1

This paper presents a new method for decomposing outdoor scenes into intrinsic images. It differs from previous papers by further decomposing the illumination component into illumination from the sun, the sky, and other scene objects (indirect illumination). Using multiple images of the scene, a sparse 3D point cloud is constructed. The reflectance and sun illumination for each point are estimated using mean shift iterations that optimize the energies of regions of influence over candidate reflectance curves of the points. The sky and indirect illumination components are estimated by sending rays out from the 3D points and seeing which rays hit the sky and which hit other objects (each ray contributing radiance to the corresponding illumination component). The algorithm is effective for estimating the intrinsic images of rich outdoor scenes and allows for their manipulation while keeping the scene consistent. It is limited by the need for a reasonably accurate estimate of the direction to the sun and by the need for a reflective sphere to capture an environment map of the sky.

Discussion:
How consistent do the images used in these multi-image techniques have to be for them to work correctly? Do the objects need to be static? Does the photographer need to be more or less revolving around some scene center where the camera is pointed, or is the method robust enough to handle more general movements? Could the reverse work, where the camera remains in a fixed position but rotates, producing a stitched panorama that is then decomposed into intrinsic images?
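
To make the image formation model in the summary above concrete, here is a minimal NumPy sketch of the decomposition the paper builds on (all values below are hypothetical toy data, not the authors' code or results): each pixel is reflectance times the sum of a sun layer gated by per-pixel visibility, a sky layer, and an indirect layer.

```python
# Toy illustration of the decomposition I = R * (v_sun * L_sun + L_sky + L_indirect).
# All values are made up for demonstration; not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)
h, w = 4, 4

R     = rng.uniform(0.2, 0.9, (h, w, 3))  # per-pixel RGB reflectance (albedo)
v_sun = rng.random((h, w)) > 0.3           # binary sun visibility (cast shadows)
L_sun = np.array([1.0, 0.9, 0.7])          # warm direct sunlight
L_sky = np.array([0.2, 0.3, 0.5])          # cool ambient sky light
L_ind = 0.05 * np.ones(3)                  # weak indirect bounce light

# Compose the observed image from the layers.
illum = v_sun[..., None] * L_sun + L_sky + L_ind
I = R * illum

# Once the layers are separated, scene edits become layer edits,
# e.g. "turn off the sun" by zeroing its layer while keeping R fixed.
I_no_sun = R * (L_sky + L_ind)
print(I.shape, float(I_no_sun.mean()))
```

Once the three illumination layers are separated, the manipulations the paper demonstrates (recoloring sunlight, softening shadows) reduce to scaling or replacing individual layers while the reflectance stays fixed.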

Example Student Summary #2

This paper presents a pipeline for decomposing intrinsic images using multiple photos of the same scene captured at the same time; the pipeline is able to further decompose the illumination layer of the image into sun, sky, and indirect lighting layers. As inputs to the pipeline, the user needs to capture and provide a set of LDR photos from different viewpoints, two HDR images of the front and side of a reflective sphere, and HDR images of the viewpoints that need to be decomposed. The pipeline starts by generating a point cloud reconstruction of the scene, an approximate geometric proxy of the scene, the direction and radiance of the sun, and an HDR environment map containing the sky and distant indirect radiance. The geometric proxy is then used to compute sky illumination, indirect illumination, and approximate sun visibility for each point. A more refined estimate of sun visibility is computed by forming curves of candidate reflectances in color space and finding their intersections; this estimation algorithm is a key contribution of this work. After the illuminations for the initial set of points are obtained, they are propagated to the entire scene using a method similar to Bousseau et al.'s method for propagating user-specified constraints, and finally the three illumination layers are separated using two successive matting procedures.

Discussion:
In section 6 it is mentioned that the sun visibility estimation algorithm assumes that the scene is composed of a sparse set of reflectances shared by multiple points. Does this mean that the algorithm might not be ideal for scenes that have a more diverse set of reflectances (though I guess only indoor scenes tend to have a richer set of reflectances, which is not the focus of this paper anyway)? Also, is there a parameter that can be tuned to adjust the desired degree of sparsity?
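
The ray-casting step described in the pipeline above can be illustrated with a rough Monte Carlo sketch (my own toy stand-in, not the paper's implementation): sample random directions on the hemisphere above a reconstructed 3D point and count the fraction of rays that escape the scene geometry. Here the reconstructed geometric proxy is replaced by a single hypothetical occluding wall.

```python
# Rough Monte Carlo estimate of sky visibility at a 3D point.
# The occluder below is a made-up stand-in for the paper's geometric proxy.

import numpy as np

rng = np.random.default_rng(1)

def sample_hemisphere(n):
    """Uniformly sample n unit directions on the upper (z >= 0) hemisphere."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    v[:, 2] = np.abs(v[:, 2])  # reflect downward directions upward
    return v

def hits_sky(origin, direction, t=5.0):
    """Toy occluder test: a wall fills the region x > 2 below height z = 5."""
    # A real system would intersect the ray with the reconstructed proxy
    # geometry; this stand-in just checks one marched point along the ray.
    p = origin + t * direction
    return not (p[0] > 2 and p[2] < 5)

point = np.array([0.0, 0.0, 1.0])  # a reconstructed 3D scene point
dirs = sample_hemisphere(2000)
sky_fraction = np.mean([hits_sky(point, d) for d in dirs])
print(f"estimated sky visibility: {sky_fraction:.2f}")
```

In the actual pipeline, rays that reach the sky would accumulate radiance from the captured HDR environment map (and rays that hit geometry would contribute to the indirect layer), rather than just being counted.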