Sunday, May 1, 2016

Tues, May 3: Unsupervised Representation Learning

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015.

Wednesday, April 27, 2016

Thurs, Apr 28: Transient Attributes

Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. Siggraph 2014.

Monday, April 25, 2016

Tues, Apr 26: Quizz: Targeted crowdsourcing with a billion (potential) users

Quizz: Targeted Crowdsourcing with a Billion (Potential) Users. Panagiotis G. Ipeirotis and Evgeniy Gabrilovich. Proceedings of the 23rd International Conference on World Wide Web (WWW), 2014.

Wednesday, April 20, 2016

Thurs, April 21: How do humans sketch objects?

How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012.

Sunday, April 17, 2016

Tues, Apr 19: Exploring Nearest Neighbor Approaches for Image Captioning

Exploring Nearest Neighbor Approaches for Image Captioning. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick. arXiv, 2015.

Wednesday, April 13, 2016

Thurs, April 14: Visual Question Answering

VQA: Visual Question Answering. Stanislaw Antol*, Aishwarya Agrawal*, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. ICCV 2015.

Tuesday, April 5, 2016

Thurs, Apr 7: Deep Neural Decision Forests

Deep Neural Decision Forests. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. ICCV 2015.

Tuesday, March 22, 2016

Thurs, Mar 24: Learning Visual Biases from Human Imagination

Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015.

Monday, March 21, 2016

Tues, Mar 22: Special Presentation by Zhile Ren

Three-Dimensional Object Detection and Layout using Clouds of Oriented Gradients. Zhile Ren and Erik B. Sudderth.

Tuesday, March 15, 2016

Thurs, Mar 17: What makes Paris look like Paris?

What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012.

Sunday, March 13, 2016

Tues, Mar 15: Learning Visual Similarity

Learning Visual Similarity for Product Design with Convolutional Neural Networks. Sean Bell, Kavita Bala. Siggraph 2015.

Also read:
Learning Deep Representations for Ground-to-Aerial Geolocalization. Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays. CVPR 2015.

Wednesday, March 9, 2016

Thurs, Mar 10: Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015.

Thursday, March 3, 2016

Tues, Mar 8: Fast and Faster R-CNN

Fast R-CNN. Ross Girshick. ICCV 2015.
(additionally, the faster version)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS 2015.

Friday, February 26, 2016

Thurs, Mar 3: DeepBox: Learning Objectness

DeepBox: Learning Objectness with Convolutional Networks. Weicheng Kuo, Bharath Hariharan, Jitendra Malik. ICCV 2015.

Sample Code for Pretrained Network Feature Extraction

https://github.com/genp/DataDrivenVision_SampleCode

Saturday, February 20, 2016

Thurs, Feb 25: Diagnosing Error in Object Detectors

Diagnosing Error in Object Detectors. Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. ECCV 2012.

Matlab code for generating Hoiem-style ROC curves: https://github.com/pdollar/coco/blob/master/MatlabAPI/CocoEval.m

Tuesday, February 16, 2016

Thurs, Feb 18: Understanding Deep Image Representations by Inverting Them

Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015. 

Tues, Feb 23: Object Detectors Emerge in Deep Scene CNNs

Object Detectors Emerge in Deep Scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. ICLR, 2015.
Supplemental: Learning Deep Features for Scene Recognition using Places Database. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014.

Wednesday, February 10, 2016

Thurs, Feb 11: SUN Attributes

The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. Genevieve Patterson, Chen Xu, Hang Su, James Hays. IJCV 2014.

Monday, February 8, 2016

Tues, Feb. 9: AlexNet

ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012.

Thursday, January 28, 2016

Thurs, Feb 4: Deep Learning Tutorial

This class will be somewhat unusual in that we won't discuss a particular paper. We'll try to make sure that we understand deep learning well enough to follow the papers for the rest of the semester. 

CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised Learning.

CVPR 2014 tutorial

Tues, Feb 2: Crowdsourcing Detectors with Minimal Training

Tropel: Crowdsourcing Detectors with Minimal Training. Genevieve Patterson, Grant Van Horn, James Hays, Serge Belongie, Pietro Perona. Human Computation (HCOMP) 2015.

Tues, Feb 2: MS COCO

Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014.

COCO website

This is the first paper for which you'll post reading summaries. Here is the description of these summaries from the class website: Students will be expected to read one paper for each class. For each assigned paper, students must write a two- or three-sentence summary and identify at least one question or topic of interest for class discussion. Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions, connections to other research, uncertainty about the conclusions of the experiments, etc. Reading summaries must be posted to the class blog http://cs7476.blogspot.com/ by 11:59pm the day before each class. Feel free to reply to other comments on the blog and help each other understand confusing aspects of the papers. The blog discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a summary to the blog.

Simply click on the comment link below this to post your short summary and one or more questions / discussion topics.

Partner Search

Hi Class. I forgot to mention that you can work on your semester project with a partner. If you don't know who you want to work with, feel free to reply to this thread and perhaps say a bit about what project topics you had in mind, if any. E.g., "I'm James and I'm very interested in object proposals or crowdsourcing strategies. Let me know if you want to chat about working together on a project."

Saturday, January 16, 2016

Example discussion -- Rich Intrinsic Image Decomposition of Outdoor Scenes from Multiple Views

I've left this discussion of a paper from Spring 2013 as an example. This paper is not on the syllabus for this year. Notice that both example responses contain a Summary of the paper and several Discussion questions that the student would like to ask in class.

Rich Intrinsic Image Decomposition of Outdoor Scenes from Multiple Views. Pierre-Yves Laffont, Adrien Bousseau, George Drettakis. TVCG 2013.

Project page.

Example Student Summary #1

This paper presents a new method for decomposing outdoor scenes into intrinsic images. It differs from previous work by further decomposing the illumination component into components for illumination from the sun, the sky, and other scene objects (indirect illumination). Using multiple images of the scene, a sparse 3D point cloud is constructed. The reflectance and sun illumination for each point are estimated using mean shift iterations, which optimize the energies of regions of influence over candidate reflectance curves of the points. The sky and indirect illumination components are estimated by sending rays out from the 3D points and seeing which rays hit the sky and which hit other objects (and contributing radiance to those illumination components). The algorithm is effective for estimating the intrinsic images of rich outdoor scenes and allows for their manipulation while keeping the scene consistent. It is limited by the need for a reasonably accurate estimate of the direction to the sun and by its need for a reflective sphere to capture an environment map of the sky.

Discussion:
How consistent do the images used in these multi-image techniques have to be for them to work correctly? Do the objects need to be static? Does the photographer need to more or less revolve around some scene center at which the camera is pointed, or is the method robust enough to handle more general movements? Can the reverse be done, where the camera remains in a fixed position but rotates, creating a panorama that is stitched together (and then decomposed into intrinsic images)?

Example Student Summary #2

This paper presents a pipeline for decomposing intrinsic images using multiple photos of the same scene captured at the same time, and the pipeline is able to further decompose the illumination layer of the image into sun, sky, and indirect lighting layers. As input to the pipeline, the user captures and provides a set of LDR photos from different viewpoints, two HDR images of the front and side of a reflective sphere, and HDR images of the viewpoints that need to be decomposed. The pipeline starts by generating a point cloud reconstruction of the scene, an approximate geometric proxy of the scene, the direction and radiance of the sun, and an HDR environment map containing the sky and distant indirect radiance. The geometric proxy is then used to compute sky illumination, indirect illumination, and approximate sun visibility for each point; a more refined estimate of sun visibility is computed by forming curves of candidate reflectances in color space and finding their intersections, and this estimation algorithm is a key contribution of the work. Once illumination has been estimated for the initial set of points, it is propagated to the entire scene using a method similar to Bousseau et al.'s approach for propagating user-specified constraints, and finally the three illumination layers are separated using two successive matting procedures.

Discussion:
In section 6 it is mentioned that the sun visibility estimation algorithm assumes that the scene is composed of a sparse set of reflectances shared by multiple points. Does this mean that the algorithm might not be ideal for scenes that have a more diverse set of reflectances (though I guess only indoor scenes tend to have a richer set of reflectances, which is not the focus of this paper anyway)? Also, is there a parameter that can be tuned to adjust the desired degree of sparsity?