Paper discussion blog for CS 2951t at Brown University. Instructor: Genevieve Patterson
In Diagnosing Error in Object Detectors, Hoiem et al. aim to refine the notion of average precision for grading object detectors. They argue that AP hides many important details, while ROC curves are too coarse. Their overriding goal is to identify which improvements to an object detector or localizer would have the biggest impact on detection quality. They analyze errors attributable to different factors, such as recognizing smaller objects, handling heavily occluded objects, or confusing objects from similar categories. They apply their analysis to two current object detectors, VGVZ and FGMR. One takeaway is that better detection of smaller objects would have a large impact on AP in these systems, that better detection of occluded objects would not, and that the dominant type of misclassification error depends on the kind of object being detected. Discussion: One question I had was how their proposed normalized precision metric is more informative than a standard precision metric like TP/P*, where P* is the total number of predicted positives. To put it another way, why does their normalized precision metric include an N term at all?
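On the N term: as I read the paper, normalized precision replaces the class-specific object count with a constant N, so that precision at a given recall is comparable across categories of very different sizes. A minimal sketch, assuming the definition P_N = R*N / (R*N + FP) (where R is recall and FP the false-positive count; the function names here are mine, not the authors'):

```python
# Sketch of standard vs. normalized precision. The normalized formula
# P_N = R * N / (R * N + FP) follows my reading of the paper; N is a
# constant (e.g., the average number of objects per category).

def standard_precision(tp, fp):
    """TP / (TP + FP): implicitly depends on how many objects the class has."""
    return tp / (tp + fp)

def normalized_precision(recall, fp, n_const):
    """Replaces the class's true object count with a fixed N, so a class
    is not penalized (or rewarded) merely for being small or large."""
    return (recall * n_const) / (recall * n_const + fp)

# Two classes at the same recall (0.5) with the same 10 false positives,
# but different numbers of ground-truth objects:
tp_small, tp_large = 10, 100     # class sizes 20 and 200, recall 0.5 in both

print(standard_precision(tp_small, 10))        # 0.5
print(standard_precision(tp_large, 10))        # ~0.909 -- larger class looks better
print(normalized_precision(0.5, 10, 110))      # identical for both classes
```

The point the sketch makes: with identical recall and identical false-positive counts, standard precision still differs across the two classes, while the normalized version does not, which is why the N term is there.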
The paper proposes new methods to analyze and evaluate object detectors and to identify weaknesses in the algorithms. The premise is that average precision and accuracy alone are not illustrative enough, especially about the causes of the results. The authors therefore present a methodology that considers different characteristics of the object, such as size, visible parts, and viewpoint. Comparing state-of-the-art object detectors at the time, they analyze false positives and false negatives. The paper also defines a variant of precision that removes the influence of the number of objects in each class. For false positives, they observe that background false detections have little importance for the results, and that better-ranked classifiers concentrate their errors on misdetections of similar categories. For false negatives, an important result is the poor performance on small objects, due not only to size but also to unusual viewpoints and occlusion. Discussion: What is the current state of detection for small objects? Recent object-detection papers such as Faster R-CNN do not perform this detailed analysis of misdetections. Is this kind of analysis harder because of the nature of current detectors (trained on enormous amounts of data, with limited interpretability)?
This paper presents a new set of analysis tools for better determining the sources of errors made by object detectors. The authors specifically test two state-of-the-art (at the time) object detectors: FGMR and VGVZ. They first break false positives down into four subcategories: localization errors, confusion with similar objects, confusion with other objects, and confusion with the background. Analyzing detector performance on the PASCAL VOC dataset, they find that localization errors and similar-object confusion are the largest sources of false positives. They also analyze the causes of false negatives by assigning objects in the dataset additional labels for features that could cause them to be missed, such as size, aspect ratio, occlusion, viewpoint, and visibility of parts. They find that false negatives are caused by a variety of factors, with size being the most important. The authors conclude with a discussion of possible ways to address different types of detection errors. Discussion: How specific are these error trends to the dataset and detectors? Would larger datasets with more object categories show similar trends?
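The four false-positive categories can be sketched as simple rules on a detection's overlap with ground truth. The 0.5 and 0.1 overlap thresholds below follow my reading of the paper; the function name and category labels are illustrative, not taken from the authors' code:

```python
# Hypothetical sketch of the paper's false-positive taxonomy for a
# detection of class c that did NOT count as a true positive.

def classify_false_positive(iou_same, iou_similar, iou_other):
    """Categorize a non-true-positive detection of class c.

    iou_same:    best overlap with a ground-truth object of class c
    iou_similar: best overlap with an object of a *similar* class
    iou_other:   best overlap with an object of any other class
    """
    if 0.1 <= iou_same < 0.5:
        return "localization"        # right class, poorly localized box
    if iou_similar >= 0.1:
        return "similar-confusion"   # e.g. a dog detector firing on a cat
    if iou_other >= 0.1:
        return "other-confusion"
    return "background"              # fires where no object is annotated

print(classify_false_positive(0.3, 0.0, 0.0))    # localization
print(classify_false_positive(0.0, 0.4, 0.0))    # similar-confusion
print(classify_false_positive(0.05, 0.05, 0.05)) # background
```

A nice property of this decomposition is that each false positive lands in exactly one bucket, so the buckets sum to the total false-positive count and can be plotted as a breakdown per detector.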
In Diagnosing Error in Object Detectors, the authors analyze the effects of a number of different sources of error on two classification methods, FGMR and VGVZ. They distinguish between the types of mistakes made, such as localization errors or confusion with similar objects, dissimilar objects, or the background. They also define object characteristics that they use to estimate the prediction confidence of their classifiers on certain objects. Their primary goal is to identify the areas of object detection that cause the most error. Discussion: One conclusion they draw is that, when the performance of both FGMR and VGVZ varies in similar ways, "their sensitivities may be due to some objects being intrinsically more difficult to recognize." Is there any way to tell the difference between something that is intrinsically difficult to recognize and something for which we simply didn't have much training data? Extending this, it might be interesting to compare these classification methods to human performance.
The authors develop a more sophisticated way of quantifying "average precision" as it applies to object detectors, with a major focus on identifying the specific features and components of particular object detectors and localizers that make them effective. Factors like heavy occlusion, scattering and clustering of objects, and the ability to distinguish foreground from background are assessed as part of their precision scoring. They also develop a prediction routine capable of giving a confidence measure in the effectiveness of a particular classifier on a particular set of objects, based on object characteristics. The result is that they can make much better-informed judgements about the effectiveness of various object detectors and localizers. Discussion: An even better metric for segmentation/localization accuracy, as I discuss in my NRL technical report on RAPTOR (Kelly S, 2014), might be true 3D distance and 3D pose displacement. This can only be computed if 3D models of the objects in question are properly oriented and embedded in a 3D version of each 2D scene. Most existing work on segmentation treats accuracy as a purely 2D phenomenon, even though the objects we are detecting are 3D.