Saliency and Object Detection



The human visual system can quickly identify regions in a scene that attract attention (saliency detection) or contain objects (object detection). Such detection is typically driven by low-level features; in the case of saliency detection, this is generally referred to as bottom-up saliency. In this project, we are developing techniques to automatically detect objects or salient objects in input images. We approach the detection problem using both bottom-up and top-down methods.


Shengfeng He, Jianbo Jiao, Xiaodan Zhang, Guoqiang Han, and Rynson Lau, "Delving into Salient Object Subitizing and Detection," Proc. IEEE ICCV, Oct. 2017.


Input-Output: Given an input image, our network predicts the number of salient objects in it and outputs a saliency map containing the corresponding number of salient objects.

Abstract: Subitizing (i.e., instant judgement on the number) and detection of salient objects are human inborn abilities. These two tasks influence each other in the human visual system. In this paper, we delve into the complementarity of these two tasks. We propose a multi-task deep neural network with weight prediction for salient object detection, where the parameters of an adaptive weight layer are dynamically determined by an auxiliary subitizing network. The numerical representation of salient objects is therefore embedded into the spatial representation. The proposed joint network can be trained end-to-end using backpropagation. Experiments show the proposed multi-task network outperforms existing multi-task architectures, and the auxiliary subitizing network provides strong guidance to salient object detection by reducing false positives and producing coherent saliency maps. Moreover, the proposed method is an unconstrained method able to handle images with/without salient objects. Finally, we show state-of-the-art performance on different salient object datasets.
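To illustrate the weight-prediction idea described in the abstract, here is a minimal sketch in plain Python. It is not the paper's network. It assumes that an auxiliary branch outputs scores for the number of salient objects, and that those scores blend a few hand-picked basis weight vectors that are then applied at every location of a feature map. The names `predict_weights`, `adaptive_layer`, `features`, and `basis`, as well as all numbers, are hypothetical.

```python
import math

def predict_weights(count_scores, basis):
    """Blend basis weight vectors by the softmax of the subitizing scores."""
    peak = max(count_scores)
    exps = [math.exp(s - peak) for s in count_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return [sum(p * b[i] for p, b in zip(probs, basis))
            for i in range(len(basis[0]))]

def adaptive_layer(features, weights):
    """Per-location linear layer with a sigmoid; the last weight is a bias."""
    *channel_w, bias = weights
    return [[1.0 / (1.0 + math.exp(-(sum(w * c for w, c in zip(channel_w, px)) + bias)))
             for px in row]
            for row in features]

# Hypothetical 2x2 feature map with two channels per location.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 0.0], [1.0, 1.0]]]
# Hypothetical basis weights (two channels plus bias) for counts 0, 1 and 2.
basis = [[-4.0, -4.0, -2.0], [4.0, -4.0, -2.0], [4.0, 4.0, -2.0]]

# Scores that strongly favour "no salient object" suppress the whole map.
no_object_map = adaptive_layer(features, predict_weights([10.0, 0.0, 0.0], basis))
# Scores that favour "one salient object" keep only the first channel's response.
one_object_map = adaptive_layer(features, predict_weights([0.0, 10.0, 0.0], basis))
```

In this toy setting, the predicted count changes the layer's weights, so the same features yield an empty map when no salient object is predicted. This mirrors, in a very simplified form, how the numerical representation can guide the spatial one.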


Shengfeng He and Rynson Lau, "Exemplar-Driven Top-Down Saliency Detection via Deep Association," Proc. IEEE CVPR, pp. 5723-5732, June 2016.


Input-Output: Given a number of exemplar images containing a specific type of object and a separate query image, our network recognizes the common object type in the exemplar images and detects it in the query image.

Abstract: Top-down saliency detection is a knowledge-driven search task. While some previous methods aim to learn this "knowledge" from category-specific data, others transfer existing annotations in a large dataset through appearance matching. In contrast, we propose in this paper a locate-by-exemplar strategy. This approach is challenging, as we only use a few exemplars (up to 4) and the appearances among the query object and the exemplars can be very different. To address it, we design a two-stage deep model to learn the intra-class association between the exemplars and query objects. The first stage is for learning object-to-object association, and the second stage is to learn background discrimination. Extensive experimental evaluations show that the proposed method outperforms different baselines and the category-specific models. In addition, we explore the influence of exemplar properties, in terms of exemplar number and quality. Furthermore, we show that the learned model is a universal model and offers great generalization to unseen objects.
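The locate-by-exemplar idea can be sketched with a much simpler stand-in than the two-stage deep model described above. The sketch below replaces the learned association with plain cosine similarity between an averaged exemplar feature and the feature at each query location. The feature vectors and the names `cosine` and `exemplar_saliency` are hypothetical, and nothing here reproduces the paper's architecture.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def exemplar_saliency(exemplar_feats, query_feats):
    """Score each query location by its similarity to the mean exemplar feature."""
    count = len(exemplar_feats)
    prototype = [sum(f[i] for f in exemplar_feats) / count
                 for i in range(len(exemplar_feats[0]))]
    # Negative similarities are clipped so the map stays in [0, 1].
    return [[max(0.0, cosine(prototype, f)) for f in row] for row in query_feats]

# Hypothetical features: two exemplars and a 2x2 grid of query locations.
exemplars = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
query = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
         [[0.0, 0.0, 1.0], [0.8, 0.1, 0.1]]]
saliency = exemplar_saliency(exemplars, query)
```

Locations whose features resemble the exemplars score high, while unrelated locations score low. A fixed similarity measure like this cannot cope with large appearance differences between exemplars and query objects, which is the gap the learned association is meant to close.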


Shengfeng He, Rynson Lau, Wenxi Liu, Zhe Huang, and Qingxiong Yang, "SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection," International Journal of Computer Vision, 115(3):330-344, Dec. 2015.


Input-Output: Given an input image, our network detects the salient objects in it.

Abstract: Existing computational models for salient object detection primarily rely on hand-crafted features, which are only able to capture low-level contrast information. In this paper, we learn the hierarchical contrast features by formulating salient object detection as a binary labeling problem using deep learning techniques. A novel superpixelwise convolutional neural network approach, called SuperCNN, is proposed to learn the internal representations of saliency in an efficient manner. In contrast to the classical convolutional networks, SuperCNN has four main properties. First, the proposed method is able to learn the hierarchical contrast features, as it is fed by two meaningful superpixel sequences, which is much more effective for detecting salient regions than feeding raw image pixels. Second, as SuperCNN recovers the contextual information among superpixels, it enables large context to be involved in the analysis efficiently. Third, benefiting from the superpixelwise mechanism, the required number of predictions for a densely labeled map is hugely reduced. Fourth, saliency can be detected independent of region size by utilizing a multiscale network structure. Experiments show that SuperCNN can robustly detect salient objects and outperforms the state-of-the-art methods on three benchmark datasets.


Hao Du, Shengfeng He, Bin Sheng, Lizhuang Ma, and Rynson Lau, "Saliency-Guided Color-to-Gray Conversion using Region-based Optimization," IEEE Trans. on Image Processing, 24(1):434-443, Jan. 2015.

[paper] [suppl] [CSDD Dataset] [Results on CSDD] [Result on Cadik]


Input-Output: Given an input color image, our method converts it into an output grayscale image.

Abstract: Image decolorization is a fundamental problem for many real world applications, including monochrome printing and photograph rendering. In this paper, we propose a new color-to-gray conversion method that is based on a region-based saliency model. First, we construct a parametric color-to-gray mapping function based on global color information as well as local contrast. Second, we propose a region-based saliency model that computes visual contrast among pixel regions. Third, we minimize the salience difference between the original color image and the output grayscale image in order to preserve contrast discrimination. To evaluate the performance of the proposed method in preserving contrast in complex scenarios, we have constructed a new decolorization dataset with 22 images, each of which contains abundant colors and patterns. Extensive experimental evaluations on the existing and the new datasets show that the proposed method outperforms the state-of-the-art methods quantitatively and qualitatively.
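The contrast-preserving optimization can be illustrated with a deliberately simplified sketch. It assumes a linear mapping gray = wr*R + wg*G + wb*B and substitutes plain pairwise contrast between region mean colors for the paper's region-based saliency model. The paper's actual mapping function and saliency model are not reproduced. The function names, the grid search, and the sample regions are hypothetical.

```python
import itertools
import math

def gray(color, weights):
    """Linear color-to-gray mapping for one RGB color with channels in [0, 1]."""
    return sum(w * c for w, c in zip(weights, color))

def contrast_loss(regions, weights):
    """Squared mismatch between color contrast and gray contrast over region pairs."""
    loss = 0.0
    for a, b in itertools.combinations(regions, 2):
        color_contrast = math.dist(a, b) / math.sqrt(3)  # normalized to [0, 1]
        gray_contrast = abs(gray(a, weights) - gray(b, weights))
        loss += (color_contrast - gray_contrast) ** 2
    return loss

def fit_weights(regions, steps=10):
    """Grid search over non-negative weights that sum to one."""
    best_loss, best_weights = None, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            loss = contrast_loss(regions, weights)
            if best_loss is None or loss < best_loss:
                best_loss, best_weights = loss, weights
    return best_weights

# Hypothetical region mean colors: red, blue and mid-gray.
regions = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
fitted = fit_weights(regions)
fixed_luma = (0.299, 0.587, 0.114)  # common fixed luminance weights, for comparison
```

For these three regions, the fitted weights preserve the red/blue contrast far better than the fixed luminance weights, which map red and blue to similar gray levels.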


Shengfeng He and Rynson Lau, "Saliency Detection with Flash and No-flash Image Pairs," Proc. ECCV, Part III, pp. 110-124, Sept. 2014.

[paper] [suppl] [dataset]


Input-Output: Given a pair of flash/no-flash images, our method outputs the corresponding saliency map.

Abstract: In this paper, we propose a new saliency detection method using a pair of flash and no-flash images. Our approach is inspired by two observations. First, only the foreground objects are significantly brightened by the flash as they are relatively nearer to the camera than the background. Second, the brightness variations introduced by the flash provide hints to surface orientation changes. Accordingly, the first observation is explored to form the background prior to eliminate background distraction. The second observation provides a new orientation cue to compute surface orientation contrast. These photometric cues from the two observations are independent of visual attributes like color, and they provide new and robust distinctiveness to support salient object detection. The second observation further leads to the introduction of new spatial priors to constrain the regions rendered salient to be compact both in the image plane and in 3D space. We have constructed a new flash/no-flash image dataset. Experiments on this dataset show that the proposed method successfully identifies salient objects in various challenging scenes on which state-of-the-art methods usually fail.
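The first observation, that the flash mainly brightens nearby foreground objects, can be sketched as a simple per-pixel cue. This is only an illustration under stated assumptions: the two images are aligned grayscale grids with values in [0, 1], and the cue is the normalized brightness gain. The orientation cue and spatial priors from the abstract are not modeled. The function name and sample values are hypothetical.

```python
def flash_difference_cue(flash, no_flash):
    """Normalized brightness gain from the flash; low values suggest background."""
    gain = [[max(0.0, f - n) for f, n in zip(flash_row, plain_row)]
            for flash_row, plain_row in zip(flash, no_flash)]
    peak = max(max(row) for row in gain)
    if peak == 0.0:
        # The flash changed nothing, so no location stands out.
        return [[0.0 for _ in row] for row in gain]
    return [[g / peak for g in row] for row in gain]

# Hypothetical 2x2 image pair: the right column is a nearby object.
no_flash = [[0.2, 0.2],
            [0.3, 0.3]]
flash = [[0.25, 0.8],
         [0.3, 0.9]]
cue = flash_difference_cue(flash, no_flash)
```

Pixels in the right column gain the most brightness and receive a cue close to 1, while the left column stays near 0 and would be treated as likely background.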


Last updated in October 2017.