Mirror and Glass Detection/Segmentation

 

In this project, we are developing techniques for mirror and glass detection/segmentation. While a mirror is a reflective surface that reflects the scene in front of it, glass is a transparent surface that transmits the scene behind it and often also reflects the scene in front of it. In general, neither mirrors nor glass have visual appearances of their own; they only reflect/transmit the appearances of their surroundings.

As mirrors and glass do not have their own appearances, it is not straightforward to develop automatic algorithms to detect and segment them. However, as they appear everywhere in our daily life, it can be problematic if we are not able to detect them reliably. For example, a vision-based depth sensor may falsely estimate the depth of a mirror/glass surface as the depth of the objects visible in or through it, a robot may not be aware of the presence of a mirror/glass wall, and a drone may collide with a high-rise (note that most high-rises are covered by glass these days).

To the best of our knowledge, our team is the first to develop computational models for automatic detection and segmentation of mirror and transparent glass surfaces. Although some earlier works investigate the detection of transparent glass objects, these methods mainly focus on wine glasses and other small glass objects, which have some special visual properties that can be used for detection. Unlike these works, we are more interested in detecting general glass surfaces that may not possess any special properties of their own.

We are also interested in exploring the application of our mirror/glass detection methods in autonomous navigation.

Exploiting Semantic Relations for Glass Surface Detection [paper] [model] [dataset]

Jiaying Lin*, Yuen Hei Yeung*, and Rynson Lau (* joint first authors)

Proc. NeurIPS, Nov. 2022

Visual comparisons of our method to state-of-the-art methods for glass surface detection [8, 18] on some example images.

Input-Output: Given an input image, our network outputs a binary mask that indicates where glass surfaces are.

Abstract. Glass surfaces are omnipresent in our daily lives and often go unnoticed by the majority of us. While humans are generally able to infer their locations and thus avoid collisions, it can be difficult for current object detection systems to handle them due to the transparent nature of glass surfaces. Previous methods approached the problem by extracting global context information to obtain priors such as boundary and reflection. However, their performance cannot be guaranteed when these critical features are not available. We observe that humans often reason through the semantic context of the environment, which offers insights into the categories of and proximity between entities that are expected to appear in the surroundings. For example, the odds of glass windows co-occurring with walls and curtains are generally higher than the odds of them co-occurring with other objects such as cars and trees, which have relatively less semantic relevance. Based on this observation, we propose a model that integrates the contextual relationship of the scene for glass surface detection with two novel modules: (1) a Scene Aware Activation (SAA) Module to adaptively filter critical channels with respect to spatial and semantic features, and (2) a Context Correlation Attention (CCA) Module to progressively learn the contextual correlations among objects both spatially and semantically. In addition, we propose a large-scale glass surface detection dataset named GSD-S, which contains 4,519 real-world RGB glass surface images from diverse real-world scenes with detailed annotations. Experimental results show that our model outperforms contemporary works, especially with a 48.8% improvement in MAE on our proposed GSD-S dataset.
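The MAE figure quoted above refers to the mean absolute error between a predicted glass map and its ground-truth mask. The following is a minimal sketch, assuming the usual per-pixel definition over a prediction in [0, 1] and a binary ground truth. The function name is illustrative and is not taken from the released model code.

```python
def mean_absolute_error(pred, gt):
    """Average per-pixel absolute difference between two same-sized 2D maps.

    pred: rows of floats in [0, 1] (predicted glass probability).
    gt:   rows of 0/1 values (ground-truth glass mask).
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


# Toy 2x2 example: lower values mean the prediction is closer to the mask.
prediction = [[0.9, 0.1], [0.8, 0.2]]
ground_truth = [[1, 0], [1, 0]]
print(mean_absolute_error(prediction, ground_truth))
```

Lower MAE is better, so an improvement in MAE means a reduction of this value.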

Mirror Detection with the Visual Chirality Cue [paper] [code]

Xin Tan, Jiaying Lin, Ke Xu, Pan Chen, Lizhuang Ma, and Rynson Lau

IEEE Trans. on Pattern Analysis and Machine Intelligence (accepted)

Existing single image based mirror detection methods [42] [21], which are based on modeling contrasts/correspondences between mirror and non-mirror regions, may fail when these relations are not reliable. For example, MirrorNet [42] would fail if the contrasts between mirror/non-mirror regions are weak (top row) or have multiple degrees (bottom row). PMDNet [21] would fail if correspondences do not exist (top row) or are incorrectly detected (bottom row). Our method (Ours) leverages the visual chirality cue, which is an intrinsic property of mirrors reflecting real-world scenes, to accurately differentiate mirror and non-mirror regions.

Input-Output: Given an input image, our network outputs a binary mask that indicates where mirrors are.

Abstract. Mirror detection is challenging because the visual appearances of mirrors change depending on those of their surroundings. As existing mirror detection methods are mainly based on extracting contextual contrast and relational similarity between mirror and non-mirror regions, they may fail to identify a mirror region if these assumptions are violated. Inspired by a recent study that applies a CNN to distinguish whether an image is flipped or not based on the visual chirality property, in this paper, we rethink this image-level visual chirality property and reformulate it as a learnable pixel-level cue for mirror detection. Specifically, we first propose a novel flipping-convolution-flipping (FCF) transformation to model visual chirality as a learnable commutative residual. We then propose a novel visual chirality embedding (VCE) module to exploit this commutative residual in multi-scale feature maps, to embed the visual chirality features into our mirror detection model. In addition, we propose a visual chirality-guided edge detection (CED) module to integrate the visual chirality features with contextual features for detection refinement. Extensive experiments show that the proposed method outperforms state-of-the-art methods on three benchmark datasets.
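To make the flipping-convolution-flipping idea more concrete, here is a small sketch of one plausible reading of the commutative residual: the difference between filtering an image directly and filtering it via flip, filter, flip. This is an illustration under stated assumptions, not the paper's implementation. The kernel here is fixed and one-dimensional, whereas the abstract describes a learnable transformation, and the helper names `hflip`, `correlate_rows`, and `fcf_residual` are hypothetical.

```python
def hflip(image):
    """Mirror a 2D list left-to-right."""
    return [row[::-1] for row in image]


def correlate_rows(image, kernel):
    """Slide an odd-length 1D kernel along each row (zero padding, same width)."""
    half = len(kernel) // 2
    out = []
    for row in image:
        padded = [0.0] * half + list(row) + [0.0] * half
        out.append([
            sum(k * padded[x + i] for i, k in enumerate(kernel))
            for x in range(len(row))
        ])
    return out


def fcf_residual(image, kernel):
    """Difference between filtering directly and flip -> filter -> flip."""
    direct = correlate_rows(image, kernel)
    fcf = hflip(correlate_rows(hflip(image), kernel))
    return [[d - f for d, f in zip(d_row, f_row)]
            for d_row, f_row in zip(direct, fcf)]


image = [[1, 2, 4]]
# A left-right symmetric kernel commutes with flipping: the residual vanishes.
print(fcf_residual(image, [1, 2, 1]))
# An asymmetric kernel does not commute, so the residual is non-zero.
print(fcf_residual(image, [1, 0, -1]))
```

In this toy setting the residual is zero exactly when the filter is insensitive to left-right flipping, which is why an asymmetric filter is needed before the residual can carry any flip-related signal.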

Large-Field Contextual Feature Learning for Glass Detection [paper] [code]

Haiyang Mei, Xin Yang, Letian Yu, Qiang Zhang, Xiaopeng Wei, and Rynson Lau

IEEE Trans. on Pattern Analysis and Machine Intelligence (accepted)

The pipeline of the proposed GDNet-B. First, we use the pre-trained ResNeXt-101 [75] as a multi-level feature extractor (MFE) to obtain features of different levels. Second, we embed four LCFI modules into the last four layers of the MFE, to learn large-field contextual features at different levels. Third, the outputs of the last three LCFI modules are concatenated and fused via an attention module [76] to generate high-level large-field contextual features. An attention map is then learned from these high-level large-field contextual features and used to guide the low-level large-field contextual features, i.e., the output of the first LCFI module, to focus more on glass regions. Fourth, we apply two BFE modules on the high-level/attentive low-level large-field contextual features to further perceive and integrate boundary cues. Finally, we combine high-level and attentive low-level large-field contextual features by concatenation and attention [76] operations to produce the final glass detection map.

Input-Output: Given an input image, our network outputs a binary mask that indicates where glass surfaces are.

Abstract. Glass is very common in our daily life. Existing computer vision systems neglect it and thus may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we propose an important problem of detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field-of-view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that our GDNet-B achieves satisfying glass detection results on the images within and beyond the GDD testing set. We further validate the effectiveness and generalization capability of our proposed GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show the potential applications of glass detection and discuss possible future research directions.

Learning Semantic Associations for Mirror Detection [paper] [suppl] [code] [dataset]

Huankang Guan, Jiaying Lin, and Rynson Lau

Proc. IEEE CVPR, June 2022

Existing mirror detection methods based on learning contextual contrasts [47] or corresponding relations [24] falsely identify some distractors (e.g., the doorway in the 1st row and the painting in the 2nd row) as mirrors, and miss the mirror (3rd row) when the mirror is captured at an oblique angle to the camera along with some occluding lights. In contrast, our method considers the semantic associations between mirrors and their surrounding objects (e.g., the vanity table in the 1st row and the sink in the 2nd and 3rd rows), yielding accurate results.

Input-Output: Given an input image, our network outputs a binary mask that indicates where mirrors are.

Abstract. Mirrors generally lack a consistent visual appearance, making mirror detection very challenging. Although recent works that are based on exploiting contextual contrasts and corresponding relations have achieved good results, relying heavily on contextual contrasts and corresponding relations to discover mirrors tends to fail in complex real-world scenes, where a lot of objects, e.g., doorways, may have features similar to those of mirrors. We observe that humans tend to place mirrors in relation to certain objects for specific functional purposes, e.g., a mirror above the sink. Inspired by this observation, we propose a model to exploit the semantic associations between the mirror and its surrounding objects for reliable mirror localization. Our model first acquires class-specific knowledge of the surrounding objects via a semantic side-path. It then uses two novel modules to exploit semantic associations: 1) an Associations Exploration (AE) Module to extract the associations of the scene objects based on fully connected graph models, and 2) a Quadruple-Graph (QG) Module to facilitate the diffusion and aggregation of semantic association knowledge using graph convolutions. Extensive experiments show that our method outperforms the existing methods and sets the new state-of-the-art on both the PMD dataset (f-measure: 0.844) and the MSD dataset (f-measure: 0.889).
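The f-measure scores above combine precision and recall of the predicted mirror mask. The sketch below shows how such a score could be computed for one binary prediction. It assumes the weighted F-measure with beta squared set to 0.3, a weighting often used in related segmentation benchmarks. The abstract does not state the weighting, so `beta_sq=0.3` is an assumption, as is the function name.

```python
def f_measure(pred_mask, gt_mask, beta_sq=0.3):
    """Weighted F-measure between two same-sized binary masks (rows of 0/1).

    beta_sq=0.3 is an assumed default, not a value given in the abstract.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1          # mirror pixel correctly detected
            elif p and not g:
                fp += 1          # non-mirror pixel labelled as mirror
            elif g and not p:
                fn += 1          # mirror pixel that was missed
    if tp == 0:
        return 0.0               # no overlap: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [1, 0]]
print(f_measure(predicted, ground_truth))
```

With `beta_sq` below 1, precision is weighted more heavily than recall, so false detections on distractors such as doorways lower the score more than missed mirror pixels do.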

Rich Context Aggregation with Reflection Prior for Glass Surface Detection [paper] [suppl] [code] [dataset]

Jiaying Lin, Zebang He, and Rynson Lau

Proc. IEEE CVPR, June 2021

Two popular scenarios where existing methods [20, 30] fail. GDNet [20] is based on extracting/integrating abundant context features for glass surface detection. As it does not consider any specific glass properties, it tends to fail in scenes with insufficient contexts (e.g., top row where the glass surface covers almost the whole image) or with glass-lookalike regions (e.g., bottom row where the center region is not covered by glass). TransLab [30] is based on a boundary-guided network for transparent object detection. It also fails to detect glass surfaces correctly. Our method, which considers reflections and boundaries, can accurately detect the glass surfaces in these complex scenes.

Input-Output: Given an input image, our network outputs a binary mask that indicates where glass surfaces are.

Abstract. Glass surfaces appear everywhere. Their existence can however pose a serious problem to computer vision tasks. Recently, a method was proposed to detect glass surfaces by learning multi-scale contextual information. However, as it is only based on a general context integration operation and does not consider any specific glass surface properties, it gets confused when the images contain objects that are similar to glass surfaces and degenerates in challenging scenes with insufficient contexts. We observe that humans often rely on identifying reflections in order to sense the existence of glass and on locating the boundary in order to determine the extent of the glass. Hence, we propose a model for glass surface detection, which consists of two novel modules: (1) a rich context aggregation module (RCAM) to extract multi-scale boundary features from rich context features for locating glass surface boundaries of different sizes and shapes, and (2) a reflection-based refinement module (RRM) to detect reflection and then incorporate it so as to differentiate glass regions from non-glass regions. In addition, we also propose a challenging dataset consisting of 4,012 glass images with annotations for glass surface detection. Our experiments demonstrate that the proposed model outperforms state-of-the-art methods from relevant fields.

Progressive Mirror Detection [paper] [suppl] [code] [dataset]

Jiaying Lin, Guodong Wang, and Rynson Lau

Proc. IEEE CVPR, June 2020

Visualization of our progressive approach to recognizing mirrors from a single image. By finding correspondences between objects inside and outside of the mirror and then explicitly locating the mirror edges, we can detect the mirror region more reliably.

Input-Output: Given an input image, our network outputs a binary mask that indicates where mirrors are.

Abstract. The mirror detection problem is important as mirrors can affect the performances of many vision tasks. It is a difficult problem since it requires an understanding of global scene semantics. Recently, a method was proposed to detect mirrors by learning multi-level contextual contrasts between inside and outside of mirrors, which helps locate mirror edges implicitly. We observe that the content of a mirror reflects the content of its surrounding, separated by the edge of the mirror. Hence, we propose a model in this paper to progressively learn the content similarity between the inside and outside of the mirror while explicitly detecting the mirror edges. Our work has two main contributions. First, we propose a new relational contextual contrasted local (RCCL) module to extract and compare the mirror features with its corresponding context features, and an edge detection and fusion (EDF) module to learn the features of mirror edges in complex scenes via explicit supervision. Second, we construct a challenging benchmark dataset of 6,461 mirror images. Unlike the existing MSD dataset, which has limited diversity, our dataset covers a variety of scenes and is much larger in scale. Experimental results show that our model outperforms relevant state-of-the-art methods.
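Explicit edge supervision needs an edge label for each training image. One common way to obtain such labels, assumed here for illustration and not stated in the abstract, is to derive them from the annotated mirror mask by keeping the mask pixels that lie on its boundary. The function name `mask_to_edge` is hypothetical.

```python
def mask_to_edge(mask):
    """Mark mask pixels that touch a non-mask pixel or the image border.

    mask: rows of 0/1 values. Returns a same-sized 0/1 edge map.
    """
    height, width = len(mask), len(mask[0])
    edge = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Check the four direct neighbours of this mask pixel.
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    edge[y][x] = 1
                    break
    return edge


mirror_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in mask_to_edge(mirror_mask):
    print(row)
```

The result is a one-pixel-wide ring around the mask; real training pipelines may thicken or blur it, which this sketch does not attempt.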

Don’t Hit Me! Glass Detection in Real-world Scenes [paper] [suppl] [code] [dataset]

Haiyang Mei, Xin Yang, Yang Wang, Yuanyuan Liu, Shengfeng He, Qiang Zhang, Xiaopeng Wei, and Rynson Lau

Proc. IEEE CVPR, June 2020

Problems with glass in existing vision tasks. In depth prediction, an existing method [16] wrongly predicts the depth of the scene behind the glass, instead of the depth to the glass (1st row of (b)). For instance segmentation, Mask RCNN [9] only segments the instances behind the glass, not aware that they are actually behind the glass (2nd row of (b)). Besides, if we directly apply an existing single-image reflection removal (SIRR) method [36] to an image that is only partially covered by glass, the non-glass region can be corrupted (3rd row of (b)). GDNet can detect the glass (c) and then correct these failure cases (d).

Input-Output: Given an input image, our network outputs a binary mask that indicates where glass surfaces are.

Abstract. Transparent glass is very common in our daily life. Existing computer vision systems neglect it and thus may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass, and the content within the glass region is typically similar to that behind it. In this paper, we propose an important problem of detecting glass from a single RGB image. To address this problem, we construct a large-scale glass detection dataset (GDD) and design a glass detection network, called GDNet, which explores abundant contextual cues for robust glass detection with a novel large-field contextual feature integration (LCFI) module. Extensive experiments demonstrate that the proposed method achieves better glass detection results on our GDD test set than state-of-the-art methods fine-tuned for glass detection.

Where is My Mirror? [paper] [suppl] [code and updated results] [dataset]

Xin Yang*, Haiyang Mei*, Ke Xu, Xiaopeng Wei, Baocai Yin, and Rynson Lau (* joint first authors)

Proc. IEEE ICCV, Oct. 2019

Problems with mirrors in existing vision tasks. In depth prediction, NYU-v2 dataset [32] uses a Kinect to capture depth as ground truth. It wrongly predicts the depths of the reflected contents, instead of the mirror depths (b). In instance semantic segmentation, Mask RCNN [12] wrongly detects objects inside the mirrors (c). With MirrorNet, we first detect and mask out the mirrors (d). We then obtain the correct depths (e), by interpolating the depths from surrounding pixels of the mirrors, and segmentation maps (f).
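The depth correction step described above can be illustrated with a small sketch. The caption only says that depths are interpolated from the pixels surrounding the mirror, so the scheme below (repeatedly averaging already-known 4-neighbours, working inward from the mirror border) is an assumed stand-in, and `fill_masked_depth` is a hypothetical name.

```python
def fill_masked_depth(depth, mirror_mask):
    """Replace depths inside the mask with values propagated from its border.

    depth:       rows of floats (measured depth per pixel).
    mirror_mask: rows of 0/1 values, 1 where a mirror was detected.
    Returns a new depth map; the inputs are left unchanged.
    """
    height, width = len(depth), len(depth[0])
    filled = [list(row) for row in depth]
    known = [[not mirror_mask[y][x] for x in range(width)] for y in range(height)]
    while True:
        updates = []
        for y in range(height):
            for x in range(width):
                if known[y][x]:
                    continue
                neighbours = [
                    filled[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width and known[ny][nx]
                ]
                if neighbours:
                    updates.append((y, x, sum(neighbours) / len(neighbours)))
        if not updates:
            break  # everything is filled, or no known pixel is left to copy from
        # Apply one ring of updates at a time so each pass uses only older values.
        for y, x, value in updates:
            filled[y][x] = value
            known[y][x] = True
    return filled


depth = [[2.0, 2.0, 2.0], [2.0, 9.0, 2.0], [2.0, 2.0, 2.0]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(fill_masked_depth(depth, mask))
```

Here the centre pixel holds the depth of a reflected object (9.0) and is replaced by the depth of the surrounding wall (2.0), which is the behaviour the caption describes for panel (e).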

Input-Output: Given an input image, our network outputs a binary mask that indicates where mirrors are.

Abstract. Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in a severe performance degradation. However, separating the real content outside a mirror from the reflected content inside it is non-trivial. The key challenge is that mirrors typically reflect contents similar to their surroundings, making it very difficult to differentiate the two. In this paper, we present a novel method to segment mirrors from an input image. To the best of our knowledge, this is the first work to address the mirror segmentation problem with a computational approach. We make the following contributions. First, we construct a large-scale mirror dataset that contains mirror images with corresponding manually annotated masks. This dataset covers a variety of daily life scenes, and will be made publicly available for future research. Second, we propose a novel network, called MirrorNet, for mirror segmentation, by modeling both semantic and low-level color/texture discontinuities between the contents inside and outside of the mirrors. Third, we conduct extensive experiments to evaluate the proposed method, and show that it outperforms the carefully chosen baselines from the state-of-the-art detection and segmentation methods.

Last updated in September 2022.