Spatiotemporal Background Subtraction

Mingliang Chen, Xing Wei, QingXiong Yang, Qing Li, Gang Wang, Ming-Hsuan Yang

Abstract. We propose a background subtraction algorithm based on hierarchical superpixel segmentation, spanning trees, and optical flow. First, we generate superpixel segmentation trees using a set of Gaussian Mixture Models (GMMs), treating each GMM as one vertex when constructing the spanning trees. Next, we apply an M-smoother on the spanning trees to enforce spatial consistency, and estimate optical flow to extend the M-smoother to the temporal domain. Experimental results on synthetic and real-world benchmark datasets show that the proposed algorithm compares favorably against state-of-the-art background subtraction methods, even under frequent and sudden changes of pixel values.
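For context, the per-pixel Gaussian mixture baseline of Stauffer and Grimson [1], which the proposed method extends to the superpixel level, can be sketched as follows. This is a minimal grayscale sketch, not the authors' implementation; the class name and all parameter values are illustrative.

```python
import numpy as np

class PixelGMM:
    """Minimal per-pixel Gaussian mixture background model (grayscale,
    K modes), in the spirit of Stauffer and Grimson [1]."""

    def __init__(self, k=3, alpha=0.05, init_var=900.0, t_bg=0.7):
        self.w = np.full(k, 1.0 / k)          # mode weights
        self.mu = np.linspace(0.0, 255.0, k)  # mode means
        self.var = np.full(k, init_var)       # mode variances
        self.alpha, self.init_var, self.t_bg = alpha, init_var, t_bg

    def update(self, x):
        """Update the model with intensity x; return True if x is foreground."""
        match = (x - self.mu) ** 2 < 6.25 * self.var  # within 2.5 sigma
        if match.any():
            # adapt the best-matching mode (highest weight/sigma ratio)
            m = int(np.argmax(np.where(match, self.w / np.sqrt(self.var), -1.0)))
            self.mu[m] += self.alpha * (x - self.mu[m])
            self.var[m] += self.alpha * ((x - self.mu[m]) ** 2 - self.var[m])
            self.var[m] = max(self.var[m], 4.0)  # variance floor
            self.w *= 1.0 - self.alpha
            self.w[m] += self.alpha
        else:
            # no match: replace the weakest mode with a new, wide one
            m = int(np.argmin(self.w))
            self.mu[m], self.var[m], self.w[m] = x, self.init_var, self.alpha
        self.w /= self.w.sum()
        # the most reliable modes (large weight, small variance) form the
        # background; include modes until their weight mass exceeds t_bg
        order = np.argsort(-self.w / np.sqrt(self.var))
        n_bg = int(np.searchsorted(np.cumsum(self.w[order]), self.t_bg)) + 1
        return not (match.any() and m in order[:n_bg])
```

Because this model is purely per-pixel, its output is noisy under dynamic backgrounds and illumination changes, which is what the spatial (tree-based M-smoother) and temporal (optical-flow) consistency terms of the proposed method address.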

Fig.: Comparison on a sample video frame: (a) video frame; (b) mixture of Gaussians [1]; (c) SBM: spatially consistent; (d) STBM: spatiotemporally consistent; (e) SSHBM: spatially consistent with superpixel hierarchy; (f) STSHBM: spatiotemporally consistent with superpixel hierarchy.

Video Demos


    1: Video codecs (for playback): http://www.free-codecs.com/download/K_Lite_Mega_Codec_Pack.htm

    2: WMV file: Demonstration of the proposed spatiotemporal background subtraction.

Experimental Results


    1. The Stuttgart Artificial Background Subtraction (SABS) [2] dataset

    TABLE 1: F-measures for the SABS dataset. The best two results are shown in red and blue.
    Approach Basic Dynamic Background Bootstrap Darkening Light Switch Noisy Night Average
    McFarlane [3] 0.614 0.482 0.541 0.496 0.211 0.203 0.425
    Stauffer [1] 0.800 0.704 0.642 0.404 0.217 0.194 0.494
    Oliver [4] 0.635 0.552 - 0.300 0.198 0.213 0.380
    McKenna [5] 0.522 0.415 0.301 0.484 0.306 0.098 0.354
    Li [6] 0.766 0.641 0.678 0.704 0.316 0.047 0.525
    Kim [7] 0.582 0.341 0.318 0.342 - - 0.396
    Zivkovic [8] 0.768 0.704 0.632 0.620 0.300 0.321 0.558
    Maddalena [9] 0.766 0.715 0.495 0.663 0.213 0.263 0.519
    Barnich [10] 0.761 0.711 0.685 0.678 0.268 0.271 0.562
    Shimada [11] 0.723 0.623 0.708 0.577 0.335 0.475 0.574
    Proposed SBM 0.764 0.747 0.669 0.672 0.364 0.519 0.623
    Proposed STBM 0.813 0.788 0.736 0.753 0.515 0.680 0.714
    Proposed SSHBM 0.815 0.795 0.742 0.774 0.598 0.692 0.736
    Proposed STSHBM 0.846 0.804 0.797 0.820 0.684 0.755 0.784

    Fig. 1: Precision-recall curves on the SABS dataset under different challenging factors: (a) Basic, (b) Dynamic Background, (c) Bootstrapping, (d) Darkening, (e) Light Switch, (f) Noisy Night.
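The F-measure reported throughout the tables is the harmonic mean of precision and recall. A small helper (hypothetical, computed from per-pixel counts) makes the definition concrete:

```python
def f_measure(tp, fp, fn):
    """F-measure from counts of true-positive, false-positive, and
    false-negative foreground pixels: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean penalizes imbalance: a method with high recall but poor precision (or vice versa) scores low, which is why it is the standard single-number summary for background subtraction benchmarks.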

    2. The ChangeDetection [12] dataset

    TABLE 2: F-measures for the ChangeDetection dataset. The best two results are shown in red and blue.
    Approach Baseline Dynamic Background Camera Jitter Intermittent Motion Shadow Thermal Average
    Spectral-360 0.9330 0.7872 0.7156 0.5656 0.8843 0.7764 0.7770
    CwisarD 0.9075 0.8086 0.7814 0.5674 0.8412 0.7619 0.7780
    GPRMF 0.9280 0.7726 0.8596 0.4870 0.8889 0.8305 0.7944
    SuBSENSE [13] 0.9503 0.8177 0.8152 0.6569 0.8646 0.8305 0.8260
    PAWCS [14] 0.9397 0.8938 0.8137 0.7764 0.8710 0.8324 0.8579
    Proposed SBM 0.9250 0.7882 0.7413 0.6755 0.8458 0.8423 0.8030
    Proposed STBM 0.9345 0.8193 0.7522 0.6780 0.8529 0.8571 0.8157
    Proposed SSHBM 0.9428 0.9008 0.8034 0.8001 0.8788 0.8443 0.8617
    Proposed STSHBM 0.9534 0.9120 0.8503 0.8349 0.8930 0.8579 0.8836
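The Average column in both tables is consistent with the unweighted mean of the six per-category F-measures; for example, for the proposed STSHBM row of Table 2:

```python
# Per-category F-measures for STSHBM (Table 2): Baseline, Dynamic Background,
# Camera Jitter, Intermittent Motion, Shadow, Thermal.
scores = [0.9534, 0.9120, 0.8503, 0.8349, 0.8930, 0.8579]
average = sum(scores) / len(scores)
print(round(average, 4))  # 0.8836, matching the Average column
```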

References


    [1] C. Stauffer and E. Grimson. Adaptive background mixture models for real-time tracking. CVPR (1999).
    [2] S. Brutzer, B. Hoferlin, and G. Heidemann. Evaluation of background subtraction techniques for video surveillance. CVPR (2011).
    [3] N. McFarlane and C. Schofield. Segmentation and tracking of piglets in images. Machine Vision and Applications (1995).
    [4] N. Oliver, B. Rosario, and A. Pentland. A Bayesian computer vision system for modeling human interactions. PAMI (2000).
    [5] S. J. McKenna, S. Jabri, Z. Duric, A. Rosenfeld, and H. Wechsler. Tracking groups of people. CVIU (2000).
    [6] L. Li, W. Huang, I. Gu, and Q. Tian. Foreground object detection from videos containing complex background. ACM Multimedia (2003).
    [7] K. Kim, T. Chalidabhongse, D. Harwood, and L. Davis. Real-time foreground-background segmentation using codebook model. Real-Time Imaging (2005).
    [8] Z. Zivkovic and F. van der Heijden. Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters (2006).
    [9] L. Maddalena and A. Petrosino. A self-organizing approach to background subtraction for visual surveillance applications. TIP (2008).
    [10] O. Barnich and M. Van Droogenbroeck. ViBe: a universal background subtraction algorithm for video sequences. TIP (2011).
    [11] A. Shimada, H. Nagahara, and R. Taniguchi. Background modeling based on bidirectional analysis. CVPR (2013).
    [12] N. Goyette, P.-M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar. changedetection.net: A new change detection benchmark dataset. CVPR Workshop (2012).
    [13] P.-L. St-Charles, G.-A. Bilodeau, and R. Bergevin. SuBSENSE: a universal change detection method with local adaptive sensitivity. TIP (2015).
    [14] P. L. St-Charles, G. A. Bilodeau, and R. Bergevin. A self-adjusting approach to change detection based on background word consensus. WACV (2015).

Preliminary version


    1. Reference: M. Chen, Q. Yang, Q. Li, G. Wang, and M.-H. Yang. Spatiotemporal background subtraction using minimum spanning tree and optical flow. In ECCV, pages 521–534, 2014.
    2. Project page: http://www.cs.cityu.edu.hk/~mlchen2/eccv14/index.html
    3. Video demo: https://youtu.be/v1ofThKb5E8