Automatic Performance Evaluation
for
Video Text Detection

Xian-Sheng Hua 1, Liu Wenyin 2, Hong-Jiang Zhang 2

1 National Laboratory on Machine Perception
Peking University, Beijing 100871, China

2 Microsoft Research China
No. 49 Zhichun Road, Beijing 100080, China

We propose an objective, comprehensive, and complexity-independent performance evaluation protocol for video text detection/location algorithms. The protocol includes a positive set and a negative set of indices at the textbox level, which evaluate the detection quality in terms of both location accuracy and fragmentation of the detected textboxes. In the protocol, we assign a detection difficulty (DD) level to each ground truth textbox. The performance indices can then be normalized with respect to the textbox DD level and are therefore independent of the ground truth complexity. We also assign a detection importance (DI) level to each ground truth textbox. The overall detection/location rate is the DI-weighted average of the detection qualities of all ground truth textboxes, so that the detection rate reflects the real performance more accurately. The automatic performance evaluation scheme has been applied to a text detection approach to determine the parameters that yield the best detection results. A paper on this topic has been submitted to ICDAR 2001.
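The DI-weighted average mentioned above can be illustrated with a minimal sketch. It assumes that each ground truth textbox already has a detection quality score in [0, 1]; how the protocol computes that score is not described on this page. The type `BoxScore` and the function `WeightedDetectionRate` are hypothetical names used only for illustration and are not part of the PE program.

```cpp
#include <cassert>
#include <vector>

// One ground truth textbox as seen by the evaluator (hypothetical type).
struct BoxScore {
    double quality;     // detection quality in [0, 1]; its computation is not shown here
    double importance;  // detection importance (DI) level, used as the weight
};

// DI-weighted average of the per-textbox detection qualities:
//   rate = sum(DI_i * Q_i) / sum(DI_i)
// Returns 0 when the total weight is zero (an assumption of this sketch).
double WeightedDetectionRate(const std::vector<BoxScore>& boxes) {
    double weighted_sum = 0.0;
    double weight_total = 0.0;
    for (const BoxScore& b : boxes) {
        assert(b.importance >= 0.0);  // weights are expected to be non-negative
        weighted_sum += b.importance * b.quality;
        weight_total += b.importance;
    }
    return weight_total > 0.0 ? weighted_sum / weight_total : 0.0;
}
```

With this weighting, missing a textbox of high importance lowers the rate more than missing one of low importance, which is the stated purpose of the DI level.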

Introduction to Ground Truth Data

The ground truth data used in our experiments are 45 video frames excerpted from the MPEG-7 Video Content Set (Licensing Agreement for the MPEG-7 Content Set). 29 of them are from V3, 7 are from V4, and 9 are from V14. V3 and V4 are owned by the Spanish TV broadcaster RTVE, and V14 comes from the Ministry of Education of Singapore (See details). The 45 frames contain 158 textboxes in total, of which 128 are human-recognizable. The maximum number of textboxes in one frame is 21, and 3 frames contain no text at all. We use a tool that we developed, named Ground Truth Generator, to manually collect the ground truth textboxes and their attributes. The ground truth data of the 45 video frames are stored in 45 corresponding ground truth files.

For convenience of performance evaluation (PE), the 45 video frames are stored in 45 directories named "00", "01", ... , "45". The 45 ground truth files (GTF), which have the “.gtf” filename extension, are stored in the corresponding directories. (Download Ground Truth). The GTF file format is:

int  textboxes_num; // the number of ground truth textboxes
                    // in this video frame/image
The first textbox
The second textbox
......
The last textbox

where the data structure for one ground truth textbox is (VC++ Code):

typedef struct {

    CRect rect;   // Bounding rectangle of the textbox

    int   height; // Text Box Height
    int   width;  // Text Box Width

    char  text_string[100]; // Text String
    int   text_length;      // Length of the text string

    int   height_variance;  // Character Height Variance
    float skew_angle;       // Skew Angle
    int   color_texture;    // Color and Texture

    float background_complexity;  // Background Complexity
    float string_density;         // String Density
    float contrast;               // Contrast

    int   recognition_importance; // Recognition Importance Level

} structGroundTruth;

The Read/Write routines for GTF files can be found in the source code. (Download Source Code).
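For readers without the source code at hand, the following is a minimal sketch of what reading a GTF file could look like. It rests on assumptions that this page does not confirm: that the file is a raw binary dump of a 32-bit count followed by that many records, and that `int` and `float` are 4 bytes with no struct padding. `Rect32`, `GroundTruthBox`, `ReadGtf`, and `kMaxBoxes` are hypothetical names; `Rect32` stands in for the MFC `CRect` so that the sketch compiles without MFC. The routines in the downloadable source code remain the authoritative reference.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in for MFC's CRect: four 32-bit coordinates.
struct Rect32 { std::int32_t left, top, right, bottom; };

// Portable mirror of structGroundTruth, assuming 32-bit int and float.
struct GroundTruthBox {
    Rect32       rect;
    std::int32_t height;
    std::int32_t width;
    char         text_string[100];
    std::int32_t text_length;
    std::int32_t height_variance;
    float        skew_angle;
    std::int32_t color_texture;
    float        background_complexity;
    float        string_density;
    float        contrast;
    std::int32_t recognition_importance;
};
static_assert(sizeof(GroundTruthBox) == 156, "unexpected padding in GroundTruthBox");

// Arbitrary sanity limit for this sketch; the data set has at most 21 boxes per frame.
const std::int32_t kMaxBoxes = 10000;

// Reads a GTF file under the ASSUMED layout: an int32 count followed by
// that many raw GroundTruthBox records. Returns false on any error.
bool ReadGtf(const char* path, std::vector<GroundTruthBox>* boxes) {
    assert(boxes != nullptr);
    boxes->clear();
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::int32_t count = 0;
    bool ok = std::fread(&count, sizeof(count), 1, f) == 1 &&
              count >= 0 && count <= kMaxBoxes;
    if (ok && count > 0) {
        boxes->resize(static_cast<std::size_t>(count));
        ok = std::fread(boxes->data(), sizeof(GroundTruthBox),
                        boxes->size(), f) == boxes->size();
    }
    std::fclose(f);
    if (!ok) boxes->clear();
    return ok;
}
```

A call such as `ReadGtf("00/0.gtf", &boxes)` would then fill `boxes` for one frame, provided the layout assumption holds for the distributed files.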

The file format of the detected results

To evaluate detection results with our PE program, the detection result files must follow the file format described here. The detected textboxes of one video frame/image are stored in a detected textbox file (DTF) with the “.dtf” filename extension. DTF files are stored in the same directories as the ground truth files.

The DTF file format is:

int  textboxes_num; // the number of detected textboxes
                    // in this video frame/image
The first detected textbox
The second detected textbox
......
The last detected textbox

where the data structure for one detected textbox is (VC++ code):

typedef struct {
    RECT  rect;
    char  text_string[100];
    int   text_length;
} structDetectedTextBox;

The Read/Write routines for DTF files can also be found in the source code. (Download Source Code).
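As a rough illustration of how a detector might emit results in this format, here is a minimal sketch of a DTF writer. It assumes, without confirmation from this page, that a DTF file is a raw binary dump of a 32-bit count followed by the records, with 4-byte `int` and no padding. `Rect32` (a stand-in for the Win32 `RECT`), `DetectedBox`, `MakeDetectedBox`, and `WriteDtf` are hypothetical names. Files intended for the PE program should be written with the routines from the downloadable source code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Stand-in for the Win32 RECT: four 32-bit coordinates.
struct Rect32 { std::int32_t left, top, right, bottom; };

// Portable mirror of structDetectedTextBox, assuming 32-bit int.
struct DetectedBox {
    Rect32       rect;
    char         text_string[100];
    std::int32_t text_length;
};
static_assert(sizeof(DetectedBox) == 120, "unexpected padding in DetectedBox");

// Builds one record; text longer than 99 characters is truncated.
DetectedBox MakeDetectedBox(std::int32_t left, std::int32_t top,
                            std::int32_t right, std::int32_t bottom,
                            const char* text) {
    assert(text != nullptr);
    DetectedBox b = {};  // zero-fill, so text_string stays NUL-terminated
    b.rect = {left, top, right, bottom};
    std::strncpy(b.text_string, text, sizeof(b.text_string) - 1);
    b.text_length = static_cast<std::int32_t>(std::strlen(b.text_string));
    return b;
}

// Writes a DTF file under the ASSUMED layout: an int32 count followed by
// that many raw DetectedBox records. Returns false on any I/O error.
bool WriteDtf(const char* path, const std::vector<DetectedBox>& boxes) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::int32_t count = static_cast<std::int32_t>(boxes.size());
    bool ok = std::fwrite(&count, sizeof(count), 1, f) == 1;
    if (ok && !boxes.empty()) {
        ok = std::fwrite(boxes.data(), sizeof(DetectedBox),
                         boxes.size(), f) == boxes.size();
    }
    return std::fclose(f) == 0 && ok;
}
```

A frame with no detected text is written as a file that contains only the count 0.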

Usage of the Performance Evaluation (PE) Program

Note that the GTF file and the DTF file must be placed in the same directory as the corresponding video frame when running the PE program. The filename of the video frame is 0.bmp, the GTF filename is 0.gtf, and the default DTF filename is result.dtf. If you download only the source code of the PE program and want to build it yourself, you must first download and set up the Microsoft Vision SDK (Download Free!).

How to evaluate your detection results:

  1. Run TextLocationPE.exe;
  2. Open the video frame from any of the ground truth directories;
  3. If your DTF filename is not "result.dtf", select the corresponding DTF file by pressing the "Read DTF" button (see the figure below);
  4. To see the ground truth textboxes, check the "Show Ground Truth" checkbox. While it is checked, double-clicking inside any ground truth textbox shows the corresponding ground truth data in a popup dialog. (Optional)
  5. To see the detected textboxes, check the "Show Detected Text Box" checkbox. (Optional)
  6. To evaluate the performance for the current video frame, press the "Evaluate" button. (Optional)
  7. Press "Evaluate All" to run the overall evaluation on the 45 video frames. The PE results are displayed when the evaluation finishes.
  8. To evaluate several detection results from different algorithms or configurations at one time, use the "Evaluate Batch" function. Set the batch scope (the group of detection results) from the menu "Options/Set Evaluation Batch Scope". The detected textbox files (DTF) produced by the different algorithms or parameter settings should be named from "result0000.dtf" to "result9999.dtf". The evaluation results are stored in a text file named "evaluation.txt" in the same directory as the PE program. (Optional)
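The batch naming scheme in step 8 can be produced programmatically. The sketch below formats the four-digit file names. The helper names `BatchDtfName` and `BatchDtfPath` are hypothetical, and placing each batch file inside the frame directory is an assumption based on the statement that DTF files live next to the ground truth files.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Name of the k-th batch result file: "result0000.dtf" .. "result9999.dtf".
std::string BatchDtfName(int k) {
    assert(k >= 0 && k <= 9999);  // the naming scheme allows four digits only
    char buf[32];
    std::snprintf(buf, sizeof(buf), "result%04d.dtf", k);
    return buf;
}

// Path of a batch result inside one frame directory (assumed location),
// for example "07/result0003.dtf".
std::string BatchDtfPath(const std::string& frame_dir, int k) {
    return frame_dir + "/" + BatchDtfName(k);
}
```

A parameter sweep could then write the results of its k-th configuration for a frame to `BatchDtfPath(dir, k)`, so that one "Evaluate Batch" run covers all configurations.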

 

If you have any questions about the PE program or the ground truth data, please send e-mail to Xian-Sheng Hua or Liu Wenyin.


Last Update: 02/20/2001