Automatic Performance Evaluation for Video Text Detection
Xian-Sheng Hua 1, Liu Wenyin 2, Hong-Jiang Zhang 2
1 National Laboratory on Machine Perception
2 Microsoft Research China
We propose an objective, comprehensive, and complexity-independent performance evaluation protocol for video text detection/location algorithms. The protocol includes a positive set and a negative set of indices at the textbox level, which evaluate the detection quality in terms of both the location accuracy and the fragmentation of the detected textboxes. In the protocol, we assign a detection difficulty (DD) level to each ground truth textbox. The performance indices can then be normalized with respect to the textbox DD level and are therefore independent of the ground truth complexity. We also assign a detection importance (DI) level to each ground truth textbox. The overall detection/location rate is the DI-weighted average of the detection qualities of all ground truth textboxes, so the rate reflects the real performance more accurately. The automatic performance evaluation scheme has been applied to a text detection approach to determine the parameters that yield the best detection results. A paper on this topic has been submitted to ICDAR 2001.
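As a rough illustration of the DI-weighted average mentioned above, the following sketch computes an overall rate from per-textbox detection qualities and DI levels. This page does not give the exact formula, so the ordinary weighted mean (sum of DI times quality, divided by the sum of DI) is an assumption, and the function name and the [0, 1] quality range are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the PE program): DI-weighted average of
// per-textbox detection qualities.
//   quality[i]    : detection quality of ground truth textbox i (assumed in [0, 1])
//   importance[i] : detection importance (DI) level of textbox i (assumed >= 0)
// Returns 0.0 when the total importance is zero, e.g. for a frame with no text.
double diWeightedDetectionRate(const std::vector<double>& quality,
                               const std::vector<double>& importance) {
    assert(quality.size() == importance.size());
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (std::size_t i = 0; i < quality.size(); ++i) {
        weightedSum += importance[i] * quality[i];
        totalWeight += importance[i];
    }
    return totalWeight > 0.0 ? weightedSum / totalWeight : 0.0;
}
```

Under this assumption, missing a textbox with a high DI level lowers the overall rate more than missing one with a low DI level.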
The ground truth data used in our experiments are 45 video frames excerpted from the MPEG-7 Video Content Set (Licensing Agreement for the MPEG-7 Content Set). Of these, 29 are from V3, 7 are from V4, and 9 are from V14. V3 and V4 are owned by Spanish TV RTVE, and V14 comes from the Ministry of Education of Singapore (See details). The 45 frames contain 158 textboxes in total, 128 of which are human-recognizable. The maximum number of textboxes in one frame is 21, and 3 frames contain no text at all. We manually collected the ground truth textboxes and their attributes using a tool we developed, named Ground Truth Generator. The ground truth data of the 45 video frames are stored in 45 corresponding ground truth files.
For convenience of performance evaluation (PE), the 45 video frames are stored in 45 directories named "00", "01", ... , "45". The 45 ground truth files (GTF), which have the ".gtf" filename extension, are stored in their corresponding directories (Download Ground Truth). The GTF file format is:
int textboxes_num; // the number of ground truth textboxes
// in this video frame/image
The first textbox
The second textbox
......
The last textbox
where the data structure for one ground truth textbox is (VC++ Code):
typedef struct {
    CRect rect;
    int   height;                 // Text Box Height
    int   width;                  // Text Box Width
    char  text_string[100];       // Text String
    int   text_length;
    int   height_variance;        // Character Height Variance
    float skew_angle;             // Skew Angle
    int   color_texture;          // Color and Texture
    float background_complexity;  // Background Complexity
    float string_density;         // String Density
    float contrast;               // Contrast
    int   recognition_importance; // Recognition Importance Level
} structGroundTruth;
The Read/Write routines for GTF files can be found in the source code. (Download Source Code).
To evaluate detection results with our PE program, the detection result files must follow the file format described here. The detected textboxes of one video frame/image are stored in a detected textbox file (DTF) with the ".dtf" filename extension. DTF files are stored in the same directories as the ground truth files.
The DTF file format is:
int textboxes_num; // the number of detected textboxes
// in this video frame/image
The first detected textbox
The second detected textbox
......
The last detected textbox
where the data structure for one detected textbox is (VC++ code):
typedef struct {
    RECT rect;
    char text_string[100];
    int  text_length;
} structDetectedTextBox;
The Read/Write routines for DTF files can also be found in the source code. (Download Source Code).
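For readers who want to produce DTF files from their own detector, the following is a minimal sketch of writing and reading the layout described above. It assumes the file is the textbox count followed by the raw bytes of each record, which is a guess; the authoritative layout is defined by the Read/Write routines in the downloadable source code. Rect32, DetectedTextBox, and all function names are hypothetical stand-ins (Rect32 replaces the Win32 RECT so that the sketch builds outside VC++).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the Win32 RECT: four 32-bit integers in this order.
struct Rect32 {
    std::int32_t left, top, right, bottom;
};

// Mirrors structDetectedTextBox, with RECT replaced by Rect32.
struct DetectedTextBox {
    Rect32 rect;
    char text_string[100];
    std::int32_t text_length;
};

// Builds one zero-initialized record; text beyond 99 characters is truncated.
DetectedTextBox makeDetectedTextBox(const Rect32& rect, const std::string& text) {
    DetectedTextBox box;
    std::memset(&box, 0, sizeof(box));
    box.rect = rect;
    std::strncpy(box.text_string, text.c_str(), sizeof(box.text_string) - 1);
    box.text_length = static_cast<std::int32_t>(std::strlen(box.text_string));
    return box;
}

// ASSUMPTION: a DTF is the count followed by raw record bytes (native byte order).
bool writeDtf(std::ostream& out, const std::vector<DetectedTextBox>& boxes) {
    assert(boxes.size() <= 0x7fffffffu);
    const std::int32_t count = static_cast<std::int32_t>(boxes.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const DetectedTextBox& box : boxes) {
        out.write(reinterpret_cast<const char*>(&box), sizeof(box));
    }
    return static_cast<bool>(out);
}

// Reads the same assumed layout back; returns false on a short or invalid file.
bool readDtf(std::istream& in, std::vector<DetectedTextBox>& boxes) {
    std::int32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count)) || count < 0) {
        return false;
    }
    boxes.clear();
    for (std::int32_t i = 0; i < count; ++i) {
        DetectedTextBox box;
        if (!in.read(reinterpret_cast<char*>(&box), sizeof(box))) {
            return false;
        }
        boxes.push_back(box);
    }
    return true;
}

// Convenience wrapper: the assumed DTF bytes as an in-memory string.
std::string serializeDtf(const std::vector<DetectedTextBox>& boxes) {
    std::ostringstream out;
    writeDtf(out, boxes);
    return out.str();
}
```

To write an actual file, pass a std::ofstream opened with std::ios::binary to writeDtf. Before relying on this layout, compare a file it produces against one written by the original routines, since struct padding and field sizes may differ between compilers.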
Note that the GTF file and the DTF file must be placed in the same directory as the corresponding video frame when running the PE program. The filename of the video frame is 0.bmp, the GTF filename is 0.gtf, and the default DTF filename is result.dtf. If you download only the source code of the PE program and want to build it yourself, you must first download and set up the Microsoft Vision SDK (Download Free!).
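The directory layout above can be captured in a small helper that builds the three paths for one frame. This is only a sketch: it assumes two-digit, zero-padded directory names under a common root and the same three filenames in every directory. FramePaths, framePaths, and the root argument are hypothetical names, not part of the PE program.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical bundle of the three files the PE program expects for one frame.
struct FramePaths {
    std::string frame;       // video frame image
    std::string groundTruth; // ground truth file (GTF)
    std::string detected;    // detected textbox file (DTF), default name
};

// Builds the paths for the frame directory with the given index.
// ASSUMPTION: directories are named with two zero-padded digits ("00", "01", ...).
FramePaths framePaths(const std::string& root, int frameIndex) {
    assert(frameIndex >= 0 && frameIndex <= 99);
    char dir[16];
    std::snprintf(dir, sizeof(dir), "%02d", frameIndex);
    const std::string base = root + "/" + dir + "/";
    return FramePaths{base + "0.bmp", base + "0.gtf", base + "result.dtf"};
}
```

For example, framePaths("data", 7) yields data/07/0.bmp, data/07/0.gtf, and data/07/result.dtf, where "data" is a placeholder root directory.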
How to evaluate your detection results:

If you have any questions about the PE program or the ground truth data, please email Xian-Sheng Hua or Liu Wenyin.
Last Update: 02/20/2001