Printed Arabic Text Database for Recognition Systems:
PATD is a new, comprehensive database covering both scanned and smartphone-captured documents. It contains 810 images scanned in grayscale at different resolutions and 2954 smartphone-captured images taken under varying capture conditions (blurred, at different angles and under different lighting conditions). It is built from ten newspapers with different structures and contains open-vocabulary, multi-font, multi-size and multi-style text. Each image in the two datasets (a scanned Arabic printed document dataset and a smartphone-captured Arabic printed document dataset) is described by an XML file. This file includes the transcription of the document text, the capture parameters (distortion types and values), the ID of the captured document and a sharp "reference" image of the document for each combination of position and lighting.
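As a rough illustration of how such an annotation could be read, the sketch below parses one XML string with Python's standard library. The layout of the ground truth files is not given in this description, so every element and attribute name here (`document`, `id`, `transcription`, `distortion`, `type`, `value`, `reference`) and the sample values are assumptions made for the example, not the actual PATD schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: the real PATD tag and attribute names may differ.
SAMPLE_XML = """<document id="doc_0001">
  <transcription>نص تجريبي</transcription>
  <capture>
    <distortion type="out_of_focus_blur" value="0.7"/>
    <distortion type="distance_cm" value="20"/>
  </capture>
  <reference>doc_0001_ref.png</reference>
</document>"""


def parse_annotation(xml_text):
    """Collect the four kinds of information the description lists."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "transcription": root.findtext("transcription", default=""),
        # Map each distortion type to its value (kept as a string).
        "distortions": {
            d.get("type"): d.get("value") for d in root.iter("distortion")
        },
        "reference": root.findtext("reference"),
    }


record = parse_annotation(SAMPLE_XML)
print(record["id"], record["distortions"])
```

Once the real tag names are known from the ground truth file description, only the strings passed to `findtext`, `get` and `iter` should need to change.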
Capture Parameters:
Smartphone camera: 3 smartphones
Samsung Galaxy S7 edge (camera: 13MP): 928 images
Samsung Galaxy S3 (camera: 5MP): 400 images
iPhone 6 (camera: 8MP): 1626 images
Light: 3 lighting conditions
o Lighting condition 1: Daylight (895 images).
o Lighting condition 2: Daylight + shadow of an object on part of the document (901 images).
o Lighting condition 3: Night + lamp (artificial light) indoors (959 images).
Distance between the camera and the document: 10 cm (1539 images), 20 cm (565 images), 24 cm (631 images) and 30 cm (219 images).
Background: none; only the newspaper page itself appears in the image.
Smartphone setting: the flash is always deactivated
Motion blur: We detected two types of motion blur:
o Horizontal motion blur: 199 images
o Vertical motion blur: 205 images
Out-of-focus blur: 307 images
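The per-parameter counts above can be checked against the 2954 smartphone-captured images stated in the description. The short sketch below only re-adds the listed figures; the dictionary keys are labels chosen for this example. The smartphone and distance counts each add up to 2954, while the three lighting conditions add up to 2755, which leaves 199 images without a listed lighting condition. The page does not say which images those are.

```python
# Image counts copied from the capture-parameter lists above.
COUNTS = {
    "smartphone": {"Galaxy S7 edge": 928, "Galaxy S3": 400, "iPhone 6": 1626},
    "lighting": {"daylight": 895, "daylight + shadow": 901, "night + lamp": 959},
    "distance_cm": {10: 1539, 20: 565, 24: 631, 30: 219},
}
TOTAL_SMARTPHONE_IMAGES = 2954  # total stated in the dataset description


def tally(counts):
    """Sum the image counts within each capture parameter."""
    return {name: sum(groups.values()) for name, groups in counts.items()}


totals = tally(COUNTS)
for name, total in totals.items():
    gap = TOTAL_SMARTPHONE_IMAGES - total
    print(f"{name}: {total} images, {gap} not covered by this breakdown")
```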
Download:
-Smartphone-captured:
o Images containing one article, captured with the three smartphone cameras (1531 images): Download
o Images containing two articles, captured with the three smartphone cameras (548 images): Download
o Images containing three articles, captured with the three smartphone cameras (184 images): Download
o Images containing more than three articles, captured with the three smartphone cameras (492 images): Download
o Images captured from a computer screen with the iPhone 6 camera (199 images): Download
o Ground truth file description: Download
-Scanned newspapers:
Examples: