PATD (Printed Arabic Text Database for Recognition Systems)

PATD is a new comprehensive database covering both scanned and smartphone-captured documents. It contains 810 grayscale images scanned at different resolutions and 2954 smartphone-captured images taken under varying capture conditions (blurred, taken at different angles and under different lighting conditions). It is based on ten newspapers with different page structures, and its text is open-vocabulary, multi-font, multi-size and multi-style. Each image in our two datasets (a scanned Arabic printed document dataset and a smartphone-captured Arabic printed document dataset) is described by an XML file. This file includes the transcription of the text in the document, the capture parameters (distortion types and values), the ID of the captured document and a sharp "reference" image of the document for each combination of position and lighting.
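The actual XML schema is specified in the ground-truth file description listed in the download section and is not reproduced on this page. As a hedged illustration only, the sketch below parses a hypothetical annotation with Python's standard `xml.etree.ElementTree`. Every element and attribute name in it (`document`, `capture`, `distortion`, `reference`, `transcription`, `lighting`, `distance_cm`, and so on) is an assumption and should be replaced with the names used in the real PATD files.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL annotation: the tag and attribute names below are assumptions
# for illustration, not the real PATD schema.
SAMPLE_XML = """<document id="doc_0001">
  <capture device="iPhone 6" lighting="1" distance_cm="10">
    <distortion type="motion_blur" value="horizontal"/>
  </capture>
  <reference image="doc_0001_ref.jpg"/>
  <transcription>نص تجريبي</transcription>
</document>"""  # the transcription is Arabic for "sample text"


def read_ground_truth(xml_text):
    """Collect the fields of one (hypothetical) PATD-style annotation into a dict."""
    root = ET.fromstring(xml_text)
    capture = root.find("capture")
    return {
        "id": root.get("id"),
        "device": capture.get("device"),
        "lighting": int(capture.get("lighting")),
        "distance_cm": int(capture.get("distance_cm")),
        # One (type, value) pair per distortion element.
        "distortions": [(d.get("type"), d.get("value")) for d in capture.findall("distortion")],
        "reference_image": root.find("reference").get("image"),
        "text": root.findtext("transcription"),
    }


record = read_ground_truth(SAMPLE_XML)
print(record["id"], record["distortions"])
```

Keeping the parser in a single function makes it easy to adapt once the real element names are taken from the ground-truth file description.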


Capture Parameters:

Smartphone cameras: 3 smartphones

    o Samsung Galaxy S7 edge (13 MP camera): 928 images
    o Samsung Galaxy S3 (5 MP camera): 400 images
    o iPhone 6 (8 MP camera): 1626 images

Lighting: 3 lighting conditions

    o Lighting condition 1: daylight (895 images).
    o Lighting condition 2: daylight with the shadow of an object on part of the document (901 images).
    o Lighting condition 3: night, indoors, under lamp (artificial) light (959 images).

Distance between the camera and the document: 10 cm (1539 images), 20 cm (565 images), 24 cm (631 images) and 30 cm (219 images).
Background: none; only the newspaper page, with its own paper color, appears in the image.

Smartphone settings: the flash is always turned off.

Motion blur: we detected two types of motion blur:
    o Horizontal motion blur (199 images)
    o Vertical motion blur (205 images)
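To make the two motion-blur types concrete, the sketch below models them with the common uniform linear blur kernel: a 1 × k row of weights 1/k for horizontal motion and a k × 1 column for vertical motion. This is a generic textbook model chosen for illustration, not the procedure the PATD authors used to detect or label blur; the function names `motion_blur_kernel` and `filter2d` are hypothetical, and the pure-Python filtering favors clarity over speed.

```python
def motion_blur_kernel(length, direction):
    """Return a uniform linear motion-blur kernel as a list of rows.

    Horizontal motion gives one row of `length` weights; vertical motion gives
    `length` rows of one weight. Every weight is 1/length, so the kernel sums to 1.
    """
    weight = 1.0 / length
    if direction == "horizontal":
        return [[weight] * length]
    if direction == "vertical":
        return [[weight] for _ in range(length)]
    raise ValueError("direction must be 'horizontal' or 'vertical'")


def filter2d(image, kernel):
    """Apply `kernel` to a grayscale `image` (list of rows), keeping its size.

    Border pixels are handled by clamping coordinates to the image edge.
    This computes correlation; for the symmetric odd-length kernels used here
    it gives the same result as convolution.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2  # offsets that center the kernel on (y, x)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    sy = min(max(y + ky - oy, 0), h - 1)
                    sx = min(max(x + kx - ox, 0), w - 1)
                    acc += image[sy][sx] * kernel[ky][kx]
            out[y][x] = acc
    return out


# A 5 x 5 image containing a single vertical stroke in column 2 (1.0 = ink).
stroke = [[1.0 if x == 2 else 0.0 for x in range(5)] for _ in range(5)]

horizontal = filter2d(stroke, motion_blur_kernel(3, "horizontal"))
vertical = filter2d(stroke, motion_blur_kernel(3, "vertical"))

print([round(v, 3) for v in horizontal[2]])  # [0.0, 0.333, 0.333, 0.333, 0.0]
print([round(v, 3) for v in vertical[2]])    # [0.0, 0.0, 1.0, 0.0, 0.0]
```

Horizontal blur spreads a vertical stroke across neighboring columns, while vertical blur leaves it unchanged; the reverse holds for horizontal strokes such as the Arabic baseline.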

Out-of-focus blur: 307 images



Download:

- Smartphone-captured images:

    o Images containing one article, captured with the three smartphone cameras (1531 images): Download
    o Images containing two articles, captured with the three smartphone cameras (548 images): Download
    o Images containing three articles, captured with the three smartphone cameras (184 images): Download
    o Images containing more than three articles, captured with the three smartphone cameras (492 images): Download

    o Images captured from a computer screen with the iPhone 6 camera (199 images): Download

    o Ground-truth file description: Download


- Scanned newspapers:

    o Scanned pages (810 images): Download
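The image counts reported on this page can be cross-checked with a few lines of arithmetic. The sketch below copies the reported numbers (the dictionary keys are our own labels) and sums them. The camera and distance counts each total 2954; the lighting counts total 2755, which equals the sum of the four paper-capture download sets (1531 + 548 + 184 + 492); adding the 199 screen captures gives 2954 again. This suggests, although the page does not state it explicitly, that the screen captures are not assigned one of the three lighting conditions.

```python
# Image counts as reported on this page (labels are our own shorthand).
CAMERAS = {"Samsung Galaxy S7 edge": 928, "Samsung Galaxy S3": 400, "iPhone 6": 1626}
LIGHTING = {"daylight": 895, "daylight + shadow": 901, "night + lamp": 959}
DISTANCES_CM = {10: 1539, 20: 565, 24: 631, 30: 219}
PAPER_SETS = {"1 article": 1531, "2 articles": 548, "3 articles": 184, "more than 3": 492}
SCREEN_CAPTURES = 199
SCANNED = 810

SMARTPHONE_TOTAL = 2954

camera_total = sum(CAMERAS.values())
distance_total = sum(DISTANCES_CM.values())
lighting_total = sum(LIGHTING.values())
paper_total = sum(PAPER_SETS.values())

print(camera_total, distance_total, lighting_total, paper_total + SCREEN_CAPTURES)
# 2954 2954 2755 2954
print(SMARTPHONE_TOTAL + SCANNED)  # 3764 images in the whole database
```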

A Frequency Dictionary of Printed Arabic Text




Examples: