Urdu Nastalique Optical Character Recognition System - Introduction

  • To develop and mature algorithms for analyzing and recognizing Urdu text images using segmentation-based and ligature-based methods.
  • To develop automatic scaling algorithms for Urdu ligatures to make the system independent of font size.
  • To develop an Urdu OCR system for the Nastalique style of writing.
  • To develop post-processing algorithms, drawing on computational linguistics, for output generation and error correction in Urdu OCR.
  • To identify future directions for graduate research in this area.
  • To develop capacity in the area of Human Language Technology.
  • To create and release an Urdu text image corpus under an open license, so that other interested research organizations and universities can further develop and test OCR for Urdu and other Pakistani languages.
  • To provide print-disabled communities with access to textual information.