The fundamental goal of the scene text detection problem (Yao et al. 2013) is to determine whether or not there is text in a given image and, if present, to localize and recognize it. These stages, which together make up an “end-to-end” system, are equally important for any scene text processing application. The problem of text detection and recognition in images (Yao et al. 2014) has received significant attention in recent years due to growing demand from a number of applications.
The numerous text-related applications for images can be broadly categorized as visual-based, multimedia, and content-based retrieval.
Text in web and document images is relevant to their content. Street hoardings and signboards usually convey the where, when, and who of ongoing events, and thus carry significant information. Automatic text recognition and translation systems enable users to overcome language barriers. Recognizing text on packages, containers, houses, maps, etc. has broad visual-based applications. However, text detection in natural scenes is extremely challenging due to the following major factors:
There are two main approaches to scene text recognition (Yao et al. 2014): stepwise and integrated. Stepwise methodologies have separate detection and recognition modules and use a feed-forward pipeline to detect, segment, and recognize text regions. Integrated methodologies, by contrast, recognize words with detection and recognition procedures that share information through character classification and/or use joint optimization strategies. The key difference is that the latter treat recognition as the main focus. The present study deals with images containing multilingual text, so a single OCR engine will not suffice. Hence, a stepwise method is followed that aims mainly to localize text regions and treats recognition as a separate module for a later stage. With this in mind, a text localization system is developed in the current work.
Text detection is the precursor to the recognition stage in the pipeline, so detection errors cascade into recognition. The localization step coarsely groups components into candidate text regions, which are then classified as text or non-text during verification. The underlying assumption is that text regions can be regarded as a kind of uniform pattern, so there must exist properties or features that are invariant over this pattern. A coarse-to-fine strategy is applied, which first localizes text candidates and then verifies them with a classifier trained specifically for the application. One attractive feature is that most of the background is filtered out in the coarse localization step, which greatly reduces the computational cost of the recognition process. Given the language-independent features of multilingual OCR modules, the developed system can process text in different languages.
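The coarse-to-fine idea can be illustrated with a minimal sketch. It assumes a tiny binary image given as nested lists, uses plain connected-component grouping for the coarse step, and substitutes a hypothetical rule-based check (`looks_like_text`, with made-up area and aspect-ratio thresholds) for the trained verifier. None of these names or thresholds come from the original work.

```python
from collections import deque

def connected_components(grid):
    """Coarse step: bounding boxes (top, left, bottom, right) of 4-connected blobs of 1s."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def looks_like_text(box, min_area=2, max_aspect=3.0):
    """Fine step: hypothetical stand-in for a trained text/non-text classifier."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    aspect = max(height, width) / min(height, width)
    return height * width >= min_area and aspect <= max_aspect

def localize_text(grid):
    """Coarse-to-fine: propose candidate regions, then keep only the verified ones."""
    return [box for box in connected_components(grid) if looks_like_text(box)]

image = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]
candidates = connected_components(image)  # three candidate regions
verified = localize_text(image)           # only the compact 2x2 blob survives
```

In this toy image the long thin bar and the isolated pixel are rejected by the verification rule, so the later, more expensive recognition stage would only see one region.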
Maximally Stable Extremal Regions (MSER) and Stroke Width Transform (SWT) features are a popular choice for detecting candidate text regions because they capture the basic essence of text by selecting connected components (CCs) with uniform intensity levels and stroke widths. However, they generate a significant number of false positives, which necessitates training a classifier to remove as many false positives as possible by assigning text or non-text labels to the regions obtained from the previous stage of the pipeline. A One Class Classifier (OCC) is trained with texture-based features such as Histogram of Oriented Gradients (HOG) and Grey Level Co-Occurrence Matrix (GLCM), and frequency-based features such as Discrete Cosine Transform (DCT) and Gabor filters.
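As an illustration of one of the texture features named above, the following sketch computes a normalized GLCM for a single horizontal offset and derives the standard contrast statistic from it. The patch values, the number of grey levels, and the choice of offset are assumptions made for the example; the original work does not specify its GLCM settings.

```python
def glcm(image, levels, dy=0, dx=1):
    """Normalized grey-level co-occurrence matrix for one pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                # Count how often grey level a is followed by grey level b at the offset.
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def contrast(p):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over all grey-level pairs."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))

# Hypothetical 4-level patches: a flat background and a stroke-like striped pattern.
flat_patch = [[1, 1, 1, 1],
              [1, 1, 1, 1]]
striped_patch = [[0, 3, 0, 3],
                 [0, 3, 0, 3]]

flat_contrast = contrast(glcm(flat_patch, levels=4))
striped_contrast = contrast(glcm(striped_patch, levels=4))
```

The flat patch has zero contrast, while the striped patch, whose neighbouring pixels always differ by three grey levels, scores much higher. Statistics of this kind would form part of the feature vector passed to the classifier.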
The OCC is trained on an in-house dataset of smartphone-captured images containing text in English, Hindi, and Bangla. Candidate regions obtained from the images are filtered using the classifier to improve OCR performance on the merged regions. Treating scene text recognition as a one-class problem is suitable because the non-text class cannot be adequately represented: non-text coverage in the training set cannot be exhaustive, so cases may arise where a non-text sample lies closer to a text training sample. Training an OCC on text-containing regions is therefore a way of identifying patterns for text regions and filtering out false positives.
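The one-class idea can be sketched with a deliberately simple model: fit a centroid to text-only feature vectors and accept a new region only if it lies within the largest training distance from that centroid. This is an assumed stand-in chosen for brevity, not the OCC used in this work (which is not specified here), and the two-dimensional feature vectors are invented for illustration.

```python
import math

class CentroidOneClass:
    """Minimal one-class model: accept samples close to the mean of the target class."""

    def fit(self, samples):
        dims = len(samples[0])
        # Centroid of the target (text) class; no non-text samples are needed.
        self.center = [sum(s[d] for s in samples) / len(samples) for d in range(dims)]
        # Acceptance radius: the farthest training sample from the centroid.
        self.radius = max(math.dist(s, self.center) for s in samples)
        return self

    def is_text(self, sample):
        return math.dist(sample, self.center) <= self.radius

# Hypothetical feature vectors extracted from text regions only.
text_features = [(0.9, 0.8), (1.0, 1.0), (1.1, 0.9), (1.0, 0.9)]
model = CentroidOneClass().fit(text_features)

near_text = model.is_text((1.05, 0.95))  # falls inside the learned region
far_away = model.is_text((3.0, 0.2))     # rejected as a likely false positive
```

Because the decision boundary is learned from text samples alone, the model never needs an exhaustive set of non-text examples; anything sufficiently unlike the training text is rejected.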