Scene-Text Recognition


The fundamental goal of the scene text detection problem (Yao et al. 2013) is to determine whether or not there is text in a given image and, if present, to localize and recognize it. These stages, which together make up an “end-to-end” system, are equally important for any scene text processing application. The problem of text detection and recognition in images (Yao et al. 2014) has received significant attention in recent years due to growing demand from a number of applications.

Text-related applications for images can be broadly categorized as visual-based, multimedia, and content-based retrieval.

Text in web and document images is relevant to their content. Street hoardings and signboards usually convey where, when, and by whom events are happening, and thus carry significant information. Automatic text recognition and translation systems enable users to overcome language barriers. Recognizing text on packages, containers, houses, maps, etc. has broad visual-based applications. However, text detection in natural scenes is extremely challenging due to the following major factors:

  • Diversity of scene text: In contrast to document images, which have black text on a white background with regular fonts and uniform arrangement, text in a natural scene image may show high variation in font, color, scale, and orientation.

  • Background complexity: The backgrounds in natural scene images can be very complex due to heterogeneous colors, and the foreground can be hard to distinguish from the background when their shades are close. Elements like signs, fences, bricks, grass, tree leaves, and branches may resemble text, thereby causing confusion and errors. This makes the task difficult for a general Optical Character Recognition (OCR) engine applied to such images in raw form.

  • Interference factors: Interference factors such as noise, blur, distortion, low resolution, non-uniform illumination, and partial occlusion may cause failures in scene text detection and recognition. Camera-based applications face many of these issues.

There are two main approaches to scene-text recognition (Yao et al. 2014): stepwise and integrated. Stepwise methodologies have separate detection and recognition modules and use a feed-forward pipeline to detect, segment, and recognize text regions. Integrated methodologies, by contrast, recognize words with detection and recognition procedures that share information through character classification and/or joint optimization strategies. The key difference is that the latter treats recognition as the main focus. The present study deals with images containing multilingual text, so a single OCR engine will not suffice. Hence, a stepwise method is followed that mainly aims to localize text regions and treats recognition as a separate module for a later stage. With this in mind, a text localization system is developed in the current work.

Text detection is the precursor to the recognition stage in the pipeline, so detection errors cascade into recognition. The localization step coarsely groups components into candidate text regions, which are then classified as text or non-text during verification. The underlying assumption is that text regions can be regarded as a kind of uniform pattern, so there must exist properties or features that are invariant over this pattern. A coarse-to-fine strategy is applied, which first localizes text candidates and then verifies them with a classifier trained specifically for the application. One attractive feature is that most of the background is filtered out in the coarse localization step, which greatly reduces the computational cost of the recognition process. Given the language-independent features of multilingual OCR modules, the developed system can process text in different languages.
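
The coarse-to-fine idea can be illustrated with a minimal sketch. It is not the system described here: it assumes an already binarized image, uses plain connected-component labeling in place of the actual candidate extraction, and the names `coarse_candidates`, `localize`, `min_pixels`, and `max_aspect` are hypothetical. The `verify` callback stands in for the trained classifier.

```python
from collections import deque

def connected_components(binary):
    """Group 4-connected foreground pixels; return one pixel list per component."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

def coarse_candidates(binary, min_pixels=3, max_aspect=5.0):
    """Coarse step: keep boxes whose size and aspect ratio are plausible for text.

    The thresholds are illustrative assumptions, not values from the study.
    """
    boxes = []
    for pixels in connected_components(binary):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        top, left, bottom, right = min(ys), min(xs), max(ys), max(xs)
        height, width = bottom - top + 1, right - left + 1
        aspect = max(height, width) / min(height, width)
        if len(pixels) >= min_pixels and aspect <= max_aspect:
            boxes.append((top, left, bottom, right))
    return boxes

def localize(binary, verify):
    """Fine step: pass each surviving candidate box to a verifier (placeholder)."""
    return [box for box in coarse_candidates(binary) if verify(box)]

# Toy image: a "C"-like blob, an isolated noise pixel, and a long horizontal line.
image = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
candidates = coarse_candidates(image)
```

In this toy image the noise pixel and the long line are rejected by the coarse step, so the verifier only sees the single blob. This is the cost saving the paragraph describes.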

Maximally Stable Extremal Regions (MSER) and Stroke Width Transform (SWT) features are a popular choice for detecting candidate text regions because they capture the basic essence of text by selecting connected components (CCs) with uniform intensity levels and stroke widths. However, they generate a significant number of false positives, so a classifier must be trained to reject as many of them as possible by assigning text or non-text labels to the regions obtained from the last stage of the pipeline. A One-Class Classifier (OCC) is trained with texture-based features such as Histogram of Oriented Gradients (HOG) and Grey Level Co-Occurrence Matrix (GLCM), and frequency-based features such as Discrete Cosine Transform (DCT) and Gabor filters.
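
To make two of the texture features concrete, the following sketch computes a GLCM contrast value and a single-cell, HOG-style orientation histogram for a small grey-level patch. It is a simplified illustration under stated assumptions: the patch is already quantized to a few grey levels, only one GLCM offset is used, the histogram has no cell or block structure, and the function names are hypothetical. DCT and Gabor features are omitted.

```python
import math

def glcm(gray, levels, dy=0, dx=1):
    """Normalized grey-level co-occurrence matrix for one pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[gray[y][x]][gray[ny][nx]] += 1
                total += 1
    return [[value / total for value in row] for row in counts]

def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))

def orientation_histogram(gray, bins=4):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    hist = [0.0] * bins
    rows, cols = len(gray), len(gray[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # L2 normalization
    return [v / norm for v in hist]

# Toy patch with a single vertical edge, quantized to 4 grey levels (0..3).
patch = [
    [0, 0, 3, 3],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
]
contrast = glcm_contrast(glcm(patch, levels=4))
hog_like = orientation_histogram(patch)
feature_vector = [contrast] + hog_like
```

For a vertical edge, all gradient energy falls into the first orientation bin. A concatenated vector like `feature_vector` is the kind of input a one-class classifier could be trained on.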

The OCC is trained on an in-house dataset of smartphone-camera images containing text in English, Hindi, and Bangla. Candidate regions obtained from images need to be filtered by the classifier to improve OCR performance on the merged regions. Treating scene text recognition as a one-class problem is suitable because the non-text class cannot be adequately represented: non-text coverage in the training set cannot be exhaustive, so a non-text sample may end up closer to a text training sample. Training an OCC on text-containing regions is therefore a way of identifying patterns for text regions and filtering out false positives.
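
The one-class idea can be sketched with a deliberately simple model. The text does not say which OCC is used, so this example substitutes a toy centroid-distance rule. It learns the mean and spread of text-only feature vectors, and accepts a new sample if its normalized distance to the mean is within a threshold taken from the training distances. The class name `CentroidOneClass`, the `quantile` parameter, and the feature values are all hypothetical.

```python
import math

class CentroidOneClass:
    """Toy one-class model: accept samples close to the centroid of text-only data."""

    def fit(self, samples, quantile=0.95):
        n, dims = len(samples), len(samples[0])
        self.mean = [sum(s[d] for s in samples) / n for d in range(dims)]
        self.scale = []
        for d in range(dims):
            variance = sum((s[d] - self.mean[d]) ** 2 for s in samples) / n
            self.scale.append(math.sqrt(variance) or 1.0)  # avoid division by zero
        # Threshold: the training distance at the chosen quantile.
        distances = sorted(self.distance(s) for s in samples)
        self.threshold = distances[min(int(quantile * n), n - 1)]
        return self

    def distance(self, sample):
        """Euclidean distance to the centroid after per-dimension scaling."""
        return math.sqrt(sum(
            ((value - mean) / scale) ** 2
            for value, mean, scale in zip(sample, self.mean, self.scale)
        ))

    def is_text(self, sample):
        return self.distance(sample) <= self.threshold

# Made-up 2-D feature vectors for text regions only; no non-text samples are needed.
text_features = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1], [1.0, 1.0]]
model = CentroidOneClass().fit(text_features)

near_text = model.is_text([1.0, 1.05])  # close to the training cluster
far_away = model.is_text([3.0, 0.2])    # far from anything seen in training
```

Training uses only positive (text) samples, so rejecting a candidate does not require the non-text class to have been covered during training.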

Cite this page

Scene-Text Recognition. (2019, Dec 02). Retrieved from
