
Removing noises similar to dots from Persian scanned documents

Shirali Shahreza, M. H ; Sharif University of Technology | 2008

  1. Type of Document: Article
  2. DOI: 10.1109/CCCM.2008.246
  3. Publisher: IEEE, 2008
  4. Abstract:
  5. Computers are now used in many aspects of human life, and one consequence is the growth of electronic documents. Computers cannot understand written documents, so written documents must be converted to electronic documents before computers can process them. One common method for converting written text to electronic text is Optical Character Recognition (OCR). A lot of work has been done on English OCR, but Persian/Arabic OCR is still under development. One of the major problems in Persian/Arabic OCR is noise removal. Because dots are very important in the Persian and Arabic languages and closely resemble noise, noise removal is more difficult for Persian/Arabic documents than for Latin documents. In this paper, we propose a new method for removing noise that resembles dots from printed Persian/Arabic documents. In this method, the size of the dots is estimated in each region after page segmentation. Noise that resembles dots is then removed using the estimated dot size. The method is implemented as part of the page segmentation phase, and the experimental results are good. Its advantages include high speed and strong resistance to skew. © 2008 IEEE
  6. Keywords:
  7. Arabic languages ; Electronic documents ; High speeds ; Human lives ; Noise removals ; Page segmentations ; Persians ; Printed documents ; Scanned documents ; Written documents ; Written texts ; Character recognition ; Information retrieval systems ; Ionizing radiation ; Optical character recognition ; Word processing ; Computational methods
  8. Source: ISECS International Colloquium on Computing, Communication, Control, and Management, CCCM 2008, Guangzhou, 3 August 2008 through 4 August 2008 ; Volume 2 , 2008 , Pages 313-317 ; 9780769532905 (ISBN)
  9. URL: https://ieeexplore.ieee.org/abstract/document/4609697
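
The abstract describes two steps: estimating the dot size within each segmented region, then removing noise that resembles dots based on that estimate. The abstract does not say how either step is carried out, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a region is a binary image given as a list of rows (1 = ink), that dots are compact connected components, that the dot size can be approximated by the most frequent area among compact components, and that anything smaller than a fraction of that area is noise. All function names and the parameters `max_aspect`, `min_fill`, and `min_ratio` are hypothetical.

```python
from collections import Counter, deque


def connected_components(img):
    """Return 8-connected ink components, each as a list of (row, col) pixels."""
    rows = len(img)
    cols = len(img[0]) if img else 0
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def is_compact(pixels, max_aspect=2.0, min_fill=0.6):
    """Assumed dot shape test: near-square bounding box that is mostly filled."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    h = max(ys) - min(ys) + 1
    w = max(xs) - min(xs) + 1
    return max(h, w) / min(h, w) <= max_aspect and len(pixels) / (h * w) >= min_fill


def estimate_dot_area(comps):
    """Assumed estimator: the most frequent area among compact components."""
    areas = [len(p) for p in comps if is_compact(p)]
    if not areas:
        return None
    return Counter(areas).most_common(1)[0][0]


def remove_dot_like_noise(img, min_ratio=0.5):
    """Return a copy of img without components smaller than min_ratio * dot area."""
    comps = connected_components(img)
    dot_area = estimate_dot_area(comps)
    cleaned = [row[:] for row in img]
    if dot_area is None:
        return cleaned  # no dot candidates, so nothing is removed
    for pixels in comps:
        if len(pixels) < min_ratio * dot_area:
            for y, x in pixels:
                cleaned[y][x] = 0
    return cleaned


# Toy region: three 2x2 dots, one letter stroke, two single-pixel specks.
region = [[0] * 12 for _ in range(8)]
for r in (1, 2):
    for c in (1, 2, 5, 6, 9, 10):
        region[r][c] = 1
for c in range(1, 11):
    region[4][c] = 1
region[6][3] = region[6][8] = 1

cleaned = remove_dot_like_noise(region)
```

In this toy region the specks are fewer than the dots, so the most frequent compact area corresponds to the dots. On a page where specks outnumber real dots, this particular estimator would fail and a different one would be needed. Component area changes little when a page is slightly rotated, which is at least consistent with the skew resistance mentioned in the abstract, though the paper itself may achieve this differently.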