Text Spotting with Machine Learning

Shamsi, Fatemeh; Razvan, Mohammad Reza; Kamali Tabrizi, Mostafa

Shamsi, Fatemeh | 2021

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 54148 (02)
University: Sharif University of Technology
Department: Mathematical Sciences
Advisor(s): Razvan, Mohammad Reza; Kamali Tabrizi, Mostafa
Abstract:
Detecting text in natural images is challenging because of complex backgrounds, changes in ambient lighting, varying viewing angles, and other factors, so text detection remains an open problem. Detecting and recognizing text in images has many applications, such as translating signs for tourists and assisting the visually impaired, which makes recognizing text in different languages important. In this thesis, we first examine three methods: Reading Text in the Wild with Convolutional Neural Networks, FOTS, and CRAFT. We then prepared two Persian datasets. The first contains images to which Persian text has been synthetically added. The second consists of natural images containing Persian text, photographed in streets and passageways. We trained the CRAFT model on these two Persian datasets; on the Persian evaluation set it achieves a precision of 74.7 and a recall of 64.9. We also trained the FOTS model on the same two datasets; on the Persian evaluation set it achieves a precision of 78.07 and a recall of 60.67.
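The abstract reports detection quality as precision and recall on the Persian evaluation set. The thesis's exact protocol is described in Section 7-1 and is not reproduced here.

The following is a minimal Python sketch of how such figures are typically computed for text detection. It assumes axis-aligned boxes `(x_min, y_min, x_max, y_max)`, greedy one-to-one matching of predicted to ground-truth boxes, and an IoU threshold of 0.5. ICDAR 2015 itself annotates quadrilaterals, so treat this as an illustration of the idea rather than the thesis's evaluation code. The helper names `iou`, `match_detections`, `f_measure` and `precision_recall_f1` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format for this sketch: axis-aligned (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds: List[Box], gts: List[Box], thresh: float = 0.5) -> int:
    """Greedy one-to-one matching by descending IoU; returns the true-positive count.

    The 0.5 threshold is an assumed, commonly used value, not taken from the thesis.
    """
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(preds) for j, g in enumerate(gts)),
        reverse=True,
    )
    used_p, used_g = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score < thresh:
            break  # pairs are sorted, so every remaining pair overlaps even less
        if i in used_p or j in used_g:
            continue  # each predicted and ground-truth box is matched at most once
        used_p.add(i)
        used_g.add(j)
        tp += 1
    return tp


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, n_pred: int, n_gt: int) -> Tuple[float, float, float]:
    """Precision = TP / #predictions, recall = TP / #ground-truth boxes."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Toy example: two of three predictions overlap a ground-truth box well.
    preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    gts = [(1, 1, 10, 10), (20, 20, 30, 31), (100, 100, 110, 110)]
    tp = match_detections(preds, gts)
    print(precision_recall_f1(tp, len(preds), len(gts)))
```

`f_measure` computes 2PR/(P+R), so it can be applied directly to the precision and recall pairs reported above for CRAFT and FOTS. Greedy matching by descending IoU is a simple choice. It can differ slightly from an optimal bipartite assignment when one box overlaps several candidates.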
Keywords:
Image Processing ; Machine Learning ; Deep Learning ; Text Detection ; Convolutional Neural Network

Contents

Abstract
List of Tables
List of Figures
Chapter 1: Introduction
- 1-1 Problem Definition
- 1-2 Applications
- 1-3 Challenges
Chapter 2: Notable Methods
- 2-1 Basic Concepts of Deep Learning
  - 2-1-1 Neural Networks
  - 2-1-2 Convolutional Neural Networks
  - 2-1-3 VGG Net
  - 2-1-4 ResNet
  - 2-1-5 Non-Maximum Suppression Algorithm
  - 2-1-6 OHEM Algorithm
  - 2-1-7 Recurrent Neural Networks
- 2-2 Review of Notable Studies
  - 2-2-1 Regression-Based Text Detection
    - 2-2-1-1 Faster R-CNN
    - 2-2-1-2 SSD
  - 2-2-2 Segmentation-Based Text Detection
    - 2-2-2-1 Mask R-CNN
    - 2-2-2-2 Fully Convolutional Network
Chapter 3: Data
- 3-1 ICDAR
  - 3-1-1 ICDAR2013
  - 3-1-2 ICDAR2015
  - 3-1-3 ICDAR2017 MLT
- 3-2 Persian Dataset
- 3-3 Synthetic Data
- 3-4 Synthetic Persian Text in Natural Images
Chapter 4: Reading Text in Natural Images with Convolutional Neural Networks
- 4-1 Generating Proposal Boxes
- 4-2 Filtering Proposal Boxes
- 4-3 Text Recognition
Chapter 5: FOTS
- 5-1 Network Architecture
- 5-2 Text Detection
- 5-3 RoIRotate
- 5-4 Text Recognition
- 5-5 Implementation Details
Chapter 6: CRAFT
- 6-1 Network Architecture
- 6-2 Training the Network
- 6-3 Loss Function
- 6-4 Implementation Details
Chapter 7: Experimental Studies
- 7-1 Evaluation Metrics
- 7-2 Experimental Results
Conclusion
References
