Massively Parallel Clustering with Outliers

Navidi Ghaziani, Zahra | 2023

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56511 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Zarrabizadeh, Hamid
  7. Abstract:
  8. Clustering is a fundamental problem in data analysis and has many variants. This thesis focuses on the k-center problem, one of the most popular and well-studied variants of clustering. In this problem we are given a set of points X in a metric space and a parameter k ⩽ |X|. The goal is to find a set of k centers in X that minimizes the maximum distance from any point of X to its closest center. We study a harder version of the problem with an extra parameter z, the maximum number of points that may be left unclustered; these points are referred to as outliers. The growing volume of data to be processed calls for strategies that remain efficient on large inputs. Massively parallel computation (MPC) is one such model, and it is the model used in this thesis. The best previously known approximation factor for the problem in the MPC model is 13; we improve it to 11 + ϵ.
  9. Keywords:
  10. Clustering ; Outliers ; Massively Parallel Computation ; Outlier Removal ; K-Center Problem
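
The objective described in the abstract can be illustrated with a minimal sketch. It only evaluates the cost of a given set of centers when up to z points may be ignored; it is not the thesis's MPC algorithm. The function name kcenter_outlier_cost, the sample points, and the use of Euclidean distance are assumptions made for illustration (the problem itself is stated for any metric).

```python
import math


def kcenter_outlier_cost(points, centers, z):
    """Radius needed so that all but at most z points lie within
    that distance of their closest center.

    Assumption: points are coordinate tuples and the metric is
    Euclidean; any other metric could be substituted for math.dist.
    """
    # Distance from each point to its closest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Drop the z largest distances: those points are treated as outliers.
    kept = dists[: max(len(dists) - z, 0)]
    # The cost is the largest remaining distance (0 if nothing remains).
    return kept[-1] if kept else 0.0


# Hypothetical data: two tight groups plus one far-away point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 100)]
centers = [(0, 0), (10, 0)]

print(kcenter_outlier_cost(points, centers, z=0))  # far point dominates the cost
print(kcenter_outlier_cost(points, centers, z=1))  # far point ignored as an outlier
```

With z = 0 the single distant point determines the radius, while allowing z = 1 outlier lets the same two centers cover the remaining points with a much smaller radius.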

 Bookmark

  • Introduction
    • Problem Definition
    • Literature Review
    • Research Objectives
    • Thesis Structure
  • Preliminaries
    • NP-Hard Problems
    • Massively Parallel Computation Model
    • Composable Coreset
    • Metric Space
    • Doubling Dimension
    • Notation
    • With High Probability
    • Approximation Algorithms
  • Previous Work
    • k-Center without Outliers
      • Static Algorithm
      • Existing Algorithms in the MPC Model
    • k-Center with Outliers
      • Static Algorithm
      • Existing Algorithms in MPC
  • New Results
    • Definitions and Notation
    • Description of the Algorithm
      • Finding a Suitable r
      • Finding k Centers Using Radius r
    • Proof of Correctness of the Algorithm
    • Analysis of the Algorithm's Running Time and Memory Usage
  • Conclusion
    • Future Work
  • References
  • Glossary