Improving the Robustness of Deep Learning Models Against Model Extraction Attacks

Sobhanian Ghasi, Amir Mohammad; Jalili, Rasool

Sobhanian Ghasi, Amir Mohammad | 2022

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 55057 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Jalili, Rasool
Abstract:
Deep neural networks attain high performance in many domains and have been gaining more attention from real-world businesses in recent years. With the emergence of Machine Learning as a Service (MLaaS), users can build their models on these platforms and make them available to others through prediction APIs. However, studies have shown that an adversary can produce a surrogate model with characteristics similar to the victim's model by querying these APIs. Aside from undermining the victim's business plan, a surrogate model also lets the adversary mount more sophisticated attacks on the victim's model. Because the adversary has no access to the victim model's training set, recent studies have proposed using synthetic or natural surrogate datasets to conduct model extraction attacks. These alternative datasets have a different distribution from the victim model's training set. In this study, we investigate the maximum softmax probability that the model assigns to its inputs as a potential criterion for detecting out-of-distribution input sequences. We demonstrate that the maximum softmax probability histogram of a model extraction attack's input sequence is distinguishable from that of a benign user. We introduce the in-distribution detection approach (IDA), which attempts to detect malicious users by observing their input sequences. Based on our experiments, IDA can robustly detect three types of adversaries with high accuracy and a low false-positive rate by observing only a limited number of their inputs. Finally, we compare the performance of IDA and PRADA; our results show that IDA outperforms PRADA while observing even shorter input sequences.
Keywords:
Deep Neural Networks ; Adversarial Example ; Machine Learning Security ; Model Extraction Attacks
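
The detection idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation of IDA: the abstract does not state the decision rule, so the histogram comparison (total variation distance), the bin count, the threshold, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def max_softmax_probability(logits):
    """Return the maximum softmax probability for each row of logits."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


def msp_histogram(msp, bins=10):
    """Normalized histogram of maximum softmax probabilities over [0, 1]."""
    counts, _ = np.histogram(msp, bins=bins, range=(0.0, 1.0))
    return counts / max(counts.sum(), 1)


def histogram_distance(p, q):
    """Total variation distance between two normalized histograms.

    Assumption: this distance is a stand-in; the abstract does not say
    how the histograms are compared.
    """
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


def looks_out_of_distribution(user_logits, benign_hist, threshold=0.5):
    """Flag a user whose MSP histogram is far from the benign reference.

    Assumption: the threshold value is hypothetical and would need tuning.
    """
    user_hist = msp_histogram(
        max_softmax_probability(user_logits), bins=len(benign_hist)
    )
    return bool(histogram_distance(user_hist, benign_hist) > threshold)


# Toy usage with made-up logits: confident outputs stand in for a benign
# user, flat outputs stand in for out-of-distribution queries.
benign_logits = np.eye(4) * 10.0
flat_logits = np.zeros((8, 4))
benign_hist = msp_histogram(max_softmax_probability(benign_logits))
suspicious = looks_out_of_distribution(flat_logits, benign_hist)
```

In this toy setup the reference histogram is built from a handful of confident predictions; a real deployment would presumably estimate it from the victim model's outputs on held-out in-distribution data and would accumulate each user's queries over time.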

Table of Contents

Introduction
- Problem Statement
- Research Objectives
- Thesis Structure
Preliminaries
- Machine Learning
  - Supervised Learning
  - Unsupervised Learning
  - Semi-Supervised Learning
  - Reinforcement Learning
- Deep Neural Networks
  - Perceptron
  - Activation Function
  - Softmax Function
  - Cost Function
  - Gradient Descent
- Knowledge Distillation
- Transfer Learning
- Active Learning
- Prominent Attacks Against Deep Learning Models
  - Adversarial Example Attack
  - Model Poisoning Attack
  - Membership Inference Attack
  - Model Extraction Attack
  - Threat Model and Problem Assumptions
- Summary
Related Work
- Model Extraction Attacks
  - Papernot Attack
  - Knockoff Nets Attack
  - Jagielski Attacks
  - ActiveThief Attack
  - MAZE Attack
  - Batina Attack
- Defenses Against Model Extraction Attacks
  - PRADA Defense
  - SEAT Defense
  - VarDetect Defense
  - Deceptive Perturbation Defense
  - Output Poisoning Defense
  - Adaptive Misinformation Defense
  - EDM Defense
  - Adi Defense
  - EWE Defense
- Summary
Proposed Approach
- Weaknesses of Previous Methods
- Output Class Probability
- Proposed Method
- Advantages of the Proposed Method
- Summary
Evaluation
- Datasets and Implementation Environment
- Attacks
- Exposure to Out-of-Distribution Data
- Analysis and Detection of Attacks
- Summary
Conclusion