Improving the Robustness of Deep Learning Models Against Model Extraction Attacks
Sobhanian Ghasi, Amir Mohammad | 2022

- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 55057 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Jalili, Rasool
- Abstract: Deep neural networks attain high performance in many domains and have been gaining increasing attention from real-world businesses in recent years. With the emergence of Machine Learning as a Service (MLaaS), users can build their models on these platforms and make them available to others through prediction APIs. However, studies have shown that an adversary with access to these APIs can produce a surrogate model with characteristics similar to the victim's model. Aside from undermining the victim's business plan, access to a surrogate model also lets the adversary mount more sophisticated attacks against the victim's model. Because the adversary cannot access the victim model's training set, recent studies have proposed conducting model extraction attacks with synthetic or natural surrogate datasets, which follow a different distribution than the victim model's training set. In this study, we investigate the maximum softmax probability of the model's inputs as a potential criterion for detecting out-of-distribution input sequences (see the sketch after the keywords list below). We demonstrate that the maximum softmax probability histograms of model extraction attacks' input sequences are distinguishable from those of benign users. We introduce the in-distribution detection approach (IDA), which attempts to detect malicious users by observing their input sequences. In our experiments, IDA robustly detects three types of adversaries with high accuracy and a low false-positive rate after observing only a limited number of their inputs. Finally, we compare the performance of IDA and PRADA; our results show that IDA outperforms PRADA while observing even shorter input sequences.
 
- Keywords: Deep Neural Networks; Adversarial Example; Machine Learning Security; Model Extraction Attacks
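
The maximum softmax probability (MSP) criterion described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual IDA implementation: the function names (`msp`, `looks_like_extraction_attack`), the MSP cutoff of 0.9, and the flagging rule based on the fraction of low-confidence queries are all assumptions introduced here, under the further assumption that the defender can observe the victim model's logits for every query in a user's sequence.

```python
import numpy as np

def msp(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability (MSP) for each query.

    logits: shape (n_queries, n_classes) -> returns shape (n_queries,).
    """
    # Shift by the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def looks_like_extraction_attack(
    logit_sequence: np.ndarray,
    msp_threshold: float = 0.9,      # hypothetical cutoff for "in-distribution"
    max_ood_fraction: float = 0.5,   # hypothetical tolerated share of OOD queries
) -> bool:
    """Flag a user's query sequence as a suspected model extraction attack.

    Rationale from the abstract: surrogate-dataset queries are
    out-of-distribution for the victim model, so their MSP histogram
    shifts toward low confidence relative to benign users.
    """
    scores = msp(np.asarray(logit_sequence))
    ood_fraction = float((scores < msp_threshold).mean())
    return ood_fraction > max_ood_fraction

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 10
    # Benign user: one strongly activated class per query -> MSP near 1.
    benign = rng.normal(0.0, 1.0, (n, k))
    benign[np.arange(n), rng.integers(0, k, n)] += 8.0
    # Extraction adversary: diffuse logits from out-of-distribution inputs.
    attack = rng.normal(0.0, 1.0, (n, k))
    print(looks_like_extraction_attack(benign))  # False
    print(looks_like_extraction_attack(attack))  # True
```

In practice one would likely apply this per user over a sliding window of recent queries rather than the full history; the thesis's actual thresholds, window lengths, and histogram comparison are not reproduced here.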
 
		