Universal adversarial attacks on text classifiers

Behjati, M. (Sharif University of Technology) | 2019

  1. Type of Document: Article
  2. DOI: 10.1109/ICASSP.2019.8682430
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2019
  4. Abstract: Despite the vast success neural networks have achieved across application domains, they have been proven vulnerable to adversarial perturbations (small changes in the input) that lead them to produce the wrong output. In this paper, we propose a novel method, based on gradient projection, for generating universal adversarial perturbations for text, namely a sequence of words that can be added to any input in order to fool the classifier with high probability. We observed that text classifiers are quite vulnerable to such perturbations: inserting even a single adversarial word at the beginning of every input sequence can drop the accuracy from 93% to 50%. © 2019 IEEE
  5. Keywords: Gradient projection; Neural network; Text classifier; Classification (of information); Neural networks; Speech communication; Gradient projections; High probability; Input sequence; Text classifiers; Universal adversarial perturbation; Audio signal processing
  6. Source: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019), 12 May 2019 through 17 May 2019; Volume 2019-May, 2019, Pages 7345-7349; ISSN 1520-6149; ISBN 9781479981311
  7. URL: https://ieeexplore.ieee.org/document/8682430
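The gradient-projection idea summarized in the abstract can be illustrated with a minimal sketch: optimize a continuous perturbation vector by gradient ascent on the classification loss, and after each step project it onto the nearest real word embedding so the trigger remains an actual word. The toy setup below (a linear bag-of-embeddings classifier, the vocabulary size, dimensions, and learning rate) is entirely an illustrative assumption, not the authors' actual model or algorithm.

```python
# Hedged sketch: universal adversarial trigger via gradient projection.
# A toy linear classifier over averaged word embeddings stands in for a
# real neural text model; all sizes and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

V, D = 50, 8                       # toy vocabulary size and embedding dim
E = rng.normal(size=(V, D))        # fixed word-embedding table
w, b = rng.normal(size=D), 0.0     # weights of the toy linear classifier

def predict(avg_emb):
    """Sigmoid score of an averaged-embedding input (scalar or batch)."""
    return 1.0 / (1.0 + np.exp(-(avg_emb @ w + b)))

def universal_trigger(inputs, labels, steps=50, lr=0.5):
    """Find one vocabulary word whose embedding, prepended to every input
    (modeled here as being averaged in), maximizes classification loss."""
    z = rng.normal(size=D)          # continuous perturbation, refined below
    for _ in range(steps):
        # Gradient of the cross-entropy loss w.r.t. z, summed over inputs:
        # dL/d(avg) = (p - y) * w, and d(avg)/dz = 1/2 for a 2-way average.
        grad = sum((predict((x + z) / 2.0) - y) * w / 2.0
                   for x, y in zip(inputs, labels))
        z = z + lr * grad                                   # ascend the loss
        z = E[np.argmin(np.linalg.norm(E - z, axis=1))]     # project onto vocab
    return z

# Usage: labels come from the classifier itself, so clean accuracy is 100%;
# the trigger then tries to flip as many predictions as possible.
inputs = rng.normal(size=(20, D))
labels = (predict(inputs) > 0.5).astype(float)
trigger = universal_trigger(inputs, labels)
base_acc = np.mean((predict(inputs) > 0.5) == labels)
att_acc = np.mean((predict((inputs + trigger) / 2.0) > 0.5) == labels)
```

Note that for a linear classifier a single additive trigger can only push scores in one direction along `w`, so it can flip at most one of the two classes; this is consistent with the abstract's observation of accuracy dropping to roughly 50% rather than to zero.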