Department of Informatics, Artificial Intelligence and Machine Learning Group

[processed by Michael Hodel] Adversarial Training with LOTS

Deep neural networks are well known for their capability to classify image content. For example, state-of-the-art networks can distinguish 1000 categories purely from labeled example images.

A couple of years ago, it was shown that neural networks are susceptible to so-called adversarial examples: well-designed, unnoticeably small perturbations of correctly classified inputs that lead deep networks to misclassification. Since then, many different types of adversarial attacks, i.e., different ways of computing these perturbations, have been proposed. Researchers have also come up with many defenses against adversarial examples; the most promising of them is to train on adversarial examples, see On the Convergence and Robustness of Adversarial Training.
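To illustrate what such a perturbation looks like, here is a minimal sketch of one classic attack, the Fast Gradient Sign Method (FGSM); the tiny linear "classifier" and the random data stand in for a real deep network and real images, and are assumptions for illustration only:

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=0.1):
    # one-step attack: move x by epsilon in the direction of the sign
    # of the gradient of the classification loss w.r.t. the input
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

torch.manual_seed(0)
model = nn.Linear(8, 3)           # stand-in for a deep classifier
x = torch.randn(4, 8)             # stand-in for a batch of images
y = torch.tensor([0, 1, 2, 0])    # correct labels
x_adv = fgsm_attack(model, x, y)  # perturbed inputs, each pixel moved by at most epsilon
```

The key property is that the perturbation is bounded: every input dimension changes by at most epsilon, so the adversarial image stays visually indistinguishable from the original.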

One specific adversarial attack is the so-called Layerwise Origin-Target Synthesis (LOTS), which is able not only to change the final classification of the network, but also to bring the internal representation (deep feature) at any layer of the network close to a given target by minimally perturbing the input image; see LOTS about Attacking Deep Features. In an unpublished experiment, we found that adversarial training with several standard adversarial image generation techniques, as implemented in AdverTorch, does not improve robustness against LOTS adversarials. Even when we used LOTS adversarial examples for training, the network did not become robust against LOTS; however, these results were never published.
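The core idea of LOTS can be sketched as gradient descent on the input that pulls its deep feature toward a target feature. The sketch below uses a tiny stand-in feature extractor and random "images"; the function name, step count, and learning rate are assumptions, not the reference implementation mentioned above:

```python
import torch
import torch.nn as nn

def lots_attack(features, x, target, steps=100, lr=0.05):
    # iteratively perturb the input so that its deep feature
    # features(x) approaches the target deep feature
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(features(x_adv), target).backward()
        opt.step()
    return x_adv.detach()

torch.manual_seed(0)
# stand-in for a network truncated at some internal layer
features = nn.Sequential(nn.Linear(10, 5), nn.ReLU())
x = torch.randn(1, 10)                            # "origin" image
target = features(torch.randn(1, 10)).detach()    # deep feature of a "target" image
x_adv = lots_attack(features, x, target)
d_before = nn.functional.mse_loss(features(x), target).item()
d_after = nn.functional.mse_loss(features(x_adv), target).item()
```

Because the optimization acts on an internal representation rather than the final logits, the same procedure works at any layer of the network, which is what distinguishes LOTS from attacks that only flip the output class.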

A new adversarial training technique, introduced in On the Convergence and Robustness of Adversarial Training, has improved robustness against other adversarial image generation techniques. However, it is an open question whether it also helps to defeat LOTS adversarials.

Several tasks are to be fulfilled in this thesis. First, the traditional and the novel adversarial training strategies should be re-implemented and the results of the above-mentioned paper reproduced. Second, a reasonable training strategy using LOTS adversarial images needs to be defined (an implementation of LOTS in PyTorch is available). Finally, good evaluation criteria need to be designed.
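The common skeleton shared by all of these training strategies is to generate adversarial examples from the current model state on the fly and update the model on them. A minimal sketch, assuming a generic `attack` callable (the noise "attack" used in the demonstration below is a placeholder, not a real adversarial attack):

```python
import torch
import torch.nn as nn

def adversarial_training_step(model, optimizer, attack, x, y):
    # generate adversarial examples from the current model state ...
    model.eval()
    x_adv = attack(model, x, y)     # e.g. LOTS or an AdverTorch attack
    # ... then take one optimization step on them
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# toy demonstration with a linear model and a placeholder noise "attack"
torch.manual_seed(0)
model = nn.Linear(6, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
noise_attack = lambda m, x, y: x + 0.05 * torch.randn_like(x)
x = torch.randn(16, 6)
y = torch.randint(0, 2, (16,))
losses = [adversarial_training_step(model, optimizer, noise_attack, x, y)
          for _ in range(20)]
```

In practice one would mix clean and adversarial batches and re-evaluate robustness regularly, since the attack's effectiveness changes as the model parameters move.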

Requirements

  • A reasonable understanding of deep neural networks and how they learn.
  • Programming experience in Python or the willingness to learn Python.
  • Decent understanding of written English and mathematics.