Department of Informatics, Artificial Intelligence and Machine Learning Group

[Processed by Jan Schnyder] Adversarial Training for Teaching Networks to Reject Unknown Inputs

Deep neural networks are well known for their ability to learn to classify the content of images from examples of the classes. When a previously unseen sample of one of the training classes is shown to the network, it predicts the correct class with high accuracy and high confidence. However, when a sample of a class that is unknown to the network is presented, the network has no choice but to classify it as one of the known classes. Unfortunately, it usually does so with high confidence, so simply thresholding on confidence and rejecting samples below the threshold as unknown is not a solution.

In Reducing Network Agnostophobia, we have shown that it is possible to train networks with negative examples -- samples that do not belong to any of the known classes -- so that the network learns to reject such samples as unknown. The introduced loss function reduces the magnitude of the deep features one layer before the classification output, which has the proven effect of lowering the confidence on negative training samples. This makes it possible to use the confidence value as a predictor, i.e., samples with low confidence can be rejected as unknown. Additionally, the confidence can be multiplied by the magnitude of the deep features, and the resulting score can be thresholded to achieve an even better separation, even for samples of classes that were not part of the negative set used for training.
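
To make the idea concrete, the following PyTorch sketch shows a loss in the spirit of the one introduced in Reducing Network Agnostophobia, together with the confidence-times-magnitude rejection score. It is only a minimal illustration, assuming a network that exposes both the logits and the deep features of the layer before the classification output; the hyper-parameters xi and weight are placeholders, not values from the paper.

    import torch
    import torch.nn.functional as F

    def objectosphere_style_loss(logits, deep_features, targets, xi=10.0, weight=0.01):
        """Loss sketch in the spirit of Reducing Network Agnostophobia.

        targets >= 0 mark known classes, targets == -1 marks negative samples;
        xi and weight are assumed hyper-parameters, not values from the paper.
        """
        known = targets >= 0
        negatives = ~known
        feature_magnitude = deep_features.norm(p=2, dim=1)

        loss = logits.new_zeros(())
        if known.any():
            # standard cross-entropy on the known classes, plus a hinge term
            # that keeps their deep-feature magnitude above xi
            loss = loss + F.cross_entropy(logits[known], targets[known])
            loss = loss + weight * (xi - feature_magnitude[known]).clamp(min=0).pow(2).mean()
        if negatives.any():
            # push negatives toward a uniform softmax (low confidence) ...
            loss = loss - F.log_softmax(logits[negatives], dim=1).mean()
            # ... and shrink their deep-feature magnitude toward zero
            loss = loss + weight * feature_magnitude[negatives].pow(2).mean()
        return loss

    def rejection_score(logits, deep_features):
        # confidence multiplied by feature magnitude; at test time, samples
        # whose score falls below a chosen threshold are rejected as unknown
        confidence = F.softmax(logits, dim=1).max(dim=1).values
        return confidence * deep_features.norm(p=2, dim=1)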

One issue with this technique is that it requires an additional negative training set whose samples need to be close to the positive samples. Simple inputs such as random noise or unrelated images do not help in rejecting samples that are similar to the known classes but actually belong to different, unknown classes.

In the literature, it is common practice to use data augmentation techniques -- such as adding Gaussian noise, blurring, or the like -- to increase the number of positive samples and thereby improve the stability of deep neural networks. Some researchers even utilize so-called adversarial samples to make their networks more robust, as in On the Convergence and Robustness of Adversarial Training, and there is an open-source implementation of adversarial training in AdverTorch. However, the goal of adversarial training in those works is to use adversarial samples as additional positive samples for the known classes.
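
For illustration, the following is a minimal FGSM-style sketch of how adversarial samples could be generated from known-class images; the perturbation budget epsilon and the model interface (returning logits and deep features) are assumptions, and libraries such as AdverTorch provide stronger attacks out of the box.

    import torch
    import torch.nn.functional as F

    def fgsm_negatives(model, images, labels, epsilon=0.03):
        """Minimal FGSM-style sketch; epsilon is an assumed perturbation budget.

        The model is assumed to return (logits, deep_features); only the logits
        are needed here to compute the gradient of the classification loss.
        """
        images = images.clone().detach().requires_grad_(True)
        logits, _ = model(images)
        loss = F.cross_entropy(logits, labels)
        grad = torch.autograd.grad(loss, images)[0]
        # step in the direction that increases the loss, then clip to valid range
        return (images + epsilon * grad.sign()).clamp(0.0, 1.0).detach()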

In this master thesis, it is to be investigated whether input manipulations (blurring, noising, or adversarial perturbations) of samples of known classes can make up a good negative training set, which can then be used as a replacement for the additional negative training set required before. The research includes investigating which (combinations of) input manipulations work better than others in achieving this task.
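
A possible starting point could look like the following hypothetical training step, which reuses the loss and FGSM sketches above: manipulated copies of the known-class images (here Gaussian noise and adversarial perturbations) are labeled -1 so that the loss treats them as negatives. All names and parameters are illustrative assumptions, not a prescribed implementation.

    import torch

    def training_step(model, optimizer, images, labels, epsilon=0.03, sigma=0.1):
        """Hypothetical training step combining the sketches above."""
        # build negatives from manipulated copies of the known-class images
        noised = (images + sigma * torch.randn_like(images)).clamp(0.0, 1.0)
        adversarial = fgsm_negatives(model, images, labels, epsilon)
        negatives = torch.cat([noised, adversarial], dim=0)

        # label the manipulated copies as -1 so the loss treats them as unknown
        batch = torch.cat([images, negatives], dim=0)
        targets = torch.cat([labels, labels.new_full((negatives.size(0),), -1)], dim=0)

        logits, deep_features = model(batch)  # model assumed to return both
        loss = objectosphere_style_loss(logits, deep_features, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()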

Requirements

  • A reasonable understanding of deep neural networks and how they learn.
  • Programming experience in Python and a deep learning framework, ideally PyTorch.
  • Decent understanding of written English.