Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation

Hypothesis

The authors draw on the idea of the loss landscape, i.e. how the loss is distributed over a high-dimensional space (ref: https://arxiv.org/pdf/1712.09913.pdf).

The hypothesis: if a generated adversarial example sits in a flatter region of the surrogate model's loss landscape (e.g., a plateau) rather than at a sharp local optimum, it is likely to transfer better to other models.

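Implementation: rather than minimizing the attack loss only at the adversarial example itself, minimize the maximum loss over a local neighborhood around it. This is a min-max problem: an inner step finds the perturbation within the neighborhood that maximizes the loss (the reverse adversarial perturbation), and an outer step updates the adversarial example to reduce the loss at that worst-case point.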
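
A minimal sketch of this min-max idea, i.e. min over x_adv of max over ||n||_inf <= eps_n of L(x_adv + n), assuming an L_inf neighborhood and sign-gradient steps. It is not the paper's implementation: the names reverse_perturbation, minmax_attack, eps_n and alpha are hypothetical, the step sizes are arbitrary, and a toy quadratic stands in for the surrogate model's attack loss.

```python
import numpy as np


def reverse_perturbation(loss_grad, x_adv, eps_n, steps=5):
    """Inner maximization: approximate the perturbation n with ||n||_inf <= eps_n
    that maximizes the attack loss at x_adv + n (projected sign-gradient ascent)."""
    n = np.zeros_like(x_adv)
    step = eps_n / 2.0  # assumed step size, chosen only for this sketch
    for _ in range(steps):
        n = n + step * np.sign(loss_grad(x_adv + n))
        n = np.clip(n, -eps_n, eps_n)  # project back into the neighborhood
    return n


def minmax_attack(loss_grad, x, eps, eps_n, alpha=0.05, outer_steps=20):
    """Outer minimization: descend the attack loss evaluated at the worst-case
    neighbor x_adv + n, keeping x_adv inside the L_inf ball of radius eps around x."""
    x = np.asarray(x, dtype=float)
    x_adv = x.copy()
    for _ in range(outer_steps):
        n = reverse_perturbation(loss_grad, x_adv, eps_n)
        g = loss_grad(x_adv + n)  # gradient at the worst-case point
        x_adv = x_adv - alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Toy stand-in for a surrogate model's attack loss (not a real network):
# a quadratic bowl centered at t, with an analytic gradient.
t = np.array([0.3, -0.2])


def loss(z):
    return float(np.sum((z - t) ** 2))


def loss_grad(z):
    return 2.0 * (z - t)


x = np.zeros(2)
x_adv = minmax_attack(loss_grad, x, eps=0.5, eps_n=0.1)
print("loss at x:", loss(x), "loss at x_adv:", loss(x_adv))
```

With a real surrogate, loss_grad would come from backpropagation through the model, and the inner maximization would typically be run with far fewer updates than the outer loop to keep the cost manageable.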