Decision-based adversarial attacks construct inputs that fool a machine learning model into making targeted mispredictions. For the most part, these attacks have been applied directly to isolated neural network models. However, in practice, machine learning models are just a component of a much larger system, and we find that state-of-the-art query-based attacks are as much as four times less effective at attacking a prediction pipeline with only one preprocessor than attacking the machine learning model alone. We can explain this discrepancy by the fact that most preprocessors introduce some notion of “invariant” to the input space (e.g., with center crop, the prediction is invariant to the border pixels). Hence, attacks that are unaware of this invariance inevitably wastes a large number of queries to re-discover or to overcome it. We therefore develop techniques to first reverse-engineer preprocessors and then use the extracted information to attack the end-to-end system. Our extraction method uses only a few hundreds queries to learn the preprocessors used by most publicly available models, and our preprocessor-aware attack recovers the same efficacy as just attacking the model alone.
Part-Based Models Improve Adversarial Robustness
Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, and David Wagner
In International Conference on Learning Representations, May 2023
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. We believe that the richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets. For instance, the clean accuracy of our part models is up to 15 percentage points higher than the baseline’s, given the same level of robustness. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.
Short: Certifiably Robust Perception against Adversarial Patch Attacks: A Survey
Chong Xiang, Chawin Sitawarin, Tong Wu, and Prateek Mittal
In 1st Symposium on Vehicle Security and Privacy (VehicleSec), Mar 2023
Co-located with NDSS 2023. Best Short/WIP Paper Award Runner-Up.
The physical-world adversarial patch attack poses a security threat to AI perception models in autonomous vehicles. To mitigate this threat, researchers have designed defenses with certifiable robustness. In this paper, we survey existing certifiably robust defenses and highlight core robustness techniques that are applicable to a variety of perception tasks, including classification, detection, and segmentation. We emphasize the unsolved problems in this space to guide future research, and call for attention and efforts from both academia and industry to robustify perception models in autonomous vehicles.
Machine learning models are known to be susceptible to adversarial perturbation. One famous attack is the adversarial patch, a sticker with a particularly crafted pattern that makes the model incorrectly predict the object it is placed on. This attack presents a critical threat to cyber-physical systems that rely on cameras such as autonomous cars. Despite the significance of the problem, conducting research in this setting has been difficult; evaluating attacks and defenses in the real world is exceptionally costly while synthetic data are unrealistic. In this work, we propose the REAP (REalistic Adversarial Patch) Benchmark, a digital benchmark that allows the user to evaluate patch attacks on real images, and under real-world conditions. Built on top of the Mapillary Vistas dataset, our benchmark contains over 14,000 traffic signs. Each sign is augmented with a pair of geometric and lighting transformations, which can be used to apply a digitally generated patch realistically onto the sign, while matching real-world conditions. Using our benchmark, we perform the first large-scale assessments of adversarial patch attacks under realistic conditions. Our experiments suggest that adversarial patch attacks may present a smaller threat than previously believed and that the success rate of an attack on simpler digital simulations is not predictive of its actual effectiveness in practice.
Demystifying the Adversarial Robustness of Random Transformation Defenses
Chawin Sitawarin, Zachary Golan-Strieb, and David Wagner
In Proceedings of the 39th International Conference on Machine Learning, Oct 2022
Best Paper Award from AAAI-2022 Workshop on Adversarial Machine Learning and Beyond
Neural networks’ lack of robustness against attacks raises concerns in security-sensitive settings such as autonomous vehicles. While many countermeasures may look promising, only a few withstand rigorous evaluation. Defenses using random transformations (RT) have shown impressive results, particularly BaRT (Raff et al., 2019) on ImageNet. However, this type of defense has not been rigorously evaluated, leaving its robustness properties poorly understood. Their stochastic properties make evaluation more challenging and render many proposed attacks on deterministic models inapplicable. First, we show that the BPDA attack (Athalye et al., 2018a) used in BaRT’s evaluation is ineffective and likely over-estimates its robustness. We then attempt to construct the strongest possible RT defense through the informed selection of transformations and Bayesian optimization for tuning their parameters. Furthermore, we create the strongest possible attack to evaluate our RT defense. Our new attack vastly outperforms the baseline, reducing the accuracy by 83% compared to the 19% reduction by the commonly used EoT attack (4.3x improvement). Our result indicates that the RT defense on Imagenette dataset (ten-class subset of ImageNet) is not robust against adversarial examples. Extending the study further, we use our new attack to adversarially train RT defense (called AdvRT), resulting in a large robustness gain.
Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams
Chawin Sitawarin, Evgenios M Kornaropoulos, Dawn Song, and David Wagner
In Advances in Neural Information Processing Systems, Oct 2021
While Adversarial Training remains the standard in improving robustness to adversarial attack, it often sacrifices accuracy on natural (clean) samples to a significant extent. Dual-domain training, optimizing on both clean and adversarial objectives, can help realize a better trade-off between clean accuracy and robustness. In this paper, we develop methods to improve dual-domain training for large adversarial perturbations and complex datasets. We first demonstrate that existing methods suffer from poor performance in this setting, due to a poor training procedure and overfitting to a particular attack. Then, we develop novel methods to address these issues. First, we show that adding KLD regularization to the dual training objective mitigates this overfitting and achieves a better trade-off, on CIFAR-10 and a 10-class subset of ImageNet. Then, inspired by domain adaptation, we develop a new normalization technique, Dual Batch Normalization, to further improve accuracy. Combining these two strategies, our model sets a new state of the art in trade-off performance for dual-domain adversarial training.
SAT: Improving Adversarial Training via Curriculum-Based Loss Smoothing
Chawin Sitawarin, Supriyo Chakraborty, and David Wagner
In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, Jul 2021
Adversarial training (AT) has become a popular choice for training robust networks. However, it tends to sacrifice clean accuracy heavily in favor of robustness and suffers from a large generalization error. To address these concerns, we propose Smooth Adversarial Training (SAT), guided by our analysis on the eigenspectrum of the loss Hessian. We find that curriculum learning, a scheme that emphasizes on starting "easy” and gradually ramping up on the "difficulty” of training, smooths the adversarial loss landscape for a suitably chosen difficulty metric. We present a general formulation for curriculum learning in the adversarial setting and propose two difficulty metrics based on the maximal Hessian eigenvalue (H-SAT) and the softmax probability (P-SA). We demonstrate that SAT stabilizes network training even for a large perturbation norm and allows the network to operate at a better clean accuracy versus robustness trade-off curve compared to AT. This leads to a significant improvement in both clean accuracy and robustness compared to AT, TRADES, and other baselines. To highlight a few results, our best model improves normal and robust accuracy by 6% and 1% on CIFAR-100 compared to AT, respectively. On Imagenette, a ten-class subset of ImageNet, our model outperforms AT by 23% and 3% on normal and robust accuracy respectively.
The adversarial training paradigm has become the standard in training deep neural networks for robustness. Yet, it remains unstable, with the mechanisms driving this instability poorly understood. In this study, we discover that this instability is primarily driven by a non-smooth optimization landscape and an internal covariate shift phenomenon, and show that Batch Normalization (BN) can effectively mitigate both these issues. Further, we demonstrate that BN universally improves clean and robust performance across various defenses, datasets, and model types, with greater improvement on more difficult tasks. Finally, we confirm BN’s heterogeneous distribution issue with mixed-batch training and propose a solution.
Minimum-Norm Adversarial Examples on KNN and KNN Based Models
Chawin Sitawarin, and David Wagner
In 2020 IEEE Security and Privacy Workshops (SPW), May 2020
We study the robustness against adversarial examples of kNN classifiers and classifiers that combine kNN with neural networks. The main difficulty lies in the fact that finding an optimal attack on kNN is intractable for typical datasets. In this work, we propose a gradient-based attack on kNN and kNN-based defenses, inspired by the previous work by Sitawarin & Wagner . We demonstrate that our attack outperforms their method on all of the models we tested with only a minimal increase in the computation time. The attack also beats the state-of-the-art attack  on kNN when k > 1 using less than 1% of its running time. We hope that this attack can be used as a new baseline for evaluating the robustness of kNN and its variants.
Analyzing the Robustness of Open-World Machine Learning
Vikash Sehwag, Arjun Nitin Bhagoji, Liwei Song, Chawin Sitawarin, Daniel Cullina, Mung Chiang, and Prateek Mittal
In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, May 2019
When deploying machine learning models in real-world applications, an open-world learning framework is needed to deal with both normal in-distribution inputs and undesired out-of-distribution (OOD) inputs. Open-world learning frameworks include OOD detectors that aim to discard input examples which are not from the same distribution as the training data of machine learning classifiers. However, our understanding of current OOD detectors is limited to the setting of benign OOD data, and an open question is whether they are robust in the presence of adversaries. In this paper, we present the first analysis of the robustness of open-world learning frameworks in the presence of adversaries by introducing and designing øodAdvExamples. Our experimental results show that current OOD detectors can be easily evaded by slightly perturbing benign OOD inputs, revealing a severe limitation of current open-world learning frameworks. Furthermore, we find that øodAdvExamples also pose a strong threat to adversarial training based defense methods in spite of their effectiveness against in-distribution adversarial attacks. To counteract these threats and ensure the trustworthy detection of OOD inputs, we outline a preliminary design for a robust open-world machine learning framework.
(We took the paper down from arXiv because the defense is broken by our new attack. The paper is still available [here](https://drive.google.com/file/d/1_3SjKi92mfCRAg99EXEJXpOpGCCw2OXN/view?usp=sharing)) Robustness is an increasingly important property of machine learning models as they become more and more prevalent. We propose a defense against adversarial examples based on a k-nearest neighbor (kNN) on the intermediate activation of neural networks. Our scheme surpasses state-of-the-art defenses on MNIST and CIFAR-10 against l2-perturbation by a significant margin. With our models, the mean perturbation norm required to fool our MNIST model is 3.07 and 2.30 on CIFAR-10. Additionally, we propose a simple certifiable lower bound on the l2-norm of the adversarial perturbation using a more specific version of our scheme, a 1-NN on representations learned by a Lipschitz network. Our model provides a nontrivial average lower bound of the perturbation norm, comparable to other schemes on MNIST with similar clean accuracy.
On the Robustness of Deep K-Nearest Neighbors
Chawin Sitawarin, and David Wagner
In 2019 IEEE Security and Privacy Workshops (SPW), May 2019
Despite a large amount of attention on adversarial examples, very few works have demonstrated an effective defense against this threat. We examine Deep k-Nearest Neighbor (DkNN), a proposed defense that combines k-Nearest Neighbor (kNN) and deep learning to improve the model’s robustness to adversarial examples. It is challenging to evaluate the robustness of this scheme due to a lack of efficient algorithm for attacking kNN classifiers with large k and high-dimensional data. We propose a heuristic attack that allows us to use gradient descent to find adversarial examples for kNN classifiers, and then apply it to attack the DkNN defense as well. Results suggest that our attack is moderately stronger than any naive attack on kNN and significantly outperforms other attacks on DkNN.
Enhancing Robustness of Machine Learning Systems via Data Transformations
Arjun Nitin Bhagoji, Daniel Cullina, Chawin Sitawarin, and Prateek Mittal
In 52nd Annual Conference on Information Sciences and Systems (CISS), May 2018
We propose the use of data transformations as a defense against evasion attacks on ML classifiers. We present and investigate strategies for incorporating a variety of data transformations including dimensionality reduction via Principal Component Analysis and data "anti-whitening" to enhance the resilience of machine learning, targeting both the classification and the training phase. We empirically evaluate and demonstrate the feasibility of linear transformations of data as a defense mechanism against evasion attacks using multiple real-world datasets. Our key findings are that the defense is (i) effective against the best known evasion attacks from the literature, resulting in a two-fold increase in the resources required by a white-box adversary with knowledge of the defense for a successful attack, (ii) applicable across a range of ML classifiers, including Support Vector Machines and Deep Neural Networks, and (iii) generalizable to multiple application domains, including image classification and human activity classification.
Not All Pixels Are Born Equal: An Analysis of Evasion Attacks under Locality Constraints
Vikash Sehwag, Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Mung Chiang, and Prateek Mittal
In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Oct 2018
Deep neural networks (DNNs) have enabled success in learning tasks such as image classification, semantic image segmentation and steering angle prediction which can be key components of the computer vision pipeline of safety-critical systems such as autonomous vehicles. However, previous work has demonstrated the feasibility of using physical adversarial examples to attack image classification systems.
Sign recognition is an integral part of autonomous cars. Any misclassification of traffic signs can potentially lead to a multitude of disastrous consequences, ranging from a life-threatening accident to even a large-scale interruption of transportation services relying on autonomous cars. In this paper, we propose and examine security attacks against sign recognition systems for Deceiving Autonomous caRs with Toxic Signs (we call the proposed attacks DARTS). In particular, we introduce two novel methods to create these toxic signs. First, we propose Out-of-Distribution attacks, which expand the scope of adversarial examples by enabling the adversary to generate these starting from an arbitrary point in the image space compared to prior attacks which are restricted to existing training/test data (In-Distribution). Second, we present the Lenticular Printing attack, which relies on an optical phenomenon to deceive the traffic sign recognition system. We extensively evaluate the effectiveness of the proposed attacks in both virtual and real-world settings and consider both white-box and black-box threat models. Our results demonstrate that the proposed attacks are successful under both settings and threat models. We further show that Out-of-Distribution attacks can outperform In-Distribution attacks on classifiers defended using the adversarial training defense, exposing a new attack vector for these defenses.
Inverse-designed photonic fibers and metasurfaces for nonlinear frequency conversion (Invited)
Chawin Sitawarin, Weiliang Jin, Zin Lin, and Alejandro W. Rodriguez
Typically, photonic waveguides designed for nonlinear frequency conversion rely on intuitive and established principles, including index guiding and bandgap engineering, and are based on simple shapes with high degrees of symmetry. We show that recently developed inverse-design techniques can be applied to discover new kinds of microstructured fibers and metasurfaces designed to achieve large nonlinear frequency-conversion efficiencies. As a proof of principle, we demonstrate complex, wavelength-scale chalcogenide glass fibers and gallium phosphide three-dimensional metasurfaces exhibiting some of the largest nonlinear conversion efficiencies predicted thus far, e.g., lowering the power requirement for third-harmonic generation by 104 and enhancing second-harmonic generation conversion efficiency by 107. Such enhancements arise because, in addition to enabling a great degree of tunability in the choice of design wavelengths, these optimization tools ensure both frequency- and phase-matching in addition to large nonlinear overlap factors.
Rogue Signs: Deceiving Traffic Sign Recognition with Malicious Ads and Logos
Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Prateek Mittal, and Mung Chiang
We propose a new real-world attack against the computer vision based systems of autonomous vehicles (AVs). Our novel Sign Embedding attack exploits the concept of adversarial examples to modify innocuous signs and advertisements in the environment such that they are classified as the adversary’s desired traffic sign with high confidence. Our attack greatly expands the scope of the threat posed to AVs since adversaries are no longer restricted to just modifying existing traffic signs as in previous work. Our attack pipeline generates adversarial samples which are robust to the environmental conditions and noisy image transformations present in the physical world. We ensure this by including a variety of possible image transformations in the optimization problem used to generate adversarial samples. We verify the robustness of the adversarial samples by printing them out and carrying out drive-by tests simulating the conditions under which image capture would occur in a real-world scenario. We experimented with physical attack samples for different distances, lighting conditions and camera angles. In addition, extensive evaluations were carried out in the virtual setting for a variety of image transformations. The adversarial samples generated using our method have adversarial success rates in excess of 95% in the physical as well as virtual settings.
As an initial assessment, over 480,000 labeled virtual images of normal highway driving were readily generated in Grand Theft Auto V’s virtual environment. Using these images, a CNN was trained to detect following distance to cars/objects ahead, lane markings, and driving angle (angular heading relative to lane centerline): all variables necessary for basic autonomous driving. Encouraging results were obtained when tested on over 50,000 labeled virtual images from substantially different GTA-V driving environments. This initial assessment begins to define both the range and scope of the labeled images needed for training as well as the range and scope of labeled images needed for testing the definition of boundaries and limitations of trained networks. It is the efficacy and flexibility of a "GTA-V"-like virtual environment that is expected to provide an efficient well-defined foundation for the training and testing of Convolutional Neural Networks for safe driving. Additionally, described is the Princeton Virtual Environment (PVE) for the training, testing and enhancement of safe driving AI, which is being developed using the video-game engine Unity. PVE is being developed to recreate rare but critical corner cases that can be used in re-training and enhancing machine learning models and understanding the limitations of current self driving models. The Florida Tesla crash is being used as an initial reference.
Inverse-Designed Nonlinear Nanophotonic Structures: Enhanced Frequency Conversion at the Nano Scale
Zin Lin, Chawin Sitawarin, Marko Loncar, and Alejandro W. Rodriguez
In 2016 Conference on Lasers and Electro-Optics, CLEO 2016, Dec 2016
\textcopyright 2016 OSA. We describe a large-scale computational approach based on topology optimization that enables automatic discovery of novel nonlinear photonic structures. As examples, we design complex cavity and fiber geometries that can achieve high-efficiency nonlinear frequency conversion.