Improving the Adversarial Robustness of Neural ODE Image Classifiers by Tuning the Tolerance Parameter

Fabio Carrara ¹, Roberto Caldelli ²,³,*, Fabrizio Falchi ¹ and Giuseppe Amato ¹

¹ Istituto di Scienza e Tecnologie dell'Informazione, 56124 Pisa, Italy
² Media Integration and Communication Center, National Inter-University Consortium for Telecommunications (CNIT), 50134 Florence, Italy
³ Faculty of Economics, Mercatorum University, 00186 Rome, Italy
* Correspondence: [email protected]

Abstract: The adoption of deep learning-based solutions practically pervades all the diverse areas of our everyday life, showing improved performances with respect to other classical systems. Since many applications deal with sensitive data and procedures, a strong demand to know the actual reliability of such technologies is always present. This work analyzes the robustness characteristics of a specific kind of deep neural network, the neural ordinary differential equation (N-ODE) network. These networks are very interesting for their effectiveness and for a peculiar property: a test-time tunable tolerance parameter that permits obtaining a trade-off between accuracy and efficiency. In addition, adjusting such a tolerance parameter grants robustness against adversarial attacks. Notably, decoupling the values of this tolerance between training and test time can strongly reduce the attack success rate. On this basis, we show how such tolerance can be adopted, during the prediction phase, to improve the robustness of N-ODE to adversarial attacks. In particular, we demonstrate how we can exploit this property to construct an effective detection strategy and increase the chances of identifying adversarial examples in a non-zero knowledge attack scenario. Our experimental evaluation involved two standard image classification benchmarks and showed that the proposed detection technique provides high rejection of adversarial examples while maintaining most of the pristine samples.

Keywords: neural ordinary differential equation; adversarial defense; image classification

Citation: Carrara, F.; Caldelli, R.; Falchi, F.; Amato, G. Improving the Adversarial Robustness of Neural ODE Image Classifiers by Tuning the Tolerance Parameter. Information 2022, 13, 555. https://doi.org/10.3390/info13120555
Academic Editors: Willy Susilo, Jun Hu, Antonio Jiménez-Martín and Zahir M. Hussain
Received: 5 August 2022; Accepted: 22 November 2022; Published: 26 November 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Deep learning models have had huge success, mainly thanks to their undeniable performance on many complex tasks, e.g., visual perception, natural language processing, self-driving cars, and multimedia analysis. Notwithstanding this, various flaws and drawbacks still need to be tackled. Indeed, when neural networks are called to work in an unfair environment, as can happen in multimedia security applications, they have demonstrated crucial vulnerabilities that a malevolent user could exploit through the design of ad hoc adversarial manipulations in order to induce the model into a wrong evaluation. Such an incorrect decision might be crucial for the consequent action to be taken.
In the context of image classification, which is also the focus of this work, an adversary can control and mislead a deep neural network classifier by introducing a limited malicious perturbation into the input image [1].

The aforementioned phenomenon has been vastly analyzed with several neural network architectures and in multiple tasks. Attacking a deep model seems relatively easy due to its differentiability and complexity (many successful adversarial generation approaches exist [2–4]), but counteracting and defending against attacks is still an open problem. Multiple approaches aiming at strengthening the attacked model [5–7] achieve robustness to weak adversaries, but stronger attacks usually can mislead even enhanced models: adversarial examples appear to be an intrinsic shortcoming affecting every common deep learning architecture.

This work focuses on the phenomenon of adversarial examples against neural ordinary differential equation (N-ODE) networks, a recent deep learning model that generalizes deep residual networks through the solution of parametric ODEs. Among its peculiar properties, we are interested in the ability to tune at test time the precision–efficiency trade-off of the network by changing the tolerance of the adaptive ODE solver with respect to that used in the forward computation. Thanks to this property, neural ODE nets exhibited increased robustness to projected gradient descent (PGD) attacks with respect to standard architectures such as ResNets, as evidenced in [8]; higher tolerance values provided increased robustness at a negligible expense to the accuracy of the model. In reference [9], we further investigated how these phenomena occur under a stronger attack, the Carlini and Wagner attack. In particular, we tested its performance when the values of the solver tolerance used for the adversarial generation and for the prediction phase are decoupled. Starting from this, the ODE solver's tolerance has been introduced as a defensive property of neural ODEs against adversarial attacks, and adversarial detection approaches can be designed accordingly. Test-time tolerance randomization is presented as a possible defense approach in image classification benchmarks under the assumption of a zero-knowledge adversary, i.e., the attacker can access the model but does not know about the defense strategy. Moreover, for the sake of completeness, we have also investigated the more general and challenging case of an adaptive attack scenario where the attacker knows that there is a defense procedure based on the ODE solver tolerance, and both the attacker and the defender can play with it. We have specifically analyzed how the attack success rate can vary in different circumstances.
The contributions of the present work are the following:
• We provide a complete study on neural ODE image classifiers and on how their robustness against adversarial attacks such as the Carlini and Wagner one can vary by playing with the ODE solver tolerance;
• We demonstrate the defensive properties offered by ODE nets in a zero-knowledge adversarial scenario;
• We analyze how the robustness offered by neural ODE nets varies in the more stringent scenario of an active attacker that changes the attack-time solver tolerance.

The rest of the paper is organized as follows: Section 2 introduces related work, and Section 3 briefly recalls background knowledge on neural ODE nets and the Carlini and Wagner adversarial attack. In Section 4, the robustness to adversarial samples of neural ODEs in relation to the ODE solver tolerance is discussed, and in Section 4.1, we introduce an adversarial detection scheme harnessing this property. Section 5 is dedicated to a novel analysis that takes into account an adaptive attacker and studies the effect of the solver tolerance on the attacker side. In Section 6, we report the implementation details of our experimental evaluation (code and resources to reproduce the experiments presented here are available at https://github.com/fabiocarrara/neural-ode-features/tree/master/adversarial, accessed on 1 August 2022). Section 7 draws some conclusions and lays out future research directions.

2. Related Work

In the scientific literature, the vulnerability to adversarial examples has been studied extensively. Most analyses of deep models focus on deep convolutional network image classifiers [1,10,11] under a variety of attacks, such as PGD [12] or the stronger CW [4]. Defensive methodologies against adversarial samples have been devised specifically for such attacks, including model enhancement via distillation [6] and adversarial sample detection via statistical methods [13] or auxiliary models [14,15]. Among them, the most promising methods are based on the introduction of randomization in the prediction process [16,17]. Feinman et al. [18] proposed a detection scheme based on randomizing the output of the network using dropout. This approach relates to the rationale of our proposed detection method, as both resort to the stochasticity of the output [19]. Not many works deal with analyzing and defending neural ODE architectures. Our previous works include Carrara et al. [8,9], which analyze ODE nets under PGD and CW attacks and find their superior robustness with respect to standard architectures. Such intrinsic resilience is also evidenced in Yan et al. [20], which presents an extended empirical study on this phenomenon and proposes a regularization based on the time-invariance property of steady states of ODE solutions in order to improve robustness. Finally, also relevant to our proposed method is the work of Liu et al. [21], which exploits stochasticity by injecting noise into the ODE to increase robustness to perturbations of initial conditions, including adversarial ones.

3. Background

This section introduces the neural ordinary differential equation (N-ODE) networks and the Carlini and Wagner adversarial attack used in the present work.

3.1. The Neural ODE Networks

Hereafter, a basic description of neural ODEs (ordinary differential equations) is provided; a more detailed discussion can be found in [22]. A neural ODE network is a parametric model which includes an ODE block.
The computation of such a block is defined by a parametric ordinary differential equation (ODE) whose solution gives the output result. The input of the ODE block is indicated with h_0, and it coincides with the initial state of the ODE at time t_0, as in Equation (1):

$$\frac{dh(t)}{dt} = f(h(t), t, \theta), \qquad h(t_0) = h_0 . \tag{1}$$

The function f(·), which depends on the parameters θ, defines the continuous dynamic of the state h(t). The output of the block h(t_1) at a time t_1 > t_0 is obtained by integrating the ODE (see Equation (2)):

$$h(t_1) = h(t_0) + \int_{t_0}^{t_1} \frac{dh(t)}{dt}\, dt = h(t_0) + \int_{t_0}^{t_1} f(h(t), t, \theta)\, dt . \tag{2}$$

The above integral can be computed with standard ODE solvers, such as Runge–Kutta or multi-step methods. Thus, the computation performed by the ODE block can be formalized as a call to a generic ODE solver:

$$h(t_1) = \mathrm{ODESolver}\big(f, h(t_0), t_0, t_1, \theta\big) . \tag{3}$$

Generally, in image classification applications, the function f(·) is implemented by means of a small, trainable convolutional neural network. During training, the gradients of the output h(t_1) with respect to the input h(t_0) and the parameters θ can be obtained using the adjoint sensitivity method. One of the most interesting properties shown by ODE networks, determined by their intrinsic structure, is the accuracy–efficiency trade-off, which is tunable at inference time by controlling the tolerance parameter τ of adaptive ODE solvers.

The ODE-Net image classifier we consider in this work (see Figure 1, bottom part, ODE) is constituted by an ODE block (based on Equation (3)) responsible for the whole feature extraction chain. Before this block, a pre-processing stage comprised of a single K-filter 4×4 convolutional layer is inserted; it linearly maps the input image into a proper state space. The f(·) function in the ODE block is implemented as a standard residual block used in ResNets (described below). After the ODE block, the classification step is implemented with a global average-pooling operation followed by a single fully-connected layer with softmax activation.

Figure 1. Architectures of the residual network (RES, top) and the neural ODE network (ODE, bottom), each followed by the classifier (Avg. Pool + 10-d FC + Softmax). Convolutional layers are written in the format kernel width × kernel height [/ stride], n. filters; padding is always set to 1. For MNIST, K = 64, and for CIFAR-10, K = 256.

In addition to this, we also consider a standard ResNet (Figure 1, top part, RES) as a baseline [22] for comparison with the ODE-Nets. It is composed of a 64-filter 3×3 convolutional layer and 8 residual blocks. Each residual block follows the standard formulation defined in [23], where group normalization [24] is used instead of batch normalization. The sequence of layers comprising a residual block is GN-ReLU-Conv-GN-ReLU-Conv-GN, where GN stands for group normalization with 32 groups, and Conv is a 3×3 convolutional layer. The first two blocks downsample their input by a factor of 2 using a stride of 2, and the subsequent blocks maintain the input dimensionality. Only the first block uses 64-filter convolutions, and the subsequent ones employ K-filter convolutions, where K varies with the specific dataset. The final classification step is the same as before.
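To make the role of the solver tolerance concrete, the following minimal PyTorch sketch shows an ODE block in the spirit of Equation (3), built on the torchdiffeq package also referenced in Section 6. It is an illustrative sketch, not the released implementation: the class names, channel width, and integration interval [0, 1] are assumptions.

```python
# Minimal sketch of an ODE block (Eq. 3) with a tunable adaptive-solver tolerance.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # adaptive solvers, e.g., Dormand-Prince ('dopri5')

class ConvODEFunc(nn.Module):
    """f(h(t), t, theta): a small convolutional network defining the state dynamics."""
    def __init__(self, channels=64):
        super().__init__()
        self.norm1 = nn.GroupNorm(32, channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm2 = nn.GroupNorm(32, channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, t, h):
        # t is accepted for generality; this particular dynamic is time-invariant
        h = self.conv1(torch.relu(self.norm1(h)))
        return self.conv2(torch.relu(self.norm2(h)))

class ODEBlock(nn.Module):
    """h(t1) = ODESolver(f, h(t0), t0, t1, theta); tol is tunable at inference time."""
    def __init__(self, func, tol=1e-3):
        super().__init__()
        self.func = func
        self.tol = tol  # adaptive-solver tolerance tau

    def forward(self, h0):
        t = torch.tensor([0.0, 1.0], device=h0.device)  # integrate from t0=0 to t1=1
        out = odeint(self.func, h0, t, rtol=self.tol, atol=self.tol, method='dopri5')
        return out[1]  # state at t1

# Changing the tolerance at test time requires no retraining, e.g.:
# block = ODEBlock(ConvODEFunc(64), tol=1e-3); block.tol = 1e-1  # coarser, faster forward pass
```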
3.2. The Carlini and Wagner Attack

This section briefly introduces the Carlini and Wagner (CW) attack [4], which has been used in our work to test and evaluate the robustness of the ODE-Net to adversarial samples. The CW attack is currently considered one of the strongest available adversarial techniques with which to attack neural networks designed for the image classification task. Among the three existing versions (differing in the metric used to measure the perturbation), we have considered CW-L2, which is formalized as in Equation (4):

$$\min_{w} \; c \cdot g\!\left(x^{adv}\right) + \left\lVert x^{adv} - x \right\rVert_2^2 \tag{4}$$

with

$$g\!\left(x^{adv}\right) = \max\!\left( \max_{i \neq t} Z\!\left(x^{adv}\right)_i - Z\!\left(x^{adv}\right)_t,\; -\kappa \right) \tag{5}$$

$$x^{adv} = \frac{\tanh(w) + 1}{2}, \tag{6}$$

where g(·) is the objective function (misclassification), x^adv is the adversarial example in the pixel space, and w is its counterpart in the tanh space in which the optimization is carried out. Z(·) are the logits of a given input, t is the target class, κ is a parameter that allows adjusting the confidence with which the misclassification occurs, and c is a positive constant whose value is set by exploiting a binary search procedure. The rationale behind the attack is to minimize at each iteration the highest confidence among non-target classes (first term of Equation (4)) while retaining the smallest possible perturbation (second term). It is worth mentioning the use of the term tanh(w), which represents a change of variable that allows one to move from the pixel space to the tanh space. This helps regularize the gradient in extremal regions of the perturbation space, thereby facilitating optimization with gradient-based optimizers.
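As an illustration of Equations (4)–(6), the sketch below implements a simplified CW-L2 optimization loop for a generic PyTorch classifier returning logits. It is not the implementation used in the experiments (which relies on Foolbox, see Section 6.3): a single fixed constant c, no binary search, and a fixed number of Adam iterations are simplifying assumptions.

```python
# Simplified sketch of the CW-L2 objective of Equations (4)-(6).
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=100, lr=0.05):
    # Change of variable (Eq. 6): optimize w in tanh space, x_adv = (tanh(w) + 1) / 2.
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        x_adv = (torch.tanh(w) + 1) / 2
        z = model(x_adv)                                       # logits Z(x_adv)
        z_target = z.gather(1, target.view(-1, 1)).squeeze(1)  # Z(x_adv)_t
        mask = torch.nn.functional.one_hot(target, z.size(1)).bool()
        z_other = z.masked_fill(mask, float('-inf')).max(dim=1).values  # max_{i != t} Z(x_adv)_i
        g = torch.clamp(z_other - z_target, min=-kappa)        # Eq. (5)
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)          # squared L2 perturbation
        loss = (c * g + l2).sum()                              # Eq. (4)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return ((torch.tanh(w) + 1) / 2).detach()
```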
4. Robustness via Tolerance Variation

Though ODE-Nets are very promising and perform well, they are vulnerable to the same attacks as standard networks. However, one of their properties, namely, the ability to change the ODE solver tolerance τ at prediction time, has been demonstrated to provide some degree of robustness against basic adversarial attacks [8]. Changing the tolerance value of an adaptive ODE solver causes the solver to adopt different step sizes during the computation of the ODE solution, and this leads to a perturbation of the forward pass that increases the adversarial robustness. Such a property is observed even under the CW attack, which represents one of the strongest adversarial algorithms to fool neural networks in image classification specifically.

To prove this, a neural ODE model trained on the MNIST and CIFAR-10 datasets, two well-known 10-class image classification benchmarks, has been considered. We used the train split (50k images) of each dataset to train the model and half of the test split (5k images) to generate adversarial samples with the CW attack (examples are reported in Figure 2). The training procedure was performed once with a fixed tolerance value, and we considered multiple values of the tolerance when doing inference and generating adversarial samples.

Figure 2. Adversarial examples found with the Carlini and Wagner attack on our neural ODE network on the MNIST and CIFAR-10 datasets (original images, perturbations, and adversarial images). Adversarial perturbations (Diff.) of CIFAR-10 samples have been amplified by a factor of 10 for visualization purposes.

In Table 1, we report the classification error on the test subset, the attack success rate (in percentage), and the mean L2 norm of the adversarial perturbation for each tested tolerance value on the two datasets; for comparison, the results obtained by a standard residual network classifier are also included. Details on the datasets, models, and adversary generation are available in Section 6.

Note that the basic behavior of both the standard residual (RES) and ODE-Net (ODE) models is similar: they show a limited error rate on original images, but the CW attack achieves a very high attack success rate. However, in ODE-Nets, the value of the ODE solver tolerance τ plays an essential role in determining the success rate of an attack; when we increase the value of the tolerance τ used at test time and by the attacker (τ_attack = τ_test), the classification error rate is rather stable, but the required attack budget increases. This is quite clear for the MNIST dataset, where the attack success rate quickly decreases, but it is also appreciable for CIFAR-10 when looking at the mean perturbation introduced by the attack. Though the attack success rate remains 100%, an increasing cost is paid in terms of applied distortion. While this again attests to the strength of the CW attack, it also confirms that the sensitivity to tolerance variations, found in the case of the projected gradient descent (PGD) attack [8], is also shown by the CW attack, suggesting that this is a more general defensive property of ODE-Nets.

Table 1. Classification error (Err, %), Carlini and Wagner attack success rate (ASR, %), and mean L2 norm perturbation (Pert) of RES and ODE on the MNIST and CIFAR-10 test sets; only for ODE are the quantities listed while varying the test-time adaptive solver tolerance τ (τ_attack = τ_test).

                  MNIST                              CIFAR-10
                  Err (%)   ASR (%)   Pert (10⁻²)    Err (%)   ASR (%)   Pert (10⁻⁵)
  RES             0.4       99.7      1.1            7.3       100       2.6
  ODE τ = 10⁻⁴    0.5       99.7      1.4            9.1       100       2.2
  ODE τ = 10⁻³    0.5       90.7      1.7            9.2       100       2.4
  ODE τ = 10⁻²    0.6       74.4      1.9            9.3       100       4.1
  ODE τ = 10⁻¹    0.8       71.6      1.7            10.6      100       8.0
  ODE τ = 10⁰     1.2       69.7      1.9            11.3      100       13.7

Intuition suggests that introducing a decoupling (τ_attack ≠ τ_test) between attacker and defender should increase robustness. To verify this hypothesis, we generated adversarial samples by setting a fixed tolerance τ_attack and measuring the model's accuracy when varying the test-time tolerance τ_test. Considering a zero-knowledge scenario, the attack tolerance was set to τ_attack = τ_train, which is the best choice the attacker can make. At prediction time, the tolerance was drawn from a log-uniform distribution over the interval [10⁻⁵, 10⁻¹], centered on τ_train = 10⁻³; 20 values were sampled for each image to be classified. Figure 3 reports the accuracy of the ODE-Net classifier on original inputs (blue lines) and adversarial examples (orange lines) for the MNIST and CIFAR-10 datasets, respectively; the tolerance on the x-axis is binned (21 bins) in log space. It is evident that accuracy on natural inputs (blue lines) is always stable and very high for each tolerance value, on average around the original network accuracy (100% for MNIST and 90% for CIFAR-10); this means that varying the tolerance does not significantly affect accuracy on pristine samples. On the contrary, accuracy on CW-created adversarial inputs (orange lines) is quite poor (which again demonstrates the power of this technique), but it is very interesting to note that in the central bin (around τ_train = 10⁻³) the attack has the highest effectiveness; this seems to mean that when the tolerance at test time coincides with that adopted by the CW attacker, the classifier is strongly induced to misclassify. Furthermore, it can also be appreciated that if τ_test is moved away from the central value used by the CW attacker, accuracy increases. This means that changes in the tolerance can provide robustness against the CW attack, achieving, for instance, accuracy on adversarial inputs of about 60% (and corresponding accuracy of around 90% on original images) for the CIFAR-10 dataset (see Figure 3, extreme right). Finally, it is worth observing that the growth trend of the orange lines is asymmetric with respect to the central value τ_train = 10⁻³, with higher values achieved on the right side: this shows, as generally expected, that increasing the tolerance permits one to gain in robustness.

Figure 3. Accuracy vs. test-time solver tolerance τ_test on natural and adversarial examples for MNIST and CIFAR-10. For each image, we sampled 20 values for τ_test from a log-uniform distribution within the [10⁻⁵, 10⁻¹] interval. We report the mean accuracy of the ODE-Net classifier on natural and adversarial examples for each tolerance bin (in log space; points' x-coordinates indicate the bin centers).
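The randomized test-time tolerance evaluation behind Figure 3 can be sketched as follows. The sketch assumes a classifier exposing its solver tolerance as `model.ode_block.tol` (mirroring the ODEBlock sketch in Section 3.1) and is not the exact evaluation script used for the experiments.

```python
# Sketch of the Figure 3 protocol: accuracy under log-uniformly sampled test-time tolerances.
import math
import random
import torch

def accuracy_with_random_tolerance(model, images, labels,
                                   tol_range=(1e-5, 1e-1), samples_per_image=20):
    lo, hi = math.log10(tol_range[0]), math.log10(tol_range[1])
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for _ in range(samples_per_image):
            model.ode_block.tol = 10 ** random.uniform(lo, hi)  # log-uniform draw of tau_test
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```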
4.1. Defensive Tolerance Randomization

In light of these findings, we exploited tolerance variation as an active measure against adversarial attacks and measured to what extent this defensive property can be effective in an adversarial detection scenario, where the defender is asked to discern adversarial samples from authentic ones.

We considered a white-box attack scenario in which the attacker has access to the trained ODE-Net and knows the parameter settings of the classifier, specifically the solver tolerance τ_train used during network training. An attack is successful if the CW algorithm finds an adversarial perturbation leading to a misclassification without exceeding a prefixed attack budget, defined as the maximum number of optimization iterations. Accordingly, we have introduced an adversarial detection strategy based on ODE-Net tolerance randomization, which collects several predictions with different randomly drawn test-time tolerance parameters τ_test to detect whether the classification system is subjected to an adversarial sample. τ_test is sampled uniformly from a range centered on τ_train such that τ_train = τ_attack ≠ τ_test. Introducing such variability also helps the defender against knowledgeable adversaries, as simply changing τ_test to a different fixed value could be easily counteracted by the adversary also changing τ_attack to the new value. Indicating with V the number of voting members (i.e., the number of τ_test values randomly drawn) belonging to the ensemble, we declare that an adversarial sample is detected if v_agree < v_min, where v_agree is the largest number of members that have reached the same decision on the test image (the size of the majority), and v_min is the minimum consensus threshold required for assessing the authenticity (non-maliciousness) of the input.
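The decision rule can be summarized with the sketch below. The function name, the default threshold, and the log-uniform sampling of τ_test (as in Figure 3) are illustrative assumptions; the rule itself (flag the input when the majority size v_agree falls below the consensus threshold v_min) follows the description above.

```python
# Sketch of the tolerance-randomization detector: V predictions with random test-time
# tolerances; the input is flagged as adversarial when v_agree < v_min.
import math
import random
from collections import Counter
import torch

def detect_adversarial(model, image, V=20, v_min=15, tol_range=(1e-5, 1e-1)):
    lo, hi = math.log10(tol_range[0]), math.log10(tol_range[1])
    votes = []
    model.eval()
    with torch.no_grad():
        for _ in range(V):
            model.ode_block.tol = 10 ** random.uniform(lo, hi)  # random tau_test
            votes.append(model(image.unsqueeze(0)).argmax(dim=1).item())
    label, v_agree = Counter(votes).most_common(1)[0]
    is_adversarial = v_agree < v_min   # reject if the consensus is too weak
    return is_adversarial, label, v_agree
```

Sweeping v_min from 1 to V for a fixed ensemble size V traces the ROC curves discussed next.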
The performance of this adversarial detection scheme is depicted in Figure 4. We can see that, once the size V of the voting ensemble is established (different colored lines), by varying the threshold v_min with a step of 1, ROC curves can be obtained in terms of TPR versus FPR, where a true positive indicates the correct classification of a natural input. Such graphs demonstrate that high TPRs can be registered in correspondence with limited FPRs. This is particularly visible for the MNIST dataset (see Figure 4a), but it is still true for CIFAR-10; if, for example, we refer to Figure 4b when V = 20 (purple line), by increasing the value of v_min (going down along the curve), we can reduce the FPR while maintaining a high TPR: with v_min = 20, a TPR of 92% and a corresponding FPR of 15% are achieved (see the bottom-left corner of Figure 4b). This experiment basically demonstrates that, if the ODE-Net is subjected to a zero-knowledge Carlini and Wagner attack in a white-box scenario, by resorting to test-time tolerance randomization it is possible both to preserve classification performance on natural images and to significantly reduce the capacity of the CW attack to fool the ODE classifier, at the expense of performing multiple inferences.

Figure 4. Analysis of the detection performance with the randomized tolerance ensemble on (a) MNIST and (b) CIFAR-10. ROC curves (TPR vs. FPR, where TP = "correctly detected natural input" and FP = "adversarial input misdetected as natural") are obtained by varying the minimum majority size v_min, i.e., if the number of majoritarian votes v_agree in the ensemble is greater than v_min, the input is considered authentic (positive), and otherwise adversarial (negative).

5. Robustness under Adaptive Attackers

To date, we have studied tolerance variation for defensive purposes under the assumption of a non-adaptive adversary. In this section, we extend the analysis described in Sections 4 and 4.1 by also exploring the effect of the attacker's tolerance when generating adversarial samples.

Like the defender, the attacker can vary the solver's tolerance to generate malicious samples. While we observed that the defender should set τ_test to a value away from τ_attack, the attacker instead aims at setting τ_attack = τ_test, where he is certain about the success of the attack. We explored how the CW attack success rate varies over the (τ_attack, τ_test) space. Instead of a randomized exploration of this space, we performed a logarithmic grid sampling of tolerance values by setting τ = 10^i, i ∈ {−4, −3, −2, −1, 0}, independently for τ_attack and τ_test. As in previous sections, CW attacks were performed on our trained neural ODE classifiers using the first halves of the MNIST and CIFAR-10 test sets, with the solver's tolerance set to τ_attack. We report the results in Figure 5, where for each (τ_attack, τ_test) couple, we show the percentage of successful attacks (adversarial samples that fooled the network as intended, in red), the percentage of failed attacks (adversarial samples that were either "recovered" as such or correctly classified by the network, in green), and the percentage of successful but changed attacks (adversarial samples that were still misclassified but were classified differently from what the attacker expected, in yellow). For this experiment, we ignored samples on which the attack failed to generate an adversarial perturbation in the first place, i.e., we discarded samples that failed to generate an adversarial sample when τ_test = τ_attack, thus having an attack success rate of 100% in the diagonal entries. This permits focusing only on the effect of tolerance decoupling on the attack success rate, while discarding the contributions to robustness already studied and presented in Table 1. It is quite evident that tolerance decoupling between attack and defense can be disruptive for an attacker. For instance, it led to an attack failure rate of up to 78.3% for MNIST when (τ_attack, τ_test) = (10⁰, 10⁻³), and 66.2% for CIFAR-10 when (τ_attack, τ_test) = (10⁻³, 10⁰). In general, higher values of recovery tend to be concentrated where the discrepancy between τ_test and τ_attack is maximum, but this trend seems to saturate as this discrepancy decreases.

Figure 5. Attack success rate (in red) and recovery rate (in green) when varying τ_attack (on the y-axis) and τ_test (on the x-axis) for (a) MNIST and (b) CIFAR-10. In yellow, the percentage of classifications that changed with respect to the one induced by the attack but are still adversarial.
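The grid exploration of Section 5 can be outlined as follows. The helper `generate_cw_adversarials` is a hypothetical placeholder for the Foolbox-based attack described in Section 6.3, and the targeted-attack bookkeeping is an assumption; the sketch shows the protocol rather than the exact script.

```python
# Outline of the (tau_attack, tau_test) grid of Figure 5.
import torch

tolerances = [10.0 ** i for i in range(-4, 1)]   # 1e-4, 1e-3, 1e-2, 1e-1, 1e0

def grid_attack_outcomes(model, images, labels, targets):
    results = {}
    for tau_attack in tolerances:
        model.ode_block.tol = tau_attack                           # attacker-side tolerance
        x_adv = generate_cw_adversarials(model, images, targets)   # hypothetical helper
        for tau_test in tolerances:
            model.ode_block.tol = tau_test                         # defender-side tolerance
            with torch.no_grad():
                preds = model(x_adv).argmax(dim=1)
            success = (preds == targets).float().mean().item()     # fooled as intended (red)
            recovered = (preds == labels).float().mean().item()    # correctly classified (green)
            changed = 1.0 - success - recovered                    # misclassified, but not as intended (yellow)
            results[(tau_attack, tau_test)] = (success, changed, recovered)
    return results
```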
6. Experimental Details

This section reports the implementation details of the experiments described in the previous sections of the paper.

6.1. Datasets: MNIST and CIFAR-10

All the models used in this analysis were trained on two standard and well-known image classification benchmarks: MNIST [25] and CIFAR-10 [26]. MNIST is composed of 60,000 grayscale images subdivided into training (50,000) and testing (10,000) sets; images are 28×28 pixels and represent hand-written digits (from 0 to 9, so it consists of 10 classes). MNIST is substantially the de facto standard baseline for novel machine learning algorithms and is nearly the only dataset used in research concerning ODE networks. The second dataset taken into account in our analysis was CIFAR-10; it is also a 10-class image classification dataset, comprised of 60,000 RGB images (size 32×32 pixels) of common objects subdivided into training/testing sets (50,000/10,000).

6.2. The Training Phase

Both considered models, RES and ODE, apply dropout before the fully-connected classifier with a drop probability of 0.5, and the SGD optimizer has a momentum of 0.9; the weight decay is 10⁻⁴, the batch size is 128, and the learning rate is 10⁻¹, reduced by a factor of 10 every time the error plateaus. The number of filters K in the internal blocks is set differently for each dataset: 64 for MNIST and 256 for CIFAR-10. For the ODE-Net model, containing the ODE block, we used the Dormand–Prince variant of the fifth-order Runge–Kutta ODE solver (implemented in https://github.com/rtqichen/torchdiffeq, accessed on 1 August 2022); in such an algorithm, the step size is adaptive and can be controlled by a tolerance parameter τ (τ_train = 10⁻³ was used in our experiments during the training phase). The value of τ constitutes a threshold for the maximum absolute and relative error (estimated using the difference between the fourth-order and the fifth-order solutions) tolerated when performing a step of integration; if the step error exceeds τ, the integration step is discarded and the step size decreased. For both models, RES and ODE, the achieved classification performances are comparable with the current state-of-the-art performances on the MNIST and CIFAR-10 datasets (see Table 1).
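A compact sketch of this training configuration is reported below; it reflects the hyperparameters listed above, while the model constructor `build_ode_net` is a hypothetical placeholder and the `ReduceLROnPlateau` scheduler is an assumed reading of "reduced by a factor of 10 every time the error plateaus".

```python
# Sketch of the training setup of Section 6.2 (MNIST values shown; K=256 for CIFAR-10).
import torch

model = build_ode_net(num_filters=64, solver_tol=1e-3)   # hypothetical constructor;
                                                          # dropout(p=0.5) before the FC classifier assumed inside

optimizer = torch.optim.SGD(model.parameters(), lr=1e-1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)

def train_one_epoch(loader, criterion=torch.nn.CrossEntropyLoss()):
    model.train()
    for images, labels in loader:        # batch size 128
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                  # gradients through the ODE block (adjoint or direct)
        optimizer.step()
    return loss.item()
```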
6.3. Carlini and Wagner Attack Implementation Details

The CW attack was implemented by resorting to Foolbox 2.0 [27] on PyTorch models. We adopted Adam to optimize Equation (4), setting the maximum number of iterations to 100 and performing 5 binary search steps to tune the constant c from its initial value. A learning rate of 0.05 was used for MNIST and of 0.01 for CIFAR-10. The first 5000 images of each test set were selected as original samples to be perturbed, discarding the images naturally misclassified by the classifier.
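A usage sketch of this setup with the Foolbox 2.x interface is reported below; the argument names follow our reading of the Foolbox 2 CW-L2 attack and may need minor adjustments depending on the exact library version, so the snippet should be taken as an assumption-laden outline rather than the verbatim experimental code.

```python
# Sketch of the CW-L2 attack setup with Foolbox 2.x [27].
import foolbox

# model: a trained PyTorch classifier in eval mode; inputs assumed in [0, 1]
fmodel = foolbox.models.PyTorchModel(model.eval(), bounds=(0, 1), num_classes=10)
attack = foolbox.attacks.CarliniWagnerL2Attack(fmodel)

# image: numpy array of shape (C, H, W) in [0, 1]; label: its ground-truth class
adversarial = attack(image, label,
                     max_iterations=100,       # attack budget
                     binary_search_steps=5,    # tuning of the constant c
                     learning_rate=0.05)       # 0.05 for MNIST, 0.01 for CIFAR-10
# the result is the adversarial image (same shape as `image`) when the attack succeeds
```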
7. Conclusions and Future Works

In this paper, we have presented an analysis of the robustness of neural ODE image classifiers in an uncontrolled environment, specifically studying the behavior of N-ODE nets against the Carlini and Wagner (CW) attack. The CW attack was considered as it is one of the best-performing adversarial attacks for the image classification task. Furthermore, we have focused on how the tolerance parameter of the adaptive ODE solver, which is generally used in neural ODE networks to tune the computational precision–efficiency trade-off, can affect the robustness against such attacks. We have observed that changing the tolerance used during the prediction phase from that used when generating adversarial inputs tends to undermine attacks while maintaining high accuracy on pristine samples. Accordingly, we have proposed using the tolerance as a defensive property of neural ODE nets and demonstrated that this is possible by introducing a novel adversarial detection strategy for ODE nets based on tolerance randomization and a majority voting ensemble scheme.

Our evaluation, performed on two standard image classification benchmarks (MNIST and CIFAR-10), has shown that our simple detection technique can reject roughly 80% of strong CW adversarial examples while maintaining more than 90% of original samples under white-box attacks and zero-knowledge adversaries. We have also hypothesized that, to overcome our method, the adversary would require high attack budgets to attack a wide range of tolerance values and distill them into a unique malicious input.

We have also explored the defensive properties of tolerance variation in the scenario with adaptive adversaries and shown that the simple decoupling of attack and test tolerances, without any additional defensive procedures, increases adversarial robustness up to roughly 78% and 66% for the MNIST and CIFAR-10 datasets, respectively.

Future work will be dedicated to gathering deeper insights into the relationship between attacker and defender tolerance settings by exploring the tolerance space on a finer scale. In addition, we would be interested in investigating the dynamic scenario in which both the attacker and the defender try to adapt to each other and in analyzing it in a game-theoretic framework.

Author Contributions: Conceptualization, F.C., R.C., F.F. and G.A.; methodology, R.C., F.F. and G.A.; software, F.C.; investigation, F.C. and R.C.; writing—original draft preparation, F.C., R.C. and F.F.; writing—review and editing, F.C., R.C., F.F. and G.A.; supervision, F.F. and G.A. All authors have read and agreed to the published version of the manuscript.

Funding: This work was partially funded by Tuscany POR FSE 2014–2020 AI-MAP (CNR4C program, CUP B15J19001040004), the AI4EU project (EC, H2020, n. 825619) and the AI4Media project (EC, H2020, n. 951911).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Code and resources to reproduce the experiments presented here are available at https://github.com/fabiocarrara/neural-ode-features/tree/master/adversarial (accessed on 1 August 2022).

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. arXiv 2014, arXiv:1312.6199.
2. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2015, arXiv:1412.6572.
3. Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582.
4. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57.
5. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial Examples in the Physical World; Chapman and Hall: London, UK, 2017.
6. Papernot, N.; McDaniel, P.D.; Wu, X.; Jha, S.; Swami, A. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597.
7. Wu, J.; Xia, Z.; Feng, X. Improving Adversarial Robustness of CNNs via Maximum Margin. Appl. Sci. 2022, 12, 7927.
8. Carrara, F.; Caldelli, R.; Falchi, F.; Amato, G. On the robustness to adversarial examples of neural ODE image classifiers. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS), Delft, The Netherlands, 9–12 December 2019; pp. 1–6.
9. Carrara, F.; Caldelli, R.; Falchi, F.; Amato, G. Defending Neural ODE Image Classifiers from Adversarial Attacks with Tolerance Randomization. In Proceedings of the Pattern Recognition—ICPR International Workshops and Challenges, Virtual Event, 15–20 January 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 425–438.
10. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 372–387.
11. Kurakin, A.; Goodfellow, I.J.; Bengio, S.; Dong, Y.; Liao, F.; Liang, M.; Pang, T.; Zhu, J.; Hu, X.; Xie, C.; et al. Adversarial Attacks and Defences Competition. In The NIPS '17 Competition: Building Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2018.
12. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2018, arXiv:1706.06083.
13. Grosse, K.; Manoharan, P.; Papernot, N.; Backes, M.; McDaniel, P.D. On the (Statistical) Detection of Adversarial Examples. arXiv 2017, arXiv:1702.06280.
14. Metzen, J.H.; Genewein, T.; Fischer, V.; Bischoff, B. On Detecting Adversarial Perturbations. arXiv 2017, arXiv:1702.04267.
15. Carrara, F.; Becarelli, R.; Caldelli, R.; Falchi, F.; Amato, G. Adversarial examples detection in features distance spaces. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
16. Taran, O.; Rezaeifar, S.; Holotyak, T.; Voloshynovskiy, S. Defending against adversarial attacks by randomized diversification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11226–11233.
17. Barni, M.; Nowroozi, E.; Tondi, B.; Zhang, B. Effectiveness of random deep feature selection for securing image manipulation detectors against adversarial examples. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 2977–2981.
18. Feinman, R.; Curtin, R.R.; Shintre, S.; Gardner, A.B. Detecting Adversarial Samples from Artifacts. arXiv 2017, arXiv:1703.00410.
19. Carlini, N.; Wagner, D. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; ACM: New York, NY, USA, 2017; pp. 3–14.
20. Yan, H.; Du, J.; Tan, V.Y.F.; Feng, J. On robustness of neural ordinary differential equations. arXiv 2019, arXiv:1910.05513.
21. Liu, X.; Xiao, T.; Si, S.; Cao, Q.; Kumar, S.; Hsieh, C.J. Stabilizing Neural ODE Networks with Stochasticity. 2019. Available online: https://openreview.net/forum?id=Skx2iCNFwB (accessed on 1 August 2022).
22. Chen, T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2018; pp. 6572–6583.
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
24. Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
25. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
26. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
27. Rauber, J.; Brendel, W.; Bethge, M. Foolbox: A Python toolbox to benchmark the robustness of machine learning models. arXiv 2017, arXiv:1707.04131.

Improving the Adversarial Robustness of Neural ODE Image Classifiers by Tuning the Tolerance Parameter

Loading next page...
 
/lp/multidisciplinary-digital-publishing-institute/improving-the-adversarial-robustness-of-neural-ode-image-classifiers-wMu4wRPme6

References (27)

Publisher
Multidisciplinary Digital Publishing Institute
Copyright
© 1996-2022 MDPI (Basel, Switzerland) unless otherwise stated Disclaimer Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. Terms and Conditions Privacy Policy
ISSN
2078-2489
DOI
10.3390/info13120555
Publisher site
See Article on Publisher Site

Abstract

information Article Improving the Adversarial Robustness of Neural ODE Image Classifiers by Tuning the Tolerance Parameter 1 2,3, 1 1 Fabio Carrara , Roberto Caldelli * , Fabrizio Falchi and Giuseppe Amato Istituto di Scienza e Tecnologie dell’Informazione, 56124 Pisa, Italy Media Integration and Communication Center, National Inter-University Consortium for Telecommunications (CNIT), 50134 Florence, Italy Faculty of Economics, Mercatorum University, 00186 Rome, Italy * Correspondence: [email protected] Abstract: The adoption of deep learning-based solutions practically pervades all the diverse areas of our everyday life, showing improved performances with respect to other classical systems. Since many applications deal with sensible data and procedures, a strong demand to know the actual reliability of such technologies is always present. This work analyzes the robustness characteristics of a specific kind of deep neural network, the neural ordinary differential equations (N-ODE) network. They seem very interesting for their effectiveness and a peculiar property based on a test-time tunable parameter that permits obtaining a trade-off between accuracy and efficiency. In addition, adjusting such a tolerance parameter grants robustness against adversarial attacks. Notably, it is worth highlighting how decoupling the values of such a tolerance between training and test time can strongly reduce the attack success rate. On this basis, we show how such tolerance can be adopted, during the prediction phase, to improve the robustness of N-ODE to adversarial attacks. In particular, we demonstrate how we can exploit this property to construct an effective detection Citation: Carrara, F.; Caldelli, R.; strategy and increase the chances of identifying adversarial examples in a non-zero knowledge attack Falchi, F.; Amato, G. Improving the scenario. Our experimental evaluation involved two standard image classification benchmarks. This Adversarial Robustness of Neural showed that the proposed detection technique provides high rejection of adversarial examples while ODE Image Classifiers by Tuning the maintaining most of the pristine samples. Tolerance Parameter. Information 2022, 13, 555. https://doi.org/ Keywords: neural ordinary differential equation; adversarial defense; image classification 10.3390/info13120555 Academic Editors: Willy Susilo, Jun Hu, Antonio Jiménez-Martín and Zahir M. Hussain 1. Introduction Received: 5 August 2022 Deep learning models have had huge success, mainly thanks to their undeniable Accepted: 22 November 2022 performance with respect to many complex tasks, e.g., visual perception, natural language Published: 26 November 2022 processing, self-driving cars, and multimedia analysis. Notwithstanding this, various flaws and drawbacks still need to be tackled. Indeed, when neural networks are called to Publisher’s Note: MDPI stays neutral work in an unfair environment, as can happen in multimedia security applications, they with regard to jurisdictional claims in have demonstrated crucial vulnerabilities that a malevolent user could exploit through published maps and institutional affil- the design of ad hoc adversarial manipulations in order to induce the model into a wrong iations. evaluation. Such an incorrect decision might be crucial for the consequent action to be taken. 
In the context of image classification, also focused on in this work, an adversary can control and mislead a deep neural network classifier by introducing a limited malicious Copyright: © 2022 by the authors. perturbation into the input image [1]. Licensee MDPI, Basel, Switzerland. The aforementioned phenomenon has been vastly analyzed with several neural net- This article is an open access article work architectures in multiple tasks. Attacking a deep model seems relatively easy due to distributed under the terms and its differentiability and complexity (many successful adversarial generation approaches ex- conditions of the Creative Commons ist [2–4]), but counteracting and defending from attacks is still an open problem. However, Attribution (CC BY) license (https:// multiple approaches aiming at strengthening the attacked model [5–7] achieve robustness creativecommons.org/licenses/by/ to weak adversaries, but stronger attacks usually can mislead also enhanced ones: ad- 4.0/). Information 2022, 13, 555. https://doi.org/10.3390/info13120555 https://www.mdpi.com/journal/information Information 2022, 13, 555 2 of 12 versarial examples appear to be an intrinsic shortcoming affecting every common deep learning architecture. This work focuses on the phenomenon of adversarial examples against neural or- dinary differential Equation (N-ODE) networks, which represent a recent deep learning model that generalizes deep residual networks through the solution of parametric ODEs. Among its peculiar properties, we are interested in the ability to tune at test time the precision–efficiency trade-off of the network by changing the tolerance of the adaptive ODE solver with respect to that used in the forward computation. Thanks to this property, neural ODE nets exhibited increased robustness to projected gradient descent (PGD) attacks with respect to standard architectures such as ResNets, as evidenced in [8]; higher tolerance values provided increased robustness at a negligible expense to the accuracy of the model. In reference [9], we further investigated how these phenomena occur under a stronger at- tacks, such as the Carlini and Wagner attack. In particular, we tested its performance when the values of the solver tolerance used for the adversarial generation and for the prediction phase are decoupled. Starting from this, ODE solver ’s tolerance has been introduced as a defensive property of neural ODEs against adversarial attacks, and adversarial detection approaches can be designed accordingly. Test-time tolerance randomization is presented as a possible defense approach in image classification benchmarks under the assumption of a zero-knowledge adversary—i.e., the attacker can access the model but does not know about the defense strategy. Moreover, for the sake of completeness, we have also investigated the more general and challenging case of an adaptive attack scenario where the attacker knows that there is a defense procedure based on the ODE solver tolerance, and both the attacker and the defender can play with it. We have specifically analyzed how the attack success rate can vary in different circumstances. 
The contributions of the present work are the following: • We provide a complete study on neural ODE image classifiers and on how their robustness can vary by playing with the ODE solver tolerance against adversarial attacks such as the Carlini and Wagner one; • We demonstrate the defensive properties offered by ODE nets in a zero-knowledge adversarial scenario; • We analyze how the robustness offered by Neural ODE nets varies in the more strin- gent scenario of an active attacker that changes the attack-time solver tolerance. The rest of the paper is organized as follows: Section 2 introduces related work, and Section 3 briefly recalls background knowledge on neural ODE nets and the Carlini and Wagner adversarial attack. In Section 4, the robustness to adversarial samples of neural ODEs in relation to the ODE solver tolerance is debated, and in Section 4.1, we introduce an adversarial detection scheme harnessing this property. Section 5 is dedicated to a novel analysis that takes into account an adaptive attacker and studies the effect of the solver tolerance on the attacker side. In Section 6, we report the implementation details of our experimental evaluation (code and resources to reproduce the experiments presented here are available at https://github.com/fabiocarrara/neural-ode-features/tree/master/ adversarial, accessed on 1 August 2022). Section 7 draws some conclusions and lays out future research directions. 2. Related Work In the scientific literature, the vulnerability to adversarial examples is studied diffusely. The majority of the analyzed deep models focus on deep convolutional network image classifiers [1,10,11] under a variety of attacks, such as PGD [12] or the stronger CW [4]. Defensive methodologies against adversarial samples have been devised specifically for attacks, such as model enhancement via distillation [6] and adversarial sample detection via statistical methods [13] or auxiliary models [14,15]. Among them, the most promising methods are based on the introduction of randomization in the prediction process [16,17]. Feinman et al. [18] proposed a detection scheme based on randomizing the output of the network using dropout. This approach relates to the rationale of our proposed detection Information 2022, 13, 555 3 of 12 method, as both resort to the stochasticity of the output [19]. Not so many works deal with analyzing and defending neural ODE architectures. Our previous works include Carrara et al. [8,9], which analyzes ODE nets under PGD and CW attacks and finds their superior robustness with respect to standard architectures. Such intrinsic resilience is also evidenced in Yan et al. [20], which presents an extended empirical study on this phenomenon and proposes a regularization based on the time-invariance property of steady states of ODE solutions in order to improve robustness. Finally, relevant to our proposed method is also the work of Liu et al. [21] that exploits stochasticity by injecting noise in the ODE to increase robustness to perturbations of initial conditions, including adversarial ones. 3. Background This section is dedicated to introducing the neural ordinary differential equation (N-ODE) networks and the Carlini and Wagner adversarial attack used in the present work. 3.1. The Neural ODE Networks Hereafter, a basic description of neural ODE (ordinary differential equations) is pro- vided; a more detailed discussion can be found in [22]. A neural ODE network is a parametric model which includes an ODE block. 
The computation of such a block is defined by a parametric ordinary differential equation (ODE) whose solution gives the output result. The input of the ODE block is indicated with h , and it coincides with the initial state ODE at time t , as in Equation (1): 0 0 dh(t) = f (h(t), t, q) dt . (1) h(t ) = h 0 0 The function f (), which depends on the parameter q, defines the continuous dynamic of the state h(t). The output of the block h(t ) at a time t > t is obtained by integrating 1 1 0 the ODE (see Equation (2)). Z Z t t 1 1 dh(t) h(t ) = h(t ) + dt = h(t ) + f (h(t), t, q)dt . (2) 1 0 0 dt t t 0 0 The above integral can be computed with standard ODE solvers, such as Runge– Kutta or multi-step methods. Thus, the computation performed by the ODE block can be formalized as a call to a generic ODE solver: h(t ) = ODESolver( f , h(t ), t , t , q) . (3) 1 0 0 1 Generally, in image classification applications, the function f () is implemented by means of a small, trainable convolutional neural network. During training, the gradients of the output h(t ) with respect to the input h(t ) and the parameter q can be obtained using 1 0 the adjoint sensitivity method. One of the more interesting properties shown by ODE networks and determined by their intrinsic structure is definitely the accuracy–efficiency trade-off, which is tunable at inference time by controlling the tolerance parameter t of adaptive ODE solvers. The ODE-Net image classifier we consider in this work (see Figure 1 bottom part, ODE) is constituted by an ODE block (based on Equation (3)) responsible for the whole feature extraction chain. Before this block, a pre-processing stage comprised of a single K-filter 4  4 convolutional layer is inserted; it linearly maps the input image in a proper state space. The f () function in the ODE block is implemented as a standard residual block used in ResNets (described below). After the ODE block, the classification step is implemented with a global average-pooling operation followed by a single fully-connected layer with softmax activation. Information 2022, 13, 555 4 of 12 Residual Network (RES) x6 Classifier … Avg.Pool + 10-d FC + Softmax Neural ODE Network (ODE) Classifier ODE Block Avg.Pool + 3x3, K 10-d FC + 3x3, K Softmax Figure 1. Convolutional layers are written in the format kernel width  kernel height [/ stride], n. filters; padding is always set to 1. For MNIST, K = 64, and for CIFAR-10, K = 256. In addition to this, we consider also a standard ResNet (Figure 1 top part, RES) as baseline [22] for comparison with the ODE-Nets. It is composed of a 64-filter 3  3 convo- lutional layer and 8 residual blocks. Each residual block follows the standard formulation defined in [23], where group normalization [24] is used instead of batch one. The sequence of layers comprising a residual block is GN-ReLU-Conv-GN-ReLU-Conv-GN, where GN stands for group normalization with 32 groups, and Conv is a 3  3 convolutional layer. The first two blocks downsample their input by a factor of 2 using a stride of 2, and the subsequent blocks maintain the input dimensionality. Only the first block uses 64-filters convolutions, and the subsequent ones employ K-filter convolutions, where K varies with the specific dataset. The final classification step is the same as before. 3.2. The Carlini and Wagner Attack This section briefly introduces the Carlini and Wagner (CW) attack [4] that has been used in our work to test and evaluate the robustness of the ODE-Net to adversarial samples. 
The CW attack is currently considered one of the strongest available adversary techniques with which to attack neural networks designed for the image classification task. Among the three existing versions (different metrics used to measure the perturbation), we have considered the CW-L , which is formalized as in Equation (4): adv adv min c g x + x x (4) with adv adv adv g(x ) = max max Z(x ) Z(x ) ,k (5) i t i6=t tanh(w) + 1 adv x = , (6) adv where g() is the objective function (misclassification), x is the adversarial example in the pixel space, and w is its counterpart in the tanh space in which the optimization is carried out. Z() are the logits of a given input, t is the target class, k is a parameter that allows adjusting the confidence with which the misclassification occurs, and c is a positive constant whose value is set by exploiting a binary search procedure. The rationale behind the attack is to minimize at each iteration the highest confidence among non-target classes (first term of Equation (4)) while retaining the smallest possible perturbation (second term). It is worth mentioning the use of the term tanh(w) that represents a change in variable that allows one to move from the pixel to the tanh space. This helps regularize the image image 4x4 / 2, K 3x3, 64 3x3 / 2, 64 3x3, 64 3x3 / 2, K 3x3, K 3x3, K 3x3, K 3x3, K 3x3, K class class Information 2022, 13, 555 5 of 12 gradient in extremal regions of the perturbation space, thereby facilitating optimization with gradient-based optimizers. 4. Robustness via Tolerance Variation Though ODE-Nets are very promising and perform well, they are vulnerable to the same attacks as the standard networks. However, one of their properties, namely, the ability to change the ODE solver tolerance t at prediction time, is demonstrated to provide some degree of robustness against basic adversarial attacks [8]. Changing the tolerance value of an adaptive ODE solver causes the solver to adopt different step sizes during the computation of the ODE solution, and this leads to a perturbation of the forward pass that increases the adversarial robustness. Such property is observed even under the CW attack, which represents one of the strongest adversarial algorithms to fool neural networks in image classification specifically. To prove this, a neural ODE model trained on the MNIST and CIFAR-10 datasets, two well-known 10-class image classification benchmarks, has been considered. We used the train split (50k images) of each dataset to train the model and half of the test split (5k images) to generate adversarial samples with the CW attack (examples are reported in Figure 2). The training procedure was performed once with a fixed tolerance value, and we considered multiple values of the tolerance when doing inference and generating adversarial samples. MNIST CIFAR-10 Figure 2. Adversarial examples found with the Carlini and Wagner attack on our neural ODE network on MNIST and CIFAR-10 datasets. Adversarial perturbations (Diff.) of CIFAR-10 samples have been amplified by a factor 10 for visualization purposes. In Table 1, we report the classification error on the test subset, the attack success rate (in percentage), and the mean L norm of the adversarial perturbation for each tested tolerance value on the two datasets; just for comparison, the results obtained by a standard residual network classifier were inserted. Details on the datasets, models, and adversary generation are available in Section 6. 
Note that the basic behavior of both the standard residual (RES) and ODE-Net (ODE) models is similar: they show a limited error rate on original images, but, on the contrary, the CW attack achieves a very high attack success rate. However, in ODE-Nets, the value of the ODE solver tolerance τ plays an essential role in determining the success rate of an attack; when we increase the value of the tolerance τ used at test time and by the attacker (τ_test = τ_attack), the classification error rate remains rather stable, but the required attack budget increases. This is quite clear for the MNIST dataset, where the attack success rate quickly decreases, but it is also appreciable for CIFAR-10 when looking at the mean perturbation introduced by the attack. Though the attack success rate remains 100%, an increasing cost is paid in terms of applied distortion. While this again witnesses the strength of the CW attack, it also confirms that the sensitivity to tolerance variations, previously found in the case of the projected gradient descent (PGD) attack [8], is also shown under the CW attack, suggesting that this is a more general defensive property of ODE-Nets.

Table 1. Classification error (Err, %), Carlini and Wagner attack success rate (ASR, %), and mean L2-norm perturbation (Pert) of RES and ODE on the MNIST and CIFAR-10 test sets; obviously, only for ODE are the quantities listed while varying the test-time adaptive solver tolerance τ (τ_attack = τ_test).

                    MNIST                               CIFAR-10
                    Err (%)   ASR (%)   Pert (10⁻²)     Err (%)   ASR (%)   Pert (10⁻⁵)
RES                 0.4       99.7      1.1             7.3       100       2.6
ODE, τ = 10⁻⁴       0.5       99.7      1.4             9.1       100       2.2
ODE, τ = 10⁻³       0.5       90.7      1.7             9.2       100       2.4
ODE, τ = 10⁻²       0.6       74.4      1.9             9.3       100       4.1
ODE, τ = 10⁻¹       0.8       71.6      1.7             10.6      100       8.0
ODE, τ = 10⁰        1.2       69.7      1.9             11.3      100       13.7

Intuition suggests that introducing a decoupling (τ_attack ≠ τ_test) between attacker and defender should increase robustness. To verify this hypothesis, we generated adversarial samples by setting a fixed tolerance τ_attack and measured the model's accuracy when varying the test-time tolerance τ_test. Considering a zero-knowledge scenario, the attack tolerance τ_attack = τ_train was taken as the best choice the attacker can make. At prediction time, the tolerance was drawn from a log-uniform distribution over the interval [10⁻⁵, 10⁻¹], centered at τ_train = 10⁻³; 20 values were sampled for each image to be classified.
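Assuming the hypothetical `ODEBlock` interface sketched earlier (its `tol` attribute standing in for the solver tolerance) and a set of CW adversarials pre-computed at τ_attack = τ_train, the randomized test-time evaluation just described could be implemented roughly as follows; function and variable names are illustrative.

```python
import math
import random
import torch

def accuracy_under_random_tolerance(model, ode_block, images, labels,
                                    low=1e-5, high=1e-1, samples_per_image=20):
    """Classify each image `samples_per_image` times, each time with a test-time
    tolerance drawn log-uniformly from [low, high], and return the mean accuracy."""
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for x, y in zip(images, labels):
            for _ in range(samples_per_image):
                # log-uniform sampling of the test-time tolerance tau_test
                log_tau = random.uniform(math.log10(low), math.log10(high))
                ode_block.tol = 10 ** log_tau
                pred = model(x.unsqueeze(0)).argmax(dim=1).item()
                correct += int(pred == y)
                total += 1
    return correct / total
```

Running this routine separately on pristine images and on the CW adversarials, and binning the sampled tolerances in log space, yields the two families of curves (natural vs. adversarial) discussed next.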
Figure 3 reports the accuracy of the ODE-Net classifier on original inputs (blue lines) and adversarial examples (orange lines) for the MNIST and CIFAR-10 datasets; the tolerance on the x-axis is binned (21 bins) in the log space. It is evident that the accuracy on natural inputs (blue lines) is always stable and very high for every tolerance value, on average around the original network accuracy (100% for MNIST and 90% for CIFAR-10); this means that varying the tolerance does not significantly affect the accuracy on standard pristine samples. On the contrary, the accuracy on CW-crafted adversarial inputs (orange lines) is quite poor (which again demonstrates the power of this technique), but it is very interesting to note that in the central bin (around τ_train = 10⁻³) the attack has the highest effectiveness; this seems to mean that when the tolerance at test time coincides with that adopted by the CW attacker, the classifier is strongly induced to misclassify. Furthermore, it can also be appreciated that if τ_test is moved away from the central value used by the CW attacker, the accuracy increases. This means that changes in the tolerance can provide robustness against the CW attack, achieving, for instance, an accuracy on adversarial inputs of about 60% (with a corresponding accuracy of around 90% on original images) for the CIFAR-10 dataset (see Figure 3, extreme right). Finally, it is worth observing that the growth trend of the orange lines is asymmetric with respect to the central value τ_train = 10⁻³, with higher values achieved on the right side: this shows, as generally expected, that increasing the tolerance permits one to gain in robustness.

Figure 3. Accuracy vs. test-time solver tolerance τ_test (x-axis: log10(τ_test); y-axis: accuracy). For each image, we sampled 20 values of τ_test from a log-uniform distribution within the [10⁻⁵, 10⁻¹] interval. We report the mean accuracy of the ODE-Net classifier on natural and adversarial examples for each tolerance bin (in log space; points' x-coordinates indicate the bin centers).

4.1. Defensive Tolerance Randomization

In light of these findings, we exploited tolerance variation as an active measure against adversarial attacks and measured to what extent this defensive property can be effective in an adversarial detection scenario, where the defender is asked to discern adversarial samples from authentic ones. We considered a white-box attack scenario in which the attacker has access to the trained ODE-Net and knows the parameter settings of the classifier, specifically the solver tolerance τ_train used during network training. An attack is successful if the CW algorithm finds an adversarial perturbation leading to a misclassification without exceeding a prefixed attack budget, defined as the maximum number of optimization iterations.

Accordingly, we have introduced an adversarial detection strategy based on ODE-Net tolerance randomization, which collects several predictions obtained with different randomly drawn test-time tolerance parameters τ_test in order to detect whether the classification system is being subjected to an adversarial sample. τ_test is sampled uniformly from a range centered on τ_train, such that τ_train = τ_attack ≠ τ_test. Introducing such variability also helps the defender against knowledgeable adversaries, as simply changing τ_test to a different fixed value could easily be counteracted by the adversary changing τ_attack to the new value. By indicating with V the number of voting members (i.e., the number of τ_test values randomly drawn) belonging to the ensemble, we declare that an adversarial sample is detected if v_agree < v_min, where v_agree is the largest number of members that have reached the same decision on the test image (the size of the majority), and v_min is the minimum consensus threshold required for assessing the authenticity (non-maliciousness) of the input.
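The voting rule just defined translates directly into code. The sketch below reuses the same hypothetical `ODEBlock` interface as the previous snippets; the width of the sampling range (the `spread` factor) is not specified in the text and is an assumption here.

```python
import math
import random
from collections import Counter
import torch

def tolerance_randomization_detector(model, ode_block, x,
                                     tau_train=1e-3, spread=10.0,
                                     V=20, v_min=15):
    """Return (is_authentic, majority_class).

    The input is classified V times, each time with a test-time tolerance drawn
    log-uniformly around tau_train; if fewer than v_min members agree on the
    same class, the input is flagged as adversarial."""
    votes = []
    with torch.no_grad():
        for _ in range(V):
            log_tau = random.uniform(math.log10(tau_train / spread),
                                     math.log10(tau_train * spread))
            ode_block.tol = 10 ** log_tau            # tau_test != tau_attack (almost surely)
            votes.append(model(x.unsqueeze(0)).argmax(dim=1).item())
    majority_class, v_agree = Counter(votes).most_common(1)[0]
    is_authentic = v_agree >= v_min                  # adversarial if v_agree < v_min
    return is_authentic, majority_class
```

Sweeping v_min from 1 to V for a fixed ensemble size V traces the ROC curves discussed below.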
The performance of this adversarial detection scheme is depicted in Figure 4. We can see that, once the size V of the voting ensemble is established (different colored lines), ROC curves in terms of TPR versus FPR can be obtained by varying the threshold v_min with a step of 1, where a true positive indicates the correct classification of a natural input. Such graphs demonstrate that high TPRs can be registered in correspondence with limited FPRs. This is particularly visible for the MNIST dataset (see Figure 4a), but it still holds for CIFAR-10; if, for example, we refer to Figure 4b with V = 20 (purple line), by increasing the value of v_min (going down along the curve), we can reduce the FPR while maintaining a high TPR: with v_min = 20, a TPR of 92% and a corresponding FPR of 15% are achieved (see the bottom-left corner of Figure 4b). This experiment demonstrates that, if the ODE-Net is subjected to a zero-knowledge Carlini and Wagner attack in a white-box scenario, resorting to test-time tolerance randomization makes it possible both to preserve the classification performance on natural images and to significantly reduce the capacity of the CW attack to fool the ODE classifier, at the expense of performing multiple inferences.

Figure 4. Analysis of the detection performance with the randomized tolerance ensemble on (a) MNIST and (b) CIFAR-10 (axes: TPR vs. FPR; one curve per ensemble size V). ROC curves (TPR vs. FPR, where TP = "correctly detected natural input" and FP = "adversarial input misdetected as natural") are obtained by varying the minimum majority size v_min, i.e., if the number of majoritarian votes v_agree in the ensemble is greater than v_min, the input is considered authentic (positive), and otherwise adversarial (negative).

5. Robustness under Adaptive Attackers

To date, we have studied tolerance variation for defensive purposes under the assumption of a non-adaptive adversary. In this section, we extend the analysis described in Sections 4 and 4.1 by also exploring the effect of the attacker's tolerance when generating adversarial samples. Like the defender, the attacker can vary the solver's tolerance to generate malicious samples. While we observed that the defender should set τ_test to a value away from τ_attack, the attacker instead aims at setting τ_attack = τ_test, where he is certain about the success of the attack. We explored how the CW attack success rate varies over the (τ_attack, τ_test) space. Instead of a randomized exploration of this space, we performed a logarithmic grid sampling of tolerance values by setting τ = 10^i, i ∈ {−4, −3, −2, −1, 0}, independently for τ_attack and τ_test. As in previous sections, CW attacks were performed on our trained neural ODE classifiers using the first halves of the MNIST and CIFAR-10 test sets with the solver's tolerance set to τ_attack.
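A rough sketch of this grid exploration and of the outcome bookkeeping is given below; `craft_cw`, as well as the other helper names, are hypothetical placeholders rather than the actual attack code (which relies on Foolbox; see Section 6.3).

```python
import torch

TAUS = [10.0 ** i for i in range(-4, 1)]    # {1e-4, 1e-3, 1e-2, 1e-1, 1e0}

def outcome(model, ode_block, x_adv, true_label, attacker_target, tau_test):
    """Classify one adversarial sample under a given test-time tolerance and report
    whether the attack succeeded, changed class, or was recovered."""
    ode_block.tol = tau_test
    with torch.no_grad():
        pred = model(x_adv.unsqueeze(0)).argmax(dim=1).item()
    if pred == attacker_target:
        return "successful"                  # fooled as the attacker intended (red)
    if pred == true_label:
        return "recovered"                   # failed attack (green)
    return "successful_changed"              # still misclassified, but differently (yellow)

def tolerance_grid(model, ode_block, dataset, craft_cw):
    """Sweep (tau_attack, tau_test) and count outcomes for each cell of the grid."""
    counts = {}
    for tau_attack in TAUS:
        ode_block.tol = tau_attack
        adversarials = [craft_cw(model, x, y) for x, y in dataset]   # (x_adv, target) pairs
        for tau_test in TAUS:
            cell = {"successful": 0, "recovered": 0, "successful_changed": 0}
            for (x, y), (x_adv, target) in zip(dataset, adversarials):
                if x_adv is None:            # attack failed to find a perturbation: discard
                    continue
                cell[outcome(model, ode_block, x_adv, y, target, tau_test)] += 1
            counts[(tau_attack, tau_test)] = cell
    return counts
```

Normalizing each cell by the number of retained samples gives the percentages reported in the next paragraph.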
We report results in Figure 5, where for each (τ_attack, τ_test) couple, we show the percentage of successful attacks (adversarial samples that fooled the network as intended, in red), the percentage of failed attacks (adversarial samples that were "recovered", i.e., correctly classified by the network, in green), and the percentage of successful but changed attacks (adversarial samples that were still misclassified but were assigned a class different from the one the attacker intended, in yellow). For this experiment, we ignored samples on which the attack failed to generate an adversarial perturbation in the first place, i.e., we discarded samples for which no adversarial sample could be generated when τ_test = τ_attack, thus having an attack success rate of 100% in the diagonal entries. This permits focusing only on the effect of tolerance decoupling on the attack success rate while discarding the contributions to robustness already studied and presented in Table 1. It is quite evident that tolerance decoupling between attack and defense can be disruptive for an attacker. For instance, it led to an attack failure rate of up to 78.3% for MNIST when (τ_test, τ_attack) = (10⁰, 10⁻³), and of 66.2% for CIFAR-10 when (τ_attack, τ_test) = (10⁻³, 10⁰). In general, higher values of recovery tend to be concentrated where the discrepancy between τ_test and τ_attack is maximum, but this trend seems to saturate as this discrepancy decreases.

Figure 5. Attack success rate (in red) and recovery rate (in green) when varying τ_attack (on the y-axis) and τ_test (on the x-axis) for (a) MNIST and (b) CIFAR-10; both axes span {10⁻⁴, …, 10⁰}. In yellow, the percentage of classifications that changed with respect to the one induced by the attack but are still adversarial.

6. Experimental Details

This section reports the implementation details of the experiments described in the previous sections of the paper.

6.1. Datasets: MNIST and CIFAR-10

All the models used in this analysis were trained on two standard and well-known image classification benchmarks: MNIST [25] and CIFAR-10 [26]. MNIST is composed of 60,000 grayscale images subdivided into training (50,000) and testing (10,000) sets; images are 28 × 28 pixels and represent hand-written digits (from 0 to 9, so it consists of 10 classes). MNIST is substantially the de facto standard baseline for novel machine learning algorithms and is nearly the only dataset used in research concerning ODE networks. The second dataset taken into account in our analysis was CIFAR-10; it is a 10-class image classification dataset too, comprised of 60,000 RGB images (32 × 32 pixels) of common objects subdivided into training/testing sets (50,000/10,000).

6.2. The Training Phase

Both considered models, RES and ODE, apply dropout before the fully-connected classifier with a drop probability of 0.5, and the SGD optimizer has a momentum of 0.9; the weight decay is 10⁻⁴, the batch size is 128, and the learning rate is 10⁻¹, reduced by a factor of 10 every time the error plateaus. The number of filters K in the internal blocks is set differently for each dataset: 64 for MNIST and 256 for CIFAR-10.
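For reference, a minimal PyTorch sketch of this optimization setup is shown below; the choice of `ReduceLROnPlateau` is our assumption for "reduced by a factor of 10 every time the error plateaus", and model construction and data loading are omitted.

```python
import torch

def build_optimizer_and_scheduler(model):
    # SGD with momentum 0.9, weight decay 1e-4, and an initial learning rate of 0.1
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Divide the learning rate by 10 when the monitored error stops improving
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                           factor=0.1)
    return optimizer, scheduler

# Batch size 128; dropout with p = 0.5 is applied before the fully-connected classifier,
# e.g. nn.Sequential(nn.Dropout(p=0.5), nn.Linear(K, 10)).
BATCH_SIZE = 128
```

The scheduler's `step()` would be called with the validation error at the end of each epoch.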
For the ODE-Net model, containing the ODE block, we used the Dormand–Prince variant of the fifth-order Runge–Kutta ODE solver (as implemented in https://github.com/rtqichen/torchdiffeq, accessed on 1 August 2022); in such an algorithm, the step size is adaptive and can be controlled by a tolerance parameter τ (τ_train = 10⁻³ was used in our experiments during the training phase). The value of τ constitutes a threshold for the maximum absolute and relative error (estimated using the difference between the fourth-order and the fifth-order solutions) tolerated when performing an integration step; if the step error exceeds τ, the integration step is discarded and the step size decreased. For both models, RES and ODE, the achieved classification performances are comparable with the current state of the art on the MNIST and CIFAR-10 datasets (see Table 1).

6.3. Carlini and Wagner Attack Implementation Details

The CW attack was implemented by resorting to Foolbox 2.0 [27] on PyTorch models. We adopted Adam to optimize Equation (4), setting the maximum number of iterations to 100 and performing 5 binary search steps to tune c starting from 10. A learning rate of 0.05 was used for MNIST and of 0.01 for CIFAR-10. The first 5000 images of each test set were selected as the original samples to be perturbed, discarding the images naturally misclassified by the classifier.

7. Conclusions and Future Works

In this paper, we have presented an analysis of the robustness of neural ODE image classifiers in an uncontrolled environment; in particular, the behavior of N-ODE nets against the Carlini and Wagner (CW) attack was studied. The CW attack was considered as it is one of the best-performing adversarial attacks for the image classification task. Furthermore, we have focused on how the tolerance parameter of the adaptive ODE solver, which is generally used in neural ODE networks to tune the computational precision–efficiency trade-off, can affect the robustness against such attacks. We have observed that changing the tolerance used during the prediction phase with respect to that used when generating adversarial inputs tends to undermine attacks while maintaining high accuracy on pristine samples. Accordingly, we have proposed using the tolerance as a defensive property of neural ODE nets and demonstrated that this is possible by introducing a novel adversarial detection strategy for ODE nets based on tolerance randomization and a majority voting ensemble scheme.

Our evaluation, performed on two standard image classification benchmarks (MNIST and CIFAR-10), has shown that our simple detection technique can reject roughly 80% of strong CW adversarial examples while retaining more than 90% of the original samples under white-box attacks and zero-knowledge adversaries. We have also hypothesized that, to overcome our method, the adversary would require a high attack budget to attack a wide range of tolerance values and distill them into a unique malicious input.

We have also explored the defensive properties of tolerance variation in the scenario with adaptive adversaries and shown that the simple decoupling of attack and test tolerances, without any additional defensive procedure, increases adversarial robustness up to roughly 78% and 66% for the MNIST and CIFAR-10 datasets, respectively. Future works will be dedicated to gathering deeper insights into the relationship between attacker and defender tolerance settings by exploring the tolerance space on a finer scale. In addition, we would be interested in investigating the dynamic scenario in which both the attacker and the defender try to adapt to each other, and in analyzing it in a game-theoretic framework.

Author Contributions: Conceptualization, F.C., R.C., F.F. and G.A.; methodology, R.C., F.F. and G.A.; software, F.C.; investigation, F.C. and R.C.; writing—original draft preparation, F.C., R.C.
and F.F.; writing—review and editing, F.C., R.C., F.F. and G.A.; supervision, F.F. and G.A. All authors have read and agreed to the published version of the manuscript.

Funding: This work was partially funded by Tuscany POR FSE 2014-2020 AI-MAP (CNR4C program, CUP B15J19001040004), the AI4EU project (EC, H2020, n. 825619) and the AI4Media Project (EC, H2020, n. 951911).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Code and resources to reproduce the experiments presented here are available at https://github.com/fabiocarrara/neural-ode-features/tree/master/adversarial (accessed on 1 August 2022).

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. arXiv 2014, arXiv:1312.6199.
2. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2015, arXiv:1412.6572.
3. Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. Deepfool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582.
4. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57.
5. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial Examples in the Physical World; Chapman and Hall: London, UK, 2017.
6. Papernot, N.; McDaniel, P.D.; Wu, X.; Jha, S.; Swami, A. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597. [CrossRef]
7. Wu, J.; Xia, Z.; Feng, X. Improving Adversarial Robustness of CNNs via Maximum Margin. Appl. Sci. 2022, 12, 7927. [CrossRef]
8. Carrara, F.; Caldelli, R.; Falchi, F.; Amato, G. On the robustness to adversarial examples of neural ODE image classifiers. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS), Delft, The Netherlands, 9–12 December 2019; pp. 1–6.
9. Carrara, F.; Caldelli, R.; Falchi, F.; Amato, G. Defending Neural ODE Image Classifiers from Adversarial Attacks with Tolerance Randomization. In Proceedings of the Pattern Recognition—ICPR International Workshops and Challenges, Virtual Event, 15–20 January 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 425–438.
10. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 372–387.
11. Kurakin, A.; Goodfellow, I.J.; Bengio, S.; Dong, Y.; Liao, F.; Liang, M.; Pang, T.; Zhu, J.; Hu, X.; Xie, C.; et al. Adversarial Attacks and Defences Competition. In The NIPS '17 Competition: Building Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2018.
12. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2018, arXiv:1706.06083.
13. Grosse, K.; Manoharan, P.; Papernot, N.; Backes, M.; McDaniel, P.D. On the (Statistical) Detection of Adversarial Examples. arXiv 2017, arXiv:1702.06280.
14. Metzen, J.H.; Genewein, T.; Fischer, V.; Bischoff, B. On Detecting Adversarial Perturbations. arXiv 2017, arXiv:1702.04267.
15. Carrara, F.; Becarelli, R.; Caldelli, R.; Falchi, F.; Amato, G. Adversarial examples detection in features distance spaces. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
16. Taran, O.; Rezaeifar, S.; Holotyak, T.; Voloshynovskiy, S. Defending against adversarial attacks by randomized diversification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11226–11233.
17. Barni, M.; Nowroozi, E.; Tondi, B.; Zhang, B. Effectiveness of random deep feature selection for securing image manipulation detectors against adversarial examples. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 2977–2981.
18. Feinman, R.; Curtin, R.R.; Shintre, S.; Gardner, A.B. Detecting Adversarial Samples from Artifacts. arXiv 2017, arXiv:1703.00410.
19. Carlini, N.; Wagner, D. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; ACM: New York, NY, USA, 2017; pp. 3–14. [CrossRef]
20. Hanshu, Y.; Jiawei, D.; Vincent, T.; Jiashi, F. On robustness of neural ordinary differential equations. arXiv 2019, arXiv:1910.05513.
21. Liu, X.; Xiao, T.; Si, S.; Cao, Q.; Kumar, S.; Hsieh, C.J. Stabilizing Neural ODE Networks with Stochasticity. 2019. Available online: https://openreview.net/forum?id=Skx2iCNFwB (accessed on 1 August 2022).
22. Chen, T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2018; pp. 6572–6583.
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
24. Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
25. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [CrossRef]
26. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
27. Rauber, J.; Brendel, W.; Bethge, M. Foolbox: A Python toolbox to benchmark the robustness of machine learning models. arXiv 2017, arXiv:1707.04131.
