Developing Substitute Models. The aim here is to train substitute models that approximate O (the oracle). It is well-established that a model can be evaded by using different model architectures and a dataset independent of it [207]. As detailed in Section 4.4.5, we particularly favor an ensemble of substitute models that is as diverse as possible, to increase the chances of finding highly transferable adversarial examples that will transfer to the oracle. For each attacker, the procedure for constructing the substitute models differs according to their capabilities.

The black-box attacker has no capabilities beyond observing the predictions for their queries. Therefore, under this scenario, we develop a synthetic dataset (∆) that approximates the input-output relations of the oracle. To develop ∆, we use a set Btrain (containing benign and malware input samples) and query O with each input sample X ∈ Btrain, recording the predicted output. For example, if O(X) = 0, then the input-output relation X ↦ 0 is stored in ∆. ∆ is then used to train the ensemble of diverse substitute models (Σ) for our attack strategy. This provides white-box access to models that approximate the oracle.

Prior work in other domains has suggested that substitute models require a large training dataset [157] and that, while impractical, an attacker could create an exact replica of the oracle with close to infinite queries. However, we show later that a high evasion rate can be achieved with a small set of input samples (i.e., |Btrain| ≤ 100). With fewer queries to the oracle, an attacker can remain stealthier, with a lower chance of their malicious behavior being detected [52, 89]. Note that for the other attack strategies that we test (the Single DNN [157] and Ensemble DNN [131] strategies), ∆ is used to train the substitute model(s) according to their respective procedures.

In contrast to the black-box attacker scenario, the gray-box attacker has access to the training data of the defense.
Therefore, we train an ensemble of diverse substitute models (Σ) using this training data. The advantage is that no queries to the oracle are necessary, reducing the time cost and the risk of the adversarial behavior being detected. However, as we demonstrate later in Section 4.6, using the training data of the target models may not always lead to the best attack performance. Because they are based on direct queries to each oracle, the substitute models developed by the black-box attacker may reflect its characteristics and behavior...
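As a concrete illustration, the black-box procedure described above (query O over Btrain to build ∆, then train a diverse substitute ensemble Σ) might be sketched as follows. The toy oracle, feature dimensionality, and the specific scikit-learn model choices are all assumptions for illustration, not the exact setup of our attack:

```python
# Sketch of the black-box procedure: query the oracle O on B_train to build
# the synthetic dataset Delta, then train a diverse substitute ensemble Sigma.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def build_delta(oracle, b_train):
    """Record the input-output relation X -> O(X) for each query."""
    return [(x, oracle(x)) for x in b_train]

def train_substitute_ensemble(delta):
    """Train architecturally diverse substitute models on Delta,
    giving white-box access to approximations of the oracle."""
    X = np.array([x for x, _ in delta])
    y = np.array([label for _, label in delta])
    sigma = [
        DecisionTreeClassifier(max_depth=5, random_state=0),
        LogisticRegression(max_iter=1000),
        KNeighborsClassifier(n_neighbors=3),
    ]
    for model in sigma:
        model.fit(X, y)
    return sigma

# Toy oracle standing in for O: flags a feature vector as malware (1)
# when its feature sum exceeds a threshold (purely illustrative).
toy_oracle = lambda x: int(x.sum() > 2.0)

rng = np.random.default_rng(0)
b_train = [rng.random(4) for _ in range(100)]  # small query budget: |B_train| <= 100
delta = build_delta(toy_oracle, b_train)
sigma = train_substitute_ensemble(delta)
```

For the gray-box attacker, only the second step is needed: the same ensemble training would be applied directly to the defense's training data, with no queries to the oracle.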
