Problem and challenges. The standard setting in ML assumes centralized datasets that are tightly integrated into the system. In most real-world scenarios, however, data is distributed among multiple entities. Centralized data collection is challenging for several reasons: the high communication cost of transferring data when devices generate large volumes of it, serious privacy concerns raised by sharing sensitive data, and overfitting and bias problems that arise when only small, local datasets are available. As a solution, federated training has been proposed, in which users and a server collaborate to train a single neural network model. This ML approach was formally published by Google in 2016 as Federated Learning (FL). In short, FL is a distributed learning concept in which end devices, or workers, participate in the learning process. A central entity, the parameter server, shares the training model and aggregates the local model updates coming from the workers. Workers train the shared model locally on their own data and send the trained model back to the central server. The central server aggregates the received models and shares the aggregated model with the workers. Ideally, the final model should be as good as the centralized solution, or at least better than what each party could learn on its own. FL typically brings advantages in terms of improved privacy awareness, low communication overhead, and low latency. Most importantly, FL is well suited to distributed networking scenarios in more complex networks. However, FL is vulnerable to poisoning attacks by design (Figure 21). The central server can be poisoned by as few as one adversarial worker, which affects the learning process of the entire network. The problem is that the central server cannot guarantee that the workers provide accurate local models and has no control over the level of security at each worker. Another issue is the possibility of a single point of failure at the central server. It is therefore necessary to implement defence mechanisms at the central server to distinguish poisonous from honest workers. This is challenging because the central server has no validation data with which to verify the model updates received from the workers.
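To make the round structure described above concrete, the following is a minimal sketch of federated averaging (local training followed by weighted server-side aggregation), assuming a toy linear model trained with NumPy. The worker datasets, hyperparameters, and function names are illustrative assumptions, not part of the grant text.

```python
# Minimal FedAvg-style sketch (illustrative only): the worker data, the simple
# linear model, and all names below are assumptions, not part of the source.
import numpy as np

rng = np.random.default_rng(0)


def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """Worker-side step: train the shared model on local data and return it."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w


def aggregate(worker_weights, worker_sizes):
    """Server-side step: weighted average of the local models (FedAvg)."""
    total = sum(worker_sizes)
    return sum(w * (n / total) for w, n in zip(worker_weights, worker_sizes))


# Hypothetical local datasets for three workers (never shared with the server).
true_w = np.array([2.0, -1.0])
workers = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    workers.append((X, y))

global_w = np.zeros(2)
for _ in range(10):  # federated rounds
    updates = [local_update(global_w, X, y) for X, y in workers]
    global_w = aggregate(updates, [len(y) for _, y in workers])

print("aggregated model:", global_w)  # should approach true_w
```

Note that the plain averaging step in this sketch is exactly where the poisoning vulnerability discussed above enters: a single worker returning an adversarial update shifts the aggregated model, which is why the defence mechanisms mentioned in the text would have to replace or filter this aggregation step at the server.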