Article from Paul Wu - June 12th 2022
In today’s world, large models with billions of parameters trained on terabytes of datasets have become the norm as language models are the foundations of natural language processing (NLP) applications. Several of these language models used in commercial products are also being trained on private information. An example would be Gmail’s auto-complete model. Its model is trained on the private communication that occurs amongst users, which contains sensitive information such as users’ names, SSN, and credit card information.
Why should businesses be concerned? Well, with great power comes great risks. The lack of awareness of the privacy risks of language models and protection measures can result in data breaches and cause life-long damage to a company’s reputation.
Note to the reader: to read the full article go to: https://www.private-ai.com/2022/06/17/the-privacy-risk-of-language-models/
Defining data privacy attacks
When understanding language modeling risks, it is important to note the different types of attacks. Firstly, we have membership inferences. A membership inference is an attack where the adversary can predict whether or not an example was used to train the model. Whereas, a training data extraction attack is more dangerous since it aims to reconstruct the training data points from the model output and attackers can extract private information memorized by the model, by crafting attack queries.
Then there is the black-box attack. In a black-box attack, the attacker can only see the output of the model on arbitrary input, but can’t access the model parameters (see the figure below). This is the setting of machine learning as a service platform. An example would be querying the google translate engine with any English text they want and observing the French translation.
Lastly, there is the white-box attack. In a white-box attack, the attacker has access to the full model, including the model architecture and parameters that are needed to use the model for predictions. Thus, one can also observe the intermediate computations at hidden layers, as shown below.