New method efficiently safeguards sensitive AI training data | MIT News

Data privacy comes with a cost. There are security techniques that protect sensitive user data, such as customer addresses, from attackers who may try to extract it from AI models, but these techniques often make those models less accurate.

MIT researchers recently developed a framework, based on a new privacy metric called PAC Privacy, that can maintain the performance of an AI model while ensuring that sensitive data, such as medical images or financial records, remain safe from attackers. Now, they have taken this work further by making their technique more computationally efficient, improving the tradeoff between accuracy and privacy, and creating a formal template that can be used to privatize virtually any algorithm without needing access to its inner workings.

The team used their new version of PAC Privacy to privatize several classic algorithms for data analysis and machine-learning tasks.

They also showed that more “stable” algorithms are easier to privatize with their method. A stable algorithm’s predictions remain consistent even when its training data are slightly modified. Greater stability helps an algorithm make more accurate predictions on previously unseen data.

The researchers say the enhanced efficiency of the new PAC Privacy framework, and the four-step template one can follow to implement it, will make the technique easier to deploy in real-world situations.

“We tend to think of robustness and privacy as unrelated to, or perhaps even in conflict with, building a high-performance algorithm. First, we make a working algorithm, then we make it robust, and then private,” says lead author Mayuri Sridhar, an MIT graduate student.

She is joined on the paper by Hanshen Xiao PhD ’24, who will start as an assistant professor at Purdue University in the fall, and senior author Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering at MIT. The research will be presented at the IEEE Symposium on Security and Privacy.

Estimating noise

To protect the sensitive data used to train an AI model, engineers often add noise, or generic randomness, to the model so it becomes harder for an adversary to guess the original training data. This noise reduces a model’s accuracy, so the less noise one can add, the better.
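As a rough illustration of this noise-versus-accuracy tradeoff (a toy sketch, not the researchers’ code; the model, dataset, and noise scales are arbitrary choices), the snippet below perturbs a trained classifier’s weights with Gaussian noise of increasing scale and reports how test accuracy degrades:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative only: train a simple classifier, then perturb its weights with
# Gaussian noise and watch test accuracy fall as the noise scale grows.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
w, b = model.coef_.ravel(), model.intercept_[0]
rng = np.random.default_rng(0)

for sigma in (0.0, 0.1, 0.5, 1.0):
    w_noisy = w + rng.normal(0.0, sigma, size=w.shape)
    b_noisy = b + rng.normal(0.0, sigma)
    preds = (X_te @ w_noisy + b_noisy > 0).astype(int)
    print(f"sigma={sigma}: test accuracy = {np.mean(preds == y_te):.3f}")
```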

PAC Privacy automatically estimates the smallest amount of noise one needs to add to an algorithm to achieve a desired level of privacy.

The original PAC Privacy algorithm runs a user’s AI model many times on different samples of a dataset. It measures the variance as well as the correlations among these many outputs and uses this information to estimate how much noise needs to be added to protect the data.
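In spirit, that procedure might look like the following sketch (a simplification under assumptions, not the authors’ implementation; the subsample size, trial count, and `noise_scale` calibration are placeholders for the quantities PAC Privacy derives from the desired privacy level):

```python
import numpy as np

def pac_style_noisy_output(algorithm, data, n_trials=100, noise_scale=1.0, seed=0):
    """Simplified sketch of the PAC Privacy idea: estimate how much an
    algorithm's output varies across subsamples of the data, then add
    Gaussian noise shaped by that variability. Not the authors' code;
    noise_scale stands in for the calibration tied to the privacy target."""
    rng = np.random.default_rng(seed)
    n = len(data)
    outputs = []
    for _ in range(n_trials):
        # Run the algorithm on a random subsample of the dataset.
        idx = rng.choice(n, size=n // 2, replace=False)
        outputs.append(np.asarray(algorithm(data[idx])))
    outputs = np.stack(outputs)

    # Full output covariance: the expensive step in the original framework.
    cov = np.cov(outputs, rowvar=False)

    # Release the output on the full data plus correlated Gaussian noise.
    result = np.asarray(algorithm(data))
    noise = rng.multivariate_normal(np.zeros(result.shape[0]),
                                    noise_scale * np.atleast_2d(cov))
    return result + noise

# Example: privatize a simple mean estimate over a 5-dimensional dataset.
data = np.random.default_rng(1).normal(size=(1000, 5))
print(pac_style_noisy_output(lambda d: d.mean(axis=0), data))
```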

This new variant of PAC Privacy works in the same way but does not need to represent the entire matrix of correlations across the outputs; it just needs the output variances.

Sridhar explains, “Because the thing you are estimating is much smaller than the entire covariance matrix, you can do it much faster.” This means one can scale up to much larger datasets.

Adding noise can hurt the utility of the results, so it is important to minimize the loss of utility. Due to its computational cost, the original PAC Privacy algorithm was limited to adding isotropic noise, which is added uniformly in all directions. Because the new variant estimates anisotropic noise, which is tailored to specific characteristics of the training data, a user can add less overall noise to achieve the same level of privacy, boosting the accuracy of the privatized algorithm.
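A hypothetical side-by-side of the two noise strategies described above (illustrative only; the uniform-noise magnitude and the scaling are stand-ins for the framework’s actual calibration):

```python
import numpy as np

# Hypothetical comparison of the two noise strategies. `outputs` holds an
# algorithm's results over many subsampled runs (n_trials x d); the third
# coordinate is deliberately made much more variable than the others.
rng = np.random.default_rng(0)
outputs = rng.normal(scale=[0.1, 0.1, 2.0], size=(200, 3))

def isotropic_noise(outputs, scale=1.0):
    # Uniform in every direction: a single magnitude, driven here by the
    # worst-case coordinate (an illustrative stand-in for the original
    # framework's calibration).
    sigma = outputs.std(axis=0).max()
    return rng.normal(0.0, scale * sigma, size=outputs.shape[1])

def anisotropic_noise(outputs, scale=1.0):
    # Needs only per-coordinate variances, so the noise is shaped to the
    # data: stable coordinates receive far less noise.
    return rng.normal(0.0, scale * outputs.std(axis=0))

print("isotropic noise sample:  ", isotropic_noise(outputs))
print("anisotropic noise sample:", anisotropic_noise(outputs))
```

When one output coordinate is far noisier than the rest, shaping the noise per coordinate avoids drowning the stable coordinates in noise that only the unstable one requires.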

Privacy and stability

As she studied PAC Privacy, Sridhar hypothesized that more stable algorithms would be easier to privatize with this technique. She used the more efficient variant of PAC Privacy to test this theory on several classical algorithms.

Algorithms that are more stable have less variance in their outputs when their training data change slightly. PAC Privacy breaks a dataset into chunks, runs the algorithm on each chunk of data, and measures the variance among the outputs. The greater the variance, the more noise must be added to privatize the algorithm.

Employing stability techniques to reduce the variance in an algorithm’s outputs also reduces the amount of noise that needs to be added to privatize it.
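To make the stability point concrete, the toy comparison below (again illustrative, not from the paper) fits an ordinary least-squares model and a ridge-regularized one on many random halves of a dataset and measures how much their learned coefficients vary; the regularized model is expected to show lower output variance, so a PAC-style calibration would add less noise to it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = X @ rng.normal(size=30) + rng.normal(scale=1.0, size=120)

def output_variance(make_model, n_splits=50):
    """Fit the model on many random halves of the data and measure how much
    its learned coefficients vary: a rough proxy for (in)stability."""
    coefs = []
    for _ in range(n_splits):
        idx = rng.choice(len(X), size=len(X) // 2, replace=False)
        coefs.append(make_model().fit(X[idx], y[idx]).coef_)
    return np.var(np.stack(coefs), axis=0).sum()

# The more stable (regularized) estimator should show lower output variance,
# so a PAC-style calibration would need to add less noise to privatize it.
print("ordinary least squares:", output_variance(LinearRegression))
print("ridge (more stable):   ", output_variance(lambda: Ridge(alpha=10.0)))
```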

“In the best case, we can get these win-win scenarios,” she says.

The team showed that these privacy guarantees remained strong regardless of the algorithm they tested, and that the new variant of PAC Privacy required fewer trials to estimate the noise. They also tested the method in attack simulations, demonstrating that its privacy guarantees could withstand state-of-the-art attacks.

“We want to explore how algorithms could be co-designed with PAC Privacy, so the algorithm is more stable, secure, and robust from the beginning,” Devadas says. The researchers also want to test their method with more complex algorithms and further explore the privacy-utility tradeoff.

“Now the question is, when do these win-win situations happen, and how can we make them happen more often?” Sridhar says.

“I think the key advantage PAC Privacy has in this setting over other privacy definitions is that it is a black box: you do not need to manually analyze each individual query to privatize the results. We are actively building a PAC-enabled database to support practical and efficient private data analytics,” says a database researcher at the University of Wisconsin at Madison who was not involved with this study.

This research was supported, in part, by Cisco Systems, Capital One, the U.S. Department of Defense, and a MathWorks Fellowship.
