<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Privacy-preserving machine learning (PPML) is the study and practice of training and deploying ML models in ways that protect the privacy of the underlying data. As AI systems increasingly train on sensitive personal data (medical records, financial histories, behavioral data), techniques for learning from data without exposing individual records have become essential. PPML encompasses differential privacy (mathematical privacy guarantees), federated learning (training without centralizing data), secure multi-party computation (collaborating without sharing raw data), and homomorphic encryption (computing on encrypted data).
</div>
__TOC__
<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Differential privacy (DP)''': A mathematical guarantee that the inclusion or exclusion of any single record makes little difference to the output, bounded by the parameter ε.
* '''Epsilon (ε) in DP''': The privacy budget; smaller ε means a stronger privacy guarantee but more noise added. Typical values: ε = 1–10.
* '''Noise mechanism''': Adding calibrated random noise to protect privacy, e.g. the Laplace mechanism or the Gaussian mechanism.
* '''DP-SGD (Differentially Private SGD)''': Training neural networks with differential privacy by clipping and noising gradients.
* '''Federated learning''': Training on data distributed across many devices without centralizing the raw data; only model updates are shared.
* '''Secure aggregation''': Aggregating federated model updates without the server seeing individual updates (using cryptographic protocols).
* '''Homomorphic encryption (HE)''': A cryptographic technique allowing computation on encrypted data without decryption.
* '''Secure Multi-Party Computation (SMPC)''': Multiple parties jointly compute a function on their private inputs without revealing those inputs.
* '''Membership inference attack''': An attack testing whether a specific record was in the training data; measures privacy leakage.
* '''Model inversion attack''': Reconstructing training data from a trained model's outputs; a privacy risk.
* '''Data minimization''': Collecting and using only the minimum data necessary; a GDPR principle.
* '''Synthetic data (privacy)''': Generating realistic but non-personal data to share instead of real records.
* '''k-anonymity''': A data protection model in which each record is indistinguishable from at least k−1 others.
* '''Privacy budget''': The total privacy expenditure across multiple DP queries or training steps; must be managed carefully.
</div>
<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
'''The core problem''': ML models memorize training data. This is well documented: models can reveal training examples when queried appropriately. This creates serious privacy risks when training data includes medical records, financial transactions, or personal communications.

'''Differential privacy (DP)''' provides a rigorous mathematical definition of privacy. A mechanism M satisfies (ε, δ)-differential privacy if, for any two adjacent datasets D and D' (differing by one record) and any set of outputs S:

:P(M(D) ∈ S) ≤ e<sup>ε</sup> · P(M(D') ∈ S) + δ

This means the output distribution is nearly identical whether or not any individual's data was included; their privacy is protected regardless of what the attacker knows.

'''DP-SGD''' is the standard technique for differentially private deep learning (Abadi et al., 2016):
# Compute the gradient for each sample individually.
# Clip each gradient to a bounded L2 norm (prevents any single example from having too much influence).
# Add Gaussian noise calibrated to the privacy budget.
# Average the noisy, clipped gradients and update the model.
The cost: the added noise degrades model utility, especially for complex models and small datasets.

'''Federated learning''' keeps data on-device. Google's Gboard keyboard predicts the next word by training on user input directly on phones; only encrypted model updates are sent to a central server, aggregated, and used to update the global model. No raw text ever leaves the device.

'''The tradeoff landscape''': Strong privacy → more noise → lower model accuracy. There is a fundamental tension between privacy and utility. The PPML field works to close this gap, but it cannot be eliminated entirely with current techniques.
</div>
<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''Differentially private model training with Opacus:'''
<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator
from torch.utils.data import DataLoader

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)

# Validate and fix the model for DP-SGD compatibility
model = ModuleValidator.fix(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Attach the privacy engine
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=DataLoader(train_dataset, batch_size=256),
    epochs=20,
    target_epsilon=1.0,   # Privacy budget ε (lower = stronger privacy)
    target_delta=1e-5,    # δ: small probability of privacy violation
    max_grad_norm=1.0,    # Gradient clipping norm
)

# Training loop (same as standard; Opacus handles DP automatically)
for epoch in range(20):
    for X, y in train_loader:
        optimizer.zero_grad()
        output = model(X)
        loss = nn.CrossEntropyLoss()(output, y)
        loss.backward()
        optimizer.step()

epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"Final privacy budget spent: ε = {epsilon:.2f}")
</syntaxhighlight>
</div>
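'''The Laplace mechanism in a few lines:''' the noise mechanism listed under Remembering can be illustrated directly. The sketch below is illustrative only (the <code>laplace_count</code> helper is a name made up for this example, not part of any library): it releases a counting query under ε-DP by adding Laplace noise with scale sensitivity/ε, so a smaller ε yields a noisier answer.

<syntaxhighlight lang="python">
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    # Laplace mechanism: add noise drawn from Laplace(0, b) with
    # b = sensitivity / epsilon. For a counting query, adding or
    # removing one record changes the answer by at most 1, so
    # sensitivity = 1.
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

true_count = 1000  # e.g. "how many patients have condition X?"
noisy_strong = laplace_count(true_count, epsilon=0.1)  # strong privacy, noisy answer
noisy_weak = laplace_count(true_count, epsilon=5.0)    # weak privacy, accurate answer
</syntaxhighlight>

Across repeated runs, answers at ε = 0.1 scatter with standard deviation b√2 ≈ 14 counts (b = 10), while answers at ε = 5.0 stay within a fraction of a count of the truth: the privacy–utility tradeoff in miniature.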