Materials and Resources
Slides, notebooks, setup instructions, recordings, and a practical next-step path.
Workshop assets
- Slides: https://example.com/slides
- Notebooks: https://example.com/notebooks
- Code repository: https://github.com/example/anonymization-workshop
Preparation and setup
Prerequisites
Background knowledge
- Basic statistics (distributions, sampling, summary metrics)
- Practical experience with Python or R (examples use Python)
- Basic SQL familiarity
- Basic understanding of privacy risk or threat modeling concepts
System requirements
- OS: macOS 13+, Ubuntu 22.04+, or Windows 11 with WSL2
- RAM: minimum 8 GB (16 GB recommended)
- Disk: at least 6 GB free for notebooks and datasets
- Browser: recent Chrome, Firefox, Safari, or Edge
- Permissions: local admin rights for package installation
Tooling setup
- Install Python 3.11+.
- Create and activate a virtual environment.
- Install workshop dependencies.
- Launch JupyterLab, then run the notebook sanity test.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
jupyter lab
Sanity check command
python checks/setup_check.py
Expected result: all checks pass (Python version, package imports, notebook kernel).
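For orientation, here is a minimal sketch of the kind of checks such a script can run. This is illustrative only: the actual checks/setup_check.py in the repository may verify different packages and kernels, and the package list below is a stand-in, not the workshop's real requirements.

```python
# Hypothetical sketch of a setup check (stdlib only); the real
# checks/setup_check.py in the workshop repository may differ.
import importlib
import sys

REQUIRED_PYTHON = (3, 11)
# Illustrative package names; substitute those from requirements.txt.
REQUIRED_PACKAGES = ["json", "sqlite3"]

def run_checks():
    """Return a dict mapping each check name to True/False."""
    results = {"python_version": sys.version_info[:2] >= REQUIRED_PYTHON}
    for pkg in REQUIRED_PACKAGES:
        try:
            importlib.import_module(pkg)
            results[f"import:{pkg}"] = True
        except ImportError:
            results[f"import:{pkg}"] = False
    return results

if __name__ == "__main__":
    for name, ok in run_checks().items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
```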
Datasets
- Dataset A: synthetic patient events (~120 MB)
- Dataset B: customer transactions (~85 MB)
- Dataset C: mobility logs (~160 MB)
License and permissions:
- All workshop datasets are synthetic or safely simulated teaching artifacts.
- No real sensitive personal data is distributed in the workshop materials.
Download links (replace with final URLs before publishing):
- https://example.com/data/dataset-a.zip
- https://example.com/data/dataset-b.zip
- https://example.com/data/dataset-c.zip
Pre-reading and glossary
Quick glossary
- Anonymization: processing intended to make identification reasonably infeasible.
- Pseudonymization: identifiers replaced but re-linking remains possible.
- Identifiability: ability to single out or link a person from available data.
- Quasi-identifiers: attributes that can identify when combined (e.g., ZIP, birth year, gender).
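The quasi-identifier definition above is easy to see in code. The toy sketch below (invented records, not workshop data) counts how many records share each (ZIP, birth year, gender) combination; a combination held by only one record singles that person out, which is the "k" that k-anonymity style methods try to raise.

```python
from collections import Counter

# Toy records keyed by three quasi-identifiers: ZIP, birth year, gender.
# Records sharing the same combination form an "equivalence class";
# a class of size 1 uniquely identifies someone.
records = [
    ("02139", 1985, "F"),
    ("02139", 1985, "F"),
    ("02139", 1990, "M"),
    ("94105", 1972, "F"),
]

def min_class_size(rows):
    """Smallest equivalence-class size (the 'k' in k-anonymity)."""
    return min(Counter(rows).values())

# Here two of the four records are unique on these three attributes,
# so the dataset is only 1-anonymous.
```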
Optional references
- NIST de-identification engineering notes
- ENISA guidance on data pseudonymization/anonymization
- Introductory differential privacy primers
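As a taste of what the differential privacy primers cover, here is a minimal sketch of the Laplace mechanism for a numeric query: noise scaled to sensitivity/epsilon is added before release. The sampling trick (a difference of two exponentials is Laplace-distributed) is one standard implementation choice, not the only one, and real deployments should use a vetted library such as OpenDP rather than hand-rolled noise.

```python
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with Laplace noise of scale sensitivity/epsilon.

    This is the classic epsilon-differentially-private mechanism for
    numeric queries; sensitivity is the query's worst-case change when
    one individual's record is added or removed.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. Exponential(1/scale) draws is
    # Laplace(0, scale)-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise

# Example: a counting query (sensitivity 1) released with epsilon = 1.
noisy_count = laplace_mechanism(100, sensitivity=1.0, epsilon=1.0)
```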
Recording (if available)
- Full recording: https://example.com/recording
- Suggested timestamps:
- 00:00-25:00: Scope and definitions
- 25:00-90:00: Threat modeling and classical methods
- 90:00-145:00: Differential privacy essentials
- 145:00-end: Labs and implementation checklist
What to do next
- Re-run both labs on your own internal test dataset.
- Build a small anonymization decision log template for your team.
- Pilot one risk-review checkpoint before external data release.
- Reassess utility metrics after each policy or transform change.
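One possible starting point for the decision log suggested above: a small structured record per anonymization decision, so transforms, rationale, and residual risk are reviewable later. The field names below are illustrative suggestions, not a format prescribed by the workshop.

```python
from dataclasses import dataclass, field, asdict
from datetime import date

# Hypothetical shape for a team decision-log entry; adapt the fields
# to your own review process.
@dataclass
class AnonymizationDecision:
    dataset: str
    transform: str        # e.g. "generalize ZIP to 3 digits"
    rationale: str        # why this release/transform was needed
    residual_risk: str    # e.g. "low: k >= 5 across all classes"
    utility_impact: str   # what analyses get coarser or break
    decided_on: date = field(default_factory=date.today)

entry = AnonymizationDecision(
    dataset="customer transactions",
    transform="suppress direct identifiers; bucket ages into 10-year bands",
    rationale="share with analytics vendor",
    residual_risk="medium pending linkage review",
    utility_impact="age-cohort metrics coarsened",
)
# asdict(entry) yields a plain dict, easy to serialize to JSON or CSV.
```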
Reference list and tool catalog
- NIST privacy engineering references
- ENISA guidance documents
- OpenDP and SmartNoise resources
- ARX anonymization framework
- Synthetic data benchmark papers (recent surveys)