Materials and Resources

Slides, notebooks, setup instructions, recordings, and a practical next-step path.

Workshop assets

  • Slides: https://example.com/slides
  • Notebooks: https://example.com/notebooks
  • Code repository: https://github.com/example/anonymization-workshop

Preparation and setup

Prerequisites

Background knowledge

  • Basic statistics (distributions, sampling, summary metrics)
  • Practical experience with Python or R (examples use Python)
  • Basic SQL familiarity
  • Basic understanding of privacy risk or threat modeling concepts

System requirements

  • OS: macOS 13+, Ubuntu 22.04+, or Windows 11 with WSL2
  • RAM: minimum 8 GB (16 GB recommended)
  • Disk: at least 6 GB free for notebooks and datasets
  • Browser: recent Chrome, Firefox, Safari, or Edge
  • Permissions: local admin rights for package installation

Tooling setup

  1. Install Python 3.11+.
  2. Create and activate a virtual environment.
  3. Install workshop dependencies.
  4. Launch JupyterLab.

python3 -m venv .venv              # create an isolated environment
source .venv/bin/activate          # activate it (WSL2 shells use the same command)
pip install -r requirements.txt    # install workshop dependencies
jupyter lab                        # open the notebooks in your browser

Sanity check command

python checks/setup_check.py

Expected result: all checks pass (Python version, package imports, notebook kernel).
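For orientation, a setup check along these lines usually just verifies the interpreter version and a few key imports. The sketch below is a hypothetical approximation, not the actual checks/setup_check.py from the repository, and the package list is an assumption (consult requirements.txt for the real one):

```python
# Hypothetical sketch of an environment sanity check; the package list
# is an assumption -- the real checks/setup_check.py is authoritative.
import importlib
import sys

REQUIRED_PYTHON = (3, 11)
REQUIRED_PACKAGES = ["numpy", "pandas"]  # assumed; see requirements.txt

def check_python() -> bool:
    ok = sys.version_info[:2] >= REQUIRED_PYTHON
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{'OK' if ok else 'needs 3.11+'}")
    return ok

def check_imports() -> bool:
    ok = True
    for name in REQUIRED_PACKAGES:
        try:
            importlib.import_module(name)
            print(f"import {name}: OK")
        except ImportError:
            print(f"import {name}: MISSING")
            ok = False
    return ok

def main() -> int:
    # Exit code 0 only when every check passes.
    return 0 if (check_python() and check_imports()) else 1
```

A script like this would typically be run once after `pip install` and again whenever the environment changes.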

Datasets

  • Dataset A: synthetic patient events (~120 MB)
  • Dataset B: customer transactions (~85 MB)
  • Dataset C: mobility logs (~160 MB)

License and permissions:

  • All workshop datasets are synthetic or safely simulated teaching artifacts.
  • No real sensitive personal data is distributed in the workshop materials.

Download links (replace with final URLs before publishing):

  • https://example.com/data/dataset-a.zip
  • https://example.com/data/dataset-b.zip
  • https://example.com/data/dataset-c.zip
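If you prefer scripting the download, the archives above can be fetched and unpacked with standard-library Python alone. This is an optional sketch: the URLs are the placeholder links from this page, and the data/ directory name is an assumption, not a workshop convention:

```python
# Optional download helper -- the URLs are the placeholder links above
# and the "data/" target directory is an assumption.
import urllib.request
import zipfile
from pathlib import Path

DATASET_URLS = [
    "https://example.com/data/dataset-a.zip",
    "https://example.com/data/dataset-b.zip",
    "https://example.com/data/dataset-c.zip",
]

def fetch_datasets(urls, dest="data"):
    dest_dir = Path(dest)
    dest_dir.mkdir(exist_ok=True)
    for url in urls:
        archive = dest_dir / url.rsplit("/", 1)[-1]
        if not archive.exists():
            # Download only archives we do not already have.
            urllib.request.urlretrieve(url, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
```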

Pre-reading and glossary

Quick glossary

  • Anonymization: processing intended to make identification reasonably infeasible.
  • Pseudonymization: identifiers replaced but re-linking remains possible.
  • Identifiability: ability to single out or link a person from available data.
  • Quasi-identifiers: attributes that can identify when combined (e.g., ZIP, birth year, gender).
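To make the quasi-identifier definition concrete, the toy example below (with invented records) counts how many rows share each (ZIP, birth year, gender) combination. A group of size 1 means that combination singles someone out, and the smallest group size is the table's k-anonymity level:

```python
# Toy quasi-identifier illustration -- the records are invented.
from collections import Counter

records = [
    {"zip": "10115", "birth_year": 1980, "gender": "F"},
    {"zip": "10115", "birth_year": 1980, "gender": "F"},
    {"zip": "10115", "birth_year": 1975, "gender": "M"},
    {"zip": "20095", "birth_year": 1990, "gender": "F"},
]

# Group records by their quasi-identifier combination.
groups = Counter((r["zip"], r["birth_year"], r["gender"]) for r in records)

# k-anonymity of the table is the size of the smallest group;
# any group of size 1 uniquely identifies a record.
k = min(groups.values())
unique = [qi for qi, n in groups.items() if n == 1]
print(f"k = {k}; singled-out combinations: {unique}")
```

Here two of the three combinations are unique, so even without names the last two records are singled out by these three attributes alone.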

Optional references

  • NIST de-identification engineering notes
  • ENISA guidance on data pseudonymization/anonymization
  • Introductory differential privacy primers

Recording (if available)

  • Full recording: https://example.com/recording
  • Suggested timestamps:
    • 0:00:00-0:25:00: Scope and definitions
    • 0:25:00-1:30:00: Threat modeling and classical methods
    • 1:30:00-2:25:00: Differential privacy essentials
    • 2:25:00-end: Labs and implementation checklist

What to do next

  1. Re-run both labs on your own internal test dataset.
  2. Build a small anonymization decision log template for your team.
  3. Pilot one risk-review checkpoint before external data release.
  4. Reassess utility metrics after each policy or transform change.
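For step 2, the decision log can start very small. The sketch below shows one possible shape as a dataclass plus a CSV writer; the field names are suggestions for illustration, not a prescribed schema:

```python
# Minimal anonymization decision log -- field names are suggestions
# only, not a prescribed schema.
import csv
from dataclasses import asdict, dataclass, fields

@dataclass
class DecisionLogEntry:
    date: str             # when the decision was made
    dataset: str          # which dataset or table it applies to
    transform: str        # e.g. "generalize ZIP to 3 digits"
    risk_rationale: str   # why the residual risk is acceptable
    utility_impact: str   # observed effect on utility metrics
    reviewer: str         # who signed off

def write_log(entries, path="decision_log.csv"):
    # Write all entries to a CSV with a header row derived
    # from the dataclass fields.
    with open(path, "w", newline="") as f:
        names = [fld.name for fld in fields(DecisionLogEntry)]
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        writer.writerows(asdict(e) for e in entries)
```

One entry per transform decision keeps the log reviewable at the risk-review checkpoint in step 3, and the utility_impact column gives step 4 a place to land.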

Reference list and tool catalog

  • NIST privacy engineering references
  • ENISA guidance documents
  • OpenDP and SmartNoise resources
  • ARX anonymization framework
  • Synthetic data benchmark papers (recent surveys)