Materials and Resources

Slides, notebooks, setup instructions, recordings, and a practical next-step path.

Workshop assets

  • Slides: https://example.com/slides
  • Notebooks: https://example.com/notebooks
  • Code repository: https://github.com/example/anonymization-workshop

Preparation and setup

Prerequisites

Background knowledge

  • Basic statistics (distributions, sampling, summary metrics)
  • Practical experience with Python or R (examples use Python)
  • Basic SQL familiarity
  • Basic understanding of privacy risk or threat modeling concepts

System requirements

  • OS: macOS 13+, Ubuntu 22.04+, or Windows 11 with WSL2
  • RAM: minimum 8 GB (16 GB recommended)
  • Disk: at least 6 GB free for notebooks and datasets
  • Browser: recent Chrome, Firefox, Safari, or Edge
  • Permissions: local admin rights for package installation

Tooling setup

  1. Install Python 3.11+.
  2. Create and activate a virtual environment.
  3. Install workshop dependencies.
  4. Launch JupyterLab.

python3 -m venv .venv              # create an isolated environment
source .venv/bin/activate          # activate it (WSL2 shells use the same command)
pip install -r requirements.txt    # install workshop dependencies
jupyter lab                        # open the notebooks in your browser

Sanity check command

python checks/setup_check.py

Expected result: all checks pass (Python version, package imports, notebook kernel).
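For orientation, a setup check along these lines usually just verifies the interpreter version and a few key imports. The sketch below is a hypothetical approximation, not the actual checks/setup_check.py from the repository, and the package list is an assumption (consult requirements.txt for the real one):

```python
# Hypothetical sketch of an environment sanity check; the package list
# is an assumption -- the real checks/setup_check.py is authoritative.
import importlib
import sys

REQUIRED_PYTHON = (3, 11)
REQUIRED_PACKAGES = ["numpy", "pandas"]  # assumed; see requirements.txt

def check_python() -> bool:
    ok = sys.version_info[:2] >= REQUIRED_PYTHON
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{'OK' if ok else 'needs 3.11+'}")
    return ok

def check_imports() -> bool:
    ok = True
    for name in REQUIRED_PACKAGES:
        try:
            importlib.import_module(name)
            print(f"import {name}: OK")
        except ImportError:
            print(f"import {name}: MISSING")
            ok = False
    return ok

def main() -> int:
    # Exit code 0 only when every check passes.
    return 0 if (check_python() and check_imports()) else 1
```

A script like this would typically be run once after `pip install` and again whenever the environment changes.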

Datasets

  • Dataset A: synthetic patient events (~120 MB)
  • Dataset B: customer transactions (~85 MB)
  • Dataset C: mobility logs (~160 MB)

License and permissions:

  • All workshop datasets are synthetic or safely simulated teaching artifacts.
  • No real sensitive personal data is distributed in the workshop materials.

Download links (replace with final URLs before publishing):

  • https://example.com/data/dataset-a.zip
  • https://example.com/data/dataset-b.zip
  • https://example.com/data/dataset-c.zip
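If you prefer scripting the download, the archives above can be fetched and unpacked with standard-library Python alone. This is an optional sketch: the URLs are the placeholder links from this page, and the data/ directory name is an assumption, not a workshop convention:

```python
# Optional download helper -- the URLs are the placeholder links above
# and the "data/" target directory is an assumption.
import urllib.request
import zipfile
from pathlib import Path

DATASET_URLS = [
    "https://example.com/data/dataset-a.zip",
    "https://example.com/data/dataset-b.zip",
    "https://example.com/data/dataset-c.zip",
]

def fetch_datasets(urls, dest="data"):
    dest_dir = Path(dest)
    dest_dir.mkdir(exist_ok=True)
    for url in urls:
        archive = dest_dir / url.rsplit("/", 1)[-1]
        if not archive.exists():
            # Download only archives we do not already have.
            urllib.request.urlretrieve(url, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
```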

Pre-reading and glossary

Quick glossary

  • Anonymization: processing intended to make identification reasonably infeasible.
  • Pseudonymization: identifiers replaced but re-linking remains possible.
  • Identifiability: ability to single out or link a person from available data.
  • Quasi-identifiers: attributes that can identify when combined (e.g., ZIP, birth year, gender).
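To make the quasi-identifier definition concrete, the toy example below (with invented records) counts how many rows share each (ZIP, birth year, gender) combination. A group of size 1 means that combination singles someone out, and the smallest group size is the table's k-anonymity level:

```python
# Toy quasi-identifier illustration -- the records are invented.
from collections import Counter

records = [
    {"zip": "10115", "birth_year": 1980, "gender": "F"},
    {"zip": "10115", "birth_year": 1980, "gender": "F"},
    {"zip": "10115", "birth_year": 1975, "gender": "M"},
    {"zip": "20095", "birth_year": 1990, "gender": "F"},
]

# Group records by their quasi-identifier combination.
groups = Counter((r["zip"], r["birth_year"], r["gender"]) for r in records)

# k-anonymity of the table is the size of the smallest group;
# any group of size 1 uniquely identifies a record.
k = min(groups.values())
unique = [qi for qi, n in groups.items() if n == 1]
print(f"k = {k}; singled-out combinations: {unique}")
```

Here two of the three combinations are unique, so even without names the last two records are singled out by these three attributes alone.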

Optional references

  • NIST de-identification engineering notes
  • ENISA guidance on data pseudonymization/anonymization
  • Introductory differential privacy primers

Recording (if available)

  • Full recording: https://example.com/recording
  • Suggested timestamps:
    • 0:00:00-0:25:00: Scope and definitions
    • 0:25:00-1:30:00: Threat modeling and classical methods
    • 1:30:00-2:25:00: Differential privacy essentials
    • 2:25:00-end: Labs and implementation checklist

What to do next

  1. Re-run both labs on your own internal test dataset.
  2. Build a small anonymization decision log template for your team.
  3. Pilot one risk-review checkpoint before external data release.
  4. Reassess utility metrics after each policy or transform change.
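For step 2, the decision log can start very small. The sketch below shows one possible shape as a dataclass plus a CSV writer; the field names are suggestions for illustration, not a prescribed schema:

```python
# Minimal anonymization decision log -- field names are suggestions
# only, not a prescribed schema.
import csv
from dataclasses import asdict, dataclass, fields

@dataclass
class DecisionLogEntry:
    date: str             # when the decision was made
    dataset: str          # which dataset or table it applies to
    transform: str        # e.g. "generalize ZIP to 3 digits"
    risk_rationale: str   # why the residual risk is acceptable
    utility_impact: str   # observed effect on utility metrics
    reviewer: str         # who signed off

def write_log(entries, path="decision_log.csv"):
    # Write all entries to a CSV with a header row derived
    # from the dataclass fields.
    with open(path, "w", newline="") as f:
        names = [fld.name for fld in fields(DecisionLogEntry)]
        writer = csv.DictWriter(f, fieldnames=names)
        writer.writeheader()
        writer.writerows(asdict(e) for e in entries)
```

One entry per transform decision keeps the log reviewable at the risk-review checkpoint in step 3, and the utility_impact column gives step 4 a place to land.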

Reference list and tool catalog

  • NIST privacy engineering references
  • ENISA guidance documents
  • OpenDP and SmartNoise resources
  • ARX anonymization framework
  • Synthetic data benchmark papers (recent surveys)