SDNist v1.3: Temporal Map Challenge Environment

Description

SDNist (v1.3) is a set of benchmark data and metrics for evaluating synthetic data generators on structured tabular data. This version (1.3) reproduces the challenge environment from Sprints 2 and 3 of the Temporal Map Challenge. The benchmarks are distributed as a simple open-source Python package to allow standardized, reproducible comparison of synthetic generator models on real-world data and use cases. The data and metrics were developed for and vetted through the NIST PSCR Differential Privacy Temporal Map Challenge, where the evaluation tools, k-marginal and Higher Order Conjunction, proved effective in distinguishing competing models in the competition environment.

SDNist can be installed with `pip` (`pip install sdnist==1.2.8`, for Python >=3.6) or obtained from [USNIST/GitHub](https://github.com/usnistgov/Differential-Privacy-Temporal-Map-Challenge-assets/). The sdnist Python module downloads data from NIST as necessary, so users are not required to download data manually.
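To give a feel for what a k-marginal style comparison measures, here is a minimal, standard-library-only sketch. It is not SDNist's implementation: the function names, the random choice of column subsets, and the 0 to 1000 scaling are all assumptions made for illustration. The idea shown is to pick sets of k columns, compute the normalized frequency of each value combination in the real and synthetic data, and sum the absolute differences.

```python
import itertools
import random
from collections import Counter


def marginal_density(rows, columns):
    """Return the normalized frequency of each value combination in `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def k_marginal_distance(real_rows, synth_rows, columns):
    """L1 distance between the two marginal densities; ranges from 0 to 2."""
    real = marginal_density(real_rows, columns)
    synth = marginal_density(synth_rows, columns)
    keys = set(real) | set(synth)
    return sum(abs(real.get(k, 0.0) - synth.get(k, 0.0)) for k in keys)


def k_marginal_score(real_rows, synth_rows, k=2, n_marginals=10, seed=0):
    """Average the distance over randomly chosen k-column subsets.

    The mapping onto a 0..1000 scale is an illustrative assumption,
    not necessarily the scaling SDNist applies.
    """
    all_columns = sorted(real_rows[0])
    subsets = list(itertools.combinations(all_columns, k))
    rng = random.Random(seed)
    chosen = rng.sample(subsets, min(n_marginals, len(subsets)))
    mean_dist = sum(
        k_marginal_distance(real_rows, synth_rows, cols) for cols in chosen
    ) / len(chosen)
    return (2.0 - mean_dist) / 2.0 * 1000.0


# Hypothetical toy records (not taken from the SDNist datasets).
real = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
identical = [dict(r) for r in real]
disjoint = [{"a": 3, "b": "z"}]
```

In this sketch, a synthetic dataset with exactly the same marginals as the real one gets the maximum score, and one that shares no value combinations with it gets the minimum. Both inputs are assumed to be non-empty lists of records with the same keys.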

Resources

| Name | Format | Description | Link |
| --- | --- | --- | --- |
| | jinja2 | A jinja2 report template to help humans read the k-marginal data | https://data.nist.gov/od/ds/mds2-2515/report2.jinja2 |
| | parquet | | https://data.nist.gov/od/ds/mds2-2515/GA_NC_SC_10Y_PUMS.parquet |
| NY_PA_10Y_PUMS.json | json | | https://data.nist.gov/od/ds/mds2-2515/NY_PA_10Y_PUMS.json |
| GA_NC_SC_10Y_PUMS.json | json | | https://data.nist.gov/od/ds/mds2-2515/GA_NC_SC_10Y_PUMS.json |
| | parquet | | https://data.nist.gov/od/ds/mds2-2515/IL_OH_10Y_PUMS.parquet |
| IL_OH_10Y_PUMS.json | json | | https://data.nist.gov/od/ds/mds2-2515/IL_OH_10Y_PUMS.json |
| | parquet | | https://data.nist.gov/od/ds/mds2-2515/NY_PA_10Y_PUMS.parquet |
| | parquet | | https://data.nist.gov/od/ds/mds2-2515/taxi2016.parquet |
| taxi2016.json | json | | https://data.nist.gov/od/ds/mds2-2515/taxi2016.json |
| | parquet | | https://data.nist.gov/od/ds/mds2-2515/taxi2020.parquet |
| taxi2020.json | json | | https://data.nist.gov/od/ds/mds2-2515/taxi2020.json |
| | parquet | | https://data.nist.gov/od/ds/mds2-2515/taxi.parquet |
| taxi.json | json | | https://data.nist.gov/od/ds/mds2-2515/taxi.json |
| | | | https://doi.org/10.18434/mds2-2515 |
| SDNist: Benchmark data and evaluation tools for synthetic data generators | | | https://github.com/usnistgov/SDNist/ |
| | zip | Three compressed CSV files to run the 'Census'-related functions in SDNist. | https://data.nist.gov/od/ds/mds2-2515/census-datasets-CSVs.zip |
| | zip | Three compressed CSV files to run the 'Taxi'-related functions in SDNist. | https://data.nist.gov/od/ds/mds2-2515/taxi-datasets-CSVs.zip |
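Each dataset listed above has a `.parquet` file and a `.json` file under the same base URL, so the links can be built from the dataset name. The following is a small sketch for anyone who wants to fetch the files directly rather than through the sdnist module. The helper names `resource_urls` and `download` are hypothetical and are not part of SDNist; only the base URL and file names come from the list above.

```python
import os
import urllib.request
from urllib.parse import urljoin

# Base URL and dataset names as they appear in the resource links.
BASE_URL = "https://data.nist.gov/od/ds/mds2-2515/"
DATASETS = (
    "GA_NC_SC_10Y_PUMS",
    "IL_OH_10Y_PUMS",
    "NY_PA_10Y_PUMS",
    "taxi",
    "taxi2016",
    "taxi2020",
)


def resource_urls(name):
    """Return the parquet and json URLs for one listed dataset name."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset: {name!r}")
    return {
        "parquet": urljoin(BASE_URL, name + ".parquet"),
        "json": urljoin(BASE_URL, name + ".json"),
    }


def download(name, target_dir="."):
    """Download both files for `name` into `target_dir` (needs network access)."""
    paths = {}
    for kind, url in resource_urls(name).items():
        path = os.path.join(target_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, path)
        paths[kind] = path
    return paths
```

For example, `resource_urls("taxi2016")` yields the two `taxi2016` links from the list, and `download("taxi2016")` would save both files to the current directory.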

Tags

  • differential-privacy
  • privacy
  • synthetic-data
  • private-information-sharing
  • benchmarks
