BUTTER - Empirical Deep Learning Dataset

Description

The BUTTER Empirical Deep Learning Dataset is an empirical study of deep learning phenomena in dense, fully connected networks. It scans across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) were recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and 13.3 billion training epochs (most runs covered three thousand epochs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
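
As a rough sanity check on the figures above, the following Python sketch derives the implied averages from the rounded totals and compares the number of experiments with the size of a full cross product of all swept dimensions. The variable and dictionary key names are illustrative assumptions, not part of the dataset schema, and the inputs are the rounded values quoted in the description, so the results are approximate.

```python
import math

# Rounded totals as quoted in the description.
experiments = 178_000
runs = 3_550_000
epochs = 13_300_000_000

# Implied averages (approximate, because the inputs are rounded).
runs_per_experiment = runs / experiments
epochs_per_run = epochs / runs

# Number of values swept along each dimension, as listed in the description.
# The key names are illustrative only.
dimension_sizes = {
    "datasets": 13,
    "shapes": 8,
    "depths": 14,
    "sizes": 23,
    "learning_rates": 4,
    "minibatch_sizes": 6,
    "label_noise_levels": 4,
    "l1_levels": 14,
    "l2_levels": 14,
}

# Size of a hypothetical full factorial sweep over every dimension.
full_grid = math.prod(dimension_sizes.values())

print(f"runs per experiment: {runs_per_experiment:.1f}")
print(f"epochs per run:      {epochs_per_run:.0f}")
print(f"full factorial grid: {full_grid:,}")
print(f"fraction covered:    {experiments / full_grid:.6f}")
```

The implied average of roughly 20 runs per experiment matches the stated figure. The full cross product is far larger than 178 thousand, so the experiments necessarily cover only a subset of all possible combinations.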

Resources

Name Format Description Link
21 BUTTER Empirical Deep Learning Dataset on AWS in S3 https://data.openei.org/s3_viewer?bucket=oedi-data-lake&prefix=butter%2F
28 A dataset readme describing schema, organization, and contents of the dataset. https://github.com/openEDI/documentation/blob/main/BUTTER.md
21 This repository contains code, notebooks, instructions, and examples to access the NREL BUTTER Empirical Deep Learning Dataset via AWS S3 and to reproduce the figures and analysis in our upcoming paper about the dataset. https://github.com/NREL/BUTTER-Better-Understanding-of-Training-Topologies-through-Empirical-Results
21 AWS public dataset program registry page for data released under the Department of Energy's (DOE) Open Energy Data Initiative (OEDI). The registry page contains information about dataset documentation, access, and contact, for each of the OEDI Data Lake datasets. https://registry.opendata.aws/oedi-data-lake/
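
The S3 viewer link in the first resource encodes the bucket and key prefix as query parameters. A minimal Python sketch, using only the standard library, shows how those parameters can be turned into an S3 URI for use with other tools. The function name `viewer_url_to_s3_uri` is a hypothetical helper introduced here for illustration; the layout of objects under the prefix is described in the dataset readme and is not assumed here.

```python
from urllib.parse import urlparse, parse_qs

# Viewer link copied from the resource list above.
VIEWER_URL = "https://data.openei.org/s3_viewer?bucket=oedi-data-lake&prefix=butter%2F"


def viewer_url_to_s3_uri(url: str) -> str:
    """Build an s3:// URI from the bucket and prefix query parameters.

    Hypothetical helper for illustration; parse_qs percent-decodes the
    values, so "butter%2F" becomes "butter/".
    """
    query = parse_qs(urlparse(url).query)
    bucket = query["bucket"][0]
    prefix = query.get("prefix", [""])[0]
    return f"s3://{bucket}/{prefix}"


s3_uri = viewer_url_to_s3_uri(VIEWER_URL)
print(s3_uri)  # s3://oedi-data-lake/butter/
```

The resulting URI can then be passed to whichever S3 client is preferred; the code repository listed above contains the authors' own access examples.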

Tags

  • training-epoch
  • epoch
  • topology
  • deep-learning
  • label-noise
  • empirical-machine-learning
  • network-shape
  • benchmark
  • regularization
  • learning-rate
  • network-topology
  • neural-networks
  • empirical
  • neural-architecture-search
  • shape
  • training
  • minibatch-size
  • machine-learning
  • depth
  • batch-size
  • empirical-deep-learning
