How to write up machine learning experiments? Here’s a process to follow

It’s sometimes difficult, especially for newcomers, to understand all the steps of a machine learning (ML) analysis. Here’s a practical list of the items I consider when leading an ML research project. Our use of ML is applied; we do research related to social computing (also known as computational social science), marketing, analytics, and human-computer interaction (HCI). Overall, we’ve used ML in more than a dozen published research papers.

ML analysis write-up

  • Problem statement
  • Data collection
  • Data cleaning
  • Exploratory data analysis
  • Algorithm selection
  • Data preprocessing
  • Feature selection
  • Experiment set-up
  • Model evaluation
  • Analysis write-up
  • Next steps

Detailed instructions for each section:

  • Problem statement: define the business problem you’re trying to solve. Then, describe it as an ML task (e.g., classification, regression, clustering).
  • Data collection: explain how the data was collected.
  • Data cleaning: apply basic cleaning steps suited to your data type.
  • Exploratory data analysis: explore your data, including visualizations and summary statistics.
  • Algorithm selection: select the algorithms to test.
  • Data preprocessing: preprocess the data to suit the algorithms.
  • Feature selection: select the features to include in the analysis.
  • Experiment set-up: define experimental factors such as the train/test split, cross-validation, hyperparameter optimization, replicability via random state values, and evaluation metrics.
  • Model evaluation: (model = algorithm + data) evaluate the performance, and weigh computational complexity against performance.
  • Analysis write-up: provide a brief write-up of each step, ideally in a computational notebook.
  • Next steps: offer suggestions for the next steps (e.g., save the best model for production).
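
To make the experiment set-up step more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the exact code from our papers: the toy data, the tiny k-nearest-neighbours classifier, and every function name (make_toy_data, split_train_test, k_fold_indices, run_experiment) are hypothetical and exist only for this example. It uses nothing but the standard library so it runs anywhere; in a real project you would more likely lean on an ML library for these pieces. The sketch shows a fixed random state, a held-out test split, cross-validation on the training data only, a small hyperparameter search, and a single evaluation metric (accuracy).

```python
import random
from statistics import mean

SEED = 42  # fixed random state so the whole run can be repeated exactly


def make_toy_data(n, rng):
    """Hypothetical toy data: one feature x, label 1 when x > 0.5."""
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, int(x > 0.5)))
    return rows


def split_train_test(rows, test_fraction, rng):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def knn_predict(train_rows, x, k):
    """Majority vote among the k nearest training points (1-D distance)."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)


def accuracy(train_rows, eval_rows, k):
    """Evaluation metric: share of eval rows predicted correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)


def run_experiment(seed=SEED):
    rng = random.Random(seed)
    data = make_toy_data(200, rng)
    train, test = split_train_test(data, 0.2, rng)

    # Hyperparameter search: 5-fold cross-validation on the training set only.
    cv_scores = {}
    for k in (1, 3, 5):
        fold_scores = []
        for train_idx, val_idx in k_fold_indices(len(train), 5):
            fold_train = [train[i] for i in train_idx]
            fold_val = [train[i] for i in val_idx]
            fold_scores.append(accuracy(fold_train, fold_val, k))
        cv_scores[k] = mean(fold_scores)

    # Pick the best setting, then touch the test set exactly once.
    best_k = max(cv_scores, key=cv_scores.get)
    test_accuracy = accuracy(train, test, best_k)
    return {"cv_scores": cv_scores, "best_k": best_k, "test_accuracy": test_accuracy}


result = run_experiment()
print(result)
```

The main design choice is that the test set plays no part in choosing the hyperparameter; it is used once at the end, so the reported accuracy is not inflated by the search. Passing the same seed to run_experiment gives the same splits and the same scores, which is what makes the write-up replicable.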

We more or less follow this formula in our applied ML research. Hope it helps!