How to write up machine learning experiments? Here’s a process to follow

Last updated on January 1, 2025

It can be difficult, especially for newcomers, to keep track of all the steps in a machine learning (ML) analysis. Here’s a practical list of the items I consider when leading an ML research project. Our use of ML is applied: we do research in social computing (also known as computational social science), marketing, analytics, and human-computer interaction (HCI). Overall, we’ve used ML in more than a dozen published research papers.

ML analysis write-up

  • Problem statement
  • Data collection
  • Data cleaning
  • Exploratory data analysis
  • Algorithm selection
  • Data preprocessing
  • Feature selection
  • Experiment set-up
  • Model evaluation
  • Analysis write-up
  • Next steps

Detailed instructions for each section:

  • Problem statement: define the business problem you’re trying to solve, then describe it as an ML task (e.g., classification, regression, clustering)
  • Data collection: explain how the data was collected
  • Data cleaning: apply basic cleaning steps suited to your data type
  • Exploratory data analysis: explore your data, including visualizations and summary statistics
  • Algorithm selection: select the algorithms to test
  • Data preprocessing: preprocess the data to suit the selected algorithms
  • Feature selection: select the features to include in the analysis
  • Experiment set-up: define experimental factors such as train/test split, cross-validation, hyperparameter optimization, replicability via random state values, and evaluation metrics
  • Model evaluation: evaluate the performance of each model (model = algorithm + data), and weigh computational complexity against performance
  • Analysis write-up: provide a brief write-up of each step, ideally in a computational notebook
  • Next steps: offer suggestions for the next steps (e.g., save the best model for production)
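
To make the experiment set-up step more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code from our papers: the toy dataset, the 1-D k-nearest-neighbours classifier, the grid of `k` values, and all function names are hypothetical. It shows a fixed random state, a hold-out train/test split, cross-validation, a small hyperparameter search, and a single evaluation metric (accuracy).

```python
import random
from statistics import mean

RANDOM_STATE = 42  # fixed seed so the data, split, and results are reproducible


def make_toy_data(n=200, seed=RANDOM_STATE):
    """Hypothetical 1-D dataset: label is 1 when the noisy feature exceeds 0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        y = int(x + rng.gauss(0, 0.1) > 0.5)
        data.append((x, y))
    return data


def train_test_split(data, test_size=0.2, seed=RANDOM_STATE):
    """Shuffle with a fixed seed, then hold out the last `test_size` share."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_size))
    return rows[:cut], rows[cut:]


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > len(nearest))


def accuracy(train, test, k):
    """Evaluation metric: share of test rows predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(rows, k, n_folds=5):
    """Mean accuracy over n_folds; each row is used for validation exactly once."""
    scores = []
    for i in range(n_folds):
        val = rows[i::n_folds]
        fit = [r for j, r in enumerate(rows) if j % n_folds != i]
        scores.append(accuracy(fit, val, k))
    return mean(scores)


def run_experiment(seed=RANDOM_STATE):
    data = make_toy_data(seed=seed)
    train, test = train_test_split(data, seed=seed)
    grid = [1, 5, 15]  # hypothetical hyperparameter grid (odd k avoids tied votes)
    cv_scores = {k: cross_val_accuracy(train, k) for k in grid}
    best_k = max(cv_scores, key=cv_scores.get)
    # The test set is touched only once, after the hyperparameter is chosen.
    return {
        "best_k": best_k,
        "cv_scores": cv_scores,
        "test_accuracy": accuracy(train, test, best_k),
    }


if __name__ == "__main__":
    print(run_experiment())
```

Running `run_experiment()` twice with the same seed should return identical results, which is the point of fixing the random state. In a real project you would likely replace these helpers with a library such as scikit-learn, but the experimental factors to report stay the same.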

We follow more or less this process in our applied ML research. Hope it helps!