How to write up machine learning experiments? Here's a process to follow

It’s sometimes difficult, especially for newcomers, to understand all the steps of a machine learning (ML) analysis. Here’s a practical list of the items I consider when leading an ML research project. Our use of ML is applied; we do research related to social computing (also known as computational social science), marketing, analytics, and human-computer interaction (HCI). Overall, we’ve used ML in more than a dozen published research papers.

ML analysis write-up

Problem statement
Data collection
Data cleaning
Exploratory data analysis
Algorithm selection
Data preprocessing
Feature selection
Experiment set-up
Model evaluation
Analysis write-up
Next steps

Detailed instructions for each section:

Problem statement: define the business problem you’re trying to solve. Then, describe the problem as an ML task (e.g., classification, regression, clustering)
Data collection: explain how the data was collected
Data cleaning: apply basic cleaning steps appropriate to your data type
Exploratory data analysis: explore your data, including visualizations and summary statistics
Algorithm selection: select the algorithms to test
Data preprocessing: preprocess the data to suit the algorithms
Feature selection: select the features to include in the analysis
Experiment set-up: define experimental factors such as the train/test split, cross-validation, hyperparameter optimization, replicability via fixed random state values, and evaluation metrics
Model evaluation: evaluate the performance of each model (a model being an algorithm plus the data it was trained on), and weigh computational complexity against performance
Analysis write-up: provide a brief write-up of each step, ideally in a computational notebook
Next steps: offer suggestions for the next steps (e.g., save the best model for production)
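
To make the experiment set-up and model evaluation steps a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the exact code we use: the function names, the toy labels, and the majority-class "model" are all made up for this example, and hyperparameter optimization is left out. In practice you would likely rely on a library such as scikit-learn for these steps. The point is that a fixed seed makes the split and the folds reproducible.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then split off a test set."""
    rng = random.Random(seed)  # fixed seed: the same split on every run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=5, seed=42):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Evaluation metric: the share of predictions that match the labels."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def fit_majority_class(labels):
    """Hypothetical baseline 'model': always predict the most common label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels, k=5, seed=42):
    """Return one validation accuracy per fold for the baseline model."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(labels), k, seed):
        prediction = fit_majority_class([labels[i] for i in train_idx])
        y_val = [labels[i] for i in val_idx]
        scores.append(accuracy(y_val, [prediction] * len(y_val)))
    return scores


# Toy data (an assumption for illustration): 70 negatives and 30 positives.
labels = [0] * 70 + [1] * 30
train_labels, test_labels = train_test_split(labels, test_fraction=0.2, seed=42)

# Cross-validate on the training data only, then score once on the test set.
cv_scores = cross_validate(train_labels, k=5, seed=42)
baseline = fit_majority_class(train_labels)
test_accuracy = accuracy(test_labels, [baseline] * len(test_labels))

print("Mean CV accuracy:", mean(cv_scores))
print("Test accuracy:", test_accuracy)
```

A trivial baseline like this is also handy in the model evaluation step, because it gives you a floor that any real algorithm should beat.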

We follow this formula, more or less, in our applied ML research. Hope it helps!
