It’s sometimes difficult, especially for newcomers, to understand all the steps of a machine learning (ML) analysis. Here’s a practical list of the steps I consider when leading an ML research project. Our use of ML is applied: we do research in social computing (also known as computational social science), marketing, analytics, and human-computer interaction (HCI). Overall, we’ve used ML in more than a dozen published research papers.
ML analysis write-up
- Problem statement
- Data collection
- Data cleaning
- Exploratory data analysis
- Algorithm selection
- Feature selection
- Data preprocessing
- Experiment set-up
- Model evaluation
- Analysis write-up
- Next steps
Detailed instructions for each section:
- Problem statement: define the business problem you’re trying to solve, then describe it as an ML task (e.g., classification, regression, or clustering).
- Data collection: explain how the data was collected.
- Data cleaning: apply the basic cleaning steps appropriate for your data type (e.g., removing duplicates and handling missing values).
- Exploratory data analysis: explore your data, including visualizations and summary statistics.
- Algorithm selection: select the algorithms to test.
- Feature selection: select the features to include in the analysis.
- Data preprocessing: preprocess the data to suit the chosen algorithms (e.g., scaling numeric features).
- Experiment set-up: define the experimental factors, such as the train/test split, cross-validation, hyperparameter optimization, random state values for replicability, and evaluation metrics.
- Model evaluation: evaluate the performance of each model (here, a model is an algorithm combined with the data it is trained on), and weigh computational complexity against performance.
- Analysis write-up: provide a brief write-up of each step, ideally in a computational notebook.
- Next steps: suggest what to do next (e.g., save the best model for production).
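For the data cleaning step, here is a minimal sketch assuming tabular data held as a list of Python dicts. The helper `clean_rows` is hypothetical, and treating `None` or an empty string as "missing" is an assumption; with pandas, `DataFrame.drop_duplicates` and `DataFrame.dropna` cover the same two steps.

```python
from typing import Any


def clean_rows(rows: list[dict[str, Any]], required: list[str]) -> list[dict[str, Any]]:
    """Drop exact duplicate rows and rows missing any required field (hypothetical helper)."""
    seen: set[tuple] = set()
    cleaned: list[dict[str, Any]] = []
    for row in rows:
        # Assumption: None or "" counts as a missing value; 0 does not.
        if any(row.get(col) in (None, "") for col in required):
            continue
        # Sorted (key, value) pairs form a hashable fingerprint; assumes hashable values.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


rows = [
    {"user": "a", "clicks": 3},
    {"user": "a", "clicks": 3},     # exact duplicate
    {"user": "b", "clicks": None},  # missing value
    {"user": "c", "clicks": 0},     # zero is a valid value, not missing
]
print(clean_rows(rows, required=["user", "clicks"]))
# → [{'user': 'a', 'clicks': 3}, {'user': 'c', 'clicks': 0}]
```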
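Exploratory data analysis is largely visual, but its summary statistics part can be sketched with Python’s standard `statistics` module. The `summarize` helper and the `session_minutes` values are made-up illustrations, not data from any of our projects.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Return basic summary statistics for one numeric column (hypothetical helper)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation; needs >= 2 values
        "min": min(values),
        "max": max(values),
    }


session_minutes = [4.0, 7.5, 3.0, 12.0, 5.5]  # toy values for illustration
stats = summarize(session_minutes)
for name, value in stats.items():
    print(f"{name:>6}: {value:.2f}")
```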
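Feature selection can be done in many ways. One simple filter, similar in spirit to scikit-learn’s `VarianceThreshold`, drops features that barely vary across samples. The sketch assumes features stored column-wise in a dict; `variance_filter` and the column names are hypothetical.

```python
import statistics


def variance_filter(columns: dict[str, list[float]], threshold: float = 0.0) -> list[str]:
    """Return names of features whose population variance exceeds `threshold`."""
    return [
        name
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    ]


columns = {
    "age": [23, 35, 41, 29],
    "is_member": [1, 1, 1, 1],  # constant column: no signal for a model to use
    "clicks": [0, 5, 2, 9],
}
print(variance_filter(columns))  # → ['age', 'clicks']
```

A variance filter ignores the target entirely, so it is only a first pass; supervised methods (e.g., model-based importance) can follow it.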
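For the data preprocessing step, a common example is standardizing a numeric feature to zero mean and unit variance. The important detail is that the mean and standard deviation are learned from the training data only and then reused on the test data, the same pattern as calling scikit-learn’s `StandardScaler.fit` on the training set and `transform` on both sets. `fit_standardizer` and `standardize` are hypothetical helpers for a single column.

```python
import statistics


def fit_standardizer(train: list[float]) -> tuple[float, float]:
    """Learn mean and population standard deviation from training data only."""
    mean = statistics.mean(train)
    std = statistics.pstdev(train)
    # Guard against division by zero for a constant feature.
    if std == 0:
        std = 1.0
    return mean, std


def standardize(values: list[float], mean: float, std: float) -> list[float]:
    """Apply previously learned parameters; never refit on the test data."""
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]
mean, std = fit_standardizer(train)  # mean = 5.0, std = sqrt(5)
print(standardize(train, mean, std))
print(standardize(test, mean, std))
```

Fitting on the full dataset instead would let test-set information leak into training and make the evaluation optimistic.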
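For the experiment set-up, the sketch below shows two of the listed factors: a train/test split made replicable by a fixed random seed, and k-fold cross-validation indices. It uses only the standard library; `split_train_test` and `k_fold_indices` are hypothetical names, and the defaults (20% test data, seed 42) are arbitrary assumptions. In scikit-learn, `train_test_split` (with `random_state`) and `KFold` provide the same functionality.

```python
import random
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Shuffle with a seeded local RNG so the same seed always gives the same split."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n: int, k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, validation) index lists; the first n % k folds get one extra item."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


data = list(range(10))
train, test = split_train_test(data)
print(len(train), len(test))  # → 8 2
for train_idx, val_idx in k_fold_indices(len(data), k=3):
    print(val_idx)
```

In practice, cross-validation (including hyperparameter tuning) runs on the training portion only, and the test set is held out until the final evaluation.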
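For model evaluation, the right metrics depend on the task. For binary classification, accuracy, precision, recall, and F1 follow directly from counts of true positives, false positives, and false negatives. This is a minimal sketch with a hypothetical `binary_metrics` helper and toy labels; in practice, scikit-learn’s `sklearn.metrics` module (e.g., `precision_recall_fscore_support`) is the usual choice.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # toy labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
# → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```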
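The model evaluation step also asks to weigh computational cost against performance. One way to sketch this is to record each candidate algorithm’s wall-clock time with `time.perf_counter` next to its accuracy. The two candidates here, a majority-class baseline and a 1-nearest-neighbor classifier, are illustrative stand-ins rather than algorithms prescribed above, and the tiny dataset is made up; on real data, timings should be repeated and averaged.

```python
import time
from collections import Counter

Point = tuple[float, ...]


def majority_baseline(train_X: list[Point], train_y: list[int], test_X: list[Point]) -> list[int]:
    """Predict the most frequent training label for every test point."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


def one_nearest_neighbor(train_X: list[Point], train_y: list[int], test_X: list[Point]) -> list[int]:
    """Predict the label of the closest training point (squared Euclidean distance)."""
    preds = []
    for x in test_X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def compare(candidates, train_X, train_y, test_X, test_y):
    """Return {name: (accuracy, seconds)} for each candidate algorithm."""
    results = {}
    for name, algorithm in candidates.items():
        start = time.perf_counter()
        preds = algorithm(train_X, train_y, test_X)
        elapsed = time.perf_counter() - start
        accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
        results[name] = (accuracy, elapsed)
    return results


train_X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (0.1, 0.2)]
train_y = [0, 0, 1, 1, 0]
test_X = [(0.05, 0.1), (1.0, 0.9)]
test_y = [0, 1]

results = compare(
    {"majority": majority_baseline, "1-NN": one_nearest_neighbor},
    train_X, train_y, test_X, test_y,
)
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds * 1e3:.3f} ms")
```

Including a trivial baseline makes the trade-off concrete: 1-NN’s prediction cost grows with the size of the training set while the baseline’s does not, so the extra cost is only worth paying if the accuracy gain is clear.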
We follow roughly this template in our applied ML research. Hope it helps!