
“We got high SDs in our user study? Now what?”

A major problem in user studies is high SDs (= standard deviations). People generally experience and perceive things differently. This poses problems for the statistical analysis of user study data, because common methods are based on means, e.g., comparisons of means (t-test, ANOVA).

Consider two cases: participant A gives a system the rating 5 (the maximum) on a five-point Likert scale, while participant B gives 1 (the minimum). Their mean rating is (5+1)/2 = 3. In another case, participant A gives 3 and participant B also gives 3; the mean rating is again 3. So, in both cases the mean rating is 3 — would it be correct to say that in both cases “the system was average” (because 3 is the midpoint of a 5-point scale)? Obviously not: it was only average in the latter case, where people agreed. In the former case, it was average to no one. One person thought it was the best system they ever used, while the other thought it was the worst.
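The contrast is easy to make concrete in a few lines of code. The two arrays below are the two hypothetical cases from the example:

```python
import numpy as np

case_1 = np.array([5, 1])  # participants A and B disagree completely
case_2 = np.array([3, 3])  # participants A and B agree

# Both cases share the same mean...
print(case_1.mean(), case_2.mean())  # 3.0 3.0

# ...but the standard deviation tells them apart
print(case_1.std(), case_2.std())  # 2.0 0.0
```

The mean alone cannot distinguish total disagreement from total agreement; the SD can.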

So, how do we tackle this problem when analyzing user study data? Here are four approaches:

DESCRIPTIVE APPROACH:
Always report standard deviations alongside means, and consider using confidence intervals rather than just point estimates. Similarly, visualize distributions to show how the data are spread (box plots are the standard choice). You may also define threshold values (e.g., how many people rated 4-5 and how many rated 1-2) to give more information about the variation in people’s answers.
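A minimal sketch of these descriptive statistics, using NumPy and SciPy on a made-up set of ratings (the numbers are purely illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical ratings from 12 participants on a 5-point scale
ratings = np.array([5, 4, 1, 5, 2, 4, 5, 1, 3, 4, 2, 5])

mean = ratings.mean()
sd = ratings.std(ddof=1)  # sample SD

# 95% confidence interval for the mean (t-distribution)
ci = stats.t.interval(0.95, df=len(ratings) - 1,
                      loc=mean, scale=stats.sem(ratings))

# Threshold counts: how many rated 4-5 vs. 1-2
high = int(np.sum(ratings >= 4))
low = int(np.sum(ratings <= 2))

print(f"M = {mean:.2f}, SD = {sd:.2f}, "
      f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"{high}/{len(ratings)} rated 4-5, {low}/{len(ratings)} rated 1-2")
```

Reporting all of these together — M, SD, CI, and the threshold counts — makes it much harder for a “3.4 average” to hide a polarized sample.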

MODELING APPROACH:
You can use mixed-effects models to incorporate individual variation into the analysis. Mixed-effects models work well when you have repeated measures — such as when each participant rates multiple systems or completes multiple tasks. In these cases, you may have enough data to control for participant-level differences and get clearer comparisons between systems.
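One way to sketch this is with statsmodels’ `mixedlm`. The data below are simulated (participant baselines and the system effect are assumptions for illustration): each participant rates two systems, and a random intercept per participant absorbs their individual response tendency.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated repeated measures: 20 participants each rate systems A and B.
# Participants have different "baselines", mimicking individual tendencies.
rng = np.random.default_rng(42)
rows = []
for p in range(20):
    baseline = rng.normal(3, 1)  # per-participant rating tendency
    for system, effect in [("A", 0.0), ("B", 0.6)]:
        rows.append({"participant": p, "system": system,
                     "rating": baseline + effect + rng.normal(0, 0.3)})
df = pd.DataFrame(rows)

# Random intercept per participant; fixed effect for the system
model = smf.mixedlm("rating ~ system", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

Because the between-participant variation is modeled explicitly rather than left in the error term, the estimated system effect comes out much closer to the true value than a naive comparison of raw means would.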

SEGMENTATION APPROACH:
Simply plotting the distributions of the dependent variables can give some insight into how people vary in their perceptions. This can be followed up with a more formal clustering of participants based on their rating patterns and demographics to identify distinct user groups. What looks like high variance might actually be several subgroups with different needs or preferences.

QUALITATIVE APPROACH:
Looking at the open-ended answers by people who gave extreme ratings can give information about what was particularly good or bad for specific people. This can help contextualize findings from the other approaches.
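In practice this can be as simple as filtering the survey export for the extreme raters. A pandas sketch with hypothetical data (column names and comments are invented for illustration):

```python
import pandas as pd

# Hypothetical survey export: overall rating plus an open-ended comment
df = pd.DataFrame({
    "participant": [1, 2, 3, 4, 5],
    "rating": [5, 1, 3, 5, 2],
    "comment": ["Loved the shortcuts", "Font was unreadable",
                "It was fine", "Fast and predictable",
                "Kept losing my work"],
})

# Pull comments from the extremes (ratings 1-2 and 4-5) for close reading
extremes = df[(df["rating"] <= 2) | (df["rating"] >= 4)]
for _, row in extremes.iterrows():
    print(f"[{row['rating']}] {row['comment']}")
```

Reading these side by side often explains the high SD directly: the 1s and the 5s are frequently reacting to entirely different aspects of the system.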

Probably the worst way to deal with the “high SD problem” is to just ignore it. Unfortunately, a lot of user study analyses do just that!
