Problem of averages applies to social media community guidelines

(This post is based on a discussion with a couple of other online hate researchers.)

  1. Given a general policy (the community guidelines), it is possible to create specific explanations that cover most violations of the policy (e.g., “your message was flagged as containing Islamophobic content”). The idea is that the policy itself is finite: even though instances of Islamophobia may be many, the policy either covers a given form of hate speech or it does not. If the general policy itself is lacking, it needs to be fixed first.
  2. Explaining hate speech moderation can be framed as a classification problem in which each explanation is a class (a minimal sketch of this framing follows the list). Here we ran into the ground-truth problem, which we referred to as the inherent subjectivity of hate speech: it is not possible to create uncontestable or “immutable” labels for what counts as hate speech.
  3. The solution to this inherent subjectivity can take place at two levels: (a) at the user level, by fine-tuning or adjusting the hate speech detection model to user preferences rather than to the community guidelines, i.e., learning to flag what each user finds offensive instead of defining offensiveness a priori (see the first sketch after the list). This would make community guidelines obsolete, or at least far less influential (perhaps covering only those hateful words that cannot be used in a non-offensive way, if such words exist).
    …or, (b) at the community level, where each community (e.g., a page or a person) within the platform defines its own rules about what speech is allowed, and by joining that community a user acknowledges those rules (see the second sketch after the list). This essentially shifts the creation of community guidelines from the platform to subcommunities within the platform.
  4. Both (a) and (b) above rely on the notion that a platform’s community guidelines suffer from the problem of averages: the average is good in general but perfect for nobody. The only way I can see around that is to incorporate user or community features (i.e., preferences) that allow or disallow certain speech in certain communities. Users who do not wish to see certain speech then simply unfollow the community. This affords the flexibility of creating all kinds of spaces. At the same time, I would give more moderation tools to the communities themselves, which, again, I think is a better approach than a global “one size fits all” policy.
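To make items 2 and 3(a) concrete, here is a minimal sketch of per-user moderation as a classification problem. Everything in it is an assumption for illustration: the example messages, the explanation classes, and the scikit-learn model choice are not tied to any platform or dataset; the point is only that the labels come from what this particular user flagged, not from platform-wide guidelines.

```python
# Sketch: per-user moderation as classification (items 2 and 3a).
# Each explanation (e.g. "islamophobic", "ok") is a class, and the ground
# truth is whatever *this* user flagged, not a global guideline.
# All data below is made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical per-user training data: messages the user saw, plus the
# explanation class they assigned ("ok" means they did not flag it).
user_messages = [
    "example of a message this user flagged as islamophobic",
    "example of a message this user flagged as a personal attack",
    "an ordinary message the user left alone",
    "another ordinary message the user left alone",
]
user_labels = ["islamophobic", "personal attack", "ok", "ok"]

# One small model per user: it learns that user's notion of offensiveness
# rather than a platform-wide average.
per_user_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
per_user_model.fit(user_messages, user_labels)

def explain_flag(message: str) -> str:
    """Return the explanation class this user's model assigns to a message."""
    return per_user_model.predict([message])[0]

print(explain_flag("a new incoming message"))
```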
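Item 3(b) and point 4 can be sketched in the same spirit: instead of one global guideline, each community carries its own rule set, and moderation becomes a check of a message’s predicted categories against the rules of the community it is posted to. The category names, the classify() stub, and the example communities below are all hypothetical.

```python
# Sketch: community-level rules (items 3b and 4). Each community declares
# which speech categories it disallows; moderation only compares a post's
# predicted categories against the rules of the target community.
from dataclasses import dataclass, field

@dataclass
class CommunityRules:
    name: str
    disallowed_categories: set[str] = field(default_factory=set)

    def allows(self, post_categories: set[str]) -> bool:
        """A post is allowed if none of its categories are disallowed here."""
        return not (post_categories & self.disallowed_categories)

def classify(message: str) -> set[str]:
    """Placeholder for a detection model (e.g. the per-user model above)."""
    return {"sarcasm"}  # made-up output for illustration

# Two hypothetical communities with different local rules.
strict_forum = CommunityRules("strict_forum", {"islamophobic", "personal attack", "sarcasm"})
open_debate = CommunityRules("open_debate", {"personal attack"})

post = "some new message"
for community in (strict_forum, open_debate):
    verdict = "allowed" if community.allows(classify(post)) else "removed"
    print(f"{community.name}: {verdict}")
```

The same message can be removed in one community and allowed in another, which is exactly the flexibility a single platform-wide policy cannot offer; which communities a user joins then determines what they see.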