March 30, 2017
About the author : Joni holds a PhD in marketing. He is currently working as a postdoctoral researcher at Qatar Computing Research Institute and Turku School of Economics. Contact: joolsa (at) utu.fi
Had an interesting chat with Sami Kuusela from Underhood.co. Based on that, got some inspiration for an analysis framework which I’ll briefly describe here.
Figure 1 Identifying and analyzing topical text material
I’m not sure whether steps 4 and 5 would improve the system’s ability to identify topics. It might be that a more general model is not required because the system already can detect the key themes. Would be interesting to test this with a developer.
The whole point is to acknowledge that each large topic naturally divides into small sub-topics, which are dimensions that people perceive relevant for that particular topic. For example, in politics it could be things like “economy”, “domestic policy”, “immigration”, “foreign policy”, etc. While the dimensions can have some consistency based on the field, e.g. all political candidates share some dimensions, the exact mix is likely to be unique, e.g. dimensions of social media texts relating to Trump are likely to be considerably different from those of Clinton. That’s why the analysis ultimately needs to be done case-by-case.
In any case, it is important to note that instead of giving a general sentiment or engagement score of, say a political candidate, we can use an approach like this to give a more in-depth or segmented view of them. This leads to better understanding of “what works or not”, which is information that can be used in strategic decision-making. In addition, the topic-segmented sentiment data could be associated with predictors in a predictive model, e.g. by multiplying each topic sentiment with the weight of the respective topic (assuming the topic corresponds with the predictor).
This is just a conceptual model. As said, would be interesting to test it. There are many potential issues, such as handling with cluster overlap (some text pieces can naturally be placed into several clusters which can cause classification problems) and hierarchical issues (e.g., “employment” is under “economy” and should hence influence the former’s sentiment score).