
Joni

How to teach machines common sense? Solutions for ambiguity problem in artificial intelligence


Introduction

The ambiguity problem illustrated:

User: “Siri, call me an ambulance!”

Siri: “Okay, I will call you ‘an ambulance’.”

You’ll never reach the hospital, and end up bleeding to death.

Solutions

Two potential solutions:

A. machine builds general knowledge (“common sense”)

B. machine identifies ambiguity & asks for clarification from humans

The whole "common sense" problem can be solved by introducing human feedback into the system. We need to tell the machine what is what, just as we would teach a child. It is iterative learning, in which trial and error take place.

In fact, A and B converge through this process, which is fine and ultimately necessary.

Contextual awareness

To determine the proper solution to an ambiguous situation, the machine needs contextual awareness; this can be achieved by storing contextual information from each ambiguous situation and being told "why" a particular piece of information resolves the ambiguity. It's not enough to say "you're wrong"; there needs to be an explicit association to a reason (a concept, a variable). Equally, it's not enough to say "you're right"; again, the same association is needed.

The process:

1) try something

2) get told it’s not right, and why (linking to contextual information)

3) try something else, corresponding to why

4) get rewarded, if it’s right.
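
To make the loop concrete, here is a minimal Python sketch. The `machine` and `human` objects and their methods are hypothetical placeholders, not an existing API; the point is only to show where the reason-linked feedback enters the process.

```python
# A minimal sketch of the four-step loop above. The `machine` and `human`
# objects and their methods are hypothetical placeholders, not a real API.

def feedback_loop(machine, human, utterance, max_tries=5):
    """Let the machine propose interpretations until the human accepts one."""
    for _ in range(max_tries):
        interpretation = machine.interpret(utterance)            # 1) try something
        ok, reason = human.evaluate(utterance, interpretation)   # 2) get told whether it's right, and why
        # Store the explicit association between context, choice and reason
        machine.associate(context=utterance, choice=interpretation,
                          correct=ok, reason=reason)
        if ok:
            machine.reward()                                     # 4) get rewarded
            return interpretation
        # 3) the next iteration tries something else, guided by the stored reason
    return None  # still ambiguous: escalate back to the human
```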

The problem is that machines are currently trained on data, not on human feedback.

New thinking on teaching the machine

So we would need to build machine-training systems that enable training by direct human feedback, i.e. a new way to teach and communicate with the machine. This is not a trivial thing, since the whole machine-learning paradigm is based on data. From data and probabilities, we would need to move to associations and concepts. A new methodology is needed. Potentially, individuals could train their own AIs like pets (think Tamagotchi), or we could use large numbers of crowd workers who would explain to the machine why things are how they are (i.e., create associations). A specific type of markup (=communication) would probably also be needed.

By mimicking human learning we can teach the machine common sense. This is probably the only way; since common sense does not exist beyond human cognition, it can only be learnt from humans. An argument can be made that this is like going back in time, to the era when machines followed rule-based programming (as opposed to being data-driven). However, I would argue rule-based learning is much closer to human learning than the current probability-based approach, and if we want to teach common sense, we therefore need to adopt the human way.

Conclusion: machines need education

Machine learning may be up to par, but machine training certainly is not. The current machine-learning paradigm is data-driven, whereas we should look into concept-driven training approaches.

Joni

Rule-based AdWords bidding: Hazardous loops


1. Introduction

In rule-based bidding, you sometimes want step-backs: you first adjust your bid based on a given condition, and then adjust it back after the condition has passed.

For example, a use case would be to decrease bids for the weekend and increase them back to the normal level for weekdays.

However, defining the step-back rate does not work the way most people would think. I'll tell you how it does.

2. Step-back bidding

For step-back bidding you need two rules: one to change the bid (increase/decrease) and another one to do the opposite (decrease/increase). The values applied by these rules must cancel one another.

So, if your first rule raises the bid from $1 to $2, you want the second rule to drop it back to $1.

Call these

x = raise by percentage

y = lower by percentage

Where most people get confused is by assuming x=y, so that you use the same value for both the rules.

Example 1:

x = raise by 15%

y = lower by 15%

That should get us back to our original bid, right? Wrong.

If you do the math (1 × 1.15 × 0.85), you get 0.9775, whereas you want 1 (to get back to the baseline).

The more you iterate with the wrong step-back value, the farther from the baseline you end up. To illustrate, see the following simulation, where the loop is applied weekly for three months (12 weeks × 2 = 24 data points).

Figure 1 Bidding loop

As you can see, the wrong method takes you further and further from the correct pattern as time goes by. For a weekly rule the difference might be manageable, especially if the rule's incremental change is small, but imagine if you are running the rule daily or each time you bid (intra-day).

3. Solution

So, how to get to 1?

It’s very simple, really. Consider

  • B = baseline value (your original bid)
  • x = the value of the first rule (e.g., raise bid by 15% –> 0.15)
  • y = the multiplier of the second rule (dependent on the first rule)

You want to solve y from

B(1+x) · y = B

That is,

y = 1 / (1+x)

For the value in Example 1,

y = 1 / (1+0.15) ≈ 0.8696

i.e. the second rule should lower the bid by roughly 13.04%, not 15%. Multiplying that by the increased value takes you back to the baseline, so that

1.15 × (1 / 1.15) = 1
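
As a quick sanity check, a few lines of Python reproduce both the correct step-back factor and the drift from Figure 1 (a sketch using the numbers from Example 1):

```python
# Step-back factor and the drift caused by using the same percentage both ways.
baseline = 1.00   # B, the original bid in dollars
x = 0.15          # first rule: raise bid by 15%

y = 1 / (1 + x)                            # correct step-back multiplier
print(f"lower by {(1 - y) * 100:.2f}%")    # ~13.04%, not 15%

# Wrong vs. right over 12 weekly loops
wrong, right = baseline, baseline
for _ in range(12):
    wrong = wrong * (1 + x) * (1 - x)   # raise 15%, then lower 15%
    right = right * (1 + x) * y         # raise 15%, then lower ~13.04%
print(round(wrong, 4), round(right, 4))  # drifts to about 0.76 vs. stays at 1.0
```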

4. Conclusion

Remember to consider elementary mathematics when applying AdWords bidding rules!

Joni

Affinity analysis in political social media marketing – the missing link


Introduction. Hm… I've figured out how to execute a successful political marketing campaign on social media [1], but one link is still missing. Namely, applying affinity analysis (cf. market basket analysis).

Discounting conversions. Now, you are supposed to measure "conversions" by some proxy – e.g., time spent on site, number of pages visited, email subscription. Determining which measurable action is the best proxy for the likelihood of voting is a crucial sub-problem, which you can approach with several tactics. For example, you can use the action closest to the final conversion (the vote), i.e. a micro-conversion. This requires that you understand the sequence of actions leading to the final conversion. You could also use a relative cut-off point, e.g. the nth percentile with the highest degree of engagement is considered converted.

Anyhow, this is very important because once you have secured a vote, you don't want to waste your marketing budget by showing ads to people who have already decided to vote for your candidate. Otherwise, you risk "preaching to the choir". Instead, you want to convert as many uncertain voters into voters as possible, by using different persuasion tactics.

Affinity analysis. Affinity analysis can be used to accomplish this. In ecommerce, you would use it as the basis for a recommendation engine for cross-selling or up-selling ("customers who bought this item also bought…" à la Amazon). First you determine which sets of products are most popular, and then show those combinations to buyers interested in any item belonging to that set.

In political marketing, affinity analysis means inferring that because a voter is interested in topic A, he is likely also interested in topic B. Therefore, we will show him information on topic B, given our extant knowledge of his interests, in order to increase the likelihood of conversion. This is a form of associative learning.

Operationalization. But operationalizing this is where I’m still in doubt. One solution could be building an association matrix based on website behavior, and then form corresponding retargeting audiences (e.g., website custom audiences on Facebook). The following picture illustrates the idea.

Figure 1 Example of affinity analysis (1=Visited page, 0=Did not visit page)

For example, we can see that themes C&D and A&F commonly occur together, i.e. people visit those sub-pages on the campaign site. You can validate this by calculating correlations between all pairs. When your data is in binary format (0/1), you can use the Pearson correlation for the calculations.
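
A minimal sketch of that calculation, using made-up visit data for six theme pages (the figures are illustrative, not from an actual campaign):

```python
# Rows are visitors, columns are theme sub-pages (1 = visited, 0 = did not visit).
import pandas as pd

visits = pd.DataFrame(
    [[1, 0, 1, 1, 0, 1],
     [0, 1, 1, 1, 0, 0],
     [1, 0, 0, 0, 1, 1],
     [0, 1, 1, 1, 0, 0],
     [1, 0, 0, 0, 0, 1]],
    columns=["A", "B", "C", "D", "E", "F"],
)

# Pearson correlation on 0/1 data (equivalent to the phi coefficient)
affinity = visits.corr()
print(affinity.round(2))

# For a visitor who saw theme A but did not convert, pick the most associated other theme
print("Show next:", affinity["A"].drop("A").idxmax())   # -> F in this toy data
```

The same matrix can then drive the retargeting sequence described next: always expose the user to the unseen theme with the highest association to what they already engaged with.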

Facebook targeting. Knowing this information, we can build target audiences on Facebook, e.g. "Visited /Theme_A; NOT /Theme_F; NOT /confirmation", where confirmation indicates conversion. Then, we would show ads on Theme F to that particular audience. In practice, we could facilitate the process by first identifying the most popular themes and then finding the associated themes. Once the user has been exposed to a given theme and did not convert, he needs to be exposed to another theme (the one with the highest association score). The process is continued until the themes run out or the user converts, whichever comes first. Applying the earlier logic of determining a proxy for conversion, visiting all theme sub-pages can also be used as a measure of conversion.

Finally, it is possible to use more advanced methods of associative learning. That is, we could determine that {Theme A, Theme F} => {Theme C}, so that themes A and F predict interest in theme C. However, it is more appropriate to predict conversion rather than interest in other themes, because ultimately we're interested in persuading more voters.

Footnotes

[1] Posts in Finnish:

https://www.facebook.com/joni.salminen.33/posts/10212240031455606

https://www.facebook.com/joni.salminen.33/posts/10212237230465583

Joni

Total remarketing – the concept


Here’s a definition:

Total remarketing is remarketing in all possible channels with all possible list combinations.

Channels:

  • Programmatic display networks (e.g., Adroll)
  • Google (GDN, RLSA)
  • Facebook (Website Custom Audience)
  • Facebook (Video viewers / Engaged with ads)
  • etc.

How to apply:

  1. Test 2-3 different value propositions per group
  2. Prefer up-selling and cross-selling over discounts (the goal is to increase AOV, not to reduce it; e.g. you can include a $20 gift voucher when the basket size exceeds $100)
  3. Configure well; exclude those who bought; use the information you have to improve remarketing focus (e.g. time on site, products or categories visited – the same remarketing for all groups is like the same marketing for all groups)
  4. Consider automation options (dynamic retargeting; behavior based campaign suggestions for the target)
Joni

In 2016, Facebook bypassed Google in ads. Here’s why.


Introduction

The year 2016 was the first in which I thought Facebook would end up beating Google in the ad race, despite the fact that Google still dominates in revenue ($67Bn vs. $17Bn in 2015). I'll explain why.

First, consider that Google’s growth is restricted by three things:

  1. natural demand
  2. keyword volumes, and
  3. the approach of a perfect market.

More demand than supply

First, at any given time there is a limited number of people interested in a product/service. The interest can be purchase intent or just general interest, but either way it translates into searches. Each search is an impression that Google can sell to advertisers through its AdWords bidding. The major problem is this: even when I'd like to spend more money on AdWords, I cannot. There is simply not enough search volume to satisfy my budget (in many cases there is, but in highly targeted and profitable campaigns there often isn't). So I will spend the excess budget elsewhere, where the profitable ad inventory is not limited (that is, Facebook at the moment).

Limited growth

According to estimates, search volume is growing by 10-15% annually [1]. Yet, Google's revenue is expected to grow by as much as 26% [2]. Over the years, Google's growth rate in terms of search volume has substantially decreased, although this is perceived as a natural phenomenon (after a trillion searches it's hard to keep growing double digits). In any case, the aforementioned dynamics are reflected in search volumes – when the volumes don't grow much and new advertisers keep entering the ad auction, there is more competition over the same searches. In other words, supply stays stable but demand increases, resulting in more intense bid wars.

Approaching perfect market

For a long time now, I've added a +15% increase in internal budgeting for AdWords, and last year that was hard to maintain. Google is still a profitable channel, but the advertisers' surplus is decreasing year by year, incentivizing them to look for alternative channels. While Google is restrained by its natural search volumes, Facebook's ad inventory (=impressions) is practically limitless. The closer AdWords gets to a perfect market (=no economic rents), the less attractive it is for savvy marketers. Facebook is less exploited, and still allows rents.

What will Google do?

Finally, I don't like the Alphabet business. From the beginning it has signaled to investors that Google is in the "whatever comes to mind" business instead of maintaining a strategic focus on search. Most likely Alphabet ends up draining resources from the mother company, producing losses and diverting human capital away from the online ads business (which is where their money comes from). In contrast, Facebook is very focused on social; it buys off competitors and improves fast. That said, I do have to recognize that Google's advertising system is still much better than Facebook's, and in fact still the best in the world. But momentum seems to be shifting to Facebook's side.

Conclusion

The maximum number of impressions (=ad inventory) on Facebook is much higher than that of Google, because Google is limited by natural demand and Facebook is not. In this marketplace, there is always more advertiser demand than ad supply, which is why advertisers want to spend more than what Google enables. These factors, combined with Facebook's continuously increasing ability to match interested people with the right type of ads, make Facebook's revenue potential much bigger than Google's.

From the advertiser's perspective, Facebook and Google both are and are not competitors. They are competitors for ad revenue, but they are not competitors in the online channel mix. Because Google is for demand capture and Facebook for demand creation, most marketers want to include both in their channel mix. This means Google's share of online ad revenue might decrease, but a rational online advertiser will not drop it, so it will remain a (less important) channel for the foreseeable future.

References

[1] http://www.internetlivestats.com/google-search-statistics/

[2] http://venturebeat.com/2016/09/27/4-graphs-show-the-state-of-facebook-and-googles-revenue-dominance/

Joni

Buying and selling complement bundles: When individual selling maximizes profit


Introduction

When we were young, my brother and I used to buy and sell game consoles on Huuto.net (a local eBay) and on various gamer discussion forums (Konsolifin BBS, for example). We didn't have much money, so this was a great way to earn some cash – plus it taught us some useful business lessons over the years.

What we would often do was buy a bundle (console + games), break it apart and sell the pieces individually. At that time we didn't know anything about economics, but intuitively it felt like the right thing to do. Indeed, we would always make money with that strategy, as we knew the market prices (or their ranges) of each individual item.

Looking back, I can now try to explain in economic terms why this was a successful strategy. In other words, why individual selling of the items in a complement bundle is a winning strategy.

Why does individual selling provide a better profit than the selling of a bundle?

Let’s first define the concepts.

  • individual selling = buy complement bundle, break it apart and sell individual pieces
  • a complement bundle = a central unit and its complements (e.g., a game console and games)

Briefly, it is so because the tastes of the market are randomly distributed and do not align with the exact contents of the bundle. It then follows that the exact set of complements does not maximize any individual's utility, so buyers will bid accordingly (e.g., "I like those two games (out of five), but not the other three, so I don't put much value on them") and the market price of the bundle will settle below the full value of its individual parts.

In contrast, by breaking the bundle apart and selling individually, each complement can be appraised at its full value ("I like that game, so I'll pay its real value"). In other words, the seller needs to find a buyer for each piece who appreciates that piece to its full value (=has a preference for it).

The intuition

Tastes and preferences differ, which is reflected in individuals' utility functions and therefore in their willingness to pay. Selling a bundle is a compromise from the perspective of the seller – he compromises on the full price, because the buyer is willing to pay only according to his preferences (utility function), which do not completely match the contents of the bundle.

Limitations

There are two exceptions I can think of:

1) Highly valued complements (or homogeneous tastes)

Say all the complements are of high value in the market (e.g., popular hit games). Then a large portion of the market assigns full value to them, and the bundle price settles close or equal to the sum of the individual full prices. Similarly, if all the buyers value the complements in a similar way, i.e. their tastes are homogeneous, the randomness required for individual selling to perform does not exist.

2) Information asymmetry

Sometimes you can get a higher price by selling a bundle than by selling the individual pieces. We would use this strategy when the value of the complements was very low to an "expert". A less experienced buyer, however, might see a game console + 5 games as a valuable package; the 5 games actually had very little value in the market, and it would therefore make sense to include them in the bundle to attract less-informed buyers. In other words, benefiting from information asymmetries.

Finally, the buyer of a complement bundle needs to be aware of the market price (or the range of it) of each item. Otherwise, he might end up paying more than the value of the sum of individual items.

Conclusion

Finding bundles and selling the pieces individually is a great way for young people to practice business. Luckily, there are always sellers in the market who are not looking to optimize their asking price, but appreciate the speed and comfort associated with selling bundles (i.e., dealing with one buyer). The actors with more time and less sensitivity to comfort can then take advantage of that condition to make some degree of profit.

EDIT: My friend Zeeshan pointed out that a business may actually prefer bundling even when the price is lower than in individual selling, if it assigns a transaction cost (search, bargaining) to individual selling and the sum of the transaction costs of selling the individual items is higher than the difference between the sum of full prices and the bundle price. (Sounds complicated, but it means you'd spend too much time selling each item relative to the extra profit.) For us as kids this didn't matter since we had plenty of time, but for businesses the cost of selling does matter.
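
A rough numerical sketch of that condition, with invented prices and an invented per-sale transaction cost (none of these figures are from the post):

```python
# Sketch of the bundling vs. individual-selling trade-off described above.
# All figures are invented for illustration.
bundle_price = 150.0                                  # one buyer takes console + games
item_prices = [120.0, 15.0, 12.0, 10.0, 8.0, 5.0]     # full market prices of the pieces
cost_per_sale = 3.0                                   # search + bargaining effort per transaction

individual_net = sum(item_prices) - cost_per_sale * len(item_prices)   # 170 - 18 = 152
bundle_net = bundle_price - cost_per_sale                               # 150 - 3  = 147

# Bundling wins only when the extra transaction costs exceed the price premium
# of selling individually: cost_per_sale * (n - 1) > sum(item_prices) - bundle_price
print(individual_net, bundle_net)
```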

Joni

Polling social media users to predict election outcomes


The 45th President of the USA

Introduction

The problem of predicting election outcomes with social media is that the data, such as likes, are aggregate, whereas the election system is not — apart from simple majority voting, in which you only have the classic representativeness problem that Gallup solved in 1936. To solve the aggregation problem, one needs to segment the polling data so that it 1) corresponds to the prevailing election system and 2) accurately reflects the voters according to that system. For example, in the US presidential election each state has a certain number of electoral votes. To win, a candidate needs to reach 270 electoral votes.

Disaggregating the data

One obvious solution would be to trace like sources to profiles and determine the state based on the information publicly given by the user. This way we could also filter out foreign likers. However, there are some issues with using likes as indicators of votes. Most importantly, "liking" something on social media does not in itself predict the future behavior of an individual to a sufficient degree.

Therefore, I suggest here a simple polling method via social media advertising (Facebook Ads) and online surveys (SurveyMonkey). Polling partly faces the same aforementioned problem of future behavior as using likes as the overarching indicator, which is why in the latter part of this article I discuss how these approaches could be combined.

At this point, it is important to acknowledge that online polling does have significant advantages relating to 1) anonymity, 2) cost, and 3) speed. That is, people may feel more at ease expressing their true sentiment to a machine than to another human being. Second, the method has the potential to collect a sizeable sample in a more cost-effective fashion than calling. Finally, a major advantage is that due to the scalable nature of online data collection, the predictions can be updated faster than via call-based polling. This is particularly important because election cycles can involve quick and hectic turns. If the polling delay is from a few days to a week, it is too late to react to final-week events of a campaign, which may still carry great weight in the outcome. In other words: the fresher the data, the better. (An added bonus is that by taking several samples, we could factor momentum, i.e. the growth speed of a candidate's popularity, into our model – albeit this can be achieved with traditional polling as well.)

Social media polling (SMP)

The method, social media polling or SMP, is described in the following picture.

Figure 1 Social media polling

The process:

1. Define segmentation criteria

First, we understand the election system. For example, in the US system every state has a certain weight expressed by its share of total electoral votes. There are 50 states, so these become our segmentation criteria. In case we deem it appropriate to do further segmentation (e.g., by gender or age), we can do so by creating additional segments which are reflected in the target groups and surveys. (These sub-segments can also be analyzed in the actual data later on.)

2. Create unique surveys

Then, we create a unique survey for each segment so that the answers will be bucketed. The questions of the survey are identical – they are just behind different links to enable easy segmentation. We create a survey rather than a visible poll (app) or a picture-type poll ("like if you vote Trump, heart if you vote Hillary"), because we want to avoid social desirability bias. A click on Facebook will lead the user to the unique survey of their segment, and their answers won't be visible to the public.

3. Determine sample size

Calculating sample size is one of those things that will make your head spin, because there's no easy answer as to what a good sample size is. Instead, "it depends." However, we can use some heuristic rules to come up with decent alternatives in the context of elections. Consider two potential sample sizes.

  • Option A: sample size 500; confidence level 95%; margin of error +/- 4.4%
  • Option B: sample size 1,000; confidence level 95%; margin of error +/- 3%

These are seen as decent options among election pollsters. However, the margin of error is still quite sizeable in both of them. For example, if there are two candidates and their "true" support values are A=49%, B=51%, the large margin of error can easily make us call the state wrong. We could solve this by increasing the sample size, but if we wanted to reduce the margin of error from +/- 3% to, say, 1%, the required sample size grows dramatically (more precisely, with 95% confidence and a population size of 1M, it's about 9,512 – impractically high for a 50-state model). In other words, we have to accept the risk of wrong predictions in this type of situation.
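
For reference, the margins of error above follow from the standard sample-size formula (95% confidence, worst-case p = 0.5) with a finite-population correction; here is a small sketch of the arithmetic:

```python
# Required sample size for a given margin of error at 95% confidence, p = 0.5,
# with a finite-population correction.
import math

def required_sample(margin, population, z=1.96, p=0.5):
    n0 = (z ** 2) * p * (1 - p) / margin ** 2           # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite-population correction

print(required_sample(0.044, 1_000_000))   # ~496   -> the "500" option
print(required_sample(0.03, 1_000_000))    # ~1,066 -> roughly the "1,000" option
print(required_sample(0.01, 1_000_000))    # ~9,513 -> the ~9,512 figure above (rounding differs slightly)
```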

Practically all states have over 1,000,000 people, so each of them is considered a "large" population (this is a mathematical thing – the required sample size stabilizes after reaching a certain population size). Although the US is often characterized as one population, in the context of election prediction it's actually several different populations (because we have independent states that vote). The procedure we apply is stratified random sampling, in which the large general population is split into sub-groups. In practice, each sub-group requires its own sample, and therefore our approach requires a considerably larger sample size than a prediction that would only consider the whole population of the country. But exactly because of this it should be more accurate.

So, with this lengthy explanation, let us say we satisfice with a sample size of 500 per state. That would be 500×50 = 25,000 respondents. If it cost $0.60 to get a respondent via Facebook ads, the cost of data collection would be $15,000. For repeated polling, there are a few strategies. First, the sample size can be reduced for states that show a large difference between the candidates. In other words, we don't need to collect a large number of respondents if we "know" the popularity difference between the candidates is high. The important thing is that the situation is measured periodically, and the sample sizes are flexibly adjusted according to known results. In a similar vein, we can increase the sample size for states where the competition is tight, to reduce the margin of error and thereby increase the accuracy of our prediction. To my understanding, the opportunity of flexible sampling is not used efficiently by all pollsters.

4. Create Facebook campaigns

For each segment, a target group is created in Facebook Ads. The target group is used to advertise to that particular group; for example, the Michigan survey link is only shown to people from Michigan. That way, we minimize the risk of people outside the segment responding (they can, however, be excluded later on by IP). At this stage, creating attractive ads helps keep the cost per response low.

5. Run until sample size is reached

The administrator observes the results and stops the data collection once a segment has reached the desired sample size. When all segments are ready, the data collection is stopped.

6. Verify data

Based on IP, we can filter out respondents who do not belong to our particular geographical segment (=state).

Ad clicks can be used to determine sample representativeness by other factors – in other words, we can use Facebook’s campaign reports to segment age and gender information. If a particular group is under-represented, we can correct by altering the targeting towards them and resume data collection. However, we can also accept the under-representation if we have no valid reference model as to voting behavior of the sub-segments. For example, millennials might be under-represented in our data, but this might correspond with their general voting behavior as well – if we assume survey response rate corresponds with voting rate of the segments, then there is no problem.

7. Analyze results

The analysis process is straight-forward:

segment-level results x weights = prediction outcome

For example, in the US presidential election, the segment-level results would be each state (whoever polls highest in the state is its winner), which would be weighted by each state's share of electoral votes. The candidate who gets at least 270 votes is the predicted winner.
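
A minimal sketch of this aggregation, using a winner-take-all rule per state; the three example states' poll shares are invented, while their electoral vote counts are the actual 2016 figures:

```python
# Winner-take-all aggregation of segment-level (state) polls into electoral votes.
state_polls = {            # state -> (candidate A share, candidate B share), invented numbers
    "Michigan": (0.48, 0.52),
    "Florida":  (0.51, 0.49),
    "Ohio":     (0.47, 0.53),
}
electoral_votes = {"Michigan": 16, "Florida": 29, "Ohio": 18}

totals = {"A": 0, "B": 0}
for state, (a, b) in state_polls.items():
    winner = "A" if a > b else "B"
    totals[winner] += electoral_votes[state]

print(totals)  # {'A': 29, 'B': 34}; with all 50 states, the first to 270 is the predicted winner
```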

Other methods

Now, as for other methods, we can use behavioral data. I have previously argued that behavioral data is a stronger indicator of future actions since it's free from reporting bias. In other words, people say they will do something but don't end up doing it. This is a very common problem, both in research and in daily life.

To correct for that, we consider two approaches here:

1) The volume of likes method, which treats a like as a proxy for a vote (the more likes a candidate has in relation to another candidate, the more likely they are to win)

For this method to work, the "intensity of a like", i.e. its correlation with behavior, should be determined, as not all likes are indicators of voting behavior. Likes don't readily translate into votes, and there does not appear to be other information we can use to further examine their correlation (a like is a like). We could, however, add contextual information about the person, or use rules such as "the more likes a person gives, the more likely (s)he is to vote for a candidate."

Or, we could use another solution which I think is better:

2) Text analysis/mining

By analyzing a person's social media comments, we can better infer the intensity of their attitude towards a given topic (in this case, a candidate). If a person uses strongly positive vocabulary when referring to a candidate, then (s)he is more likely to vote for him/her than if the comments are negative or neutral. Notice that the mere positive-negative range is not enough, because positivity has degrees of intensity we have to consider. It is different to say "he is okay" than "omg he is god emperor". The more excitement and associated feelings a person exhibits – which need to be carefully mapped and defined in the lexicon – the more likely voting behavior is.
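
A toy sketch of such intensity scoring, with an invented lexicon (a real one would need to be carefully curated, as noted above):

```python
# A toy intensity lexicon: scores go beyond plain positive/negative to capture
# degrees of enthusiasm. Phrases and weights are invented.
intensity_lexicon = {
    "okay": 1, "fine": 1, "good": 2, "great": 3,
    "love": 4, "amazing": 4, "god emperor": 5,
    "bad": -2, "terrible": -4, "hate": -4,
}

def intensity(comment: str) -> int:
    text = comment.lower()
    return sum(score for phrase, score in intensity_lexicon.items() if phrase in text)

print(intensity("he is okay"))             # 1 -> lukewarm, weak voting signal
print(intensity("omg he is god emperor"))  # 5 -> strong enthusiasm, likelier voter
```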

Limitations

As I mentioned, even this approach risks shortcomings of representativeness. First, the population on Facebook may not correspond with the population at large. It may be that the user base is skewed by age or some other factor. The choice of platform greatly influences the sample; for example, Snapchat users are on average younger than Facebook users, whereas Twitter users are more liberal. It is not clear whether Facebook's user base represents a skewed sample or not. Second, the people voicing their opinions may be part of a "vocal minority" as opposed to a "silent majority". In that case, we apply the logic of the Gaussian distribution and assume that the general population leans more towards the middle ground than the extremes – if, in addition, we assume the central tendency to be symmetrical (meaning people in the middle are equally likely to tip towards either candidate in a two-candidate race), the analysis of the extremes can still yield a valid prediction.

Another limitation may be that advertising targeting is not equivalent to random sampling, but has some kind of bias. That bias could emerge e.g. from 1) the ad algorithm favoring a particular sub-set of the target group, i.e. showing more ads to them, whereas we would like to get all types of respondents; or 2) self-selection, in which the respondents are of a similar kind and again not representative of the population. Off the top of my head, I'd say number two is less of a problem because the people who show enough interest are also the ones who vote – remember, essentially we don't need to care about the opinions of the people who don't vote (that's how elections work!). But number one could be a serious issue, because the ad algorithm directs impressions based on responses and might identify some hidden pattern we have no control over. Basically, the only thing we can do is examine the superficial segment information in the ad reports and evaluate whether the ad rotation was sufficient or not.

Combining different approaches

As both approaches – traditional polling and social media analysis – have their shortcomings and advantages, it might be feasible to combine the data under a mixed model which would factor in 1) the count of likes, 2) the count of comments with high affinity (=positive sentiment), and 3) polled preference data. A deduplication process would be needed to avoid counting twice those who both liked and commented – this requires associating likes and comments with individuals. Note that the hybrid approach requires geographic information as well, because otherwise the segmentation is diluted. Anyhow, taking the user as the central entity could be a step towards determining voting propensity:

user (location, count of likes, count of comments, comment sentiment) –> voting propensity
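
As a sketch, the mixed model could start as simple as a weighted score per deduplicated user; the weights below are arbitrary placeholders, not estimated values:

```python
# A sketch of the mixed model above: one record per deduplicated user, combining
# like count, comment count and average comment sentiment into a crude voting-
# propensity score between 0 and 1. The weights are arbitrary placeholders.

def voting_propensity(likes: int, comments: int, avg_sentiment: float) -> float:
    """avg_sentiment is assumed to be scaled to [-1, 1]."""
    raw = (0.2 * min(likes, 10) / 10          # capped like activity
           + 0.3 * min(comments, 10) / 10     # capped comment activity
           + 0.5 * (avg_sentiment + 1) / 2)   # sentiment mapped to [0, 1]
    return round(raw, 2)

user = {"location": "Michigan", "likes": 7, "comments": 3, "sentiment": 0.6}
print(user["location"], voting_propensity(user["likes"], user["comments"], user["sentiment"]))
# -> Michigan 0.63; the location keeps the score tied to the right electoral segment
```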

Another way to see this is that enriching likes with relevant information (with regard to the election system) can help model social media data in a more granular and meaningful way.

Joni

Analyzing sentiment of topical dimensions in social media


Introduction

I had an interesting chat with Sami Kuusela from Underhood.co. Based on that, I got some inspiration for an analysis framework, which I'll briefly describe here.

The model

Figure 1 Identifying and analyzing topical text material

The description

  1. User is interested in a given topic (e.g., Saara Aalto, or #saaraaalto). He enters the relevant keywords.
  2. The system runs a search and retrieves text data based on that (e.g., tweets).
  3. A cluster analysis (e.g., unsupervised topic model) identifies central themes from the data.
  4. Vectorization of representative keywords from the cluster analysis (e.g., the 10 most popular per cluster) is run to extract words with a similar meaning from a reference lexicon. This increases the generality of each topic cluster by associating it with other words that are close in the vector space.
  5. Text mining is run to refine the themes, i.e. to place the right text pieces under the correct themes. These are now called "dimensions", since they describe the key dimensions of the text corpus (e.g., Saara's voice, performance, song choices…).
  6. Sentiment analysis can be run to score the general (pos/neg/neu) or specific (e.g., emotions: joy, excitement, anger, disappointment, etc.) sentiment of each dimension. This could be done by using a machine-learning model with annotated training data (if the data-set is vast), or some sentiment lexicon (if the data-set is small).
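
A minimal sketch of steps 2, 3 and 6 using an off-the-shelf topic model (scikit-learn's LDA) and a tiny invented sentiment lexicon; steps 4-5 (vector expansion, refinement) are omitted, the four example tweets are made up, and with such a toy corpus the split is only meant to show the shape of the pipeline:

```python
# Steps 2-3 and 6 in miniature: cluster a handful of invented tweets into two
# themes with an unsupervised topic model, then score each theme with a tiny
# sentiment lexicon.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "amazing voice tonight, chills",
    "her voice is the best in the competition",
    "weak song choice this week",
    "the song choice really disappointed me",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
assignments = lda.transform(X).argmax(axis=1)   # dominant theme per tweet

lexicon = {"amazing": 1, "best": 1, "chills": 1, "weak": -1, "disappointed": -1}
for theme in range(2):
    docs = [t for t, a in zip(tweets, assignments) if a == theme]
    score = sum(lexicon.get(word, 0) for doc in docs for word in doc.split())
    print(f"theme {theme}: {len(docs)} tweets, sentiment score {score}")
```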

I’m not sure whether steps 4 and 5 would improve the system’s ability to identify topics. It might be that a more general model is not required because the system already can detect the key themes. Would be interesting to test this with a developer.

Anyway, what’s the whole point?

The whole point is to acknowledge that each large topic naturally divides into small sub-topics, which are dimensions that people perceive relevant for that particular topic. For example, in politics it could be things like “economy”, “domestic policy”, “immigration”, “foreign policy”, etc. While the dimensions can have some consistency based on the field, e.g. all political candidates share some dimensions, the exact mix is likely to be unique, e.g. dimensions of social media texts relating to Trump are likely to be considerably different from those of Clinton. That’s why the analysis ultimately needs to be done case-by-case.

In any case, it is important to note that instead of giving a general sentiment or engagement score of, say a political candidate, we can use an approach like this to give a more in-depth or segmented view of them. This leads to better understanding of “what works or not”, which is information that can be used in strategic decision-making. In addition, the topic-segmented sentiment data could be associated with predictors in a predictive model, e.g. by multiplying each topic sentiment with the weight of the respective topic (assuming the topic corresponds with the predictor).

Limitations

This is just a conceptual model. As said, it would be interesting to test it. There are many potential issues, such as handling cluster overlap (some text pieces can naturally be placed into several clusters, which can cause classification problems) and hierarchical issues (e.g., "employment" falls under "economy" and should hence influence the latter's sentiment score).

Joni

Agile methods for predicting contest outcomes by social media analysis


People think, or seem to assume, that there is some magical machine that spits out accurate predictions of future events from social media data. There is not, and that's why each credible analysis takes human time and effort. But therein also lies the challenge: when fast decisions are needed, time-consuming analyses reduce agility. Real-time events would require real-time analysis, whereas data analysis is often a cumbersome and time-consuming effort, including data collection, cleaning, machine training, etc.

It's a project of weeks or days, not hours. All the practical issues of the analysis workflow make it difficult to provide accurate predictions at a fast pace (although there are other challenges as well).

An example is Underhood.co – they predicted Saara Aalto to win X Factor UK based on social media sentiment, but ended up being wrong. While there are many potential reasons for this, my conclusion is that their indicators lacked sufficient predictive power. They were too reliant on aggregates (in this case country-level data), and had a problematic approach to begin with – just like with any prediction, the odds change on the go as new information becomes available, so you should never call the winner weeks ahead. Of course, theirs was just a publicity stunt where they hoped being right would prove the value of their service. Another example, of course, is the US election, where prediction markets were completely wrong about the outcome. That was, according to my theory, because of wrong predictors – polls ask what your preference is or what you would do, whereas social media engagement shows what people actually do (in social media), and as such is closer to real behavior, hence a better predictor.

Even if I do think human analysts are still needed for the near future, more solutions for the quick collection and analysis of social media data are needed, especially to combine human and machine work in the best possible way. Some of these approaches can be based on automation, while others can be methodological, such as quick definition of the relevant social media outlets for sampling.

Here are some ideas I have been thinking of:

I. Data collection

  1. Quick definition of choice space (e.g., candidates in a political election, X-Factor contestants)
  2. Identification of related social media outlets (i.e., communities, topic hashtags)
  3. Collecting sample (API, scraping, or copy-paste (crowdsourcing))

Each part is case-dependent and idiosyncratic – for whatever event (I'm thinking of competitions here), you have to do this work from scratch. Ultimately, you cannot get the whole Internet as your data, but you want the sample to be as representative as possible. For example, it was obvious that Twitter users showed much more negative sentiment towards Trump than Facebook users, and on both platforms you had supporter groups/topic concentrations that should first be identified before any data collection. Then, the actual data collection is tricky. People again seem to assume all data is easily accessible. It's not – while Twitter and Facebook offer APIs, collecting comment data at scale from platforms like YouTube or Reddit can be harder in practice. This means the comments that you use for predicting the outcome (by analyzing their relative share of the total as well as the strength of the sentiment beyond pos/neg) may need to be fetched either by web scraping or by manually copying them into a spreadsheet. Due to the large volumes of data, crowdsourcing could be useful – e.g., setting up a Google Sheet where crowd workers paste the text material in clean format. The raw text content, e.g. tweets, Facebook comments, Reddit comments, is put in separate sheets for each candidate.

II. Data analysis

  1. Cluster visualization (defining clusters, visualizing their respective sizes (plot # of voters), breakdown by source platform and potential other factors)
  2. Manual training (classifying the sentiment, or “likelihood to vote”)
  3. Machine classification (calculating the number of likely voters)

In every statistical analysis, the starting point should be visualizing the data. This gives an aggregate "helicopter view" of the situation. Such a snapshot is also useful for demonstrating the results to the end user, letting the data speak for itself. Candidates are bubbles in the chart, their sizes proportional to the number of calculated likely voters. The data could be broken down by source platform, or other factors, using the candidate as the point of gravity for the cluster.

Likelihood to vote could be classified on a scale, not as a binary. That is, instead of saying "sentiment is positive: YES/NO", we could ask "How likely is the person to vote?", which is the same as asking how enthusiastic or engaged he or she is. Therefore, a scale is better, e.g. ranging from -5 (definitely not voting for this candidate) to +5 (definitely voting for this candidate). The manual training, which could also be done with the help of the crowd, helps the machine classifier improve its accuracy on the go. Based on the training data, it would generalize the classification to all the material. The material is bucketed so that each candidate is evaluated separately and the number of likely voters can be calculated. It is possible that the machine classifier could benefit from training input on both candidates, inasmuch as the language showing positive and negative engagement is not significantly different.
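
A small sketch of the manual-training-to-machine-classification handover, assuming crowd-labeled comments on the -5…+5 scale and a basic text regressor (the comments, labels and cut-off are all invented):

```python
# Sketch: crowd workers label a few comments on the -5..+5 "likelihood to vote"
# scale, a simple text regressor generalizes to the rest, and likely voters are
# counted per candidate.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

labeled = [
    ("definitely voting for her, can't wait", 5),
    ("she seems fine I guess", 1),
    ("never voting for this clown", -5),
    ("not impressed at all", -3),
]
unlabeled = ["can't wait to vote!", "not impressed by the debate"]

vec = TfidfVectorizer()
X = vec.fit_transform([text for text, _ in labeled])
model = Ridge().fit(X, [score for _, score in labeled])

scores = model.predict(vec.transform(unlabeled))
likely_voters = sum(1 for s in scores if s >= 3)   # the cut-off is a judgment call
print(list(zip(unlabeled, scores.round(1))), likely_voters)
```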

It is important to note that negative sentiment does not really matter. What we are interested in is the number of likely voters. This is because of election dynamics – it does not matter how poor a candidate's aggregate sentiment is, i.e. the ratio between haters and sympathizers, as long as his or her number of likely voters is higher than that of the competition. This effect was evident in the recent US presidential election.

The crucial thing is to keep the process alive during the whole election/competition period. There is no point at which it becomes certain that one candidate loses and the other wins, although the divide can become substantial and thereby increase the accuracy of the prediction.

III. Presentation of results

  • constantly updating feed (à la Facebook video stream)
  • cluster visualization
  • search trend widget (source: Google Trends)
  • live updating predictions (manual training –> machine model)

The results could be shown as a dashboard to the end user. A search trend graph and the above-mentioned cluster visualization could be viable elements. In addition, it would be interesting to see the count of voters evolving over time – in such a way that it, along with the visualization, could be "played back" to examine the development over time. In other words, an interactive visualization. As noted, the prediction, or the count of likely votes, should update in real time as a result of combined human-machine work.

Conclusion and discussion

The idea behind developing more agile methods for using social media data to predict contest outcomes is that the accuracy of the prediction is based on the choice of indicators rather than the finesse of the method. For example, complex Bayesian models falsely predicted that Hillary Clinton would win the election. It's not that the models were poorly built; they just used the wrong indicators, namely polling data. This is the usual case of 'garbage in, garbage out', and it shows that the choice of indicators is more important than the technical features of the predictive model.

The choice of indicators should be based on their predictive power, and although I don't have strict evidence for it, it intuitively makes sense that social media engagement is in many instances a stronger indicator than survey data, because it's based on actual behavior instead of stated preferences. Social scientists know from the long tradition of survey research that there are myriad social effects reducing the reliability of the data (e.g., social desirability bias). Those, I would argue, are a much smaller issue in social media engagement data.

However, to be fair, there can be issues of bias in social media engagement data as well. The major concern is the low participation rate: a common heuristic is that 1/10 of participants actually contribute by writing, while the other 9/10 are readers whose real thoughts remain unknown. It's then a question of how well the vocal minority reflects the opinion of the silent majority. In some cases this is less relevant for competitions, if the overall voting share remains low. For example, if turnout is 60%, it is relatively much more important to mobilize the active base than if voting were close to 100%, where one would need near-universal acceptance.

Another issue is non-representative sampling. This is a concern when the voting takes place offline, and the online data does not accurately reflect the voting of those who do not express themselves online. However, as social media participation is constantly increasing, this is becoming less of a problem. In addition, compared to other methods of data collection – apart from stratified polling, perhaps – social media is likely to give a good result on competitive predictions because of their political nature. People who strongly support a candidate are more likely to be vocal about it, and the channel for voicing their opinion is social media.

It is evident that the value of social media engagement as a predictor is currently underestimated, as shown by the large emphasis put on political polls and the virtually non-existent discussion of social media data. As a direct consequence, those who are able to leverage social media data in the proper way will gain a competitive advantage, be it in a betting market or any other context where prediction accuracy plays a key role. The prediction work will remain a hybrid effort by man and machine.

Joni

On complexity of explaining business failure


Introduction

During the research period for my dissertation on startup failures, I realized there are multiple layers of failure factors associated with any given company (or, in reverse, success factors).

These are:

  1. generic business problems (e.g., cash-flow)
  2. individual-level problems (e.g., personal chemistry)
  3. company type problems (e.g., lack of funding for startups)
  4. business model problems (e.g., chicken-and-egg for platforms)

Only by combining these multiple layers – or perspectives – can you understand why one business venture fails and another succeeds. However, it is also a relative and interpretative task – I would argue there can be no objective dominant explanation; failure as an outcome is always a combination of reasons and cannot therefore be reduced to simple explanations.

A part of the reason for the complexity is the existence of parallel root causes.

For example,

  • A company can be said to have failed because it ran out of money.
  • However, why did it run out of money? Because customers would not buy.
  • Why didn’t they buy? Because the product was bad.
  • Why was the product bad? Because the team failed to recognize true need in the market.
  • Why did they fail to recognize it? They lacked such competence.
  • Why did they lack the competence? Because they had not enough funding to acquire it.

Alas! We ended up making a circular argument. That can happen with any failure explanation, as can coming up with a different root cause. In a team of many, while also considering several stakeholders, it is common that people's explanations of cause and effect vary a great deal. It is just a feature of social reality that we have a hard time finding unambiguity.

Conclusion

In general, it is hard to dissect cause and effect. Human beings are inclined to form narratives where they choose a dominant explanation and discard others. By acknowledging a multi-layered view on failure, one can examine a business case by applying different lenses one after another. This includes interviewing different stakeholder groups and understanding multiple perspectives ranging from individual to structural issues.

There are no easy answers as to why a particular company succeeds or fails, even though the human mind and various success stories would lead you to believe so!