Joni

Search engine optimization from the journalist's perspective

in Finnish

Introduction

The media depends on advertising revenue. It is a constant subject of debate how much journalists should write stories that attract clicks and impressions versus stories of high societal importance. The two do not always go hand in hand.

The role of social media and search engines in a journalist's work

In practice, journalists have to consider how appealing their stories are on social media. This matters even if you would rather write only about societally important topics, because getting attention amid competing content is the only way to get your message through. For social media, you therefore need to pay attention to things such as 1) crafting a compelling headline, 2) choosing a compelling preview image, and 3) editing the Open Graph metadata (which determines how the link appears on social media).

In addition to social media, journalists need to take search engine optimization into account, because alongside social media, search engines are typically a major source of traffic. The better the articles are optimized, the more likely they are to rank high in Google's results for the important keywords.

What does a journalist need to know about search engine optimization?

When writing stories, a journalist should consider the following points from a search engine perspective:

  1. Keywords – everything starts with identifying the right keywords that the article should be found with. Keyword research tools, such as Google's Keyword Planner, are useful here.
  2. Headline and subheadings – the chosen keywords should appear in the headline and in the subheadings. Subheadings (h2) are important because they create a structure the search engine can understand and support readers' natural, scanning-based way of reading online.
  3. Links – the story should contain links to other sources with proper anchor texts. Not "click here for more information about supplements", but "for example, Helsingin Sanomat has written several stories about supplements".
  4. Body text – paragraphs should be short, written in clearly readable language, and contain a suitable number of the targeted keywords. "Suitable" means an amount that feels natural – there must not be too much repetition, because Google may interpret it as an attempt at manipulation (see the sketch after this list).
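
As a rough illustration of points 1 and 4, a short script could check a draft against a chosen keyword. This is only a minimal sketch with made-up thresholds and function names; it is not an established SEO tool.

```python
import re

def seo_check(title: str, body: str, keyword: str,
              max_density: float = 0.03) -> dict:
    """Very rough SEO sanity check for a draft article.

    Checks whether the chosen keyword appears in the headline and whether
    its density in the body stays at a natural-feeling level. The 3%
    threshold is an illustrative assumption, not official guidance.
    """
    words = re.findall(r"\w+", body.lower())
    hits = sum(1 for word in words if word == keyword.lower())
    density = hits / len(words) if words else 0.0
    return {
        "keyword_in_title": keyword.lower() in title.lower(),
        "keyword_count": hits,
        "keyword_density": round(density, 3),
        "density_feels_natural": density <= max_density,
    }

print(seo_check("Supplements: what does the research actually say?",
                "Dietary supplements are a big business. Research on "
                "supplements shows mixed results...",
                "supplements"))
```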

Above all, the article should be both pleasant for the reader to read and easy for the search engine to understand. When these two aspects are combined, the basics of search engine optimization are in place.

Joni

Affinity analysis in political social media marketing – the missing link

english

Introduction. Hm… I’ve figured out how to execute a successful political marketing campaign on social media [1], but one link is still missing: applying affinity analysis (cf. market basket analysis).

Discounting conversions. Now, you are supposed to measure “conversions” by some proxy – e.g., time spent on site, number of pages visited, email subscription. Determining which measurable action is the best proxy for the likelihood of voting is a crucial sub-problem, which you can approach with several tactics. For example, you can use the action closest to the final conversion (the vote), i.e. a micro-conversion. This requires that you have an understanding of the sequence of actions leading to the final conversion. You could also use a relative cut-off point, e.g. the nth percentile with the highest degree of engagement is considered converted.
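
As a minimal sketch of the relative cut-off tactic, assuming engagement has already been collapsed into a single score per visitor (the scores and the 90th-percentile threshold are made up):

```python
import numpy as np

# Hypothetical engagement scores per visitor (e.g., a weighted mix of
# time on site, pages visited, and email subscription).
engagement = np.array([3.2, 0.5, 7.8, 1.1, 9.4, 2.0, 6.3, 0.2, 4.9, 8.1])

# Treat the top decile as "converted" for campaign optimization purposes.
threshold = np.percentile(engagement, 90)
converted = engagement >= threshold
print(f"cut-off score: {threshold:.2f}, converted visitors: {converted.sum()}")
```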

Anyhow, this is very important because once you have secured a vote, you don’t want to waste your marketing budget by showing ads to people who already have decided to vote for your candidate. Otherwise, you risk “preaching to the choir”. Instead, you want to convert as many uncertain voters to voters as possible, by using different persuasion tactics.

Affinity analysis. Affinity analysis can be used to accomplish this. In ecommerce, you would use it as the basis for a recommendation engine for cross-selling or up-selling (“customers who bought this item also bought…” à la Amazon). First you determine which sets of products are most popular, and then show those combinations to buyers interested in any item belonging to that set.

In political marketing, affinity analysis means inferring that because a voter is interested in topic A, he is also likely to be interested in topic B. Therefore, we show him information on topic B, given our extant knowledge of his interests, in order to increase the likelihood of conversion. This is a form of associative learning.

Operationalization. But operationalizing this is where I’m still in doubt. One solution could be to build an association matrix based on website behavior and then form corresponding retargeting audiences (e.g., website custom audiences on Facebook). The following picture illustrates the idea.

Figure 1 Example of affinity analysis (1=Visited page, 0=Did not visit page)

For example, we can see that themes C&D and A&F commonly occur together, i.e. people visit those sub-pages in the campaign site. You can validate this by calculating correlations between all pairs. When you set your data in binary format (0/1), you can use Pearson correlation for the calculations.
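
As a minimal sketch of that validation step, using a made-up 0/1 visit matrix (the theme names and values are illustrative). On binary data, Pearson correlation is equivalent to the phi coefficient:

```python
import pandas as pd

# Rows = visitors, columns = theme sub-pages (1 = visited, 0 = did not visit).
visits = pd.DataFrame({
    "Theme_A": [1, 0, 1, 1, 0, 1],
    "Theme_C": [0, 1, 0, 0, 1, 1],
    "Theme_D": [0, 1, 0, 1, 1, 1],
    "Theme_F": [1, 0, 1, 1, 0, 0],
})

# Pairwise Pearson correlations between themes; high positive values
# indicate themes that tend to be visited together.
print(visits.corr().round(2))
```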

Facebook targeting. Knowing this information, we can build target audiences on Facebook, e.g. “Visited /Theme_A; NOT /Theme_F; NOT /confirmation”, where confirmation indicates conversion. Then, we would show ads on Theme F to that particular audience. In practice, we could facilitate the process by first identifying the most popular themes and then finding the associated themes. Once the user has been exposed to a given theme and did not convert, he needs to be exposed to another theme (the one with the highest association score). The process is continued until the themes run out or the user converts, whichever comes first. Applying the earlier logic of determining a proxy for conversion, visiting all theme sub-pages can also be used as a measure of conversion.

Finally, it is possible to use more advanced methods of associative learning. That is, we could determine that {Theme A, Theme F} => {Theme C}, so that themes A and F predict interest in theme C. However, it is more appropriate to predict conversion rather than interest in other themes, because ultimately we’re interested in persuading more voters.

Footnotes

[1] Posts in Finnish:

https://www.facebook.com/joni.salminen.33/posts/10212240031455606

https://www.facebook.com/joni.salminen.33/posts/10212237230465583

Joni

Total remarketing – the concept

english

Here’s a definition:

Total remarketing is remarketing in all possible channels with all possible list combinations.

Channels:

  • Programmatic display networks (e.g., Adroll)
  • Google (GDN, RLSA)
  • Facebook (Website Custom Audience)
  • Facebook (Video viewers / Engaged with ads)
  • etc.

How to apply:

  1. Test 2-3 different value propositions per group
  2. Prefer up-selling and cross-selling over discounts (the goal is to increase AOV, not reduce it; e.g. you can include a $20 gift voucher when basket size exceeds $100)
  3. Configure carefully: exclude those who already bought; use the information you have to sharpen the remarketing focus (e.g. time on site, products or categories visited — the same remarketing for all groups is like the same marketing for all groups; see the sketch below)
  4. Consider automation options (dynamic retargeting; behavior-based campaign suggestions for the target)
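
A hedged sketch of what point 3 could look like as audience definitions. The rule structure and field names below are my own pseudo-format for illustration, not any ad platform's actual API:

```python
# Illustrative remarketing audience rules; the field names are made up,
# not a real ad platform API.
audiences = [
    {"name": "cart_abandoners",
     "include": {"visited": "/cart", "lookback_days": 7},
     "exclude": {"visited": "/order-confirmation"},  # exclude those who bought
     "message": "cross-sell: accessories for the items left in the cart"},
    {"name": "shoe_category_browsers",
     "include": {"visited": "/shoes", "min_time_on_site_s": 120},
     "exclude": {"visited": "/order-confirmation"},
     "message": "up-sell: premium models in the browsed category"},
]

for audience in audiences:
    print(audience["name"], "->", audience["message"])
```
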
Joni

The inertia problem of machine learning

in Finnish

Even a machine can freeze up sometimes.

A machine learns like a human: from empirical observations (i.e., data).

For this reason, just as it is hard for a human to unlearn bad habits and attitudes (prejudices, stereotypes), it is hard for a machine to quickly unlearn an erroneous interpretation.

The issue is not unlearning, which in many cases may be impossible, but learning something new so that the old memory structures (i.e., the model's features) are efficiently replaced with new ones. Efficiently, because the longer the old, invalid models remain in use, the more damage automated decision-making can do. The problem is accentuated in large-scale decision-making systems, where the machine may be responsible for thousands or even millions of decisions within a short period of time.

An example: a machine has learned to diagnose disease X based on symptoms {x}. New research then shows that disease X is associated with symptoms {y}, which are close to {x} but not identical. It takes the machine a long time to learn the new association if it has to discover the links between symptoms and diseases on its own while forgetting the old models.

How can this process be sped up? That is, how do we retain the advantages of machine learning (finding the right features, e.g. symptom combinations, faster than a human could) while still allowing a human to correct the learned model in a supervised way based on better knowledge?

Technically, the problem could be framed through a so-called bandit algorithm: if the algorithm performs both exploration and exploitation, one could try to tackle the problem by restricting the search space. The machine could also be fed enough evidence for it to learn the relationship quickly – that is, if the machine has not found the same result as a particular scientific study, the data from that study could be used to train the machine so heavily (even by overweighting it, if it would otherwise drown in the rest of the data) that the classification model corrects itself.
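
As a loose illustration of the "overweight the new evidence" idea (not of the bandit framing itself), the sketch below retrains a classifier with sample weights that favor the new study's data. The data is synthetic and the weight of 50 is an arbitrary assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Old training data: disease X was associated with symptom feature 0 ({x}).
X_old = rng.normal(size=(1000, 2))
y_old = (X_old[:, 0] > 0).astype(int)

# New study: the disease is better explained by symptom feature 1 ({y}).
X_new = rng.normal(size=(50, 2))
y_new = (X_new[:, 1] > 0).astype(int)

# Overweight the new evidence so it is not drowned out by the old data.
X = np.vstack([X_old, X_new])
y = np.concatenate([y_old, y_new])
weights = np.concatenate([np.ones(len(y_old)), np.full(len(y_new), 50.0)])

model = LogisticRegression().fit(X, y, sample_weight=weights)
print(model.coef_)  # feature 1 should now get the clearly larger coefficient
```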

Joni

In 2016, Facebook bypassed Google in ads. Here’s why.

english

Introduction

The year 2016 was the first year I thought Facebook would end up beating Google in the ad race, despite the fact that Google still dominates in revenue ($67Bn vs. $17Bn in 2015). I’ll explain why.

First, consider that Google’s growth is restricted by three things:

  1. natural demand
  2. keyword volumes, and
  3. the approach toward a perfect market.

More demand than supply

First, at any given time there is a limited number of people interested in a product/service. The interest may reflect purchase intent or just general curiosity, but either way it translates into searches. Each search is an impression that Google can sell to advertisers through its AdWords bidding. The major problem is this: even when I’d like to spend more money on AdWords, I cannot. There is simply not enough search volume to satisfy my budget (in many cases there is, but in highly targeted and profitable campaigns many times there isn’t). So I will spend the excess budget elsewhere, where profitable ad inventory is not limited (that is, on Facebook at the moment).

Limited growth

According to estimates, search volume is growing by 10-15% annually [1]. Yet Google’s revenue is expected to grow by as much as 26% [2]. Over the years, Google’s growth rate in terms of search volume has decreased substantially, although this is seen as a natural phenomenon (after a trillion searches it’s hard to keep growing at double digits). In any case, these dynamics are reflected in the ad auction – when the volumes don’t grow much and new advertisers keep entering, there is more competition over the same searches. In other words, supply stays stable but demand increases, resulting in more intense bid wars.

Approaching perfect market

For a long time now, I’ve budgeted a +15% annual increase for AdWords internally, and last year that was hard to maintain. Google is still a profitable channel, but advertisers’ surplus is decreasing year by year, incentivizing them to look for alternative channels. While Google is restrained by its natural search volumes, Facebook’s ad inventory (=impressions) is practically limitless. The closer AdWords gets to a perfect market (=no economic rents), the less attractive it is for savvy marketers. Facebook is less exploited, and still allows rents.

What will Google do?

Finally, I don’t like the Alphabet arrangement. From the beginning it has signaled to investors that Google is in the “whatever comes to mind” business instead of maintaining a strategic focus on search. Most likely Alphabet ends up draining resources from the mother company, producing losses and diverting human capital away from succeeding in the online ads business (which is where their money comes from). In contrast, Facebook is very focused on social; it buys out competitors and improves fast. That said, I do have to recognize that Google’s advertising system is still much better than Facebook’s, and in fact still the best in the world. But momentum seems to be shifting to Facebook’s side.

Conclusion

Facebook’s maximum number of impressions (=ad inventory) is much higher than Google’s, because Google is limited by natural demand and Facebook is not. In the product marketplace, there is always more supply than demand, which is why advertisers want to spend more than Google’s search volume enables. These factors, combined with Facebook’s continuously increasing ability to match interested people with the right type of ads, make Facebook’s revenue potential much bigger than Google’s.

From the advertiser’s perspective, Facebook and Google both are and are not competitors. They compete for ad revenue, but they are not competitors within the online channel mix. Because Google is for demand capture and Facebook for demand creation, most marketers want to include both in their channel mix. This means Google’s share of online ad revenue might decrease, but a rational online advertiser will not drop it, so it will remain a (less important) channel for the foreseeable future.

References

[1] http://www.internetlivestats.com/google-search-statistics/

[2] http://venturebeat.com/2016/09/27/4-graphs-show-the-state-of-facebook-and-googles-revenue-dominance/

Joni

Buying and selling complement bundles: When individual selling maximizes profit

english

Introduction

When we were young, my brother and I used to buy and sell game consoles on Huuto.net (a local eBay) and on various gamer discussion forums (Konsolifin BBS, for example). We didn’t have much money, so this was a great way to earn some cash — plus it taught us some useful business lessons over the years.

What we would often do was buy a bundle (console+games), break it apart and sell the pieces individually. At that time we didn’t know anything about economics, but intuitively it felt like the right thing to do. Indeed, we would always make money with that strategy, as we knew the market price (or price range) of each individual item.

Looking back, I can now try to explain in economic terms why this was a successful strategy. In other words, why individually selling the items of a complement bundle is a winning strategy.

Why does individual selling provide a better profit than selling a bundle?

Let’s first define the concepts.

  • individual selling = buy complement bundle, break it apart and sell individual pieces
  • a complement bundle = a central unit and its complements (e.g., a game console and games)

Briefly, this is because the tastes of the market are randomly distributed and do not align with the exact contents of the bundle. It follows that the exact set of complements does not maximize any individual buyer’s utility, so they will bid accordingly (e.g., “I like two of the five games, but not the other three, so I don’t put much value on them”), and the market price of the bundle settles below the sum of the full values of its individual parts.

In contrast, by breaking the bundle apart and selling the pieces individually, each complement can be appraised at its full value (“I like that game, so I’ll pay its real value”). In other words, the seller needs to find, for each piece, a buyer who appreciates that piece at its full value (=has a preference for it).

The intuition

Tastes and preferences differ, which is reflected in individuals’ utility functions and therefore in their willingness to pay. Selling a bundle is a compromise from the seller’s perspective – he compromises on the full price, because the buyer is willing to pay only according to his preferences (utility function), which do not completely match the contents of the bundle.

Limitations

There are two exceptions I can think of:

1) Highly valued complements (or homogeneous tastes)

Say all the complements are of high value in the market (e.g., popular hit games). Then a large portion of the market assigns full value to them, and the bundle’s price settles close or equal to the sum of the individual full prices. Similarly, if all buyers value the complements in a similar way, i.e. their tastes are homogeneous, the randomness required for individual selling to outperform does not exist.

2) Information asymmetry

Sometimes you can get a higher price by selling a bundle than by selling the individual pieces. We would use this strategy when the complements had very little value to an “expert”. A less experienced buyer might see a game console + 5 games as an attractive package; the 5 games, however, had very little value in the market, so it made sense to include them in the bundle to attract less-informed buyers. In other words, benefiting from information asymmetries.

Finally, the buyer of a complement bundle needs to be aware of the market price (or its range) of each item. Otherwise, he might end up paying more than the sum of the individual items’ values.

Conclusion

Finding bundles and selling the pieces individually is a great way for young people to practice business. Luckily, there are always sellers in the market who are not looking to optimize their asking price, but appreciate the speed and convenience associated with selling bundles (i.e., dealing with one buyer). Actors with more time and less need for convenience can then take advantage of that condition to make some degree of profit.

EDIT: My friend Zeeshan pointed out that a business may actually prefer bundling even when the price is lower than with individual selling, if it assigns a transaction cost (search, bargaining) to individual selling and the sum of the transaction costs of selling the individual items exceeds the sum of the differences between each complement’s full price and its bundle price. (Sounds complicated, but it means you’d spend too much time selling each item relative to the profit.) For us as kids this didn’t matter since we had plenty of time, but for businesses the cost of selling does matter.

Joni

Polling social media users to predict election outcomes

english

The 45th President of the USA

Introduction

The problem of predicting election outcomes with social media is that the data, such as likes, are aggregate, whereas the election system is not — apart from simple majority voting, in which you only have the classic representativeness problem that Gallup solved in 1936. To solve the aggregation problem, one needs to segment the polling data so that it 1) corresponds to the prevailing election system and 2) accurately reflects the voters according to that system. For example, in the US presidential election each state has a certain number of electoral votes. To win, a candidate needs to reach 270 electoral votes.

Disaggregating the data

One obvious solution would be to track likes back to profiles and determine the state based on information publicly given by the user. This way we could also filter out foreign likers. However, there are some issues with using likes as indicators of votes. Most importantly, “liking” something on social media does not in itself predict an individual’s future behavior to a sufficient degree.

Therefore, I suggest here a simple polling method based on social media advertising (Facebook Ads) and online surveys (Survey Monkey). Polling partly faces the same problem of predicting future behavior as using likes as the overarching indicator, which is why in the latter part of this article I discuss how these approaches could be combined.

At this point, it is important to acknowledge that online polling does have significant advantages relating to 1) anonymity, 2) cost, and 3) speed. That is, people may feel more at ease expressing their true sentiment to a machine than to another human being. Second, the method has the potential to collect a sizeable sample in a more cost-effective fashion than calling. Finally, a major advantage is that, due to the scalable nature of online data collection, the predictions can be updated faster than with call-based polling. This is particularly important because election cycles can involve quick and hectic turns. If the polling delay is from a few days to a week, it is too late to react to final-week events of a campaign, which may still carry great weight in the outcome. In other words: the fresher the data, the better. (An added bonus is that by taking several samples, we could factor momentum, i.e. the growth speed of a candidate’s popularity, into our model – although this can be achieved with traditional polling as well.)

Social media polling (SMP)

The method, social media polling or SMP, is described in the following picture.

Figure 1 Social media polling

The process:

1. Define segmentation criteria

First, we need to understand the election system. For example, in the US system every state has a certain weight expressed by its share of the total electoral votes. There are 50 states, so these become our segmentation criteria. If we deem it appropriate to do further segmentation (e.g., by gender or age), we can do so by creating additional segments, which are reflected in the target groups and surveys. (These sub-segments can also be analyzed in the actual data later on.)

2. Create unique surveys

Then, we create a unique survey for each segment so that the answers are bucketed. The questions of the survey are identical – they are just behind different links to enable easy segmentation. We create a survey rather than use a visible poll (app) or a picture-type poll (“like if you vote Trump, heart if you vote Hillary”) because we want to avoid social desirability bias. A click on Facebook leads the user to the unique survey of their segment, and their answers are not visible to the public.

3. Determine sample size

Calculating sample size is one of those things that will make your head spin, because there is no easy answer as to what a good sample size is. Instead, “it depends.” However, we can use some heuristic rules to come up with decent alternatives in the context of elections. Consider two potential sample sizes.

  • Sample size 500: confidence level 95%, margin of error +/- 4.4%
  • Sample size 1,000: confidence level 95%, margin of error +/- 3%

These are seen as decent options among election pollsters. However, the margin of error is still quite sizeable in both of them. For example, if there are two candidates and their “true” support values are A=49%, B=51%, the large margin of error can easily make us call the result wrong. We could solve this by increasing the sample size, but the problem is that if we wanted to reduce the margin of error from +/- 3% to, say, +/- 1%, the required sample size grows dramatically (more precisely, with 95% confidence and a population size of 1M, it’s about 9,512 – impractically high for a 50-state model). In other words, we have to accept the risk of wrong predictions in this type of situation.
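
For reference, a minimal sketch of the sample size arithmetic behind these figures (the standard formula for a proportion with finite population correction, using p = 0.5 as the conservative case):

```python
import math

def required_sample(margin_of_error: float, population: int,
                    z: float = 1.96, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within +/- margin_of_error."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)

print(required_sample(0.044, 1_000_000))  # ~500 (+/- 4.4%)
print(required_sample(0.031, 1_000_000))  # ~1,000 (+/- ~3%)
print(required_sample(0.01, 1_000_000))   # ~9,500, close to the 9,512 figure above
```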

Nearly all states have over 1,000,000 people, so each of them can be treated as a “large” population (this is a mathematical point – the required sample size stabilizes once the population exceeds a certain size). Although the US is often characterized as one population, in the context of election prediction it is actually several different populations (because the states vote independently). The procedure we apply is stratified random sampling, in which the large general population is split into sub-groups. In practice, each sub-group requires its own sample, and therefore our approach requires a considerably larger total sample than a prediction that would only consider the whole population of the country. But exactly because of this, it should be more accurate.

So, after this lengthy explanation, let us say we satisfice with a sample size of 500 per state. That would be 500×50 = 25,000 respondents. If it cost $0.60 to get a respondent via Facebook ads, the cost of data collection would be $15,000. For repeated polling, there are a few strategies. First, the sample size can be reduced for states that show a large difference between the candidates; in other words, we don’t need to collect a large number of respondents if we “know” the popularity difference between candidates is high. The important thing is that the situation is measured periodically, and sample sizes are flexibly adjusted according to known results. In a similar vein, we can increase the sample size for states where the competition is tight, to reduce the margin of error and therefore increase the accuracy of our prediction. To my understanding, this opportunity for flexible sampling is not used efficiently by all pollsters.

4. Create Facebook campaigns

For each segment, a target group is created in Facebook Ads. The target group is used to advertise to that particular group; for example, the Michigan survey link is only shown to people from Michigan. That way, we minimize the risk of people outside the segment responding (and those who do can be excluded later by IP). At this stage, creating attractive ads helps keep the cost per response low.

5. Run until sample size is reached

The administrator observes the results and stops the data collection once a segment has reached the desired sample size. When all segments are ready, the data collection is stopped.

6. Verify data

Based on IP, we can filter out respondents who do not belong to our particular geographical segment (=state).

Ad clicks can be used to assess sample representativeness by other factors – in other words, we can use Facebook’s campaign reports to segment the respondents by age and gender. If a particular group is under-represented, we can correct for this by shifting the targeting towards them and resuming data collection. However, we can also accept the under-representation if we have no valid reference model for the voting behavior of the sub-segments. For example, millennials might be under-represented in our data, but this might correspond with their general voting behavior as well – if we assume the survey response rate corresponds with the voting rate of the segments, then there is no problem.

7. Analyze results

The analysis process is straightforward:

segment-level results x weights = prediction outcome

For example, in the US presidential election, the segment-level results would be the state-level results (whoever polls highest in a state wins that state), which would be weighted by each state’s share of electoral votes. The candidate who gets at least 270 electoral votes is the predicted winner.
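
A minimal sketch of that aggregation step, with made-up state winners and only a handful of states instead of all 50:

```python
# Hypothetical state-level winners; electoral vote counts are the 2016 values
# for these three states.
electoral_votes = {"Michigan": 16, "Florida": 29, "Ohio": 18}
state_winner = {"Michigan": "Candidate A", "Florida": "Candidate B", "Ohio": "Candidate A"}

totals = {}
for state, winner in state_winner.items():
    totals[winner] = totals.get(winner, 0) + electoral_votes[state]

print(totals)  # in the full 50-state model, the predicted winner reaches 270
```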

Other methods

Now, as for other methods, we can use behavioral data. I have previously argued that behavioral data is a stronger indicator of future actions since it is free from reporting bias. In other words, people say they will do something but don’t end up doing it. This is a very common problem, both in research and in daily life.

To correct for that, we consider two approaches here:

1) The volume-of-likes method, which equates a like with a vote (the more likes a candidate has relative to another candidate, the more likely they are to win)

For this method to work, the “intensity” of a like, i.e. its correlation with behavior, should be determined, as not all likes are indicators of voting behavior. Likes don’t readily translate into votes, and there does not appear to be other information within the like itself that we can use to examine that correlation further (a like is a like). We could, however, add contextual information about the person, or use rules such as “the more likes a person gives to a candidate, the more likely (s)he is to vote for that candidate.”

Or, we could use another solution which I think is better:

2) Text analysis/mining

By analyzing a person’s social media comments, we can better infer the intensity of their attitude towards a given topic (in this case, a candidate). If a person uses strongly positive vocabulary when referring to a candidate, then (s)he is more likely to vote for him/her than if the comments are negative or neutral. Notice that the mere positive-negative range is not enough, because positivity has degrees of intensity we have to consider. It is different to say “he is okay” than “omg he is god emperor”. The more excitement and associated feelings – which need to be carefully mapped and defined in the lexicon – a person exhibits, the more likely voting behavior is.
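
As a rough sketch of scoring intensity beyond a plain positive/negative split, here is a toy lexicon-based scorer. The words and weights are illustrative assumptions, not a validated sentiment lexicon:

```python
# Toy intensity lexicon: weights go beyond plain positive/negative.
INTENSITY = {
    "okay": 1, "fine": 1, "good": 2, "great": 3,
    "love": 4, "amazing": 4, "god": 5, "emperor": 5,
    "meh": -1, "bad": -2, "terrible": -4, "hate": -4,
}

def intensity_score(comment: str) -> float:
    """Average intensity of lexicon words found in the comment."""
    words = comment.lower().split()
    hits = [INTENSITY[w] for w in words if w in INTENSITY]
    return sum(hits) / len(hits) if hits else 0.0

print(intensity_score("he is okay"))             # ~1: mild support
print(intensity_score("omg he is god emperor"))  # ~5: strong support
```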

Limitations

As I mentioned, even this approach risks shortcomings of representativeness. First, the population on Facebook may not correspond with the population at large; the user base may be skewed by age or some other factor. The choice of platform greatly influences the sample; for example, Snapchat users are on average younger than Facebook users, whereas Twitter users are more liberal. It is not clear whether Facebook’s user base represents a skewed sample or not. Second, the people voicing their opinions may be part of a “vocal minority” as opposed to the “silent majority”. In that case, we can apply the logic of the Gaussian distribution and assume that the general population leans more towards the middle ground than the extremes — if, in addition, we assume the central tendency to be symmetrical (meaning people in the middle are equally likely to tip towards either candidate in a two-candidate race), the analysis of the extremes can still yield a valid prediction.

Another limitation may be that advertising targeting is not equivalent to random sampling, but has some kind of bias. That bias could emerge e.g. from 1) the ad algorithm favoring a particular sub-set of the target group, i.e. showing more ads to them, whereas we would like to reach all types of respondents; or 2) self-selection, in which the respondents are of a similar kind and again not representative of the population. Off the top of my head, I’d say number two is less of a problem, because the people who show enough interest are also the ones who vote – remember, essentially we don’t need to care about the opinions of people who don’t vote (that’s how elections work!). But number one could be a serious issue, because the ad algorithm directs impressions based on responses and might pick up some hidden pattern we have no control over. Basically, the only thing we can do is examine the superficial segment information in the ad reports and evaluate whether the ad rotation was sufficient.

Combining different approaches

As both approaches – traditional polling and social media analysis – have their shortcomings and advantages, it might be feasible to combine the data in a mixed model that would factor in 1) the count of likes, 2) the count of comments with high affinity (=positive sentiment), and 3) polled preference data. A deduplication process would be needed so that those who both liked and commented are not counted twice – this requires associating likes and comments with individuals. Note that the hybrid approach requires geographic information as well, because otherwise the segmentation is diluted. Anyhow, taking the user as the central entity could be a step towards determining voting propensity:

user (location, count of likes, count of comments, comment sentiment) –> voting propensity

Another way to see this is that enriching likes with information that is relevant with regard to the election system can help model social media data in a more granular and meaningful way.
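
One hedged way to sketch that user-level mapping is a weighted score; the weights, saturation points, and the -5…+5 sentiment scale below are placeholders that would have to be estimated from data (location is used for segmentation rather than scoring, so it is omitted here):

```python
def voting_propensity(likes: int, comments: int, avg_sentiment: float,
                      w_likes: float = 0.2, w_comments: float = 0.3,
                      w_sentiment: float = 0.5) -> float:
    """Toy propensity score in [0, 1]; the weights are placeholders, not estimates."""
    like_part = min(likes / 10, 1.0)            # saturate at 10 likes
    comment_part = min(comments / 5, 1.0)       # saturate at 5 comments
    sentiment_part = (avg_sentiment + 5) / 10   # map a -5..+5 scale to 0..1
    return (w_likes * like_part + w_comments * comment_part
            + w_sentiment * sentiment_part)

print(voting_propensity(likes=4, comments=2, avg_sentiment=3.5))  # ~0.63
```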

Joni

Analyzing sentiment of topical dimensions in social media

english

Introduction

Had an interesting chat with Sami Kuusela from Underhood.co. Based on that, I got some inspiration for an analysis framework, which I’ll briefly describe here.

The model

Figure 1 Identifying and analyzing topical text material

The description

  1. User is interested in a given topic (e.g., Saara Aalto, or #saaraaalto). He enters the relevant keywords.
  2. The system runs a search and retrieves text data based on that (e.g., tweets).
  3. A cluster analysis (e.g., unsupervised topic model) identifies central themes from the data.
  4. Vectorization of the representative keywords from the cluster analysis (e.g., the 10 most popular per cluster) is run to extract similar-meaning words from a reference lexicon. This increases the generality of each topic cluster by associating it with other words that are close in the vector space.
  5. Text mining is run to refine the themes, i.e. placing the right text pieces under the correct themes. These are now called “dimensions”, since they describe the key dimensions of the text corpus (e.g., Saara’s voice, performance, song choices…).
  6. Sentiment analysis can be run to score the general (pos/neg/neu) or specific (e.g., emotions: joy, excitement, anger, disappointment, etc.) sentiment of each dimension. This could be done by using a machine-learning model with annotated training data (if the data-set is vast), or some sentiment lexicon (if the data-set is small).

I’m not sure whether steps 4 and 5 would improve the system’s ability to identify topics. It might be that a more general model is not required because the system already can detect the key themes. Would be interesting to test this with a developer.
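
To make steps 2–3 and 6 more concrete, here is a minimal sketch using an unsupervised topic model and a naive lexicon-based sentiment score. The example texts and lexicon are made up, the optional steps 4–5 are omitted, and a real implementation would need far more data and tuning:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 2: retrieved text data (toy "tweets" about a contestant).
docs = [
    "her voice was amazing tonight",
    "amazing voice, best performance so far",
    "the song choice was terrible",
    "weird song choice but great performance",
]

# Step 3: unsupervised topic model to identify central themes.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for i, component in enumerate(lda.components_):
    top_terms = [terms[j] for j in component.argsort()[-3:]]
    print(f"topic {i}: {top_terms}")

# Step 6: naive lexicon-based sentiment score per document.
LEXICON = {"amazing": 2, "best": 2, "great": 1, "terrible": -2, "weird": -1}
for doc in docs:
    score = sum(LEXICON.get(word, 0) for word in doc.split())
    print(doc, "->", score)
```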

Anyway, what’s the whole point?

The whole point is to acknowledge that each large topic naturally divides into small sub-topics, which are dimensions that people perceive relevant for that particular topic. For example, in politics it could be things like “economy”, “domestic policy”, “immigration”, “foreign policy”, etc. While the dimensions can have some consistency based on the field, e.g. all political candidates share some dimensions, the exact mix is likely to be unique, e.g. dimensions of social media texts relating to Trump are likely to be considerably different from those of Clinton. That’s why the analysis ultimately needs to be done case-by-case.

In any case, it is important to note that instead of giving a general sentiment or engagement score of, say a political candidate, we can use an approach like this to give a more in-depth or segmented view of them. This leads to better understanding of “what works or not”, which is information that can be used in strategic decision-making. In addition, the topic-segmented sentiment data could be associated with predictors in a predictive model, e.g. by multiplying each topic sentiment with the weight of the respective topic (assuming the topic corresponds with the predictor).

Limitations

This is just a conceptual model. As said, it would be interesting to test it. There are many potential issues, such as handling cluster overlap (some text pieces naturally belong to several clusters, which can cause classification problems) and hierarchical issues (e.g., “employment” falls under “economy” and should hence influence the latter’s sentiment score).

Joni

CLV-based marketing budget allocation across platforms

in Finnish

I keep waiting for an easy way to do sensible marketing to appear. The current situation is that marketing budgets are allocated either by gut feeling (print, TV, radio) or by conversion cost (digital). In other words, in the latter case we calculate CPA when we should be calculating CLV.

Why is CLV not calculated?

CLV is not calculated because it is so difficult. You would have to separate the purchase history and link every conversion to a customer (classified as new/existing) in order to get the lifetime value. The information is hidden in internal systems (CRM/SQL), and connecting it to web analytics requires custom work. The cost of a one-off conversion, typically a sale, is easy to track with a single script snippet, which is why it is used as the basis for budget allocation. We settle for the data that is available because it is the easiest.

Why should CLV be calculated?

CLV, the customer lifetime value, determines many things. Customers can be divided by profitability into segments that are offered different service levels, recommended different products, or given extra benefits as a reward for loyalty. Lifetime value can vary by channel, age group, location, and so on. It is important to identify the differentiating factors, so that the traits shared by the most profitable customers guide the targeting of marketing.

Putting CLV calculations into practice

At best, the targeting can be automated. At least three approaches come to mind:

1. Budget/bid adjustment based on CLV estimates: more budget is allocated to the most profitable target groups, or a certain premium multiplier is bid for their attention in the ad auction, on the grounds that they are more profitable. This can be done either for the current customer base only, or for a wider audience that shares the traits of the most profitable customer segment (cf. “lookalike” logic). Various data solutions (DMPs) in theory make it possible to reach a given audience in a media-agnostic way, although in practice each platform has to be configured separately. If a middleware layer is built (à la Smartly), the budgeting/bid decisions can be made flexibly so that the software collects data from the platforms, combines it with the company’s own data, updates the CLV calculation, and makes the above decisions accordingly. (A rough sketch of this approach follows after this list.)

2. A separate service experience: the most profitable customers are offered different product recommendations or additional services, online or offline. That is, typical dynamic personalization of a website, but in this case based on the CLV value.

3. Separate campaigns/offers: email and other direct marketing tailored to the most profitable customer segments. That is, dynamic lists that update based on the CLV calculation, with content/journeys tailored for each list.
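
As a rough sketch of approach 1, assuming CLV per segment has already been estimated from CRM data (the segment figures and the simple AOV × frequency × lifetime × margin formula are only illustrative):

```python
# Illustrative CLV estimates per segment and a derived bid multiplier.
segments = {
    # segment: (avg order value EUR, purchases per year, expected years, margin)
    "loyal_urban": (80.0, 6, 4, 0.30),
    "occasional":  (45.0, 2, 2, 0.25),
    "one_timers":  (30.0, 1, 1, 0.20),
}

def clv(aov: float, freq: float, years: float, margin: float) -> float:
    """Simple undiscounted CLV: margin on purchases over the expected lifetime."""
    return aov * freq * years * margin

clvs = {name: clv(*values) for name, values in segments.items()}
baseline = sum(clvs.values()) / len(clvs)

for name, value in clvs.items():
    multiplier = value / baseline  # bid more aggressively for high-CLV segments
    print(f"{name}: CLV {value:.0f} EUR, bid multiplier {multiplier:.2f}")
```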

The philosophical background is to reduce waste by reaching only engaged customers. The time saved can be used to think about how to serve them even better, instead of constantly chasing new customers. The result is an increase in the return on marketing investment (ROI). The risk is becoming a nuisance – the communication has to feel well timed and relevant in content. Too often, however, the executions are dull and do not dig deep enough into the target group's real motives. That is why the response to campaigns, and especially the negative signals, should be monitored actively.

Conclusion

The allocation of marketing budgets still rests largely on gut feeling. The industry does not offer easy solutions to the problem. Hopefully, the technology that enables CLV calculations will become more common in the future and available to everyone at a reasonable price. There is still room in the field for several startups, although in the end players like Facebook and Google will make CLV calculations available to all advertisers.

Joni

Agile methods for predicting contest outcomes by social media analysis

english

People think, or seem to assume, that there is some magical machine that spits out accurate predictions of future events from social media data. There is not, and that’s why every credible analysis takes human time and effort. But therein also lies the challenge: when fast decisions are needed, slow analyses reduce agility. Real-time events would require real-time analysis, whereas data analysis is often a cumbersome and time-consuming effort, including data collection, cleaning, machine training, etc.

It’s a project of weeks or days, not hours. All the practical issues of the analysis workflow make it difficult to provide accurate predictions at a fast pace (although there are other challenges as well).

An example is Underhood.co – they predicted Saara Aalto would win X-Factor UK based on social media sentiment, but ended up being wrong. While there are many potential reasons for this, my conclusion is that their indicators lacked sufficient predictive power. They were too reliant on aggregates (in this case country-level data) and had a problematic approach to begin with – just as with any prediction, the odds change on the go as new information becomes available, so you should never call the winner weeks ahead. Of course, theirs was just a publicity stunt where they hoped that being right would prove the value of their service. Another example is the US election, where prediction markets were completely wrong about the outcome. That was, according to my theory, because of wrong predictors – polls ask what your preference is or what you would do, whereas social media engagement shows what people actually do (in social media), and as such is closer to real behavior and hence a better predictor.

Even if I do think human analysts are still needed in the near future, more solutions for quick collection and analysis of social media data are needed, especially to combine the human and machine work in the best possible way. Some of these approaches can be based on automation, but others can be methodological, such as quick definition of relevant social media outlets for sampling.

Here are some ideas I have been thinking of:

I. Data collection

  1. Quick definition of choice space (e.g., candidates in a political election, X-Factor contestants)
  2. Identification of related social media outlets (i.e., communities, topic hashtags)
  3. Collecting sample (API, scraping, or copy-paste (crowdsourcing))

Each part is case-dependent and idiosyncratic – for whatever event (I’m thinking of competitions here), you have to do this work from scratch. Ultimately, you cannot get the whole Internet as your data, but you want the sample to be as representative as possible. For example, it was obvious that Twitter users showed much more negative sentiment towards Trump than Facebook users did, and on both platforms there were supporter groups/topic concentrations that should be identified before any data collection. Then, the actual data collection is tricky. People again seem to assume all data is easily accessible. It’s not – not every platform offers data as readily through an API as Twitter and Facebook do; YouTube and Reddit comments, for example, can be harder to collect at scale. This means the comments that you use for predicting the outcome (by analyzing their relative share of the total as well as the strength of the sentiment beyond pos/neg) may need to be fetched either by web scraping or by manually copying them to a spreadsheet. Due to the large volume of data, crowdsourcing could be useful — e.g., setting up a Google Sheet where crowdworkers paste the text material in a clean format. The raw text content, e.g. tweets, Facebook comments, Reddit comments, is put into separate sheets for each candidate.

II. Data analysis

  1. Cluster visualization (defining clusters, visualizing their respective sizes (plot # of voters), breakdown by source platform and potential other factors)
  2. Manual training (classifying the sentiment, or “likelihood to vote”)
  3. Machine classification (calculating the number of likely voters)

In every statistical analysis, the starting point should be visualizing the data. This provides an aggregate “helicopter view” of the situation. Such a snapshot is also useful for demonstrating the results to the end user, letting the data speak for itself. Candidates are bubbles in the chart, their sizes proportional to the number of calculated likely voters. The data could be broken down by source platform, or by other factors, using the candidate as the center of gravity of the cluster.

Likelihood to vote could be classified on a scale rather than as a binary. That is, instead of saying “sentiment is positive: YES/NO”, we could ask “How likely is the person to vote?”, which is the same as asking how enthusiastic or engaged he or she is. Therefore, a scale is better, e.g. ranging from -5 (definitely not voting for this candidate) to +5 (definitely voting for this candidate). The manual training, which could also be done with the help of the crowd, helps the machine classifier improve its accuracy on the go. Based on the training data, it would generalize the classification to all the material. The material is bucketed so that each candidate is evaluated separately and the number of likely voters can be calculated. It is possible that the machine classifier could benefit from training input from both candidates, inasmuch as the language showing positive and negative engagement is not significantly different.
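
A minimal sketch of the manual-training-to-machine-classification step, treating the -5…+5 scale as a regression target. The labeled comments, the choice of Ridge regression on TF-IDF features, and the +2 cut-off for counting a “likely voter” are all assumptions made for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

# Manually labeled comments: -5 = definitely not voting, +5 = definitely voting.
train_texts = [
    "I will absolutely vote for her, amazing candidate",
    "probably voting for her, seems decent",
    "not sure yet, need to hear more",
    "no way I am voting for this person",
]
train_scores = [5, 3, 0, -5]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)
model = Ridge().fit(X_train, train_scores)

# Generalize to unlabeled comments and count likely voters above a cut-off.
new_texts = ["amazing candidate, got my vote", "need to hear more before deciding"]
predictions = model.predict(vec.transform(new_texts))
likely_voters = sum(score >= 2 for score in predictions)
print(list(zip(new_texts, predictions.round(1))), likely_voters)
```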

It is important to note that negative sentiment does not really matter. What we are interested in is the number of likely voters. This is because of the election dynamics – it does not matter how poor a candidate’s aggregate sentiment is, i.e. the ratio between haters and sympathizers, as long as his or her number of likely voters is higher than that of the competition. This effect was evident in the recent US presidential election.

The crucial thing is to keep the process alive during the whole election/competition period. There is no point at which it becomes certain that one candidate will lose and the other will win, although the divide can become substantial and therefore increase the accuracy of the prediction.

III. Presentation of results

  • constantly updating feed (à la Facebook video stream)
  • cluster visualization
  • search trend widget (source: Google Trends)
  • live updating predictions (manual training –> machine model)

The results could be shown to the end user in the form of a dashboard. A search trend graph and the above-mentioned cluster visualization could be viable components. In addition, it would be interesting to see the count of likely voters evolving over time – in such a way that it, along with the visualization, could be “played back” to examine the development over time. In other words, an interactive visualization. As noted, the prediction, i.e. the count of likely votes, should update in real time as a result of combined human-machine work.

Conclusion and discussion

The idea behind developing more agile methods for using social media data to predict contest outcomes is that the accuracy of the prediction rests on the choice of indicators rather than the finesse of the method. For example, complex Bayesian models falsely predicted that Hillary Clinton would win the election. It’s not that the models were poorly built; they just used the wrong indicators, namely polling data. This is the usual case of ‘garbage in, garbage out’, and it shows that the choice of indicators is more important than the technical features of the predictive model.

The choice of indicators should be based on their predictive power, and although I don’t have strict evidence for it, it intuitively makes sense that social media engagement is in many instances a stronger indicator than survey data, because it is based on actual preferences instead of stated preferences. Social scientists know from the long tradition of survey research that there are myriad social effects reducing the reliability of the data (e.g., social desirability bias). Those, I would argue, are a much smaller issue in social media engagement data.

However, to be fair, there can be issues of bias in social media engagement data. The major concern is the low participation rate: a common heuristic is that 1/10 of participants actually contribute in writing, while the other 9/10 are readers whose real thoughts remain unknown. It is then a question of how well the vocal minority reflects the opinion of the silent majority. In some cases this is irrelevant for competitions, namely when the overall voting turnout remains low. For example, if turnout is 60%, it is relatively much more important to mobilize the active base than if turnout were close to 100%, where one would need near-universal acceptance.

Another issue is non-representative sampling. This is a concern when the voting takes place offline and the online data does not accurately reflect the votes of those who do not express themselves online. However, as social media participation is constantly increasing, this is becoming less of a problem. In addition, compared to other methods of data collection – apart from stratified polling, perhaps – social media is likely to give good results for competitive predictions because of their political nature: people who strongly support a candidate are more likely to be vocal about it, and the channel for voicing that opinion is social media.

It is evident that the value of social media engagement as a predictor is currently underestimated, as shown by the large emphasis put on political polls and the virtually non-existent discussion of social media data. As a direct consequence, those who are able to leverage social media data in the proper way will gain a competitive advantage, be it in the betting market or any other context where prediction accuracy plays a key role. The prediction work will remain a hybrid effort of man and machine.