
Month: March 2017

User feedback: A startup perspective

Introduction – the first-order problem

The first-order problem for startups is often that they are not making something people want enough to pay for. As the CB Insights data shows, founders identify this as the most common reason for failure.

Figure 1 Reasons for startup failure

Notice the connection between reasons 1 and 2: we can paraphrase them as "there was no market need, therefore the startup ran out of cash." Investor hype aside (think of Twitter), most startups don't have the luxury of living for years with negative profitability. This is why I emphasise the part 'enough to pay for' – contrary to Andrew Chen and others who advise to first get users and then figure out how to make money [1], I'm of the small but growing 'direct monetization' school of thought [2].

The solution for this problem is evident: find out from users what they want or need, and then build it.

Second-order problem

But this is followed by a second-order problem: how should you learn from the users? It's not evident at all; let me elaborate.

  • First, if you ask users what they want, you get inaccurate feedback because the users don’t know all the possibilities. In other words, people don’t know what they want (feel free to insert a Henry Ford quote here).
  • Second, if you show them a demo, you get inaccurate feedback because your product is not ready, and the users cannot magically "imagine" how it would solve their problems if it were ready.

Lean to the rescue?

Eric Ries (video), and a large number of his followers (video), advocate the 'Minimum Viable Product' (MVP) as the solution. The theory goes that it's enough for your MVP to demonstrate the solution: potential customers are shown how the product essentially solves their problem, and they fill in the gaps for the rest. The key difference from a demo is the tight connection to the problem – we only need to show the logical connection between problem and solution (this is referred to as problem-solution fit [2]), and for that we don't even necessarily need a laptop.

However, the MVP approach has two major problems. First, for many problems you cannot create an effective MVP. Consider the Apple Pencil, or many other products of that company. "See, here's a pen – would you use it?" It's not very effective: you miss all the subtleties that the final product has and that people pay for. Oftentimes they pay for the fine details, not for a crude core solution. For this reason, the final product often ends up being very different from an MVP, which is closer to a prototype. Second, there are complex problems which have, say, one main problem and two sub-problems: for example, to speed up the set-up of a manufacturing plant, you need to solve logistical bottlenecks. But how do you capture that complexity in your MVP? For these kinds of problems, it's all or nothing: a partial solution won't do. Moreover, they require a deep understanding of the customer's circumstances, which is not usually part of the MVP gospel, centered as it is on simple consumer software products as opposed to, say, B2B industry solutions.

I grant that the MVP approach has advantages: technically, you could solve a complex problem on a flowchart, or communicate your solution as a video (as Dropbox did). I'm just highlighting that it has shortcomings, too. Most importantly, the final product that people end up buying is often very different from the MVP. So perhaps the MVP can serve as a starting point, but not as the end solution.

How, then?

The best solution, as far as I can see, is this:

Learn as much as possible about the nature of the problems, and then bridge that knowledge with the technical possibilities.

As you can see, this approach closely follows Steve Blank’s customer development.

The main difference is that while Blank argues strongly that customer development is "not a focus group" (read: market research), in my opinion it's exactly that. In fact, you can apply both traditional and novel methods of market research to get to the bottom of users' needs and wants. These include ethnography, surveys, qualitative interviews, etc. I wrote a separate blog post about market research for startups.

At the core of Blank's idea is the notion that the founders test their hypotheses through customer development. However, those hypotheses originate from innate assumptions about the customer's reality, and are likely to be biased and flawed. Challenging the hypotheses is therefore a must, and not a bad solution at all. However, we can also start by learning about the problem, rather than from hypothesis formulation. Ultimately, I believe you can reach the same outcome either by starting from the founders' hypotheses or by inducing them from market research [3]. Which is faster and more efficient probably depends on the details of execution. Given equal execution, the accuracy of the original hypotheses is the determining factor: if they are far off, more adjustment has to be done. In comparison, inductive market research, in theory, arrives straight at the core of the user problems.


In the proposed approach, we take any means necessary to find out what is needed or wanted, and then combine that with the knowledge of what is possible. If you look closely enough, this is what marketing is all about – matching supply and demand. Consequently, the role, or competence, of a market researcher is crucial for a startup organization. Someone is needed to bridge the technical knowledge in the developers' heads and the customer knowledge in the customers' heads. Often these two groups don't speak the same language, so the mediating individual acts as a kind of interpreter. (S)he has to be able to understand both languages – that of technology, and that of ordinary people.


[1] It's the well-known Y Combinator motto: "make something people want". This can be interpreted as making getting users the priority, which is why I like to re-phrase it as "make something people want to pay for."

[2] The major exception to direct monetization is subsidization: e.g., for platforms it seems to be a de facto necessity even to enter the market, while for all startups it may make sense while users are being recruited in order to learn about them. From an economic point of view, this equals subsidizing one group of users (early adopters) to improve access to another group (the main market).

[2] Problem-solution fit precedes product-market fit, which essentially means having a product with a lucrative market.

[3] The same separation exists in academia: there are hypothetico-deductive studies and inductive studies.

My other writings on startup problems:

Startups! Are you using a ‘mean’ or an ‘outlier’ as a reference point?


This post is about startup thinking. In my dissertation on startup dilemmas [1], I argued that startups can exhibit what I call 'reference point bias'. My evidence emerged from the failure narratives of startup founders, who reported having experienced this condition.

The reference point bias is a false analogy where the founder compares their startup with a great success case (e.g., Facebook, Google, Groupon).

According to this false analogy: "If Facebook did X and succeeded, we will do X and succeed, too."

A_x | B_x → S

i.e., A doing 'x', given that B did 'x', results in success (S).

According to the contrary logic, they ought to consider the "mean" (failures) rather than the "outlier" (success), because that enables better preparation for the thousand-and-one problems they will face. (This is equivalent to thinking P(s) = 1 − P(f), i.e., that by eliminating failure points (f) one can achieve success (s); this was a major underlying motivation for my dissertation.)
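The parenthetical logic – eliminate failure points to raise P(s) – can be illustrated with a toy calculation. A minimal sketch; the failure probabilities below are loosely inspired by the CB Insights figures but should be read as illustrative only:

```python
# Sketch: success probability as survival of independent failure points.
# The probabilities are illustrative, not a rigorous model.
failure_points = {
    "no market need": 0.42,
    "ran out of cash": 0.29,
    "not the right team": 0.23,
}

def p_success(failure_probs):
    """P(s) = product of (1 - p_f) over independent failure points."""
    p = 1.0
    for p_f in failure_probs.values():
        p *= (1.0 - p_f)
    return p

baseline = p_success(failure_points)

# Eliminating one failure point (p_f -> 0) raises the overall P(s):
mitigated = dict(failure_points, **{"not the right team": 0.0})
improved = p_success(mitigated)
assert improved > baseline
```

The point of the sketch is the direction of the effect: each failure point you remove multiplies your survival odds, which is why studying the "mean" (common failure modes) pays off.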

Why is this a problem?

First, because when making decisions under the reference point bias, you are likely to miss all the hardship left out of the best practices outlined by your outlier examples. In other words, your reference point suffers from survivorship bias and post-hoc rationalization.

But a bigger and, in my opinion, more substantial problem is the fundamental discrepancy between the conditions of the referenced success case and the startup at hand.

Let me elaborate. Consider

A{a} ≠ B{b},

where the conditions (a) of your startup's (A) market differ from the conditions (b) of your reference point (B). As a corollary, the closer the market conditions of A come to those of B, the better suited those reference points (and their stories and best practices) become to your particular scenario. But startups rarely perform a systematic analysis to discover how closely the conditions under which certain advice or best practices were conceived match those at hand.

As a result, discrepancies originating from local differences (e.g., culture, competition) emerge. Some of these dimensions can be modeled or captured with the Business Model Canvas (BMC) framework. For example, customer segments, distribution channels, and value propositions can all differ from one geographical location or point in time to another, and can be systematically analyzed with the BMC.
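The kind of systematic condition match discussed above could be approximated, for instance, as set overlap across BMC dimensions. This is my own toy formalization, not an established method, and all the condition sets are invented:

```python
# Sketch: how close are the reference case's market conditions to your own?
# Jaccard similarity over hypothetical sets of conditions, keyed by
# Business Model Canvas (BMC) dimensions.

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of conditions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def condition_match(your_case, reference_case):
    """Average similarity across the BMC dimensions both cases describe."""
    dims = your_case.keys() & reference_case.keys()
    return sum(jaccard(your_case[d], reference_case[d]) for d in dims) / len(dims)

your_case = {
    "customer segments": {"students", "urban"},
    "channels": {"app store", "word of mouth"},
}
reference_case = {  # e.g., the outlier you are tempted to imitate
    "customer segments": {"students", "urban", "global"},
    "channels": {"app store", "paid ads"},
}

score = condition_match(your_case, reference_case)  # closer to 1.0 = conditions match better
```

A low score would suggest the outlier's "best practices" were conceived under conditions quite different from yours.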

In addition to the BMC, it is important to note the impact of competitive conditions (a major deficit of the BMC framework), and especially that of indirect competition [2]. At a higher level of abstraction, we can define discrepancies originating from spatial, temporal, or cultural distance. Time is an important aspect since, in business, tactics expire (e.g., in advertising we speak of fatigue or burn, indicating loss of effectiveness), and there are generally "windows of opportunity", which makes choosing the correct time-to-market important (you can easily be too early or too late).

So, overall, reference point bias is dangerous because you end up taking best practices from Twitter literally, and never end up making actual money. Platform and freemium businesses are particularly tricky, and in my experience something like 90% of the reference-point outliers can be traced to those fields. Keep in mind that platforms naturally suffer from high mortality due to winner-take-all dynamics [3].

In fact, one of the managerial implications of my dissertation was that the platform business may not be a recommendable business model at all; at the very least, it is an order of magnitude harder than your conventional product business. The same goes for freemium: giving something away for free in the hope of charging for it at some point turns out, more often than not, to be wishful thinking. Yet startups are time after time drawn to these challenging business models instead of more linear ones.

That is why the general rule "This is not Google, and you're not Sergey Brin" is a great leveler for founders overlooking cruel business realities.

But when is an outlier a good thing?

All that being said, I have later realized there is another logic behind using reference points. It is simply the classic: "Aim for the stars, land on the moon."

Namely, having these idols, even flawed ones, encourages thousands and thousands of young minds to enter the startup scene. And that's a good thing, resulting in a net positive effect. Sometimes it's better not to know how hard a problem is, because if you knew, you would never take on the challenge.


In conclusion, my advice to founders would be two-fold:

1) Use reference points as a source of inspiration, i.e., something you strive to become (it's okay to want to be as successful as Facebook)

2) But, don’t apply their strategies and tactics literally in your context.

Each context is unique, and the exact same business model rarely applies in a different market, separated by spatial, temporal, and cultural distance. So the next time you hear a big shot from Google or Facebook telling how they made it, listen carefully, but with a critical mind. Try to systematically analyze the conditions under which their tactics took place, not only "why" they worked.

End notes

[1] Salminen, J. (2014, November 7). Startup dilemmas – Strategic problems of early-stage platforms on the internet. Turku School of Economics, Turku.

[2] That is, local people do things differently: a good example is WhatsApp, which was not popular in the US because operators gave out free SMS; the rest of the world was, and is, very different.

[3] Katz, M. L., & Shapiro, C. (1985). Network Externalities, Competition, and Compatibility. The American Economic Review, 75(3), 424–440.

How could startups really solve problems? The significance of the invisible underclass


I read an interesting article:

Its thesis is that startups focus on the "wrong" problems from society's point of view. They concentrate either on the problems of the elite (highly educated cosmopolitans) or on exotic third-world problems, for which they often create sham solutions instead of sustainable ones. Meanwhile, the problems of the lower middle class are ignored: e.g., unemployment, retraining, war veterans (in the US). This target group is described as an invisible "underclass", because for startups they do not exist.

Why is this so?

I wrote about this phenomenon in a conference paper [1] a couple of years ago. The reasons are clear: First, problem identification starts from the solver's own field of experience. Since most founders are highly educated cosmopolitans, they solve the problems of their own kind. (This shows clearly in students' startup ideas: the same bar-app ideas, year after year.)

The figure illustrates this phenomenon.

Figure 1. The limited-experience effect [2]

Second, social problems typically cannot be solved with technology alone; they require either institutional solutions or at least hybrid solutions that combine societal change with technology.


Problem 1: Skills and competences do not meet in the labor market => unemployment

Solution: Better retraining (= fast, accessible, "easy") that teaches the skills needed in the labor market.

Problem 2: Only an official degree counts in the labor market [3], i.e., the startup's application does not fit the institutional framework. The institutional framework changes more slowly than the social world, which is the root cause of the problem.

It is typical for startup problems to chain together like this [4]. Often the n-level problems come from the institutional level, which is why accelerating institutional renewal is a central part of the business solution.


The path to effectively solving social problems is twofold:

1) For startups to truly solve social problems, they need to expand their field of experience beyond the cosmopolitan worldview.

2) For startups to truly solve social problems, a hybrid solution is needed, meaning that the working of the market mechanism is not blocked at the institutional level.

As things stand, startups can be seen as part of the "elite" that does not care about the problems of the lower middle class, and this setup contributes to the emergence of societal problems and to discontent discharging itself, for example, through the election of radical leaders. As history shows, electing radical leaders is not a favorable development for society. Startups should strive to be part of the solution and open-mindedly map out new markets and new target groups (the unemployed, single mothers, the elderly, etc.), expanding their field of experience beyond the cosmopolitan worldview. This would advance societal development and would also be good business, because "where there is a problem, there is a market."


[1] Conference paper:

[2] Presentation at Åbo Akademi:

[3] There are of course examples to the contrary, but this is mostly how it is.

[4] Dissertation:

Experimenting with IBM Watson Personality Insights: How accurate is it?


I ran an analysis with IBM Watson Personality Insights. It retrieved my tweets and analyzed their text content to describe me as a person.

Doing so is easy – try it here:

I’ll briefly discuss the accuracy of the findings in this post.

TL;DR: The accuracy of IBM Watson is a split decision – some classifications seem accurate, while others are not. The inaccuracies are probably due to a lack of source material exposing a person's full range of preferences.


The tool analyzed 25,082 words and labelled the result a "Very Strong Analysis". In the following, I will use introspection to comment on the accuracy of the findings.

“You are a bit critical, excitable and expressive.”

Introspection: TRUE

“You are philosophical: you are open to and intrigued by new ideas and love to explore them. You are proud: you hold yourself in high regard, satisfied with who you are. And you are authority-challenging: you prefer to challenge authority and traditional values to help bring about positive changes.”

Introspection: TRUE

“Your choices are driven by a desire for efficiency.”

Introspection: TRUE

“You are relatively unconcerned with both tradition and taking pleasure in life. You care more about making your own path than following what others have done. And you prefer activities with a purpose greater than just personal enjoyment.”

Introspection: TRUE

At this point, I was very impressed with the tool. So far, I would completely agree with its assessment of my personality, although it's only using my tweets, which are short and mostly shared links.

While the description given by Watson Personality Insights was spot on (introspection agreement: 100%), I found the categorical evaluation to be lacking. In particular, “You are likely to______”

“be concerned about the environment”

Introspection: FALSE (I am not particularly concerned about the environment, as in nature, although I am worried about societal issues like influence of automation on jobs, for example)

“read often”

Introspection: TRUE

“be sensitive to ownership cost when buying automobiles”

Introspection: TRUE

Actually, the latter one is quite amazing because it describes my consumption patterns really well. I’m a very frugal consumer, always taking into consideration the lifetime cost of an acquisition (e.g., of a car).

In addition, the tool also states that "You are unlikely to______"

“volunteer to learn about social causes”

Introspection: TRUE

“prefer safety when buying automobiles”

Introspection: FALSE (In fact, I'm thinking of buying a car soon, and safety is a major criterion since the city I live in has rough traffic.)

“like romance movies”

Introspection: FALSE (I do like them! I actually just had this discussion with a friend offline, which is another funny coincidence.)

So, the overall accuracy rate here is only 3/6 = 50%.

I did not read the specification in more detail, but I suspect the system chooses the evaluated categories based on the available amount of data; i.e., it simply leaves out topics with inadequate data. Since there is a very broad range of potential topics (from 'things' to 'behaviors'), the probability of accumulating enough data points on some topics increases as the amount of text increases. In other words, you are more likely to hit some categories and, after accumulating enough data on them, can present quite many descriptors of the person (while simply leaving out those you don't have enough information on).
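The selection logic I'm hypothesizing here – report only topics with enough supporting data points – could look like the following sketch. The threshold and counts are invented for illustration, not Watson's actual internals:

```python
# Sketch of the hypothesized selection logic: report only topics for which
# the text yields enough data points. Counts and threshold are invented.
from collections import Counter

MIN_DATAPOINTS = 5  # hypothetical evidence threshold

topic_mentions = Counter({
    "reading": 12,
    "automobiles": 7,
    "environment": 6,
    "romance movies": 1,  # sparse signal -> unreliable inference
})

def reportable_topics(mentions, threshold=MIN_DATAPOINTS):
    """Keep only topics with at least `threshold` supporting data points."""
    return {topic for topic, n in mentions.items() if n >= threshold}

topics = reportable_topics(topic_mentions)
# Under this logic, "romance movies" would be dropped rather than guessed at:
# with so little signal, any verdict ("likes" / "unlikely to like") is a coin flip.
```

If Watson worked like this, it would not have emitted the romance-movies prediction at all, which is why the actual behavior suggests a closed-world assumption instead.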

However, the choice of topics was problematic: I have never tweeted anything relating to romantic movies (at least as far as I recall), which is why it's surprising that the tool chose it as a feature. The logic must be: "in the absence of Topic A, there is no interest in Topic A", which is somewhat fallible given my Twitter behavior (= leaning towards professional content). Perhaps this is the root of the issue – if my tweets had a higher emphasis on movies/entertainment, it could better predict my preferences. But as it stands, the system seems to have gaps in describing the full spectrum of a user's preferences.

Finally, Watson Personality Insights gives out numerical scores for three dimensions: Personality, Consumer needs, and Values. My scores are presented in the following figure.

Figure 1 IBM Watson Personality Scores

I won't go through all of them here, but the verdict is also split. Some are correct (e.g., practicality and curiosity), while others I would say are false (e.g., liberty, tradition). In sum, I would say it's more accurate than not (i.e., beats chance).


The system is surprisingly accurate given that it is analyzing unstructured text. It would be interesting to know how its accuracy compares to personality scales (survey data). Those scales rely on a structured form of inquiry, which has less noise, at least in theory. In addition, survey scales may yield a more comprehensive view of traits, as all the traits can be explicitly asked about. Social media data may systematically miss certain expressions of personality: for example, my tweets focus on professional content and are therefore more likely to misclassify my liking of romantic movies – a survey could explicitly ask about both my professional and personal likings, and therefore form a more balanced picture.

Overall, it's exciting and even a bit scary how well a machine can describe you. The next step, then, is: how could the results be used? Regardless of all the hype surrounding Cambridge Analytica's ability to "influence the election", in reality combining marketing messages with personality descriptions is not as straightforward as it may seem. This is because preferences are much more complex than saying "you are of personality type A, therefore you approve of message B". Most likely, the inferred personality traits are best used as additional signals or features in decision-making situations. They are unlikely to be the only ones, or even the most important ones, but they have the potential to improve models and optimization outcomes, for example in marketing.

How to Win the Google Online Marketing Challenge

GOMCHA European Winners 2016

1. Introduction

In the past couple of weeks, a few people have approached me asking for tips on how to do well in the Google Online Marketing Challenge. So, I thought I might as well gather some of my experiences in a blog post, and share them with everybody.

A little bit of background: I've been the professor of two winning teams (GOMC Europe 2013 & GOMC Europe 2016). Although most of the credit is obviously due to the students who do all the hard work (the students at Turku School of Economics simply rock!), guidance does play an important role, since most commonly the students have no prior experience in SEM/PPC and need to be taught quickly what to focus on.

2. Advice to teachers

The target audience for this post is anyone participating in the challenge. For the teachers, I have one important piece of advice:

Learn the system if you're teaching it. There's no substitute for real experience. The students are likely to have a million questions, and you need to give better answers than "google it." Personally, I was fortunate enough to have done SEM for many years before starting to teach it. Without that experience, it would have been impossible to guide the teams to do well. However, if you don't have the same advantage but you want your students to do well, turn to the industry. Many SEM companies out there are interested in mentoring/sparring the students, because that way they can also spot talented individuals for future hiring (win-win, right?).

3. How to win GOMCHA?

3.1 Overview

That said, here are my TOP3 “critical success factors” for winning the challenge:

  1. Choose your case wisely
  2. Focus on Quality Score
  3. Show impact

That’s it! Follow these principles and you will do well. Now, that being said, behind each of them is a whole layer of complexity 🙂 Let’s explore each point.

3.2 Choosing the AdWords case

First, one of the earliest questions students are going to ask is how to choose the company/organization they’re doing the campaign for. And that’s also one of the most important ones. How I do it: I let each team choose and find their own case; however, I tell them what is a good case and what is not. I wrote a separate post about choosing a good AdWords case. Read the post, and internalize the information.

Update: one more point in addition to the linked post – preferably choose a case that already has some brand searches. This helps you get a higher overall CTR and a lower overall CPC.

The choice of a good case is crucial: you can be the best optimizer in the world, but if you have a bad case, you will fail. One example was a team that chose a coffee company – not a good case, because it had a limited product range and relatively few searches. For some reason the team, which consisted of several students with *real experience* in AdWords, wanted to choose it. Not surprisingly, they struggled for the above reasons and were easily overshadowed by teams with no experience but a good case. Hence the formula: success = case * skills.

By the way, that is one of the most important lessons for any marketing student in general: Always choose your case wisely, and never market something whose potential you don’t believe in.

3.3 Choosing the metrics

Another common question relates to the metrics: what should we optimize for? While there are many important metrics, including CTR and CPC, one is clearly above the others: the Quality Score, which seems to be very influential in Google's ranking algorithm for the competition.

Note that I don't have any insider information; I say *seems* for this reason: in 2015, I instructed the teams to focus on a wide range of metrics, including CTR, CPC, and QS. The outcome was several great teams that, in my opinion, had better overall metrics than many of the finalists that year (none of my teams were finalists). Last year, however, I switched strategy and instructed the teams to focus heavily on Quality Score, even at the cost of other metrics. For example, to the team that ended up winning in 2016, I said "your goal is 10 x 10", meaning they should get 10 keywords with a QS of 10. They ended up getting 12, and the rest is history 🙂

3.4 Why is Quality Score that important?

In my view, it's because all optimization efforts essentially culminate in that metric. To maximize your QS, you need to do all the right things in terms of optimization, including account structure, ad creation, and landing pages. To get these things nailed, refer to this post. And google for more tips: blogs such as PPC Hero, WordStream, and Certified Knowledge have plenty of subject matter to learn from. I have also compiled an extensive list of digital marketing blogs that you can utilize.

However, note that all third-party information is to some degree unreliable. Use it with caution, combined with your own first-hand experiments (i.e., do what the numbers show to be working best). The most reliable source of information is, of course, Google, because they know the system from the inside, which none of the experts (including myself) do. So, use Google's AdWords help as your main reference.

3.5 Show real impact

The last step, since many teams can score high on metrics, is to show real-life impact. This is pretty much the only way to differentiate when all finalist teams are good. First of all, meticulously follow Google's guidelines for the reports to highlight your greatness. As a member of the academic panel, I know some cases have failed because they did not follow the technical guidelines, so make sure your output is in line with them. However, that is not the main point; the main point is to show how you brought real results to your case organization. Although not part of the official ranking criteria, if you look at the past winners, most of them gained a lot of conversions. Knowing that, you can do the math. The winning reports from earlier years can be found on the challenge website.

4. List of practical tips

Finally, some practical tips (the list is in no particular order, and not comprehensive at all):

  1. Optimize every day as if you were obsessed with AdWords
  2. Don't be afraid to ask for advice from the experts; take all the help you can get to learn faster
  3. Prefer using ‘exact match’ keywords
  4. Never mix display campaigns with search campaigns (i.e., avoid ‘display select’)
  5. Avoid GDN altogether; you can experiment with it using a little budget, but focus 99% on search campaigns
  6. When possible, direct the keywords to a specific landing page (not homepage)
  7. Create ad groups based on semantic similarity of keywords (if you don’t know what this means, find out)
  8. Don’t stress about the initial bid price; set it at some level based on the Keyword Planner estimates and change according to results
  9. Or, alternatively, set it as high as possible to get a good Avg. Pos. and therefore improved CTR, and improved QS
  10. Set the bid price manually per keyword
  11. Use GA to report after-click performance (good for campaign report)
  12. Use as many AdWords features as possible (good for campaign report)
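Tips 8–10 can be condensed into a simple manual bidding loop. The following is only a sketch of the idea, not an AdWords feature; the target position, step size, and all the numbers are my own assumptions:

```python
# Sketch condensing tips 8-10: start each keyword at a Keyword Planner-style
# estimate, then nudge the manual bid based on observed average position.
# The adjustment rule and all numbers are made up for illustration.

def adjust_bid(bid, avg_position, target_position=2.5, step=0.10):
    """Raise the bid when the ad shows too low on the page; lower it when
    the position is clearly better than needed (likely overpaying)."""
    if avg_position > target_position:         # showing too low -> bid up
        return round(bid * (1 + step), 2)
    if avg_position < target_position - 1.0:   # comfortably above target -> bid down
        return round(bid * (1 - step), 2)
    return bid                                 # close enough, leave it

keywords = {
    "running shoes": {"bid": 0.50, "avg_position": 4.5},
    "trail shoes":   {"bid": 0.80, "avg_position": 1.2},
}

new_bids = {kw: adjust_bid(d["bid"], d["avg_position"])
            for kw, d in keywords.items()}
# "running shoes" gets bid up (too low on the page); "trail shoes" gets bid down.
```

The point is simply to make the "set, observe, adjust" cycle explicit: the initial bid matters far less than reacting to the results every day.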

Finally, read Google's materials, including the challenge website. Follow their advice meticulously, and read, read, read about search-engine advertising from digital marketing blogs and Google's website.

Good luck!! 🙂

CAVEAT: I’m a member at the Google Online Marketing Challenge’s academic panel. These are my personal opinions and don’t necessarily represent the official panel views. The current judging criteria for the competition can be found at:

UPDATE (May 2017): Together with Elina Ojala (next to me in the picture above), we had a Skype call with students of Lappeenranta University of Technology (LUT). Elina pointed out some critical things: it's important 1) to be motivated, 2) to have a really good team without free-riding, 3) to share tasks efficiently (e.g., analytics, copywriting; based on individual interests), and 4) to go the extra mile (e.g., changing the landing pages, using GA). I added that for teachers it's important to motivate the students: aim HIGH!! And to stress that there is zero chance of winning if the team doesn't work every day (= a linear relationship between hours worked and performance).

Resources (some in Finnish)

The black sheep problem in machine learning

Just a picture of a black sheep.

Introduction. Hal Daumé III wrote an interesting blog post about language bias and the black sheep problem. In the post, he defines the problem as follows:

The “black sheep problem” is that if you were to try to guess what color most sheep were by looking at language data, it would be very difficult for you to conclude that they weren’t almost all black. In English, “black sheep” outnumbers “white sheep” about 25:1 (many “black sheep”s are movie references); in French it’s 3:1; in German it’s 12:1. Some languages get it right; in Korean it’s 1:1.5 in favor of white sheep. This happens with other pairs, too; for example “white cloud” versus “red cloud.” In English, red cloud wins 1.1:1 (there’s a famous Sioux named “Red Cloud”); in Korean, white cloud wins 1.2:1, but four-leaf clover wins 2:1 over three-leaf clover.

Thereafter, Hal accurately points out:

“co-occurance frequencies of words definitely do not reflect co-occurance frequencies of things in the real world”

But the mistake Hal makes, in my view, is to assume that language describes objective reality ("the real world"). Instead, I would argue that it describes social reality ("the social world").

Black sheep in social reality. The higher occurrence of 'black sheep' tells us that in social reality there is a concept called 'black sheep', which is more common than the concept of a white (or any other color) sheep. People use that concept not to describe sheep, but as an abstract concept that in fact describes other people ("she is the black sheep of the family"). Then we can ask: why is that? In what contexts is the concept used? And we can try to teach the machine its proper use through associations of that concept with other contexts (much like we teach kids when saying something is appropriate and when it is not). As a result, the machine may create a semantic web of abstract concepts which, if not leading to it understanding them, at least helps guide its usage of them.
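Ratios like the ones Hal cites come directly from bigram counts over a corpus. A minimal sketch with an invented toy corpus:

```python
# Sketch: counting "ADJ sheep" co-occurrences in a corpus, the same kind of
# counting behind the 25:1 black/white ratio. The toy corpus is invented.
from collections import Counter

corpus = (
    "she is the black sheep of the family . "
    "every family has a black sheep . "
    "the farmer sold a white sheep ."
).split()

bigrams = Counter(zip(corpus, corpus[1:]))

black = bigrams[("black", "sheep")]
white = bigrams[("white", "sheep")]
# Even in this tiny corpus, "black sheep" dominates because it appears as an
# idiom about people, not as a report on the colors of actual sheep.
```

The counts measure how often the *concept* is used in social reality, which is exactly why they diverge from the frequencies of things in objective reality.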

We, the human. That’s assuming we want the machine to get closer to the meaning of the word in social reality. But we don’t necessarily want to focus on that, at least as a short-term goal. In the short term, it might be more useful to understand that language is a reflection of social reality. This means we, the humans, can understand human societies better through analysis of language. Rather than trying to teach machines to impute data so as to avoid what we label an undesired state of social reality, we should use the outputs provided by the machine to understand where and why those biases take place. And then we should focus on fixing them. Most likely, technology plays only a minor role in that.

Conclusion. The “correction of biases” is equivalent to burying your head in the sand: even if the biases magically disappeared from our models, they would still remain in social reality, and through the connection between social reality and objective reality, echo in the everyday lives of people.

How to teach machines common sense? Solutions to the ambiguity problem in artificial intelligence


The ambiguity problem illustrated:

User: “Siri, call me an ambulance!”

Siri: “Okay, I will call you ‘an ambulance’.”

You’ll never reach the hospital, and end up bleeding to death.


Two potential solutions:

A. machine builds general knowledge (“common sense”)

B. machine identifies ambiguity & asks for clarification from humans

The whole “common sense” problem can be solved by introducing human feedback into the system. We really need to tell the machine what is what, just as we do with a child. It is iterative learning, in which trial and error take place.

But in fact, A and B converge through this process. Which is fine, and ultimately needed.

Contextual awareness

To determine which solution to an ambiguous situation is proper, the machine needs contextual awareness. This can be achieved by storing contextual information from each ambiguous situation, and by the machine being told why a particular piece of information resolves the ambiguity. It’s not enough to say “you’re wrong”; there needs to be an explicit association to a reason (a concept, a variable). Equally, it’s not enough to say “you’re right”; again, the same association is needed.

The process:

1) try something

2) get told it’s not right, and why (linking to contextual information)

3) try something else, corresponding to why

4) get rewarded, if it’s right.
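The four-step loop above can be sketched as a minimal feedback trainer; all class and method names here are illustrative, not an existing library. The point is that both negative and positive feedback carry an explicit reason tied to the context.

```python
class FeedbackTrainer:
    """Learns which action fits a context from explicit human feedback."""

    def __init__(self):
        # context -> {action: reason}; filled in by human explanations
        self.associations = {}

    def try_action(self, context, candidates):
        # 1) try something: prefer an action already associated with the context
        known = self.associations.get(context)
        return next(iter(known)) if known else candidates[0]

    def tell_wrong(self, context, action, reason):
        # 2) "you're wrong" must come with a reason, not just a label
        self.associations.setdefault(context, {}).pop(action, None)
        print(f"{action!r} is wrong in {context!r} because {reason}")

    def reward(self, context, action, reason):
        # 4) "you're right" also links the action to an explicit reason
        self.associations.setdefault(context, {})[action] = reason


trainer = FeedbackTrainer()
act = trainer.try_action("call me an ambulance", ["rename user", "dial 911"])
trainer.tell_wrong("call me an ambulance", act,
                   "'call me X' here means summon X, not rename the user")
trainer.reward("call me an ambulance", "dial 911", "user needs medical help")
```

After this exchange, the trainer resolves the same ambiguous context to the rewarded action; step 3, trying something else guided by the reason, would in a real system use the stored reason to constrain the next attempt.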

The problem is, currently machines are being trained by data, not by human feedback.

New thinking on teaching the machine

So we would need to build machine-training systems that enable training by direct human feedback, i.e. a new way to teach and communicate with the machine. This is not trivial, since the whole machine-learning paradigm is based on data. From data and probabilities, we would need to move to associations and concepts. A new methodology is needed. Potentially, individuals could train their own AIs like pets (think Tamagotchi), or we could use large numbers of crowd workers who would explain to the machine why things are how they are (i.e., create associations). A specific type of markup (= communication) would probably also be needed.

By mimicking human learning we can teach the machine common sense. This is probably the only way; since common sense does not exist beyond human cognition, it can only be learnt from humans. An argument can be made that this is like going back in time, to an era where machines followed rule-based programming (as opposed to being data-driven). However, I would argue that rule-based learning is much closer to human learning than the current probability-based kind, and if we want to teach common sense, we therefore need to adopt the human way.

Conclusion: machines need education

Machine learning may be on par, but machine training certainly is not. The current machine-learning paradigm is data-driven, whereas we should look into concept-driven training approaches.

Rule-based AdWords bidding: Hazardous loops

1. Introduction

In rule-based bidding, you want to sometimes have step-backs where you first adjust your bid based on a given condition, and then adjust it back after the condition has passed.

An example. A use case would be to decrease bids for the weekend, and increase them back to the normal level for weekdays.

However, defining the step-back rate is not as straightforward as most people would think. I’ll tell you how it’s done.

2. Step-back bidding

For step-back bidding you need two rules: one to change the bid (increase/decrease) and another one to do the opposite (decrease/increase). The values applied by these rules must cancel one another.

So, if your first rule raises the bid from $1 to $2, you want the second rule to drop it back to $1.

Call these

x = raise by percentage

y = lower by percentage

Where most people get confused is in assuming x = y, i.e. using the same value for both rules.

Example 1:

x = raise by 15%

y = lower by 15%

That should get us back to our original bid, right? Wrong.

If you do the math (1 * 1.15 * 0.85), you get 0.9775, whereas you want 1 (to get back to the baseline).

The more you iterate with the wrong step-back value, the farther from the baseline you end up. To illustrate, see the following simulation, where the loop is applied weekly for three months (12 weeks * 2 = 24 data points).

Figure 1 Bidding loop

As you can see, the wrong method drifts farther and farther from the correct pattern as time goes by. For a weekly rule the difference might be manageable, especially if the rule’s incremental change is small, but imagine running the rule daily, or each time you bid (intra-day).
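The drift is easy to reproduce in a few lines. This sketch applies the weekly raise/lower cycle from Example 1 for 12 weeks, comparing the naive −15% step-back with the exact inverse factor 1/1.15.

```python
# Reproduce the weekly raise/lower loop from a $1 baseline for 12 weeks
baseline = 1.0
wrong_bid = correct_bid = baseline

for week in range(12):
    wrong_bid *= 1.15        # raise by 15% for weekdays
    wrong_bid *= 0.85        # "lower by 15%" for the weekend: compounds 0.9775/week
    correct_bid *= 1.15      # raise by 15%
    correct_bid *= 1 / 1.15  # exact inverse step-back

print(round(wrong_bid, 4))    # drifts to ~0.76, far below the baseline
print(round(correct_bid, 4))  # 1.0
```

Each naive cycle multiplies the bid by 1.15 × 0.85 = 0.9775, so the error compounds week after week instead of cancelling out.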

3. Solution

So, how to get to 1?

It’s very simple, really. Consider

  • B = baseline value (your original bid)
  • x = the value of the first rule (e.g., raise bid by 15% –> 0.15)
  • y = the multiplier of the second rule (dependent on the first rule)

You want to solve y from

B(1+x) * y = B

That is,

y = 1 / (1+x)

Note that the baseline B cancels out, so y depends only on x.

For the value in Example 1,

y = 1 / (1+0.15) ≈ 0.8696

i.e., the second rule should lower the bid by about 13.04%, not 15%. Multiplying the increased value by this factor results in 1, so that

1.15 * (1 / 1.15) = 1
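The calculation can be wrapped in a small helper (the function name is illustrative): given the raise fraction x, the correcting rule must lower the bid by 1 − 1/(1+x).

```python
def step_back_fraction(x: float) -> float:
    """Fraction to lower by, so that a raise of fraction x is exactly undone."""
    return 1 - 1 / (1 + x)

bid = 1.0
raised = bid * 1.15                                # rule 1: raise by 15%
lowered = raised * (1 - step_back_fraction(0.15))  # rule 2: lower by ~13.04%
print(round(step_back_fraction(0.15), 4))  # 0.1304
print(round(lowered, 10))                  # 1.0
```

A 15% raise is therefore undone by a ~13.04% cut, and more generally the bigger the raise, the smaller the percentage needed to step back.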


Remember the elementary mathematics when applying AdWords bidding rules!

Search engine optimization from a journalist’s perspective


The media depends on advertising revenue. It is a constant subject of debate how much journalists should write stories that attract clicks and impressions, relative to stories with high societal importance. The two do not always go hand in hand.

The role of social media and search engines in a journalist’s work

In practice, journalists have to consider how engaging their stories are on social media. This matters even if one wants to write only about societally important topics, because amid competing content, capturing attention is the only way to get one’s message through. For social media, this means paying attention to 1) crafting a compelling headline, 2) choosing a compelling preview image, and 3) editing the Open Graph metadata (which determines how the link appears on social media).

Besides social media, journalists have to consider search engine optimization, since alongside social media, search engines are typically a major source of traffic. The better the articles are optimized, the more likely they are to rank high in Google’s results for important keywords.

What does a journalist need to know about search engine optimization?

When writing stories, a journalist should keep the following in mind from a search engine standpoint:

  1. Keywords – everything starts from identifying the right keywords the article should be found with. Keyword research tools, such as Google’s Keyword Planner, are useful here.
  2. Headline and subheadings – the chosen keywords should appear in the story’s headline and subheadings. Subheadings (h2) are important because they give the search engine an understandable structure, and they support users’ natural, scanning-based way of reading on the web.
  3. Links – the story should contain links to other sources, marked up with proper anchor texts. Not “for more information on supplements, click here”, but “for example, Helsingin Sanomat has written several stories on supplements”.
  4. Text – paragraphs should be short, use clearly readable language, and contain a suitable number of the targeted keywords. ‘Suitable’ means the keywords appear a natural-feeling number of times; there must not be too much repetition, because Google may interpret it as an attempt at manipulation.

Above all, the article should be both pleasant for the reader and easy for the search engine to understand. Combine these two, and the basics of search engine optimization are in order.

Affinity analysis in political social media marketing – the missing link

Introduction. Hm… I’ve figured out how to execute a successful political marketing campaign on social media [1], but one link is still missing. Namely, applying affinity analysis (cf. market basket analysis).

Discounting conversions. Now, you are supposed to measure “conversions” by some proxy, e.g., time spent on site, number of pages visited, or email subscription. Determining which measurable action is the best proxy for the likelihood of voting is a crucial sub-problem, which you can approach with several tactics. For example, you can use the action closest to the final conversion (the vote), i.e. a micro-conversion. This requires an understanding of the sequence of actions leading to the final conversion. You could also use a relative cut-off point, e.g. the nth percentile with the highest degree of engagement is considered converted.
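The relative cut-off tactic might look like this in practice; the engagement scores and the 80th-percentile threshold below are made up for illustration.

```python
# Hypothetical engagement scores per visitor (e.g., time on site + pages viewed)
scores = {"v1": 12, "v2": 55, "v3": 3, "v4": 78, "v5": 41,
          "v6": 95, "v7": 8, "v8": 67, "v9": 23, "v10": 88}

def top_percentile(scores, pct):
    """Treat visitors at or above the pct-th percentile of engagement as converted."""
    ordered = sorted(scores.values())
    cutoff = ordered[int(len(ordered) * pct / 100)]
    return {visitor for visitor, s in scores.items() if s >= cutoff}

print(top_percentile(scores, 80))  # the top 20% by engagement
```

The threshold itself is a modeling choice: set it too low and you exclude persuadable voters from your ads, too high and you keep preaching to the choir.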

Anyhow, this is very important because once you have secured a vote, you don’t want to waste your marketing budget by showing ads to people who already have decided to vote for your candidate. Otherwise, you risk “preaching to the choir”. Instead, you want to convert as many uncertain voters to voters as possible, by using different persuasion tactics.

Affinity analysis. Affinity analysis can be used to accomplish this. In ecommerce, you would use it as the basis of a recommendation engine for cross-selling or up-selling (“customers who bought this item also bought…” à la Amazon). First you determine which sets of products are most popular, and then show those combinations to buyers interested in any item belonging to that set.

In political marketing, affinity analysis means that because a voter is interested in topic A, he’s also interested in topic B. Therefore, we will show him information on topic B, given our extant knowledge of his interests, in order to increase the likelihood of conversion. This is a form of associative learning.

Operationalization. But operationalizing this is where I’m still in doubt. One solution could be building an association matrix based on website behavior, and then forming corresponding retargeting audiences (e.g., website custom audiences on Facebook). The following picture illustrates the idea.

Figure 1 Example of affinity analysis (1=Visited page, 0=Did not visit page)

For example, we can see that themes C&D and A&F commonly occur together, i.e. people visit those sub-pages on the campaign site. You can validate this by calculating correlations between all pairs. When your data is in binary format (0/1), you can use the Pearson correlation for the calculations.
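With the visit data in 0/1 form, the pairwise Pearson correlations take a few lines of numpy; the visit matrix below is hypothetical, constructed so that themes C&D and A&F co-occur as in Figure 1.

```python
import numpy as np

# Rows = visitors, columns = theme sub-pages A..F (1 = visited); hypothetical data
themes = ["A", "B", "C", "D", "E", "F"]
visits = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0],
])

# 6x6 matrix of theme-pair correlations (columns as variables)
corr = np.corrcoef(visits, rowvar=False)
print(round(corr[themes.index("C"), themes.index("D")], 2))  # 1.0
print(round(corr[themes.index("A"), themes.index("F")], 2))  # 1.0
```

On 0/1 data, the Pearson correlation reduces to the phi coefficient, so high values directly flag pairs of theme pages visited by the same people.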

Facebook targeting. Knowing this, we can build target audiences on Facebook, e.g. “Visited /Theme_A; NOT /Theme_F; NOT /confirmation”, where confirmation indicates conversion. Then we would show ads on Theme F to that particular audience. In practice, we could facilitate the process by first identifying the most popular themes, and then finding the associated themes. Once the user has been exposed to a given theme and did not convert, he is exposed to another theme (the one with the highest association score). The process continues until themes run out or the user converts, whichever comes first. Applying the earlier logic of determining a proxy for conversion, visiting all theme sub-pages can also be used as a measure of conversion.

Finally, it is possible to use more advanced methods of associative learning. That is, we could determine that {Theme A, Theme F} => {Theme C}, so that interest in themes A and F predicts interest in theme C. However, it is more appropriate to predict conversion rather than interest in other themes, because ultimately we’re interested in persuading more voters.
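A minimal version of such rule mining computes the confidence of {Theme A, Theme F} => {Theme C} directly from visit sets; the transactions below are hypothetical.

```python
# Each set is one visitor's visited theme pages (hypothetical transactions)
transactions = [
    {"A", "F", "C"},
    {"A", "F", "C"},
    {"A", "F"},
    {"B", "C"},
    {"A", "C"},
]

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) estimated from the transaction list."""
    has_ante = [t for t in transactions if antecedent <= t]
    if not has_ante:
        return 0.0
    return sum(1 for t in has_ante if consequent <= t) / len(has_ante)

# 2 of the 3 visitors who hit both A and F also hit C
print(confidence({"A", "F"}, {"C"}, transactions))
```

Swapping the consequent for a conversion marker (e.g. a /confirmation visit) turns the same computation into the conversion-prediction variant the text argues for.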


[1] Posts in Finnish: