Tag: machine learning

Machine learning and Facebook Ads


One important thing in machine learning is feature engineering (selection & extraction). This means choosing the right variables that improve the model’s performance, while discarding those reducing it. The more impact your variables have on the performance metric, the better. Because the real world is complex, you may start with dozens or even hundreds of variables (=features), but in the end, you only want to keep the ones that improve the model’s performance.

While there are feature-selection criteria, such as information gain, to help, expert judgment can be useful as well. That’s because experts may have prior information on the important inputs. Therefore, one could interview industry insiders prior to creating a machine-learning model. Basically, the expert opinion narrows down the feature space. While this approach has risks, primarily forgoing hidden or non-obvious features, as well as potential expert biases, it also has obvious advantages in terms of distinguishing signal from noise.
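To make the information gain idea concrete, here is a minimal sketch. The three feature columns and the click labels are hypothetical toy values invented for illustration, not real ad data, and the function names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 1 = the ad was clicked, 0 = it was not.
clicked = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "includesPrice":  [1, 1, 1, 0, 0, 0, 1, 0],  # mirrors the label exactly
    "includesEmojis": [1, 0, 1, 0, 1, 0, 1, 0],  # weakly related to the label
    "isStockphoto":   [1, 1, 0, 1, 1, 0, 0, 0],  # unrelated to the label
}

gains = {name: information_gain(values, clicked) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)
```

Features with a gain close to zero are candidates for discarding. With real data the scores would of course be noisier than in this toy case.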

So, the premise of narrowing down the search space is the motivation for this article. I got to thinking, and did some quick research, on which features matter for the performance of Facebook advertising. These could be used as a basis for a machine learning model, e.g. to predict the performance of a given ad.

A. Text features

  • topic
  • sentiment [1]
  • includesPrice
  • includesBrandName
  • wordCount [1]
  • wordLength
  • charCount [1]
  • includesEmojis
  • meaningEmojis
  • includesQuestion
  • includesExclamation
  • includesImperative
  • includesBenefits
  • includesNumbers
  • isSimpleLanguage
  • includesShortURL
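Some of the simpler text features above can be computed directly from the ad copy. The sketch below is only illustrative: the function name, the regular expressions and the example ad are my own assumptions, and wordLength is interpreted here as the average word length.

```python
import re

def extract_text_features(ad_text):
    """Compute a few of the simpler text features from raw ad copy."""
    words = re.findall(r"[A-Za-z0-9']+", ad_text)
    return {
        "wordCount": len(words),
        "charCount": len(ad_text),
        # Assumption: wordLength means the average number of characters per word.
        "wordLength": sum(len(w) for w in words) / len(words) if words else 0.0,
        "includesQuestion": "?" in ad_text,
        "includesExclamation": "!" in ad_text,
        "includesNumbers": any(ch.isdigit() for ch in ad_text),
        # Assumption: a price is a currency symbol followed by a digit.
        "includesPrice": bool(re.search(r"[$€£]\s?\d", ad_text)),
    }

# Hypothetical ad copy, invented for illustration.
example_ad = "Ready for summer? Get 2 pairs for $49 today!"
text_features = extract_text_features(example_ad)
print(text_features)
```

Features such as topic, sentiment or includesBenefits need a trained model or an external API and are not covered by this sketch.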

B. Images

  • includesText [2]
  • includesPrice
  • includesProduct [2]
  • includesLogo
  • imageObjects
  • includesPeople
  • includesFace
  • includesAnimals
  • imageLocation
  • isStockphoto
  • includesCTA
  • isDarkColorTheme [2]

C. Metrics

  • clicksAll
  • clicksWebsite
  • websitePurchases
  • countLikes

D. Demographics

  • gender
  • age
  • location

E. Misc features

  • adPlacement
  • campaignGoal


A simple model could account only for C (= independent and dependent variables) and D (= independent variables), while more complex models would also analyze text and images using linear or non-linear methods, such as neural networks (shallow or deep learning). Also, some of these features could be retrieved by using commercial or public APIs. For example,

  • Google Cloud Vision API – for image analysis [3]
  • MonkeyLearn – for text analysis [4]
  • EmojiNet API – for emoji analysis [5]


Ideally, each advertiser has its own model, because models may not generalize well across advertisers (e.g., different advertisers have different target groups). However, feature selection may benefit from earlier experiences. Also, given enough data, the model may learn which features apply across different advertisers, achieving a greater degree of generalizability.







Experimenting with IBM Watson Personality Insights: How accurate is it?


I ran an analysis with IBM Watson Personality Insights. It retrieved my tweets and analyzed their text content to describe me as a person.

Doing so is easy – try it here:

I’ll briefly discuss the accuracy of the findings in this post.

TL;DR: The accuracy of IBM Watson is a split decision – some classifications seem to be accurate, while others are not. The inaccuracies are probably due to lack of source material exposing a person’s full range of preferences.


The tool analyzed 25,082 words and labelled the results as “Very Strong Analysis”. In the following, I will use introspection to comment on the accuracy of the findings.

“You are a bit critical, excitable and expressive.”

Introspection: TRUE

“You are philosophical: you are open to and intrigued by new ideas and love to explore them. You are proud: you hold yourself in high regard, satisfied with who you are. And you are authority-challenging: you prefer to challenge authority and traditional values to help bring about positive changes.”

Introspection: TRUE

“Your choices are driven by a desire for efficiency.”

Introspection: TRUE

“You are relatively unconcerned with both tradition and taking pleasure in life. You care more about making your own path than following what others have done. And you prefer activities with a purpose greater than just personal enjoyment.”

Introspection: TRUE

At this point, I was very impressed with the tool. So far, I would completely agree with its assessment of my personality, although it’s only using my tweets, which are short and mostly shared links.

While the description given by Watson Personality Insights was spot on (introspection agreement: 100%), I found the categorical evaluation to be lacking. In particular, “You are likely to______”

“be concerned about the environment”

Introspection: FALSE (I am not particularly concerned about the environment, as in nature, although I am worried about societal issues like influence of automation on jobs, for example)

“read often”

Introspection: TRUE

“be sensitive to ownership cost when buying automobiles”

Introspection: TRUE

Actually, the latter one is quite amazing because it describes my consumption patterns really well. I’m a very frugal consumer, always taking into consideration the lifetime cost of an acquisition (e.g., of a car).

In addition, the tool also says that “You are unlikely to______”

“volunteer to learn about social causes”

Introspection: TRUE

“prefer safety when buying automobiles”

Introspection: FALSE (In fact, I’m thinking of buying a car soon and safety is a major criterion since the city I live in has rough traffic.)

“like romance movies”

Introspection: FALSE (I do like them! Actually just had this discussion with a friend offline, which is another funny coincidence.)

So, the overall accuracy rate here is only 3/6 = 50%.
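The tally can be reproduced in a few lines. The verdicts are the introspection results listed above; the standard error is my own addition, included only to show how little six items can tell us.

```python
import math

# Introspection verdicts for the six "likely / unlikely" statements above.
verdicts = {
    "be concerned about the environment": False,
    "read often": True,
    "be sensitive to ownership cost when buying automobiles": True,
    "volunteer to learn about social causes": True,
    "prefer safety when buying automobiles": False,
    "like romance movies": False,
}

agreed = sum(verdicts.values())
agreement = agreed / len(verdicts)
print(f"{agreed}/{len(verdicts)} = {agreement:.0%}")  # prints "3/6 = 50%"

# Rough binomial standard error of the agreement rate (about 0.20 with n = 6).
std_error = math.sqrt(agreement * (1 - agreement) / len(verdicts))
```

With a standard error of roughly 20 percentage points, the 50% figure is best read as a rough impression rather than a measurement.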

I did not read the specification in more detail, but I suspect the system chooses the evaluated categories based on the available amount of data; i.e. it simply leaves off topics with inadequate data. Since there is a very broad range of potential topics (ranging from ‘things’ to ‘behaviors’), the probability of accumulating enough data points on some topics increases as the amount of text increases. In other words, you are more likely to hit some categories and, after accumulating enough data on them, you can present quite a few descriptors about the person (while simply leaving out those you don’t have enough information on).

However, the choice of topics was problematic: I have never tweeted anything relating to romantic movies (at least as far as I recall), which is why it’s surprising that the tool chose it as a feature. The logic must be: “in absence of Topic A, there is no interest in Topic A”, which is somewhat fallible given my Twitter behavior (= leaning towards professional content). Perhaps this is the root of the issue – if my tweets had a higher emphasis on movies/entertainment, it could better predict my preferences. But as of now, it seems the system has some gaps in describing the full spectrum of a user’s preferences.

Finally, Watson Personality Insights gives out numerical scores for three dimensions: Personality, Consumer needs, and Values. My scores are presented in the following figure.

Figure 1 IBM Watson Personality Scores

I won’t go through all of them here, but the verdict here is also split. Some are correct, e.g. practicality and curiosity, while others I would say are false (e.g., liberty, tradition). In sum, I would say it’s more accurate than not (i.e., beats chance).


The system is surprisingly accurate given the fact it is analyzing unstructured text. It would be interesting to know how the accuracy fares in comparison to personality scales (survey data). This is because those scales rely on a structured form of inquiry which has less noise, at least in theory. In addition, survey scales may result in a comprehensive view of traits, as all the traits can be explicitly asked about. Social media data may systematically miss certain expressions of personality: for example, my tweets focus on professional content and are therefore more likely to lead to a misclassification of my liking of romantic movies – a survey could explicitly ask about both my professional and personal likings, and therefore form a more balanced picture.

Overall, it’s exciting and even a bit scary how well a machine can describe you. The next step, then, is how could the results be used? Regardless of all the hype surrounding Cambridge Analytica’s ability to “influence the election”, in reality combining marketing messages with personality descriptions is not as straightforward as it may seem. This is because preferences are much more complex than just saying you are of personality type A, therefore you approve message B. Most likely, the inferred personality traits are best used as additional signals or features in decision-making situations. They are not likely to be the only ones, or even the most important ones, but have the potential to improve models and optimization outcomes, for example in marketing.

The black sheep problem in machine learning

Just a picture of a black sheep.

Introduction. Hal Daumé III wrote an interesting blog post about language bias and the black sheep problem. In the post, he defines the problem as follows:

The “black sheep problem” is that if you were to try to guess what color most sheep were by looking at language data, it would be very difficult for you to conclude that they weren’t almost all black. In English, “black sheep” outnumbers “white sheep” about 25:1 (many “black sheep”s are movie references); in French it’s 3:1; in German it’s 12:1. Some languages get it right; in Korean it’s 1:1.5 in favor of white sheep. This happens with other pairs, too; for example “white cloud” versus “red cloud.” In English, red cloud wins 1.1:1 (there’s a famous Sioux named “Red Cloud”); in Korean, white cloud wins 1.2:1, but four-leaf clover wins 2:1 over three-leaf clover.

Thereafter, Hal accurately points out:

“co-occurance frequencies of words definitely do not reflect co-occurance frequencies of things in the real world”
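The kind of counting behind such ratios can be sketched as follows. The corpus is a hypothetical handful of sentences invented for illustration, so the resulting ratio says nothing about real English usage.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs in lower-cased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

# Hypothetical toy corpus, invented for illustration only.
corpus = (
    "He was the black sheep of the family. "
    "Every family has a black sheep. "
    "We watched the film Black Sheep last night. "
    "A white sheep grazed on the hill."
)

counts = bigram_counts(corpus)
black = counts[("black", "sheep")]
white = counts[("white", "sheep")]
print(f"black sheep : white sheep = {black}:{white}")
```

Only one of the four sentences is about an actual animal, yet the counter cannot tell the idiom or the film title apart from a description of a sheep.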

But the mistake made by Hal is to assume language describes objective reality (“the real world”). Instead, I would argue that it describes social reality (“the social world”).

Black sheep in social reality. The higher occurrence of ‘black sheep’ tells us that in social reality, there is a concept called ‘black sheep’ which is more common than the concept of white (or any color) sheep. People are using that concept, not to describe sheep, but as an abstract concept in fact describing other people (“she is the black sheep of the family”). Then, we can ask: Why is that? In what contexts is the concept used? And try to teach the machine its proper use through associations of that concept to other contexts (much like we teach kids when saying something is appropriate and when not). As a result, the machine may create a semantic web of abstract concepts which, if not leading to it understanding them, at least helps in guiding its usage of them.

We, the human. That’s assuming we want it to get closer to the meaning of the word in social reality. But we don’t necessarily want to focus on that, at least as a short-term goal. In the short term, it might be more purposeful to understand that language is a reflection of social reality. This means we, the humans, can understand human societies better through its analysis. Rather than trying to teach machines to impute data to avoid what we label an undesired state of social reality, we should use the outputs provided by the machine to understand where and why those biases take place. And then we should focus on fixing them. Most likely, technology plays only a minor role in that.

Conclusion. The “correction of biases” is equivalent to burying your head in the sand: even if they magically disappeared from our models, they would still remain in the social reality, and through the connection of social reality and objective reality, echo in the everyday lives of people.

How to teach machines common sense? Solutions for ambiguity problem in artificial intelligence


The ambiguity problem illustrated:

User: “Siri, call me an ambulance!”

Siri: “Okay, I will call you ‘an ambulance’.”

You’ll never reach the hospital, and end up bleeding to death.


Two potential solutions:

A. machine builds general knowledge (“common sense”)

B. machine identifies ambiguity & asks for clarification from humans

The whole “common sense” problem can be solved by introducing human feedback into the system. We really need to tell the machine what is what, just like a child. It is iterative learning, in which trials and errors take place.

But, in fact, A. and B. converge by doing so. Which is fine, and ultimately needed.

Contextual awareness

To determine which solution to an ambiguous situation is proper, the machine needs contextual awareness. This can be achieved by storing contextual information from each ambiguous situation, and by explaining to the machine “why” a particular piece of information resolves the ambiguity. It’s not enough to say “you’re wrong”, but there needs to be an explicit association to a reason (concept, variable). Equally, it’s not enough to say “you’re right”, but again the same association is needed.

The process:

1) try something

2) get told it’s not right, and why (linking to contextual information)

3) try something else, corresponding to why

4) get rewarded, if it’s right.
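The four steps above can be sketched as a toy loop. Everything here is hypothetical: the action names, the concept tags and the human_teacher stand-in are invented for illustration, and a real system would need a far richer way to represent context.

```python
# Candidate actions, each tagged with the concepts it involves (hypothetical labels).
CANDIDATES = {
    "rename_user": {"nickname"},
    "dial_emergency_services": {"emergency"},
}

def rank(utterance, learned):
    """Order candidate actions by how many learned concepts the utterance hints at."""
    hinted = set()
    for word in utterance.lower().split():
        hinted |= learned.get(word.strip("!?.,\"'"), set())
    # sorted() is stable, so ties keep the order in which CANDIDATES was written.
    return sorted(CANDIDATES, key=lambda name: -len(CANDIDATES[name] & hinted))

def human_teacher(action):
    """Stand-in for a person: says right or wrong, and which word signals which concept."""
    return action == "dial_emergency_services", "ambulance", "emergency"

def train(utterance, teacher, learned):
    """Run the try / explain / retry / reward loop and return the attempts made."""
    attempts = []
    while len(attempts) < len(CANDIDATES):
        options = [a for a in rank(utterance, learned) if a not in attempts]
        action = options[0]                          # 1) try something
        attempts.append(action)
        is_right, cue, concept = teacher(action)     # 2) get told, and why
        learned.setdefault(cue, set()).add(concept)  #    store the association
        if is_right:                                 # 4) the reward ends the episode
            break
        # 3) the next pass re-ranks with the new association, so the retry follows the "why"
    return attempts

learned = {}
first = train("Siri, call me an ambulance!", human_teacher, learned)
second = train("Please call me an ambulance", human_teacher, learned)
print(first)
print(second)
```

On the first utterance the machine guesses wrong once before being corrected; on the second, differently worded request it reuses the stored association and gets it right immediately.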

The problem is, currently machines are being trained by data, not by human feedback.

New thinking on teaching the machine

So we would need to build machine-training systems which enable training by direct human feedback, i.e. a new way to teach and communicate with the machine. It’s not a trivial thing, since the whole machine-learning paradigm is based on data. From data and probabilities, we would need to move into associations and concepts. A new methodology is needed. Potentially, individuals could train their own AIs like pets (think Tamagotchi), or we could use large numbers of crowd workers who would explain to the machine why things are how they are (i.e., create associations). A specific type of markup (=communication) would probably also be needed.

Through mimicking human learning we can teach the machine common sense. This is probably the only way; since common sense does not exist beyond human cognition, it can only be learnt from humans. An argument can be made that this is like going back in time, to an era where machines followed rule-based programming (as opposed to being data-driven). However, I would argue rule-based learning is much closer to human learning than the current probability-based one, and if we want to teach common sense, we therefore need to adopt the human way.

Conclusion: machines need education

Machine learning may be up to par, but machine training certainly is not. The current machine learning paradigm is data-driven, whereas we could look into concept-driven training approaches.

“Please explain Support Vector Machines (SVM) like I am a 5 year old.”

Courtesy of @copperking at Reddit:

Direct quotation from Reddit:

  1. “We have 2 colors of balls on the table that we want to separate.
  2. We get a stick and put it on the table, this works pretty well right?
  3. Some villain comes and places more balls on the table, it kind of works but one of the balls is on the wrong side and there is probably a better place to put the stick now.
  4. SVMs try to put the stick in the best possible place by having as big a gap on either side of the stick as possible.
  5. Now when the villain returns the stick is still in a pretty good spot.
  6. There is another trick in the SVM toolbox that is even more important. Say the villain has seen how good you are with a stick so he gives you a new challenge.
  7. There’s no stick in the world that will let you split those balls well, so what do you do? You flip the table of course! Throwing the balls into the air. Then, with your pro ninja skills, you grab a sheet of paper and slip it between the balls.
  8. Now, looking at the balls from where the villain is standing, the balls will look split by some curvy line.

Boring adults call the balls data, the stick a classifier, the biggest gap trick optimization, flipping the table kernelling and the piece of paper a hyperplane.”
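For readers who prefer code to balls and sticks, below is a minimal pure-Python sketch of the two tricks. It is a toy under stated assumptions, not a production solver: the "stick" is fitted with plain subgradient descent on the hinge loss, and the "table flip" is done with an explicit feature map x -> (x, x*x), whereas real kernel SVMs perform the lift implicitly through a kernel function. The data points and hyperparameters are invented for illustration.

```python
def train_linear_svm(points, labels, epochs=3000, lr=0.01, lam=0.01):
    """Full-batch subgradient descent on the regularised hinge loss."""
    dim = len(points[0])
    n = len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularisation term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # the ball is inside the gap or on the wrong side
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def accuracy(w, b, points, labels):
    hits = sum(predict(w, b, x) == y for x, y in zip(points, labels))
    return hits / len(points)

# Toy balls on a line: one colour (+1) on the outside, the other (-1) in the middle.
xs = [-3.0, -2.0, 2.0, 3.0, -0.5, 0.0, 0.5]
ys = [1, 1, 1, 1, -1, -1, -1]

flat = [(x,) for x in xs]          # no stick can split these
lifted = [(x, x * x) for x in xs]  # "flip the table": add a second dimension

w_flat, b_flat = train_linear_svm(flat, ys)
w_lift, b_lift = train_linear_svm(lifted, ys)

acc_flat = accuracy(w_flat, b_flat, flat, ys)
acc_lift = accuracy(w_lift, b_lift, lifted, ys)
print(acc_flat, acc_lift)
```

On the line, no single threshold can separate the middle balls from the outer ones, so the flat model always misclassifies something. After the lift, a straight cut in the new dimension does the job, and seen from the original line it looks like two thresholds, i.e. a "curvy" boundary.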

Some reflections:

Well, for a practice-oriented guy the first question obviously is: so what? What can you do with it in practice?

I think it boils down to the nature of classification algorithms. They are quite widely used, e.g. in image or text recognition. So, a machine can better learn how to differentiate between an orange and an apple, for example. This of course leads to multiple efficiency advantages, when we are able to replace human classifiers in many jobs.

In conclusion, in my quest to understand machine learning it has become obvious that the support vector machine is not the easiest concept to start from. However, since classification is an essential area in machine learning, one cannot avoid it for too long.

A.I. – the next industrial revolution?


Many workers are concerned about “robotization” and “automatization” taking away their jobs. The media has also been writing actively about this topic lately, as can be seen in publications such as the New York Times and Forbes.

Although there is undoubtedly some dramatization in the scenarios created by the media, it is true that the trend of automatization took away manual jobs throughout the 20th century and has continued – perhaps even accelerated – in the 21st century.

Currently the jobs taken away by machines are manual labor, but what happens if machines take away knowledge labor as well? I think it’s important to consider this scenario, as most focus has been on the manual jobs, whereas the future disruption is more likely to take place in knowledge jobs.

This article discusses what’s next – in particular from the perspective of artificial intelligence (A.I.). I’ve been developing a theory about this topic for a while now. (It’s still unfinished, so I apologize for the fuzziness of thought…)

Theory on development of job markets

My theory on development of job markets relies on two key assumptions:

  1. with each development cycle, fewer people are needed
  2. and the more difficult it is for average people to add value

The idea here is that while it is relatively easy to replace a job taken away by simple machines (sewing machines still need people to operate them), it is much harder to replace jobs taken away by complex machines (such as an A.I.) providing higher productivity. Consequently, fewer people are needed to perform the same tasks.

By “development cycles”, I refer to the drastic shift in job market productivity, i.e.

craftsmanship –> industrial revolution –> information revolution –> A.I. revolution

Another assumption is that labor skills follow a Gaussian (normal) curve. This means most people are best suited for manual jobs, while the information economy requires skills that are at the upper end of that curve (the smartest and brightest).

In other words, the average worker will find it more and more difficult to add value in the job market, due to the sophistication of the systems (a lot more learning is needed to add value than in manual jobs where the training requires a couple of days). Even currently, the majority of global workers are better suited to manual labor than to information economy jobs, and so some economies are at a major disadvantage (consider Greece vs. Germany).

Consistent with the previous definition, we can see the job market as including two types of workers:

  • workers who create
  • workers who operate

The former create the systems as their job, whereas the latter operate them as their job. For example, in the sphere of online advertising, Google’s engineers create the AdWords search-engine advertising platform, which is then used by online marketers doing campaigns for their clients. In the current information economy, the best situation is for workers who are able to create systems – i.e. their value-added is the greatest. With an A.I., however, both jobs can be overtaken by machine intelligence. This is the major threat to knowledge workers.

The replacement takes place due to what I call the errare humanum est effect (the disadvantage of humans vis-à-vis machines), according to which a machine is always superior at job tasks compared to a human, who is an erratic being constrained by biological limits (e.g., the need for food and sleep). Consequently, even the brightest humans will still lose to an A.I.


Consider these examples:

  • Facebook has one programmer per 1.2 million users [1] and one employee per 249,000 users [2]
  • Rovio has one employee per 507,000 gamers [3]
  • Pinterest has one employee per 400,000 users [2]
  • Supercell has one employee per 193,000 gamers [4]
  • Twitter has one employee per 79,000 users [5]
  • Linkedin has one employee per 47,000 users [6]

(Some of these figures are a bit outdated, but in general they serve to support my argument.)

Therefore, the ratio of workers to customers is much lower than in previous transitions. To build a car for one customer, you need tens of manufacturing workers. To serve customers in a supermarket, the ratio needs to be something like 1:20 (otherwise queues become too long). But when the ratio is 1:1,000,000, not many people are needed to provide a service for the whole market.

As can be seen, the mobile application industry, which has been touted as a source of new employment, does indeed create new jobs [7], but it doesn’t create them for the masses. This is because not many people are needed to succeed in this business environment.

Further disintermediation takes place when platforms talk to each other, forming super-ecosystems. Currently, this takes place through an API logic (application programming interface) which is a “dumb” logic, doing only prescribed tasks, but an A.I. would dramatically change the landscape by introducing creative logic in API-based applications.

Which jobs will an A.I. disrupt?

Many professional services are on the line. Here are some I can think of.

1. Marketing managers 

An A.I. can allocate budget and optimize campaigns far more efficiently than error-prone humans. The step from Google AdWords and Facebook Ads to automated marketing solutions is not that big – at the moment, the major advantage of humans is creativity, but the definition of an A.I. in this article assumes creative functions.

2. Lawyers 

An A.I. can recall all laws, find precedent cases instantly and give correct judgments. I recently had a discussion with one of my developer friends – he was particularly interested in applying A.I. to the legal system – currently it’s too big for a human to comprehend, as there are thousands of laws, some of which contradict one another. An A.I. can quickly find contradicting laws and give all alternative interpretations. What is currently the human advantage is a sense of morality (right and wrong), which can be hard to replicate with an A.I.

3. Doctors 

An A.I. makes faster and more accurate diagnoses; a robot performs surgical operations without flaw. I would say many standard diagnoses by human doctors could be replaced by A.I. measuring the symptoms. There have been several cases of incorrect diagnoses due to hurry and the human error factor – as noted previously, an A.I. is immune to these limitations. The major human advantage is sympathy, although some doctors are missing even this.

4. Software developers

Even developers face extinction; upon learning the syntax, an A.I. will improve itself better than humans do. This would lead to an exponentially accelerating increase of intellect, something commonly depicted in A.I. development scenarios.

Basically, all knowledge professions, if accessible to A.I., will be disrupted.

Which jobs will remain?

Actually, the only jobs left would be manual jobs – unless robots take them as well (there are some economic considerations against this scenario). I’m talking about low-level manual jobs – transportation, cleaning, maintenance, construction, etc. These require more physical material – due to the aforementioned supply and demand dynamics, it may be that people are cheaper to “build” than robots, and therefore can still assume simple jobs.

At the other extreme, there are experience services offered by people to other people – massage, entertainment. These can remain based on the previous logic.

How can workers prepare?

I can think of a couple of ways.

First, learn coding – i.e. talking to machines. People who understand their logic are in the position to add value – they have access to the society of the future, whereas those who are unable to use systems get disadvantaged.

The best strategy for a worker in this environment is continuous learning and re-education. From the schooling system, this requires a complete shift in thinking – currently most universities are far behind in teaching practical skills. I notice this every day in my job as a university teacher – higher education must catch up, or it will completely lose its value.

Currently higher education is shielded by governments through official diplomas appreciated by recruiters, but true skills trump such an advantage in the long run. Already at this moment I’m advising my students to learn from MOOCs (massive open online courses) rather than relying on the education we give in my institution.

What are the implications for the society?

At a global scale, societies are currently facing two contrasting mega-trends:

  • the increase of productivity through automatization (= lower demand for labor)
  • the increase of population (= higher supply of labor) (everyone has seen the graph showing population growth starting from 19th century [8])

It is not hard to see these are contrasting: fewer people are needed for the same output, whereas more people are born, and thus need jobs. The increase of people is exponential, while the increase in productivity comes, according to my theory, in large shifts. A large shift is bad because before it takes place, everything seems normal. (It’s like a tsunami approaching – no way to know before it hits you.)

What are the scenarios to solve the mega-trend contradiction?

I can think of a couple of ways:

  1. Marxist approach – redistribution of wealth and re-discovery of “job”
  2. WYSIWYG approach – making the systems as easy as possible

By adopting a Marxist approach, we can see there are two groups who are best off in this new world order:

  • The owners of the best A.I. (system capital)
  • The people with capacity to use and develop A.I. further (knowledge capital)

Others, as argued previously, are at a disadvantage. The phenomenon is very similar to the concept of “digital divide”, which can refer to 1) the difference in access to technologies between citizens of developed and developing countries, or 2) the ability of the elderly vs. the younger to use modern technology (the former have, for example, worse opportunities in high-tech job markets).

There are some relaxations to the arguments I’ve made. First, we need to consider that the increase in the free time people have, as well as the general population increase, create demand for services relating to experiences and entertainment per se; yet, there needs to be consideration of re-distribution of wealth, as people who are unable to work need to consume to provide work for others (in other words, the service economy needs special support and encouragement from government vis-à-vis machine labor).

While it is a precious goal that everyone contributes to society through work, the future may require a re-check on this Protestant work ethic if indeed the supply of work drastically decreases. The major reason, in my opinion, behind the failure of policies reducing work hours, such as the 35-hour work-week in France, is that other countries besides these pioneers are not adopting them, and so they gain a comparative advantage in the global market. We are not yet in the stage where supply of labor is dramatically reduced at a global scale, but according to my theory we are getting there.

Secondly, a major relaxation, indeed, is that the systems can be usable by people who lack the understanding of their technical finesse. This method is already widely applied – very few understand the operating principles of the Internet, and yet can use it without difficulties. Even more complex professional systems, like Google AdWords, can be used without detailed understanding of Google’s algorithm or Vickrey (second-price sealed-bid) auctions.
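For the curious, the core rule of a second-price sealed-bid auction fits in a few lines. This is a minimal single-item sketch with hypothetical bidders and bids; it does not attempt to model how any real ad platform ranks ads or sets prices.

```python
def second_price_auction(bids):
    """Return (winner, price) for a single-item sealed-bid second-price auction."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid, not their own
    return winner, price

# Hypothetical bids per click, invented for illustration.
bids = {"advertiser_a": 2.50, "advertiser_b": 1.80, "advertiser_c": 0.90}
winner, price = second_price_auction(bids)
print(winner, price)  # prints "advertiser_a 1.8"
```

The point of the rule is that the winner's own bid decides only whether they win, not what they pay, which is why a marketer can simply bid what a click is worth to them without understanding the mechanism.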

So, dumbing things down is one way to go. The problem with this approach in the A.I. context is that when the system is smart enough to use itself, there is no need to dumb down – i.e., having humans use it would be a non-optimal use of resources. Already we can see this in some bidding algorithms in online advertising – the system optimizes better than people. At the moment we online marketers can add value through copywriting and other creative ways, but the upcoming A.I. would take away this advantage from us.


It is the natural state of job markets that most workers are skilled only for manual labor or very simple machine work; if these jobs are lost, a new way of organizing society is needed. Rather than fighting the change, societies should approach it objectively (which is probably one of the hardest things for human psychology).

My recommendations for the policy makers are as follows:

  • decrease cost of human labor (e.g., in Finland sometime in the 70s services were exempted from taxes – this scenario should help)
  • reduce employment costs – the situation is in fact perverse, as companies are penalized through side costs if they recruit workers. In a society where demand for labor is scarce, the reverse needs to take place: companies that recruit need to be rewarded.
  • retain/introduce monetary transfers à la welfare societies – because labor is not enough for everyone, the state needs to pass money from capital holders to the underprivileged. The Nordic states are closer to a working model than more capitalistic states such as the United States.
  • push education system changes – because skills required in the job market are more advanced and more in flux than previously, the curriculum substance needs to change faster than it currently does. Unnecessary learning should be eliminated, while focusing on key skills needed in the job market at the moment, and creating further education paths to lifelong learning.

Because the problem of reducing job demand is not acute, these changes are unlikely to take place until there is no other choice (which is, by the way, the case for most political decision making).

Open questions

Up to which point can human labor be replaced? I call it the point of zero human: the point at which no humans are needed to produce an output equal to or larger than what was produced at an earlier point in time. The fortune of humans is that we are all the time producing more – if production were still at the level of the 18th century, we would already be at the point of zero human. Therefore, job markets are not developing in a predictable way towards the point of zero human, but it may nevertheless be a stochastic outcome of the current development rate of technology. Ultimately, time will tell. We are living in exciting times.