June 26, 2017
Introduction. In this post, I’m exploring the usefulness of personas in digital analytics. At Qatar Computing Research Institute (QCRI), we have developed a system for automatic persona generation (APG) – see the demo. Under the leadership of Professor Jim Jansen, we’re constantly working to position this system toward the intersection of customer profiles, personas, and analytics.
Three levels of data. Imagine three levels of data:
Which one is the best? The answer: it depends.
Case of advertising. For advertising, usually the more individual the data, the better. The worst case is mass advertising, where there is one message for everyone: it fails to capture the variation in preferences and tastes of the underlying audience, and is therefore inefficient and expensive. Group-based targeting, i.e. market segmentation (“women 25–34”), performs better because it aligns the product features with the audience features. Here, the commonalities of the target group allow marketers to create more tailored and effective messages, which results in fewer wasted ad impressions.
Case of design and development. In a similar vein, design is moving towards experimentation. First of all, there are conventions that are adopted industry-wide over time (e.g., Amazon adopts a practice and small e-commerce sites follow suit). Many prefer being followers, and that works for the most part. But multivariate testing and similar methods can reveal optimal designs better than “imagining a user” or simply following conventions. Of course, personas, like any other immersive technique, can be used as a source of inspiration and ideas. But they are just one technique, not the technique.
For example, in the case of mobile startups, I would recommend experimentation over personas. A classic example is Instagram, which found from data that filters were its killer feature. For such applications, it makes sense to define an experimental feature set and adjust it based on behavioral feedback from the users.
Unfortunately, startup founders often ignore systematic testing because they have a pre-defined idea of the user (à la persona) and are not ready to have their ideas challenged. The more work is done to satisfy the imaginary user, the harder it becomes to make out-of-the-box design choices. Yet those kinds of changes are required to improve not by small margins but by orders of magnitude. Eric Ries calls this the ‘sunk code fallacy’.
In my opinion, two symptoms predating such a condition can be seen when the features are not
In contrast, iterative (i.e., repeated) analysis of the performance of each feature is the modern way to design mobile apps and websites. Avoiding the two symptoms is required for systematic optimization. Moreover, feature testing does not need to take place in parallel; it can be done one feature at a time, as sequential testing. This can in fact be preferable, to avoid the ‘feature creep’ (clutter) that hinders the user experience. For sequential testing, however, it is advisable to create a testing roadmap with a clear schedule – otherwise, it is too easy to forget about testing.
Strategic use cases show promise. So, what is left for personas? In the end, I would say strategic decision making is very promising. Tactical and operational tasks are often better served by either completely individual or completely aggregated data. But individual data is practically useless for strategic decision making. Aggregated data is useful (e.g., sales by region or customer segment), and it is hard to see anything replace that. Personas, however, sit between the two: they can provide more understanding of the needs and wants of the market, and act as anchor points for decision making.
Strategic decision aid is also a lucrative space; companies care less about the cost because the decisions they make are of great importance. To correctly steer the ship, executives need accurate information about customer preferences and clear anchor points to align their strategic decisions with (see the HubSpot case study).
In addition, aggregated analytics systems have one key weakness: they cannot describe the users very well. Numbers do not include information such as psychographics or needs, because those have to be interpreted from the data. Customer profiles are a different thing – in CRM systems, enrichment might be available, but again the number of individual profiles is prohibitive for efficient decision making.
Conclusion. The more we move towards real-time optimization, the less useful a priori conceptualizations like target groups and personas become for marketing and design. However, they are likely to remain useful for strategic decision making and as “aggregated people analytics” that combine the coverage of numbers with the detail of customer profiles. The question is: can we build personas that include the information of customer profiles while retaining the efficiency of working with large numbers? At QCRI, we’re working every day toward that goal.
March 30, 2017
Attribution modelling is like digital magic.
Wow, so I’m reading a great piece by Funk and Nabout (2015). They outline the main problems of standard attribution modelling. By “standard”, I refer to the commonly used method of attribution modelling, best known from Google Analytics.
Previously, I’ve addressed this issue in my digital marketing class by saying that the choice of an attribution model is arbitrary, i.e. marketers can freely decide whether to use, say, the last-click or the first-click model. Now I realize this is the wrong approach, given that the impact of each touch-point can be estimated. There is much more depth to attribution modelling than the standard model leads you to believe.
So, here are the five problems by Funk and Nabout (2015).
This is the main problem to me. The impact of touch-points on conversion value needs to be weighted, but the weighting is an arbitrary rather than a statistically valid choice (that is, until we consider advanced methods!). Therefore, there is no objective ranking or “betterness” among different attribution models.
The standard attribution model does not consider the time interval between touch-points, which can range anywhere from 30 minutes to 90 days, restricted only by cookie duration. Why does this matter? Because time generally matters in consumer behavior. For example, if there is a long interval between contacts A_t and A_t+1, it may be that the effect of the first contact was not powerful enough to incite a return visit. Of course, one could also argue there is a reason not to consider time: any differences arise from discrepancies in consumers’ natural decision-making processes, which result in unknown intervals, and ignoring time would then standardize those intervals. However, if we assume patterns in consumers’ decision making, as is usually done by stating that “in our product category, the purchase process is short, usually under 30 days”, then addressing time differences could yield a better forecast; say, we should expect a second contact to take place at a certain point in time given our model of consumer behavior.
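One way to incorporate time is exponential time-decay weighting, where touch-points closer to the conversion receive more credit. The sketch below uses an assumed half-life and invented touch dates, not values from the paper:

```python
def time_decay_weights(touch_days, conversion_day, half_life_days=7.0):
    """Time-decay attribution: a touch-point half_life_days before the
    conversion gets half the credit of one at the moment of conversion."""
    raw = [2 ** (-(conversion_day - t) / half_life_days) for t in touch_days]
    total = sum(raw)
    return [w / total for w in raw]  # normalize so credit sums to 1

# Hypothetical journey: touches on days 0, 10 and 13, conversion on day 14.
weights = time_decay_weights([0, 10, 13], 14)
# the day-13 touch gets the most credit, the day-0 touch the least
```

In practice, the half-life itself would be estimated from data, for example from the assumed length of the purchase process.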
The nature of the touch or interaction should be considered when modeling the customer journey. The standard attribution model assigns conversion value to different channels based on clicks, but the type of interaction may vary across channels. For example, one conversion might involve a view on Facebook and a click in AdWords, whereas another might have the reverse. But are views and clicks equally valuable? Most marketers would say no. However, they would still assign some credit to views; at least according to classic advertising theory, visibility has an impact on advertising performance. Therefore, the attribution model should also consider several interaction types and the impact each type has on conversion propensity.
As Funk and Nabout (2015) note, “the analysis does not compare successful and unsuccessful customer journeys, [but] only looks at the former.” This is essentially a case of survivorship bias: we are unable to compare the touch-points that led to a conversion with those that did not. If we could, we might observe that a certain channel has a higher likelihood of being included in a conversion path than another channel, i.e. its weight should be higher, proportional to its ability to produce lift in the conversion rate. By excluding information on unsuccessful interactions, we risk both Type I and Type II errors – that is, false positives and false negatives.
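As a sketch of what becomes possible once unsuccessful journeys are recorded too, the following compares the conversion rate of paths containing a channel to the overall conversion rate. The journey data and channel names are invented for illustration:

```python
def channel_lift(journeys):
    """journeys: list of (channels_in_path, converted) pairs.
    A channel's lift is the conversion rate of paths that include it,
    divided by the overall conversion rate. Computing this requires
    that unsuccessful paths are logged as well."""
    overall = sum(converted for _, converted in journeys) / len(journeys)
    channels = {ch for path, _ in journeys for ch in path}
    lifts = {}
    for ch in channels:
        with_ch = [converted for path, converted in journeys if ch in path]
        lifts[ch] = (sum(with_ch) / len(with_ch)) / overall
    return lifts

# Hypothetical data: True = converted, False = left without converting.
journeys = [
    ({"search", "display"}, True),
    ({"search"}, True),
    ({"display"}, False),
    ({"search", "social"}, False),
]
lifts = channel_lift(journeys)
# lift > 1 means the channel appears disproportionately in converting paths
```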
The standard attribution model does not consider offline interactions, even though research shows multi-channel consumer behavior is highly prevalent. The lack of data on these interactions is the major reason for their exclusion, but at the same time it restricts the usefulness of attribution modelling to the e-commerce context. Most companies, therefore, are not getting accurate information from attribution modelling beyond the online environment. And, as I’ve argued in my class, word-of-mouth is not included in the standard model either, which is a major issue for accuracy, especially considering social media. Even if we only want to measure the performance of an advertising channel, social media ads have a distinct social component: they are shared and commented on, which results in additional interactions that should be considered when modeling the customer journey.
I’m still finishing the original article, but had to write these few lines because the points I encountered were poignant. I’m sure they will propose solutions next, and I may update this article afterwards. At this point, I can only state two solutions that readily come to mind: 1) the use of conversion rate (CVR) as an attribution parameter — it’s a global metric and thus escapes survivorship bias; and 2) Universal Analytics, i.e. using methods such as Google’s Measurement Protocol to capture offline interactions. As someone smart said, the solution to a problem leads to a new problem, and that’s the case here as well: there needs to be a universal identifier (“User ID” in Google’s terms) to associate online and offline interactions. In practice, this requires registration.
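To illustrate the second solution, here is a minimal sketch of a Universal Analytics Measurement Protocol hit for an offline interaction. The tracking ID, client ID, and event names are placeholders, and the snippet only builds the request without sending it:

```python
from urllib.parse import urlencode
from urllib.request import Request

def offline_event_payload(tracking_id, client_id, category, action):
    """Build a Measurement Protocol event hit for an offline interaction.
    client_id is the universal identifier that ties offline hits to the
    same user's online sessions."""
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # UA property ID, e.g. "UA-XXXXX-Y" (placeholder)
        "cid": client_id,    # the shared Client/User ID
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    })

payload = offline_event_payload("UA-XXXXX-Y", "555", "store", "purchase")
# To actually send: urllib.request.urlopen(req)
req = Request("https://www.google-analytics.com/collect",
              data=payload.encode("utf-8"))
```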
The criticism applies to standard attribution modelling, e.g. how it is done in Google Analytics. There might be additional issues not covered in the paper, such as aggregate data: to perform any type of statistical analysis, click-stream data is a must-have. Other relevant questions are: How do touch-points influence one another? And how should that influence be modeled? Beyond the technicalities, it is important for managers to understand the general limitations of current attribution modelling methods and seek solutions in their own organizations to overcome them.
Funk, B., & Abou Nabout, N. (2016). Cross-Channel Real-Time Response Analysis. In O. Busch (Ed.), Programmatic Advertising: The Successful Transformation to Automated, Data-Driven Marketing in Real-Time (pp. 141–151). Springer-Verlag.
 Conversion path and customer journey are essentially referring to the same thing; perhaps with the distinction that conversion path is typically considered to be digital while customer journey has a multichannel meaning.
March 30, 2017
This post is based on Dr. Elina Jaakkola’s presentation “What is co-creation?” on 19th August, 2014. I will elaborate on some of the points she made in that presentation.
Customer research, a sub-form of market research, serves the purpose of acquiring customer insight. Often, when pursuing information from consumers, companies use surveys. Surveys, and the use of customers as a source of information in general, have some particular problems, discussed in the following.
1. Hidden needs
Customers have latent or hidden needs that they do not express, perhaps for social reasons (awkwardness) or because they do not know what is technically possible (unawareness). If one does not specifically ask about a need, it is easily left unmentioned, even if it is of great importance to the customer. This problem is not easily solved, since even the company may not be aware of all the possibilities in the technological space. However, if the purpose is to learn about the customers, a capability for immersion and empathy is needed.
2. Reporting bias
What customers report they would do is not equivalent to their true behavior: they might say one thing and do something entirely different. In research, this is commonly known as reporting bias, and it is a major problem when carrying out surveys. The general solution is to ask about past, not future, behavior, although even this approach is subject to recall bias.
3. Interpretation problem
Consumers can misinterpret survey questions, and analysts can in turn misinterpret the answers. It is difficult to vividly present choices of hypothetical products and scenarios to consumers, and therefore the answers one receives may not be accurate. A general solution is to avoid ambiguity in the framing of questions, so that everything is commonly known and clear to both the respondent and the analyst (shared meanings).
4. Loud minority
This is a case where a minority, by being more vocal, creates a false impression of the needs of the whole population. In social media, for example, this effect easily takes place. A general rule of thumb is that only 1% of the members of a community actively participate in a given discussion, while the other 99% merely observe. The loudest consumers get their opinions out, but these may not represent the needs of the silent majority. One solution is stratification, where one distinguishes different groups from one another so as to form a more balanced view of the population; this works when there is adequate participation in each stratum. Another alternative is to actively seek out non-vocal customers.
Generally, the problems mentioned relate to stated preferences. When we use customers as a source of information, all kinds of biases emerge. That is why behavioral data, which does not depend on what customers say, is a more reliable source of information. Thankfully, in digital environments it is possible to obtain behavioral data with much more ease than in analogue environments. Its problems concern representativeness and, on the other hand, fitting it to other forms of data so as to gain a more granular understanding of the customer base.
March 30, 2017
Planning makes people happy.
Media planning, or campaign planning in general, requires you to set goal metrics, so that you are able to communicate the expected results to a client. In digital marketing, these are metrics like clicks, impressions, and costs. The actual planning process usually involves estimates, that is, sophisticated guesses of a sort. These estimates may be based on your previous experience, planned goal targets (when, for example, given a specific business goal such as a sales increase), or industry averages (if those are known).
By knowing or estimating some goal metrics, you are able to calculate others. But sometimes it’s hard to remember the formulas. This is a handy list to remind you of the key formulas.
In general, metrics relating to impressions are used as proxies for awareness and brand-related goals. Metrics relating to clicks reflect engagement, while conversions indicate behavior. Oftentimes, I estimate CTR, CVR, and CPC because 1) it’s good to set a starting goal for these metrics, and 2) they exhibit some regularity (e.g., e-commerce conversion rate tends to fall between 1% and 2%).
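As a sketch of how a few estimates propagate into the rest of the plan (the numbers below are invented for illustration, not benchmarks):

```python
def media_plan(impressions, ctr, cvr, cpc):
    """Derive the remaining campaign metrics from a few estimates:
    clicks = impressions * CTR, conversions = clicks * CVR,
    cost = clicks * CPC, CPA = cost / conversions."""
    clicks = impressions * ctr
    conversions = clicks * cvr
    cost = clicks * cpc
    cpa = cost / conversions
    return {"clicks": clicks, "conversions": conversions,
            "cost": cost, "cpa": cpa}

# Hypothetical estimates: 1M impressions, 1% CTR, 1.5% CVR, 0.50 EUR CPC.
plan = media_plan(1_000_000, 0.01, 0.015, 0.50)
# roughly: 10,000 clicks, 150 conversions, 5,000 EUR cost, CPA ~ 33.33 EUR
```

The same identities can be rearranged to start from whichever metrics you happen to know, e.g. backing out the required impressions from a conversion target.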
You don’t have to know everything to devise a sound digital media plan. A few goal metrics are enough to calculate all the necessary ones. The more realistic your estimates, the better. Worry not, accuracy will improve with time. In the beginning, it is best to start with moderate estimates you feel comfortable achieving, or even outperforming. It’s always better to under-promise than to under-perform. Finally, the achieved metric values differ by channel, sometimes a lot, so take that into consideration when crafting your media plan.
March 29, 2017
Carryover effects in marketing are a tricky beast. On one hand, you don’t want to prematurely judge a campaign because the effect of advertising may be delayed. On the other hand, you don’t want bad campaigns to be defended with this same argument.
What’s the solution then? Carryover effects need to be quantified; otherwise they might as well not exist. Some ways to quantify them are available in Google Analytics:
All these methods, however, are retrospective in nature. Predicting carryover effects is notoriously hard, and I’m not sure it would even be possible with such accuracy that it should be pursued.
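For retrospective quantification, one classic approach from marketing research (not a Google Analytics feature) is the adstock model, in which a share of each period's advertising effect carries over to the next. The decay parameter and spend figures below are invented:

```python
def adstock(spend, decay=0.5):
    """Classic adstock: effect_t = spend_t + decay * effect_{t-1}.
    The decay parameter (between 0 and 1) encodes how much of last
    period's advertising effect carries over into this period."""
    effect = []
    carry = 0.0
    for x in spend:
        carry = x + decay * carry
        effect.append(carry)
    return effect

# Hypothetical weekly spend: a one-week burst followed by silence.
effect = adstock([100, 0, 0, 0], decay=0.5)
# the effect decays geometrically after the burst: 100, 50, 25, 12.5
```

In practice, the decay parameter would be estimated by regressing sales or conversions on the adstocked spend series.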
In conclusion, I’d advise against being too hasty in drawing conclusions about campaign performance; this way you avoid the problem of premature judgment. The problem of shielding inferior campaigns can be tackled by using other proxy metrics of performance, such as bounce rate, which effectively tells you whether a campaign has even a theoretical chance of producing positive carryover effects. Indeed, regarding the prediction problem, proving an association between high bounce rates and low carryover effects would reinforce this rule of thumb even further.
Dr. Joni Salminen holds a PhD in marketing from the Turku School of Economics. His research interests relate to startups, platforms, and digital marketing.
Contact email: [email protected]
March 29, 2017
Here’s a list of analytics problems I devised for a digital analytics course I was teaching (Web & Mobile Analytics, Information Technology Program) at Aalto University in Helsinki. Some solutions to them are also considered.
Want to add something to this list? Please write in the comments!
[edit: I’m compiling a larger list of analytics problems. Will update this post once it’s ready.]
I’m into digital marketing, startups, platforms. Download my dissertation on startup dilemmas: http://goo.gl/QRc11f
March 29, 2017
This post applies to cases satisfying two conditions.
First, you have a simple landing page designed for immediate action (i.e., no further clicks). This is the case for many marketing campaigns for which we design a landing page without navigation and with a very simple goal, such as learning about a product or watching a video.
Second, you have a high bounce rate, indicating a bad user experience. Bounce rate is calculated as follows:
bounce rate = visitors who leave without clicking further / all visitors
In truth, bounce rate is only a proxy for user experience. A high bounce rate simply means a lot of people leave the website without clicking further. This usually indicates poor relevance: the user was expecting something else, didn’t find it, and so left the site immediately.
For search engines a high bounce rate indicates bad landing page relevance vis-à-vis a given search query (keyword), as the user immediately returns to the SERP (search-engine result page). Search engines, such as Google, would like to offer the right solution for a given search query as fast as possible to please their users, and therefore a poor landing page experience may lead to lower ranking for a given website in Google.
I’ll give a simple example. Say you have a landing page with only one call-to-action, such as viewing a video. You then run a marketing campaign resulting in ten visitors. After viewing the video, all ten users leave the site.
Now, Google Analytics would record this as a 100% bounce rate: everyone left without clicking further. Moreover, the duration of the visits would be recorded as 0:00, since duration is only captured after a user clicks further (which didn’t happen in this case).
So, what should we conclude as site owners when looking at our statistics? A 100% bounce rate means either that a) our site sucks or b) the channel we acquired the visitors from sucks. But in this case that conclusion is incorrect: all of the users watched the video, and so the landing page (and the marketing campaign associated with it) was in fact a great success!
I will show four solutions to improve your measurement of user experience through bounce rate.
First, simply create an event that pings your analytics software (most typically Google Analytics) when a user performs a desired on-page action (e.g., viewing a video). This removes users who completed a desired action but still left without clicking further from the bounce rate calculation.
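The arithmetic behind this adjustment can be sketched as follows; the session-log format here is invented for illustration:

```python
def bounce_rates(sessions):
    """sessions: list of dicts with 'clicked_further' and
    'completed_event' booleans (hypothetical log format).
    Raw bounce: the visitor left without a further pageview.
    Adjusted bounce: an on-page event (e.g. a video view) also
    counts as an interaction, as with event tracking in GA."""
    n = len(sessions)
    raw = sum(not s["clicked_further"] for s in sessions) / n
    adjusted = sum(not (s["clicked_further"] or s["completed_event"])
                   for s in sessions) / n
    return raw, adjusted

# The post's example: ten visitors, all watch the video, none click further.
raw, adjusted = bounce_rates(
    [{"clicked_further": False, "completed_event": True}] * 10)
# raw = 1.0 (100% bounce), adjusted = 0.0
```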
Here are Google’s instructions for event tracking.
Second, ping GA based on visit duration, e.g., create an event for spending one minute on the page. This will in effect lower your reported bounce rate by the share of users who stay at least a minute on the landing page.
Third, create a form. Submitting the form directs the user to another page, which then triggers an event for analytics. In most cases, this is also compatible with our condition of a simple landing page with one CTA (well, if you have a video and a form, that’s two actions for the user, but in most cases I’d say it’s not too much).
Finally, there is a really cool analytics plugin by Rob Flaherty called Scrolldepth (thanks Tatu Patronen for the tip!). It pings Google Analytics as users scroll down the page, e.g. at 25%, 75% and 100% of its length. In addition to solving the bounce problem, it also gives you more data on user behavior.
Note that adding event tracking to reduce the bounce rate only reduces it in your analytics. Search engines still see bounces as direct exits, and may include that in their evaluation of the landing page experience. Moreover, the individual solutions have limitations: creating a form is not always natural given the business, or it may create additional friction for the user; and Scrolldepth is most useful on lengthy landing pages, which is not always what you have.