
Tag: analytics

Argument: Personas lose to ‘audience of one’

Introduction. In this post, I explore the usefulness of personas in digital analytics. At Qatar Computing Research Institute (QCRI), we have developed a system for automatic persona generation (APG) – see the demo. Under the leadership of Professor Jim Jansen, we are constantly working to position this system at the intersection of customer profiles, personas, and analytics.

Three levels of data. Imagine three levels of data:

  •  customer profiles (individual)
  • personas (aggregated individual)
  • statistics (aggregated numbers: tables and charts)

Which one is the best? The answer: it depends.

Case of advertising. For advertising, usually the more individual the data, the better. The worst case is mass advertising, where there is one message for everyone: it fails to capture the variation in the preferences and tastes of the underlying audience, and is therefore inefficient and expensive. Group-based targeting, i.e. market segmentation (“women 25-34”), performs better because it aligns product features with audience features. Here, the commonalities of the target group allow marketers to create more tailored and effective messages, which results in fewer wasted ad impressions.

Case of design and development. In a similar vein, design is moving towards experimentation. First, there are conventions that become adopted industry-wide in the long run (e.g., Amazon adopts a practice and small e-commerce sites follow suit). Many prefer being followers, and it works for the most part. But multivariate testing and similar methods can reveal optimal designs better than “imagining a user” or simply following conventions. Of course, personas, just like any other immersive technique, can be used as a source of inspiration and ideas. But they are just one technique, not the technique.

For example, in the case of mobile startups I would recommend experimentation over personas. A classic example is Instagram, which found from data that filters were a killer feature. For such applications, it makes sense to define an experimental feature set and adjust it based on behavioral feedback from the users.

Unfortunately, startup founders often ignore systematic testing because they have a pre-defined idea of the user (à la persona) and are not ready to get their ideas challenged. The more work is done to satisfy the imaginary user, the harder it becomes to make out-of-the-box design choices. Yet those kinds of changes are required to improve not by small margins but by orders of magnitude. Eric Ries calls this the ‘sunk code fallacy’.

In my opinion, two symptoms preceding such a condition can be seen when the features are not

  1. connected to analytics, so that tracking of contribution of each is possible (in isolation & to the whole)
  2. iteratively analyzed with goal metrics, so that there is an ‘action->response->action’ feedback loop.

In contrast, iterative (=repetitive) analysis of performance of each feature is a modern way to design mobile apps and websites. Avoiding the two symptoms is required for systematic optimization. Moreover, testing the features does not need to take place in parallel, but it can be one by one as sequential testing. This can in fact be preferable to avoid ‘feature creep’ (clutter) that hinders the user experience. However, for sequential testing it is preferable to create a testing roadmap with a clear schedule – otherwise, it is too easy to forget about testing.
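To make that feedback loop concrete, here is a minimal sketch of a sequential feature test in Python, assuming you log, per variant, the number of exposed users and the number who reached the goal metric; a two-proportion z-test then compares a new feature variant against the control. All counts and names are hypothetical.

```python
# A minimal sketch of a feature-level feedback loop: compare a variant's
# conversion rate against the control with a two-proportion z-test.
# The event counts below are hypothetical.
from statistics import NormalDist

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Control vs. the new feature variant (hypothetical numbers)
z, p = two_proportion_z(conv_a=120, n_a=4000, conv_b=155, n_b=4100)
print(f"z = {z:.2f}, p = {p:.3f}")  # act on the result, then test the next feature
```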

Strategic use cases show promise. So, what is left for personas? In the end, I would say strategic decision making is very promising. Tactical and operational tasks are often better achieved by using either completely individual or completely aggregated data. But individual data is practically useless for strategic decision making. Aggregated data is useful, e.g. sales by region or customer segment, and it is hard to see anything replace that. However, personas sit between the two: they can provide more understanding of the needs and wants of the market, and act as anchor points for decision making.

Strategic decision aid is also a lucrative space; companies care less about the cost because the decisions they make are of great importance. To correctly steer the ship, executives need accurate information about customer preferences and clear anchor points to align their strategic decisions with (see the HubSpot case study).

In addition, aggregated analytics systems have one key weakness: they cannot describe the users very well. Numbers do not include information such as psychographics or needs, because these have to be interpreted from the data. Customer profiles are a different thing: in CRM systems, enrichment might be available, but again the number of individual profiles is prohibitive for efficient decision making.

Conclusion. The more we move towards real-time optimization, the less useful a priori conceptualizations like target groups and personas become for marketing and design. However, they are likely to remain useful for strategic decision making and as “aggregated people analytics” that combine the coverage of numbers and the details of customer profiles. The question is: can we build personas that include the information of customer profiles while retaining the efficiency of using large numbers? At QCRI, we are working every day toward that goal.

Agile methods for predicting contest outcomes by social media analysis

People think, or seem to assume, that there is some magical machine that spits out accurate predictions of future events from social media data. There is not, and that is why each credible analysis takes human time and effort. But therein also lies the challenge: when fast decisions are needed, time-consuming analyses reduce agility. Real-time events would require real-time analysis, whereas data analysis is often a cumbersome and time-consuming effort, including data collection, cleaning, machine training, etc.

It’s a project of days or weeks, not hours. All the practical issues of the analysis workflow make it difficult to provide accurate predictions at a fast pace (although there are other challenges as well).

An example is Underhood.co – they predicted Saara Aalto to win X-Factor UK based on social media sentiment, but ended up being wrong. While there are many potential reasons for this, my conclusion is that their indicators lacked sufficient predictive power. They were too reliant on aggregates (in this case country-level data), and the approach was problematic to begin with: just like with any prediction, the odds change on the go as new information becomes available, so you should never call the winner weeks ahead. Of course, theirs was just a publicity stunt where they hoped being right would prove the value of their service. Another example is the US election, where prediction markets were completely wrong about the outcome. That was, according to my theory, because of wrong predictors: polls ask what your preference is or what you would do, whereas social media engagement shows what people actually do (in social media), and as such is closer to real behavior, hence a better predictor.

Even though I think human analysts will still be needed in the near future, more solutions for quick collection and analysis of social media data are needed, especially to combine human and machine work in the best possible way. Some of these approaches can be based on automation, but others can be methodological, such as quick definition of relevant social media outlets for sampling.

Here are some ideas I have been thinking of:

I. Data collection

  1. Quick definition of choice space (e.g., candidates in a political election, X-Factor contestants)
  2. Identification of relevant social media outlets (i.e., communities, topic hashtags)
  3. Collecting sample (API, scraping, or copy-paste (crowdsourcing))

Each part is case-dependent and idiosyncratic – for whatever event (I’m thinking of competitions here), you have to do this work from scratch. Ultimately, you cannot get the whole Internet as your data, but you want the sample to be as representative as possible. For example, it was obvious that Twitter users showed much more negative sentiment towards Trump than Facebook users, and on both platforms there were supporter groups and topic concentrations that should first be identified before any data collection. Then, the actual data collection is tricky. People again seem to assume all data is easily accessible. It is not – API access varies by platform, and comment data is not always easy to retrieve programmatically. This means the comments that you use for predicting the outcome (by analyzing their relative share of the total as well as the strength of the sentiment beyond pos/neg) may need to be fetched either by web scraping or by manually copying them to a spreadsheet. Due to the large volumes of data, crowdsourcing could be useful – e.g., setting up a Google Sheet where crowdworkers each paste the text material in a clean format. The raw text content, e.g. tweets, Facebook comments, Reddit comments, is put in separate sheets for each candidate.
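As an illustration of that crowdsourced collection step, here is a small sketch that pools the per-candidate sheets into one dataset, assuming each sheet has been exported as a CSV with a single text column; the file names and candidate labels are placeholders, not part of any real pipeline.

```python
# A sketch of pooling crowd-collected comments into one dataset, assuming each
# candidate's sheet was exported as a CSV with a single "text" column.
# File names and candidate labels are hypothetical.
import pandas as pd

files = {"candidate_a": "candidate_a_comments.csv",
         "candidate_b": "candidate_b_comments.csv"}

frames = []
for candidate, path in files.items():
    df = pd.read_csv(path)          # expects a "text" column
    df["candidate"] = candidate     # keep track of whom the comment refers to
    frames.append(df)

comments = pd.concat(frames, ignore_index=True)
print(comments.groupby("candidate").size())  # raw comment volume per candidate
```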

II. Data analysis

  1. Cluster visualization (defining clusters, visualizing their respective sizes (plot # of voters), breakdown by source platform and potential other factors)
  2. Manual training (classifying the sentiment, or “likelihood to vote”)
  3. Machine classification (calculating the number of likely voters)

In every statistical analysis, the starting point should be visualizing the data. This shows an aggregate “helicopter view” of the situation. Such a snapshot is also useful for demonstrating the results to the end user, letting the data speak for itself. Candidates are bubbles in the chart, their sizes proportional to the number of calculated likely voters. The data could be broken down according to source platforms, or other factors, by using the candidate as a point of gravity for the cluster.

Likelihood to vote could be classified on a scale, not as a binary. That is, instead of saying “sentiment is positive: YES/NO”, we could ask “How likely is the person to vote?”, which is the same as asking how enthusiastic or engaged he or she is. Therefore, a scale is better, e.g. ranging from -5 (definitely not voting for this candidate) to +5 (definitely voting for this candidate). The manual training, which could also be done with the help of the crowd, helps the machine classifier improve its accuracy on the go. Based on the training data, it would generalize the classification to all material. The material is bucketed so that each candidate is evaluated separately and the number of likely voters can be calculated. It is possible that the machine classifier could benefit from training input from both candidates, inasmuch as the language showing positive and negative engagement is not significantly different.
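Here is a minimal sketch of the manual-training and machine-classification steps, using scikit-learn as one possible toolkit (not something prescribed above); it assumes a crowd-labeled sample with a -5 to +5 likelihood-to-vote score per comment, and the file names, column names, and the +3 threshold are illustrative.

```python
# A sketch of the manual-training -> machine-classification step, assuming a
# crowd-labeled subset with a -5..+5 "likelihood to vote" score per comment.
# File names, column names, and the decision threshold are illustrative.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

labeled = pd.read_csv("labeled_sample.csv")        # columns: text, candidate, score
unlabeled = pd.read_csv("unlabeled_comments.csv")  # columns: text, candidate

model = make_pipeline(TfidfVectorizer(min_df=2), Ridge())
model.fit(labeled["text"], labeled["score"])

unlabeled["score"] = model.predict(unlabeled["text"])
likely = unlabeled[unlabeled["score"] >= 3]        # treat >= +3 as a likely voter
print(likely.groupby("candidate").size())          # likely-voter count per candidate
```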

It is important to note that negative sentiment does not really matter. What we are interested in is the number of likely voters. This is because of the election dynamics – it does not matter how poor a candidate’s aggregate sentiment is, i.e. the ratio between haters and sympathizers, as long as his or her number of likely voters is higher than that of the competition. This effect was evident in the recent US presidential election.

The crucial thing is to keep the process alive during the whole election/competition period. There is no point at which it becomes certain that one candidate loses and the other remains, although the divide can become substantial and therefore increase the accuracy of the prediction.

III. Presentation of results

  • constantly updating feed (à la Facebook video stream)
  • cluster visualization
  • search trend widget (source: Google Trends)
  • live updating predictions (manual training -> machine model)

The results could be shown as a dashboard to the end user. A search trend graph and the above-mentioned cluster visualization could be viable components. In addition, it would be interesting to see the count of likely voters evolving over time – in such a way that it, along with the visualization, could be “played back” to examine the development over time. In other words, an interactive visualization. As noted, the prediction, or the count of likely votes, should update in real time as a result of combined human-machine work.

Conclusion and discussion

The idea behind developing more agile methods to use social media data to predict contest outcomes is that the accuracy of the prediction is based on the choice of indicators rather than the finesse of the method. For example, complex Bayesian models falsely predicted Hillary Clinton would win the election. It’s not that the models were poorly built; they just used the wrong indicators, namely polling data. This is the usual case of ‘garbage in, garbage out’, and it shows that the choice of indicators is more important than the technical features of the predictive model.

The choice of indicators should be based on their predictive power, and although I don’t have strict evidence for it, it intuitively makes sense that social media engagement is in many instances a stronger indicator than survey data, because it’s based on actual preferences instead of stated preferences. Social scientists know from the long tradition of survey research that there is a myriad of social effects reducing the reliability of the data (e.g., social desirability bias). Those, I would argue, are a much smaller issue in social media engagement data.

However, to be fair, there can be issues of bias in social media engagement data. The major concern is the low participation rate: a common heuristic is that 1/10 of participants actually contribute by writing, while the other 9/10 are readers whose real thoughts remain unknown. It is then a question of how well the vocal minority reflects the opinion of the silent majority. In some cases this is less relevant for competitions, namely when the overall voting share remains low. For example, if turnout is 60%, it is relatively much more important to mobilize the active base than if voting were close to 100%, where one would need near-universal acceptance.

Another issue is non-representative sampling. This is a concern when the voting takes place offline and online data does not accurately reflect the voting of those who do not express themselves online. However, as social media participation is constantly increasing, this becomes less of a problem. In addition, compared to other methods of data collection – apart from stratified polling, perhaps – social media is likely to give a good result on competitive predictions because of their political nature. People who strongly support a candidate are more likely to be vocal about it, and the channel for voicing their opinion is social media.

It is evident that the value of social media engagement as a predictor is currently underestimated, as shown by the large emphasis put on political polls and the virtually non-existent discussion of social media data. As a direct consequence, those who are able to leverage social media data in the proper way will gain a competitive advantage, be it in betting markets or any other setting where prediction accuracy plays a key role. The prediction work will remain a hybrid effort by man and machine.

Problems of standard attribution modelling

Attribution modelling is like digital magic.

Introduction

Wow, so I’m reading a great piece by Funk and Nabout (2016) [1]. They outline the main problems of attribution modelling. By “standard”, I refer to the method of attribution modelling most commonly known from Google Analytics.

Previously, I’ve addressed this issue in my digital marketing class by saying that the choice of an attribution model is arbitrary, i.e. marketers can freely decide whether it’s better to use e.g. a last-click model or a first-click model. But now I realize this is the wrong approach, given that the impact of each touch-point can be estimated. There is much more depth to attribution modelling than the standard model leads you to believe.

Five problems of standard attribution modelling

So, here are the five problems identified by Funk and Nabout (2016).

1. Giving touch-points accurate credit

This is the main problem to me. The impact of touch-points on conversion value needs to be weighted, but in the standard model this weighting is an arbitrary rather than a statistically valid choice (that is, until we consider advanced methods!). Therefore, there is no objective rank or “betterness” between different attribution models.
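To illustrate how arbitrary the choice is, here is a small sketch that applies last-click, first-click, and linear credit to the same hypothetical conversion path; each model distributes the same conversion value quite differently, with nothing in the data itself telling you which split is right.

```python
# A sketch contrasting heuristic attribution models on one conversion path.
# The path and the conversion value are hypothetical.
def attribute(path, value, model="linear"):
    """Split a conversion value over a list of touch-point channels."""
    credit = {ch: 0.0 for ch in path}
    if model == "last":
        credit[path[-1]] += value
    elif model == "first":
        credit[path[0]] += value
    else:  # linear: equal credit to every touch
        for ch in path:
            credit[ch] += value / len(path)
    return credit

path = ["Display", "Organic", "Email", "Paid Search"]
for model in ("last", "first", "linear"):
    print(model, attribute(path, 100.0, model))
```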

2. Disregard for time

The standard attribution model does not consider the time interval between touch-points – it can range anywhere from 30 minutes to 90 days, restricted only by cookie duration. Why does this matter? Because time generally matters in consumer behavior. For example, if there is a long interval between contacts A_t and A_t+1, it may be that the effect of the first contact was not powerful enough to incite a return visit. Of course, one could also argue there is a reason not to consider time: any differences arise from discrepancies in the natural decision-making process of consumers, which results in unknown intervals, and ignoring time then standardizes those intervals. However, if we assume patterns in consumers’ decision-making process, as is usually done by stating that “in our product category, the purchase process is short, usually under 30 days”, then addressing time differences could yield a better forecast; say, we should expect a second contact to take place at a certain point in time given our model of consumer behavior.
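As one possible way to bring in the time dimension, here is a sketch of a time-decay weighting in which touch-points closer to the conversion receive more credit; the 7-day half-life and the example touch-points are assumptions for illustration, not something taken from the paper.

```python
# A sketch of time-decay attribution weights, assuming each touch-point carries
# the number of days before conversion and a 7-day half-life. Values are hypothetical.
def time_decay_weights(days_before_conversion, half_life=7.0):
    """Weight each touch by 0.5 ** (days / half_life), normalized to sum to 1."""
    raw = [0.5 ** (d / half_life) for d in days_before_conversion]
    total = sum(raw)
    return [w / total for w in raw]

touches = [("Display", 30), ("Email", 9), ("Paid Search", 1)]
weights = time_decay_weights([d for _, d in touches])
for (channel, days), w in zip(touches, weights):
    print(f"{channel}: {w:.0%} of conversion credit ({days} days before conversion)")
```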

3. Ignoring interaction types

The nature of the touch or interaction should be considered when modeling the customer journey. The standard attribution model assigns conversion value to different channels based on clicks, but the types of interaction across channels might be mixed. For example, for one conversion you might get a view in Facebook and a click in AdWords, whereas another conversion might have the reverse. But are views and clicks equally valuable? Most marketers would not say so. However, they would also assign some credit to views – at least according to classic advertising theory, visibility has an impact on advertising performance. Therefore, the attribution model should also consider several interaction types and the impact each type has on conversion propensity.

4. Survivorship bias

As Funk and Nabout (2016) note, “the analysis does not compare successful and unsuccessful customer journeys, [but] only looks at the former.” This is essentially a case of survivorship bias – we are unable to compare the touch-points that lead to a conversion with those that do not. If we could, we might observe that a certain channel has a higher likelihood of being included in a conversion path [2] than another channel, i.e. its weight should be higher and proportional to its ability to produce a lift in the conversion rate. By excluding information on unsuccessful interactions, we risk both Type I and Type II errors – that is, false positives and false negatives.
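As a sketch of what comparing successful and unsuccessful journeys could look like, the snippet below fits a logistic regression on journey-level channel-exposure flags against a conversion flag; the data file and column names are hypothetical, and this is only one of many possible model choices rather than the method proposed in the paper.

```python
# A sketch of using both converting and non-converting journeys, assuming a
# journey-level table with binary channel-exposure columns and a converted flag.
# File and column names are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

journeys = pd.read_csv("journeys.csv")  # columns: display, email, paid_search, converted
channels = ["display", "email", "paid_search"]

model = LogisticRegression()
model.fit(journeys[channels], journeys["converted"])

# Positive coefficients suggest the channel raises conversion propensity.
for channel, coef in zip(channels, model.coef_[0]):
    print(f"{channel}: {coef:+.2f}")
```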

5. Exclusion of offline data

The standard attribution model does not consider offline interactions. But research shows multi-channel consumer behavior is highly prevalent. The lack of data on these interactions is the major reason for their exclusion, but at the same time it restricts the usefulness of attribution modelling to the e-commerce context. Most companies, therefore, are not getting accurate information from attribution modelling beyond the online environment. And, as I’ve argued in my class, word-of-mouth is not included in the standard model either, and that is a major issue for accuracy, especially considering social media. Even if we want to measure the performance of an advertising channel, social media ads have a distinct social component – they are shared and commented on, which results in additional interactions that should be considered when modeling the customer journey.

Solutions

I’m still finishing the original article, but had to write these few lines because the points I encountered were poignant. I’m sure they will propose solutions next, and I may update this article afterwards. At this point, I can only state two solutions that readily come to mind: 1) the use of conversion rate (CVR) as an attribution parameter, since it is computed over all visits rather than only converting ones and thus escapes survivorship bias; and 2) Universal Analytics, i.e. using methods such as Google’s Measurement Protocol to capture offline interactions. As someone smart said, the solution to a problem leads to a new problem, and that is the case here as well: there needs to be a universal identifier (“User ID” in Google’s terms) to associate online and offline interactions. In practice, this requires registration.
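For the second solution, here is a rough sketch of sending an offline interaction to a (legacy) Universal Analytics property via the Measurement Protocol; the property ID and User ID are placeholders, and it assumes a registration-based User ID is available to join online and offline data.

```python
# A sketch of sending an offline interaction to (legacy) Universal Analytics via
# the Measurement Protocol. The property ID and user ID are placeholders, and a
# registration-based User ID is assumed to tie offline and online interactions.
import requests

payload = {
    "v": "1",                 # protocol version
    "tid": "UA-XXXXXXX-1",    # placeholder property ID
    "uid": "user-123",        # known User ID from registration
    "t": "event",             # hit type
    "ec": "offline",          # event category
    "ea": "purchase",         # event action
}
requests.post("https://www.google-analytics.com/collect", data=payload, timeout=10)
```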

Conclusion

The criticism applies to standard attribution modelling, e.g. how it is done in Google Analytics. There might be additional issues not included in the paper, such as the reliance on aggregate data: to perform any type of statistical analysis, click-stream data is a must-have. Also, a relevant question is: how do touch-points influence one another, and how should that influence be modeled? Beyond technicalities, it is important for managers to understand the general limitations of current methods of attribution modelling and to seek solutions in their own organizations to overcome them.

References

[1] Funk, B., & Abou Nabout, N. (2016). Cross-Channel Real-Time Response Analysis. In O. Busch (Ed.), Programmatic Advertising: The Successful Transformation to Automated, Data-Driven Marketing in Real-Time (pp. 141-151). Springer.

[2] Conversion path and customer journey are essentially referring to the same thing; perhaps with the distinction that conversion path is typically considered to be digital while customer journey has a multichannel meaning.

Customers as a source of information: 4 risks

Introduction

This post is based on Dr. Elina Jaakkola’s presentation “What is co-creation?” on 19th August, 2014. I will elaborate on some of the points she made in that presentation.

Customer research, a sub-form of market research, serves the purpose of acquiring customer insight. Often, when pursuing information from consumers, companies use surveys. Surveys, and the use of customers as a source of information in general, have some particular problems, discussed in the following.

1. Hidden needs

Customers have latent or hidden needs that they do not express, perhaps for social reasons (awkwardness) or because they do not know what is technically possible (unawareness). If one does not specifically ask about a need, it is easily left unmentioned, even if it has great importance for the customer. This problem is not easily solved, since even the company may not be aware of all the possibilities in the technological space. However, if the purpose is to learn about the customers, a capability for immersion and sympathy is needed.

2. Reporting bias

What customers report they would do is not equivalent to their true behavior. They might say one thing, and do something entirely different. In research, this is commonly known as reporting bias. It is a major problem when carrying out surveys. The general solution is to ask about past, not future behavior, although even this approach is subject to recall bias.

3. Interpretation problem

Consumers answering surveys can misinterpret the questions, and analysts can also misinterpret the answers. It is difficult to vividly present choices about hypothetical products and scenarios to consumers, and therefore the answers one receives may not be accurate. A general solution is to avoid ambiguity in the framing of questions, so that everything is commonly understood and clear to both the respondent and the analyst (shared meanings).

4. Loud minority

This is a case where a minority, by being more vocal, creates a false impression of the needs of the whole population. In social media, for example, this effect can easily take place. A general rule of thumb is that only 1% of the members of a community actively participate in a given discussion while the other 99% merely observe. The consumers who are the loudest get their opinions out, but these may not represent the needs of the silent majority. One solution is stratification, where one distinguishes different groups from one another so as to form a more balanced view of the population. This works when there is adequate participation among the strata. Another alternative would be to actively seek out non-vocal customers.

Conclusion

Generally, the problems mentioned relate to stated preferences. When we use customers as a source of information, all kinds of biases emerge. That is why behavioral data, which does not depend on what customers say, is a more reliable source of information. Thankfully, in digital environments it is possible to obtain behavioral data with much more ease than in analogue environments. Its problems relate to representativeness and, on the other hand, to fitting it to other forms of data so as to gain a more granular understanding of the customer base.

Basic formulas for digital media planning

Planning makes people happy.

Introduction

Media planning, or campaign planning in general, requires you to set goal metrics so that you are able to communicate the expected results to a client. In digital marketing, these are metrics like clicks, impressions, costs, etc. The actual planning process usually involves using estimates, that is, sophisticated guesses of a sort. These estimates may be based on your previous experience, planned goal targets (when, for example, you are given a specific business goal, like a sales increase), or industry averages (if those are known).

Calculating online media plan metrics

By knowing or estimating some goal metrics, you are able to calculate others. But sometimes it’s hard to remember the formulas. This is a handy list to remind you of the key formulas.

  • ctr = clicks / imp
  • clicks = imp * ctr
  • imp = clicks / ctr
  • cpm = cost / (imp / 1000)
  • cost = cpm * (imp / 1000)
  • cpc = cost / clicks
  • cvr = conversions / clicks
  • cpa = cpc / cvr
  • cpa = cost / conversions
  • cost = cpa * conversions
  • conversions = cost / cpa

In general, metrics relating to impressions are used as proxies for awareness and brand-related goals. Metrics relating to clicks reflect engagement, while conversions indicate behavior. Oftentimes, I estimate CTR, CVR, and CPC because 1) it’s good to set a starting goal for these metrics, and 2) they exhibit some regularity (e.g., e-commerce conversion rate tends to fall between 1% and 2%).
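Putting the formulas to work, here is a small sketch that derives the remaining plan metrics from a handful of estimates; the budget, CPC, CTR, and CVR values are purely illustrative.

```python
# A sketch of deriving plan metrics from a few estimates, using the formulas above.
# The input estimates are illustrative.
budget = 10_000.0   # planned cost
cpc = 0.50          # estimated cost per click
ctr = 0.02          # estimated click-through rate
cvr = 0.015         # estimated conversion rate

clicks = budget / cpc
impressions = clicks / ctr
conversions = clicks * cvr
cpm = budget / (impressions / 1000)
cpa = cpc / cvr

print(f"clicks={clicks:,.0f}, impressions={impressions:,.0f}, "
      f"conversions={conversions:,.0f}, cpm={cpm:.2f}, cpa={cpa:.2f}")
```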

Conclusion

You don’t have to know everything to devise a sound digital media plan. A few goal metrics are enough to calculate all the necessary metrics. The more realistic your estimates are, the better; worry not, accuracy will improve with time. In the beginning, it is best to start with moderate estimates you feel comfortable achieving, or even outperforming. It’s always better to under-promise than to under-deliver. Finally, the achieved metric values differ by channel – sometimes a lot – so take that into consideration when crafting your media plan.

Carryover effects and their measurement in Google Analytics

Introduction

Carryover effects in marketing are a tricky beast. On one hand, you don’t want to prematurely judge a campaign because the effect of advertising may be delayed. On the other hand, you don’t want bad campaigns to be defended with this same argument.

Solutions

What’s the solution then? Carryover effects need to be quantified, or they might as well not exist. Some ways to quantify them are available in Google Analytics:

  • first, you have the time lag report of conversions – this shows how long it has taken for customers to convert
  • second, you have the possibility to increase the inspection window – by looking at a longer period, you can capture more carryover effects (e.g., you ran a major display campaign in July; looking back in December you might still see effects) [Notice that cookie duration limits the tracking, and also remember to use UTM parameters for tracking.]
  • third, you can look at assisted conversions to see the carryover effect in conversion paths – many campaigns may not directly convert, but are a part of the conversion path.

All these methods, however, are retrospective in nature. Predicting carryover effects is notoriously hard, and I’m not sure it would even be possible with such accuracy that it should be pursued.

Conclusion

In conclusion, I’d advise against being too hasty in drawing conclusions about campaign performance. This way you avoid the problem of premature judgment. The problem of shielding inferior campaigns can be tackled by using other proxy metrics of performance, such as the bounce rate. This would effectively tell you whether a campaign has even a theoretical chance of producing positive carryover effects. Indeed, regarding the prediction problem, establishing an association between high bounce rates and low carryover effects would reinforce this “rule of thumb” even further.
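If you wanted to test that association empirically, a rough sketch could correlate per-campaign bounce rates with assisted conversions (used here only as a crude proxy for carryover); the export file and column names are assumptions.

```python
# A sketch of checking the suggested rule of thumb across past campaigns,
# assuming a per-campaign export with bounce_rate and assisted_conversions
# (the latter as a rough proxy for carryover). File and column names are illustrative.
import pandas as pd

campaigns = pd.read_csv("campaign_export.csv")  # columns: campaign, bounce_rate, assisted_conversions
corr = campaigns["bounce_rate"].corr(campaigns["assisted_conversions"], method="spearman")
print(f"Spearman correlation: {corr:.2f}")  # a clearly negative value supports the rule of thumb
```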


A Few Interesting Digital Analytics Problems… (And Their Solutions)

Introduction

Here’s a list of analytics problems I devised for a digital analytics course I was teaching (Web & Mobile Analytics, Information Technology Program) at Aalto University in Helsinki. Some solutions to them are also considered.

The problems

  • Last click fallacy = taking only the last interaction into account when analyzing channel or campaign performance (a common problem for standard Google Analytics reports)
  • Analysis paralysis = the inability to know which data to analyze or where to start the analysis process from (a common problem when first facing a new analytics tool 🙂 )
  • Vanity metrics = reporting “show off” metrics as opposed to ones that are relevant and important for business objectives (a related phenomenon is what I call “metrics fallback”, in which marketers use less relevant metrics basically because they look better than the primary metrics)
  • Aggregation problem = seeing the general trend, but not understanding why it took place (this is a problem of “averages”)
  • Multichannel problem = losing track of users when they move between online and offline (in cross-channel environment, i.e. between digital channels one can track users more easily, but the multichannel problem is a major hurdle for companies interested in knowing the total impact of their campaigns in a given channel)
  • Churn problem = a special case of the aggregation problem; the aggregate numbers show growth whereas in reality we are losing customers
  • Data discrepancy problem = getting different numbers from different platforms (e.g., standard Facebook conversion configuration shows almost always different numbers than GA conversion tracking)
  • Optimization goal dilemma = optimizing for platform-specific metrics leads to suboptimal business results, and vice versa. It’s because platform metrics, such as Quality Score, are meant to optimize competitiveness within the platform, not outside it.

The solutions

  • Last click fallacy → attribution modeling, i.e. accounting for all or select interactions and dividing conversion value between them
  • Analysis paralysis → choosing actionable metrics, grounded in business goals and objectives; this makes it easier to focus instead of just looking at all of the overwhelming data
  • Vanity metrics → choosing the right KPIs (see previous) and sticking to them
  • Aggregation problem → segmenting data (e.g. channel, campaign, geography, time)
  • Multichannel problem → universal analytics (and the associated use of either client ID or customer ID, i.e. a universal connector)
  • Churn problem → cohort analysis, i.e. segmenting users based on the timepoint of their enrollment (see the sketch after this list)
  • Data discrepancy problem → understanding definitions & limitations of measurement in different ad platforms (e.g., difference between lookback windows in FB and Google), using UTM parameters to track individual campaigns
  • Optimization goal dilemma → making a judgment call, right? Sometimes you need to compromise; not all goals can be reached simultaneously. Ultimately you want business results, but as far as platform-specific optimization helps you get to them, there’s no problem.
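To illustrate the cohort analysis suggested for the churn problem, here is a minimal sketch that builds a retention table from an event log; the file and column names are illustrative, not tied to any particular analytics tool.

```python
# A sketch of cohort analysis for the churn problem, assuming an event log with
# user_id and an activity date; users are grouped by their first active month.
# File and column names are illustrative.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["date"])  # columns: user_id, date
events["month"] = events["date"].dt.to_period("M")
signup = events.groupby("user_id")["month"].min().rename("cohort")
events = events.join(signup, on="user_id")
events["age"] = (events["month"] - events["cohort"]).apply(lambda d: d.n)

# Rows: signup cohort; columns: months since signup; values: active users.
retention = events.pivot_table(index="cohort", columns="age",
                               values="user_id", aggfunc="nunique")
print(retention)
```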

Want to add something to this list? Please write in the comments!

[edit: I’m compiling a larger list of analytics problems. Will update this post once it’s ready.]


The Bounce Problem: How to Track Bounce in Simple Landing Pages

Introduction

This post applies to cases satisfying two conditions.

First, you have a simple landing page designed for immediate action (=no further clicks). This can be the case for many marketing campaigns for which we design a landing page without navigation and a very simple goal, such as learning about a product or watching a video.

Second, you have a high bounce rate, indicating a bad user experience. Bounce rate is calculated as follows:

visitors who leave without clicking further / all visitors

Why does high bounce indicate bad user experience?

It’s a proxy for it. A high bounce rate simply means a lot of people leave the website without clicking further. This usually indicates bad relevance: the user was expecting something else, didn’t find it, and so left the site immediately.

For search engines a high bounce rate indicates bad landing page relevance vis-à-vis a given search query (keyword), as the user immediately returns to the SERP (search-engine result page). Search engines, such as Google, would like to offer the right solution for a given search query as fast as possible to please their users, and therefore a poor landing page experience may lead to lower ranking for a given website in Google.

The bounce problem

I’ll give a simple example. Say you have a landing page with only one call-to-action, such as viewing a video. You then have a marketing campaign resulting in ten visitors. After viewing the video, all ten users leave the site.

Now, Google Analytics would record this as 100% bounce rate; everyone left without clicking further. Moreover, the duration of the visits would be recorded as 0:00, since the duration is only stored after a user clicks further (which didn’t happen in this case).

So, what should we conclude as site owners when looking at our statistics? 100% bounce: that means either that a) our site sucks or b) the channel we acquired the visitors from sucks. But in this case that is an incorrect conclusion; all of the users watched the video, so the landing page (and the marketing campaign associated with it) was in fact a great success!
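A toy recalculation of the ten-visitor example shows the point: once an on-page engagement event (here, the video view) counts as an interaction, the reported bounce rate drops even though nobody clicked further. The session structure below is, of course, made up.

```python
# A toy recalculation of the ten-visitor example: when an on-page engagement
# event (a video view) counts as an interaction, the reported bounce rate drops
# even though nobody clicked further. Numbers follow the example above.
sessions = [{"clicked_further": False, "viewed_video": True} for _ in range(10)]

def bounce_rate(sessions, count_events=False):
    def bounced(s):
        engaged = s["clicked_further"] or (count_events and s["viewed_video"])
        return not engaged
    return sum(bounced(s) for s in sessions) / len(sessions)

print(bounce_rate(sessions))                     # 1.0 -> the misleading 100% bounce
print(bounce_rate(sessions, count_events=True))  # 0.0 -> after event tracking
```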

How to solve the bounce problem

I will show four solutions to improve your measurement of user experience through bounce rate.

First, simply create an event that pings your analytics software (most typically Google Analytics) when a user performs a desired on-page action (e.g. viewing a video). This removes from the bounce rate calculation those users who completed a desired action but still left without clicking further.

Here are Google’s instructions for event tracking.

Second, ping GA based on visit duration, e.g. create an event for spending one minute on the page. This will in effect lower your reported bounce rate by the share of users who stay at least a minute on the landing page.

Third, create a form. Filling in the form directs the user to another page, which then triggers an event for analytics. In most cases, this is also compatible with our condition of a simple landing page with one CTA (well, if you have a video and a form, that’s two actions for a user, but in most cases I’d say it’s not too much).

Finally, there is a really cool Analytics plugin by Rob Flaherty called Scrolldepth (thanks Tatu Patronen for the tip!). It pings Google Analytics as users scroll down the page, e.g. by 25%, 75% and 100% intervals. In addition to solving the bounce problem, it also gives you more data on user behavior.

Limitations

Note that adding event tracking to reduce bounce rate only reduces it in your analytics. Search engines still see the bounce as a direct exit, and may include that in their evaluation of landing page experience. Moreover, the individual solutions have limitations – creating a form is not always natural given the business, or it may require an additional incentive for the user; and Scrolldepth is most useful on lengthy landing pages, which is not always the case.
