Simple methods for anomaly detection in e-commerce

An anomaly is a deviation from the expected value. The main challenges are (a) how large a deviation must be to count as an anomaly, and (b) which time frame or subset of data to examine.

The simplest way to answer those questions is to use your marketer’s intuition. As an e-commerce manager, you have an idea of how big an impact constitutes an anomaly for your business. For example, a 5% change in sales in a daily year-on-year comparison would not typically be an anomaly in e-commerce, because purchase patterns naturally deviate this much or even more. However, if your business is growing much faster and suddenly drops from 20% y-o-y growth to 5%, you could consider such a shift an anomaly.

So, the first step should be to define which metrics are most central for tracking. Then, you would define the threshold values and the time period. In e-commerce, we could define, for example, the following metrics and threshold values:

  • Bounce Rate – 50% Increase
  • Branded (Non-Paid) Search Visits – 25% Decrease
  • CPC Bounce – 50% Increase
  • CPC Visit – 33% Decrease
  • Direct Visits – 25% Decrease
  • Direct Visits – 25% Increase
  • Ecommerce Revenue – 25% Decrease
  • Ecommerce Transactions – 33% Decrease
  • Internal Search – 33% Decrease
  • Internal Search – 50% Increase
  • Non-Branded (Non-Paid) Search Visits – 25% Decrease
  • Non-Paid Bounces – 50% Increase
  • Non-Paid Visits – 25% Decrease
  • Pageviews – 25% Decrease
  • Referral Visits – 25% Decrease
  • Visits – 33% Decrease

As you can see, this is rule-based detection of anomalies: once the observed change exceeds the threshold value in a given time period (say, daily or weekly tracking), the system alerts the e-commerce manager.
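To make the rule-based approach concrete, here is a minimal Python sketch. The metric keys, the `RULES` table, and the function names are hypothetical; the thresholds are three of the values from the list above. It assumes every baseline value is non-zero.

```python
# Hypothetical rule table: metric name -> (direction, threshold as a fraction).
RULES = {
    "bounce_rate": ("increase", 0.50),
    "ecommerce_revenue": ("decrease", 0.25),
    "visits": ("decrease", 0.33),
}


def relative_change(baseline, observed):
    """Relative change from baseline to observed, e.g. 0.25 for +25%."""
    return (observed - baseline) / baseline


def check_rules(baseline, observed, rules=RULES):
    """Return the names of the metrics whose change crosses its threshold."""
    alerts = []
    for metric, (direction, threshold) in rules.items():
        if metric not in baseline or metric not in observed:
            continue  # no data for this metric in one of the periods
        change = relative_change(baseline[metric], observed[metric])
        if direction == "increase" and change >= threshold:
            alerts.append(metric)
        elif direction == "decrease" and change <= -threshold:
            alerts.append(metric)
    return alerts


# Made-up numbers for one comparison period.
baseline = {"bounce_rate": 0.40, "ecommerce_revenue": 10000.0, "visits": 5000}
today = {"bounce_rate": 0.62, "ecommerce_revenue": 9000.0, "visits": 3200}
print(check_rules(baseline, today))  # ['bounce_rate', 'visits']
```

In this example the bounce rate rose by 55% and visits fell by 36%, so both trigger an alert, while the 10% revenue drop stays under its 25% threshold.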

The difficulty, of course, lies in defining the threshold values. Due to changing baseline values, they need to be constantly updated. Thus, there should be better ways to detect anomalies.

Another simple method is a sliding window algorithm. This algorithm can (a) update the baseline value automatically based on data, and (b) identify anomalies based on a statistical property rather than the marketer’s intuition. The parameters for such an algorithm are:

  • frequency: how often the algorithm runs, e.g. daily, weekly, or monthly. Even intra-day runs are possible, but in most e-commerce cases they are not necessary (an exception could be technical metrics such as server response time).
  • window size: this is the period from which the baseline is computed. For example, if the window size is 7 days and the algorithm runs daily, it always uses the data from the past seven days, moving the start and end dates forward by one day on each run.
  • statistical threshold: this is the logic of detecting anomalies. A typical approach is to (a) compute the mean and standard deviation of each metric over the window, and (b) compare each new value to the mean, so that a difference of more than 2 or 3 standard deviations from the mean indicates an anomaly.

Thus, the threshold values automatically adjust to the moving baseline because the mean and standard deviation are re-calculated every time the window moves.
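The sliding window logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production detector: the function name, the default parameters, and the sample data are made up, and each value is compared against the window that precedes it.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=7, n_sigmas=3.0):
    """Return the indices of values that lie more than n_sigmas standard
    deviations from the mean of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the window slides forward by one step
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: no spread to compare against
        if abs(values[i] - mu) > n_sigmas * sigma:
            anomalies.append(i)
    return anomalies


# Made-up daily revenue with one spike on the ninth day (index 8).
daily_revenue = [100, 102, 98, 101, 99, 103, 97, 100, 160, 101]
print(detect_anomalies(daily_revenue))  # [8]
```

One design caveat: once an anomalous value enters the window, it inflates the standard deviation for the following runs, which makes the detector less sensitive for a while. Excluding flagged values from the baseline is one possible remedy.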

How to interpret anomalies?

Note that an anomaly is not necessarily a bad thing. Positive anomalies occur e.g. when a new campaign kicks off, or the company achieves some form of viral marketing. Anomalies can also arise when a new season begins. To keep such seasonal effects from showing up as anomalies, one can configure the baseline to represent year-on-year data instead of historical data from the current year. Regardless of whether the direction of the change is positive or negative, it is useful for a marketer to know there is a change of momentum. This helps restructure campaigns, allocate resources properly, and become aware of the external effects on key performance indicators.

Platform metrics: Some ideas

I was chatting with Lauri [1] about platform research. I claimed that the research does not have that many implications for real-world companies apart from the basic constructs of network effects, two-sidedness, tipping, and marquee users, strategies such as envelopment, and of course many challenges, including chicken-and-egg problems, the monetization dilemma, and remora’s curse (see my dissertation on startup dilemmas for more insights on those…).

But then I got to thinking that metrics are somewhat overlooked. Most of the platform literature comes from economics and is very theoretical and math-oriented. As a result, it is detached from practical Web analytics and digital metrics. Those kinds of metrics, however, are very important for platform entrepreneurs and startup founders.

On the face of it, it seems that the only difference that platforms have compared to “normal” businesses is their two-sidedness. If we have supply side (Side A) and demand side (Side B), then the metrics could be just identical for each and the main thing is to keep track of the metrics for both sides.

However, there are some dynamics at play. The company typically has one goal, one budget, and one strategy. That means those metrics, even though they can be computed separately, are interconnected.

Here are some examples of platform metrics:

  • Number of Users/Customers (Side A, Side B)
  • Revenue (Side A, Side B)
  • Growth of Revenue (Side A, Side B)
  • Cross-correlation of Number of Users and Revenue (e.g., Side A users => Side B revenue)
  • Cost per User/Customer Acquisition (Side A, Side B)
  • Support cost (Side A, Side B)
  • Average User/Customer Lifetime (Side A, Side B)
  • Average Transaction Value (Side A, Side B)
  • Engagement Volume (Side A, Side B)
  • Profitability distribution (Side A, Side B)

Note the cross-correlation example. Basically, all the metrics can be cross-correlated to analyze how different metrics of each side affect each other. Moreover, this can be done in different time periods to increase robustness of the findings. Such correlations can reveal important information about the dynamics of network effects and tell, for example, whether to focus on adding Side A or Side B at a given point in time. A typical example is solving the cold start problem by hitting critical mass, i.e., the minimum number of users required for network effects to take place (essentially, the number of users needed for the platform to be useful). Before this point is reached, all other metrics can look grim; however, after reaching that point, the line charts of other metrics should turn from flat line to linear or exponential growth, and the platform should ideally become self-sustainable.
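As a rough illustration of the cross-correlation idea, the sketch below correlates Side A users with Side B revenue at several time lags. The function names and the weekly figures are made up, and the revenue series is constructed to follow the user series by two weeks. A high correlation at some lag is only a hint about the dynamics, not proof of causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leader, follower, max_lag=4):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag."""
    if len(leader) != len(follower):
        raise ValueError("both series must cover the same periods")
    result = {}
    for lag in range(max_lag + 1):
        xs = leader[:len(leader) - lag]
        ys = follower[lag:]
        result[lag] = pearson(xs, ys)
    return result


# Made-up weekly data: Side B revenue echoes Side A users two weeks later.
side_a_users = [10, 12, 15, 11, 18, 25, 21, 30, 28, 35]
side_b_revenue = [20, 22] + [2 * users for users in side_a_users[:-2]]

correlations = lagged_correlation(side_a_users, side_b_revenue)
best_lag = max(correlations, key=correlations.get)
print(best_lag)  # 2
```

Running the same analysis over different time periods, as suggested above, would show whether the best lag is stable or an artifact of one particular stretch of data.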

Basic metrics can also be used to calculate profitability, e.g.

Average Transaction Value x Average Customer Lifetime > Cost per Customer Acquisition

Business models of many startup platforms are geared towards “nickel economics,” meaning that the average transaction values are very low. In such situations, the customer acquisition cost has to be low as well, or the frequency of transactions extremely high. When these rules are violated, the whole business model does not make sense. This is partly because of the competitive nature of the market, which requires sizable budgets for actual user/customer acquisition. For platforms, the situation is even more serious than for other businesses because network effects require the existence of a critical mass that costs money to achieve.
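A small worked example of the profitability rule above, with made-up numbers. As an assumption of this sketch, the customer lifetime is split into transaction frequency and lifetime in months, so that the effect of frequency is visible; the function names are hypothetical.

```python
def lifetime_value(avg_transaction_value, transactions_per_month, lifetime_months):
    """Revenue one customer generates over the whole relationship."""
    return avg_transaction_value * transactions_per_month * lifetime_months


def is_viable(value, acquisition_cost):
    """The rule from the text: lifetime value must exceed acquisition cost."""
    return value > acquisition_cost


# "Nickel economics": 0.50 per transaction, customers stay for 12 months.
cpa = 30.0
low_frequency = lifetime_value(0.50, 4, 12)
high_frequency = lifetime_value(0.50, 8, 12)

print(low_frequency, is_viable(low_frequency, cpa))    # 24.0 False
print(high_frequency, is_viable(high_frequency, cpa))  # 48.0 True
```

With four transactions a month the customer never pays back a CPA of 30, while doubling the frequency makes the same customer viable.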

In the real world, customer acquisition cost (CPA) cannot usually be ignored, apart from a few outliers. The CPA structure might also differ between the platform sides, and it is not self-evident what type of customer acquisition strategies yield the lowest CPAs. In fact, it is an empirical question. A highly skilled sales force can bring in new suppliers at a lower CPA than a digital marketer who lacks the skills or organic demand. Then again, under favorable conditions, the CPA of digital advertising can be minuscule compared to that of salespeople because advertising scales.

However, as is apparent from the previous list, relationship durations also matter. For example, many consumers can be fickle but supplier relationships can last for years. This means that suppliers can generate revenue over a longer period of time than consumers, possibly turning a higher acquisition cost into a more profitable investment. Therefore, the churn of each side should be considered. Moreover, there are costs associated with providing customer support for each side. As Lauri noted based on his first-hand experience working for a platform company, the frequency and cost per customer encounter differ vastly by side, and require different kinds of expertise from the company.

In cases where the platform has an indirect business model, also called subvention because one side subsidizes the cost of the other, the set of metrics should be different. For example, if only Side B (paid users) pays the platform but does so because Side A (free users) is there, Side B could be monitored with financial metrics and Side A with engagement metrics.

Finally, profitability distribution refers to the uneven distribution of profitable players on each market side. This structure is important to be aware of. For example, in e-commerce it is typical that there are a few “killer products” that account for a relatively large share of sales value, but the majority of the total sales value is generated by hundreds or thousands of products with small individual sales. Understanding the dynamics of adding both killer products and average products (or complements, in platform terms) is crucial for managing platform growth.
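One simple way to look at such a distribution is to compute the share of total sales that comes from the top few products. The sketch below uses made-up sales figures and a hypothetical helper name.

```python
def top_share(sales, top_n):
    """Share of total sales generated by the top_n best-selling items."""
    ordered = sorted(sales, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Made-up catalog: two killer products and 200 small sellers.
sales = [5000, 4000] + [100] * 200

share = top_share(sales, 2)
print(f"{share:.0%}")  # 31%
```

Here two products out of 202 bring in about 31% of the sales value, while the long tail of small sellers still generates the majority, which matches the pattern described above.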


[1] Lauri Pitkänen, my best friend.

My website got hacked: here’s what I learned

My friend Vignesh alerted me earlier this week that my site had been hacked and was redirecting visitors to a malware site.

At first, I called GoDaddy to ask for help. That was useless. It turns out their tech support consists of sales reps trying to sell you shit, with no help for cleaning the site. They could do it for me for 200 pounds, which I didn’t feel like paying, especially since they advertise their contact number as support, not sales.

So, I started googling and learning how to fix the site myself.

Here are the lessons I learned

  • Only install plugins and themes you really need and use. Every theme and plugin is a potential security risk. Most likely, hackers utilized one of my plugins to enter the website. I had tons of plugins and themes I didn’t use and although I did update them every now and then, the plugin creators are not necessarily fixing the vulnerabilities that quickly if at all.
  • Keep WordPress core, themes, and plugins updated. As I mentioned, I updated the themes and plugins every now and then. It’s important to do that as frequently as updates roll in. However, my WordPress core is automatically updated by GoDaddy. That’s why I think an outdated plugin was probably the root cause for the hack.
  • Don’t use GoDaddy — in the process, I learned their tech support is useless. In addition, I read WP Engine is safer than GoDaddy – they block some plugins altogether, and actually fix your site for free if it’s been hacked. GoDaddy also doesn’t let you change your database password after being hacked (talking about their Managed WordPress hosting which I’m using), so even though I cleaned the website, it’s still potentially vulnerable.

Here are the steps I took to clean the site. I’m also including some useful links to start from in case your WordPress gets hacked.

  • To GoDaddy’s credit, I could find a message where they listed infected files on my website. I started by manually removing these 15 files.
  • .htaccess was infected. I replaced its content with default content (code in [1], also includes additional code that blocks external connections [2])
  • Removed all plugins and themes apart from the theme I’m using and CloudFlare CDN plugin which I need. Everything else could go.
  • Downloaded a fresh copy and reinstalled my theme from scratch (removed the whole folder and replaced with a clean one).
  • Installed the free Anti-Malware Security and Brute-Force Firewall and ran the analysis. It couldn’t find any more infected files, but suggested potentially vulnerable files. I went through these files manually one by one. They contained no suspicious code and their edit dates did not differ from those of a clean WP installation, so they were not compromised.
  • Changed WP security tokens to log out every user.
  • Removed a spam user and changed other users’ passwords to new, strong passwords.
  • Manually checked WP core files for malicious code but couldn’t find any (comparing Last Modified times to those in a clean WP directory also helps).
  • Set up a .htaccess script that blocks php files in Upload folder [3]
  • Finally, made sure that WP + theme + plugins that remain are up-to-date.
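The manual comparison against a clean WordPress copy can be partly automated. The following Python sketch is not what I ran at the time; it is a hypothetical helper that compares file contents by SHA-256 hash instead of Last Modified times (which an attacker can fake). The function names and directory paths are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hash of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_clean(site_dir, clean_dir):
    """Compare every file under site_dir with the file at the same relative
    path under clean_dir. Returns (modified, extra) lists of relative paths."""
    site_dir, clean_dir = Path(site_dir), Path(clean_dir)
    modified, extra = [], []
    for path in sorted(site_dir.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(site_dir)
        reference = clean_dir / relative
        if not reference.is_file():
            extra.append(relative.as_posix())      # not part of a clean install
        elif file_digest(path) != file_digest(reference):
            modified.append(relative.as_posix())   # content differs from clean copy
    return modified, extra


# Usage with hypothetical paths:
# modified, extra = compare_to_clean("backup/public_html", "wordpress-clean")
```

Files such as wp-config.php, uploads, themes, and plugins will legitimately show up as extra or modified, so the output is a list of candidates to inspect by hand, not a verdict.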

The only things I didn’t do were (1) reinstalling WP core (I used a virus scanner and a manual check instead) and (2) changing the SQL password (GoDaddy doesn’t let you do that, which is another reason to avoid them). Moreover, (3) the raw usage logs could be viewed via cPanel to find the hackers’ IPs but, again, GoDaddy doesn’t give you cPanel access in the plan I’m using.

Useful links I used


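[1] # BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

[2] # Block WordPress xmlrpc.php requests
<Files xmlrpc.php>
order deny,allow
deny from all
# allow from (the IP address is missing here; add your own address to whitelist it)
</Files>
# END WordPress

[3] # Block PHP files (assumed to go in a separate .htaccess inside the uploads folder)
<Files *.php>
deny from all
</Files>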

Selling is a waste of resources: thoughts on the usefulness of sales in the big picture

In the national economy, a significant amount of resources is spent on selling that produces no results or is not needed.

Selling is necessary when the customer has a need, whether or not they are aware of it. In both cases, the salesperson is useful to the customer: the salesperson acts as a conveyor of information and thereby makes the market work more efficiently. The logic is the same as in the theory of informative advertising.

But when the product being sold is neither needed nor wanted, the result is a pure waste of resources (assuming that the resource, i.e. human labor, could be put to some genuinely useful work, which is not always the case).

In that case there is no genuine match, and as a result both the seller and the buyer waste their time. Selling can even be harmful, namely when a person is tricked into buying something they did not want purely through pushy selling.

For these reasons, salespeople are usually met with a cold attitude: time is valuable, and when there is no need, the time wasted on the interaction has to be minimized. I know from experience that a manager-level person spends a lot of time dealing with salespeople, and many salespeople refuse to believe that there is no real need and instead keep pushing.

However, a cold attitude can be a problem for the buyer when the salesperson would have something genuinely useful to offer, but the conversation never gets that far because that possibility is never given a chance to surface. The buyer’s best strategy is therefore to listen to the basic idea and then decide whether it is of interest or not. Fundamentally, it is a matter of judgment.

There are, however, also so-called polite buyers who, for one reason or another, do not say that they are not interested or that they cannot buy anything. In that case the problem is reversed, and it is the seller who wastes time.

The buyer’s refusals and acceptances can be divided into the following main categories:

  • genuine refusal = there is no need, and further persuasion will not help. The optimal solution for both seller and buyer is to move on.
  • non-genuine refusal = there is a need, but the buyer is not aware of it or is not willing to listen. Here the optimal solution would be further persuasion, which benefits both parties.
  • genuine acceptance = the buyer buys because there is a need. This is the optimum for both parties.
  • non-genuine acceptance 1 = the buyer buys although there is no need, e.g. as a result of hard selling or ignorance. In this game the buyer loses.
  • non-genuine acceptance 2 = the buyer does not buy but lets the seller understand that a purchase is possible. Certain buyers use salespeople, for example, to build up their own expertise without any intention of buying. The buyer then benefits at the seller’s expense; the seller loses time for nothing. Non-genuine acceptance can also result from general politeness, which the seller misreads as a buying signal, or from maintaining a company image in which every salesperson gets a fair hearing.

From the point of view of the national economy, inefficient selling is both a micro-level and a macro-level problem. It is a micro-level problem because at worst it brings down the selling organization: startups in particular lack the financial buffer to go through long and futile sales negotiations. At the macro level, in turn, the optimum is reached when people do productive work.

In conclusion, genuine relationships that do not materialize are OK, but non-genuine acceptances create inefficiency.

A salesperson should think about the following questions:

  1. how can a non-genuine refusal be recognized?
  2. what is the right strategy for dealing with non-genuine refusals?
  3. how can a non-genuine acceptance be recognized?
  4. what is the right strategy for dealing with non-genuine acceptances?

It would be best for everyone to “put the cards on the table” and find out as quickly as possible:

  • what is the purpose of the service being sold?
  • does the customer have a genuine need for it?
  • can the customer make the purchase now? if not, when?

Ultimately, the inefficiency problems of selling can be counted as communication errors.

Digital analytics maturity model

Digital analytics maturity model:

  1. Concepts: here, the focus is on buzzwords and the realization that “we should do something”.
  2. Tools: here, the focus is on tools, i.e. “Let’s use this shiny new technology and it will solve all our problems.”
  3. Value: here, we finally focus on what matters: how will the tools and technologies serve and integrate with our core competitive advantage, i.e. “Guys, what’s the point?”.

Applies to almost any booming technology.

Problem of continuous value in SaaS business

A major challenge for many SaaS businesses is to provide continuous value, so that the users are compelled to continue using the service.

There’s a risk of opportunism if the user can achieve his goals with one-time use; he then either uses the free trial version, or only subscribes for one month.

For example, some SEO tools enable data download, so why should I stick around after downloading the data?

This is especially pertinent if my decision making cycle is not frequent, so I don’t really need monthly data.

Potential ways to counter this effect:

  1. develop automatic insights that continuously tell users something they didn’t know, without them having to log into a system
  2. include different tiers for one-time users (e.g., a one-time report feature at a cost of xxxx USD)
  3. understand the decision making cycles of different users, and make sure your business model is adapted to them
  4. put previously free features behind a subscription plan
  5. raise the monthly price to increase CLV even for those users who drop out after a month

The latter I’ve seen applied by many startups, e.g. SurveyMonkey, which raised its prices substantially. At the same time, though, they lost me as a customer. That’s the risk, and it can only work if they have enough high-value customers that losing my business does not matter.

Number four was applied by Trello, which cut the number of Power-Ups to one, essentially forcing you to pay if you want to use any of them (because “Calendar” is already a Power-Up). Often, these upselling tactics are applied after the startup has been sold or when new investors wish to capture a larger share of the value produced by the service. Obviously, this comes at the cost of free users, who previously had a great deal (=a large share of the value provided by the startup), now reduced to a “good” or “decent” deal depending on their tolerance level.

Identifying opportunities that Google and Facebook can’t handle

It’s almost impossible to beat Facebook’s or Google’s algorithms in ad optimization, because they have access to individual-level data whereas the advertiser only gets aggregates, and even their supply is limited. But, there are two opportunities I see which Google and Facebook don’t handle:

1. Use of CRM data

Especially purchase history (=lifetime value), product margins (=profitability), and other customer information that can be used for user modelling or machine learning as features. But, don’t use Google Analytics for linking this data to website analytics — Google Analytics sucks, because Google keeps individual-level information (=click-stream data) for itself and only shares, again, aggregates. Use Piwik instead.

2. Use of cross-platform data

Google doesn’t have access to Facebook’s data or vice versa, but the advertiser has. Thus, you can create more comprehensive optimization models for bidding and budgeting.

Grassblade model of startup acquisition

Grassblade model of startup acquisition = an incumbent waits until an upstart rival exceeds a KPI threshold x (e.g., 1 million users) before acquiring it.


  1. ‘x’ needs to be defined so that it is big enough to prove the momentum, yet small enough to give a decent valuation; if the startup is allowed to grow long enough, it can become a serious competitor
  2. the process involves challenges in defining industry-specific KPIs to pick the winners (one needs to think about what the strategic assets are).
  3. there is an assimilation cost to consider — in “soft” things like organizational cultures, committing the key people, aligning the infrastructure, and ensuring continuity of user experience.

Determining the point of acquisition is important since some startups are too early to be potential targets while others are too advanced to accept deals.

The pioneer’s curse

The pioneer’s curse = a pioneer misses business opportunities because they imagine that “it has already been done”.

Solutions to this:

1. perfect timing is impossible: give up waiting for it

2. no problem is solved until your competitor is a so-called household brand (and even then disruption is not impossible, as Facebook => MySpace and Google => Yahoo show)

Whoever rides the wave makes the best returns, and you can still catch the wave a little later. For example, Bitcoin is “old news”, but had you started investing in it only at the beginning of this year, you would have beaten every index fund in the world by a wide margin.

Idea: a showroom for online stores

= collect products from online stores into a physical space.

“The role of the store is now seen as an ‘entertainment hub’, and it will play an important role as a creator of customer experience, rather than as a place to buy.”

It is possible to build a department store where products can be tried out and viewed but are ordered online. This way no local inventory is needed. The challenge is that the customer may want to take the product home right away.