Tips on Data Imputation From Machine Learning Experts

Missing values are a critical issue in statistics and machine learning (which is, after all, "advanced statistics"). Data imputation covers the techniques for filling in those missing values.

Andriy Burkov made this statement a few days ago [1]:

“The best way to fill a missing value of an attribute is to build a classifier (if the attribute is binary) or a regressor (if the attribute is real-valued) using other attributes as “X” and the attribute you want to fix as “y”.”
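Burkov's approach can be sketched as follows. This is a toy illustration using scikit-learn; the data and column names are invented:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data: "age" has ~20% missing values we want to impute.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.normal(50, 10, 200),
    "tenure": rng.integers(0, 30, 200),
})
df["age"] = 20 + 0.5 * df["tenure"] + rng.normal(0, 2, 200)
df.loc[df.sample(frac=0.2, random_state=0).index, "age"] = np.nan

# Use the other attributes as X and the incomplete attribute as y.
observed = df["age"].notna()
X_cols = ["income", "tenure"]
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df.loc[observed, X_cols], df.loc[observed, "age"])

# Fill the gaps with the model's predictions.
df.loc[~observed, "age"] = model.predict(df.loc[~observed, X_cols])
```

For a binary attribute, the same pattern applies with a classifier in place of the regressor.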

However, the issue is not that simple. As noted by one participant:

From Franco Costa, Developer: Java, Cloud, Machine Learning:

What if [the attribute] is totally independent from the other features? Nothing to learn.

The discussion then quickly expanded and many machine learning experts offered their own experiences and tips for solving this problem. At the time of writing (March 8, 2018), there are 69 answers.

Here are, in my opinion, the most useful ones.

1) REMOVE MISSING VALUES

From Blaine Bateman, EAF LLC, Founder and Chief Data Engineer at EAF LLC:

Or just drop it from the predictors

From Swapnil Gaikwad, Software Engineer Cognitive Computing (Chatbots) at Light Information Systems Pvt. Ltd.:

Also, I got advice from one of my mentors: whenever more than 50% of the values in a column are missing, we can simply omit that column (if we can), provided we have enough other features to build a model.
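That 50% rule of thumb is a one-liner with pandas. A minimal sketch with invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],        # 25% missing
    "b": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Keep only columns with at most 50% missing values.
keep = df.columns[df.isna().mean() <= 0.5]
df_reduced = df[keep]
print(list(df_reduced.columns))  # ['a', 'c']
```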

2) ASK WHY

From Kevin Gray, Reality Science:

It’s of fundamental importance to do our best to understand why missing data are missing. Two excellent sources for an in-depth look at this topic are Applied Missing Data Analysis (Enders) and Handbook of Missing Data Methodology (Molenberghs et al.). Outlier Analysis (Aggarwal) is also relevant. FIML and MI are very commonly used by statisticians, among other approaches.

From Julio Bonis Sanz, Medical Doctor + MBA + Epidemiologist + Software Developer = Health Data Scientist:

In some analyses I have done in the past, including "missing" as a value for prediction itself has produced some interesting results. The fact that a value is missing for a given observation is sometimes associated with the outcome you want to predict.

From Tero Keski-Valkama, A Hacker and a Machine Learning Generalist:

Also, you can try to check if the value being missing encodes some real phenomenon (like the responder chooses to skip the question about gender, or a machine dropping temperature values above a certain threshold) by trying to train a classifier to predict whether a value would be missing or not. It’s not always the case that values being missing are just independent random noise.

From Vishnu Sai, Decision Scientist at Mu Sigma Inc.:

In my experience, I’ve found that the technique for filling up missing values depends on the business scenario.

From David T. Kearns, Co-founder, Sustainable Data and Principal Consultant, Sustainable Services:

I think it’s important to understand the underlying cause of the missing values. If your data was gathered by survey, for example, some people will realise their views are socially unpopular and will keep them to themselves. You can’t just average out that bias – you need to take steps to reduce it during measurement. For example, design your survey process to eliminate social pressure on the respondent.

For non-human measurements, sometimes instruments can be biased or faulty. We need to understand if those biases/faults are themselves a function of the underlying measurements – do we lose data just as our values become high or low, for example? This is where domain knowledge is useful – making intelligent decisions about what to do, not blind assumptions.

If you’ve done all that and still have some missing values, then you’ll be in a far stronger position to answer your question intelligently.

3) USE MISSING VALUES AS A FEATURE

From Julio Bonis Sanz, Medical Doctor + MBA + Epidemiologist + Software Developer = Health Data Scientist:

One of my cases was a predictive model of antibiotic use by patients with chronic bronchitis. One of the variables was smoking, with about 20% missing values. It turned out that having no information in the clinical record about smoking status was itself a strong predictor of antibiotic use, because patients missing this data were receiving worse healthcare in general. By using imputation methods you in some way lose that information.

From Kirstin Juhl, Full Stack Software Developer/Manager at UnitedHealth Group:

Julio Bonis Sanz Interesting- something that I wouldn’t have thought of – missing values as a feature itself.

From Peter Fennell, Postdoctoral Researcher in Statistics and A.I. @ USC:

That's fine if you have one attribute with missing values. Or two. But what if many of your features have missing values? Do recursive filling? That can lead to error propagation. I like to think that there is value in a missing value, and so giving them their own distinct label (which, e.g., a tree-based classifier can isolate) can be an effective option.
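A common way to implement the missingness-as-a-feature idea is an indicator column added before imputation. A minimal sketch with invented data, echoing the smoking example above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"smoking": [1.0, np.nan, 0.0, np.nan]})

# Record the missingness itself as a feature before filling the gaps,
# so a model can still learn from the fact that the value was absent.
df["smoking_missing"] = df["smoking"].isna().astype(int)
df["smoking"] = df["smoking"].fillna(df["smoking"].median())
print(df)
```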

4) USE TESTED PACKAGES SUCH AS MICE OR RANDOM FOREST

From Jehan Gonsal, Senior Insights Analyst at AIMIA:

MCMC methods seem like the best way to go. I’ve used the MICE package before and found it to be very easy to audit and theoretically defensible.

From Swapnil Gaikwad, Software Engineer Cognitive Computing (Chatbots) at Light Information Systems Pvt. Ltd.:

This is great advice! In one of my projects, I used the R package called MICE, which uses regression to find the missing values. It works much better than the mean method.

From Nihit Save, Data Analyst at CMS Computers Limited (INDIA):

Multivariate Imputation using Chained Equation (MICE) is an excellent algorithm which tries to achieve the same. https://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/

From ROHIT MAHAJAN, Research Scholar – Data Science and Machine Learning at Aegis School of Data Science:

In R there are many packages like MICE, Amelia and, most importantly, "missForest" which will do this for you. But it takes too much time if the data is more than 500 MB. I always follow this regressor/classifier approach for the most important attributes.

From Knut Jägersberg, Data Analyst:

Another way to deal with missing values in a model-based manner is by using random forests, which work for both categorical and continuous variables: https://github.com/stekhoven/missForest . This algorithm can easily be reimplemented with, e.g., a faster RF implementation than R's, such as ranger (https://github.com/imbs-hl/ranger), and then scales well to larger datasets.
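MICE and missForest are R packages, but the same chained-equations idea is available in Python through scikit-learn's IterativeImputer (still flagged as experimental, hence the extra import). A minimal sketch with toy data:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [np.nan, 8.0]])

# Each feature with missing values is modeled as a function of the
# others, iterating until the imputations stabilize.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
```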

5) USE INTERPOLATION

From Sekhar Maddula, Actively looking for Data-science roles:

Partly agree, Andriy Burkov. But at the same time there are a few methods specific to the technique/algorithm. E.g., for time-series data, you may consider the interpolation methods available in the R package "imputeTS". There are also many interpolation methods in the field of mathematics; we may need to try an appropriate one.
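imputeTS is an R package; to illustrate the same idea in Python, pandas offers time-aware interpolation. A toy sketch (linear between observed points):

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 16.0],
              index=pd.date_range("2018-01-01", periods=4, freq="D"))

# "time" interpolation weights the fill values by the gaps in the index.
filled = s.interpolate(method="time")
print(filled.tolist())  # approximately [10.0, 12.0, 14.0, 16.0]
```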

6) ANALYZE DISTRIBUTIONS

From Gurtej Khanooja, Software Engineering/Data Science Co-op at Weather Source| Striving for Excellence:

One more way of dealing with missing values is to identify the distribution using the remaining values and fill the missing values by drawing randomly from that distribution. Works fine in a lot of cases.

From Tero Keski-Valkama, A Hacker and a Machine Learning Generalist:

If you are going to use a classifier or a regressor to fill the missing values, you should sample from the predicted distribution rather than just picking the value with the largest probability.
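A sketch of sampling fills from the empirical distribution of the observed values (invented data):

```python
import numpy as np

rng = np.random.default_rng(42)
values = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 2.0, 3.0])

# Draw each fill from the observed values, preserving their
# distribution instead of collapsing everything to the mean.
observed = values[~np.isnan(values)]
missing = np.isnan(values)
values[missing] = rng.choice(observed, size=missing.sum())
```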

SUMMARY

This was the best summary comment I found:

“I have used MICE package in R to deal with imputing and luckily it produced better results. But in general we should take care of the following:

  1. Why is the data missing? Is it for some meaningful reason?
  2. How much data is missing?
  3. Fit a model with non missing values.
  4. Now apply the imputing technique and fit the model and compare with the earlier one.”

-Dr. Bharathula Bhargavarama Sarma, PhD in Statistics, Enthusiastic data science practitioner

FOOTNOTES

[1] https://www.linkedin.com/feed/update/urn:li:activity:6375893686550614016/

Creating Buyer Personas: Common Interview Questions

Introduction

At Qatar Computing Research Institute (QCRI), we are developing a system for automatic persona generation (APG). The demo is available online at https://persona.qcri.org

As a part of this research, we’re interested in the information needs of end users of personas [1]. People working in different domains are interested in different information, after all. For example, journalists want to know what type of news the personas are consuming, while e-commerce marketers want to know what products they are buying.

We have reviewed a lot of material on interviewing customers to create persona profiles because, although our approach is based on automation and computational techniques, we are interested in experimenting with mixed personas that use qualitative data to enrich the automatically generated personas [2].

This brief post shares some of the key insights we’ve found. [Update: there is an extended version of persona interview questions in our team’s blog. I suggest checking it out.]

Persona Information

In general, when creating personas we need to query two types of information:

  1. Information needs of persona users => this means what information people inside our organization want to know
  2. Customer information => this means what information we can learn about the customers

For the former, we have developed an Information Needs Questionnaire with eight questions:

  1. What are your objectives for content creation / marketing?
  2. What kind of customer-related decisions do you make?
  3. What kind of customer information do you need?
  4. What analytics information are you currently using?
  5. What kind of customer-related questions don't you currently get good answers to?
  6. How would you use personas in your own work?
  7. What information do you find useful in the persona mockup?
  8. What information is missing from the mockup?

The purpose of these questions is to discover the interviewee’s professional information needs. This is useful for developing analytics systems, e.g. automatic persona generation, but also extends to traditional persona creation.

In the following, we summarize some questions intended for customers.

From Mr. Steve Cartwright (2015) [3]

“I know that when I am preparing buyer personas I have a whole heap of questions that I ask; in fact, I have a PowerPoint I go through with clients, which enables me to generate the personas that I need. However, if you start by simply asking:

·      Who are they?

·      What do these people do?

·      Are they married, singles, living with a partner?

·      What problems or concerns do they have, that your industry niche can solve?

·      Where do they hang out and what do they do online?

·      Are these people decision makers, influencers or referral sources?

Just those six questions are all you need to get started, to begin to understand who your customers are, and to turn your business into a customer-centric one.”

***

From “Nisha” (2013) [4]:

“Questions for B2B marketers to delve into while creating buyer personas include:

  • Buyer experience and reporting officer of the prospect
  • Professional background of the prospect
  • Kind of organization
  • Organizations’ segment focus
  • History of purchases
  • Change in role in past few years
  • Market forces influencing buyers
  • Most urgent problems
  • What funded initiatives does the buyer have
  • What are the motivations that drive the buyer
  • What are the buyer’s needs?
  • What is the budget?
  • Who are involved in the decision-making?
  • Attitude of the company towards the product/service”

***

From Jesse Ness [5] (2016):

“Demographic questions:

These are the most basic questions that you should be asking your target customers, such as:

·      Are they married?

·      How old are they?

·      Where do they live?

·      Do they have children? How many? What ages?

·      Which country/city did they grow up in?

Education questions:

Our early school and college education helps shape us as adults. People usually tend to answer these questions more honestly.

·      What level of education did they complete?

·      Which schools did they attend? Public or Private?

·      What did they study?

·      Were they popular at school?

·      Which extra-curricular activities (if any) did they take part in?

Career questions:

Questions about the working life of your prospects reveal a lot of interesting details about them.

·      What industry do they work in?

·      What is their current job level?

·      What was their first full-time job?

·      How did they end up where they are today?

·      Has their career track been traditional or did they switch from another industry?

Financial questions:

Your customers’ finances will tell you what they can afford and how easily they make their purchasing decisions.

·      How often do they buy high-ticket items?

·      How much are they worth?

·      Are they responsible for making purchasing decisions in the household?

Keep in mind that people tend to answer financial questions incorrectly, even in anonymous online surveys. Some might even construe this as an invasion of their privacy. Temper your results accordingly (usually by decreasing the stated average income).”

Conclusion

There are myriad questions one can ask customers when creating persona profiles. However, they should be based on first defining internal information needs. In the persona creation process, the above question lists serve as inspiration.

Interested in automatic persona generation for your company? Contact Dr. Jim Jansen: [email protected]

Footnotes

[1] Personas are fictional characters based on real data about the underlying audience. Their purpose is to make customer analytics more easily understandable than numbers and graphs.

[2] Salminen, J., Şengün, S., Haewoon, K., Jansen, B. J., An, J., Jung, S., … Harrell, F. (2017). Generating Cultural Personas from Social Data: A Perspective of Middle Eastern Users. In Proceedings of The Fourth International Symposium on Social Networks Analysis, Management and Security (SNAMS-2017). Prague, Czech Republic.

[3] https://website-designs.com/online-marketing/content-marketing/buyers-personas-allow-you-to/

[4] https://www.xerago.com/blog/2013/08/why-buyer-personas-are-not-the-same-as-customer-profiling/

[5] https://www.ecwid.com/blog/how-to-create-buyer-personas-for-an-ecommerce-store.html

Google Analytics: 21 Questions to Get Started

I was teaching a course called “Web & Mobile Analytics” at Aalto University back in 2015.

As a part of that course, the students conducted an analytics audit for their chosen website. I’m sharing the list of questions I made for that audit, as it’s a useful list for getting to know Google Analytics.

The questions

Choose a period to look at (e.g., the past month, three months, last year, this year… generally, the longer the better because it gives you more data). Answer the questions. The questions are organized by sections of Google Analytics.

a. Audience

  • How has the website traffic evolved during the period you’re inspecting? How does the traffic from that period compare to earlier periods?
  • What are the 10 most popular landing pages?
  • What are the 10 pages with the highest bounce rate AND at least 100 visits in the last month? (Hint: advanced filter)

b. Behavior

  • How does the user behavior differ based on the device they’re using? (Desktop/laptop, mobile, tablet)
  • Where do people most often navigate to from the homepage?
  • How do new and returning visitors differ in behavior?
  • What is the general bounce rate of the website? Which channel has the highest bounce rate?
  • How well do the users engage with the website? (Hint: Define the metrics you used to evaluate engagement.)
  • Is there a difference in engagement between men and women?

c. Acquisition

  • How is the traffic distributed among the major sources?
  • Can you find performance differences between paid and organic channels?
  • Compare the goal conversion rate of different marketing channels to the site average. What can you discover?

d. Conversion

  • What is the most profitable source of traffic?
  • What is the best sales (or conversion, based on the number of conversions) month of the year? How would you use this information in marketing planning?
  • Which channels or sources seem most promising in terms of sales potential? (Hint: look at the channels with high CVR and low traffic)
  • Analyze conversion peaks. Are there peaks? Can you find an explanation for such peaks?
  • Can you find sources that generated assisted conversions? Which sources are they? Is the overall volume of assisted conversions significant?
  • Does applying another attribution model besides the last click model alter your view on the performance of marketing channels? If so, how?

e. Recommendations

  • Based on your audit, how could the case company develop its digital marketing?
  • How could the case company’s use of analytics be developed? (E.g., what data is not available?)
  • What other interesting discoveries can you make? (Mention 2–5 interesting points.)

Answering the above questions provides a basic understanding of a goal-oriented website’s situation. In the domain of analytics, asking the right questions is often the most important (and difficult) thing.

The dashboard

In addition, the students built a dashboard for the class. Again, the instructions illustrate some useful basic functions of Google Analytics.

Build a dashboard showing the following information. Include a screenshot of your dashboard in the audit report.

Where is the traffic coming from?

  • breakdown of traffic by channel

What are the major referral sources?

  • 10 biggest referral sites

How are conversions distributed geographically?

  • 5 biggest cities by conversions

How is Facebook bringing conversions?

  • Product revenue from Facebook as a function of time

Are new visitors from different channels men or women?

  • % new sessions by channels and gender

What keywords bring in the most visitors and money?

  • revenue and sessions by keyword

If you see fit, include other widgets in the dashboard based on the key performance metrics of your company.

Conclusion

Reports and dashboards are basic functions of Google Analytics. More advanced uses include custom reports and metrics, alerts, and data importing.

Simple methods for anomaly detection in e-commerce

An anomaly is a deviation from the expected value. The main challenges are: (a) how large the deviation must be to be classified as an anomaly, and (b) what time frame or subset of data we should examine.

The simplest way to answer those questions is to use your marketer’s intuition. As an e-commerce manager, you have an idea of how big an impact constitutes an anomaly for your business. For example, if sales change by 5% in a daily year-on-year comparison, that would not typically be an anomaly in e-commerce, because purchase patterns naturally deviate this much or even more. However, if your business is growing much faster and you suddenly drop from 20% year-on-year growth to 5%, you could consider such a shift an anomaly.

So, the first step should be to define which metrics are most central for tracking. Then, you would define threshold values and the time period. In e-commerce, we could e.g. define the following metrics and values:

  • Bounce Rate – 50% Increase
  • Branded (Non-Paid) Search Visits – 25% Decrease
  • CPC Bounce – 50% Increase
  • CPC Visit – 33% Decrease
  • Direct Visits – 25% Decrease
  • Direct Visits – 25% Increase
  • Ecommerce Revenue – 25% Decrease
  • Ecommerce Transactions – 33% Decrease
  • Internal Search – 33% Decrease
  • Internal Search – 50% Increase
  • Non-Branded (Non-Paid) Search Visits – 25% Decrease
  • Non-Paid Bounces – 50% Increase
  • Non-Paid Visits – 25% Decrease
  • Pageviews – 25% Decrease
  • Referral Visits – 25% Decrease
  • Visits – 33% Decrease

As you can see, this is rule-based anomaly detection: once the observed value exceeds the threshold value in a given time period (say, daily or weekly tracking), the system alerts the e-commerce manager.

The difficulty, of course, lies in defining the threshold values. Due to changing baseline values, they need to be constantly updated. Thus, there should be better ways to detect anomalies.

Another method is to use a simple sliding-window algorithm. This algorithm can (a) update the baseline value automatically based on the data, and (b) identify anomalies based on a statistical property rather than the marketer’s intuition. The parameters of such an algorithm are:

  • frequency: how often the algorithm runs, e.g. daily, weekly, or monthly. Even intra-day runs are possible but in most e-commerce cases unnecessary (an exception could be technical metrics such as server response time).
  • window size: this is the period used for updating. For example, if the window size is 7 days and the algorithm runs daily, it always computes over data from the past seven days, shifting the start and end dates forward by one day on each run.
  • statistical threshold: this is the logic for detecting anomalies. A typical approach is to (a) compute the mean of each metric within the window, and (b) compare new values to that mean, so that a difference of more than 2 or 3 standard deviations from the mean indicates an anomaly.

Thus, the threshold values automatically adjust to the moving baseline because the mean is recalculated for each window.
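The sliding-window logic can be sketched in a few lines. This is a simplified illustration; the revenue figures and the 3-sigma threshold are just examples:

```python
import statistics

def is_anomaly(window, new_value, n_sigma=3):
    """Flag new_value if it lies more than n_sigma standard
    deviations from the mean of the trailing window."""
    mean = statistics.mean(window)
    sd = statistics.stdev(window)
    return abs(new_value - mean) > n_sigma * sd

# A 7-day window of daily revenue, then two new observations.
window = [100, 102, 98, 101, 99, 103, 97]
print(is_anomaly(window, 150))  # True: far outside 3 standard deviations
print(is_anomaly(window, 101))  # False: within normal variation
```

In production the window would slide forward on each run, so the mean and standard deviation track the moving baseline automatically.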

How to interpret anomalies?

Note that an anomaly is not necessarily a bad thing. Positive anomalies occur, e.g., when a new campaign kicks off or the company achieves some form of viral marketing. Anomalies can also arise when a season breaks in. To prevent seasonal effects from registering as anomalies, one can configure the baseline to use year-on-year data instead of historical data from the current year. Regardless of whether the direction of the change is positive or negative, it is useful for a marketer to know there is a change of momentum. This helps restructure campaigns, allocate resources properly, and become aware of external effects on key performance indicators.

Platform metrics: Some ideas

I was chatting with Lauri [1] about platform research. I claimed that the research does not have many implications for real-world companies apart from the basic constructs of network effects, two-sidedness, tipping, marquee users, strategies such as envelopment, and of course many challenges, including chicken-and-egg problems, the monetization dilemma, and the remora’s curse (see my dissertation on startup dilemmas for more insights on those…).

But then I got to thinking that metrics are somewhat overlooked. Most of the platform literature comes from economics and is very theoretical and math-oriented, and it is somehow detached from practical web analytics and digital metrics. Those kinds of metrics, however, are very important for platform entrepreneurs and startup founders.

On the face of it, it seems that the only difference between platforms and “normal” businesses is their two-sidedness. If we have a supply side (Side A) and a demand side (Side B), then the metrics could be identical for each, and the main thing is to keep track of the metrics for both sides.

However, there are some dynamics at play. The company typically has one goal, one budget, and one strategy. That means those metrics, even though they can be computed separately, are interconnected.

Here are some examples of platform metrics:

  • Number of Users/Customers (Side A, Side B)
  • Revenue (Side A, Side B)
  • Growth of Revenue (Side A, Side B)
  • Cross-correlation of Number of Users and Revenue (e.g., Side A users => Side B revenue)
  • Cost per User/Customer Acquisition (Side A, Side B)
  • Support cost (Side A, Side B)
  • Average User/Customer Lifetime (Side A, Side B)
  • Average Transaction Value (Side A, Side B)
  • Engagement Volume (Side A, Side B)
  • Profitability distribution (Side A, Side B)

Note the cross-correlation example. Basically, all the metrics can be cross-correlated to analyze how the different metrics of each side affect one another. Moreover, this can be done over different time periods to increase the robustness of the findings. Such correlations can reveal important information about the dynamics of network effects and tell, for example, whether to focus on adding Side A or Side B users at a given point in time. A typical example is solving the cold-start problem by hitting critical mass, i.e., the minimum number of users required for network effects to take place (essentially, the number of users needed for the platform to be useful). Before this point is reached, all other metrics can look grim; after it, the line charts of the other metrics should turn from a flat line to linear or exponential growth, and the platform should ideally become self-sustaining.
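As a toy illustration of the cross-correlation idea (the weekly figures are invented; in practice you would test several lags and time periods):

```python
import numpy as np

# Weekly Side A user counts and Side B revenue (invented numbers).
side_a_users = np.array([100, 120, 150, 200, 260, 330], dtype=float)
side_b_revenue = np.array([10, 11, 14, 19, 25, 32], dtype=float)

# Same-week correlation between the two sides.
corr = np.corrcoef(side_a_users, side_b_revenue)[0, 1]

# Lagged correlation: do this week's Side A users relate to
# next week's Side B revenue?
lagged = np.corrcoef(side_a_users[:-1], side_b_revenue[1:])[0, 1]
print(round(corr, 3), round(lagged, 3))
```

A strong lagged correlation from Side A users to Side B revenue would suggest that growing the supply side now pays off on the demand side later.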

Basic metrics can also be used to calculate profitability, e.g.

Average Transaction Value x Average Customer Lifetime > Cost per Customer Acquisition

Business models of many startup platforms are geared towards “nickel economics,” meaning that average transaction values are very low. In such situations, the customer acquisition cost has to be low as well, or the frequency of transactions extremely high. When these rules are violated, the whole business model does not make sense. This is partly because of the competitive nature of the market, which requires sizable budgets for actual user/customer acquisition. For platforms, the situation is even more serious than for other businesses because network effects require a critical mass that costs money to achieve.
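The profitability inequality above can be turned into a simple unit-economics check. A sketch; the function name and the figures are invented:

```python
def is_viable(avg_transaction_value, transactions_per_period,
              lifetime_periods, acquisition_cost):
    """Lifetime value must exceed the cost of acquiring the customer."""
    ltv = avg_transaction_value * transactions_per_period * lifetime_periods
    return ltv > acquisition_cost

# "Nickel economics": tiny transaction values demand either a very low
# acquisition cost or a very high transaction frequency.
print(is_viable(0.05, 10, 12, 5.0))  # True:  LTV = 6.0 > 5.0
print(is_viable(0.05, 2, 12, 5.0))   # False: LTV = 1.2 <= 5.0
```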

In the real world, customer acquisition cost (CPA) cannot usually be ignored, apart from a few outliers. The CPA structure might also differ between the platform sides, and it is not self-evident which customer acquisition strategies yield the lowest CPAs; in fact, it is an empirical question. A highly skilled sales force can bring in new suppliers at a lower CPA than a digital marketer who lacks the skills or organic demand. Then again, under lucrative conditions, the CPA of digital advertising can be minuscule compared to salespeople because it scales.

However, as is apparent from the previous list, relationship durations also matter. For example, consumers can be fickle while supplier relationships last for years. This means that suppliers can generate revenue over a longer period than consumers, possibly turning a higher acquisition cost into a more profitable investment. Therefore, the churn of each side should be considered. Moreover, there are costs associated with providing customer support to each side. As Lauri noted based on his first-hand experience working for a platform company, the frequency and cost per customer encounter differ vastly by side and require different kinds of expertise from the company.

In cases where the platform has an indirect business model, also called subsidization because one side subsidizes the cost of the other, the set of metrics should differ by side. For example, if only Side B (paid users) pays the platform, but does so because Side A (free users) exists, Side B could be monitored with financial metrics and Side A with engagement metrics.

Finally, profitability distribution refers to the uneven distribution of profitable players on each market side. It is important to be aware of this structure. For example, in e-commerce it is typical that a few “killer products” account for a relatively large share of sales value, while the majority of total sales value is generated by hundreds or thousands of products with small individual sales. Understanding these dynamics of adding both killer products and average products (or complements, in platform terms) is crucial for managing platform growth.

Footnotes:

[1] Lauri Pitkänen, my best friend.

My website got hacked — here’s what I learned

My friend Vignesh alerted me earlier this week that my site had been hacked and was redirecting to a malware site.

At first, I called GoDaddy to ask for help. That was useless. It turns out their tech support consists of sales reps trying to sell you shit — no help for cleaning the site. They could do it for me for 200 pounds, which I didn’t feel like paying, especially since they advertise their contact number as support, not sales.

So, I started googling and learning how to fix the site myself.

Here are the lessons I learned

  • Only install plugins and themes you really need and use. Every theme and plugin is a potential security risk. Most likely, the hackers used one of my plugins to get into the website. I had tons of plugins and themes I didn’t use, and although I did update them every now and then, plugin creators don’t necessarily fix vulnerabilities quickly, if at all.
  • Keep WordPress core, themes, and plugins updated. As I mentioned, I updated the themes and plugins every now and then. It’s important to do that as frequently as updates roll in. However, my WordPress core is automatically updated by GoDaddy. That’s why I think an outdated plugin was probably the root cause for the hack.
  • Don’t use GoDaddy — in the process, I learned their tech support is useless. In addition, I read that WP Engine is safer than GoDaddy – they block some plugins altogether and actually fix your site for free if it’s been hacked. GoDaddy also doesn’t let you change your database password after being hacked (I’m talking about their Managed WordPress hosting, which I’m using), so even though I cleaned the website, it’s still potentially vulnerable.

Here are the steps I took for the cleaning, including some useful links to start from in case your WordPress site gets hacked.

  • To GoDaddy’s credit, I could find a message where they listed infected files on my website. I started by manually removing these 15 files.
  • .htaccess was infected. I replaced its content with the default content (code in [1]; footnote [2] shows additional rules that block external xmlrpc.php requests)
  • Removed all plugins and themes apart from the theme I’m using and the CloudFlare CDN plugin, which I need. Everything else could go.
  • Downloaded a fresh copy and reinstalled my theme from scratch (removed the whole folder and replaced with a clean one).
  • Installed the free Anti-Malware Security and Brute-Force Firewall plugin and ran the analysis. It couldn’t find any more infected files, but it suggested potentially vulnerable files. I went through these files manually one by one. They contained no suspicious code, and their edit dates did not differ from those of a clean WP installation, so they were not compromised.
  • Changed WP security tokens to log out every user.
  • Removed a spam user and changed other users’ passwords to new, strong passwords.
  • Manually checked WP core files for malicious code but couldn’t find any (comparing Last Modified times to those in a clean WP directory also helps).
  • Set up a .htaccess script that blocks php files in Upload folder [3]
  • Finally, made sure that WP + theme + plugins that remain are up-to-date.

The only things I didn’t do were (1) reinstalling the WP core (I used a virus scanner plus a manual check instead) and (2) changing the SQL password (GoDaddy doesn’t let you do that — another reason to avoid them). Moreover, (3) raw usage logs could be viewed via cPanel to find the hackers’ IPs but, again, GoDaddy doesn’t give you cPanel access on the plan I’m using.

Useful links I used

https://sucuri.net/guides/how-to-clean-hacked-wordpress

https://askwpgirl.com/10-steps-remove-malware-wordpress-site/

https://codex.wordpress.org/FAQ_My_site_was_hacked

https://www.killersites.com/community/index.php?/topic/22255-i-think-my-wordpress-site-was-hacked/

Footnotes

[1]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

[2]

# Block WordPress xmlrpc.php requests
<Files xmlrpc.php>
order deny,allow
deny from all
allow from 123.123.123.123
</Files>

[3]

<Files *.php>
deny from all
</Files>

Selling is a waste of resources – thoughts on the usefulness of sales in the big picture

A significant amount of resources in the national economy is spent on selling that produces no results or is simply unnecessary.

Selling is necessary when the customer has a need, whether he is aware of it or not. In both cases the salesperson is useful to the customer – the salesperson acts as an information broker and thus makes the market more efficient. The logic is the same as in the theory of informative advertising.

But when the product being sold is neither needed nor wanted, it is a pure waste of resources (assuming the resource, i.e. human labor, could be put to some genuinely useful work, which is not always the case).

There is no real match in that case, and as a result both the seller and the buyer waste their time. Selling can even be harmful – when a person is essentially tricked, through sheer pushiness, into buying something he never wanted.

For these reasons, salespeople are usually met with a cold attitude – time is precious, and when there is no need, the time wasted on the interaction must be minimized. I know from experience that a manager-level person spends a lot of time dealing with salespeople, and many salespeople refuse to believe there is no real need and keep pushing regardless.

The cold attitude can, however, become a problem for the buyer when the salesperson would be offering something genuinely useful, but it never gets that far because the opportunity is never allowed to surface. The buyer’s best strategy is therefore to listen to the basic pitch and then decide whether it is of interest or not. Ultimately it is a matter of judgment.

There are, however, also so-called polite buyers who, for one reason or another, never say that they are not interested or cannot buy anything. In that case the problem is reversed, and it is the seller who wastes his time.

A buyer’s response can be divided into the following main categories:

  • genuine refusal = there is no need, and further persuasion won’t help. The optimal solution for both the seller and the buyer is to move on.
  • non-genuine refusal = there is a need, but it is not recognized / the buyer refuses to listen. Here the optimal solution would be further persuasion, which benefits both parties.
  • genuine acceptance = the buyer buys because there is a need. This is the optimum for both parties.
  • non-genuine acceptance 1 = the buyer buys even though there is no need, e.g. as a result of forced selling or ignorance. In this game the buyer loses.
  • non-genuine acceptance 2 = the buyer does not buy but implies that he might. Certain buyers use salespeople e.g. to grow their own expertise with no intention of buying. The buyer then benefits at the seller’s expense; the seller loses time for nothing. Non-genuine acceptance can also result from general politeness that the seller misreads as a buying signal, or from maintaining the company’s image by giving every salesperson a fair hearing.

From the point of view of the national economy, ineffective selling is both a micro- and a macro-level problem. Micro-level, because at worst it brings down the selling organization – startups in particular lack the financial buffer for long and fruitless sales negotiations. At the macro level, the optimum is reached when people do productive work.

In conclusion, genuine relationships that never materialize are fine, but non-genuine acceptances create inefficiency.

A seller should think about the following questions:

  1. how can a non-genuine refusal be recognized?
  2. what is the right strategy for dealing with non-genuine refusals?
  3. how can a non-genuine acceptance be recognized?
  4. what is the right strategy for dealing with non-genuine acceptances?

It would be best for everyone to “put the cards on the table” and find out as quickly as possible:

  • what is the purpose of the service being sold?
  • does the customer have a genuine need for it?
  • can the customer make the purchase now? If not, when?

Ultimately, the inefficiency problems of selling come down to communication errors.

Digital analytics maturity model

The digital analytics maturity model has three stages:

  1. Concepts – here, the focus is on buzzwords and the realization that “we should do something”.
  2. Tools – here, the focus is on tools, i.e. “Let’s use this shiny new technology and it will solve all our problems.”
  3. Value – here, we finally focus on what matters: how the tools and technologies will serve and integrate with our core competitive advantage, i.e. “Guys, what’s the point?”

Applies to almost any booming technology.

Problem of continuous value in SaaS business

A major challenge for many SaaS businesses is to provide continuous value, so that the users are compelled to continue using the service.

There’s a risk of opportunism if the user can achieve his goals with one-time use; he then either uses the free trial version, or subscribes for only one month.

For example, some SEO tools enable data download, so why should I stick around after downloading the data?

This is especially pertinent if my decision making cycle is not frequent, so I don’t really need monthly data.

Potential ways to counter this effect:

  1. develop automatic insights that continuously tell users something they didn’t know, without them having to log into the system
  2. include different tiers for one-time users (e.g., a one-time report feature at a cost of xxxx USD)
  3. understand the decision-making cycles of different users, and make sure your business model is adapted to them
  4. put previously free features behind a subscription plan
  5. raise the monthly price to increase CLV even for those users that drop off after a month

The latter I’ve seen applied by many startups, e.g. SurveyMonkey, which raised its prices substantially. At the same time, though, they lost me as a customer – that’s the risk, and it can only work if they have enough high-value customers that losing my business doesn’t matter.
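As a back-of-the-envelope sketch (all numbers here are hypothetical, not SurveyMonkey’s actual figures), the price-raise tactic pays off whenever the higher price outweighs the retention it costs:

```python
def clv(monthly_price, avg_months_retained):
    # Simple customer lifetime value under a flat monthly subscription.
    return monthly_price * avg_months_retained

# Hypothetical numbers: doubling the price shortens average retention,
# yet the value captured per customer still goes up.
clv_before = clv(monthly_price=20, avg_months_retained=6)  # 120 USD
clv_after = clv(monthly_price=40, avg_months_retained=4)   # 160 USD
```

The gamble, of course, is in the retention estimate: if enough low-tolerance customers churn immediately, average retention drops further than planned and the trade turns negative.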

Number four was applied by Trello, which decreased the number of free Power-Ups to one – essentially forcing you to pay if you want to use any of them (because “Calendar” is already a Power-Up). Often, the application of these upselling tactics takes place after the startup has been sold, or when there are new investors who wish to capture a larger share of the value produced by the service. Obviously, this comes at the cost of free users who previously had a great deal (= a large share of the value provided by the startup), now reduced to a “good” or “decent” deal depending on their tolerance level.