The following is from the March 2011 Drilling Down Newsletter. Got a question about Customer Measurement, Management, Valuation, Retention, Loyalty, Defection? Just ask your question. Also, feel free to leave a comment and I’ll reply.
Want to see the answers to previous questions? Here’s the blog archive; the pre-blog newsletter archives are here.
Q: We’ve been playing around with Recency / Frequency scoring in our customer email campaigns as described in your book. To start, we’re targeting best customers who have stopped interacting with us. I have just completed a piece of analysis showing that, after one of these targeted emails:
1. Purchasers increased 22.9%
2. Transactions increased 69%
3. Revenue increased 71%
A: There you go!
Q: My concern is that what I am seeing is merely a seasonal effect: our revenue peaks in July and August. So what I should have done is use a control group as you described in the book, which is what I am doing for the October email.
A: Yep, that’s exactly what control groups are for: to strain out the noise of seasonality, other promotions, etc. But don’t beat yourself up over it; there’s nothing wrong with poking around and trying to figure out where the levers are first.
Q: Two questions:
1. What statistical test do I use to demonstrate that the observed changes are not down to chance?
2. How big should my control group be? Typically our cohort is 500-800 individuals.
A: Good questions…
On a group that small, you are probably not going to get anything “statistically significant” without ruining your total profit; e.g., you might have to hold out 50% as control. If you have the leeway to do it, that’s what I would do.
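To see why a 500-800 person cohort struggles to reach significance, here is a rough sample-size sketch using the standard two-proportion normal approximation. The function name, the 20% baseline response and the 25% target response are illustrative assumptions, not figures from this campaign.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_test: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate customers needed in EACH of two equal-sized groups to detect
    the difference between two response rates with a two-sided z-test."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    effect = p_test - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical: 20% baseline response, hoping to detect a lift to 25%
print(n_per_group(0.20, 0.25))
```

Under those assumptions the answer comes out at roughly 1,100 customers per group, which is more than the entire cohort; that is the practical reason significance is hard to reach here.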
On the other hand, in some company cultures people will go bonkers over giving up sales to learn something really important. OK, so take 10% as control and repeat the test 3 times; if the results are stable, then you have your proof. Run another control every once in a while (every 6 months?) just to make sure the result still holds.
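The “10% control, repeated 3 times” approach depends on each holdout being chosen at random. A minimal sketch of such a split, assuming customers are identified by simple IDs (the function name and the 600-customer cohort are hypothetical):

```python
import random


def split_holdout(customer_ids, control_share=0.10, seed=None):
    """Randomly split a cohort into (test, control) lists.
    control_share is the fraction held out and NOT mailed."""
    ids = list(customer_ids)
    rng = random.Random(seed)          # seeded so a split can be reproduced later
    rng.shuffle(ids)
    n_control = round(len(ids) * control_share)
    return ids[n_control:], ids[:n_control]


# Three repeats of a 10% holdout on a hypothetical 600-customer cohort
cohort = range(1, 601)
for run in range(3):
    test, control = split_holdout(cohort, 0.10, seed=run)
    print(run, len(test), len(control))
```

In practice each campaign drop would get a fresh split of that drop's cohort; comparing the test-versus-control gap across the three runs is the stability check described above.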
Either way, you don’t really need statistics.
Practically, confidence is the likelihood that a sample represents the population. This can be a really useful idea when you are forced into very small sample sizes or when the event is highly risky to repeat. But here, if you are testing a really large slug of the population, confidence is less useful. Or if you can repeat the event (because, essentially, you control it and it’s low risk), do you really need to put yourself through the wringer of complying with the statistical math? It’s like using a 727 for crop dusting: overkill for the situation, methinks.
If you were running a drug manufacturing line, statistical concepts like confidence and significance would be absolutely valuable. But for a marketing program?
That’s why I love the idea of “beefy controls” in start-up projects: I *do not* have to rely on statistics that the audience likely does not understand and that leave room to question the results, e.g. “Yeah, but what if the result is an outlier?” Statistics are very appropriate in high-risk situations, with giant populations and a lot of money on the line. For this situation, perhaps not. But if you’d like to go that way, there are lots of calculators on the web that let you play with the numbers anyway.
Here’s one; make sure to read the descriptions of the variables underneath the calculator:
Nice work on the core campaign idea, by the way! Now we just have to tighten it up a bit…
(3 months later)
Q: We decided to tighten the targets and do a “best customer defection” email program. Basically, we look at customers who had an RFM score of 555 in the previous scoring period and who have dropped out of that score.
A: Interesting! So instead of targeting by guessing the current score of a defecting best customer (say 355), you are looking for all customers who were formerly best customers, regardless of current score. This is a subtle difference, but it is much more of a LifeCycle approach, and frankly it’s why I prefer these kinds of ideas over “straight” RFM.
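The selection rule described above (555 last period, anything else now) is simple to express. A minimal sketch, assuming scores are stored per customer ID as three-character strings in two dictionaries, one per scoring period; the function and variable names are hypothetical:

```python
def defecting_best(prior_scores: dict[str, str],
                   current_scores: dict[str, str]) -> list[str]:
    """Customers scored '555' in the prior period who are no longer '555' now.
    A customer missing from current_scores is treated as having dropped out."""
    return sorted(
        cid for cid, score in prior_scores.items()
        if score == "555" and current_scores.get(cid) != "555"
    )


prior = {"A": "555", "B": "555", "C": "455", "D": "555"}
current = {"A": "555", "B": "355", "C": "555"}
print(defecting_best(prior, current))
```

Note that customer C, who has just *become* a 555, is not selected; the rule keys on the former State, whatever the current score happens to be.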
An example might be helpful. Let’s say the acquisition folks run a huge new-customer campaign between the prior RFM scoring and the scoring done just before your campaign drop. A big inflow of new customers can artificially “force” certain groups of existing customers down in score, even though their own behavior has *not changed*. In that case the new score does not reflect actual behavior, so it adds noise to the system.
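Here is a small illustration of that effect, assuming Recency is scored by quintile rank (top 20% most recent get a 5, and so on). Quintile ranking is a common RFM convention, but the exact scoring method, the function name and the customer data below are assumptions for illustration:

```python
def recency_quintiles(days_since_last: dict[str, int]) -> dict[str, int]:
    """Score 5 = most recent 20% of customers, 1 = least recent 20%.
    Scores are relative: each customer is ranked against everyone else."""
    ranked = sorted(days_since_last, key=lambda cid: days_since_last[cid])
    n = len(ranked)
    return {cid: 5 - (i * 5) // n for i, cid in enumerate(ranked)}


# Ten existing customers, last purchase 10, 20, ..., 100 days ago (hypothetical)
existing = {f"c{i}": 10 * (i + 1) for i in range(10)}
# A new-customer campaign adds ten people who all bought 5 days ago
new = {f"n{i}": 5 for i in range(10)}

before = recency_quintiles(existing)
after = recency_quintiles({**existing, **new})
print(before["c1"], after["c1"])  # same behavior, lower score after the inflow
```

Customer c1 bought 20 days ago in both runs, yet drops from a Recency score of 5 to 3 purely because the population being ranked changed.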
That’s the problem with the “Snapshot” or date-specific view of Customer State: it’s a single point without reference. By using the prior score, you are acknowledging behavior over time and the primary importance of the former State, as opposed to the current State; a Movie as opposed to a Snapshot.
In other words, from a Marketing perspective, I’m more interested in the path they are taking through the LifeCycle than any particular point in time during the LifeCycle represented by a single RFM score.
Q: Good news: we followed your advice. We ran a 50% control (500 customers in each group) and the results really nailed the issue for us. The purchase rate was unchanged at 20% in both groups, but Total Revenue and Average Spend increased by 40% compared to control.
(Jim’s Note: for those not following, a very precise target group of 1000 was split into 2 groups of 500. One group received this campaign, the other did not. People who did not receive the campaign purchased at the same rate as people who did receive the campaign, but the people who received the campaign averaged 40% higher spend).
A: Awesome. So what you are seeing is that Customer State makes a huge difference in terms of which offers and timing can be most effective for this “Recently defecting best customers” cohort. If I’m reading your numbers correctly, there was no lift in response versus control but a huge lift in revenue.
To me, that means these customers are early in the process of defection: still buying but, without special treatment, slowing down their monthly spend. After all, they are very Recent (former 5XX), so highly likely to purchase again, which is why lift in response was flat; they likely would have purchased anyway.
Not a bad time to hit them. Offers to a very Recent State should focus on increasing order value, not generating response; you don’t want to spit into the wind, you want to go with the natural flow of the behavior.
In other words, these customers likely would have purchased anyway, but at lower price points if they had not received the campaign. The common way to address this is with “threshold” discounts: if the average order is $50, then something like “$10 off any purchase over $50”. Test different thresholds to maximize profitability.
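Reading a test against its control comes down to comparing per-customer rates, which keeps the comparison fair even when group sizes differ. A minimal sketch; the class and function names are hypothetical, and the revenue totals in the example are invented to reproduce the reported pattern (equal 20% response, 40% more revenue):

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    customers: int     # people in the group
    purchasers: int    # people who bought during the measurement window
    revenue: float     # total revenue from the group


def lift(test: GroupResult, control: GroupResult) -> dict[str, float]:
    """Relative lift of test over control, computed per customer / per purchaser."""
    def response_rate(g: GroupResult) -> float:
        return g.purchasers / g.customers

    def revenue_per_customer(g: GroupResult) -> float:
        return g.revenue / g.customers

    def average_spend(g: GroupResult) -> float:
        return g.revenue / g.purchasers

    return {
        "response_lift": response_rate(test) / response_rate(control) - 1,
        "revenue_per_customer_lift":
            revenue_per_customer(test) / revenue_per_customer(control) - 1,
        "average_spend_lift": average_spend(test) / average_spend(control) - 1,
    }


# 500 per group, 20% purchasing in each; revenue totals are hypothetical
print(lift(GroupResult(500, 100, 14_000.0), GroupResult(500, 100, 10_000.0)))
```

A zero response lift alongside a positive spend lift is exactly the “no lift in response, huge lift in revenue” reading above.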
Looks like you gave them the right offer ;)
On the other hand, a straight discount to this specific best customer group – $10 off anything, and especially when their normal category of purchase is promoted to them – almost ensures that you will lose money. Why? Most of these customers would have bought at full price anyway, as demonstrated by equal buying activity whether the customer received the campaign or not. So the discount turns into a loss versus no campaign at all.
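The subsidy cost is easy to see with a little arithmetic. A minimal sketch, assuming equal-sized test and control groups, one order per purchaser, and that every test-group order redeems the discount; the function name, the 40% gross margin and the $50 order are hypothetical inputs:

```python
def incremental_profit(purchasers_test: int, avg_order_test: float,
                       purchasers_control: int, avg_order_control: float,
                       margin_rate: float, discount_per_order: float,
                       campaign_cost: float = 0.0) -> float:
    """Gross profit of the mailed group minus that of an equal-sized control.
    Assumes one order per purchaser and that every test order takes the discount."""
    test_profit = (purchasers_test
                   * (avg_order_test * margin_rate - discount_per_order)
                   - campaign_cost)
    control_profit = purchasers_control * avg_order_control * margin_rate
    return test_profit - control_profit


# Straight "$10 off anything": same buyers, same $50 orders as control
print(incremental_profit(100, 50.0, 100, 50.0,
                         margin_rate=0.40, discount_per_order=10.0))
```

With no change in behavior, the result is minus the full discount paid out (100 purchasers × $10), i.e. a loss versus no campaign at all. Plugging in real margins, redemption counts and campaign costs is one way to check whether a given revenue lift actually covered the offer.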
Unfortunately, I see a lot of this exact type of campaign delivered to best customers because all customers get some version of the same offer. “Hey Jim, we’re not sending the same message to every customer, we send different messages by segment”. Sure, the copy and art are customized for different segments, but the segmentation is not by Customer State, so the offers are mismatched and suboptimal.
This is the value of using control groups; they drive understanding of Marketing concepts like opportunity costs and subsidy costs. These two concepts are the reasons why ignoring Customer State is suboptimal: by not segmenting using State, you will get lower-than-possible profit or sales from most customers, through opportunity cost or subsidy cost depending on their Customer State.
Had you not delivered a campaign tailored for prior Customer State, money would have been left on the table by way of lower order size. And 40% Revenue lift sounds like it might have covered the cost of the campaign ;)
Q: We tried to run a Student’s t-test on the results, but our new statistician informed me that the distributions were not normal, so on her advice we ran a Wilcoxon test, which gave us a highly significant p = 0.016.
A: Oh, so you still went the stats route? Well, the fact that you HAVE a statistician tells me the culture there is more familiar with interpreting these ideas, so more power to you.
Glad it worked out and keep me informed on how things go downstream.
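For readers wondering what that Wilcoxon test does: with two independent groups such as test and control, the relevant variant is the Wilcoxon rank-sum test (equivalently, Mann-Whitney U), which compares ranks rather than raw spend and so does not assume normality. The questioner’s exact software and settings are not stated; this is a minimal pure-Python sketch using the normal approximation with a tie correction (spend data with many zero-spend customers produces lots of ties). Library routines such as SciPy’s `scipy.stats.mannwhitneyu` may apply a continuity correction or an exact method, so their p-values can differ slightly.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation
    with tie correction. Requires both groups non-empty. Returns (U for x, p)."""
    n1, n2 = len(x), len(y)
    values = sorted(list(x) + list(y))
    n = len(values)
    ranks = {}        # value -> average 1-based rank of its tie group
    tie_term = 0      # sum of (t^3 - t) over tie groups, for the variance
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                              # size of this tie group
        ranks[values[i]] = (i + 1 + j) / 2     # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                             # every value identical: no evidence
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

Usage would be `rank_sum_test(test_spend, control_spend)`, where each list holds one spend figure per customer in that group (hypothetical names).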