The following is from the December 2009 Drilling Down Newsletter. Got a question about Customer Measurement, Management, Valuation, Retention, Loyalty, Defection? Just ask your question. Also, feel free to leave a comment and I’ll reply.
Q: I am a big fan of your web site and read your Drilling Down book. Great work!
A: Thanks for the kind words!
Q: I was wondering if you could help me pick the right control group size for a project of ours? The population is 25 million telco customers, for which we want to do a long-term impact analysis (month by month) of revenue increase versus a control group. The marketing initiatives are a mix of retention, lifecycle and tactical/seasonal activities. We want to measure the revenue increase from any of the marketing activities compared to the control group.
A: Great project, this is the kind of idea that can really improve margins if you can find out which specific tactics drop the most profit to the bottom line.
Q: I have searched the web for some help and found calculators that say: with a population of 25 million, a smallest expected uplift of 0.1% and a highest likely rate of > 5%, the calculator gives 250k (1%). Is that sufficient to calculate the net impact on the remaining base? Would be very grateful if you could give me your thoughts.
A: Well, it could be and might not be…
If you were manufacturing widgets, where the outcomes are clear (unit is defective or not defective), you might use this approach to the question. But in Marketing we’re talking about human behavior, and there is quite a lot more variability in outcomes and more room for interpretation. You can encounter a number of problems down the road by running a control sized so “tight” to the statistical minimum.
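For reference, the arithmetic behind that kind of calculator can be roughed out as below. This is a minimal sketch, not the calculator mentioned in the question: it assumes a two-sample z-test on average monthly revenue per customer, and the standard deviation (`ASSUMED_STD_DEV`), the 80% power setting and the function name are all made up for illustration. Plug in your own numbers.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(n_control, n_test, std_dev,
                                  confidence=0.95, power=0.80):
    """Smallest test-vs-control gap in mean revenue per customer that a
    two-sided z-test would be expected to detect at the given settings."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    standard_error = std_dev * sqrt(1 / n_control + 1 / n_test)
    return (z_alpha + z_beta) * standard_error


POPULATION = 25_000_000
ASSUMED_STD_DEV = 40.0  # hypothetical spread of monthly revenue per customer

for share in (0.01, 0.05, 0.10):
    n_control = round(POPULATION * share)
    n_test = POPULATION - n_control
    gap = minimum_detectable_difference(n_control, n_test, ASSUMED_STD_DEV)
    print(f"{share:.0%} control ({n_control:,} customers): "
          f"detectable gap of about {gap:.3f} per customer per month")
```

The point of the loop is only to show the direction: under these assumptions a bigger control shrinks the smallest difference you can reliably see, which is the "science" side of the argument. The rest of this answer is about the human side.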
From a practical perspective, when you do a test of this magnitude (and I assume strategic importance), you don’t want the test to just “beat control”, you want to beat control beyond a shadow of any executive’s possible doubt.
From personal experience, I can tell you that executives tend to be non-believers with a 1% control versus a 5% control or a 10% control. So some of this control size choice is culture-based – if the exec team is a bunch of engineers that understand / believe in statistical sampling methods, then 1% is probably OK in terms of believing the results are predictive of future events.
But if you need to convince a CFO or somebody who will be working from gut or risk management rather than “science”, then 1% may not be enough; there is too much perceived “room for error” with a 1% sample (even with the science).
This is in effect a “perceived confidence interval” argument – the difference between 95% confidence and 99.999% confidence. Engineers may be OK with 95% because they intimately understand the derivation of it; CFOs not so much. CFOs may not even understand the math behind confidence, but intuitively they perceive that a 10% control is “more likely to be accurate” than 1%.
Said another way, do you want people to argue about the math and stats and waver on their belief in the outcome, or do you want them to just look at a simple chart of test versus control numbers and say, “Congratulations, that’s a tremendous success!” A 10% control gets you complete agreement on the results without any quibbling. At 1%, you may get “what about the chance we are wrong” arguments.
Now, there are financial implications to using very large controls – some positive (reduced expense), and some negative (potential revenue foregone). So choosing control group size can be impacted by these other issues. In small-population tests these financial impacts are usually negligible, so I always go for large controls.
But in a population of 25 million, maybe not so.
Which brings us to the second consideration – segmentation or “drill down” after the test.
Nothing is quite so painful as gearing up for a test of this magnitude, producing a stunning positive result on a “macro” basis across all initiatives, and then having the execs ask, “What is the driving force behind this increased profitability in the test group? Is it retention, lifecycle or tactical / seasonal?” Or as often happens in telco (usually from an ops GM or VP), “What was the result of this test in my region or on my platform?”
With a 1% control across the entire population, you frequently are “boxed in” when it comes to sub-populations because you lose significance (both perceived and scientific) as you drill in. You may be OK on a couple of large scale events on large populations, but as we know, every answer begs another question and you can run out of statistically significant answers pretty quickly. If you use a large control at the macro level, you are (as a rough example) 99% confident at the macro level, 98% confident one segment down, 97% confident two segments down, 95% confident three segments down, etc.
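To put rough numbers on the "boxed in" problem, here is a small sketch of how the margin of error on the test-versus-control difference widens as you drill down. Everything specific in it is an assumption for illustration: the drill-down fractions, the standard deviation of monthly revenue per customer, and the use of a plain 95% z-interval on the difference in means.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n_control, n_test, std_dev, confidence=0.95):
    """Half-width of the confidence interval for the difference in mean
    revenue per customer between the test and control groups."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev * sqrt(1 / n_control + 1 / n_test)


POPULATION = 25_000_000
CONTROL_SHARE = 0.01    # the 1% control from the question
ASSUMED_STD_DEV = 40.0  # hypothetical spread of monthly revenue per customer

# Hypothetical drill-down path: each step keeps a smaller slice of customers.
DRILL_DOWN = [
    ("all customers", 1.0),
    ("one region", 0.10),
    ("one platform in that region", 0.02),
    ("one campaign on that platform", 0.002),
]

for label, fraction in DRILL_DOWN:
    segment_size = round(POPULATION * fraction)
    n_control = round(segment_size * CONTROL_SHARE)
    n_test = segment_size - n_control
    moe = margin_of_error(n_control, n_test, ASSUMED_STD_DEV)
    print(f"{label}: {n_control:,} in control, "
          f"margin of error about +/- {moe:.2f} per customer")
```

When a segment is 100 times smaller, the margin of error is 10 times wider, because it grows with the square root of the shrinkage. A lift that is obvious across all customers can easily disappear inside that wider margin by the time you reach a single campaign on a single platform.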
One way to handle this is to build the test from subsegments up to the macro level. Let’s say at a minimum you want 3 subsegments in the test – retention, lifecycle and tactical / seasonal – and you want to be 95% confident in each of them. Since some of these programs are triggered by behavior (lifecycle) and some by calendar (seasonal) I’d guess the sizes of the populations and number of executions could be vastly different. Meaning, you may only need 1% control on the seasonal promotions but more like 5% or 10% control for some of the lifecycle stuff to be 95% confident on the outcomes of those.
When you sum all these segments up, you often end up with more like 2% or 3% of the entire population in control groups to always be at least 95% confident at all the desired subsegments, which means you end up with even higher confidence at the macro “all campaigns” level – a very good thing.
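The roll-up arithmetic looks something like the sketch below. The eligible population for each program is invented purely to show the mechanics, and the sketch assumes the three program populations do not overlap, which will not be strictly true in a real customer base.

```python
POPULATION = 25_000_000

# Hypothetical segments: (eligible customers, control share for that segment).
# The sizes are illustrative only and are assumed not to overlap.
SEGMENTS = {
    "tactical / seasonal": (20_000_000, 0.01),
    "lifecycle": (4_000_000, 0.05),
    "retention": (1_000_000, 0.10),
}


def control_sizes(segments):
    """Number of customers held out as control in each segment."""
    return {name: round(size * share)
            for name, (size, share) in segments.items()}


per_segment = control_sizes(SEGMENTS)
total_control = sum(per_segment.values())
blended_share = total_control / POPULATION

for name, n_control in per_segment.items():
    print(f"{name}: {n_control:,} in control")
print(f"total: {total_control:,} in control, "
      f"{blended_share:.1%} of the whole base")
```

With these made-up sizes the blended control comes to 500,000 customers, or 2% of the base, even though no single segment is run at 2%. The small, behavior-triggered segments get the heavy controls they need, and the big calendar-driven one stays cheap.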
And much better than trying to explain why you can’t answer a subsegment question because you used 250K instead of 400K or 600K in the control group, if you know what I mean! That’s when people forget the arguments about foregone revenue and start saying stuff like “Why did you not use a larger control group for this test?”
In the end, you will thank yourself again and again for using a larger than minimum required control at the macro level because you WILL come up with that unexpected “must know” question and be thrilled to find out you actually can answer it at a decent level of confidence.
Good luck with it, let me know what you learn!