Control Groups in Small Populations

Jim answers questions from fellow Drillers

Q: Thank you for your recent article about Control Groups.  Our organization launched an online distance learning program this past August, and I’ve just completed some student behavior analysis for this past semester.

Using weekly RF-Scores based on how recently and how frequently students have logged in to courses within the previous three weeks, I’m able to assess their “Risk Level”: how likely they are to stop using the program.  A percentage of students discontinued the program, but in retrospect, their login behavior, and changes in that behavior, gave a strong indication they were having trouble before they stopped using it completely.

A: Fantastic!  I have spoken with numerous online educators about this application of Recency – Frequency modeling, as well as with online research subscription services, which follow a similar behavioral model.  All reported great results predicting student / subscriber defection rates.
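As an illustration of the raw inputs behind scores like these, here is a minimal sketch that computes recency (days since last login) and frequency (number of logins) over a 21-day window. The three-week window comes from the question; the function names and the risk cutoffs are hypothetical, and this is not the reader’s actual RF-Score scheme, which isn’t described here.

```python
from datetime import date, timedelta

def rf_snapshot(logins, as_of, window_days=21):
    """Return (recency_days, frequency) for one student's login dates.

    recency_days: days since the most recent login inside the window
                  (None if the student did not log in during the window).
    frequency:    number of logins inside the window.
    """
    start = as_of - timedelta(days=window_days)
    recent = [d for d in logins if start < d <= as_of]
    if not recent:
        return None, 0
    return (as_of - max(recent)).days, len(recent)

def at_risk(recency_days, frequency, max_recency=10, min_frequency=3):
    # Hypothetical cutoffs: flag students who have gone quiet or log in rarely.
    if recency_days is None:
        return True
    return recency_days >= max_recency or frequency < min_frequency

as_of = date(2024, 10, 21)
logins = [date(2024, 10, 2), date(2024, 10, 5), date(2024, 10, 8)]
r, f = rf_snapshot(logins, as_of)
print(r, f, at_risk(r, f))
```

Running the same snapshot every week and watching for students whose recency grows or frequency falls is one way to detect the “changes in login behavior” described above.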

Q: I’m preparing to propose a program for the upcoming semester where we contact students by email and / or phone when their login behavior gives indication that they’re having trouble.  My hope is that by proactively contacting these students, we can resolve issues or provide assistance before things escalate to the point they defect completely.

A: Absolutely, the yield (% students / revenue retained) on a project like this should be excellent.  Plus, you will end up learning a lot about “why”, which will lead to better executions of the “potential dropout” program the more you test it.

Q: However, in light of your newsletter, I realized that we should probably have a control group with whom we do NOTHING (just as we did this past semester) in order to prove the effectiveness (or not) of the program.

A: Correct.  Otherwise, you won’t be able to make a valid claim to the “saved students”. People can always argue a variety of other factors were in play – seasonality, topic, course sequence, etc.

Q: Since the actual number of students is confidential, can you please tell me what percentage you would use for a control group if we had 400, 800, 1200, 1600, 2000, 3500, or 5000 students?  You mentioned 10% in your newsletter, but the population you were referring to numbered in the millions.

A: Well, there are online sample size calculators you can use confidentially, such as this one.

If you don’t understand the variables it asks for, there are explanations at the bottom of the page, though this is very simple: all you enter is the confidence level, the confidence interval, and the population size.
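Calculators of this kind generally use the standard sample-size formula for estimating a proportion, with a finite population correction for small populations. A minimal sketch, assuming a 95% confidence level, a ±5% interval, and the worst-case proportion p = 0.5; a specific calculator may use different defaults. Note this sizes a group for estimating a rate, which is not the same as a power calculation for detecting a test-versus-control difference.

```python
import math
from statistics import NormalDist

def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Sample size to estimate a proportion within +/- margin,
    with the finite population correction applied."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin ** 2              # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)                 # finite population correction
    return math.ceil(n)

# The population sizes from the question.
for n_students in (400, 800, 1200, 1600, 2000, 3500, 5000):
    needed = sample_size(n_students)
    print(n_students, needed, f"{needed / n_students:.0%}")
```

For 500 students this gives 218 at ±5%; tightening the interval to ±3% raises it to 341, in line with the “over 300” figure below. Either way, the required share of a small population is far above 10%.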

Q: Our population is MUCH smaller, and each customer is therefore even more critical.  I don’t want to recommend an unnecessarily large control group that would prevent us from retaining future students when we could see they were having trouble.

I suspect that our defection rates will be lower in the 2nd semester than in the 1st, since students should be past the “learning curve,” so I don’t think we can fairly claim the program alone is the reason for lower defection rates if we don’t use a control group.

A: Yes, well, this desire to “get as much test as we can” was the main point discussed in the newsletter.  And that’s the challenge with very small populations: to hit statistical confidence levels with a population of, say, 500, you can need 300 or more in control, depending on the confidence level and interval you choose.

Not so great.

So we go back to the question of company culture and how intuitively confident people will be with the results.  Do they in fact need true statistical significance for a program like this?

There is a way around the significance issue: repetition. The statistics here are all about the “likelihood you get the same results again”, which is really important for drug testing, not so much for 500 folks in a marketing program.

The question you need to ask: do you really need “prediction”?  Or does prediction just make the whole test more complex and expensive than it’s worth?  If you repeated the test a couple of times and got roughly the same results, is that “proof”?

Here is what I might do.  I would ask whoever needs to believe in the results of this test a question like this:

“Let’s say we took a random 20% sample of the students and excluded them from the marketing.  We apply the marketing to the other 80% and their retention rate is 15% higher than the 20% who had no marketing. We do this test 2 more times and the retention rate of students in the test is 13% and 17% higher than the students in the 20% who do not receive the marketing.  Would you at that point believe that without question, the marketing drives at least a 13% improvement in retention among students?”
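The random 20% holdout in that question is the core mechanic: assignment has to be random so that test and control differ only in whether they receive the outreach. A minimal sketch, assuming students are identified by simple IDs; `split_holdout` and the seed are hypothetical names for illustration.

```python
import random

def split_holdout(student_ids, holdout_share=0.20, seed=None):
    """Randomly assign students to a no-contact control group (holdout)
    and a test group that receives the outreach program.
    Hypothetical helper; returns (test, control)."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    n_control = round(len(ids) * holdout_share)
    return ids[n_control:], ids[:n_control]

test, control = split_holdout(range(500), seed=1)
print(len(test), len(control))  # 400 100
```

Fixing the seed makes a given split reproducible, which helps when someone later asks exactly who was in control.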

Do you see where I’m headed with this?  The more times you repeat the test, the more confident you will be in the results – regardless of sample sizes and statistical mumbo jumbo. At some point, the reality of the differences between test and control performance has to be accepted.  It may help to define up front how many repetitions the “boss” needs.
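A rough way to see why repetition helps is to simulate it. This sketch is a simulation, not the author’s method: it assumes a hypothetical true effect (80% retention with outreach versus 70% without) and a hypothetical 400 test / 100 control split, then reruns the test to show how much the observed spread moves from run to run while its average settles near the true 10 points.

```python
import random
from statistics import mean

def simulate_spreads(n_test=400, n_control=100, p_test=0.80, p_control=0.70,
                     repetitions=5, seed=None):
    """Simulate repeated test/control runs under hypothetical true retention
    rates and return the observed spread (test rate minus control rate) per run."""
    rng = random.Random(seed)
    spreads = []
    for _ in range(repetitions):
        retained_test = sum(rng.random() < p_test for _ in range(n_test))
        retained_control = sum(rng.random() < p_control for _ in range(n_control))
        spreads.append(retained_test / n_test - retained_control / n_control)
    return spreads

runs = simulate_spreads(repetitions=5, seed=7)
print([f"{s:+.0%}" for s in runs])
print(f"average of 2,000 runs: {mean(simulate_spreads(repetitions=2000, seed=7)):+.1%}")
```

With only 100 in control, individual runs can land several points above or below the true difference, while the average over many runs converges on it.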

There are two clues to help you evaluate the validity of your results and decide how many times you need to repeat the test to be “confident”.

One clue is the variability of the results – the more inconsistent the results are, the more likely the data is “noisy” and the more times you need to repeat the test to be confident.

If the spreads between test and control for the first 3 tests are 20%, 5%, and 10%, then you’ll need more repetitions of the test to get a good feel for the actual impact.  If the results tend to cluster as in the example above (15%, 13%, 17%), then you can be confident earlier in the test series that the actual impact is somewhere around 15%.
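The consistency clue can be put into numbers. A minimal sketch, assuming each run’s spread is recorded in percentage points; using the sample standard deviation as the measure of noise and the minimum as a conservative floor are assumptions of this sketch, not rules from the source.

```python
from statistics import mean, stdev

def summarize_spreads(spreads):
    """Summarize repeated test-minus-control spreads (percentage points)."""
    return {
        "mean": mean(spreads),
        "stdev": stdev(spreads),  # sample standard deviation: how noisy the runs are
        "worst": min(spreads),    # most conservative observed result
    }

print(summarize_spreads([15, 13, 17]))  # mean 15, stdev 2.0, worst 13
print(summarize_spreads([20, 5, 10]))   # mean ~11.7, stdev ~7.6, worst 5
```

The second series has a standard deviation almost four times larger, which is the numeric version of “you’ll need more repetitions.”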

The other clue is the width of the “spread” between test and control.  If the spread is consistently “wide”, say +10% or more, this provides additional confidence that a positive impact is being made.  The true result over a series of tests may not actually be +10% (confirm by repeating the test), but it’s likely to be positive.  If you consistently get a spread more like 1% or 2%, the actual result could well be zero or negative, and you need to keep repeating the test to gain confidence you have a positive result.
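For a single run, one conventional way (not from the source) to judge whether a spread is “wide” relative to the noise is a normal-approximation (Wald) confidence interval for the difference between two proportions. A sketch with hypothetical counts: 400 test students and 100 in control, with spreads measured in percentage points of retention.

```python
import math
from statistics import NormalDist

def spread_interval(retained_test, n_test, retained_control, n_control, confidence=0.95):
    """Normal-approximation (Wald) interval for test minus control retention rate.
    Returns (difference, lower bound, upper bound)."""
    p_t = retained_test / n_test
    p_c = retained_control / n_control
    diff = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_test + p_c * (1 - p_c) / n_control)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, diff - z * se, diff + z * se

# Hypothetical wide spread: 80% vs 70% retained; interval barely stays above zero.
print(spread_interval(320, 400, 70, 100))
# Hypothetical narrow spread: 80% vs 78% retained; interval comfortably includes zero.
print(spread_interval(320, 400, 78, 100))
```

With only 100 in control, even a 10-point spread only just clears zero on a single run, while a 2-point spread is indistinguishable from no effect; repetition is what firms either one up.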

In the end, you may not want or be able to repeat the test enough times to know with statistical confidence what the result is. But if the spread between test and control is wide and consistent, and the cost relative to the benefit is small, then does it really matter if there is statistical confidence?

For example, if you can say you’re confident the program generates at least $10 in profit for each $1 invested, does it really matter whether the statistically confident number is $11 or $12 in profit per $1 of cost? We’re doing Marketing here, not drug testing. Not rolling out a program with test results like this carries an opportunity cost (profit left on the table), so rather than repeat the test to death just to be more confident, I’d roll it out and continue to monitor the results.
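The “at least $10 per $1” reasoning amounts to plugging a conservative spread into a simple profit calculation. A minimal sketch; the function name, the 400 contacted students, the 13-point worst observed spread, the $500 margin per retained student, and the $2,000 program cost are all hypothetical inputs.

```python
def profit_per_dollar(n_test, spread, value_per_retained_student, program_cost):
    """Net incremental profit per $1 of program cost.

    spread: test-minus-control retention rate as a fraction (0.13 = 13 points).
    """
    extra_students = n_test * spread  # students retained because of the program
    net_profit = extra_students * value_per_retained_student - program_cost
    return net_profit / program_cost

# Hypothetical inputs, using the worst observed spread as a conservative floor.
print(profit_per_dollar(400, 0.13, 500, 2000))  # about 12.0
```

If even the worst run clears the threshold the business cares about, the extra precision of further testing buys little.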

One more tip, on this idea of sequencing / semesters / experience with the program.

There is no doubt in my mind that 2nd semester students would show what is called “survivor bias” and be less likely to drop out; you will get the best performance in a program like this with 1st semester students.  So if at all possible, run the test / control on 1st semester students only, or segment by semester.
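Segmenting by semester just means computing test and control dropout rates separately within each semester group, so survivor bias among later-semester students cannot blend into the comparison. A minimal sketch; the `(semester, group, dropped)` record layout is hypothetical and the toy records are made up for illustration.

```python
from collections import defaultdict

def dropout_by_segment(records):
    """records: iterable of (semester, group, dropped) tuples, where group is
    "test" or "control" and dropped is a bool.
    Returns {(semester, group): dropout rate}."""
    counts = defaultdict(lambda: [0, 0])  # (semester, group) -> [dropped, total]
    for semester, group, dropped in records:
        counts[(semester, group)][0] += int(dropped)
        counts[(semester, group)][1] += 1
    return {key: d / n for key, (d, n) in counts.items()}

# Toy, made-up records for illustration only.
records = [
    (1, "test", False), (1, "test", True), (1, "control", True), (1, "control", True),
    (2, "test", False), (2, "control", False), (2, "control", True), (2, "test", False),
]
for key, rate in sorted(dropout_by_segment(records).items()):
    print(key, f"{rate:.0%}")
```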

But just because you run it on only 1st semester students does not mean you won’t see an effect in the 2nd semester.  Continue to follow test and control into the 2nd, 3rd, and 4th semesters, and you may see the gap in dropout rate between the original 1st semester test group and control continue to widen.
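Following the original cohort means chaining each semester’s dropout rate into a cumulative retention curve for test and for control. A minimal sketch with hypothetical per-semester dropout rates; note that the gap in cumulative retention keeps widening only if the test cohort continues to drop out at a lower rate after the intervention, which is what these made-up numbers assume.

```python
def cumulative_retention(dropout_rates):
    """Share of a starting cohort still enrolled after each semester, given
    each semester's dropout rate among students still enrolled."""
    remaining, path = 1.0, []
    for rate in dropout_rates:
        remaining *= 1 - rate
        path.append(remaining)
    return path

# Hypothetical per-semester dropout rates for the original 1st-semester cohort.
test = cumulative_retention([0.10, 0.08, 0.06, 0.05])
control = cumulative_retention([0.20, 0.10, 0.08, 0.06])
for semester, (t, c) in enumerate(zip(test, control), start=1):
    print(semester, f"test {t:.1%}", f"control {c:.1%}", f"gap {t - c:+.1%}")
```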

This is not only great for the profitability of the initial 1st semester program but also provides you the baseline you have to beat (control) for those 2nd, 3rd, 4th semesters.  When you decide to see if you can have an additional effect by intervening in those periods, you’ll have 2 groups: those affected by Marketing in the 1st semester, and those new to any Marketing intervention.

My guess: a 1st semester intervention will have tremendous impact, both at the time and through the 4th semester. The impact of intervening in each subsequent semester will diminish compared with acting in the 1st semester, as will the “tail” value created over the student life, since the number of months left in the student life shrinks each semester.

Hope that helps!

