Control Groups and Random Sampling

Control Groups

High ROI customer marketing depends heavily on a tool known as the control group.  A control group is a random sample of the customers targeted for a test program who will NOT be offered the test.  In medicine, the control group is the one receiving the “placebo” – the sugar pill.

Let’s say that you wanted to test a discount mailing to your best customers.  You would select all these customers for your list, and then take a random sample of them to exclude from the mailing – usually anywhere from 3% to 10% of the total.  This group is known as the control group; the others who will receive the mailing are the test group.

Why do this?  Because the control group is a random sample of the same customers, the control and test groups are statistically equivalent except for the mailing.  You can compare the buying behavior of the test group versus the control group over time to determine what the effect of your mailing is.  Taking this approach screens out a lot of external noise (like other promotions both groups may be exposed to) and gives you a true read on your profitability.
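The holdout described above can be sketched in a few lines of Python.  This is only an illustration: the function name, the customer IDs 1 through 1,000, and the 10% control fraction are assumptions made for the example, not part of any particular mailing system.

```python
import random

def split_test_control(customer_ids, control_fraction=0.10, seed=None):
    """Randomly hold out a control group; everyone else is the test group."""
    rng = random.Random(seed)              # a fixed seed makes the split repeatable
    ids = list(customer_ids)
    control_size = round(len(ids) * control_fraction)
    control = set(rng.sample(ids, control_size))   # drawn without replacement
    test = [cid for cid in ids if cid not in control]
    return test, sorted(control)

# Hypothetical list of best customers with IDs 1..1000.
best_customers = range(1, 1001)
test_group, control_group = split_test_control(best_customers, 0.10, seed=42)
print(len(test_group), len(control_group))   # 900 100
```

The test group gets the mailing; the control group is set aside and tracked the same way over the same period.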

Using control groups also allows for inclusion of typically high ROI halo effects, which are rarely measured by most people doing promotions.  Halo effects occur when people respond to a promotion outside of the business tracking process, and so are not counted as having responded.  For example, you send a discount and the customer loses it but makes a purchase anyway because you “reminded” them of a need they had.  Typically, all anybody measures is tracked response, which does not give a true read on ROI.

You cannot measure halo effects without a control group.  If you aren’t using controls, you are short-changing yourself, because the promotion could be many times more profitable if you include the halo effects.
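As a rough sketch of how a control group surfaces sales that response tracking misses, the example below compares sales per customer in the two groups.  All figures and names here are hypothetical, invented for the illustration, and the calculation assumes both groups were tracked over the same period.

```python
def incremental_readout(test_sales, test_size, control_sales, control_size,
                        tracked_response_sales):
    """Estimate incremental sales from a test vs. control comparison."""
    test_per_customer = test_sales / test_size
    control_per_customer = control_sales / control_size
    # Lift per customer, scaled up to the whole test group.
    incremental_sales = (test_per_customer - control_per_customer) * test_size
    # Gap between the control-based estimate and what response tracking counted.
    untracked_sales = incremental_sales - tracked_response_sales
    return incremental_sales, untracked_sales

# Hypothetical figures, for illustration only.
incremental, untracked = incremental_readout(
    test_sales=270_000, test_size=9_000,
    control_sales=25_000, control_size=1_000,
    tracked_response_sales=30_000,
)
print(incremental, untracked)   # 45000.0 15000.0
```

In this made-up case, tracked response would report 30,000 in sales while the control comparison suggests 45,000.  The gap is one rough reading of the halo effect.  It can also come out negative when tracked responders would have bought anyway.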

Want more detail on this topic? See the Control Group Series.

Random Sampling

Most controlled testing in database marketing requires the creation of a random sample of your customer base, either for the test group (targets receiving the mailing) or for the control group (those not receiving the mailing).

When you are testing new concepts, you usually don’t want to blow a whole bunch of money, so a random sample of the target group is created for the mailing (test group), and the rest of the target group acts as control.  When you are going with proven high ROI concepts, you want to mail as many pieces as possible (test group), so the random sample is created to act as control.

For the first case, when testing new concepts, the larger the random sample is on a percentage basis, the more accurate its predictive power will be.  You want the results of a test to be repeatable – if it works, you want to do it again.  The larger the sample is, the more likely the results of the test can be repeated on the next mailing.

Three percent will give you a pretty good shot.  Larger samples will cost more to mail but will add extra stability to the predictive power of the sample.  Smaller samples could result in unstable predictive power; for example, the promotion might make money the first time but lose money when repeated.

If you can afford it, go to 10%; 5% is good, but 3% is OK.  In general, the smaller your database, the higher the percentage you should take for a test, to even out the instability that comes from testing small databases (under 5,000 customers).  If you have only 1,000 customers, consider a 20% test, or if you can afford it, run the test to every customer not in the control group.
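One way to see why small databases call for a higher percentage is the standard error of an observed response rate, sqrt(p(1 - p) / n), which depends on the number of customers n in the sample.  The sketch below is a simplification: it treats each customer as an independent yes/no responder, and the 2% response rate is an assumed figure chosen only for illustration.

```python
import math

def response_rate_std_error(sample_size, response_rate):
    """Standard error of an observed response rate (simple binomial model)."""
    return math.sqrt(response_rate * (1 - response_rate) / sample_size)

ASSUMED_RESPONSE_RATE = 0.02   # hypothetical; substitute your own history

for database_size in (1_000, 5_000, 100_000):
    for fraction in (0.03, 0.10, 0.20):
        n = round(database_size * fraction)
        se = response_rate_std_error(n, ASSUMED_RESPONSE_RATE)
        print(f"{database_size:>7} customers, {fraction:.0%} sample "
              f"(n={n}): +/- {se:.2%}")
```

Under these assumptions, a 3% sample of 1,000 customers is only 30 people, so its observed response rate swings far more from one mailing to the next than a 3% sample of 100,000 customers.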

In the second case, tracking proven high ROI concepts, the larger the control group sample is, the more reliable and repeatable the results of the promotion will be.  Early on in the life of a promotion, it is a good idea to use a “fat” control group, just to make sure the ROI is tracking.  Over time, you can reduce the size of the control group when you are confident the results are stable.

These tests are extremely important events, as the information gained is used extensively down the line.  Don’t skimp on a test if you can help it.  Also make sure the sample is truly random and free of bias.  Bias means the selection method has distorted the sample, so that it no longer represents the customer base it was drawn from.

Here’s an example of introducing bias during random sample selection:

Let’s say you have 1,000 customers, and they were consecutively assigned customer IDs, meaning your oldest customers have the lowest ID numbers.  You want a 10% sample, or 100 customers.  Your customers happen to be sorted by customer ID, and you start choosing customers with customer ID 1 and select every 5th customer.  You would have the 100 customers you need by about customer ID 500.  But your sample would be biased, because the customer group you have selected has a higher percentage of old customers than the entire customer base.

The customer base was sorted by ID, meaning your oldest customers have the lowest ID and newest customers the highest ID.  You stopped choosing at 500, instead of choosing through the entire customer base; this creates the bias towards older customers.  

If you had selected every 10th customer instead, you would have ended with your most recent customer and have an even sample with no bias against representation by a particular customer group.  Bias can occur geographically, by product type, and so on.  Be careful with the way a database is sorted if you are using a “choose every Nth customer” random selection technique.
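The every-5th versus every-10th example can be checked with a short sketch.  The helper name is made up for the illustration, and the IDs 1 through 1,000 follow the example above.

```python
def every_nth(sorted_ids, step, needed):
    """Take every `step`-th ID from the start, stopping once `needed` are chosen."""
    return sorted_ids[::step][:needed]

customer_ids = list(range(1, 1001))       # sorted oldest (1) to newest (1000)

every_5th = every_nth(customer_ids, 5, 100)
every_10th = every_nth(customer_ids, 10, 100)

def share_from_older_half(sample):
    """Fraction of the sample drawn from IDs 1..500 (the older customers)."""
    return sum(1 for cid in sample if cid <= 500) / len(sample)

print(max(every_5th), share_from_older_half(every_5th))     # 496 1.0
print(max(every_10th), share_from_older_half(every_10th))   # 991 0.5
```

Selecting every 5th customer fills the sample entirely from the older half of the base, while every 10th reaches across the whole base and draws half its sample from each half.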

A convenient way to generate a random sample, if you use consecutively numbered customer IDs, is to pick a digit location from the customer ID, and specify a value for it.  Then choose every customer with this value at the specified location in the ID.  You’ll get a 10% sample.  For example, “give me everybody whose customer number ends in 2” or “give me everybody having a 4 in the second to last digit location”.  For this to work, the digit location you pick must have at least one digit location to its left in use, so that it cycles through 0 to 9 many times across the customer base.  For example, if you have 5,349 total customers, you could use any of the last 3 digit locations (right of the comma in 5,349) but not the lead (left-most) digit location.  Using the left-most digit would introduce bias, since each lead digit picks out one consecutive block of IDs (for example, 4000 through 4999) rather than an even 10% spread across old and new customers.
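A minimal sketch of the digit-location technique, using the 5,349-customer example.  The function name and the convention of counting digit locations from the right (0 = last digit) are assumptions made for the illustration.

```python
def select_by_digit(customer_ids, position_from_right, digit):
    """Keep IDs whose digit at the given location (0 = last digit) equals `digit`."""
    return [cid for cid in customer_ids
            if (cid // 10 ** position_from_right) % 10 == digit]

customer_ids = range(1, 5350)             # 5,349 consecutively numbered customers

ends_in_2 = select_by_digit(customer_ids, 0, 2)         # last digit is 2
second_last_is_4 = select_by_digit(customer_ids, 1, 4)  # second to last digit is 4
lead_digit_is_4 = select_by_digit(customer_ids, 3, 4)   # left-most digit is 4

print(len(ends_in_2), len(second_last_is_4), len(lead_digit_is_4))  # 535 540 1000
```

The first two selections each come out close to 10% of 5,349 and are spread across the whole ID range.  The lead-digit selection returns 1,000 customers, all from the single block 4000 through 4999.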
