Control Groups in Small Populations

February 5th, 2010

The following is from the January 2010 Drilling Down Newsletter.  Got a question about Customer Measurement, Management, Valuation, Retention, Loyalty, Defection?  Just ask your question.  Also, feel free to leave a comment and I’ll reply.

Want to see the answers to previous questions?  Here’s the blog archive; the pre-blog newsletter archives are here.

Q: Thank you for your recent article about Control Groups.  Our organization launched an online distance learning program this past August, and I’ve just completed some student behavior analysis for this past semester.

Using weekly RF-Scores based on how Recently and Frequently they’ve logged in to courses within the previous three weeks, I’m able to assess their “Risk Level”: how likely they are to stop using the program.  We had a percentage who discontinued the program, but in retrospect, their login behavior and the changes in that behavior gave a strong indication they were having trouble before they completely stopped using it.

A: Fantastic!  I have spoken with numerous online educators about this application of Recency – Frequency modeling, as well as people running online research subscriptions, a similar behavioral model.  All reported great results predicting student / subscriber defection rates.

Q: I’m preparing to propose a program for the upcoming semester where we contact students by email and / or phone when their login behavior gives indication that they’re having trouble.  My hope is that by proactively contacting these students, we can resolve issues or provide assistance before things escalate to the point they defect completely.

A: Absolutely, the yield (% students / revenue retained) on a project like this should be excellent.  Plus, you will end up learning a lot about “why”, which will lead to better executions of the “potential dropout” program the more you test it.

Q: However, in light of your newsletter, I realized that we should probably have a control group with whom we do NOTHING (just as we did this past semester) in order to prove the effectiveness (or not) of the program.

A: Correct.  Otherwise, you won’t be able to make a valid claim to the “saved students”. People can always argue a variety of other factors were in play – seasonality, topic, course sequence, etc.

Q: Since the actual number of students is confidential, can you please tell me what percentage you would use for a control group if we had 400, 800, 1200, 1600, 2000, 3500, or 5000 students?  You mentioned 10% in your newsletter, but the population you were referring to exceeded millions.

A: Well, there are online calculators you can use confidentially, example right here.

If you don’t understand the variables they ask for, explanations are at the bottom of the page, though the inputs are very simple: confidence level, confidence interval, and population size.
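For readers who want to see the arithmetic, here is a minimal sketch of what calculators of this kind typically compute. It assumes the standard sample size formula for estimating a proportion, with a finite population correction and the worst-case proportion p = 0.5. Whether the linked calculator uses exactly this method is an assumption, and the function name `sample_size` is made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    margin     -- confidence interval half-width (0.05 means +/- 5 points)
    confidence -- confidence level (0.95 means 95%)
    p          -- assumed proportion; 0.5 is the most conservative choice
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size if the population were effectively unlimited
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: small populations need fewer, but a
    # much larger share of the total
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Population sizes taken from the question above
for students in (400, 800, 1200, 1600, 2000, 3500, 5000):
    n = sample_size(students)
    print(f"{students:>5} students: {n} in sample ({n / students:.0%})")
```

The required count barely grows as the population grows, so the required share is what hurts in a small population. Tightening the interval makes it worse: under these assumptions a population of 500 at ±3% needs more than 300.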

Q: Our population is MUCH smaller, and each customer is therefore even more critical.  I don’t want to recommend an unnecessarily large control group that would prevent us from retaining future students when we could see they were having trouble.

I suspect that our defection rates will be lower 2nd semester than 1st since students should be beyond the “learning curve,” so I don’t think we can justly say that the program alone is the reason for lower defection rates if we don’t use a control group.

A: Yes, well, this desire to “get as much test as we can” was the main point discussed in the newsletter.  And that’s the challenge with very small populations – to hit statistical confidence levels at say population = 500, you need over 300 or so in control.

Not so great.

So we go back to the question of company culture and how intuitively confident people will be with the results.  Do they in fact need true statistical significance for a program like this?

There is a way around the significance issue – repetition. The stats part of this is all about the “likelihood you get the same results again” – real important for drug testing, not so much for 500 folks in a marketing program.

The question you need to ask: do you really need “prediction”?  Or does prediction just make the whole test more complex and expensive than it’s worth?  What if you repeated the test a couple of times and got roughly the same results, is that “proof”?

Here is what I might do.  I would ask whoever needs to believe in the results of this test a question like this:

“Let’s say we took a random 20% sample of the students and excluded them from the marketing.  We apply the marketing to the other 80% and their retention rate is 15% higher than the 20% who had no marketing. We do this test 2 more times and the retention rate of students in the test is 13% and 17% higher than the students in the 20% who do not receive the marketing.  Would you at that point believe that without question, the marketing drives at least a 13% improvement in retention among students?”

Do you see where I’m headed with this?  The more times you repeat the test, the more confident you will be in the results – regardless of sample sizes and statistical mumbo jumbo. At some point, the reality of the differences between test and control performance has to be accepted.  It may help to define up front how many repetitions the “boss” needs.
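The mechanics of one repetition can be sketched as follows. This is an illustration only: the student ids, the seed, and the function names are hypothetical, and "15% higher" is read here as a difference in percentage points between the two retention rates.

```python
import random


def split_holdout(student_ids, control_share=0.20, seed=2010):
    """Randomly assign students to test (contacted) or control (no contact)."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    n_control = round(len(shuffled) * control_share)
    return shuffled[n_control:], shuffled[:n_control]   # (test, control)


def retention_spread(retained, test, control):
    """Test retention rate minus control retention rate, in percentage points."""
    test_rate = sum(s in retained for s in test) / len(test)
    control_rate = sum(s in retained for s in control) / len(control)
    return 100 * (test_rate - control_rate)


students = range(1, 501)               # hypothetical student ids
test, control = split_holdout(students)
print(len(test), "in test,", len(control), "in control")
```

At the end of the semester, `retained` would be the set of students still active, and `retention_spread(retained, test, control)` gives the spread for that repetition. The random assignment is the important part: hand-picking the control group invites exactly the "other factors" arguments the control is meant to shut down.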

There are two clues to help you evaluate the validity of your results / how many times you need to repeat the test to be “confident”.

One clue is the variability of the results – the more inconsistent the results are, the more likely the data is “noisy” and the more times you need to repeat the test to be confident.

If the spreads between test and control for the first 3 tests are 20%, 5%, and 10%, then you’ll need more repetitions of the test to get a good feeling for the actual impact.  If the results tend to cluster as in the example above (15%, 13%, 17%) then you can be more confident earlier in the test series that the actual impact is somewhere around 15%.

The other clue is in the “spread” between test and control.  If the spread is consistently “wide”, say +10% (or more), this provides additional confidence a positive impact is being made.  The result over a series of tests may not actually be +10% (confirm by repeating the test), but it’s more likely to be positive.  If you consistently get a spread more like 1% or 2%, it’s more likely the actual result could be zero or negative and you need to keep repeating the test to gain confidence you have a positive result.
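Both clues are easy to put numbers on. Here is a small sketch using the two example series above; the function name `summarize` and the choice of standard deviation as the measure of "noise" are assumptions made for illustration, not a formal test.

```python
from statistics import mean, stdev


def summarize(spreads):
    """Average spread and how much it moves between repetitions."""
    return {
        "mean": mean(spreads),                       # typical size of the spread
        "stdev": stdev(spreads),                     # clue 1: variability
        "min": min(spreads),                         # worst repetition so far
        "all_positive": all(s > 0 for s in spreads)  # clue 2: always above zero?
    }


clustered = [15, 13, 17]   # spreads from the example question to the "boss"
noisy = [20, 5, 10]        # spreads from the inconsistent example

for name, spreads in (("clustered", clustered), ("noisy", noisy)):
    s = summarize(spreads)
    print(f"{name}: mean {s['mean']:.1f}, stdev {s['stdev']:.1f}, min {s['min']}")
```

The clustered series moves much less from test to test than the noisy one, which is the sense in which fewer repetitions are needed before the result is believable.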

In the end, you may not want or be able to repeat the test enough times to know with statistical confidence what the result is.  But if the spread between test and control is wide and consistent, and the cost relative to the benefit is small, then does it really matter if there is statistical confidence?

For example, if you can make the statement you’re confident the program generates at least $10 in profit for each $1 invested, does it really matter if the statistically confident  number is $11 or $12 profit for $1 in cost?  We’re doing Marketing here, not drug testing.  There is an opportunity cost (profit left on the table) to not rolling out a program based on a test with results like this; rather than repeat the test to death just to be more confident I’d roll it out and continue to monitor the results.

One more tip, on this idea of sequencing / semesters / experience with the program.

There is no doubt in my mind that 2nd semester students would have what is called a “survivor bias” and be less likely to drop out; you will get the best performance in a program like this with 1st semester students.  So if at all possible, run the test / control on only 1st semester students, or segment by semester.

But, just because you run it on only 1st semester students does not mean you don’t have an effect in 2nd semester.  Continue to follow test and control into 2nd, 3rd, 4th semesters and you may see the dropout rate of the original 1st semester group continue to widen versus control.

This is not only great for the profitability of the initial 1st semester program but also provides you the baseline you have to beat (control) for those 2nd, 3rd, 4th semesters.  When you decide to see if you can have an additional effect by intervening in those periods, you’ll have 2 groups: those affected by Marketing in the 1st semester, and those new to any Marketing intervention.

My guess: a 1st semester intervention will have tremendous impact, both then and throughout the 4th.  The impact of intervention at each subsequent semester will diminish compared with acting in 1st semester, as will the “tail” value created over the student life, since the number of months left in the student life is shrinking each semester.

Hope that helps!

Jim

Acting on Buyer Engagement

January 21st, 2010

Over the years I’ve argued that there is a single, easy to track metric for buyer engagement – Recency.  Though you can develop really complex models for purchase likelihood, just knowing “weeks since last purchase” gets you a long way to understanding how to optimize Marketing and Service programs for profit.

Which brings me to the latest Marketing Science article I have reviewed for the Web Analytics Association, Dynamic Customer Management and the Value of One-to-One Marketing, where the researchers find “customized promotions yield large increases in revenue and profits relative to uniform promotion policies”.  And what variable is most effective when customizing promotions?

The researchers took 56 weeks of purchase behavior from an online store, and used the first 50 weeks to construct a predictive model of purchase behavior.   Inputs to the model included Price, presence of Banner Ads, 3 types of promotions, order sizes, number of orders, merchandise category, demographics, and weeks since last purchase (Recency).

The last 6 weeks of data were used to test the predictive power of the model, and the answer to which variable is most predictive of purchase is displayed in the chart below:

Weeks since last purchase dominated the predictive power of the model, controlling not only the Natural purchase rate (labeled Baseline in chart above, people who received no promotions) but the response to all three different types of promotion.
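You can run a much cruder version of this check on your own data. The sketch below is not the researchers' model; it simply tabulates, for each value of weeks since last purchase at the end of week 50, the share of customers who bought again in weeks 51 to 56. The function name and the sample purchases are hypothetical.

```python
from collections import defaultdict

FIT_WEEKS = 50      # weeks 1-50: the period used to build the model
TOTAL_WEEKS = 56    # weeks 51-56: held out to test it


def recency_vs_holdout(purchases):
    """purchases: iterable of (customer_id, week) pairs, week in 1..TOTAL_WEEKS.

    Returns {weeks_since_last_purchase: share who bought in the holdout weeks}.
    Customers with no purchase in the fit period are ignored.
    """
    last_fit_week = {}
    bought_in_holdout = set()
    for customer, week in purchases:
        if week <= FIT_WEEKS:
            last_fit_week[customer] = max(week, last_fit_week.get(customer, 0))
        else:
            bought_in_holdout.add(customer)

    buyers = defaultdict(int)
    totals = defaultdict(int)
    for customer, week in last_fit_week.items():
        recency = FIT_WEEKS - week               # weeks since last purchase
        totals[recency] += 1
        buyers[recency] += customer in bought_in_holdout
    return {r: buyers[r] / totals[r] for r in sorted(totals)}


# Hypothetical purchase log: (customer, week of purchase)
sample = [("a", 49), ("a", 52), ("b", 48), ("b", 49),
          ("c", 10), ("d", 10), ("d", 55), ("e", 30)]
print(recency_vs_holdout(sample))
```

With real volumes, a table like this is usually enough to see whether the holdout purchase rate falls as Recency grows, before investing in anything more complex.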

Read the rest of this entry »

Choosing the Size of Control Groups

December 29th, 2009

The following is from the December 2009 Drilling Down Newsletter.

 Q:  I am a big fan of your web site and read your Drilling Down book. Great work!

A:  Thanks for the kind words!

Q:  I was wondering if you could help me pick the right control group size for a project of ours?  The population is 25 million telco customers, for which we want to do a long-term impact analysis (month by month) of revenue increase versus a control group.  The marketing initiatives are a mix of retention, lifecycle and tactical/seasonal activities.  We want to measure the revenue increase from any of the marketing activities compared to the control group.

A:   Great project, this is the kind of idea that can really improve margins if you can find out which specific tactics drop the most profit to the bottom line.

Q:   I have searched the web for some help and found calculators that say: On 25 million and smallest expected uplift of 0.1% and highest likely rate of > 5% the calculator gives 250k (1%).  Is that sufficient to calculate the net impact on the remaining base?  Would be very grateful if you could give me your thoughts.

A:  Well, it could be and might not be…

Read the rest of this entry »

Customer Value in the Freemium Model

December 4th, 2009

The following is from the November 2009 Drilling Down Newsletter.

Q: You kindly clarified a few issues when I was reading Drilling Down earlier this year – so I hope you don’t mind the direct email.

A: Yes, I remember!

Q: I am working for www.XYZ.com, a social networking / virtual world site based abroad, but our visitors are 85% US.

Our growth up to now has been mainly viral and in the summer we hit 1.2M UVs operating on the Freemium model with only 5% of our registered users converting to paying customers and a significant portion of our revenue coming from ads.  On average our customers are active on the site for something like 4 months making their first purchase around day 28. 

But to take us to the next stage we are embarking on some marketing for the first time using AdWords and various revenue share campaigns, and of course to do this sensibly we need to arrive at a reasonable estimate of LTV.

A: Makes sense!

Q: To calculate an adjusted LTV I removed all customers with a lifetime of less than 4 months, but this gives a low estimate because the calculation ignores the bumper summer months and the extra paid-for features put in place earlier this year.  Calculating LTV using ARPU and monthly churn (not sure how to calculate this in our environment) gives a different estimate again.  Is there any help or advice you could perhaps give us?  If not in the US then perhaps you could recommend somebody abroad – can’t find anything in the literature relevant for a start-up like us.

A:  It sounds to me like you’re trying to make this too complicated, at least for the place you are at this time.  Monthly churn and the “28 day” threshold are nice to know on a tactical level, but LTV is more of a Strategic idea that does not necessarily benefit from analysis at that level.  And you may not really want LTV, but a derivative that might be more helpful.

Read the rest of this entry »

“X Month” Value

November 20th, 2009

The basic concept of LifeTime Value (LTV) was ably outlined by Seth Godin in a great post here.  If you know the average net value of a customer is $2500 over their “Life”, why would you not spend  $50 (or $200, really) to acquire each one?  As long as you stuck to the model, your company would be insanely profitable over time.

There are 2 primary challenges to implementing this idea.

1.  “Over time” is a concept many management folks have a hard time embracing; what matters are the profits this year, or this quarter, or this month.  Unless the whole company embraces an “over time” measurement approach it is difficult for Marketers and Analysts to drive towards programs and practices supporting the LTV outcome.

2.  The $2500 is an average figure.  Most customers are worth less; 10% or 20% are worth much more.

Most people I talk to embrace the general idea of LTV models intuitively.  It’s really a cash flow concept, isn’t it?

So Financial people get it right away, and if Marketers could align with it, there would be no conflicts and the Marketing budget becomes virtually unlimited.

In fact, many folks in the PPC world follow just this model – they have unlimited budget as long as each conversion costs no more than “X”.  Because the company knows if it spends no more than X on a conversion, it always makes money.   Marketers and Analysts involved with these “Cost < X” PPC programs love them, because Management loves them. 

And Management loves them, why?  Because the CFO loves these programs.  Why?  Because they are based on Cash Flow analysis, which CFOs understand very, very well.

So then, what will it take to get more acquisition budgets like these Cost < X  PPC programs?  We have to address the two challenges above:

Read the rest of this entry »

Member Retention in Professional Orgs

November 4th, 2009

The following is from the October 2009 Drilling Down Newsletter.

Q: I have recently purchased your book Drilling Down and am going through the many interesting concepts.

A: Thanks for that!

Q:  I work for a membership Organization and we would like to conduct some analysis into who we may lose and approach them even before their membership lapses.  But the only problem here is that we carry data only on the purchases made (though many of our members do not purchase our products and stay a member) and web site visits.

A:  Are you *sure* that’s all the data you collect?  I once worked with a professional membership org that thought they only had one data source, but turns out they had 8 – from 8 different areas of the org – that nobody really knew about.

Q:  How do I know if a particular member is going to resign and lapse soon with this limited amount of behavioral data?  Recently it’s been a concern that we are losing members who have been with us for more than 10 years, who are mid-career (aged between 30 and 45), and who indicated no specific reason for resignation.

This has been going on for the last few months and now we would like to strategically target these customers and approach them even before they react negatively.  What concepts could help me to do this? Your guidance would be much appreciated.

A:  OK, my answer will be in two sections: if you (hopefully) find you have more data than you think, and if you really don’t have any other data to fall back on.

Read the rest of this entry »

Relational vs. Transactional

October 2nd, 2009

The following is from the September 2009 Drilling Down Newsletter (original title:  Customer Retention for Restaurants).

Q:  I am hoping you can help answer a question for our team.  By way of introduction, I am the CEO of XXXX.  We are a specialty retailer / restaurant of gourmet pizza, salads and sandwiches.  We would like to know  restaurant industry averages (pizza industry if possible) for customer retention – What percentage of customers that have ordered once from a particular restaurant order from them a second time?  I am hoping with your years of expertise and harnessing data you may be able to assist us with this question.  Look forward to hearing from you.

A:  Unfortunately, in those said years of experience, I have found little hard information on customer retention rates in QSR and restaurants in general (if anyone has data, please leave in Comments).  It’s just the nature of the business that little hard data, if collected, is stored in such a way that one can aggregate at the customer level.  The high percentage of cash transactions doesn’t help matters much; there’s a lot of data missing.

Over the years, sometimes you see data leak out for tests of loyalty programs, and of course clients sometimes have anecdotal or survey data, but this is not much help in getting to a “true” retention rate.  More often than not you discover serious biases in the way the data was collected so at best, you have a biased view of a narrow segment.  Often what you get is a notion of retention among best customers, or customers willing to sign up for a loyalty card, but not all customers.  And the large “middle” group of customers is where all the Marketing leverage is.

What to do about this predicament?  

There are really two issues in your question; the idea of using industry benchmarks when analyzing customer performance, and the measurement of retention in restaurants.

Read the rest of this entry »

Awareness versus Persuasion

September 23rd, 2009

In the early days of Home Shopping Network (live TV, not online), we were doing some ethnographic research and started to find “physical clusters” of customers – neighbors or people who worked together.  For example, one of these groups was nurses at hospitals, especially nurses who worked the night shift.

We looked for the most active member of the cluster (our “thought leader”) and asked them if they would help us with a “member get a member” program.  Would they be willing to distribute discount coupons to their friends, especially ones who were not already customers?  Time after time, the answer was:

“Honey, all my friends are already customers of yours”.

We launched the program anyway, because it was a pet project from upstairs  – I was a junior marketer at that point so I couldn’t kill it ;)  The program never, ever worked, no matter how hard we tried.  It generated very few new customers while giving lots of discounts to people who were already active buyers.  Basically,  the cost of those discounts overwhelmed the value of the new customers generated.

Apparently a similar thing happens online with Social marketing.

As part of a WAA program that reviews academic research for WAA members, I was able to take a look at a paper titled:  Firm-Created Word-of-Mouth Communication: Evidence from a Field Test by David Godes and Dina Mayzlin.

Read the rest of this entry »

Net Meaningful Audience

September 18th, 2009

Not Meaningful

When you’re in the business of measuring the effects of Marketing programs, certain patterns begin expressing themselves over and over.  One of the oldest is the contribution to success of the various parts of a Marketing effort, sometimes called the 60-30-10 rule:

60 percent of success is determined by the audience quality
30 percent of success is determined by the offer
10 percent of success is determined by the creative

Where do these stats come from?  Continuous improvement testing.  Over the years, if you run a lot of different tests, you just begin to see this pattern.  And the pattern holds across a very wide variety of business models – online and offline.

The key takeaway here: audience quality is the most important component of success in a results-oriented Marketing campaign.  This is why the CPM’s for niche Magazines, for example, are so high.  These Magazines are tremendously efficient marketing vehicles because they have high audience quality, which drives end behavior – results.

And the primary reason the audience quality is so high?

People pay for these Magazines.  When people pay for something, they value it with more Attention. Why? Simple.

In a magazine like Hot Rod or Concrete Decor or Vogue, the percentage of content that is interesting to the niche audience is very high. In fact, the Advertising is viewed as content.

Smaller audience, very high quality. Ads work like gangbusters.

Clearly, there are other ways to run a media model.  At the opposite end of the media spectrum, there is free.

Read the rest of this entry »

RFM versus LifeCycle Grids

August 28th, 2009

The following is from the August 2009 Drilling Down Newsletter.

Q:  First of all, thank you for the excellent book!  I’m really excited about digging into our own customer data to see what we’ll learn.

A:  Thank you for the kind words!

Q:  However, when you’re creating the RF Scores, what is the standard timeframe you should use?  I have access to about 5 years’ worth of purchase data – should I create RF scores based on the last 5 years, 3 years, 2 years, 6 months?

Our sales are quite cyclical, so I think the baseline should probably be at least a year, and I’m considering doing two years.  It seems as though if I get too much larger than that, my results will be too watered down. 

I’m also planning on generating “historical” RF scores by filtering my data to reflect the purchases only up to a certain point.  So, to generate a Q1-09 score, I’d create it from sales data of Q1-07 through Q1-09.  The Q2-09 score would be from Q2-07 through Q2-09, etc.  Does this make sense?  It will allow us to see the changes that have been happening in our company even though we’re only just now looking at the data.  It will give me a picture of what it would have looked like, had I looked at it back then.

A:  I think you have accurately understood the situation and have the right approach!  This type of analysis is very sensitive to time frame.
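The "historical" scoring described in the question can be sketched like this. It is a minimal illustration under stated assumptions: it computes raw Recency (days since last purchase) and Frequency (number of purchases) inside a trailing window of roughly two years, and leaves the conversion of those raw values into RF scores to whatever ranking scheme you use. The function name, the 730-day window, and the sample purchases are all hypothetical.

```python
from datetime import date, timedelta


def rf_as_of(purchases, as_of, window_days=730):
    """Raw Recency and Frequency per customer, using only data visible on `as_of`.

    purchases: iterable of (customer_id, purchase_date) pairs.
    Returns {customer_id: (days_since_last_purchase, purchases_in_window)}.
    """
    window_start = as_of - timedelta(days=window_days)
    seen = {}
    for customer, when in purchases:
        # Ignore anything after the as-of date or older than the window
        if window_start < when <= as_of:
            last, count = seen.get(customer, (None, 0))
            if last is None or when > last:
                last = when
            seen[customer] = (last, count + 1)
    return {c: ((as_of - last).days, n) for c, (last, n) in seen.items()}


# Hypothetical purchase history
purchases = [
    ("a", date(2007, 5, 1)), ("a", date(2009, 3, 15)),
    ("b", date(2008, 1, 10)),
    ("c", date(2006, 12, 1)),
]

q1_09 = rf_as_of(purchases, as_of=date(2009, 3, 31))   # roughly Q1-07 to Q1-09
q2_09 = rf_as_of(purchases, as_of=date(2009, 6, 30))   # roughly Q2-07 to Q2-09
print(q1_09)
print(q2_09)
```

Running the same function for each quarter-end gives the picture "as it would have looked back then", because purchases after the as-of date are never allowed into the calculation.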

There are really 2 broad types of customer analysis.  There is analysis for action in the present, a Tactical approach driving towards a “we should do this now” result, and the more Strategic analysis, which is informational and says “this is what we should have done then” and / or “this is why we should make these business changes”.  The shorter time frame is Tactical, the longer timeframe Strategic.

Read the rest of this entry »