I wrote this article in response to a published piece describing how “predictive modeling techniques outperformed Recency-Frequency-Monetary value (RFM) targeting in a back-to-school campaign.” I received a ton of e-mail asking for an explanation of this confusing claim.
For those of you not well versed in what behavioral modeling is all about, this article provides a look inside and addresses some very Frequently Asked Questions on modeling.
For those looking for some resolution on issues brought up in the DM News article, I decided to just write this response and point all the queries to it (saves much typing!). Thanks to all the fellow Drillers out there who thought there was something a bit off in this article.
Let me make it clear upfront that I don’t know either company involved and am not making any judgments on the way this promotion was designed or executed. I do however have a problem with the presentation of the article, especially the opening paragraph – “predictive modeling outperformed RFM” – which at best is very misleading based on the facts provided, and at worst is an intentional obscuring of the facts to push a particular agenda.
The following is my best guess as to what is going on here and why the results ended up as they did based on the facts provided.
RFM as Straw Man?
Think about this campaign: it was a back-to-school promotion. It’s held at a fixed point in time and happens every year. The people running the campaign seem to have a lot of experience using RFM, both on the agency and client side.
One thing they should know given the type of promotion and experience of the players is this: RFM is not a valid scoring approach for at least one segment of the population – heavy cyclical buyers. These are the folks who are primarily promotional buyers, not “regular customers.” Given back-to-school is the first major promotion in the retail calendar, it may have been quite some time since these promotional buyers had made a purchase in the promoted categories – perhaps since the after holiday blow-out sale.
Knowing all this, they would certainly be aware RFM scoring would demote this promotional buyer because they are not “Recent.” So a sub-optimal scenario is set up relative to the usage of RFM scoring. RFM has at least one hand tied behind its back on this promotion, because some (perhaps many, high volume) known heavy buyers are intentionally excluded. Under these conditions, it’s not surprising just about any model, including “let’s mail to heavy buyers who bought last year” would beat RFM if you were in fact mailing the entire population in a controlled test.
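To make the demotion concrete, here is a minimal sketch of recency-first RFM scoring. The five customers and their numbers are invented for illustration, and the rank-based quintile function is just one simple way to assign 1 to 5 scores; it is not taken from the campaign in the article.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 5 (best fifth) down to 1 (worst fifth) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 5 - (rank * 5) // n
    return scores

# Hypothetical customers: (days since last purchase, purchases, total spend)
customers = {
    "cyclical_heavy": (330, 6, 900),  # bought heavily during last year's promotion
    "regular_a": (15, 8, 700),
    "regular_b": (40, 5, 400),
    "occasional": (120, 2, 150),
    "one_timer": (200, 1, 60),
}

names = list(customers)
# Fewer days since last purchase is better, so Recency is scored in reverse.
r = quintile_scores([customers[c][0] for c in names], higher_is_better=False)
f = quintile_scores([customers[c][1] for c in names])
m = quintile_scores([customers[c][2] for c in names])

# Recency-first cell such as "554": the Recency digit dominates the sort.
cells = {name: f"{r[i]}{f[i]}{m[i]}" for i, name in enumerate(names)}
ranking = sorted(cells, key=cells.get, reverse=True)

for name in ranking:
    print(name, cells[name])
```

In this toy data the heavy cyclical buyer has the highest spend and the second highest purchase count, yet lands at the bottom of the ranking because the Recency digit comes first.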
So let’s look at some possible scenarios to explain the results claimed in this case.
They’re smart, but awful case writers, or the case was edited and many of the key facts people would want to know were excluded
There is no mention of methodology in this case, not even the phrase “controlled test” and there are no ROI comparisons. To make the statement about “beating RFM” one would expect some shred of evidence besides the top line “spent 2.5 times more per direct mail piece than those chosen through RFM.” OK, but what was the profit comparison? How much did the model cost? Was there discounting, and if so, what about subsidy costs? Were control groups used to measure subsidy costs? And on and on. You get what I mean. If this group included heavy cyclical buyers, my first question is this: how many of them would have bought anyway without mailing them? If you don’t know the answer to this question, any claims become suspect.
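The “would have bought anyway” question is what a control group answers. A rough sketch of the arithmetic, with every number made up for illustration (group sizes, sales, margin, and cost per piece are all assumptions, not figures from the article):

```python
def incremental_profit_per_piece(mailed_sales, mailed_size,
                                 control_sales, control_size,
                                 margin, cost_per_piece):
    """Profit per mailed piece after subtracting what the control group bought anyway."""
    lift_per_customer = mailed_sales / mailed_size - control_sales / control_size
    return lift_per_customer * margin - cost_per_piece

def naive_profit_per_piece(mailed_sales, mailed_size, margin, cost_per_piece):
    """Profit per mailed piece if all sales are credited to the mailing."""
    return mailed_sales / mailed_size * margin - cost_per_piece

# Hypothetical test: 36,000 mailed, 4,000 similar customers held out as a control.
naive = naive_profit_per_piece(540_000, 36_000, margin=0.40, cost_per_piece=0.75)
incremental = incremental_profit_per_piece(540_000, 36_000, 48_000, 4_000,
                                           margin=0.40, cost_per_piece=0.75)

print(f"naive profit per piece:       {naive:.2f}")
print(f"incremental profit per piece: {incremental:.2f}")
```

When the held-out customers buy nearly as much as the mailed ones, most of the “spend per piece” was going to happen anyway, and the top line number says very little about the value of the selection method.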
They’re smart, but not terribly honest
There were 40,000 customers chosen with the model and they “would not have mailed to any of the 40,000 using RFM.” Yet they mailed 60,000 customers using RFM. Why? If the model was so much better at selecting targets, why use RFM at all, and in such a big way? Clearly, they mailed a lot more people than you need to execute a controlled test.
This is not normally how one would execute this promotion – unless one knew they were working with different populations (one Recent, one not) and used different scoring approaches for each. If this is the way they did it, that’s smart. But in no way does it support the statement “predictive modeling outperformed RFM”; different groups were scored differently, and the gig was rigged. Any claims under this scenario could be assumed to be intentionally designed to mislead a reader, or represent a significant lack of experience on the part of people making this kind of claim.
They’re not as smart as they seem
Serendipity is a wonderful thing and my favorite part of direct marketing. Yeah, it’s all pretty scientific, but sometimes you just get results that you didn’t expect or plan for – one way or the other. What if they simply said, “Hey, let’s run a model on everyone we didn’t pick with RFM, and see what happens if we mail them.” Essentially a model test, but with a huge percentage of the population, which is a bit strange if you don’t already have “gut feel.”
In this case, they were not thinking of the heavy cyclical buyers at all, and not thinking of the obvious impact of using RFM scoring on a population “rigged” to fail – they would simply run a model and follow the output. And it worked very well, because the model teased out a pretty obvious mailing strategy from the customer base (as models frequently do). They simply were not aware of and had not thought of the implications underlying the results and made an inappropriate comparison.
In fact, look at the parameters of this model they provided us with:
- customer purchase behavior, such as the average number of months between purchases and the amount spent
Well folks, that’s a Latency model if I’ve ever heard one, and certainly implies this group had a Recency problem, at the very least. RFM would be rigged to fail under this scenario.
- only half a percent of the customers the model selected were previous junior apparel buyers or previous children’s apparel purchasers
Hard to tell what this means without knowing the full story, but here’s one thought – product history didn’t matter a bit. These were just buyers who bought whatever, whenever prompted at the right time with the right offer – the classic sign of a discount prone, highly subsidized promotional buyer.
In this scenario, the players are innocent of any intentional malice – but still cannot make any claims about modeling versus RFM. They intentionally created two populations and scored them differently, and got rewarded for trying something new. Hey, that’s great!
OK, now that we’ve gone through these examples, let me address some issues on RFM and custom modeling in general. Hopefully, this information will be of value to people when they are faced with interpreting data and making decisions in the analytics area.
You say Tomato, I say Celery
Let’s talk briefly about populations and target selection. Those of you who know RFM and response models in general know they are ranking systems. They rank the likelihood of people to respond to the promotion, from highest likelihood to lowest likelihood. People at the “top” of the ranking are the very most likely to respond; people at the bottom of the ranking are the very least likely to respond. Offline, the top 20% of the ranking usually has a response rate from 5 to 40 times higher than the bottom 20% of the ranking. Online, the difference is even greater.
On any scored population, RFM or custom model, I can select how far down into the ranking to mail. Do I want to mail the top 10% most likely to respond, the top 20%? As you include more and more people, the average likelihood to respond drops rapidly.
If you mailed deeply into an RFM scored population, let’s say covering the top 50% of the rankings, and did a very shallow mailing to the custom model population, say covering the top 10% of the rankings, then I have no doubt in my mind you could get the per mailer results and comparative stats mentioned:
“Names selected using predictive modeling had a four times higher average monthly spending rate… a three times higher purchase rate… spent 2.5 times more per direct mail piece than those chosen through RFM.”
“Selected” is the operative word here. If only the best and most likely to respond were selected using the model, but on the RFM side you mailed much more deeply into the scores, including lots of people with lowered likelihood to respond, you end up with a completely self-fulfilling prophecy, not a “predictive model that beats RFM.” Not even close.
I’m not saying this happened in the article we just looked at. I’m saying a statement along the lines of “the top 20% most likely to respond groups in both the RFM and custom model populations were selected” is something you always, always look for when you are in this space. If you have people pitching you any kind of analytics, make sure you are dealing with fair comparisons. You can make anything look fantastic by fooling with the knobs and levers in the background.
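The depth effect is easy to see with a toy ranking. In the sketch below, both scoring methods are assumed to rank an identical population equally well, and the decaying response curve is invented purely for illustration. The only thing that differs is how deep each side mails.

```python
def mean_response_at_depth(ranked_rates, depth):
    """Average expected response rate when mailing the top `depth` share of a ranking."""
    k = max(1, round(len(ranked_rates) * depth))
    top = ranked_rates[:k]
    return sum(top) / len(top)

# Hypothetical ranking of 1,000 customers: best prospect first, likelihood
# to respond falling off steadily as you move down the list.
ranked_rates = [0.20 * (0.995 ** i) for i in range(1000)]

model_shallow = mean_response_at_depth(ranked_rates, 0.10)   # "model" side: top 10%
rfm_deep = mean_response_at_depth(ranked_rates, 0.50)        # "RFM" side: top 50%
rfm_same_depth = mean_response_at_depth(ranked_rates, 0.10)  # fair comparison

print(f"model, top 10%: {model_shallow:.3f}")
print(f"RFM,   top 50%: {rfm_deep:.3f}")
print(f"RFM,   top 10%: {rfm_same_depth:.3f}")
```

With identical ranking quality, the shallow mailing shows more than twice the per-piece response of the deep one. At equal depth the “advantage” disappears, which is why the selection depth on both sides has to be stated before any comparison means anything.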
Folks, RFM is a predictive model. It predicts behavior based on past activity; RFM is no different in that respect than a “predictive model” you paid some modeler $50,000 for. So to make the statement “predictive modeling beat RFM” is just a bit circular in the first place, and one wonders what the intent of making a statement like that could be. If you said “A Latency model beat a Recency model in a Seasonal Promotion” then I’d have no problem with that at all, but would wonder why it’s a news item. As explained above, it’s pretty much common sense.
Latency is nothing more than Recency with a twist; instead of counting “days since” using today, you count “days since” using a fixed point in time. Latency can work much better than Recency when there are external cyclical factors involved – like seasonal promotions.
For example, if you have not filed a tax return Recently, it does not mean you are less likely to file one in the future. All it means is there is an external cyclical event (April 15th in the US) controlling your behavior. If you had not filed one in 18 months (18 months Latent), then I would start to question likelihood to file.
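Here is one way to read that distinction in code. This is a sketch under assumptions: the scoring date, the fixed anchor date (taken here to be the end of last year’s back-to-school window), and the three customers are all hypothetical, and measuring “days since” as of the anchor is my illustration of the idea rather than a formal definition.

```python
from datetime import date

def days_since_last_purchase(purchases, as_of):
    """Days from the latest purchase on or before `as_of` to `as_of`; None if there is none."""
    prior = [d for d in purchases if d <= as_of]
    if not prior:
        return None
    return (as_of - max(prior)).days

TODAY = date(2003, 8, 1)     # hypothetical scoring date
ANCHOR = date(2002, 9, 15)   # hypothetical end of last year's back-to-school window

customers = {
    "seasonal": [date(2001, 8, 25), date(2002, 8, 28)],  # buys every back-to-school
    "regular": [date(2003, 6, 20), date(2003, 7, 15)],   # new, buys year round
    "lapsed": [date(2001, 8, 30)],                       # skipped last year's promotion
}

# Recency counts from today; Latency counts from the fixed point in time.
recency = {c: days_since_last_purchase(p, TODAY) for c, p in customers.items()}
latency = {c: days_since_last_purchase(p, ANCHOR) for c, p in customers.items()}

for c in customers:
    print(c, "recency:", recency[c], "latency:", latency[c])
```

Measured from today, the seasonal buyer looks nearly a year stale. Measured from the anchor, they bought 18 days before the window closed, right on cycle, while the lapsed buyer is over a year Latent and is the one whose likelihood to buy deserves questioning.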
The optimum solution is often to use RFM (Recency Frequency Monetary) and LFM (Latency Frequency Monetary) in tandem targeting the appropriate populations, as was (apparently) done in this promotion. Smart.
Crop dusting with the SST
If you are not doing any data modeling at all, the ROI of implementing an advanced model can be substantial. But the real question is this: will the improvement gained by using an advanced predictive model be enough to cover the cost of it relative to the improvement gained by using a simple model?
Given that most advanced “response models” like the one in the article use Recency or Latency and Frequency as the primary driving variables, it’s a valid question to ask. Here’s a dirty little modeling secret: most, if not every, “response model” built includes Recency / Latency and Frequency as primary variables, whether created “top down” by a human or “bottom up” by a machine (so-called data mining). The primary difference is this: they add 3rd, 4th, 5th etc. variables which incrementally improve ROI – all else equal.
In other words, RFM is the low-hanging fruit, often buying you 10x or 20x response rate improvement. You want the next 10%? Get a custom model, and make sure the price you will pay is worth the diminishing returns.
Just because RFM is a simple, easy to implement, standardized predictive model, people pick on it. They want you to pay through the nose for a “good model” because, my simple friend, you could not possibly do any modeling yourself. Now, am I saying that RFM is better than a model created by a roomful of modelers? Of course not. The question, as always, is this: will it improve your performance enough to cover the modeling cost; what is the ROI?
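The ROI question comes down to simple arithmetic. A sketch with made-up campaign profits (the $50,000 model cost echoes the figure used earlier in this article; the profit numbers are assumptions):

```python
def custom_model_roi(profit_simple, profit_custom, model_cost):
    """Return on the custom model: net incremental profit divided by its cost."""
    incremental_profit = profit_custom - profit_simple
    return (incremental_profit - model_cost) / model_cost

# Hypothetical campaign: the custom model adds 10% to the profit RFM already delivers.
profit_with_rfm = 200_000
profit_with_custom = 220_000
model_cost = 50_000

roi = custom_model_roi(profit_with_rfm, profit_with_custom, model_cost)
print(f"ROI on the custom model: {roi:.0%}")
```

In this example the custom model does produce better results and still loses money, because the extra $20,000 does not cover the $50,000 it cost. Spread the same model over enough campaigns or a big enough file and the answer can flip, which is exactly why you run the numbers.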
RFM Slandered – again?
Speaking of picking on RFM, I was wondering what’s up with this statement in the article:
“When working with RFM, you are really only looking at three elements, and you never get to see the rest of the prospects in a database that have other characteristics that could lead them to become buyers in a given area.” Well, that may be the way they use RFM, but it’s certainly not the only way.
There is no reason you can’t load up on any variable you want with RFM scoring. Those who have read my book know this approach is fundamental to the Drilling Down method. RFM is the Swiss Army knife of behavioral models, and can be used in very many ways. Choosing to use the original, pre-computer, late 1950s version of RFM is simply that – a choice. Or you could choose to use a totally bastardized version from who knows where. Like any tool, you need to really know how to use it to get the most out of it.
I’m a solution, I’m a problem
To conclude, I have nothing against custom models. I use them when appropriate. I have nothing against the design or execution of the promotion. I have a big problem with the way the article was presented, resulting in a claim appearing to lack sufficient backup.
Those of us in the modeling space need to help people understand how behavioral modeling works by presenting clear and clean examples. Fast and loose “cheerleading” is what got the CRM folks into the mess they are in, and we don’t want Business Intelligence or Customer Analytics or whatever “space” we are in this month to experience the same fate.
If anyone, including the original retail and agency players in the article above, has comments on my analysis or in general on this topic, I’d be glad to post them. Heck, if the players have “the whole” case study available for review, I’ll provide the download link right here. It was a sweet promotion, really.
But what I want to know is exactly what happened with all the glorious details – so I can learn something from it, or use the stats to confirm what I already know. And we owe those folks just beginning to get a grip on behavioral modeling the same courtesy.
That’s why we’re all here. To learn.