Monthly Archives: July 2007

What Data Mining Can and Can’t Do, Part 2

The previous post was about what data mining is good for and what it is not good for, and how to use data mining properly for Marketing efforts.  This post further explains this concept in response to comments received.

Detecting credit fraud, especially with a data set as huge as the one at MCI, is a perfect application for data mining – classification, as in “this is fraud, this is not”.  These are not predictions, they are classifications based on a certain type of behavior that has already occurred.  As long as what a Marketer is really trying to accomplish is classification, then data mining is a great tool.  If you are trying to predict behavior, not so good.

I agree that data mining’s “real potential is to call attention to things for further investigation”, as long as the classification will be actionable, but oftentimes it is not.  There is a great deal of confusion about just what data mining can and cannot do, and I’m just trying to bring some clarity to this issue for Marketing folks.

Bottom line: classifying people into “buckets” is not particularly helpful unless having people in those buckets leads to something you can act on.  Ask yourself: if I know that people differ in a certain way, what will I do with that information, how will I act on it?

The most common mistake in this area is thinking demographics in some way predict behavior.  Demographics are not predictive, they are merely suggestive, yet many marketers cling to demos because that’s what they grew up with.  And then the analysts jump right in and say, “We can segment this population by demographics using data mining!” and you’re right off down the rat hole.  Then the Marketers create programs with an Objective of influencing behavior based on this demographic segmentation and wonder why they don’t work.

I certainly don’t have a problem with using “models” in general to solve Business and Marketing problems – that’s what I do for a living.

What I do have a problem with is the tendency to throw brute force machine learning technology at Marketing problems that ultimately can’t be solved using that particular approach.  It’s a waste of time and money.  Paula, I think this is an area similar to your “If this is the answer, what was the question?”

Said another way, detecting a behavior and predicting one are very different Objectives, and a lot of what you want to do in Marketing is prediction, not detection; it’s a “when” question, not a “who” question.  Often in Marketing, by the time you know “who”, it’s too late to do anything about it.  So Marketers need to know the probability of, the propensity to, not a classification of “who” after something happens.

On the flip side, if I have a prediction or propensity already, and then you want to tell me “who” they are with data mining, that’s fine, provided that information will make any difference.  And here we get to the crux of my comment: knowing who after I have the propensity usually does not make any difference at all.  On this point I am sure there will be a lot of disagreement, but I urge anybody who disagrees to simply test the hypothesis.  Show me that the time, money, and effort spent on finding out “who” created enough economic value to pay off the investment, that it created incremental profit beyond the profit generated by simply understanding the propensity all by itself.
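The test described above comes down to simple arithmetic: the extra profit from adding “who” on top of propensity has to exceed what it cost to find out “who”.  Here is a minimal sketch of that comparison in Python.  The function name and every number in it are hypothetical, made up purely for illustration; they are not results from any real campaign.

```python
def incremental_value_of_who(profit_propensity_only, profit_with_who, cost_of_who):
    """Net incremental profit from adding "who" profiling on top of propensity targeting.

    A positive result means the "who" work paid for itself; a negative
    result means it cost more than the extra profit it produced.
    """
    return (profit_with_who - profit_propensity_only) - cost_of_who


# Hypothetical figures, for illustration only.
net = incremental_value_of_who(
    profit_propensity_only=100_000.0,  # profit when targeting on propensity alone
    profit_with_who=104_000.0,         # profit when "who" profiling is added
    cost_of_who=15_000.0,              # time, money, and effort spent finding "who"
)
print(f"Net incremental profit from 'who': {net:,.0f}")
```

In practice the two profit figures would come from comparable test groups, one targeted on propensity alone and one targeted on propensity plus the “who” profile.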

More data is not the answer; only the right data is required.  Huge numbers of models are not the answer either; just because I can segment doesn’t mean that segmentation is worth anything.  Data / model output can be considered as must know, good to know, nice to know, and who cares?  Machine learning technologies seem to drive much more “who cares” than “must know” output, and people end up drowning in irrelevant noise.  This is not a fault of the technology, but of its improper application.

For most Marketing needs, data mining is like “crop dusting with the SST”, to quote a former CEO I worked for.  Discovering a Marketing problem is typically the easy part and doesn’t require data mining; taking the right action to solve the problem is where the difficulty lies, and machine learning is not going to provide that answer, despite many people hoping or believing it will.

Of course, the inability of many Marketers to understand and communicate the actual problem they are trying to solve, and / or the inability of many technology people to turn those requirements into an actionable solution, is a different story that we won’t begin to address in this forum.  To the extent either one is responsible for the misapplication of a certain technology to solving a problem, oh well, where have we heard that before.

I hope I explained my position more clearly this time!

What Data Mining Can and Can’t Do

Timing, Counting, & Choice.  “Most real-world business problems are just some combination of those building blocks jammed together” – Peter Fader

Over at CIO Insight we have this very practical article on Data Mining by Fader.  What it’s good for, what it’s not good for.  If you have wondered how you might use this tool, especially if you are a Marketer, you should read this article. 

I say the article is practical because even though there are many ways to create mathematical models of customer data, if the end result is not something a Marketer can use to actually increase Marketing Productivity, then you really cannot do much with the output.  The models have to create leverage of some kind that can be used to take real world action.  In other words, a model can be “technically correct” but completely useless to a Marketer.

For example, just because you can identify a segment doesn’t mean it is practical or viable to address that segment with a unique marketing treatment.  And just because the segment has unique characteristics doesn’t mean those characteristics create any real marketing opportunity.

Key takeaways for Marketers from this article should be:

1.  Too much data tends to mess up a model.  This is especially true if you try jamming all kinds of demographic crap into a model that is trying to predict behavior.  If you want behavior as an output, use behavioral variables in your models.

2.  Data mining is a great classification tool; it is good at telling you why segments are different.  But in order for this to be useful, you need actionable segments to begin with.  For example, data mining can tell you the demographic differences between people likely to respond versus people not likely to respond – if there is a demographic difference.  But you have to know this “likely to respond” element first.  While we’re on this topic, the same idea holds true for surveys.  If you want the survey output to be actionable, identify known behavioral segments first, then survey each segment.
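To make “behavioral segments first” concrete, here is a rough sketch in Python that groups customers by a behavioral variable (days since last purchase) and measures the response rate in each group.  The records, the bucket boundaries, and the function names are all assumptions invented for illustration; they are not from the Fader article or from any real data set.  Only once segments like these exist would it make sense to ask how the people in them differ demographically, or to survey them.

```python
from collections import defaultdict

# Hypothetical records: (days_since_last_purchase, responded_to_last_offer)
customers = [
    (10, True), (25, True), (15, True),
    (40, False), (60, True),
    (95, False), (120, False), (200, False),
]


def recency_bucket(days):
    """Assign a customer to a behavioral segment (boundaries are illustrative)."""
    if days <= 30:
        return "0-30 days"
    if days <= 90:
        return "31-90 days"
    return "91+ days"


def response_rate_by_recency(records):
    """Return {segment: share of customers in that segment who responded}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responders, total]
    for days, responded in records:
        segment = recency_bucket(days)
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {segment: hits / total for segment, (hits, total) in counts.items()}


rates = response_rate_by_recency(customers)
for segment, rate in rates.items():
    print(f"{segment}: {rate:.0%} responded")
```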

Often, people use technical tools for the wrong Marketing reasons.  I see this problem coming down the tracks in web analytics: people are getting so wrapped up in the minutiae and the automation of testing that they are missing out on the basic stuff.  It is just like the way the data mining wave got people off track and into the bushes with “collecting all the data so we can mine it”.  But it doesn’t matter how much data you have; the tool does what it does and doesn’t do what it doesn’t do.

Check out the article What Data Mining Can and Can’t Do here.

Any thoughts from the Data Miners out there on this?