Monthly Archives: July 2007

What Data Mining Can and Can’t Do, Part 2

The previous post was about what data mining is good for and what it is not good for, and how to use data mining properly for Marketing efforts.  This post further explains this concept in response to comments received.

Detecting credit fraud, especially with a data set as huge as the one at MCI, is a perfect application for data mining – classification, as in “this is fraud, this is not”.  These are not predictions, they are classifications based on a certain type of behavior that has already occurred.  As long as what a Marketer is really trying to accomplish is classification, then data mining is a great tool.  If you are trying to predict behavior, not so good.

I agree that data mining's “real potential is to call attention to things for further investigation” as long as the classification will be actionable, but often it is not. There is a great deal of confusion about just what data mining can and cannot do, and I’m just trying to bring some clarity to this issue for Marketing folks.

Bottom line: classifying people into “buckets” is not particularly helpful unless there is something you can act on as a result of having people in these buckets. Ask yourself: if I know that people differ in a certain way, what will I do with that information, and how will I act on it?

The most common mistake in this area is thinking demographics in some way predict behavior.  Demographics are not predictive, they are merely suggestive, yet many marketers cling to demos because that’s what they grew up with.  And then the analysts jump right in and say, “We can segment this population by demographics using data mining!” and you’re right off down the rat hole.  Then the Marketers create programs with an Objective of influencing behavior based on this demographic segmentation and wonder why they don’t work.

I certainly don’t have a problem with using “models” in general to solve Business and Marketing problems – that’s what I do for a living.

What I do have a problem with is the tendency to throw brute force machine learning technology at Marketing problems that ultimately can’t be solved using that particular approach. It’s a waste of time and money.  Paula, I think this is an area similar to your: “If this is the answer, what was the question?”

Said another way, detecting a behavior and predicting one are very different Objectives, and a lot of what you want to do in Marketing is prediction, not detection; it’s a “when” question, not a “who” question.  Often in Marketing, by the time you know “who”, it’s too late to do anything about it.  So Marketers need to know the probability of, the propensity to, not a classification of “who” after something happens.

On the flip side, if I have a prediction or propensity already, and then you want to tell me “who” they are with data mining, that’s fine, provided that information will make any difference. And here we get to the crux of my comment: knowing who after I have the propensity usually does not make any difference at all. On this point I am sure there will be a lot of disagreement, but I urge anybody who disagrees to simply test the hypothesis. Show me that the time, money, and effort spent on finding out “who” created enough economic value to pay off the investment, that it created incremental profit beyond the profit generated by simply understanding the propensity all by itself.
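For anyone who wants to run that test, here is a rough back-of-the-envelope sketch in Python. Every name and number in it is hypothetical and made up purely for illustration (the audience size, response rates, margin, contact cost, and the cost of the “who” project); it is not data from any real campaign. It simply compares the profit from targeting on propensity alone with the profit from the same campaign after paying for the extra “who” profiling.

```python
# Hypothetical back-of-the-envelope test: does adding "who" profiling on top
# of a propensity score pay for itself? All figures below are invented.

def campaign_profit(customers, response_rate, margin_per_response,
                    cost_per_contact, fixed_cost=0.0):
    """Profit from contacting `customers` people once."""
    revenue = customers * response_rate * margin_per_response
    contact_cost = customers * cost_per_contact
    return revenue - contact_cost - fixed_cost

# Baseline: target on propensity alone.
propensity_only = campaign_profit(
    customers=10_000,
    response_rate=0.05,        # assumed response rate
    margin_per_response=40.0,  # assumed margin per responder
    cost_per_contact=0.50,     # assumed cost per contact
)

# Variant: same audience, plus a "who" profiling project that (by assumption)
# nudges the response rate up slightly but costs money to build.
with_who = campaign_profit(
    customers=10_000,
    response_rate=0.052,       # assumed small lift from knowing "who"
    margin_per_response=40.0,
    cost_per_contact=0.50,
    fixed_cost=15_000.0,       # assumed cost of the profiling work
)

# Incremental profit attributable to knowing "who".
incremental = with_who - propensity_only

print(f"Propensity only: {propensity_only:,.2f}")
print(f"With 'who':      {with_who:,.2f}")
print(f"Incremental:     {incremental:,.2f}")
print("Pays off" if incremental > 0 else "Does not pay off")
```

With these invented inputs the “who” work loses money, but that outcome is built into the assumptions. The point is the comparison itself: plug in measured results from a real test and see whether the incremental number comes out positive.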

More data is not the answer; only the right data is required. Huge numbers of models are not the answer either; just because I can segment doesn’t mean that segmentation is worth anything. Data / model output can be considered as must know, good to know, nice to know, and who cares? Machine learning technologies seem to drive much more “who cares” than “must know” output, and people end up drowning in irrelevant noise. This is not a fault of the technology, but of its improper application.

For most Marketing needs, data mining is like “crop dusting with the SST”, to quote a former CEO I worked for. Discovering a Marketing problem is typically the easy part and doesn’t require data mining; taking the right action to solve the problem is where the difficulty lies, and machine learning is not going to provide that answer, despite many people hoping or believing it will.

Of course, the inability of many Marketers to understand and communicate the actual problem they are trying to solve, and/or the inability of many technology people to turn those requirements into an actionable solution, is a different story that we won’t begin to address in this forum. To the extent either one is responsible for the misapplication of a certain technology to solving a problem, oh well, where have we heard that before?

I hope I explained my position more clearly this time!