Monthly Archives: August 2007

PRIZM Clusters Not as Predictive as Behavior

Jim answers questions from fellow Drillers


Q: I am on an interesting project (and my first DB Mktg one): the client has a large loyalty program, and loves his PRIZM clusters. However, when I told him a little more about Recency and suggested that we spread all members across segments based on it, he was surprised to see that his PRIZM segments were not a predictive indicator at all!

A: Yes, and here is something many people don’t realize about PRIZM and other geo-demo programs, including the census-driven ones. They were developed for site location – where should I put my Burger King, where should I put my mall? They are incredibly useful for this. However, think about all the sample size discussions for web analytics related to A/B testing, and now imagine what your PRIZM cluster looks like.

In most cases, you are talking about 1 or maybe 2 records in a geo location – what is the likelihood these households reflect the overall “label” of the PRIZM cluster? Combine this with the fact that for customer analysis, demographics are generally descriptive or suggestive but not nearly as predictive as behavior and you have a bit of a mess.

Here’s a test for you. It only requires rough knowledge of your neighbors, so should not be very difficult (for most people!)

1. What is your “demographic”?
2. If you were to walk around the block and knock on doors, how many households would you find that are “in your demographic”?

Right. Maybe a handful, unless you live in a brand new housing development or other special situation.  Now think about walking your zip code, or walking out 10 blocks or so from your house in any direction, and knocking on doors. Do you find most of these people are in the same demographic as you are? Did you ever find the “cluster average” neighbor?

We certainly know from web analytics that dealing with “averages” can be very dangerous indeed. So too with taking a demographic “average” of a zip or other area and tying it to a specific household. The model falls apart at the household level of granularity.

So now what do you think of all those websites and services that claim to know demographics based on a zip code they captured?

Now, if you think about an e-commerce database, with most records being one of a very few in a zip or cluster, you can see how the cluster demos would really break down at the household level.

Again, nothing wrong with using these geo-demo programs for what they were intended to be used for. When you are looking for a mall location or doing urban planning they can be very helpful. But the match rates at the individual household level are poor.

Couple this with the fact that e-commerce folks are usually looking for behavior from customers, and the fact that demographics are not generally predictive of behavior by themselves, and you have yourself an analytical stew.

Better than nothing?  Absolutely, and for customer acquisition, sometimes all you can get. Best you can be? Not if you have the behavioral records of customers. In fact, what we often see is a skew in the demographics being called “predictive” when the underlying behaviorals are driving action.

In other words, let’s say a series of campaigns generates buyers with a particular demo skew. A high percentage of these Recent responders then respond to the next promotion. If you look just at the demos, you would see a trend and declare the demos are “predictive” of response, even though they are incidental to the underlying Recency behavior.
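
Here is a rough sketch of that effect in Python. The counts are invented purely for illustration (they are not from any real campaign), and the segment names are hypothetical. The skewed demo group happens to contain mostly Recent customers, so it looks "predictive" until you split by Recency.

```python
# Hypothetical counts, invented for illustration only:
# (demo group, recency group) -> (customers mailed, customers who responded)
counts = {
    ("skew_demo", "recent"): (800, 160),
    ("skew_demo", "lapsed"): (200, 4),
    ("other_demo", "recent"): (200, 40),
    ("other_demo", "lapsed"): (800, 16),
}

def response_rate(cells):
    """Pooled response rate across a list of (mailed, responded) pairs."""
    mailed = sum(m for m, _ in cells)
    responded = sum(r for _, r in cells)
    return responded / mailed

# Looking at demos alone: the skewed demo appears far more responsive.
by_demo = {
    d: response_rate([v for (demo, _), v in counts.items() if demo == d])
    for d in ("skew_demo", "other_demo")
}

# Looking inside each Recency group: the demo difference disappears.
by_cell = {key: response_rate([v]) for key, v in counts.items()}

for demo, rate in by_demo.items():
    print(f"{demo}: {rate:.1%}")
for (demo, recency), rate in by_cell.items():
    print(f"{demo} / {recency}: {rate:.1%}")
```

With these made-up numbers the demo-only view shows a large gap in response, while within each Recency group the two demos respond at the same rate. The demo is along for the ride; Recency is doing the work.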

I suspect something like this was going on with your client. Not looking at behavior, over time the client becomes convinced that the PRIZM clusters are predictive, when they are simply coincident with the greater predictive power of the behavioral metrics. Given the client has behavioral data, that should be the first line of segmentation.

Q: After reading you for some years, I now understand how one must be very careful with psycho-demographics.

A: Well, at least one person is listening!  And now you have seen how this works right before your very own eyes.

I think this situation is really a function of Marketers in general being “brought up” in the world of branding / customer acquisition. Most Marketers come up through the ranks “buying media” or some other marketing activity that focuses on demographics to describe the customer. And most of the college courses and reading material available focus on this function, so even the IT-oriented folks in online marketing end up learning that demographics are really important. And they can be, when you don’t know anything about your target.

Then the world flips upside down on you, and now people are looking at customer marketing, and that’s a whole different ballgame. The desired outcome is “action” that can be measured and the “individual” is the source of that outcome, as opposed to “impressions” and “audience”.

In the past, if your tried and true weapon of choice for targeting was demographics, that is what you reach for as you enter into the customer marketing battle. Problem is, it’s just not the best weapon for that particular marketing engagement.

Jim


More Tips on Evaluating Research

To continue with this previous post…other things to look for when evaluating research:

Discontinuous Sample – I don’t know if there is a scientific word for this (experts, go ahead and comment if so), but what I am referring to here is the idea of setting out the parameters of a sample and then sneaking in a subset of the sample where the original parameters are no longer true.  This is extremely popular in press about research.

Example:  A statement is made at the beginning of the press release regarding the population surveyed.  Then, without blinking an eye, they start to talk about the participants, leaving you to believe the composition of participants reflects the original population.  In most cases, this is nuts, especially when you are talking about sending an e-mail to 8000 customers and 100 answer the survey. 

Sometimes it works the other way: they will slip in something like, “50% of the participants said the main focus of their business was an e-commerce site”, which does not in any way imply that 50% of the population (4000 of 8000) are in the e-commerce business.  Similarly, if you knew what percent of the 8000 were in the e-commerce business, then you could get some feeling for whether the participant group of 100 was biased towards e-commerce or not.
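
A quick way to get that "feeling", sketched in Python. The 25% population share below is a made-up assumption (the example above never states the true share), and the function name is hypothetical. The check uses the normal approximation to the binomial and ignores the finite population correction, so treat it as a rough screen, not a formal test.

```python
import math

def share_z_score(participants, in_segment, population_share):
    """How many standard deviations the observed segment count sits from
    what a random draw from the population would be expected to produce."""
    expected = participants * population_share
    std_dev = math.sqrt(participants * population_share * (1 - population_share))
    return (in_segment - expected) / std_dev

# 100 participants, 50 of them e-commerce; ASSUMED population share of 25%.
z = share_z_score(participants=100, in_segment=50, population_share=0.25)
print(f"z = {z:.2f}")
```

A z-score far outside roughly plus or minus 2 suggests the participant group is not a random slice of the population. If the population really were 50% e-commerce, the same call would give a z-score of 0.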

Especially in press releases, watch out for these carefully worded and often intentional sleights of hand describing the actual segments of participants.  They are often written using language that can be defended as a “misunderstanding” and often you can find the true composition of participants in the source documentation to prove your point. 

The response to your digging and questioning of the company putting out the research will likely be something like, “the press misunderstood the study”, but at least you will know what the real definitions of the segments are.

Get the Questions – if a piece of research really seems to be important to your company and you are considering purchasing it, make sure the full report contains all the research questions.

I can’t tell you how many times I have matched up the survey data with the sequencing and language of the questions and found bias built right into the survey.  Creating (and administering, for that matter) survey questions and sequencing them is a scientific endeavor all by itself.  There are known pitfalls and ways to do it correctly, and people who do research for a living understand all of this.  It’s very easy to get this part of the exercise wrong and it can fundamentally affect the survey results.

So, in summary, go ahead and “do research” by e-mailing customers or popping up questionnaires, or read about research in the press, but realize there is a whole lot more going on in statistically significant, actionable research than meets the eye, and most of the stuff you read in the press is nothing more than a Focus Group.

Not that there is anything inherently wrong with a Focus Group, as long as you realize that is what you have.

Research for Press Release

I think one of the reasons “research” has become so lax in design and execution is this idea of doing research to drive a press release and news coverage. Reliable, actionable research is expensive, and if all you really want to do is gin out a bunch of press, why be scientific about it?  Why pay for rigor?  After all, your company is not going to use the research to take action, it’s research for press release.

So here are a few less scientific but more specific ideas to keep in mind when looking at a press release / news story about the latest “research”, ranked in order of saving your time.  In other words, if you run into a problem with the research at a certain level, don’t bother to look down to the next level – you’re done with your assessment.

Press about Research is Not Research – it’s really a mistake to make any kind of important decision on research without seeing the original source documentation.  For lots of reasons, the press accounts of research output can be selectively blind to the facts of the study. 

If there is no way to access the source research document, I would simply ignore the press account of the research.  Trust me, if the subject / company really had the goods on the topic, they would make the research document available – why wouldn’t they?  Then if / when you get to the research source document, run the numbers a bit for yourself to see if they square with the press reports.  If not, you still may learn something – just not what the press report on the research was telling you!

Source of Sample – make sure you understand where the sample came from, and assess the reliability of that source.  Avoid trusting any source where survey participants are “paid to play”.  This PTP “research” is often called a Focus Group and though you can learn something in terms of language and feelings and so forth from a Focus Group, I would never make a strategic decision based on a non-scientific exercise like a Focus Group. 

Go ahead and howl about this last statement, Marketers.  I’m not going to argue the fine points of it here, but those who wish to post on this topic either way, go ahead.  Please be Less Scientific or More Specific than usual, depending on whether you are a Scientist or a Marketer. 

For a very topical and probably to some folks quite important example of this “source” problem, see Poor Study Results Drive Ad Research Foundation Initiative.  If you want a focus group, do a focus group.  But don’t refer to it as “research” in a scientific way.

Size of Sample – there certainly is a lot of discussion about sample sizes and statistical significance and so forth in web analytics now that those folks have started to enter the more advanced worlds of test design.  Does it surprise you the same holds true for research?  Shouldn’t, it’s just math (I can feel the stat folks shudder.  Take it easy, relax).

Without going all math on this, let’s say someone does a survey of their customers.  The survey was “e-mailed to 8,000 customers” and they get 100 responses to the survey.   I don’t need to calculate anything to understand the sample is probably not representative of the whole, especially given the methodology of “e-mailed our customers”.  Not that a sample of 100 on 8000 is bad, but the way it was sourced is questionable.

What you want to see is something more like “we took a random sample of our customers and 100 interviews were conducted”.  It’s the math thing again.  Responders, by definition, are a biased sample, probably more of a focus group.  This statement is not always true, but is true often enough that you want to verify the responders are representative.  Again, check the research documentation.

OK Jim, so how can political surveys be accurate when they only use 300 or so folks to represent millions of households?  The answer is simple.  They don’t email a bunch of customers or pop-up surveys on a web site.  They design and execute their research according to established scientific principles.  Stated another way, they know exactly and specifically who they are talking to.  That’s because they want the research to be precise and predictive.

How do you know when a survey has been designed and executed properly?  Typically, a confidence interval is stated, as in “results have a margin of error of ±5%”.  This generally means you can trust the design and execution of the survey because you can’t get this information without a truly scientific design (Note to self, watch for “fake confidence level info” to be included with future “research for press release” reporting).
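
For those who do want to go a little math on it, here is a minimal sketch of the standard margin of error calculation for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case p = 0.5; the function names are just illustrative. The formula says nothing useful about a self-selected group of responders, which is the whole point above.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(moe, p=0.5, z=1.96):
    """Smallest simple random sample that achieves the requested margin of error."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

print(f"n=100: +/- {margin_of_error(100):.1%}")
print(f"n=300: +/- {margin_of_error(300):.1%}")
print(f"n needed for +/- 5%: {sample_size_for(0.05)}")
```

Notice that the size of the population never appears in the formula. That is why a few hundred properly sampled people can stand in for millions of households, while 100 self-selected responders out of 8,000 cannot be trusted to stand in for anybody but themselves.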

More rules for interpreting research