Category Archives: Web Analytics

KFI’s: Key Forecast Indicators

As I said in my presentation at the eMetrics / Marketing Optimization Summit, if you want to get C-Level people to start paying attention to web analytics, you have to get into the business of predicting / forecasting.  Let’s face it, KPI’s are about the past, right?  You don’t know “Performance” until it has already happened.

But C-folks don’t really care much about what has already happened, because they can’t do anything about it.  What they really want to know is what you think will happen.  For example, ideas like “sales pipeline” – a forecast.  If you start forecasting – and you are right – you will get attention from the C-folks pronto.  The web is a great forecasting tool because it’s so frictionless; it tends to provide tangible signals before many other parts of the business.

So: Do you have any KFI’s – Key Forecast Indicators?

I have one for the Lab Store, and it tripped about 2 months ago.  It’s the Unwanted Exotic Index (UEI).

As part of the Lab Store, we run a moderated board where people who want to give up exotic pets can post the availability, and people looking for exotic pets can post requests.  Typically, the ratio of people giving them up to wanting them is about .25 – for every post looking to give an exotic up, there are 4 posts looking to adopt.

A couple of months ago, this ratio started popping higher.  A couple of weeks ago it hit 1.25 – for every 5 posts looking to give up an exotic there were 4 posts looking to adopt.  The last time something like this happened was prior to the mini-recession of 2004, when the Unwanted Exotic Index tagged 1.0 for a short time.  About 2 – 3 months after that, our sales got soft.
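For anyone who wants to track a ratio like this, here is a minimal Python sketch of the arithmetic.  The function names are mine, and the 1.0 trip level is only an assumption borrowed from the 2004 reading mentioned above; it is not a tested threshold.

```python
def unwanted_exotic_index(give_up_posts: int, adopt_posts: int) -> float:
    """Ratio of posts giving up an exotic to posts looking to adopt one."""
    if adopt_posts <= 0:
        raise ValueError("need at least one adoption request to form the ratio")
    return give_up_posts / adopt_posts


def tripped(uei: float, threshold: float = 1.0) -> bool:
    """True when the index reaches the (assumed) warning level."""
    return uei >= threshold


typical = unwanted_exotic_index(give_up_posts=1, adopt_posts=4)  # 1 give-up per 4 requests
recent = unwanted_exotic_index(give_up_posts=5, adopt_posts=4)   # 5 give-ups per 4 requests

print(typical, tripped(typical))
print(recent, tripped(recent))
```

In practice you would feed in monthly post counts from the board rather than hand-typed numbers, and pick the trip level from your own history.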

Why is the UEI predictive?  Let’s go through the logic – my logic, anyway!

Keeping certain types of exotic animals can be a strain on a family, both from a time and money perspective.  They can be high maintenance.  On the margin, as the economy gets tougher and people look to manage household budgets, these pets can get some scrutiny – particularly if kids have lost interest or gone off to college.  So more go up for adoption.  At the same time, requests to adopt fall, as families who might have considered an exotic pet put the “owning decision” on hold.  Taken together, these decisions cause the UEI to spike higher.  Both giving up and deciding not to own exotic pets affect Lab Store revenues “expected” in the future.  So the UEI ends up being predictive of future demand.

Makes sense to me.

Now, I’m a pretty good student of macroeconomics and pay attention to many economic indicators, especially predictive ones like the ECRI’s US Weekly Leading Index.  If you’re an analyst, you should too; economic indicators provide context for any analysis you might have to do, and clients often want to understand the impact of these external issues on their business.

As far as the Lab Store specifically, I don’t usually pay much attention to the macroeconomic cycles.  The pet business tends to be insensitive to the economic cycle; people don’t stop caring for pets as the economy wobbles up and down.  That’s why it’s such a good business – if you can find a niche.  So I don’t get too concerned when I see these predictive macroeconomic indexes forecasting a slowing economy.

However, what we have here with our Unwanted Exotic Index is a confirmation of the broader economic forecasting tools that is specific to our exotic pet business.  That makes me sit up and take notice!  Looks like our business is setting up for a repeat of the 2004 slowdown – the last time the UEI spiked like this.  Why is this important?  Because I can do something with this knowledge.  I can re-allocate and re-prioritize based on this knowledge.  For example, I can move from a “grow bigger” to a “grow smarter” mode.

And please note: this KFI has nothing to do with traffic or sales on the web site; traffic and sales are “rear view”.  By the time you see the sales slow down it will be too late to do anything about it.  And that’s why the C-folks don’t care much about web analytics reports.  

You could track an index like the UEI with a web analytics tool, but you’d have to come up with the idea first.  My point is you will probably have to look outside the usual “rear view” metrics to find one with forecasting ability.  I caution you not to substitute a “survey” for a predictive model; people’s opinions are a notoriously lagging indicator.  You’ll be up to your ears in the slowdown before people start turning bearish.

So: Do you have any KFI’s – Key Forecast Indicators?  Tell us about them. 

If you don’t have any KFI’s, now is the time to start looking for them.  What can you see now that predicts what will happen in the future?  Think about the business, think about the data sources, and put together a bunch of different ideas.  Track them back a couple of years and post them monthly going forward.  You’re bound to find something predictive.  Perhaps something about posting, like the UEI.  Recommendations / comments as a percent of visitors or something like that.

If you’re stuck, start with a simple “engagement” idea – percent visitors / members / customers who visited / logged in / bought in the past 90 days.  If this percentage is falling, so will your business in the next 3 – 6 months.  If your business has a lot of seasonality in it, look to year-over-year comps of the same metric.
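Here is a rough Python sketch of that “engagement” idea with a year-over-year comp.  The data layout (a dictionary of customer id to activity dates) and the function names are assumptions for illustration; your own data source will look different.

```python
from datetime import date, timedelta


def active_share(activity, as_of, window_days=90):
    """Share of customers known as of `as_of` who were active in the prior window."""
    cutoff = as_of - timedelta(days=window_days)
    known = 0
    active = 0
    for dates in activity.values():
        seen = [d for d in dates if d <= as_of]
        if not seen:
            continue  # customer did not exist yet at this point in time
        known += 1
        if max(seen) > cutoff:
            active += 1
    return active / known if known else 0.0


def year_over_year_change(activity, as_of, window_days=90):
    """Same metric today minus the same metric 365 days earlier."""
    year_ago = as_of - timedelta(days=365)
    return active_share(activity, as_of, window_days) - active_share(
        activity, year_ago, window_days
    )


# Hypothetical activity dates (visits, logins, or purchases) per customer.
activity = {
    "a": [date(2007, 1, 10), date(2008, 1, 5)],
    "b": [date(2007, 1, 20)],
    "c": [date(2006, 6, 1)],
    "d": [date(2007, 12, 20)],
}
today = date(2008, 2, 1)
print(active_share(activity, today))
print(year_over_year_change(activity, today))
```

A negative year-over-year change is the thing to watch for; comparing against the same date a year earlier is what takes the seasonality out.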

If you’ve never played this game before, you won’t have proof your KFI’s work until after the business is in the soup, but you’ll be ready with accurate and actionable KFI’s the next time around!

What’s the Frequency?

Jim answers questions from fellow Drillers


Q: I ordered your book and have been looking at it as I have a client who wants me to do some RFM reporting for them.

A: Well, thanks for that!

Q: They are an online shoe shop that also sends out catalogues via the mail at present. They have order history going back to 2005 for clients and believe that by doing an RFM analysis they can work out which customers are dead and should be dropped, etc. I understand Recency and have done this.

A: OK, that’s a great start…

Q: But on frequency there appears to be lots of conflicting information – one book I read says you should do it over a time period as an average and others do it over the entire lifecycle of a client.

A: You can do it either way; the ultimate answer, of course, is to test both ways and see which works better for this client.

Q: Based on the client base and the fact that the catalogues are seasonal, my client reckons a client may make a purchase decision every 6 months. My client is concerned that if I go by total purchases, someone who was really buying lots say two years ago but now buys nothing could appear high up on Frequency compared to a newer buyer who has bought a few pairs, who would actually be a better client as they’re more Recent. Do I make sense or am I totally wrong?

A: You absolutely make sense. If you are scoring with RFM though, since the “R” is first, that means in the case above, the “newer buyer who has bought a few pairs” customer will get a higher score than the “buying lots say two years ago but now buys nothing” customer.

So in terms of score, RFM self-adjusts for this case. The “Recent average” modification you are talking about just makes this adjustment more severe.  Other than testing whether the “Recent average” or “Lifetime” Frequency method is better for this client, let’s think about it for a minute and see what we get.

The Recent average Frequency approach basically enhances the Recency component of the RFM model by downgrading Frequency behavior out further in the past. Given the model already has a strong Recency component, this “flattens” the model and makes it more of a “sure thing” – the more Recent folks get yet even higher scores.

What you trade off for this emphasis on more recent customers is the chance to reactivate lapsed Best customers who could purchase if approached.  In other words, the “LifeTime Frequency” version is a bit riskier, but it also has more long-term financial reward. Follow?
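To make the two Frequency definitions concrete, here is a simplified Python sketch.  It ranks customers by Recency first and uses Frequency only to break ties; that is a stand-in for real RFM scoring (which is usually done with ranked groups), and the 365-day window, the customer names, and the order dates are all made up for illustration.

```python
from datetime import date, timedelta


def recency_days(orders, as_of):
    """Days since the most recent order on or before as_of."""
    return (as_of - max(d for d in orders if d <= as_of)).days


def lifetime_frequency(orders, as_of):
    """Count every order the customer has ever placed."""
    return sum(1 for d in orders if d <= as_of)


def recent_frequency(orders, as_of, window_days=365):
    """Count only orders inside the trailing window (assumed 365 days)."""
    cutoff = as_of - timedelta(days=window_days)
    return sum(1 for d in orders if cutoff < d <= as_of)


def rank_customers(customers, as_of, frequency):
    """Most Recent first; Frequency only breaks ties, since the R comes first."""
    return sorted(
        customers,
        key=lambda name: (
            recency_days(customers[name], as_of),
            -frequency(customers[name], as_of),
        ),
    )


as_of = date(2008, 6, 1)
customers = {
    "lapsed_heavy": [date(2006, m, 15) for m in range(1, 9)],  # 8 orders, all in 2006
    "newer_buyer": [date(2008, 1, 10), date(2008, 3, 5), date(2008, 5, 20)],
}
for name, orders in customers.items():
    print(name, lifetime_frequency(orders, as_of), recent_frequency(orders, as_of))
print(rank_customers(customers, as_of, lifetime_frequency))
print(rank_customers(customers, as_of, recent_frequency))
```

Either way the newer buyer ranks first because of Recency.  What the “Recent” filter changes is that the lapsed heavy buyer's Frequency drops from 8 to 0, which is exactly the group you risk never mailing again.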

So then we think about the customer. It sounds like the “make a purchase decision every 6 months” idea is a guess as opposed to analysis.  You could go to the database and get an answer to this question – what is the average time between purchases (Latency), say for heavy, medium, and light buyers?  That would give you some idea of a Recency threshold for each group, beyond which mailing lapsed customers gets increasingly risky, and you could use this threshold to choose the time period for your Frequency analysis.
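A quick Python sketch of that Latency check follows.  The cut-offs for heavy, medium, and light buyers (6 and 3 orders) are placeholders I picked for the example, as are the order dates; set the groups from the client's own order distribution.

```python
from datetime import date
from statistics import mean


def purchase_gaps(orders):
    """Days between each consecutive pair of orders."""
    ordered = sorted(orders)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def buyer_group(order_count, heavy_at=6, medium_at=3):
    """Assumed grouping by order count; the thresholds are illustrative only."""
    if order_count >= heavy_at:
        return "heavy"
    if order_count >= medium_at:
        return "medium"
    return "light"


def average_latency_by_group(customers):
    """Average days between purchases for each buyer group."""
    gaps_by_group = {}
    for orders in customers.values():
        gaps = purchase_gaps(orders)
        if gaps:  # one-time buyers have no gap to measure
            gaps_by_group.setdefault(buyer_group(len(orders)), []).extend(gaps)
    return {group: mean(gaps) for group, gaps in gaps_by_group.items()}


# Hypothetical order histories.
order_history = {
    "x": [date(2007, 1, 1), date(2007, 1, 31), date(2007, 3, 2)],
    "y": [date(2007, 1, 1), date(2007, 7, 1)],
    "z": [date(2007, 5, 5)],
}
print(average_latency_by_group(order_history))
```

If the averages come out well away from 6 months, that tells you the “every 6 months” guess needs revisiting before you pick a Frequency window.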

Also, we have the fact these buyers are (I’m guessing) primarily online generated.  This means they probably have shorter LifeCycles than catalog-generated buyers, which would argue for downplaying Frequency that occurred before the average threshold found above and elevating Recency.

So here is what I would do. Given the client is already pre-disposed to the “Recent Frequency” filter on the RFM model, that this filter will generally lower financial risk, and that these buyers were online generated, go with the filter for your scoring.

Then, after the scoring, if you find you will in fact exclude High Frequency / non-Recent buyers, take the best of that excluded group – Highest Frequency / Most Recent – and drop them a test mailing to make sure fiddling with the RFM model / filtering this way isn’t leaving money on the table.

If possible, you might check this lapsed Frequent group before mailing for reasons why they stopped buying – is there a common category or manufacturer purchased, did they have service problems, etc. – to further refine list and creative. Keep the segment small but load it up if you can, throw “the book” at them – Free shipping, etc.

And see what happens. If you get minimal response, then you know they’re dead.

The bottom line is this: all models are general statements about behavior that benefit from being tweaked based on knowledge of the target groups. That’s why there are so many “versions” of RFM out there – people twist and adapt the basic model to fit known traits in the target populations, or to better fit their business model.

Since it’s early in the game for you folks and due to the online nature of the customer generation, it’s worth being cautious. At the same time, you want to make sure you don’t leave any knowledge (or money!) on the table. So you drop a little test to the “Distant Frequents” that is “loaded” up / precisely targeted and if you get nothing, then you have your answer as to which version of the model is likely to work better.

Short story: I could not convince management at Home Shopping Network that a certain customer segment they were wasting a lot of resources on – namely brand name buyers of small electronics like radar detectors – was really worth very little to the company. So I came up with an (unapproved) test that would cost very little money but prove the point.

I took a small random sample of these folks and sent them a $100 coupon – no restrictions, good on anything. I kept the quantity down so if redemption was huge, I would not cause major financial damage.

With this coupon, the population could buy any of about 50% of the items we showed on the network completely free, except for shipping and handling.

Not one response.

End of management discussion on value of this segment.

If you can, drop a small test out to those Distant Frequents and see what you get. They might surprise you…

Good luck!

Jim


Web Data: Randomly Erratically Variably Unpredictably Incomplete?

So there I am at the eMetrics Summit, sitting with WAA President Richard Foley, who also has the impressive title of World Wide Product Manager and Strategist for SAS Institute.  He asks me what I’m going to talk about for my “Guru” (hate that word) session with Avinash and John Q and I respond with the Accuracy versus Precision thing. You know, that web analytics folks are generally far too obsessed with Accuracy when the data is really too “dirty” to support that obsession.

Well, don’t you know, (and this is 90 minutes before the Guru gig, but I have a Track presentation first), Richard responds, “Web Data isn’t dirty, it’s some of the cleanest data around.”

Hmmm, I think.  This has to be another one of those Marketing / Technology Interface things.  Clearly a semantic rift of some kind.  But he’s a SAS guy, so there must be substance behind this statement!

So we spend the next half hour or so Drilling Down into the meat of the issue.  Turns out none of his analysts would call web data “dirty” because it’s created by machines, don’t you know.  No mistakes.  Data is “clean”.  You haven’t seen dirty data until you start looking at human keystroke input, for example. Think large call centers.  Or how about botched data integration projects. Millions of records with various fields incomplete or truncated.  That’s dirty data.

Dirty, from both an Operational and Marketing perspective, you see.  But web server logs, they might be dirty from a Marketing perspective, but they’re not dirty from an Operational perspective.  They just are what they are; super-clean records of what the server did or the tag read or the sniffer sniffed.

OK, I’m with Richard on this idea, having seen some horrendously dirty data in my time by his definition.  So what do we call web data, if it’s clean?  Even a 404 Error isn’t really “dirty”, right?  It sure is dirty from a customer / user perspective; but from an already widely-used Operational / BI definition, it’s not dirty, it just “is”. 

So how do we get to this idea of all the problems with web data that can lead an analyst down the wrong track if they focus so much on Accuracy they never get Precision?  You know, cookie deletion, network serving errors, crashing browsers, multiple users of a single machine, single users of multiple machines, tabbed browsing, etc. etc. etc.? What do we call that kind of data, if not dirty?

We start going through all the lingo, like trying on different sets of clothes, looking for something that fits.  What other kinds of data are like web data?  What is the precise nature of the “problem” with web data?  We finally arrive at the notion of Incomplete that seems to fit pretty well.  It’s not that the data is dirty, it simply is often “not there” for the end user or analyst, as in missing a cookie, or serving a page that is never rendered in the browser, or a tag that never gets to execute properly.

But that’s not quite it, we decide, because there has been a solution for “incomplete” data around a long time – modeling.  As long as you can get a set of reliable data, you can interpolate or “fill in” the missing data, right?  Like is often done with geo-demographic modeling?

There’s a word, we think – “reliable”.  Web data is certainly not reliable, but that’s not quite it.  Why is it not reliable?

Well, because at a fundamental level, the incompleteness is Random, so it cannot be modeled very well.

And there we have it. 

Web data is not dirty, it is Randomly Incomplete.  A label that works for both the Marketing and Technology folks at the same time.  A beautiful thing, don’t you think?  A great example of being a little “less scientific” on the Technical side and a little “more specific” on the Marketing side, I think.  We wrastled it to the ground.

So I rush off to change the phrase “data is dirty” in my Guru presentation to “data is Randomly Incomplete”.  The panel is right after my Track presentation, so I rush up on stage with Avinash and John Q. We’re late so Avinash starts right away; we don’t even have time to mention to each other what we will be presenting.

Avinash is riffing on Creating a Data Driven Boss and his Rule #2 is:

Embrace Incompleteness

Yikes.  That’s some coincidence, don’t you think?

But more importantly, do you think web data is dirty, Randomly Incomplete, or some other definition?  Because if there are no objections, I’m moving from “dirty” to “Randomly Incomplete” – at least when I talk with BI folks!