Causation Vs. Correlation


May 24, 2006

A while back, I heard a report on the radio. Scientists had found four factors that were associated with breast cancer. One of them was a high level of education. So does that mean that if I skip college and grad school, I am less likely to have breast cancer?

The correlation between education and breast cancer is just that: correlation, not causation. We find them together, but that doesn’t mean that one causes the other. In fact, there is at least one variable (and perhaps more) lurking in the background that is really causing the cancer. For example, highly educated women may be less likely to have children when they are very young, and it may be childbearing, not education, that affects cancer.
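To make the lurking-variable idea concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the probabilities are invented, not taken from any study, and the names `lurking`, `factor`, and `outcome` are hypothetical. In this toy model the lurking variable drives both the observed factor and the outcome, while the factor itself has no effect on the outcome at all.

```python
import random

def simulate(n=200_000, seed=0):
    """Generate (lurking, factor, outcome) triples from a toy model.

    All probabilities are invented for illustration. The outcome depends
    only on the lurking variable, never on the observed factor.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        lurking = rng.random() < 0.5
        # The lurking variable makes the observed factor more likely...
        factor = rng.random() < (0.8 if lurking else 0.2)
        # ...and, independently of the factor, makes the outcome more likely.
        outcome = rng.random() < (0.3 if lurking else 0.1)
        rows.append((lurking, factor, outcome))
    return rows

def outcome_rate(rows):
    """Share of rows in which the outcome occurred."""
    return sum(outcome for _, _, outcome in rows) / len(rows)

def gap(rows):
    """Outcome rate with the factor minus outcome rate without it."""
    with_factor = [r for r in rows if r[1]]
    without_factor = [r for r in rows if not r[1]]
    return outcome_rate(with_factor) - outcome_rate(without_factor)

rows = simulate()
raw_gap = gap(rows)                                      # all rows pooled
gap_lurking_present = gap([r for r in rows if r[0]])     # lurking variable held fixed
gap_lurking_absent = gap([r for r in rows if not r[0]])  # lurking variable held fixed

print(f"raw gap: {raw_gap:.3f}")
print(f"gap with lurking variable present: {gap_lurking_present:.3f}")
print(f"gap with lurking variable absent:  {gap_lurking_absent:.3f}")
```

When the rows are pooled, the factor looks strongly associated with the outcome. Once the lurking variable is held fixed, the gap all but disappears, which is what "correlation, not causation" looks like in data.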

I use breast cancer to illustrate lurking variables because it is so clear — we know that we can’t skip college and avoid the problem. Online, the issues of causation and correlation (and lurking variables) are just as important, but often not as clear.

For example, I usually see that visitors who spend a long time online (over half an hour) are much more likely to convert than those who spend only 15 minutes. My very first question is about correlation vs. causation. Does the length of time on the site actually cause the conversion (“Well, I’ve wasted the last half hour on this dumb site, let me just buy what I need and move on”)? If that were true, we would work as hard as we could to keep visitors on our site before they convert, because it would increase their chances of converting. Or does the conversion cause the long time on the site? Perhaps it’s hard to make a purchase, or the visitor needs to learn a lot before pushing the “submit” button, so the interested visitor ends up spending a long time.
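The second explanation can be sketched as a small simulation. This is a toy model under stated assumptions, not a description of any real site: the 10% share of buyers, the time ranges, and the names `simulate_sessions` and `will_buy` are all hypothetical. Here the decision to buy is made independently of time on site, and buying simply adds checkout minutes to the session.

```python
import random

LONG_SESSION_MINUTES = 30  # the "over half an hour" cutoff

def simulate_sessions(n=100_000, seed=1):
    """Return (minutes_on_site, converted) pairs from a toy model.

    Assumption: intent to buy is fixed up front, and buying adds
    checkout time. Time on site never influences the purchase.
    """
    rng = random.Random(seed)
    sessions = []
    for _ in range(n):
        will_buy = rng.random() < 0.10      # invented share of buyers
        minutes = rng.uniform(2, 40)        # browsing time for everyone
        if will_buy:
            minutes += rng.uniform(10, 20)  # research and checkout take extra time
        sessions.append((minutes, will_buy))
    return sessions

def conversion_rate(sessions):
    """Share of sessions that ended in a conversion."""
    return sum(converted for _, converted in sessions) / len(sessions)

sessions = simulate_sessions()
long_rate = conversion_rate([s for s in sessions if s[0] > LONG_SESSION_MINUTES])
short_rate = conversion_rate([s for s in sessions if s[0] <= LONG_SESSION_MINUTES])

# "Intervention": keep every visitor on the site 15 minutes longer.
# In this model that changes the clock, not the decision to buy.
stretched = [(minutes + 15, converted) for minutes, converted in sessions]

print(f"conversion rate, long sessions:  {long_rate:.3f}")
print(f"conversion rate, short sessions: {short_rate:.3f}")
print(f"overall rate before stretching:  {conversion_rate(sessions):.3f}")
print(f"overall rate after stretching:   {conversion_rate(stretched):.3f}")
```

Long sessions convert at a much higher rate than short ones, yet holding everyone on the site longer leaves the overall conversion rate unchanged, because in this model time is a consequence of buying rather than a cause. A real site may well mix both directions, so this sketch covers only one of them.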

Web analytics show us “what,” not “why.” However, about a year ago, I posed this question to Dr. Alan Montgomery, who is a professor of clickstream analysis at CMU. His answer? “It’s a little bit of both.”

Robbin Steif