Picture the scene: you go to your local supermarket and buy ten items. As the items are scanned by the shop assistant, each one is logged in a computer database. When you complete your purchase and leave the store, a record of what you bought remains. And yet it’s not actually a record of what *you* bought at all: it’s a record of what somebody bought.
Next week, you go back to the same store and buy 25 items. Three of these items match three items from last week’s visit. Now the computer knows that you’re likely to buy these three items together. Except that it doesn’t know that: all it knows is that someone might like to buy these three items together.
That same computer receives similar purchase reports from the supermarket chain’s 1,000 stores. Each store serves – say – a thousand shopping visits a day. So each week, one computer receives anonymous market reports on 7 million shopping visits. That data is vital to the supermarket for planning supply and for optimizing its service.
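The weekly tally above is just multiplication. A quick sketch, using the article's illustrative figures (these are assumptions for the sake of the example, not real supermarket data):

```python
# Back-of-envelope check of the article's example numbers.
stores = 1_000          # stores in the chain (illustrative assumption)
visits_per_day = 1_000  # shopping visits per store per day (illustrative)
days = 7                # one week

weekly_visits = stores * visits_per_day * days
print(weekly_visits)    # → 7000000 anonymous shopping-visit reports per week
```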
Very few people complain about stores collating this kind of data, and for good reason. The data is anonymous (assuming you eschew loyalty cards). So the focus of collecting this data must be understanding customers’ shopping habits. Not you, singular, but you plural.
The internet works in a similar way. The top 1,000 websites provide a vast range of services and content, generally for free. Together, these sites employ thousands of people. You might think that these people are volunteers, or that they’re paid by wealthy internet philanthropists, complete with top hat and monocle, who fund the web out of the goodness of their own hearts. Of course, that’s not how it works.
The internet is a marketplace, where a lot of the top 1,000 websites are making money despite the fact that they offer generally free services and/or content. A large part of the money in this marketplace comes from advertising, though plenty comes from traditional product sales as well. These two business lines are very different, but they share one main need: accurate data about what users are doing. The more data a company has, the better it knows its users.
A website with more than a few million visitors a month will be spending plenty of time analyzing its traffic and visitors, from a technical point of view (shaping traffic, predicting peaks), from a content point of view (“Did the change we made to the homepage affect bounce rate?”) and from a business point of view (“What sort of users tend to convert, and what can we do to boost that sort of traffic?”).
But none of these people will be looking at you personally and saying “Hey, I see Bob’s come in from Google again”. That’s because a major website would need to employ tens of thousands of people just for that sort of snooping data analysis. And frankly, you’re just not worth that much. That is, you – the single user who might be worried about privacy because you read a hundred articles a month on websites that should know better – are simply not that interesting from a statistical point of view.
That’s what statistics are all about, and once you understand that, you can start using Facebook again and sleep peacefully, knowing that for a tiny price – a snippet of anonymous data about where you go on the web and what you click – you, an unknown and statistically uninteresting grain of sand, get to use loads of cool websites for free. It’s not a bad deal.