More and more companies are tapping into the wisdom of their customers and users — a very select crowd. They do this through “big data” — collecting treasure troves of anonymous data and then running post-hoc analyses on it.
This effort can lead to some interesting insights. It can also tempt companies to claim that the results generalize to the entire population.
And it’s this latter issue that’s the problem. If you start out with a self-selected sample, your data are only relevant to people like the ones in your sample, not to the whole population. And that’s just one of the problems with measuring, and taking action, based on information from select crowds.
Websites have been doing “big data” measurements for nearly 20 years now. Every time you visit a website, your visit leaves a small data trace on the website’s server. The site’s owners run this data through an analytics platform (such as Google Analytics), which gives them aggregate information about the types of people who visit their website.
Since every website is unique, such insights are only relevant to that website. A user who visits CNN, for instance, may have little in common with a user who visits Match.com.
The Select Crowds Problem
In data analysis, statisticians call such a sample a “self-selected sample,” and the distortion it produces “self-selection bias.” Simply put, because your data come only from people who use a particular app or kind of social media, they’re not representative of the population as a whole. And since they’re not representative of the population as a whole, you cannot generalize from them to everyone else.
I call this the “select crowds” problem. If you’re gaining your wisdom from the crowd and want insights that generalize, you’d better make sure that crowd is representative of the population.
There are entire companies that do nothing but analyze trends and data from Twitter. But if you look at who uses Twitter, and how they use it, you’d immediately be concerned about what such data really mean. For instance, Twitter users are a lot younger than the general population, and older people are greatly under-represented. If your company looks at health trends on Twitter, you’re going to see something very different from what a randomized telephone survey would find.
In other words, what trends on Twitter may have little or no meaning for the 80+ percent of Americans who don’t use Twitter.
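To make the Twitter example concrete, here is a minimal Python sketch of how a select crowd skews an estimate. Every number in it is invented for illustration. It assumes a hypothetical health condition that becomes more common with age, and a hypothetical platform that younger people are far more likely to join. The function names and rates are made up, not drawn from any real survey. The sketch uses expected values rather than random draws, so it always gives the same answer.

```python
# Hypothetical illustration of self-selection bias using expected values.
# All rates below are made up for illustration; they are NOT real data.

def condition_rate(age: int) -> float:
    """Hypothetical share of people at a given age who report a health condition.
    Assumed to rise linearly from 10% at age 18 to 60% at age 85."""
    return 0.10 + 0.50 * (age - 18) / (85 - 18)


def join_rate(age: int) -> float:
    """Hypothetical share of people at a given age who use the platform.
    Assumed to fall linearly from 50% at age 18 to 5% at age 85."""
    return 0.50 - 0.45 * (age - 18) / (85 - 18)


def population_estimate(ages: range) -> float:
    """Average condition rate when every age is equally represented (the whole population)."""
    return sum(condition_rate(a) for a in ages) / len(ages)


def self_selected_estimate(ages: range) -> float:
    """Average condition rate among platform users only, i.e. each age weighted by join_rate."""
    weights = [join_rate(a) for a in ages]
    weighted = sum(w * condition_rate(a) for w, a in zip(weights, ages))
    return weighted / sum(weights)


if __name__ == "__main__":
    ages = range(18, 86)  # ages 18..85, assumed to be equally common in the population
    print(f"Whole population:     {population_estimate(ages):.1%}")
    print(f"Self-selected sample: {self_selected_estimate(ages):.1%}")
```

Under these made-up assumptions, the platform's users suggest the condition affects about 28% of people, while the true population rate is 35%. Nothing about the condition changed. The only difference is who was in the crowd.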
Apps Are No Better
Apps often collect their users’ data, anonymize it, and then use it to compare each user’s performance against others who also use the app. This is supposed to make you feel like part of a social network built around the app. On paper, it’s a great idea.
But what if only a certain type of person uses that particular app? Imagine a mood-tracking app meant to help lift people out of depression by letting them track their moods and compare their progress with other users. If nearly everyone using it is depressed, those comparisons could end up being unintentionally depressing in and of themselves.
Can you positively motivate someone through social comparison? You can, but research also shows that social comparisons all too often leave people feeling worse off than before. It has to be done exquisitely carefully, which is something most app developers don’t understand.
Leaving Out Important Things to Measure
Any app or service is only as good as the stuff it chooses to measure. You can introduce bias into your results, intentionally or unintentionally, through what you choose to measure and what you choose not to measure.
Think of it like this: you’re thinking of moving to a new city with less rain, so you look only at the average annual rainfall of different cities. You look up Miami and think, “You know, I’m not moving to Miami. They get nearly 62 inches of rain a year! Compare that to the meager 37 inches Seattle gets. Seattle’s got to be the sunnier, less rainy place.” But total rainfall says nothing about how many days are gray and overcast, and Seattle has far more of those than Miami. Because you left out other important metrics, you’d make the wrong choice based on too little information.
What an app or website developer thinks is important to measure may matter less than something they left out. Imagine an app that measured only your reaction to medication but left out all the other important factors contributing to your mood and treatment.
Treatment doesn’t take place in a vacuum with you and a single medication. It takes place in a rich, complex ecosystem that may include a medication, but also includes a lot of other important things you’re doing to help yourself recover. That could include how much you exercise, how well you avoid ruminating, how many days you go without a panic attack, or how stressed out you are about a family member or work.
In short, there are myriad things that apps and other well-meaning services should track but don’t. Leaving them out gives a distorted picture of how the thing being measured is connected to a person’s mood or recovery. Medication is indeed important in many people’s treatment, but it may not be, and often is not, the most important thing.