Big Data: Can We Predict Population Trends (Like Happiness) via Health Apps?
More than five years ago, I penned a piece entitled Reliability and Validity in a Web 2.0 World. It spoke about the concerns of gathering data from biased samples — without first understanding in what ways, exactly, those samples may be biased.
Now, with the ubiquity of apps (downloadable programs for people’s smartphones), I’m seeing the same problem arise. Developers and entrepreneurs are pursuing data from these apps without understanding the basics of good, reliable, scientific data collection, or why it matters, especially when you start wanting to analyze all of this “big data” (a somewhat silly term… in epidemiology, for instance, scientists just call it “data”).
Can personal health data be collected by these apps without bias, and somehow be transformed into measuring something bigger?
The short answer: no, not easily.
Sure, there are people who are part of a “quantified self” movement, who want to track and measure every aspect of their personal health (and presumably, mental health). But those people are currently outliers, and in no way representative of the population in general.
Such minorities can quickly make up the majority of an effort to collect larger datasets in order to analyze health or well-being trends. While the resulting analyses can tell you something about this group of people, it would be inappropriate to suggest they generalize to the rest of the population (who, demographically and behaviorally, may look and act very differently).
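To make the arithmetic behind that concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (the group names, the population shares, the retention rates, and the well-being scores are all my assumptions, not measurements from any real app). It only shows how a small group that sticks with an app can end up dominating the data that app collects.

```python
# Hypothetical illustration of self-selection bias in app-collected data.
# All figures below are invented for the example; none come from a real study.

def sample_composition(pop_share, retention):
    """Return each group's share of the users who keep using the app."""
    retained = {group: pop_share[group] * retention[group] for group in pop_share}
    total = sum(retained.values())
    return {group: retained[group] / total for group in retained}


def weighted_mean(shares, values):
    """Average of per-group values, weighted by each group's share."""
    return sum(shares[group] * values[group] for group in shares)


# Assumed share of each group in the general population.
pop_share = {"trackers": 0.05, "everyone_else": 0.95}
# Assumed fraction of each group still logging data after the first week.
retention = {"trackers": 0.60, "everyone_else": 0.02}
# Assumed average self-reported well-being (0-10 scale) for each group.
wellbeing = {"trackers": 8.0, "everyone_else": 6.0}

sample_share = sample_composition(pop_share, retention)
population_mean = weighted_mean(pop_share, wellbeing)
sample_mean = weighted_mean(sample_share, wellbeing)

print(f"Trackers in population: {pop_share['trackers']:.0%}")
print(f"Trackers in app sample: {sample_share['trackers']:.0%}")
print(f"True population mean:   {population_mean:.2f}")
print(f"Mean seen in app data:  {sample_mean:.2f}")
```

Under these assumed numbers, trackers are 5 percent of the population but roughly 61 percent of the app's active users, so the average the app reports sits well above the population's true average. The exact gap depends entirely on the invented inputs; the point is only the direction of the distortion.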
This won’t change anytime soon, because most health apps are downloaded by people, used once or twice, and then abandoned. There’s a reason most people stop using health apps — especially ones meant to act as a data diary. They’re boring! Collecting data on yourself is just a very boring task for most of us to commit to actively doing every day (or even every week).
The Complicated Answer: Apps Need to Be Smarter, Connected
Health apps meant to collect data ultimately fail because they require active input by the user. This is why personal health records have largely never taken off in any meaningful way. People are too busy living their lives to be bothered with telling an app what their daily metrics are.
For health apps to succeed where most other attempts at personal health tracking software have failed, they need to collect their data passively. That means that no input from the user is required.
Of course, we’re a long way from such metrics providing meaningful data. Sure, there are running devices that track how much you run (from Nike, of course). But a running app is useless if it doesn’t talk to my diet app, or my nutrition app, or my exercise app. Or my mindfulness app. It’s one app measuring a single metric in the complex being that is me. It simply isn’t much to go on.
Trust is a Key Cornerstone
Adoption of such networked apps sharing all of your health data has another, less technical, obstacle as well: trust. Companies like Facebook and Nike ultimately answer to only one set of people: their shareholders. That means that if it’s in their best interests to analyze your data for things they can make money off of, they will.
Startups are no better, because instead of shareholders, they answer only to venture capitalists — money lenders who are only looking for the best and quickest return on their investment.
Why would I want to trust my health information — data that could be used against me for future denial of insurance or setting of my insurance rates — to companies who have little interest in protecting my privacy?
Which brings us back again to the first point: a biased sample. People who gladly give all of their health information to for-profit companies to analyze, collate, and eventually associate back to them (even if such data is initially anonymized) are not like most people. Most of us still care about keeping our health information to ourselves, just as most of us still want to keep our financial information to ourselves.
Where We Go from Here
Attempting to gather population-based data (e.g., conducting epidemiological research) from health apps has some issues and opportunities I’ve identified:
- Biased sampling because of the tiny minority of people who actively and continuously use health apps
- Sampling and continued usage could be improved by passive versus active data collection
- Sampling and use could be further improved by use of a trustworthy authority to collect and store data (not a for-profit company or startup)
- Apps that are aware of one another and exchange relevant health data about me are the next generation — instead of the current wealth of siloed, unaware (stupid?) apps
I think it’s great that developers look at a health problem, develop an app for it, and release it to the world. But all too often these apps go nowhere, with no audience. Or they are orphaned by the original developers for lack of interest. The few popular health apps that gain a robust audience are the exception, not the rule. And even when they do gain widespread acceptance, just like our country’s electronic medical record systems, they don’t talk to one another.
If you want to be able to say something authoritative or meaningful about data collected from an app, you have to show that data comes from a representative sample of the population. Lacking that, your data only tells us about one tiny group in the population — one that doesn’t look like most of us.
Grohol, J. (2018). Big Data: Can We Predict Population Trends (Like Happiness) via Health Apps? Psych Central. Retrieved on March 31, 2020, from https://psychcentral.com/blog/big-data-can-we-predict-population-trends-like-happiness-via-health-apps/