Imagine running an online survey to get a better handle on your target audience and throwing out 20% of the responses. Crazy? Actually, if you’re not throwing out about that much, you’re probably using bad data.
Resonate conducts many surveys per year and uses a proprietary “fraud score” to throw out the 10-20% of responses that score as “bad data.” It’s the only way we’ve found to ensure that the insights we provide are the most accurate measure of consumers.
As is true across the industry, we rely heavily on surveys to measure audience behavior. To understand people and how to connect with them in an impactful way, we must ask them directly about their buying habits, their daily routines, how they choose which stores to shop at, which values go into their buying decisions, and what motivates them as they pursue a happy, productive life. But humans are not perfect, and many factors can affect the way they answer surveys, which ultimately impacts data quality.
So, what goes wrong exactly? Well, for starters, if you ask someone their political affiliation and they mark Republican but they’re really a Democrat, how are you supposed to know? People provide poor answers for a variety of reasons. The most common is respondents who take lots of online surveys and blow through the answers just to get paid. You’ll also get people who dial back their mental effort partway through a survey to conserve stamina.
These bad responses have serious consequences for a company seeking high-quality survey data: they skew the results and throw off the audience compositions used in business decisions. We estimate that roughly $3 billion to $4 billion is wasted annually on bad survey data.
There are a few established techniques for identifying bad survey responses, but they all have flaws. Straightlining detection is a common one: on big matrix questions with response options like “agree,” “disagree,” and “no opinion,” some people check the same answer all the way down the grid. But we’ve found that many respondents who straightline on these questions provide high-quality data everywhere else. Consistency checks can be helpful, but they, along with attention checks, can actually introduce additional bad data. Extreme-timing checks do catch bad actors, but unless they’re combined with other techniques, they turn out not to be very helpful.
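To make the first technique concrete, here is a minimal sketch of straightlining detection. It’s illustrative only, not Resonate’s implementation, and it assumes a hypothetical data layout where each matrix question is a list of coded answers per respondent.

```python
# Illustrative straightlining check (not Resonate's method). Each matrix
# question is represented as one list of coded answers for a respondent.
def straightline_rate(grids: list[list[int]]) -> float:
    """Fraction of matrix grids answered with a single repeated value."""
    flags = [len(set(grid)) == 1 for grid in grids if len(grid) > 1]
    return sum(flags) / len(flags) if flags else 0.0

# Example: three 5-item grids; the first two are straightlined.
respondent = [[3, 3, 3, 3, 3], [1, 1, 1, 1, 1], [2, 4, 1, 5, 3]]
print(straightline_rate(respondent))  # ~0.67 -- suspicious on its own, but
# as noted above, straightliners often give good data on other questions.
```

A high rate flags a respondent for review, but as the paragraph above notes, it’s a weak signal on its own.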
Resonate finds bad data through a proprietary approach we call “fraud score,” which is based on a few factors:
- We look at the likelihood of certain answers given the respondent’s other answers. Someone who says they didn’t like their phone all that much probably wouldn’t recommend it to friends, so answering the opposite could be a red flag.
- Some questions prompt responses that together give useful insight into a person’s thinking. For example, if respondents are asked the color of their mother’s living room carpet and their political affiliation, the two answers say little about each other. But if they’re asked their political affiliation and their stance on abortion, the answers provide mutual insight (the sketch after this list illustrates the difference).
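The second factor is essentially mutual information between questions. Below is a minimal sketch, assuming answers arrive as (question A, question B) pairs across respondents; the data and function name are hypothetical, and this illustrates the concept rather than Resonate’s scoring code.

```python
# Illustrative only: estimating mutual information between two survey
# questions from answer pairs collected across respondents.
import math
from collections import Counter

def mutual_information(pairs: list[tuple[str, str]]) -> float:
    """Empirical mutual information (in bits) between two answer columns."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )

# Related questions (party vs. abortion stance) carry mutual information...
related = [("R", "oppose"), ("R", "oppose"), ("D", "support"),
           ("D", "support"), ("R", "oppose"), ("D", "support")]
# ...while unrelated ones (carpet color vs. party) carry almost none.
unrelated = [("beige", "R"), ("gray", "D"), ("beige", "D"),
             ("gray", "R"), ("beige", "R"), ("gray", "D")]
print(mutual_information(related))    # 1.0 bit: each answer predicts the other
print(mutual_information(unrelated))  # ~0.08 bits: nearly independent
```

Question pairs with high mutual information are the useful ones for fraud detection: when each answer narrows down the other, an implausible combination stands out.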
We use fraud scoring because it’s an absolute measure: it accounts for the mutual informational relationships between answers and is scaled by their unconditional likelihoods. That also means that when someone fills out a survey, we can say how much pure information they gave us. In the end, we throw out at least 15% of responses to get the most accurate insights.
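Resonate’s actual fraud score is proprietary, but one plausible reading of “scaled by unconditional likelihoods” is pointwise mutual information: how surprising a pair of answers is relative to how common each answer is on its own. The probabilities below are invented for the phone example from the list above.

```python
# One possible interpretation (not the actual fraud score): pointwise
# mutual information, log2( p(a, b) / (p(a) * p(b)) ). Implausible
# combinations -- "disliked the phone" yet "would recommend it" -- score
# strongly negative, flagging the response pair for review.
import math

def pmi(p_joint: float, p_a: float, p_b: float) -> float:
    return math.log2(p_joint / (p_a * p_b))

# Hypothetical population rates for the phone example above.
print(pmi(p_joint=0.40, p_a=0.50, p_b=0.55))  # +0.54: likes it, recommends it
print(pmi(p_joint=0.02, p_a=0.50, p_b=0.45))  # -3.49: dislikes it, recommends it
```

Summing a score like this over a respondent’s answers gives one way to quantify how much coherent information they provided, which is the spirit of the measure described above.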
I presented Resonate’s fraud detection process at the Advertising Research Foundation’s 13th Audience Measurement conference this month. Take a look at my presentation slides for more details on how we avoid bad data.