Even when data is great, it’s not perfect!
Out of all the demographic attributes that we analyze and score, there is one particular bright spot: geographic attributes (State, Census region, and Census division). These three geographic attributes consistently have the highest average Truthscore (0.93 or higher in Q1 2021) of all the attributes that we currently score. In contrast, the average Truthscore across the same 11 data providers for presence of children data is 0.37.
Every quarter we run our accuracy scoring, calculating Truthscores for over a dozen demographic attributes (e.g., age, gender, ethnicity, race, household income, presence of children, geography, and educational attainment). The result is over 650 million distinct Truthscore-d consumer records.
It seems logical that more populous states would score highest on accuracy (more data, more accuracy) and less populous states would score lowest. But the results are scattered, and the relationship is not exactly linear.
The most populous states don't necessarily have the highest average Truthscore. Based on our analysis, Texas had the highest average Truthscore, 0.945, even though it is only the second-most populous state in the US. Records from California, the largest state by population in the US, had a slightly lower average Truthscore of 0.943. Interestingly, records from Kentucky, a state that comprises less than 2% of the total US general population, had an average Truthscore of 0.93, barely lower than California's.
There are a few reasons why geographic attributes perform so well in accuracy scoring. For one, many data partners build the initial backbone for their full consumer file (to which other PII and demographic information is later associated) on physical address, either at zip code level or residency. In addition, there are numerous credible public sources of residence information about specific consumers. For example, information about home purchases and mortgage applications is in the public record. Therefore, data providers have many reputable, external sources of geographic information about individual consumers that they can use to either validate their own geographic data or inform their own models.
While geography data is generally good (and, for the reasons above, perhaps unsurprisingly so), data for all states isn’t created equal. There is significant variation in data accuracy (according to average Truthscores) from state to state. If you’re an advertiser running geo-targeted ads, you want to be sure your dollars are well spent in the states you are trying to reach.
Let’s zoom in on our graph a little.
Consumer records asserted to belong to residents of the least populous states in the US (states comprising less than 1% of the total US population) were scored lower on accuracy. For example, for Wyoming, the least populous state in the US, Truthset calculated an average Truthscore™ of just 0.59. For the next 4 least populous US states, Vermont, Alaska, North Dakota, and South Dakota (in that order), the average state Truthscore generally increased with population, but always fell considerably short of the average Truthscore across all states (0.93 as previously mentioned). Records from Vermont, Alaska, North Dakota, and South Dakota had average Truthscores of 0.68, 0.73, 0.83, and 0.84, respectively. Hawaii, at 0.87, is an outlier: its average Truthscore™ is lower than its population would suggest.
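To make the size of these gaps concrete, here is a minimal Python sketch that uses only the averages quoted above. The helper names (`shortfalls`, `is_non_decreasing`) are our own, for illustration only; this is not how Truthscores themselves are calculated.

```python
# Average Truthscores quoted in the text, ordered from least populous upward.
state_scores = [
    ("Wyoming", 0.59),
    ("Vermont", 0.68),
    ("Alaska", 0.73),
    ("North Dakota", 0.83),
    ("South Dakota", 0.84),
]
OVERALL_AVERAGE = 0.93  # all-state average quoted in the text


def shortfalls(scores, baseline):
    """Return (state, baseline - score) pairs, rounded to two decimals."""
    return [(state, round(baseline - score, 2)) for state, score in scores]


def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


gaps = shortfalls(state_scores, OVERALL_AVERAGE)
rising = is_non_decreasing([score for _, score in state_scores])

for state, gap in gaps:
    print(f"{state}: {gap:.2f} below the all-state average")
print("Scores rise with population across these five states:", rising)
```

For these five states the gap narrows steadily as population grows, which is the pattern described above.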
Bottom line: while state and other geographic information about consumers is, on average, very likely to be accurate, data buyers can’t assume that all geographic data is equal in quality. If you want to ensure that you are targeting the consumers most likely to fit into your target geography, regardless of whether that US region is densely or sparsely populated, contact Truthset to optimize your spend for segments with the highest Truthscores™.
Census Bureau 2019