It’s difficult to overstate how disruptive the shutdown is to everyone, not just the people who work directly for Uncle Sam, or how long its aftereffects will last.
Before I say anything else, I want to make clear that the people who have been forced against their will to work for free for nearly a month are, and should be, our first priority.
Now, that said, in what insidious ways will this shutdown affect your marketing analytics?
What Government Data Is Missing
The data shutdown will have a far-reaching impact on nearly every model and framework used to forecast business, economic, social, and demographic data.
As of this writing, almost a month of economic data is missing, and some of it can never be retroactively collected. (For example, the Census Bureau and Bureau of Economic Analysis holiday shopper interviews will remain permanently missing.) In a world powered by data, a month-long gap is a big, big deal.
- Your 401k managers use data like this to model and inform what’s in your portfolio.
- Your CMO’s strategy reports from major consulting firms draw heavily on this kind of data.
- Your CFO’s decisions about how and where to manage a firm’s money are built in part on this data.
Imagine for a minute that you turned off Google Analytics for a month. How much would that impact your marketing reporting, not just now but for months and years to come? Every year-over-year comparison for the next 2-3 years would carry an asterisk. Now extend that to data everywhere, and you get a sense of how bad any shutdown is.
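To make the asterisk concrete, here is a minimal sketch. It assumes your monthly metric lives in a Python dictionary keyed by the first day of each month, with `None` for a month that was never collected. The function name `yoy_changes` and this data layout are illustrative assumptions, not part of Google Analytics or any other tool.

```python
from __future__ import annotations

from datetime import date


def yoy_changes(series: dict[date, float | None]) -> list[tuple[date, float | None, bool]]:
    """Compute year-over-year % change for each month that has a prior-year entry.

    Returns (month, pct_change, flagged) tuples. A comparison is flagged (the
    "asterisk") when either month's value is missing, or when the prior-year
    value is zero and a percent change is undefined.
    """
    results = []
    for month in sorted(series):
        prior = date(month.year - 1, month.month, 1)
        if prior not in series:
            continue  # nothing from a year earlier to compare against
        current, previous = series[month], series[prior]
        if current is None or previous is None or previous == 0:
            results.append((month, None, True))
        else:
            results.append((month, (current - previous) / previous * 100, False))
    return results


# Hypothetical sessions data with January 2019 never collected.
sessions = {
    date(2018, 1, 1): 100.0,
    date(2018, 2, 1): 120.0,
    date(2019, 1, 1): None,
    date(2019, 2, 1): 132.0,
}
for month, change, flagged in yoy_changes(sessions):
    print(month, "*" if flagged else f"{change:.1f}%")
```

Because the missing month can sit on either side of a comparison, January 2020 measured against January 2019 gets flagged as well, which is why the asterisk lingers for years rather than months.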
Every industry deals with government data in some form. Here’s a partial list, via Yahoo Finance:
Many government reports will likely be affected. This includes the January jobs report, future job reports, factory orders, inflation data, and productivity reports.
The January jobs report “may show an artificially high unemployment rate and low unemployment figure” because many of these federal employees could be counted as unemployed. This would raise the U.S. unemployment rate by 0.2%, according to the Associated Press.
With the Census Bureau shut down, future job reports may not be released. The USDA can’t release farming data, and although CPI data was released on Jan. 11, the Fed’s preferred inflation measure was not.
Other data releases affected by the shutdown include those of the Bureau of Economic Analysis, Bureau of Justice Statistics, Bureau of Transportation Statistics, and the Economic Research Service.
Go to a government site like Data.gov, long a preferred provider of data for machine learning and data science:
Even functioning data sources like the St. Louis Federal Reserve’s data system, FRED, have large swaths of missing data. Every model of the economy used by financial technology and investment firms will face massive data quality challenges for the next two years, until we once again have continuous year-over-year data. Some of the missing data could be inferred, but even inferred values require annotation to ensure our models account for the shutdown.
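Before you can annotate or infer anything, you need to know where the holes are. This is a minimal sketch, assuming a monthly series (for example, one you exported from FRED) has already been loaded into a dictionary keyed by the first day of each month; how you fetch it is up to you, and the helper names `month_starts` and `find_gaps` are hypothetical. It reports months that are either absent entirely or present with no value.

```python
from __future__ import annotations

from datetime import date


def month_starts(start: date, end: date) -> list[date]:
    """List the first day of every month from start's month through end's month."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def find_gaps(series: dict[date, float | None]) -> list[date]:
    """Return months between the first and last observation that have no value."""
    if not series:
        return []
    expected = month_starts(min(series), max(series))
    # .get() returns None both for absent months and for months stored as None.
    return [m for m in expected if series.get(m) is None]


# Hypothetical series: December 2018 was never published, January 2019 is blank.
retail = {
    date(2018, 10, 1): 1.0,
    date(2018, 11, 1): 2.0,
    date(2019, 1, 1): None,
    date(2019, 2, 1): 3.0,
}
print(find_gaps(retail))
```

Treating "absent" and "present but empty" the same way matters here, because different exports represent an unpublished month differently.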
How To Handle Missing Government Data from the Shutdown
So, what should you do? For now, at the time of writing, if you’re an eligible voter, nag the heck out of your elected officials to turn the government back on.
Annotate any of your data that relies on or uses government data in any way, marking the shutdown period for exclusion from forecasts until whatever back data is available has been filled in. Look hard at what other credible third-party data is available from non-government sources, ranging from Google to the United Nations to high-integrity, fully functioning foreign governments (the EU, Canada, etc.). This will be especially important if you’re trying to infer or impute unrecoverable missing data. Double down on your first-party data as well; you should be collecting, cleaning, and analyzing your internal data most of all.
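Here is one way to sketch that annotation step, assuming each observation records the period it covers and a `source` tag you maintain yourself. The `Observation` class, the `"government"` tag, and the helper names are hypothetical placeholders. The start date comes from this post; the end date is January 25, 2019, the day the 2018-2019 shutdown ended. Any government-sourced observation whose period overlaps the shutdown window gets flagged, and flagged rows are left out of the forecast training set rather than deleted.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

SHUTDOWN_START = date(2018, 12, 22)
SHUTDOWN_END = date(2019, 1, 25)  # extend this if you also want to flag the catch-up months


@dataclass
class Observation:
    period_start: date
    period_end: date
    value: float
    source: str  # hypothetical tag, e.g. "government" or "first_party"
    shutdown_affected: bool = False


def annotate(observations: list[Observation],
             start: date = SHUTDOWN_START,
             end: date = SHUTDOWN_END) -> list[Observation]:
    """Flag government-sourced observations whose period overlaps the shutdown."""
    for obs in observations:
        overlaps = obs.period_start <= end and start <= obs.period_end
        obs.shutdown_affected = obs.source == "government" and overlaps
    return observations


def training_set(observations: list[Observation]) -> list[Observation]:
    """Keep only unflagged rows for model fitting; flagged rows stay in the raw data."""
    return [obs for obs in observations if not obs.shutdown_affected]


rows = annotate([
    Observation(date(2018, 11, 1), date(2018, 11, 30), 5.1, "government"),
    Observation(date(2018, 12, 1), date(2018, 12, 31), 5.3, "government"),
    Observation(date(2018, 12, 1), date(2018, 12, 31), 812.0, "first_party"),
])
print([obs.shutdown_affected for obs in rows])
print(len(training_set(rows)))
```

Checking for period overlap, rather than testing a single date, is what catches a monthly figure like December 2018: its first day falls before the shutdown began, but most of the month falls inside it.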
If your company does business in sectors affected by the shutdown, such as agriculture, be sure to account for the shutdown in your models. Even if data is available, it will be skewed during and after the shutdown until the government catches up.
For future readers, meaning people who find this post after the shutdown ends, flag any models or forecasts that cover the period from December 22, 2018 through the end of the shutdown, and consider keeping multiple parallel data series to infer or impute any missing information. Also expect datasets from the US government to be in flux for months after the shutdown as employees catch up on backlogged work.
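One simple way to use a parallel series is to fit a straight line between it and the government series over the months where both exist, then use that line to estimate the missing months. This is a minimal sketch using ordinary least squares, one technique among many rather than a prescribed method. `impute_from_parallel` is a hypothetical helper, month keys are assumed to be first-of-month dates, and the parallel series could be first-party data or a credible non-government source. Every imputed value comes back flagged so it stays annotated downstream.

```python
from __future__ import annotations

from datetime import date


def impute_from_parallel(primary: dict[date, float | None],
                         parallel: dict[date, float | None]) -> dict[date, tuple[float | None, bool]]:
    """Fill gaps in `primary` from a least-squares fit against `parallel`.

    Returns {month: (value, imputed)}; imputed is True only for filled-in values.
    Gaps with no parallel value to fill from stay (None, False).
    """
    # Fit only on months where both series have real values.
    pairs = [(parallel[k], v) for k, v in primary.items()
             if v is not None and parallel.get(k) is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two overlapping months to fit a line")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("parallel series is constant over the overlap")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    intercept = mean_y - slope * mean_x

    filled: dict[date, tuple[float | None, bool]] = {}
    for k, v in primary.items():
        if v is None and parallel.get(k) is not None:
            filled[k] = (intercept + slope * parallel[k], True)
        else:
            filled[k] = (v, False)
    return filled


# Hypothetical data: the government series is missing December 2018.
gov = {date(2018, 10, 1): 10.0, date(2018, 11, 1): 20.0,
       date(2018, 12, 1): None, date(2019, 2, 1): 40.0}
proxy = {date(2018, 10, 1): 1.0, date(2018, 11, 1): 2.0,
         date(2018, 12, 1): 3.0, date(2019, 2, 1): 4.0}
print(impute_from_parallel(gov, proxy)[date(2018, 12, 1)])
```

A linear fit assumes the relationship between the two series held steady through the gap. A shock like the shutdown itself can break that assumption, which is one more reason to keep the imputed flag attached.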
Finally, know that some models will just break. Anyone doing predictive analytics with government data already knows that black swan events can throw a wrench into models. This shutdown, the longest ever, is a giant black swan that we couldn’t have predicted and can’t model for. In building models, we may simply have to stop using government data for some specific tasks until we are certain the government is stable again and we have enough historical data to ignore this shutdown’s data gap.
Want to read more like this from Christopher Penn? Get updates here:
Also published on Medium.