Abstract
The COVID-19 pandemic presented enormous data challenges in the United States.
Policy makers, epidemiological modelers, and health researchers all require
up-to-date data on the pandemic and relevant public behavior, ideally at fine
spatial and temporal resolution. The COVIDcast API is our attempt to fill this
need: Operational since April 2020, it provides open access to both traditional
public health surveillance signals (cases, deaths, and hospitalizations) and
many auxiliary indicators of COVID-19 activity, such as signals extracted from
deidentified medical claims data, massive online surveys, cell phone mobility
data, and internet search trends. These are available at a fine geographic
resolution (mostly at the county level) and are updated daily. The COVIDcast API
also tracks all revisions to historical data, allowing modelers to account for
the frequent revisions and backfill that are common for many public health data
sources. All of the data are available in a common format through the API and
accompanying R and Python software packages. This paper describes the data
sources and signals, and provides examples demonstrating that the auxiliary
signals in the COVIDcast API present information relevant to tracking COVID-19
activity, augmenting traditional public health reporting and empowering research
and decision-making.