This repository contains tools to generate a COVID-19 database for research and analysis, and links to a pre-generated database. The database is a self-contained Sqlite database which can be used on any platform.
The program in this repository can be run on your machine to download data from the Internet and assemble your own database. The process takes approximately two minutes, and you can run it as often as you like to obtain the latest data. Alternatively, you can download a database that is generated daily.
You can download a compressed database for yourself here: covid19db.zip.
This file is automatically regenerated daily.
Besides the Sqlite command-line tools, here are some other tips on using the data:
Please note that some of the included data requests or requires attribution. Please give credit to the original sources of data (e.g., The New York Times) and to aggregators in your work.
You can find a complete database schema in `dbschema.rs`. A Rust API for sqlx is also provided for select tables. Direct source data download URLs are in `loader.rs`.
Here are the sources:
- `cdataset` is from the COVID-19 derived datasets project, which includes data from Johns Hopkins University, the New York Times, and ECDC. This integrates the "combined" set, so you will almost certainly want to include `WHERE dataset='foo'` in every query so that you use only a single dataset. The query `select distinct dataset from cdataset order by dataset;` will show you the available datasets. Please see the COVID-19 derived datasets project for a description of the sources and the augmentation done there. Additional augmentation is done on reading in to this system:
  - The `factbook_population` column is populated using the Johns Hopkins data (see below).
- `loc_lookup` is from the Johns Hopkins dataset, the bulk of which is already included in `cdataset` above. This table represents the `UID_ISO_FIPS_LookUp_Table.csv` file, which contains county-level population data that is integrated into `cdataset` or can be queried separately.
- `rtlive` is from rt.live. Julian dates and YYYY-MM-DD dates are added to the CSV source; no other changes were made.

Additional data sources are potential future integrations.
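The `rtlive` table carries Julian dates alongside YYYY-MM-DD dates. This README does not state which Julian-date convention the loader uses, so the sketch below assumes the standard integer Julian Day Number (JDN) for Gregorian dates; check `loader.rs` before relying on it. `parse_ymd` and `julian_day_number` are hypothetical helpers for illustration, not part of this project's API.

```rust
/// Parse a "YYYY-MM-DD" string into (year, month, day).
/// Returns None if the string is malformed or the month/day is out of range.
fn parse_ymd(s: &str) -> Option<(i64, i64, i64)> {
    let mut parts = s.split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    // Reject extra components and obviously invalid month/day values.
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Integer Julian Day Number for a Gregorian calendar date, using the
/// Fliegel and Van Flandern integer formula.
/// ASSUMPTION: the loader may use a different Julian-date convention.
fn julian_day_number(year: i64, month: i64, day: i64) -> i64 {
    // Treat the year as starting in March: January and February
    // become months 10 and 11 of the previous year (m counts from 0).
    let a = (14 - month) / 12;
    let y = year + 4800 - a;
    let m = month + 12 * a - 3;
    day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045
}

fn main() {
    for date in ["2000-01-01", "2020-03-01"] {
        if let Some((y, m, d)) = parse_ymd(date) {
            println!("{} -> JDN {}", date, julian_day_number(y, m, d));
        }
    }
}
```

This prints JDN 2451545 for 2000-01-01 and 2458910 for 2020-03-01. Note that Sqlite's `julianday('2020-03-01')` returns 2458909.5, because Julian days begin at noon and that function reports midnight as a half day; an integer column could follow either convention.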
To build the database yourself, a command like this should do it:
```sh
git clone https://github.com/jgoerzen/covid19db
cd covid19db
cargo run --release
```
You will then get a file named `covid19.db` in the working directory. Just use this with Sqlite.
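Because `cdataset` combines several sources, queries you run against `covid19.db` should normally filter on `dataset`. The sketch below only builds SQL text for pasting into the Sqlite shell (for example after `sqlite3 covid19.db`); `sql_quote` and `cdataset_query` are hypothetical helpers, and `foo` is the README's placeholder, so substitute a real name returned by the listing query.

```rust
/// Quote a value as an SQL string literal by doubling any single quotes,
/// e.g. O'Brien becomes 'O''Brien'.
fn sql_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Build a query against cdataset restricted to a single dataset.
fn cdataset_query(dataset: &str) -> String {
    format!("SELECT * FROM cdataset WHERE dataset = {};", sql_quote(dataset))
}

/// Lists the datasets available in cdataset (query taken from this README).
const LIST_DATASETS: &str = "select distinct dataset from cdataset order by dataset;";

fn main() {
    // First discover the dataset names, then filter every query on one of them.
    println!("{}", LIST_DATASETS);
    println!("{}", cdataset_query("foo"));
}
```

From Rust code that talks to the database directly, binding the dataset name as a query parameter through your database driver is preferable to building the string by hand; the quoting helper here is only for generating text to paste.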
With these commands, you can verify these results for yourself. If you don't already have Rust installed, see the Rust installation page.
The Rust API is pretty skeletal at the moment, but you can browse the docs.
This data is used by the Kansas COVID-19 Charts project and perhaps others.
This code is Copyright (c) 2019-2020 John Goerzen
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.