Fetch data files from a URL, but only if needed. Verify contents via SHA256.
`Fetch-Data` checks a local data directory and then downloads any files that are missing. It always verifies both the local files and the downloaded files via a hash.

`Fetch-Data` makes it easy to download large and small sample files. For example, here we download a genomics file from GitHub (if it has not already been downloaded). We then print the size of the now-local file.
```rust
use fetch_data::sample_file;

let path = sample_file("small.fam")?;
println!("{}", std::fs::metadata(path)?.len()); // Prints 85
```
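The check, download, and verify flow described above can be sketched with the standard library alone. This is only an illustration, not the crate's implementation: `fetch_if_needed` is a hypothetical name, and the hash function and downloader are passed in as closures so that no SHA256 or HTTP crate is assumed.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the path to a verified local copy of `file_name`,
/// downloading it only if it is not already present.
///
/// `hash` and `download` are injected (an assumption of this sketch);
/// in practice `hash` would be SHA256 and `download` an HTTP GET.
pub fn fetch_if_needed(
    data_dir: &Path,
    file_name: &str,
    expected_hash: &str,
    hash: impl Fn(&[u8]) -> String,
    download: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<PathBuf> {
    let path = data_dir.join(file_name);

    // 1. Check the local data directory first.
    if !path.exists() {
        // 2. Download only when the file is missing.
        fs::create_dir_all(data_dir)?;
        fs::write(&path, download()?)?;
    }

    // 3. Verify the file, whether it was already local or just downloaded.
    let actual = hash(&fs::read(&path)?);
    if actual != expected_hash {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("hash mismatch for {file_name}: expected {expected_hash}, got {actual}"),
        ));
    }
    Ok(path)
}
```

Because the downloader is only invoked when the file is absent, a second call for the same file touches the network zero times but still pays for one hash check.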
* Uses `ureq` to download files via blocking I/O.

You can set up `FetchData` many ways. Here are the steps, followed by sample code, for one setup.
* Create a `registry.txt` file containing a whitespace-delimited list of files and their hashes. (This is the same format as Pooch. See section Registry Creation for tips on creating this file.)

* As shown below, create a global static `FetchData` instance that reads your `registry.txt` file. Give it:
  * the root URL where the data files can be found
  * the name of an environment variable that users can set to choose the local data directory
  * a `qualifier`, `organization`, and `application` (used to create a local data directory when the environment variable is not set; see `ProjectDirs` in the `directories` crate for details)

* As shown below, define a public `sample_file` function that takes a file name and returns a `Result` containing the path to the downloaded file.
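```rust
use fetch_data::{ctor, FetchData, FetchDataError};
use std::path::{Path, PathBuf};

#[ctor]
static STATIC_FETCH_DATA: FetchData = FetchData::new(
    include_str!("../registry.txt"),
    "https://raw.githubusercontent.com/CarlKCarlK/fetch-data/main/tests/data/",
    "BAR_APP_DATA_DIR", // env_key
    "com",              // qualifier
    "Foo Corp",         // organization
    "Bar App",          // application
);

/// Download a data file.
pub fn sample_file<P: AsRef<Path>>(path: P) -> Result<PathBuf, Box<FetchDataError>> {
    // Signature assumed from the description above; the exact error type
    // may differ between crate versions.
    STATIC_FETCH_DATA.fetch_file(path)
}
```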
You can now use your `sample_file` function to download your files as needed.
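The rule for choosing the local data directory (the environment variable wins, otherwise a per-application default is used) can be sketched as follows. `resolve_data_dir` is a hypothetical helper, not part of the crate, and treating an empty variable as unset is an assumption of this sketch. The default directory is passed in rather than computed, since deriving it from `qualifier`, `organization`, and `application` is the job of the `directories` crate.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Pick the local data directory: a set, non-empty environment value wins;
/// otherwise fall back to the per-application default.
pub fn resolve_data_dir(env_value: Option<OsString>, default_dir: &Path) -> PathBuf {
    match env_value {
        // Assumption: an empty value is treated the same as an unset variable.
        Some(value) if !value.is_empty() => PathBuf::from(value),
        _ => default_dir.to_path_buf(),
    }
}
```

A caller would typically pass `std::env::var_os("BAR_APP_DATA_DIR")` as the first argument. Taking the value as a parameter keeps the function free of global state, which makes it easy to test.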
You can create your `registry.txt` file many ways. Here are the steps, followed by sample code, for one way to create it.
* Upload your data files to the Internet. For example, `Fetch-Data` puts its sample data files in `tests/data`, so they upload to GitHub with the rest of the repository. In GitHub, the raw view of a data file shows the root URL for these files. In `Cargo.toml`, we keep these data files out of the published crate via `exclude = ["tests/data/*"]`.
* As shown below, write code that:
  * creates a `FetchData` instance without registry contents
  * lists the files in your data directory
  * calls the `gen_registry_contents` method on your list of files. This method will download the files, compute their hashes, and create a string of file names and hashes.
* Print this string and save it in a file called `registry.txt`.

```rust
use fetch_data::{FetchData, dir_to_file_list};

let fetch_data = FetchData::new(
    "", // registry_contents ignored
    "https://raw.githubusercontent.com/CarlKCarlK/fetch-data/main/tests/data/",
    "BAR_APP_DATA_DIR", // env_key
    "com",              // qualifier
    "Foo Corp",         // organization
    "Bar App",          // application
);
let file_list = dir_to_file_list("tests/data")?;
let registry_contents = fetch_data.gen_registry_contents(file_list)?;
println!("{registry_contents}");
```
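To make the registry format concrete, here is a minimal sketch of reading a whitespace-delimited list of file names and hashes into a map. `parse_registry` is a hypothetical function, not the crate's parser. The sketch assumes each file name is immediately followed by its hash and that any whitespace (spaces or newlines) separates tokens.

```rust
use std::collections::HashMap;

/// Parse whitespace-delimited `file_name hash` pairs into a map.
/// Assumption: each file name is immediately followed by its hash.
pub fn parse_registry(contents: &str) -> Result<HashMap<String, String>, String> {
    let mut tokens = contents.split_whitespace();
    let mut registry = HashMap::new();

    while let Some(file_name) = tokens.next() {
        // A file name with no hash after it means the registry is malformed.
        let hash = tokens
            .next()
            .ok_or_else(|| format!("no hash found for file '{file_name}'"))?;
        registry.insert(file_name.to_string(), hash.to_string());
    }
    Ok(registry)
}
```

Looking up a file name in the resulting map gives the hash that a local or downloaded copy should be checked against.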
* Don't use the crate's own sample `sample_file`. Define your own `sample_file` that knows where to find your data files.
* The `FetchData` instance need not be global and static. See `FetchData::new` for an example of a non-global instance.
* Other methods on the `FetchData` instance can fetch multiple files and can give the path to the local data directory.
* You need not use a `registry.txt` file and a `FetchData` instance. You can instead use the stand-alone function `fetch` to retrieve a single file with a known URL, hash, and local path.
* `Fetch-Data` always does binary downloads to maintain consistent line endings across OSs.
* To make `FetchData` work well as a static global, `FetchData::new` never fails. Instead, `FetchData` stores any error and returns it when the first call to `fetch_file`, etc., is made.
* Debugging this crate under Windows can cause an "Oops! The debug adapter has terminated abnormally" exception. This is some kind of LLVM, Windows, NVIDIA(?) problem via `ureq`.
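The "never fails, report the error on first use" pattern mentioned in the notes can be sketched like this. `LazyFetcher` and its validation rule are hypothetical and only illustrate the idea; the crate's actual types and error handling may differ.

```rust
use std::path::PathBuf;

/// Holds either a usable data directory or the error hit during setup.
/// (Hypothetical type, for illustration only.)
pub struct LazyFetcher {
    state: Result<PathBuf, String>,
}

impl LazyFetcher {
    /// Never fails: a setup problem is stored instead of returned,
    /// so the constructor can be used to initialize a static.
    pub fn new(data_dir: &str) -> Self {
        let state = if data_dir.is_empty() {
            Err("data directory must not be empty".to_string())
        } else {
            Ok(PathBuf::from(data_dir))
        };
        LazyFetcher { state }
    }

    /// Any stored setup error surfaces here, on first use.
    pub fn fetch_file(&self, file_name: &str) -> Result<PathBuf, String> {
        match &self.state {
            Ok(dir) => Ok(dir.join(file_name)),
            Err(message) => Err(message.clone()),
        }
    }
}
```

The trade-off is that a misconfiguration is reported at the first fetch rather than at startup, in exchange for a constructor that is safe to call where no error can be propagated.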