Fetch data files from a URL, but only if needed. Verify contents via SHA256.
Fetch-Data checks a local data directory and then downloads needed files. It always verifies the local files and downloaded files via a hash.
Fetch-Data makes it easy to download large and small sample files. For example, here we download a genomics file from GitHub (if it has not already been downloaded). We then print the size of the now-local file.
```rust
use fetch_data::sample_file;

let path = sample_file("small.fam")?;
println!("{}", std::fs::metadata(path)?.len()); // Prints 85
```
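The check-then-download behavior described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not Fetch-Data's implementation: `fetch_if_needed` and `toy_hash` are hypothetical names, the download is a caller-supplied closure rather than a `ureq` request, and `toy_hash` is an FNV-1a stand-in used only to avoid a dependency. A real version would compute SHA256 instead.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in checksum (64-bit FNV-1a) used ONLY to keep this sketch dependency-free.
/// It is NOT SHA256; a real implementation would call a SHA256 library here.
fn toy_hash(bytes: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

/// Hypothetical helper. Returns Ok(true) if a download happened and
/// Ok(false) if the existing local copy was reused.
fn fetch_if_needed<D>(local: &Path, expected_hash: &str, download: D) -> io::Result<bool>
where
    D: FnOnce() -> io::Result<Vec<u8>>,
{
    // Download only when the file is not already present.
    let downloaded = if local.exists() {
        false
    } else {
        let bytes = download()?;
        fs::write(local, bytes)?;
        true
    };

    // Verify the file whether it was just downloaded or already present.
    let actual = toy_hash(&fs::read(local)?);
    if actual != expected_hash {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("hash mismatch for {}: expected {expected_hash}, found {actual}", local.display()),
        ));
    }
    Ok(downloaded)
}
```

Calling `fetch_if_needed` twice with the same path and hash should trigger the download closure only on the first call; the second call only re-verifies the local file.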
Fetch-Data uses `ureq` to download files via blocking I/O.

You can set up `FetchData` many ways. Here are the steps -- followed by sample code -- for one setup.
1. Create a `registry.txt` file containing a whitespace-delimited list of files and their hashes. (This is the same format as Pooch. See section Registry Creation for tips on creating this file.)
2. As shown below, create a global static `FetchData` instance that reads your `registry.txt` file. Give it:
   * the root URL from which to download the files
   * the name of an environment variable (`env_key`) that, when set, gives the local data directory
   * a qualifier, organization, and application -- Used to create a local data directory when the environment variable is not set. See `ProjectDirs` in the `directories` crate for details.
3. As shown below, define a public `sample_file` function that takes a file name and returns a `Result` containing the path to the downloaded file.
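```rust
use fetch_data::{ctor, FetchData, FetchDataError};
use std::path::{Path, PathBuf};

// `ctor` initializes the static before `main` runs.
#[ctor]
static STATIC_FETCH_DATA: FetchData = FetchData::new(
    include_str!("../registry.txt"),
    "https://raw.githubusercontent.com/CarlKCarlK/fetch-data/main/tests/data/",
    "BAR_APP_DATA_DIR", // env_key
    "com",              // qualifier
    "Foo Corp",         // organization
    "Bar App",          // application
);

/// Download a data file.
pub fn sample_file<P: AsRef<Path>>(path: P) -> Result<PathBuf, Box<FetchDataError>> {
    // Signature and body reconstructed from the imports and the text above;
    // the exact error type may differ between crate versions.
    STATIC_FETCH_DATA.fetch_file(path)
}
```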
You can now use your sample_file function to download your files as needed.
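The fallback rule for the local data directory described in the steps above can be sketched as follows. This is an illustration only: `resolve_data_dir` is a hypothetical helper, and the fallback path is passed in directly, whereas the crate is described as deriving it from the qualifier, organization, and application.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: choose the local data directory.
/// `fallback` stands in for the directory that would otherwise be derived
/// from the qualifier, organization, and application.
fn resolve_data_dir(env_key: &str, fallback: PathBuf) -> PathBuf {
    match env::var_os(env_key) {
        // A non-empty environment variable wins.
        Some(value) if !value.is_empty() => PathBuf::from(value),
        // Otherwise use the default location.
        _ => fallback,
    }
}
```

With this rule, setting the environment variable (for example `BAR_APP_DATA_DIR`) redirects the data files without any code change.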
You can create your registry.txt file many ways. Here are the steps -- followed by sample code -- for one way to create it.
* Upload your data files to the Internet. For example, Fetch-Data puts its sample data files in `tests/data`, so they upload to this GitHub folder. In GitHub, by looking at the raw view of a data file, we see the root URL for these files. In `Cargo.toml`, we keep these data files out of our crate via `exclude = ["tests/data/*"]`.
* As shown below, write code that:
  * Creates a `FetchData` instance without registry contents.
  * Lists the files in your data directory.
  * Calls the `gen_registry_contents` method on your list of files. This method will download the files, compute their hashes, and create a string of file names and hashes.
* Print this string and save it as `registry.txt`.

```rust
use fetch_data::{FetchData, dir_to_file_list};

let fetch_data = FetchData::new(
    "", // registry_contents ignored
    "https://raw.githubusercontent.com/CarlKCarlK/fetch-data/main/tests/data/",
    "BAR_APP_DATA_DIR", // env_key
    "com",              // qualifier
    "Foo Corp",         // organization
    "Bar App",          // application
);
let file_list = dir_to_file_list("tests/data")?;
let registry_contents = fetch_data.gen_registry_contents(file_list)?;
println!("{registry_contents}");
```
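To make the registry format concrete, here is a minimal sketch of reading such a string back into a map. It assumes each entry is a file name followed by its hash, separated by whitespace, and that each hash is a 64-character hexadecimal SHA256 digest. `parse_registry` is a hypothetical helper, not part of the crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical helper: parse registry text into a map from file name to hash.
/// Assumes whitespace-delimited pairs of `file_name hash`.
fn parse_registry(contents: &str) -> Result<HashMap<String, String>, String> {
    let mut tokens = contents.split_whitespace();
    let mut map = HashMap::new();
    while let Some(name) = tokens.next() {
        // Every file name must be followed by a hash.
        let hash = tokens
            .next()
            .ok_or_else(|| format!("no hash found for file '{name}'"))?;
        // A SHA256 digest written in hexadecimal has exactly 64 hex digits.
        if hash.len() != 64 || !hash.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("'{hash}' is not a 64-digit hex hash (file '{name}')"));
        }
        map.insert(name.to_string(), hash.to_string());
    }
    Ok(map)
}
```

Because the parser splits on any whitespace, entries separated by spaces, tabs, or newlines are all accepted.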
* Don't use the sample `sample_file` shown here. Define your own `sample_file` that knows where to find your data files.
* The `FetchData` instance need not be global and static. See `FetchData::new` for an example of a non-global instance.
* Other methods on the `FetchData` instance can fetch multiple files and can give the path to the local data directory.
* You need not use a `registry.txt` file and `FetchData` instance. You can instead use the stand-alone function `fetch` to retrieve a single file with known URL, hash, and local path.
* Fetch-Data always does binary downloads to maintain consistent line endings across OSs.
* To make `FetchData` work well as a static global, `FetchData::new` never fails. Instead, `FetchData` stores any error and returns it when the first call to `fetch_file`, etc., is made.
* Debugging this crate under Windows can cause an "Oops! The debug adapter has terminated abnormally" exception. This is some kind of LLVM, Windows, NVIDIA(?) problem via `ureq`.
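The "constructor never fails" design mentioned in the notes above can be illustrated with a small, self-contained sketch. `LazyFetcher` is a hypothetical stand-in, not the crate's `FetchData` type, and the validation it performs (rejecting an empty directory name) is invented purely for the example.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a fetcher whose constructor never fails.
struct LazyFetcher {
    // Either a usable data directory or the error found during construction.
    state: Result<PathBuf, String>,
}

impl LazyFetcher {
    /// Never fails: a bad configuration is stored instead of returned.
    fn new(data_dir: &str) -> Self {
        let state = if data_dir.trim().is_empty() {
            Err("data directory must not be empty".to_string())
        } else {
            Ok(PathBuf::from(data_dir))
        };
        LazyFetcher { state }
    }

    /// Reports any stored construction error; otherwise returns the
    /// local path the file would have.
    fn fetch_file(&self, name: &str) -> Result<PathBuf, String> {
        match &self.state {
            Ok(dir) => Ok(dir.join(name)),
            Err(message) => Err(message.clone()),
        }
    }
}
```

Deferring the error this way keeps the constructor usable in a static initializer, where there is no caller to receive a `Result`; the error surfaces at the first call that can actually report it.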