netrunner
netrunner is a tool to help build, validate, & create archives for
Spyglass lenses.
Lenses are a simple set of rules that tell a crawler which URLs it
should crawl. Combined with data from sitemaps and/or the Internet Archive, netrunner
can crawl and create an archive of the pages represented by the lens.
In Spyglass, this is used to create a personalized search engine that only crawls & indexes sites/pages/data that you're interested in.
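To make the idea of "rules that tell a crawler which URLs it should crawl" concrete, here is a minimal sketch. It is not the real Spyglass lens format: the `Lens` struct, its `allow_prefixes` and `skip_prefixes` fields, and the `should_crawl` method are hypothetical names chosen for illustration, and plain prefix matching is an assumed, simplified rule type.

```rust
/// Hypothetical, simplified stand-in for a lens.
/// This is NOT the real Spyglass lens format.
pub struct Lens {
    /// URL prefixes the crawler is allowed to visit (assumed rule type).
    pub allow_prefixes: Vec<String>,
    /// URL prefixes to skip even when an allow prefix matches (assumed rule type).
    pub skip_prefixes: Vec<String>,
}

impl Lens {
    /// Returns true when `url` starts with at least one allow prefix
    /// and does not start with any skip prefix.
    pub fn should_crawl(&self, url: &str) -> bool {
        let allowed = self
            .allow_prefixes
            .iter()
            .any(|prefix| url.starts_with(prefix.as_str()));
        let skipped = self
            .skip_prefixes
            .iter()
            .any(|prefix| url.starts_with(prefix.as_str()));
        allowed && !skipped
    }
}
```

With an allow prefix of `https://example.com/docs/` and a skip prefix of `https://example.com/docs/private/`, this sketch would accept `https://example.com/docs/intro` and reject both `https://example.com/docs/private/notes` and `https://example.com/blog/post`.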
cargo install --path .
```
netrunner 0.1.0

USAGE:
    netrunner --lens-file

OPTIONS:
    -h, --help         Print help information
    -l, --lens-file

SUBCOMMANDS:
    check-urls    Grabs all the URLs represented by
```
check-urls
This command grabs all the URLs represented by this lens. The URLs are gathered in one of two ways:
1. Sitemap(s) for the domain(s) represented. When sitemaps are available, the tool prioritizes them.
2. Data from the Internet Archive, used to determine which URLs are represented by the rules in the lens.
The list is then sorted alphabetically and printed to stdout. This is a great way to check whether the URLs that will be crawled & indexed are what you're expecting for a lens.
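The source selection and sorting described above can be sketched roughly as follows. This is an illustration under stated assumptions, not netrunner's actual implementation: `collect_urls` and its two parameters are hypothetical names, the inputs are assumed to be already fetched, the de-duplication step is an addition not mentioned above, and the sort is a plain byte-wise string sort standing in for "alphabetical".

```rust
/// Hypothetical helper: picks a URL source and returns a sorted list.
/// `sitemap_urls` and `archive_urls` are assumed to be fetched elsewhere.
pub fn collect_urls(sitemap_urls: Vec<String>, archive_urls: Vec<String>) -> Vec<String> {
    // Prefer sitemap data when any is available; otherwise fall back
    // to URLs discovered through the Internet Archive.
    let mut urls = if sitemap_urls.is_empty() {
        archive_urls
    } else {
        sitemap_urls
    };

    // Byte-wise string sort as an approximation of alphabetical order.
    urls.sort();
    // Assumption: drop exact duplicates so each URL is listed once.
    urls.dedup();
    urls
}
```

A caller could then print each entry of the returned vector on its own line to get output suitable for piping into other tools.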
crawl
This will use the rules defined in the lens to crawl & archive the pages within.
This is primarily used as a way to create cached web archives that can be distributed with community-created lenses so that others don't have to crawl the same pages.
validate
This will validate the rules inside a lens file and, if the lens was previously crawled, the cached web archive for this lens.