Maman is a Rust Web Crawler saving pages on Redis.
Pages are send to list <MAMAN_ENV>:queue:maman
using
Sidekiq job format:
json
{
"class": "Maman",
"jid": "b4a577edbccf1d805744efa9",
"retry": true,
"created_at": 1461789979, "enqueued_at": 1461789979,
"args": {
"document":"<html><body><a href='#' /><a href='/new' /></html>",
"urls": ["http://example.net/new"],
"extra": [],
"headers": {"content-type": "text/html"},
"url": "http://example.net/"
}
}
~~~ cargo install maman ~~~
~~~ maman URL [LIMIT] ~~~
LIMIT
must be an interger or 0
is the default, meaning no limit.
The MIT License
Copyright (c) 2016 Laurent Arnoud laurent@spkdev.net