Maman

Maman is a Rust Web Crawler saving pages on Redis.

Pages are send to list <MAMAN_ENV>:queue:maman using Sidekiq job format:

json { "class": "Maman", "jid": "b4a577edbccf1d805744efa9", "retry": true, "created_at": 1461789979, "enqueued_at": 1461789979, "args": { "document":"<html><body><a href='#' /><a href='/new' /></html>", "urls": ["http://example.net/new"], "extra": [], "headers": {"content-type": "text/html"}, "url": "http://example.net/" } }

Dependencies

Installation

~~~ cargo install maman ~~~

Usage

~~~ maman URL [LIMIT] ~~~

LIMIT must be an interger or 0 is the default, meaning no limit.

Default environment variables

LICENSE

The MIT License

Copyright (c) 2016 Laurent Arnoud laurent@spkdev.net


Build Version License Project status