
Crusty - polite && scalable broad web crawler

Introduction

Broad web crawling is the activity of traversing the practically boundless web, starting from a set of locations (URLs) and following outgoing links

It presents a unique set of challenges one must overcome to build a stable and scalable system. Crusty is an attempt to tackle some of those challenges and see what's out there, while having fun with Rust ;)

This particular implementation can be used to quickly fetch a subset of the observable internet and discover the most popular domains/links

Built on top of crusty-core, which handles all low-level aspects of web crawling

Key features

Example

Getting started

Install docker && docker-compose, following the instructions at

https://docs.docker.com/get-docker/

https://docs.docker.com/compose/install/

    git clone https://github.com/let4be/crusty
    cd crusty


If you decide to build manually via cargo build, remember: a release build is a lot faster (the default is a debug build).
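For example, to produce an optimized binary:

    cargo build --release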

In a real-world usage scenario on a high-bandwidth channel, Docker networking might become too expensive, so it may be a good idea either to run Crusty directly on the host or at least to run it with network_mode: host.
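A minimal sketch of how that could look in a compose override file, assuming the service is named crusty (check docker-compose.yml for the actual name):

    # docker-compose.override.yml - share the host's network stack
    # the service name "crusty" is an assumption, not taken from the repo config
    services:
      crusty:
        network_mode: host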

Just use docker-compose - it's the recommended way to play with Crusty.
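A typical invocation (standard docker-compose usage, nothing Crusty-specific) builds the images and starts everything in the background:

    docker-compose up -d --build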

However... to create / clean the database, use this script (it must be executed in the context of the clickhouse docker container)
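A sketch of what that invocation could look like; the service name and script path below are hypothetical placeholders - substitute the actual ones from this repository:

    # service name and script path are hypothetical
    docker-compose exec clickhouse bash /path/to/db-script.sh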

The Grafana dashboard is exported as a JSON model.
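If you want to load that JSON model into an external Grafana instance, one option is Grafana's file-based dashboard provisioning; a sketch, assuming the model is copied into a dashboards directory (both paths are assumptions):

    # /etc/grafana/provisioning/dashboards/crusty.yml - paths are assumptions
    apiVersion: 1
    providers:
      - name: crusty
        type: file
        options:
          path: /var/lib/grafana/dashboards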

Contributing

I'm open to discussions and contributions - use GitHub issues; pull requests are welcome ;)