ptt-crawler (ptc)

A crawler for the web version of PTT, the largest online community in Taiwan.

Yet another PTT crawler, but written in Rust. It can be used directly as a binary or as a crate.

Features

Getting started

Installation

The binary name for ptt-crawler is ptc. No precompiled binary is currently available, so you need Rust 1.40 or higher and cargo to build ptt-crawler from source.
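To check the Rust version requirement before building, something like the following sketch may help. It is not part of ptt-crawler; the helper name version_ge is made up for this example, and it assumes a sort that supports -V (version sort), as GNU coreutils and recent BSD/macOS versions do.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# `rustc --version` prints something like "rustc 1.40.0 (73528e339 2019-12-16)";
# the second field is the version number.
if command -v rustc >/dev/null 2>&1; then
    installed="$(rustc --version | awk '{print $2}')"
    if version_ge "$installed" "1.40.0"; then
        echo "rustc $installed is new enough"
    else
        echo "rustc $installed is too old; 1.40 or higher is required"
    fi
else
    echo "rustc not found; install Rust first"
fi
```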

From crates.io

```shell
cargo install ptt-crawler
```

From the sources

```shell
git clone https://github.com/cwouyang/ptt-crawler.git
cd ptt-crawler
cargo build --release
```

How to use

```shell
ptc url https://www.ptt.cc/bbs/Gossiping/M.1597463395.A.478.html
```

Use the -u and -p flags to set the user agent and the proxy used during crawling

```shell
ptc -u "user/agent/string" -p "https://some.proxy" url https://www.ptt.cc/bbs/Gossiping/M.1597463395.A.478.html

# Pass "random" to use a randomly generated user agent
ptc -u "random" url https://www.ptt.cc/bbs/Gossiping/M.1597463395.A.478.html
```
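To crawl several articles in one go, the url subcommand can be wrapped in a small loop. The sketch below is only an illustration: the function name crawl_urls, the PTC override variable, and the urls.txt file are assumptions made for this example, not features of ptc, and what ptc prints for each article is not covered here.

```shell
# Hypothetical helper: reads article URLs from standard input, one per line,
# and runs `ptc url` on each. Blank lines and lines starting with '#' are skipped.
# PTC may be set to point at a different binary (an assumption of this sketch).
crawl_urls() {
    while IFS= read -r url; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so the crawler cannot consume the remaining URL list.
        "${PTC:-ptc}" url "$url" < /dev/null || echo "failed: $url" >&2
    done
}
```

With a file such as urls.txt holding one article URL per line, it could be called as `crawl_urls < urls.txt`.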

```shell
# From page 100 (https://www.ptt.cc/bbs/Gossiping/index100.html) to 200 (https://www.ptt.cc/bbs/Gossiping/index200.html)
ptc board Gossiping -r 100 200

# From page 1 to the latest page
ptc board Gossiping
```
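The page numbers passed to -r correspond to the index pages of the board on the web version of PTT, as the two URLs in the example show. A minimal sketch of that mapping, using a hypothetical helper named board_page_url that is not part of ptc:

```shell
# Hypothetical helper: prints the web URL of index page $2 of board $1,
# following the URL pattern shown in the example above.
board_page_url() {
    printf 'https://www.ptt.cc/bbs/%s/index%s.html\n' "$1" "$2"
}

board_page_url Gossiping 100   # first page of the range
board_page_url Gossiping 200   # last page of the range
```

Opening these URLs in a browser is a quick way to confirm which posts a given range covers before starting a long crawl.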

Use -l flag to list supported boards

```shell
ptc board Gossiping --list
```

Use as a crate

Add ptt-crawler as a dependency in your Cargo.toml file

```toml
[dependencies]
ptt-crawler = "0.1"
```

See the documentation for usage.

Run unit tests

```shell
cargo test --all
```

Contributing

If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Before submitting a pull request, make sure

Links

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

Copyright (c) 2020 cwouyang.

This project is licensed under the terms of the MIT License. See the LICENSE file for details.