A crawler for the web version of PTT, the largest online community in Taiwan.
Yet another PTT crawler, but written in Rust. It can be used directly as a binary or as a crate.
The binary name for ptt-crawler is `ptc`.
Currently, no precompiled binary is available.
You need Rust 1.40 or higher and `cargo` to build ptt-crawler from source.
``` shell
cargo install ptt-crawler
```
``` shell
git clone https://github.com/cwouyang/ptt-crawler.git
cd ptt-crawler
cargo build --release
```
``` shell
ptc url https://www.ptt.cc/bbs/Gossiping/M.1597463395.A.478.html
```
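The URL passed to `ptc url` has the shape `https://www.ptt.cc/bbs/<board>/<article id>.html`. As a minimal sketch, independent of the crate's actual API, a caller could check that shape before invoking the crawler. The names `ArticleRef` and `parse_article_url` are hypothetical and not part of ptt-crawler.

```rust
/// Board name and article ID taken from a PTT web URL.
/// Hypothetical helper type, not part of ptt-crawler.
#[derive(Debug, PartialEq)]
pub struct ArticleRef {
    pub board: String,
    pub id: String,
}

/// Splits `https://www.ptt.cc/bbs/<board>/<id>.html` into board and ID.
/// This is purely a shape check: it does not verify that the article exists
/// or that `<id>` is a real article ID. Returns `None` on any mismatch.
pub fn parse_article_url(url: &str) -> Option<ArticleRef> {
    const PREFIX: &str = "https://www.ptt.cc/bbs/";
    const SUFFIX: &str = ".html";

    if !url.starts_with(PREFIX) {
        return None;
    }
    let rest = &url[PREFIX.len()..];

    // Expect exactly two path segments: the board and the article file.
    let mut parts = rest.split('/');
    let board = parts.next()?;
    let file = parts.next()?;
    if parts.next().is_some() || !file.ends_with(SUFFIX) {
        return None;
    }

    let id = &file[..file.len() - SUFFIX.len()];
    if board.is_empty() || id.is_empty() {
        return None;
    }
    Some(ArticleRef {
        board: board.to_string(),
        id: id.to_string(),
    })
}
```

For the URL in the example above, this yields the board `Gossiping` and the ID `M.1597463395.A.478`.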
Use the `-u` flag to set the user agent and the `-p` flag to set the proxy used during crawling:
``` shell
ptc -u "user/agent/string" -p "https://some.proxy" url https://www.ptt.cc/bbs/Gossiping/M.1597463395.A.478.html
ptc -u "random" url https://www.ptt.cc/bbs/Gossiping/M.1597463395.A.478.html
```
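When driving the `ptc` binary from another Rust program, the argument order shown above (optional `-u` and `-p` first, then the `url` subcommand and the address) can be assembled programmatically. The following is a rough sketch under that assumption; `ptc_url_args` is a hypothetical helper, not something ptt-crawler provides.

```rust
/// Builds the argument list for `ptc [-u <agent>] [-p <proxy>] url <URL>`.
/// Hypothetical helper: it only mirrors the flag order used in the
/// examples of this README and does not validate any of the values.
pub fn ptc_url_args(url: &str, user_agent: Option<&str>, proxy: Option<&str>) -> Vec<String> {
    let mut args = Vec::new();

    // Optional flags go before the subcommand, as in the README examples.
    if let Some(agent) = user_agent {
        args.push("-u".to_string());
        args.push(agent.to_string());
    }
    if let Some(proxy) = proxy {
        args.push("-p".to_string());
        args.push(proxy.to_string());
    }

    args.push("url".to_string());
    args.push(url.to_string());
    args
}
```

The resulting vector can be handed to `std::process::Command::new("ptc").args(...)`, assuming `ptc` is installed and on the `PATH`.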
``` shell
ptc board Gossiping -r 100 200
ptc board Gossiping
```
Use the `-l` flag to list supported boards:
``` shell
ptc board Gossiping --list
```
Add `ptt-crawler` as a dependency in your `Cargo.toml` file:
``` toml
[dependencies]
ptt-crawler = "0.1"
```
See the documentation for usage.
``` shell
cargo test --all
```
If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
Before submitting a pull request, make sure
We use SemVer for versioning. For the versions available, see the tags on this repository.
Copyright (c) 2020 cwouyang.
This project is licensed under the terms of MIT License. See the LICENSE file for details.