Customizable article scraping & curation library and CLI. Also runs in Wasm.
Basic Wasm example with some CORS limitations: https://mattsse.github.io/extrablatt/
Inspired by newspaper, the Python article-extraction library.
HTML scraping is done via select.rs.
Customizable for specific news sites/layouts via the `Extractor` trait.
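The real `Extractor` trait methods are documented on docs.rs, and the sketch below does not use them. It only illustrates the general pattern such customization relies on: a trait whose methods have default implementations, which a site-specific type overrides where its layout differs. `SiteExtractor`, `DefaultExtractor`, `ExampleNewsExtractor` and `text_between` are hypothetical names, and plain string searching stands in for a parsed HTML document.

```rust
// Hypothetical sketch: NOT extrablatt's `Extractor` trait.
// Plain string searching stands in for a parsed HTML document.

/// Returns the trimmed text between the first `start` marker and the next `end` marker.
fn text_between<'a>(html: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = html.find(start)? + start.len();
    let len = html[from..].find(end)?;
    Some(html[from..from + len].trim())
}

/// A trait with a default extraction method that site-specific types may override.
trait SiteExtractor {
    /// Default behaviour: use the document's `<title>` element.
    fn title<'a>(&self, html: &'a str) -> Option<&'a str> {
        text_between(html, "<title>", "</title>")
    }
}

/// Keeps every default method unchanged.
struct DefaultExtractor;
impl SiteExtractor for DefaultExtractor {}

/// A hypothetical outlet whose headline lives in `<h1 class="headline">`.
struct ExampleNewsExtractor;
impl SiteExtractor for ExampleNewsExtractor {
    fn title<'a>(&self, html: &'a str) -> Option<&'a str> {
        text_between(html, "<h1 class=\"headline\">", "</h1>")
            // Fall back to the generic `<title>` lookup if the layout changes.
            .or_else(|| text_between(html, "<title>", "</title>"))
    }
}

fn main() {
    let page = "<html><head><title>Example News | Home</title></head>\
                <body><h1 class=\"headline\">Rust 2024 released</h1></body></html>";

    println!("{:?}", DefaultExtractor.title(page));
    println!("{:?}", ExampleNewsExtractor.title(page));
}
```

Overriding only the methods that differ keeps a per-site extractor small, while every other field still goes through the shared default logic.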
Full documentation: https://docs.rs/extrablatt
Extract all articles from a news outlet:
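```rust
use extrablatt::Extrablatt;
use futures::StreamExt;

// `async fn main` needs an async runtime; this assumes Tokio.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the site scraper from the outlet's base URL.
    let site = Extrablatt::builder("https://some-news.com/")?.build().await?;

    // Consume the per-article results as a stream.
    let mut stream = site.into_stream();
    while let Some(article) = stream.next().await {
        if let Ok(article) = article {
            println!("article '{:?}'", article.content.title)
        } else {
            // Report the failure and continue with the remaining articles.
            println!("{:?}", article);
        }
    }
    Ok(())
}
```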
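The loop in the example reports a failed article and moves on instead of aborting the whole run. If you would rather gather everything first and handle errors afterwards, the same idea can be sketched with plain `std` types. `Article` and `partition_results` are hypothetical stand-ins, not extrablatt types, and a `Vec` of `Result`s replaces the async stream.

```rust
// Hypothetical stand-in for an extracted article; not extrablatt's type.
#[derive(Debug)]
struct Article {
    title: Option<String>,
}

/// Splits per-article results into successes and errors, so one broken
/// page does not stop the rest from being processed.
fn partition_results<E>(results: Vec<Result<Article, E>>) -> (Vec<Article>, Vec<E>) {
    let mut articles = Vec::new();
    let mut errors = Vec::new();
    for result in results {
        match result {
            Ok(article) => articles.push(article),
            Err(err) => errors.push(err),
        }
    }
    (articles, errors)
}

fn main() {
    let results: Vec<Result<Article, String>> = vec![
        Ok(Article { title: Some("First".to_string()) }),
        Err("timeout fetching page".to_string()),
        Ok(Article { title: None }),
    ];

    let (articles, errors) = partition_results(results);
    println!("{} articles, {} errors", articles.len(), errors.len());
    for article in &articles {
        println!("article '{:?}'", article.title);
    }
}
```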
Install the command-line tool with the `cli` feature:

```bash
cargo install extrablatt --features="cli"
```
```text
USAGE:
    extrablatt

SUBCOMMANDS:
    article     Extract a set of articles
    category    Extract all articles found on the page
    help        Prints this message or the help of the given subcommand(s)
    site        Extract all articles from a news source.
```
Extract a set of articles and write the results to `articles.json`:

```bash
extrablatt article "https://www.example.com/article1.html" "https://www.example.com/article2.html" -o "articles.json"
```
Licensed under either of these: