extrablatt


Customizable article scraping & curation library and CLI. Also runs in Wasm.

A basic Wasm demo (subject to some CORS limitations) is available at https://mattsse.github.io/extrablatt/

Inspired by newspaper, the Python article-scraping library.

HTML scraping is done via select.rs.

Features

Customizable for specific news sites/layouts via the Extractor trait.
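The sketch below illustrates the general pattern of per-site customization through a trait with default methods: a generic implementation uses every default, and a site-specific type overrides only what its layout needs. It uses only the standard library and naive string matching. The trait `SiteExtractor`, the helper `between`, and the types `GenericSite` and `SomeNewsSite` are hypothetical names for illustration. This is not extrablatt's `Extractor` trait, whose actual methods are documented at https://docs.rs/extrablatt.

```rust
/// Hypothetical trait showing customization via default methods.
/// NOT extrablatt's `Extractor` trait; see docs.rs for the real API.
trait SiteExtractor {
    /// Default strategy: use the text inside the first <title>...</title>.
    fn title(&self, html: &str) -> Option<String> {
        between(html, "<title>", "</title>")
    }
}

/// Hypothetical helper: returns the trimmed text between the first `start`
/// marker and the next `end` marker, or `None` if either is missing.
fn between(haystack: &str, start: &str, end: &str) -> Option<String> {
    let from = haystack.find(start)? + start.len();
    let len = haystack[from..].find(end)?;
    Some(haystack[from..from + len].trim().to_string())
}

/// A site that relies entirely on the default behavior.
struct GenericSite;
impl SiteExtractor for GenericSite {}

/// A hypothetical outlet whose <title> carries site branding, so the
/// headline is read from a marked <h1> instead, falling back to the default.
struct SomeNewsSite;
impl SiteExtractor for SomeNewsSite {
    fn title(&self, html: &str) -> Option<String> {
        between(html, r#"<h1 class="headline">"#, "</h1>")
            .or_else(|| GenericSite.title(html))
    }
}

fn main() {
    let html = r#"<html><head><title>Some News | Home</title></head>
<body><h1 class="headline">Example headline</h1></body></html>"#;

    println!("{:?}", GenericSite.title(html)); // prints Some("Some News | Home")
    println!("{:?}", SomeNewsSite.title(html)); // prints Some("Example headline")
}
```

Keeping the generic behavior in default methods means a new outlet only needs to override the pieces where its layout differs.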

Documentation

Full documentation: https://docs.rs/extrablatt

Example

Extract all articles from a news outlet:

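```rust
use extrablatt::Extrablatt;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build an `Extrablatt` instance for the news site's start page.
    let site = Extrablatt::builder("https://some-news.com/")?.build().await?;

    // Convert it into a stream that yields one result per article.
    let mut stream = site.into_stream();

    while let Some(article) = stream.next().await {
        if let Ok(article) = article {
            println!("article '{:?}'", article.content.title)
        } else {
            // Print the failed result and keep processing the remaining articles.
            println!("{:?}", article);
        }
    }

    Ok(())
}
```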

Command Line

Install

```bash
cargo install extrablatt --features="cli"
```

Usage

```text
USAGE:
    extrablatt

SUBCOMMANDS:
    article     Extract a set of articles
    category    Extract all articles found on the page
    help        Prints this message or the help of the given subcommand(s)
    site        Extract all articles from a news source.
```

Extract a set of specific articles and store the result as JSON:

```bash
extrablatt article "https://www.example.com/article1.html" "https://www.example.com/article2.html" -o "articles.json"
```

License

Licensed under either of these: