A command line tool in Rust to scrape data from websites via CSS selectors.
```bash
cargo install scwape
```
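```bash
# Every element with an href attribute on the Wikipedia front page
scwape "https://www.wikipedia.org/" -s '[href]'
# Elements whose href contains "/en-US/docs/Web/CSS/"
scwape "https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors" -s '[href*="/en-US/docs/Web/CSS/"]'
```

```bash
# Download a page, then scrape the local copy, printing each h2's id and text
curl "https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors" > selector.html
scwape selector.html -s "h2" -f "\id: \text\n"
```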
The default format is `\text\n`, and extra formats (more `-f` options than `-s` selectors) are ignored. For example,
```bash
scwape <url_or_file> -s "#selector1" -s ".selector2" -f -f "format1\n" -f "format2\n" -f "format3\n"
```

is equivalent to

```bash
scwape <url_or_file> -s "#selector1" -s ".selector2" -f "format1\n" -f "format2\n"
```

The blank `-f` is ignored, and `"format3\n"` is ignored because there are only two selectors.
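To make the pairing rule concrete, here is a small Rust model of it. It is a hypothetical sketch, not scwape's source: the names `pair_formats` and `DEFAULT_FORMAT` are invented, and it assumes formats are matched to selectors by position after blank ones are dropped, and that a selector left without a format falls back to the default `\text\n`.

```rust
/// The documented default format (a literal backslash-text, backslash-n).
const DEFAULT_FORMAT: &str = r"\text\n";

/// Hypothetical model: pair each selector with a format by position.
/// Blank formats are skipped and formats beyond the number of selectors
/// are dropped. Assumption: a selector with no format gets the default.
fn pair_formats<'a>(selectors: &[&'a str], formats: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    // Only non-blank formats take part in the pairing.
    let mut usable = formats.iter().copied().filter(|f| !f.is_empty());
    selectors
        .iter()
        .map(|&sel| (sel, usable.next().unwrap_or(DEFAULT_FORMAT)))
        .collect()
}

fn main() {
    // Mirrors the two equivalent commands above.
    let selectors = ["#selector1", ".selector2"];
    let with_extras = pair_formats(&selectors, &["", r"format1\n", r"format2\n", r"format3\n"]);
    let without = pair_formats(&selectors, &[r"format1\n", r"format2\n"]);
    assert_eq!(with_extras, without);
    for (sel, fmt) in &with_extras {
        println!("{sel} -> {fmt}");
    }
}
```

Because leftover formats are simply never pulled from the iterator, the extra `format3\n` drops out without any special casing.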
The possible format specifiers are:

* Id (`\id`, the element id)
* Name (`\name`, the element name)
* Classes (`\classes`, the classes of the element)
* Text (`\text`, the combined text of the element's child nodes)
* Html (`\html`, the HTML of the element)
* Attrs (`\attrs`, the attributes of the element)
`\id` will be replaced, but `\\id`, `\\\id`, and so on will not. The same applies to the other format specifiers.
The disparate option, `-d`, prints the matches for each selector separately. By default, the elements matching any of the selectors are printed in the order they appear in the document. With `-d`, the elements matching the first selector are printed first, then those matching the second, then the third, and so on.
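The two orderings can be compared with a tiny model. This is a hypothetical sketch, not scwape's code: `Match` and `order` are invented names, each match is reduced to its position in the document and the index of the `-s` selector that matched it, and the tie-breaking for an element matched by several selectors is an assumption.

```rust
/// Hypothetical model of one match.
#[derive(Debug, Clone, Copy)]
struct Match {
    doc_pos: usize,      // position of the element in the document
    selector: usize,     // index of the `-s` selector that matched it
    label: &'static str, // what would be printed for it
}

/// Default: document order (ties broken by selector, an assumption).
/// Disparate (`-d`): grouped by selector, each group in document order.
fn order(mut matches: Vec<Match>, disparate: bool) -> Vec<&'static str> {
    if disparate {
        matches.sort_by_key(|m| (m.selector, m.doc_pos));
    } else {
        matches.sort_by_key(|m| (m.doc_pos, m.selector));
    }
    matches.iter().map(|m| m.label).collect()
}

fn main() {
    // Made-up page <h2>A</h2><p>1</p><h2>B</h2><p>2</p>, run with -s "p" -s "h2".
    let matches = vec![
        Match { doc_pos: 0, selector: 1, label: "h2 A" },
        Match { doc_pos: 1, selector: 0, label: "p 1" },
        Match { doc_pos: 2, selector: 1, label: "h2 B" },
        Match { doc_pos: 3, selector: 0, label: "p 2" },
    ];
    println!("default:   {:?}", order(matches.clone(), false));
    println!("disparate: {:?}", order(matches, true));
}
```

Here the default run interleaves headings and paragraphs as they occur on the page, while `-d` prints both paragraphs before either heading.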
Fish and Bash shell completions are available on GitHub and are generated by `cargo build`.
To generate your own, select the appropriate shell in `build.rs`, then run `cargo build`. The completion script will be written to the `completions` directory. The available shells are listed in clap's documentation.