web-grep

What is this?

Grep for HTML or XML.

```bash
$ echo '<a>Hello</a>' | web-grep '<a>{}</a>'
Hello

$ echo '<a>Hello</a>' | web-grep '<a>{html}</a>' --json
{"html":"Hello"}
```

```bash
# List up all <li>-innerHTML
$ cat << EOM | web-grep '<li>{}</li>'
<li>hello</li>
<li>world</li>
EOM
hello
world
```

```bash
# filtering with attributes
$ cat << EOM | web-grep '
{}
hello
world
EOM
world
```

```bash
# Placeholder {} can also match an attribute value
$ cat << EOM | web-grep '
hello
world
EOM
here
```

How does it work?

This is just a CLI for the excellent library tanakh/easy-scraper.

Installation

- Install cargo. The recommended way is to install rustup.
- Then run `cargo install web-grep`.

Usage

```bash
$ web-grep <QUERY> [INPUT]
```

QUERY is an HTML (or XML) pattern.

A pattern is a valid HTML structure that contains placeholders for inner HTML or attribute values. web-grep provides several kinds of placeholder for different cases.

Placeholders

Anonymous Placeholder `{}`

If your pattern needs exactly one placeholder, use `{}`.

```html
{}
```

```html
{}
```

web-grep prints every text that matches `{}`.

```bash
$ echo "123" | web-grep "{}"
1
2
3
```
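Assuming web-grep prints each match on its own line, as the examples above suggest, its output composes with ordinary line-oriented tools. The sketch below is a hypothetical helper (`dedupe_matches` is not part of web-grep) that drops repeated matches, such as the same link text appearing several times on a page, while keeping the order in which they first appear. It uses only POSIX `awk`.

```bash
# Hypothetical helper, not part of web-grep: remove duplicate lines
# from web-grep's output while preserving first-seen order.
dedupe_matches() {
  # seen[$0]++ evaluates to 0 the first time a line is seen, so
  # !seen[$0]++ is true (and awk prints the line) only on that first occurrence.
  awk '!seen[$0]++'
}
```

For example, `cat page.html | web-grep '<a>{}</a>' | dedupe_matches` (where `page.html` is any hypothetical input file) prints each distinct inner HTML once. Unlike `sort -u`, this keeps the document order of the matches.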

Numbered Placeholders `{n}`

```html
<a href="{1}">{2}</a>
```

web-grep prints the texts matched by `{1}`, `{2}`, ... in numeric order, separated by a tab (`\t`).

```bash
$ echo '<a href=hoge>fuga</a>' | web-grep "<a href={2}>{1}</a>"
fuga	hoge
```

The delimiter can be changed with `-F`.

```bash
$ echo '<a href=hoge>fuga</a>' | web-grep "<a href={2}>{1}</a>" -F ' '
fuga hoge
```
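With the default tab delimiter, the numbered output splits cleanly in a shell `read` loop. The following is a minimal sketch of a hypothetical helper (`format_links` is not part of web-grep). It assumes the pattern `<a href={2}>{1}</a>` from the example above, so each line holds the link text and then the `href`, separated by a tab. It sticks to POSIX shell, building the tab with `printf` rather than bash's `$'\t'`.

```bash
# Hypothetical helper, not part of web-grep: read the tab-separated output of
# a pattern such as "<a href={2}>{1}</a>" and print "text -> href" per match.
tab=$(printf '\t')

format_links() {
  # IFS applies only to this read: split each line at the tab into {1} and {2}.
  # The "|| [ -n "$text" ]" part also handles a last line without a newline.
  while IFS="$tab" read -r text href || [ -n "$text" ]; do
    printf '%s -> %s\n' "$text" "$href"
  done
}
```

Piping the example above through it, `echo '<a href=hoge>fuga</a>' | web-grep "<a href={2}>{1}</a>" | format_links` should print `fuga -> hoge`. If you choose another delimiter with `-F`, set `IFS` to that delimiter instead.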

Named Placeholders `{xxx}`

```html
<a href="{href}">{innerHTML}</a>
```

With `--json`, the output is formatted as JSON, keyed by placeholder name.

```bash
$ echo '<a href=hoge>fuga</a>' | web-grep "<a href={href}>{html}</a>" --json
{"href":"hoge","html":"fuga"}
```
