Grep for HTML or XML.
```bash
$ echo '<a>Hello</a>' | web-grep '<a>{}</a>'
Hello
```

```bash
$ echo '<a>Hello</a>' | web-grep '<a>{html}</a>' --json
{"html":"Hello"}
```
```bash
$ cat << EOM | web-grep '
{}
'
hello
world
```
This is just a CLI for an awesome library, tanakh/easy-scraper.
Install with Cargo:
```bash
cargo install web-grep
```
Usage:
```bash
$ web-grep <QUERY> [INPUT]
```
The `QUERY` is an HTML (XML) pattern.
Patterns are valid HTML structures that contain placeholders for innerHTML or attributes.
`web-grep` has several placeholders for different cases.
{}
If you need exactly one placeholder in the pattern, use `{}`.
```html
{}
```
`web-grep` outputs every text that matches `{}`, one per line.
```bash
$ echo "<p>1</p><p>2</p><p>3</p>" | web-grep "<p>{}</p>"
1
2
3
```
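Because each match for `{}` is printed on its own line, the output composes with ordinary line-oriented shell tools. The following is a minimal sketch that assumes only the output shape shown above. The `matches` function is a hypothetical stand-in that replays that sample output, so the snippet runs even without `web-grep` installed.

```bash
# Hypothetical stand-in for:
#   echo "<p>1</p><p>2</p><p>3</p>" | web-grep "<p>{}</p>"
# It replays the sample output shown above, one match per line.
matches() {
  printf '%s\n' 1 2 3
}

# Count the matches (tr strips the padding some wc implementations add).
count=$(matches | wc -l | tr -d ' ')

# Sum the matched values, treating each line as a number.
total=$(matches | awk '{ s += $1 } END { print s }')

echo "count=$count total=$total"
# prints: count=3 total=6
```

In real use, replace the body of `matches` with the actual `web-grep` pipeline.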
{n}
```html
<a href="{1}">{2}</a>
```
`web-grep` outputs the matched texts for `{1}`, `{2}`, ... in order, separated by `\t`.
```bash
$ echo '<a href=hoge>fuga</a>' | web-grep "<a href={2}>{1}</a>"
fuga hoge
```
The delimiter can be specified with `-F`.
```bash
$ echo '<a href=hoge>fuga</a>' | web-grep "<a href={2}>{1}</a>" -F ' '
fuga hoge
```
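The default tab delimiter is convenient when the matched text may itself contain spaces, since tab-aware tools can still split the fields reliably. The following is a rough sketch, assuming the tab-separated output described above. The `row` variable is a hypothetical stand-in that replays one output line, so the snippet runs without `web-grep` installed.

```bash
# Hypothetical stand-in for one output line of:
#   echo '<a href=hoge>fuga</a>' | web-grep "<a href={2}>{1}</a>"
# The text for {1} comes first, then a tab, then the text for {2}.
row=$(printf 'fuga\thoge\n')

# cut splits on tabs by default, so no delimiter option is needed.
text=$(printf '%s\n' "$row" | cut -f1)
href=$(printf '%s\n' "$row" | cut -f2)

echo "text=$text href=$href"
# prints: text=fuga href=hoge
```

If a different delimiter is chosen with `-F`, the same approach works by passing that delimiter to `cut -d`.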
{xxx}
```html
<a href="{href}">{innerHTML}</a>
```
The output can be formatted as JSON with `--json`.
```bash
$ echo '<a href=hoge>fuga</a>' | web-grep "<a href={href}>{html}</a>" --json
{"href":"hoge","html":"fuga"}
```