Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.
```sh
cargo install htmlq
```

```sh
brew install htmlq
```
```console
$ htmlq -h
htmlq 0.3.0
Michael Maclean michael@mgdm.net
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] [selector]...

FLAGS:
    -B, --detect-base    Try to detect the base URL from the

OPTIONS:
    -a, --attribute

ARGS:
```
```console
$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'
</div>
```
```console
$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.org/
/learn/get-started
https://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
```
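Because `--attribute` prints one value per line, the result composes with ordinary line-oriented tools. The sketch below is one way to list each distinct link once. The sample HTML is made up so the example runs offline, and the `command -v` guard is only there so the script degrades gracefully when `htmlq` is not installed.

```sh
#!/bin/sh
# Hypothetical sample markup standing in for a downloaded page.
html='<nav><a href="/learn">Learn</a> <a href="/tools">Tools</a> <a href="/learn">Learn again</a></nav>'

# Only run the pipeline when htmlq is actually on the PATH.
if command -v htmlq >/dev/null 2>&1; then
    # Print every href on its own line, then sort and drop duplicates.
    links=$(printf '%s\n' "$html" | htmlq --attribute href a | sort -u)
    printf '%s\n' "$links"
else
    echo "htmlq is not installed" >&2
fi
```

Replacing the `printf` with `curl --silent <url>` would apply the same pipeline to a live page.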
```console
$ curl --silent https://nixos.org/nixos/about.html | htmlq --text .main
About NixOS
NixOS is a GNU/Linux distribution that aims to improve the state of the art in system configuration management. In existing distributions, actions such as upgrades are dangerous: upgrading a package can cause other packages to break, upgrading an entire system is much less reliable than reinstalling from scratch, you can’t safely test what the results of a configuration change will be, you cannot easily undo changes to the system, and so on. We want to change that. NixOS has many innovative features:
[...]
```
Pretty-printing with `--pretty` is a bit of a work in progress:
```console
$ curl --silent https://mgdm.net | htmlq --pretty '#posts'
<section id="posts">
<h2>I write about...
</h2>
<ul class="post-list">
<li>
<time datetime="2019-04-29 00:%i:1556496000" pubdate="">
29/04/2019</time><a href="/weblog/nettop/">
<h3>Debugging network connections on macOS with nettop
</h3></a>
<p>Using nettop to find out what network connections a program is trying to make.
</p>
</li>
[...]
```
Syntax highlighting with `bat`:
```console
$ curl --silent example.com | htmlq 'body' | bat --language html
```