Like `jq`, but for HTML. Uses CSS selectors to extract bits of content from HTML files.
Install with Cargo:

```sh
cargo install htmlq
```

Or with Homebrew:

```sh
brew install htmlq
```
```console
$ htmlq -h
htmlq 0.4.0
Michael Maclean michael@mgdm.net
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] [--] [selector]...

FLAGS:
    -B, --detect-base    Try to detect the base URL from the [...]

OPTIONS:
    -a, --attribute [...]

ARGS:
    [...]
```
```console
$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'
[...]
</div>
```
```console
$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.org/
/learn/get-started
https://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
```
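To try attribute extraction without a network request, the same command can read HTML from standard input. This is a minimal sketch: the sample markup and the `grep` filter are assumptions made for illustration and are not part of htmlq itself.

```sh
# Hypothetical sample document, made up for illustration.
html='<nav><a href="/learn">Learn</a> <a href="https://blog.rust-lang.org/">Blog</a></nav>'

# Print every href found on <a> elements, then keep only absolute URLs.
printf '%s\n' "$html" | htmlq --attribute href a | grep -E '^https?://'
```

With this input, only the blog URL should survive the filter; the relative `/learn` link is dropped by `grep`, not by htmlq.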
```console
$ curl --silent https://nixos.org/nixos/about.html | htmlq --text .main
About NixOS
NixOS is a GNU/Linux distribution that aims to improve the state of the art in system configuration management. In existing distributions, actions such as upgrades are dangerous: upgrading a package can cause other packages to break, upgrading an entire system is much less reliable than reinstalling from scratch, you can’t safely test what the results of a configuration change will be, you cannot easily undo changes to the system, and so on. We want to change that. NixOS has many innovative features:
[...]
```
There's a big SVG image in this page that I don't need, so here's how to remove it.
```console
$ curl --silent https://nixos.org/ | ./target/debug/htmlq '.whynix' --remove-nodes svg
Nix builds packages in isolation from each other. This ensures that they are reproducible and don't have undeclared dependencies, so if a package works on one machine, it will also work on another.
Nix makes it trivial to share development and build environments for your projects, regardless of what programming languages and tools you’re using.
Nix ensures that installing or upgrading one package cannot break other packages. It allows you to roll back to previous versions, and ensures that no package is in an inconsistent state during an upgrade.
```
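The same removal can be tried offline on a small snippet. This is a sketch under assumptions: the sample markup and the `.card` class are hypothetical, and the argument order (selector first, then `--remove-nodes`) simply mirrors the command above.

```sh
# Hypothetical sample markup, made up for illustration.
html='<div class="card"><svg><circle r="1"></circle></svg><p>Hello</p></div>'

# Select .card and drop any <svg> descendants before printing.
printf '%s\n' "$html" | htmlq '.card' --remove-nodes svg
```

The printed element should still contain the paragraph but no `<svg>` node.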
Pretty-printed output with `--pretty` (this is a bit of a work in progress):
```console
$ curl --silent https://mgdm.net | htmlq --pretty '#posts'
<section id="posts">
<h2>I write about...
</h2>
<ul class="post-list">
<li>
<time datetime="2019-04-29 00:%i:1556496000" pubdate="">
29/04/2019</time><a href="/weblog/nettop/">
<h3>Debugging network connections on macOS with nettop
</h3></a>
<p>Using nettop to find out what network connections a program is trying to make.
</p>
</li>
[...]
```
Pipe the output to `bat` for syntax highlighting:

```console
$ curl --silent example.com | htmlq 'body' | bat --language html
```