Rhit reads your nginx log files in their standard location (even gzipped), does some analysis and tells you about it in pretty tables in your console, storing and polluting nothing.
It lets you filter hits by dates, status, referers or paths, and does trend analysis.
And it's fast enough (about one second per million lines) that you can iteratively try queries to build your insight.
Here I'm especially looking at dates and trends on hits with status 2xx and 3xx, on a given period:
Rhit is only tested on Linux but is expected to work on Mac.
You need the Rust toolchain. Then run:
```bash
cargo install rhit
```
You may also download Linux binaries from https://dystroy.org/rhit/download.
If rhit is on the server, and the logs are at their usual location:
```bash
rhit
```
(you may have to prefix with `sudo` to read the files in `/var/log`)
Or tell rhit which files to open:
```bash
rhit ~/trav/nginx-logs
```
Filtering can be quite simple:
```bash
rhit -p download
```
But the syntax allows for much more interesting queries.
You may use a regular expression.
For example, when I want to see all downloads of broot:
```bash
rhit -p '^/download/.*broot(.exe)?$'
```
You may negate expressions with a `!`.
For example, I have many paths which are just a number (e.g. `/12345`) and if I want to filter them out, I can do
```bash
rhit -p '!^/\d+$'
```
(remember to use single quotes and not double quotes, so that your shell doesn't interpret the expression)
Separating filters with a comma is an easy way to do an "AND".
If I want to get paths which are neither `broot` nor just a number, I'll do
```bash
rhit -p '!^/\d+$,!broot'
```
If I want to get all paths containing a digit, but not just a number, and not `broot`, I do
```bash
rhit -p '!^/\d+$,!broot,\d'
```
For more complex logic, switch to binary expressions with parentheses and the logic operators `&`, `|` and `!`.
For example, to get all paths containing `dystroy` or `blog` but not `broot`:
```bash
rhit -p '( dystroy | blog ) & !broot'
```
(add spaces inside parentheses to avoid them being understood as part of a regular expression)
To get all paths containing `dystroy` but neither `blog`, nor `space`, nor any 4-digit number:
```bash
rhit -p 'dystroy & !( \d{4} | space | blog )'
```
Filter on the referer with `-r`:
```bash
rhit -r reddit
```
As with the path, you may use a complex expression.
Filter on the date with `-d`:
```bash
rhit -d 12/25
```
This shows only Christmas hits, assuming all the hits are from the same year.
If the log contains several years, you need to specify the year, e.g. `rhit -d 2020/12/25`.
Symmetrically, you may omit the month if it's not ambiguous: `rhit -d 25`.
You may also filter on a range, a month, a year, a comparison, or a negated date:
```bash
rhit -d 2020/12/25-2021/01/03
rhit -d 2020/12
rhit -d 2020
rhit -d '>2020/12/25'
rhit -d '!2020/12/25'
rhit -d '<12/25'
```
Filter on the status with `-s`. The syntax is quite versatile:
```bash
rhit -s 404
rhit -s 5xx
rhit -s 3xx,410-421
rhit -s 301-305
rhit -s '!404'
rhit -s '4xx,!404'
```
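Filter on the remote IP address with `-i` (quote a negated address so that your shell doesn't interpret the `!`):
```bash
rhit -i 123.123.123.123
rhit -i '!123.123.123.123'
```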
You can use several arguments.
For example, you can get all paths resulting in a `404`, excluding `robots.txt` (which are legit queries) and the `/crashy` path, by combining `-s` and `-p`.
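A minimal sketch of such a combined query, built only from the `-s` and `-p` syntax described above. The exact regular expression used for `/crashy` and the variable names are assumptions made for illustration, and the guard simply skips the call when rhit is not installed.

```bash
# Assumed filter values, using only the -s and -p syntax shown earlier.
status_filter='404'                    # keep only hits answered with a 404
path_filter='!robots.txt,!^/crashy$'   # exclude robots.txt AND the exact /crashy path

# Run the query only if rhit is available on the PATH.
if command -v rhit >/dev/null 2>&1; then
    rhit -s "$status_filter" -p "$path_filter"
fi
```

The comma inside the path filter acts as an "AND", so a path is kept only if it matches neither exclusion.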
The displayed fields can be chosen with the `-f` argument.
Default fields: `date,status,ref,path`
Available fields: `date,method,status,ip,ref,path`
For example, to only show remote IP addresses, statuses, and referers:
```bash
rhit -f ip,status,ref
```
Table length is set with the `-l` argument.
Use `rhit -l 0` to have just a few lines in the various tables, and `rhit -l 5` for huge tables. The default value is `1`.
The measure used for sorting, histograms, and trends is either `hits` (default) or `bytes` (bytes in the response).
It's highlighted in pink in the report.
You set it with the `--key` argument.
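As an illustration, here is a small sketch that switches the measure to `bytes`. The variable name is hypothetical, and the guard only avoids an error when rhit is not installed.

```bash
# Assumed choice: rank by bytes sent in responses instead of the default hit count.
sort_key='bytes'   # the other documented value is 'hits'

# Run the query only if rhit is available on the PATH.
if command -v rhit >/dev/null 2>&1; then
    rhit --key "$sort_key"
fi
```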
Use the `--changes` (short: `-c`) argument so that Rhit shows you the paths, referers or remote addresses which are notably more popular or less popular.
Settings related to displayed fields and filtered values still apply.
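A possible combination, sketched under the assumption that `--changes` can be mixed freely with the `-f` and `-s` arguments described earlier. The chosen fields, the status filter, and the variable names are illustrative only.

```bash
# Assumed combination: popularity changes, limited to 2xx hits,
# displaying only the referer and path fields.
fields='ref,path'
status_filter='2xx'

# Run the query only if rhit is available on the PATH.
if command -v rhit >/dev/null 2>&1; then
    rhit --changes -f "$fields" -s "$status_filter"
fi
```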