tbuck is a simple CLI tool that takes lines of text, groups them into buckets according to some time granularity, and emits the count of occurrences for each bucket. I wrote it while debugging an issue at work where I needed to find out how often a particular event, identified by a line in an application's log file, was occurring. The event did not correspond to any metric emitted into our monitoring system, but I wanted to see a graph of its frequency. This need came up several times during the investigation, for several different file formats, and I wrote a separate script for each format. Finally I realized that all the scripts were doing basically the same thing, and wrote tbuck.
```
tbuck 1.1.0
Drake Tetreault ekardnt@ekardnt.com
A command line tool for bucketing time-series text data

USAGE:
    tbuck [FLAGS] [OPTIONS]

FLAGS:
    -d, --descending
            By default stream mode expects entries to be in monotonically ascending order by date (earlier dates
            followed by later dates), which is the usual order of log files. If this flag is present then stream mode
            will instead expect entries in monotonically decreasing order by date (later dates followed by earlier
            dates). In normal mode, this flag will cause the buckets to be printed in descending order instead of the
            default ascending order.
    -h, --help
            Prints help information
    -n, --no-fill
            By default buckets which had no entries present will be displayed with a count of 0. If this flag is present
            then instead the bucket will not be printed at all.
    -s, --stream
            Enable stream mode. Entries will be expected to arrive in monotonically increasing (or --decreasing) order,
            and bucket information will be printed live as soon as the bucket is known to be finished. By default the
            presence of any entry violating the monotonic order will cause an error, but this can be made --tolerant.
    -t, --tolerant
            By default when a non-monotonic entry is encountered in stream mode the program will terminate with an
            error. If this flag is present then non-monotonic entries will instead be silently discarded.
    -V, --version
            Prints version information

OPTIONS:
    -g, --granularity
    -m, --match-index <MATCH_INDEX>
            0-based index of match to use if multiple matches are found [default: 0]

ARGS:
```
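The stream-mode behaviour described in the help text can be illustrated with a small Python sketch. This is not tbuck's implementation: the names `stream_buckets` and `OutOfOrderError` are hypothetical, timestamps are assumed to be integer epoch seconds in ascending order, and empty buckets are not filled. It only shows the idea of emitting a bucket once a later one starts, and of either failing on or discarding out-of-order entries.

```python
class OutOfOrderError(ValueError):
    """Raised when a timestamp breaks the expected ascending order."""


def stream_buckets(timestamps, step, tolerant=False):
    """Yield (bucket_start, count) as soon as an entry from a later bucket arrives.

    Hypothetical helper: `timestamps` are integer epoch seconds and `step`
    is the bucket width in seconds.
    """
    current = None   # start of the bucket being accumulated
    count = 0        # entries seen so far in that bucket
    previous = None  # last accepted timestamp
    for ts in timestamps:
        if previous is not None and ts < previous:
            if tolerant:
                continue  # silently drop the out-of-order entry
            raise OutOfOrderError(f"{ts} arrived after {previous}")
        previous = ts
        start = ts - ts % step  # floor to the start of the bucket
        if current is None:
            current = start
        elif start != current:
            # A later bucket has begun, so the current one is finished.
            yield current, count
            current, count = start, 0
        count += 1
    if current is not None:
        yield current, count  # flush the final bucket at end of input


for bucket, n in stream_buckets([0, 10, 70, 65, 130], 60, tolerant=True):
    print(bucket, n)
```

With `tolerant=True` the entry `65` is dropped because it arrives after `70`; without it the generator raises `OutOfOrderError` at that point.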
Suppose you're working with the following log file.
$ cat demo.txt
2019-03-14 12:01:00 Event A
2019-03-14 12:01:10 Event B
2019-03-14 12:01:20 Event A
2019-03-14 12:01:30 Event B
2019-03-14 12:01:40 Event A
2019-03-14 12:01:50 Event B
2019-03-14 12:02:00 Event A
2019-03-14 12:02:10 Event B
2019-03-14 12:02:20 Event A
2019-03-14 12:02:30 Event B
2019-03-14 12:02:40 Event A
2019-03-14 12:02:50 Event B
2019-03-14 12:03:00 Event A
2019-03-14 12:03:10 Event B
2019-03-14 12:03:20 Event A
2019-03-14 12:03:30 Event B
2019-03-14 12:03:40 Event A
2019-03-14 12:03:50 Event B
You want to see how many log lines there are for every 1-minute bucket in the file.
$ tbuck --granularity 1m '%F %T' demo.txt
2019-03-14 12:01:00 UTC,6
2019-03-14 12:02:00 UTC,6
2019-03-14 12:03:00 UTC,6
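To make the bucketing concrete, here is a minimal Python sketch of the same idea: floor each timestamp to the start of its bucket and count. It is an illustration only, not how tbuck is implemented. The helpers `bucket_counts` and `demo_lines` are hypothetical, the timestamp is assumed to occupy the first 19 characters of each line, and `%Y-%m-%d %H:%M:%S` is used as the Python spelling of `%F %T`.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"  # Python spelling of '%F %T'


def bucket_counts(lines, granularity):
    """Return sorted (bucket_start, count) pairs for timestamped lines."""
    step = int(granularity.total_seconds())
    counts = Counter()
    for line in lines:
        # Assumption: the timestamp occupies the first 19 characters.
        stamp = datetime.strptime(line[:19], FORMAT).replace(tzinfo=timezone.utc)
        epoch = int(stamp.timestamp())
        # Floor to the start of the bucket containing this timestamp.
        start = datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)
        counts[start] += 1
    return sorted(counts.items())


def demo_lines():
    """Rebuild the 18 lines of demo.txt: one entry every 10 seconds."""
    first = datetime(2019, 3, 14, 12, 1, 0)
    return [
        f"{(first + timedelta(seconds=10 * i)).strftime(FORMAT)} Event {'AB'[i % 2]}"
        for i in range(18)
    ]


for start, count in bucket_counts(demo_lines(), timedelta(minutes=1)):
    print(f"{start.strftime(FORMAT)} UTC,{count}")
```

Passing `timedelta(seconds=30)` instead splits the same 18 lines into six buckets.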
You want to see how many log lines there are for every 30-second bucket in the file. Note that from now on, these examples will use the short form -g of the --granularity argument.
$ tbuck -g 30s '%F %T' demo.txt
2019-03-14 12:01:00 UTC,3
2019-03-14 12:01:30 UTC,3
2019-03-14 12:02:00 UTC,3
2019-03-14 12:02:30 UTC,3
2019-03-14 12:03:00 UTC,3
2019-03-14 12:03:30 UTC,3
You want to see how many log lines of Event A there are for every 15-second bucket in the file. (rg is ripgrep.)
$ rg "Event A" demo.txt | tbuck -g 15s '%F %T'
2019-03-14 12:01:00 UTC,1
2019-03-14 12:01:15 UTC,1
2019-03-14 12:01:30 UTC,1
2019-03-14 12:01:45 UTC,0
2019-03-14 12:02:00 UTC,1
2019-03-14 12:02:15 UTC,1
2019-03-14 12:02:30 UTC,1
2019-03-14 12:02:45 UTC,0
2019-03-14 12:03:00 UTC,1
2019-03-14 12:03:15 UTC,1
2019-03-14 12:03:30 UTC,1
You noticed that the previous command printed 0s for buckets without any entries that fell within them, and you don't want that for some reason.
$ rg "Event A" demo.txt | tbuck -g 15s --no-fill '%F %T'
2019-03-14 12:01:00 UTC,1
2019-03-14 12:01:15 UTC,1
2019-03-14 12:01:30 UTC,1
2019-03-14 12:02:00 UTC,1
2019-03-14 12:02:15 UTC,1
2019-03-14 12:02:30 UTC,1
2019-03-14 12:03:00 UTC,1
2019-03-14 12:03:15 UTC,1
2019-03-14 12:03:30 UTC,1
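The difference between the default output and --no-fill can be sketched in Python as follows. This is an assumption-laden illustration rather than tbuck's code: `fill_gaps` is a hypothetical helper, and the `observed` dictionary simply hard-codes the nine non-empty 15-second buckets from the Event A example.

```python
from datetime import datetime, timedelta


def fill_gaps(counts, granularity):
    """Return every bucket from the first to the last observed one, using 0 for gaps."""
    if not counts:
        return []
    filled = []
    current, last = min(counts), max(counts)
    while current <= last:
        filled.append((current, counts.get(current, 0)))
        current += granularity
    return filled


base = datetime(2019, 3, 14, 12, 1, 0)
step = timedelta(seconds=15)
# Non-empty buckets for "Event A", as offsets in seconds from 12:01:00.
observed = {base + timedelta(seconds=s): 1 for s in (0, 15, 30, 60, 75, 90, 120, 135, 150)}

sparse = sorted(observed.items())  # analogous to --no-fill
dense = fill_gaps(observed, step)  # analogous to the default output

for start, count in dense:
    print(f"{start:%Y-%m-%d %H:%M:%S} UTC,{count}")
```

Here `dense` contains 11 buckets, including zero counts at 12:01:45 and 12:02:45, while `sparse` keeps only the 9 buckets that had entries.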