cleanup-history-rs

Filters my .bash_history through a set of regexes, deduplicates it, and sorts it by most recent use.

Based on https://github.com/naggie/dotfiles/blob/master/scripts/cleanup-history
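The cleanup makes three passes over the parsed history: filter, deduplicate, then sort. Below is a minimal std-only sketch of that pipeline. It is not the actual code in this repo, and it rests on several assumptions. Entries are already parsed into timestamp/command pairs. A list of plain predicate functions (the hypothetical `is_trivial`) stands in for the regexes, so the example doesn't need the `regex` crate. "Sorted by most recent use" is read as: each command keeps its latest timestamp, and the file is ordered oldest to newest, so the most recently used command ends up at the bottom.

```rust
use std::collections::HashMap;

/// One history entry: the Unix timestamp bash recorded and the command text.
#[derive(Debug, PartialEq)]
struct Entry {
    timestamp: i64,
    command: String,
}

/// Hypothetical stand-in for the regex list: drop a few trivial commands.
fn is_trivial(cmd: &str) -> bool {
    matches!(cmd.trim(), "ls" | "cd" | "exit" | "history")
}

/// Drop entries rejected by any filter, keep only the most recent use of each
/// distinct command, and return the survivors oldest first.
fn cleanup(entries: Vec<Entry>, filters: &[fn(&str) -> bool]) -> Vec<Entry> {
    // Per command: (latest timestamp, position of that occurrence). The position
    // breaks ties between commands that share a timestamp, keeping file order.
    let mut latest: HashMap<String, (i64, usize)> = HashMap::new();
    for (pos, e) in entries.into_iter().enumerate() {
        if filters.iter().any(|&f| f(e.command.as_str())) {
            continue; // "matched one of the regexes"
        }
        let key = (e.timestamp, pos);
        let slot = latest.entry(e.command).or_insert(key);
        if key > *slot {
            *slot = key;
        }
    }
    let mut kept: Vec<((i64, usize), String)> =
        latest.into_iter().map(|(command, key)| (key, command)).collect();
    kept.sort_by_key(|(key, _)| *key);
    kept.into_iter()
        .map(|((timestamp, _), command)| Entry { timestamp, command })
        .collect()
}

fn main() {
    let entries = vec![
        Entry { timestamp: 100, command: "ls".to_string() },
        Entry { timestamp: 101, command: "git status".to_string() },
        Entry { timestamp: 102, command: "cargo build".to_string() },
        Entry { timestamp: 103, command: "git status".to_string() },
    ];
    let filters: [fn(&str) -> bool; 1] = [is_trivial];
    // Write back in .bash_history layout: a #timestamp line, then the command.
    for e in cleanup(entries, &filters) {
        println!("#{}\n{}", e.timestamp, e.command);
    }
}
```

With the sample in `main`, `ls` is filtered out and the older `git status` is dropped, so it prints `#102` / `cargo build` followed by `#103` / `git status`. Commands with equal timestamps keep the order of their latest occurrence, since bash can give several commands the same timestamp.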

Notes on .bash_history

Format:

```plaintext
#1593575811
echo each command has a timestamp immediately before it
#2
#1593575811
echo 'after multiple timestamp lines, history will show the timestamp 1593575811'
#1593575811
#3
echo after multiple timestamp lines, this will show the timestamp 3
#1
echo 'when you run history this will show up with a timestamp long ago but still at the end of the list'
#1593575811
echo this will have the same timestamp as others above, duplicates don\'t matter
#1593575812
#1593575813
#1593575814
#1593575815
#1593575816
#1593575817
#1593575818
#1593575819
#1593575820
#1593575821
echo 'once you history -w all these extra timestamps will get removed'
#1593576854
for ((i=0;i<5;i++)); do echo $i; done
#1593576854
echo ^^ that was written on multiple lines
#1593576874
echo 'foo bar'
#1593576874
echo ^^ that was also written on multiple lines, cmdhist=on, lithist=off
```
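The rules in the example above can be captured in a small parser. A timestamp line is a `#` followed by digits. Each command takes the timestamp directly in front of it. When several timestamp lines appear in a row, the last one wins (`#2` then `#1593575811` gives 1593575811; `#1593575811` then `#3` gives 3). This is a minimal sketch, not the project's parser; `ParsedEntry`, `timestamp_of` and `parse_history` are hypothetical names. Two behaviours are assumptions on my part. First, the value is read from the leading digits only, the way C's `strtol` would. Second, in a file that contains timestamps, a line with no timestamp of its own is appended to the previous command. That matches how the multi-line commands under Gotchas read back.

```rust
/// One parsed history entry. `timestamp` is None for lines that appear before
/// any timestamp (e.g. a history file written without timestamps).
#[derive(Debug, PartialEq)]
struct ParsedEntry {
    timestamp: Option<i64>,
    command: String,
}

/// If `line` is a timestamp line ('#' immediately followed by a digit), return
/// its value. Only the leading digits count, so "#1234 bar" yields 1234.
fn timestamp_of(line: &str) -> Option<i64> {
    let rest = line.strip_prefix('#')?;
    let end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    if end == 0 {
        return None; // a '#' not followed by a digit is an ordinary line
    }
    Some(rest[..end].parse().unwrap_or(i64::MAX)) // saturate absurdly long numbers
}

fn parse_history(text: &str) -> Vec<ParsedEntry> {
    let mut entries: Vec<ParsedEntry> = Vec::new();
    let mut pending: Option<i64> = None; // timestamp waiting for its command
    let mut saw_timestamp = false;
    for line in text.lines() {
        if let Some(ts) = timestamp_of(line) {
            // Consecutive timestamp lines: the last one before a command wins.
            pending = Some(ts);
            saw_timestamp = true;
        } else if pending.is_none() && saw_timestamp && !entries.is_empty() {
            // Assumption: in a timestamped file, a line without its own
            // timestamp continues the previous (multi-line) command.
            let last = entries.last_mut().unwrap();
            last.command.push('\n');
            last.command.push_str(line);
        } else {
            entries.push(ParsedEntry { timestamp: pending.take(), command: line.to_string() });
        }
    }
    entries
}

fn main() {
    let sample = "#1593575811\necho one\n#2\n#1593575811\necho two\n#1593575811\n#3\necho three";
    for e in parse_history(sample) {
        println!("{:?}", e);
    }
}
```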

Gotchas

If any line in the history file starts with #\d+ (a # followed by digits), bash reads it back as a timestamp, even when that line is part of a multi-line command.
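A cleanup pass could check for this before writing the file. The sketch below is hypothetical: the names `looks_like_timestamp` and `would_be_misread` are mine, and I'm not claiming the Rust tool does this. It only encodes the rule from the note: a line that begins with # and a digit.

```rust
/// True if bash would take `line` for a timestamp when reading a history file:
/// '#' immediately followed by an ASCII digit (the #\d+ rule above).
fn looks_like_timestamp(line: &str) -> bool {
    let mut chars = line.chars();
    chars.next() == Some('#') && chars.next().map_or(false, |c| c.is_ascii_digit())
}

/// True if writing `command` to the history file and reading it back would
/// split it, because one of its lines would be taken for a timestamp.
fn would_be_misread(command: &str) -> bool {
    command.lines().any(looks_like_timestamp)
}

fn main() {
    let commands = [
        "echo 'this\n#1234\nthat'",
        "echo 'foo\n#1234 bar\nbaz'",
        "echo '# 1234 is fine'",
        "git commit -m '#42 fix'",
    ];
    for cmd in commands {
        println!("{:>5}  {:?}", would_be_misread(cmd), cmd);
    }
}
```

Only the first two commands are flagged. A `#` followed by a space, or a `#42` that isn't at the start of a line, is harmless.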

```console
$ export HISTFILE=./foo
$ history -c
$ echo 'this
#1234
that'
$ history -w
$ cat foo
#1594044806
echo 'this
#1234
that'
#1594044814
history -w
$ history -c
$ history -r
$ history
    1  2020-07-06 08:16.14 | history -r
    2  2020-07-06 08:15.15 | echo 'this
    3  1969-12-31 17:20.34 | that'
    4  2020-07-06 08:15.25 | history -w
    5  2020-07-06 08:16.16 | history
```

```console
$ history -c
$ echo 'foo
#1234 bar
baz'
$ history -w
$ history # correct in memory
    1  2020-07-06 08:24.30 | echo 'foo
#1234 bar
baz'
    2  2020-07-06 08:24.38 | history -w
    3  2020-07-06 08:24.41 | history
$ cat foo
#1594045470
echo 'foo
#1234 bar
baz'
#1594045478
history -w
$ history -c # clear in-memory history
$ history -r # reread from file
$ history # now incorrectly interprets #1234 bar as a timestamp
    1  2020-07-06 08:19.49 | history -r
    2  2020-07-06 08:19.09 | echo 'foo
    3  1969-12-31 17:20.34 | baz'
    4  2020-07-06 08:19.31 | history -w
    5  2020-07-06 08:19.51 | history
```
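The round trip in this transcript can be reproduced in a few lines. Write each entry as a #timestamp line followed by the command, then read the text back using the rules bash appears to follow. This is a sketch under assumptions. `write_history` and `read_history` are hypothetical names. The reader's continuation rule (lines without a timestamp extend the previous command) is my reading of the transcripts, not something taken from bash's source.

```rust
/// Lay entries out the way `history -w` does when timestamps are enabled:
/// a `#<epoch>` line, then the command text, which may span several lines.
fn write_history(entries: &[(i64, &str)]) -> String {
    let mut out = String::new();
    for (ts, cmd) in entries {
        out.push_str(&format!("#{ts}\n{cmd}\n"));
    }
    out
}

/// Minimal reader for text produced by `write_history`: a line starting with
/// '#' and a digit is a timestamp, the next line opens a new entry, and any
/// further lines without a timestamp are appended to that entry. Lines before
/// the first timestamp are skipped (`write_history` never produces them).
fn read_history(text: &str) -> Vec<(i64, String)> {
    let mut entries: Vec<(i64, String)> = Vec::new();
    let mut pending: Option<i64> = None;
    for line in text.lines() {
        let b = line.as_bytes();
        if b.len() >= 2 && b[0] == b'#' && b[1].is_ascii_digit() {
            // Only the leading digits count: "#1234 bar" -> 1234, " bar" is lost.
            let digits: String = line[1..].chars().take_while(|c| c.is_ascii_digit()).collect();
            pending = Some(digits.parse().unwrap_or(i64::MAX));
        } else if let Some(ts) = pending.take() {
            entries.push((ts, line.to_string()));
        } else if let Some(last) = entries.last_mut() {
            last.1.push('\n');
            last.1.push_str(line);
        }
    }
    entries
}

fn main() {
    // One in-memory entry whose second line happens to start with "#1234".
    let in_memory: [(i64, &str); 2] = [
        (1594045470, "echo 'foo\n#1234 bar\nbaz'"),
        (1594045478, "history -w"),
    ];
    let file = write_history(&in_memory);
    print!("{file}");
    for (ts, cmd) in read_history(&file) {
        println!("{ts:>10} | {cmd}");
    }
}
```

Reading the file back gives three entries instead of two: `echo 'foo`, then `baz'` stamped 1234, then `history -w`. The ` bar` is gone, which is the same split as in the transcript. 1234 seconds is 20 min 34 s after the Unix epoch, i.e. 1970-01-01 00:20:34 UTC. Assuming the transcripts were recorded at UTC-7, that is the 1969-12-31 17:20.34 bash displays.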

Benchmarks

The deduplicated line counts differ (73149 lines for the Python script vs. 64638 for the Rust version) because the two use slightly different regexes ¯\_(ツ)_/¯. I think they're close enough for the timings to be informative.

```console
$ wc -l bash_history.bak
86636 bash_history.bak
$ hyperfine --warmup=5 --prepare='cp bash_history.bak bash_history_python' \
    --export-markdown=bash-history-python.txt \
    --time-unit=millisecond \
    'python3 cleanup-history.py bash_history_python'
$ wc -l bash_history_python
73149 bash_history_python
$ hyperfine --warmup=5 --prepare='cp bash_history.bak bash_history_rust' \
    --export-markdown=bash-history-rust.txt \
    --time-unit=millisecond \
    'cleanup-history-rs/target/release/cleanup-history bash_history_rust'
$ wc -l bash_history_rust
64638 bash_history_rust
```

| Command | Mean [ms] | Min [ms] | Max [ms] |
|:---|---:|---:|---:|
| `python3 cleanup-history.py bash_history_python` | 2069.9 ± 112.4 | 1935.1 | 2356.4 |
| `cleanup-history-rs/target/release/cleanup-history bash_history_rust` | 653.5 ± 22.1 | 631.2 | 698.9 |