It uses the XOR-cipher to compute a checksum digest. Basically, it splits the data into chunks whose length equals the digest size (padding the last chunk with 0s), and XORs all chunks together into a new chunk that's used as output.
This isn't a good hash function. It lacks the Avalanche Effect, because flipping 1 input bit flips 1 output bit.
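The folding described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `xor_fold` and `to_hex` are hypothetical helper names, not the crate's actual API, and the hex formatting (lowercase, two digits per byte) is assumed.

```rust
/// XOR-folds `data` into a digest of `len` bytes.
/// Hypothetical helper for illustration; not the crate's actual API.
fn xor_fold(data: &[u8], len: usize) -> Vec<u8> {
    // The accumulator starts as all zeros (the initialization vector).
    let mut digest = vec![0u8; len];
    if len == 0 {
        return digest;
    }
    // Byte `i` of the input lands in digest slot `i % len`.
    // A short final chunk leaves the remaining slots untouched,
    // which is equivalent to padding it with zeros.
    for (i, byte) in data.iter().enumerate() {
        digest[i % len] ^= *byte;
    }
    digest
}

/// Expands raw digest bytes to lowercase hexadecimal (assumed format).
fn to_hex(digest: &[u8]) -> String {
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}
```

With this sketch, `to_hex(&xor_fold(b"aaaa", 4))` gives `61616161`. Flipping a single input bit changes exactly one bit of the result, which is the missing Avalanche Effect mentioned above.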
The raw digest size is 64 bits (8 bytes) by default, but can be set to any valid `usize` value with the `--length` option. The actual size is bigger because the raw digest is expanded to hexadecimal by default. I chose 8 because CRC32 uses 4 and MD5 uses 16, and to make it easier for downgraded implementations to replicate: 64 bits fit within a CPU register and can be emulated using 2 `u32`s.
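As a rough illustration of the "2 `u32`s" remark, the default 8-byte digest can be accumulated in two 32-bit lanes. The function name `xor_fold_2x_u32` and the little-endian byte placement inside each lane are assumptions made for this sketch, not something taken from the crate.

```rust
/// XOR-folds `data` into 8 bytes using two 32-bit lanes
/// instead of a single 64-bit word.
/// Hypothetical sketch; the byte order inside each lane is an assumption.
fn xor_fold_2x_u32(data: &[u8]) -> [u8; 8] {
    let mut lanes = [0u32; 2];
    for (i, &byte) in data.iter().enumerate() {
        let slot = i % 8; // position inside the 8-byte digest
        let lane = slot / 4; // 0 for slots 0..=3, 1 for slots 4..=7
        let shift = 8 * (slot % 4); // little-endian position inside the lane
        lanes[lane] ^= u32::from(byte) << shift;
    }
    // Reassemble the two lanes into the 8 digest bytes.
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&lanes[0].to_le_bytes());
    out[4..].copy_from_slice(&lanes[1].to_le_bytes());
    out
}
```

Because XOR works on each bit independently, splitting the accumulator into lanes gives the same bytes as folding one byte at a time.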
The initialization-vector is hardcoded to be 0.
Name and behavior are heavily influenced by `cksum`, `md5sum`, and `b3sum`.
To install the latest release from the crates.io registry:
```sh
cargo install xorsum
```
This isn't guaranteed to be the latest version, but it will never throw compilation errors.
To install the latest dev crate from GH:
```sh
cargo install --git https://github.com/Rudxain/xorsum.git
```
This is the most recent version. Compilation isn't guaranteed. Semver may be broken, and `--help` may not reflect actual program behavior.
To get already-compiled non-dev executables, go to GH releases. `*.elf`s will only be compatible with GNU-Linux x64. `*.exe`s will only be compatible with Windows x64. These aren't setup/installer programs; they are the same executables `cargo` would install, so you should run them from a terminal CLI, not click them.
For a Llamalab Automate implementation, visit XOR hasher.
Argument "syntax":
```sh
xorsum [OPTIONS] [FILE]...
```
For ℹinfo about options, run:
```sh
xorsum --help
```
```sh
echo -n > a
xorsum --length 4 a
echo -n aaaa > a
xorsum a -l 4
echo -n aaaa | xorsum -l4
xorsum a --brief # `-l 8` is implicit
```
Note: `echo -n` has different behavior depending on OS and binary version; it might include line endings like `\n` (LF) or `\r\n` (CR-LF). The outputs shown in the example are the (usually desired) result of NOT including an EOL.
PowerShell will ignore `-n` because `echo` is an alias of `Write-Output` and therefore can't recognize `-n`. `Write-Host -NoNewline` can't be piped nor redirected, so it's not a good alternative.
`--length` doesn't truncate the output:
```sh
xorsum some_big_file -bl 3 #"00ff55"
xorsum some_big_file -bl 2 #"69aa" NOT "00ff"
```
As you can see, `-l` can return very different hashes from the same input. This property can be exploited to emulate the Avalanche Effect (to some extent).
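To see why a shorter length isn't a prefix of a longer one, here is a small sketch. `xor_fold` is a hypothetical helper defined inline so the snippet stands alone; it is not the crate's API. Each input byte lands in slot `index % length`, so changing the length regroups the bytes entirely.

```rust
/// XOR-folds `data` into `len` bytes: byte `i` goes to slot `i % len`.
/// Hypothetical helper for illustration only.
fn xor_fold(data: &[u8], len: usize) -> Vec<u8> {
    let mut digest = vec![0u8; len];
    if len == 0 {
        return digest;
    }
    for (i, byte) in data.iter().enumerate() {
        digest[i % len] ^= *byte;
    }
    digest
}

/// Returns the digests of a tiny sample input for lengths 3 and 2.
fn regrouping_demo() -> (Vec<u8>, Vec<u8>) {
    let data = [1u8, 2, 3, 4, 5, 6];
    // Length 3 pairs up (1,4), (2,5), (3,6)  -> [5, 7, 5]
    // Length 2 groups   (1,3,5) and (2,4,6)  -> [7, 0]
    (xor_fold(&data, 3), xor_fold(&data, 2))
}
```

Here the length-3 digest is `[5, 7, 5]` while the length-2 digest is `[7, 0]`, not the truncated `[5, 7]`.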
If you have 2 copies of a file and 1 is corrupted, you can attempt to "🔺️triangulate" the index of a corrupted byte, without manually searching the entire file. This is useful when dealing with big raw-binary files:
```sh
xorsum a b # `-l 8` is implicit
xorsum a b -l 3
xorsum a b -l 2
# use more `-l` values, to solve it easier.
```
There are programs (like `diff`) that compare bytes for you, and are much more efficient and user-friendly. But if you are into math puzzles, this is a good way to pass the time by solving systems of linear modular equations 🤓.
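For the curious, the puzzle can be sketched in Rust. Assuming exactly one byte differs between the two files, each digest length `len` reveals the corrupted offset modulo `len`: it is the one slot where the two digests disagree. The helper names `differing_slot` and `candidate_offsets` are hypothetical, and the brute-force search is chosen for clarity rather than speed.

```rust
/// Returns the single slot where two digests differ,
/// or `None` if they differ in zero or in several slots.
/// Assumes both digests have the same length.
fn differing_slot(a: &[u8], b: &[u8]) -> Option<usize> {
    let mut diff = a
        .iter()
        .zip(b)
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i);
    let first = diff.next()?;
    if diff.next().is_some() {
        None
    } else {
        Some(first)
    }
}

/// Lists every offset below `file_len` that satisfies
/// `offset % digest_len == slot` for all `(digest_len, slot)` observations.
/// Every `digest_len` must be non-zero.
fn candidate_offsets(file_len: usize, observations: &[(usize, usize)]) -> Vec<usize> {
    (0..file_len)
        .filter(|offset| observations.iter().all(|&(len, slot)| offset % len == slot))
        .collect()
}
```

For example, if a 48-byte file shows a mismatch in slot 2 with `-l 8` and in slot 2 with `-l 3`, the only candidates are offsets 2 and 26. Lengths that share no common factor narrow the candidates fastest.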
I was surprised that I couldn't find any implementation of a checksum algorithm completely based on the `XOR` op. So I posted this for the sake of completeness, and because I'm learning Rust. I also made this for people with low-power devices.
`sbox` will (probably) have enough bytes to "mix well".
`0.x.y` to reflect the incompleteness of the code. I'm sorry for the inconvenience and potential confusion.