gcs-rsync

build codecov License:MIT docs.rs crates.io crates.io (recent)

Lightweight and efficient Rust gcs rsync for Google Cloud Storage.

gcs-rsync is faster than gsutil rsync according to the following benchmarks.

no hard limit to 32K objects or specific conf to compute state.

How to install as crate

Cargo.toml

bash [dependencies] gcs-rsync = "0.1"

How to install as cli tool

```bash cargo install --example gcs-rsync gcs-rsync

~/.cargo/bin/gcs-rsync

```

How to run with docker

Mirror local folder to gcs

bash docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToUpload>:/source:ro superbeeeeeee/gcs-rsync -r -m /source gs://<YourBucket>/<YourPrefix>

Mirror gcs to folder

bash docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourPrefix> /dest

Benchmark

Important note about gsutil: The gsutil ls command does not list all object items by default but instead list all prefixes while adding the -r flag slowdown gsutil performance. The ls performance command is very different to the rsync implementation.

new files only (first time sync)

winner: gcs-rsync

gcs-rsync sync bench

bash rm -rf ~/Documents/test4 && cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync

real 2.20 user 0.13 sys 0.21 7606272 maximum resident set size 0 average shared memory size 0 average unshared data size 0 average unshared stack size 1915 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 394 messages sent 1255 messages received 0 signals received 54 voluntary context switches 5814 involuntary context switches 636241324 instructions retired 989595729 cycles elapsed 3895296 peak memory footprint

gsutil sync bench

bash rm -rf ~/Documents/gsutil_test4 && mkdir ~/Documents/gsutil_test4 && /usr/bin/time -lp -- gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/

Operation completed over 215 objects/50.3 KiB. real 9.93 user 8.12 sys 2.35 47108096 maximum resident set size 0 average shared memory size 0 average unshared data size 0 average unshared stack size 196391 page reclaims 1 page faults 0 swaps 0 block input operations 0 block output operations 36089 messages sent 87309 messages received 5 signals received 38401 voluntary context switches 51924 involuntary context switches 12986389 instructions retired 12032672 cycles elapsed 593920 peak memory footprint

no change (second time sync)

winner: gcs-rsync (due to size and mtime check before crc32c like gsutil does)

gcs-rsync sync bench

bash cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync

real 1.79 user 0.13 sys 0.12 7864320 maximum resident set size 0 average shared memory size 0 average unshared data size 0 average unshared stack size 1980 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 397 messages sent 1247 messages received 0 signals received 42 voluntary context switches 4948 involuntary context switches 435013936 instructions retired 704782682 cycles elapsed 4141056 peak memory footprint

gsutil sync bench

bash /usr/bin/time -lp -- gsutil -m -q rsync -r gs://test-bucket/sync_test4/ ~/Documents/gsutil_test4/

real 2.18 user 1.37 sys 0.66 46899200 maximum resident set size 0 average shared memory size 0 average unshared data size 0 average unshared stack size 100108 page reclaims 1732 page faults 0 swaps 0 block input operations 0 block output operations 6311 messages sent 12752 messages received 4 signals received 6145 voluntary context switches 14219 involuntary context switches 13133297 instructions retired 13313536 cycles elapsed 602112 peak memory footprint

gsutil rsync config

bash gsutil -m -q rsync -r -d ./your-dir gs://your-bucket

bash /usr/bin/time -lp -- gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/

About authentication

All default functions related to authentication use GOOGLEAPPLICATIONCREDENTIALS env var as default conf like official Google libraries do on other languages (golang, dotnet)

Other functions (from and from_file) provide the custom integration mode.

For more info about OAuth2, see the related README in the oauth2 mod.

How to run tests

Unit tests

bash cargo test --lib

Integration tests + Unit tests

bash TEST_SERVICE_ACCOUNT=<PathToAServiceAccount> TEST_BUCKET=<BUCKET> TEST_PREFIX=<PREFIX> cargo test --no-fail-fast

Examples

Upload object

bash cargo run --release --example upload_object "<YourBucket>" "<YourPrefix>" "<YourFilePath>"

Download object

bash cargo run --release --example download_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

Delete object

bash cargo run --release --example delete_object "<YourBucket>" "<YourPrefix>/<YourFileName>"

List objects

bash cargo run --release --example list_objects "<YourBucket>" "<YourPrefix>"

List objects with default service account

bash GOOGLE_APPLICATION_CREDENTIALS=<PathToJson> cargo r --release --example list_objects_service_account "<YourBucket>" "<YourPrefix>"

List objects

list a bucket having more than 60K objects

bash time cargo run --release --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>" | wc -l

Profiling

Humans are terrible at guessing-about-performance

bash export CARGO_PROFILE_RELEASE_DEBUG=true sudo -- cargo flamegraph --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"

bash cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"

Native bin build (static shared lib)

```bash docker rust rust:alpine3.14 apk add --no-cache musl-dev pkgconfig openssl-dev

LDFLAGS="-static -L/usr/local/musl/lib" LDLIBRARYPATH=/usr/local/musl/lib:$LDLIBRARYPATH CFLAGS="-I/usr/local/musl/include" PKGCONFIGPATH=/usr/local/musl/lib/pkgconfig cargo build --release --target=x8664-unknown-linux-musl --example buckettofoldersync ```