MongoDB realtime synchronizer, similar to py-mongo-sync.

Use `--collection-concurrent` to define how many threads are used to sync a database, and `--doc-concurrent` to define how many threads are used to sync a collection. A log file path can be set with the `--log-path` option; otherwise log information is written to stdout.

Requires MongoDB 3.6+ (because the official MongoDB driver only supports MongoDB 3.6+).
The recommended way to install mongo_sync is using cargo:

```shell
cargo +nightly install mongo_sync
```

You can also download a released binary.
To run the integration tests, set `SYNCER_TEST_SOURCE` to a testing MongoDB URI; otherwise `mongodb://localhost:27017` will be used.
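For example, assuming the tests are driven by a plain `cargo test` (the URI below is just an example and should point at a disposable MongoDB instance):

```shell
# SYNCER_TEST_SOURCE is read by the integration tests.
export SYNCER_TEST_SOURCE="mongodb://localhost:27017"
cargo +nightly test
```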
To run the synchronizer, you first need to start `oplog_syncer` to keep a realtime MongoDB oplog sync running. Then you can run `db_sync` to sync databases in realtime.
```shell
./target/release/oplog_syncer --src-uri "mongodb://localhost:27017" --oplog-storage-uri "mongodb://localhost:27018/"
```
```shell
db_sync --src-uri "mongodb://localhost:27017/?authSource=admin" --oplog-storage-uri "mongodb://localhost:27018/?authSource=admin" --target-uri "mongodb://localhost:27019" --db test_db
```
Note that the `--oplog-storage-uri` passed to `oplog_syncer` and `db_sync` must be the same.
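One simple way to keep the two values identical is to put the URI in a single shell variable, as in this sketch (it reuses the example URIs above; `&` just runs the oplog syncer in the background):

```shell
# Both processes must point at the same oplog storage instance.
OPLOG_STORAGE_URI="mongodb://localhost:27018/?authSource=admin"

./target/release/oplog_syncer \
    --src-uri "mongodb://localhost:27017" \
    --oplog-storage-uri "$OPLOG_STORAGE_URI" &

db_sync \
    --src-uri "mongodb://localhost:27017/?authSource=admin" \
    --oplog-storage-uri "$OPLOG_STORAGE_URI" \
    --target-uri "mongodb://localhost:27019" \
    --db test_db
```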
```shell
USAGE:
    oplog_syncer [OPTIONS] --src-uri

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --log-path <log-path>
    -o, --oplog-storage-uri <oplog-storage-uri>    target oplog storage uri
    -s, --src-uri <src-uri>                        source database uri, must be a mongodb cluster
```
```shell
USAGE:
    db_sync [OPTIONS] --src-uri

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --collection-concurrent <collection-concurrent>    how many threads to sync a database
    -d, --db <db>                                           database to sync
        --doc-concurrent <doc-concurrent>                    how many threads to sync a collection
        --log-path <log-path>
            log file path, if not specified, all log information will be output to stdout

    -o, --oplog-storage-uri <oplog-storage-uri>
            mongodb uri which saves oplogs; they are written there by the `oplog_syncer` binary

    -s, --src-uri <src-uri>                                  source mongodb uri
    -t, --target-uri <target-uri>                            target mongodb uri
```
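For illustration, a `db_sync` run that tunes the concurrency and logging options might look like the sketch below (the URIs, thread counts, and log path are made-up example values, not defaults):

```shell
# --collection-concurrent: threads used per database; --doc-concurrent: threads used per collection.
# Without --log-path, log output goes to stdout.
db_sync \
    --src-uri "mongodb://localhost:27017" \
    --oplog-storage-uri "mongodb://localhost:27018" \
    --target-uri "mongodb://localhost:27019" \
    --db test_db \
    --collection-concurrent 4 \
    --doc-concurrent 8 \
    --log-path ./db_sync.log
```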
┌───────────────┐
│ target db │
└───┬───────────┘
│
xxxx ▼ xxxxxxx
xx ┌───────┐ x
xx │db_sync│
x └───────┘ x
xxxxxx ▲ ▲ x
xx │ │ xxxx
│ │
│ │
Full dump │ │Incr dump
│ │(Real time)
│ │
│ │
│ │
│ │ ┌─────────────────────┐
│ └─────────┤oplog storage db │
│ └──────▲──────────────┘
│ xxxxxxx │ xxxxx
│ x ┌──────┴──────┐x
│ x │Oplog syncer │x Sync oplog from source cluster
│ x └──────▲──────┘x to oplog storage in real time
│ xxxxxxxxx│ xxxxxxx
│ │
│ │
│ │
│ ┌──────┴───────┐
└────────────┤Source cluster│
└──────────────┘
As the diagram shows, mongo_sync provides two basic programs:

1. oplog syncer: syncs the MongoDB cluster's oplog to the target oplog storage db.
2. db sync: syncs data from the source cluster to the target db.
This is not a strict benchmark; I just tested it manually.

Scenario: when the source cluster inserts 50,000 records, how long does it take the target db to synchronize those 50,000 new inserts?

My testing result: `db_sync` takes about 50 seconds to sync these updates, while `py-mongo-sync` takes about 225 seconds. In general, it's about 3.5x faster than py-mongo-sync.

Please note that the 50 seconds figure is not precise; it depends heavily on your database and machine performance.
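If you want to reproduce a similar scenario, a mongosh one-liner like the following can generate the 50,000 inserts on the source cluster (the database and collection names are arbitrary; the original test setup may have been different):

```shell
# Bulk-insert 50,000 small documents into the source cluster.
mongosh "mongodb://localhost:27017/test_db" --eval '
  const docs = [];
  for (let i = 0; i < 50000; i++) { docs.push({ i: i, payload: "x".repeat(64) }); }
  db.bench.insertMany(docs);
'
```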
When running `oplog_syncer`, the oplog storage instance will create and use a database named `source_oplog`, containing a collection also named `source_oplog`. For now these names are hardcoded.

When running `db_sync`, the target database gets a new collection named `oplog_records`, which stores the latest oplog timestamp applied to the database.
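These collections can be inspected with mongosh, for example (the URIs come from the examples above, and this assumes `oplog_records` lives in the synced database itself; the exact document layout isn't documented here):

```shell
# Latest raw oplog entry copied into the oplog storage database.
mongosh "mongodb://localhost:27018/source_oplog" --eval \
    'db.source_oplog.find().sort({ $natural: -1 }).limit(1)'

# Sync checkpoint kept by db_sync in the target database.
mongosh "mongodb://localhost:27019/test_db" --eval 'db.oplog_records.find()'
```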