Timely dataflow is a low-latency cyclic dataflow computational model, introduced in the paper Naiad: a timely dataflow system. This project is an extended and more modular implementation of timely dataflow in Rust.
To use timely dataflow, add the following to the dependencies section of your project's Cargo.toml file:

```toml
[dependencies]
timely = "*"
```
This will bring in the `timely` crate from crates.io, which should allow you to start writing timely dataflow programs like this one (also available in examples/hello.rs):
```rust
extern crate timely;

use timely::construction::*;
use timely::construction::operators::*;

fn main() {
    // initialize and run a timely dataflow computation
    timely::execute(std::env::args(), |root| {

        // create a new input, and inspect its output
        let mut input = root.subcomputation(move |builder| {
            let (input, stream) = builder.new_input();
            stream.inspect(|x| println!("hello {:?}", x));
            input
        });

        // introduce data and watch!
        for round in 0..10 {
            input.give(round);
            input.advance_to(round + 1);
            root.step();
        }

        // seal the input
        input.close();

        // finish off any remaining work
        while root.step() { }
    });
}
```
You can run this example from the root directory of the `timely-dataflow` repository by typing

```
% cargo run --example hello
     Running `target/debug/examples/hello`
hello 0
hello 1
hello 2
hello 3
hello 4
hello 5
hello 6
hello 7
hello 8
hello 9
```
A program like the one above can be run, and will by default use a single worker thread. To use multiple threads in a process, use the `-w` or `--workers` option followed by the number of threads you would like to use.
To use multiple processes, you will need to use the `-h` or `--hostfile` option to specify a text file whose lines are `hostname:port` entries corresponding to the locations you plan on spawning the processes. You will need to use the `-n` or `--processes` argument to indicate how many processes you will spawn (a prefix of the host file), and each process must use the `-p` or `--process` argument to indicate its index out of this number.
Said differently, you want a hostfile that looks like so,

```
% cat hostfile.txt
host0:port
host1:port
host2:port
host3:port
...
```
and then to launch the processes like so (note the `--`, which tells cargo to pass the remaining arguments to the program rather than interpret them itself):

```
host0% cargo run -- -w 2 -h hostfile.txt -n 4 -p 0
host1% cargo run -- -w 2 -h hostfile.txt -n 4 -p 1
host2% cargo run -- -w 2 -h hostfile.txt -n 4 -p 2
host3% cargo run -- -w 2 -h hostfile.txt -n 4 -p 3
```
The number of workers should be the same for each process.
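As a sanity check on the configuration above, the total number of workers is the number of processes times the workers per process. The sketch below (plain Rust, independent of timely) makes that arithmetic concrete; the global indexing scheme shown is only an illustration of one plausible assignment, not a statement about timely's internals:

```rust
// Illustration only: how `-n` (processes) and `-w` (workers per process)
// combine into a total worker count, mirroring the launch commands above
// (4 processes, 2 workers each). The global indexing formula is an
// assumption for illustration, not timely's documented behavior.

fn total_workers(processes: usize, workers_per_process: usize) -> usize {
    processes * workers_per_process
}

fn global_index(process: usize, workers_per_process: usize, local: usize) -> usize {
    process * workers_per_process + local
}

fn main() {
    let (processes, workers) = (4, 2); // -n 4, -w 2
    println!("total workers: {}", total_workers(processes, workers));
    for p in 0..processes {
        for w in 0..workers {
            println!("process {} worker {} -> global {}", p, w, global_index(p, workers, w));
        }
    }
}
```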
Timely dataflow is intended to support multiple levels of abstraction, from the lowest level manual dataflow assembly, to higher level "declarative" abstractions.
There are currently a few options for writing timely dataflow programs. Ideally this set will expand with time, as interested people write their own layers (or build on those of others).
Timely dataflow: Timely dataflow includes several primitive operators, including standard operators like `map`, `filter`, and `concat`. It also includes more exotic operators for tasks like entering and exiting loops (`enter` and `leave`), as well as generic operators whose implementations can be supplied using closures (`unary` and `binary`).
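If you have not used dataflow operators before, their behavior on each batch of records is close to Rust's standard iterator adapters. The following is only an analogy using `std` iterators over finite collections, not timely's streaming API:

```rust
// Iterator analogy for the primitive operators (illustration only; timely's
// operators apply to unbounded streams of records, not finite vectors).
fn main() {
    let a = vec![1, 2, 3, 4];
    let b = vec![10, 20];

    // `map`: transform each record.
    let mapped: Vec<i32> = a.iter().map(|x| x * 2).collect();
    println!("{:?}", mapped); // [2, 4, 6, 8]

    // `filter`: keep only records satisfying a predicate.
    let filtered: Vec<i32> = a.iter().cloned().filter(|x| x % 2 == 0).collect();
    println!("{:?}", filtered); // [2, 4]

    // `concat`: merge the records of two streams (order shown is illustrative).
    let concatenated: Vec<i32> = a.iter().chain(b.iter()).cloned().collect();
    println!("{:?}", concatenated); // [1, 2, 3, 4, 10, 20]
}
```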
Differential dataflow: A higher-level language built on timely dataflow, differential dataflow includes operators like `group_by`, `join`, and `iterate`. Its implementation is fully incrementalized, and the details are pretty cool (if mysterious).
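To give a sense of what `join` computes, here is a sketch over finite collections using only `std` types; differential dataflow performs this kind of keyed matching incrementally over changing collections, so this is an analogy of the operator's output, not its implementation:

```rust
use std::collections::HashMap;

// Illustration only: a relational join matches records from two keyed
// collections, emitting one output per pair of records that share a key.
fn join<K, A, B>(left: &[(K, A)], right: &[(K, B)]) -> Vec<(K, A, B)>
where
    K: std::hash::Hash + Eq + Clone,
    A: Clone,
    B: Clone,
{
    // index the right input by key
    let mut index: HashMap<K, Vec<B>> = HashMap::new();
    for (k, b) in right {
        index.entry(k.clone()).or_default().push(b.clone());
    }
    // for each left record, emit one output per matching right record
    let mut out = Vec::new();
    for (k, a) in left {
        if let Some(bs) = index.get(k) {
            for b in bs {
                out.push((k.clone(), a.clone(), b.clone()));
            }
        }
    }
    out
}

fn main() {
    let names = vec![(1, "a"), (2, "b")];
    let values = vec![(1, 10), (1, 11), (3, 30)];
    // key 1 matches twice, keys 2 and 3 have no partner
    println!("{:?}", join(&names, &values)); // [(1, "a", 10), (1, "a", 11)]
}
```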
There are also a few applications built on timely dataflow, including a streaming worst-case optimal join implementation and a PageRank implementation, both of which should provide helpful examples of writing timely dataflow programs.