Timely dataflow is a low-latency cyclic dataflow computational model, introduced in the paper Naiad: a timely dataflow system. This project is an extended and more modular implementation of timely dataflow in Rust.
To use timely dataflow, add the following to the dependencies section of your project's Cargo.toml file:

```toml
[dependencies]
timely = "*"
```
This will bring in the `timely` crate from crates.io, which should allow you to start writing timely dataflow programs like this one (also available in examples/hello.rs):
```rust
extern crate timely;

use timely::construction::*;
use timely::construction::operators::*;

fn main() {
    // initialize and run a timely dataflow computation
    timely::execute(std::env::args(), |root| {

        // create a new input, and inspect its output
        let mut input = root.subcomputation(move |builder| {
            let (input, stream) = builder.new_input();
            stream.inspect(|x| println!("hello {:?}", x));
            input
        });

        // introduce data and watch!
        for round in 0..10 {
            input.give(round);
            input.advance_to(round + 1);
            root.step();
        }

        // seal the input
        input.close();

        // finish off any remaining work
        while root.step() { }
    });
}
```
You can run this example from the root directory of the `timely-dataflow` repository by typing

```
% cargo run --example hello
     Running `target/debug/examples/hello`
hello 0
hello 1
hello 2
hello 3
hello 4
hello 5
hello 6
hello 7
hello 8
hello 9
```
A program like the one above can be run, and will by default use a single worker thread. To use multiple threads in a process, use the `-w` or `--workers` option followed by the number of threads you would like to use.
To use multiple processes, you will need to use the `-h` or `--hostfile` option to specify a text file whose lines are `hostname:port` entries corresponding to the locations you plan on spawning the processes. You will need to use the `-n` or `--processes` argument to indicate how many processes you will spawn (a prefix of the host file), and each process must use the `-p` or `--process` argument to indicate its index out of this number.
Said differently, you want a hostfile that looks like so,

```
% cat hostfile.txt
host0:port
host1:port
host2:port
host3:port
...
```
and then to launch the processes like so (note the `--`, which tells cargo to pass the remaining arguments to the program rather than interpret them itself):

```
host0% cargo run -- -w 2 -h hostfile.txt -n 4 -p 0
host1% cargo run -- -w 2 -h hostfile.txt -n 4 -p 1
host2% cargo run -- -w 2 -h hostfile.txt -n 4 -p 2
host3% cargo run -- -w 2 -h hostfile.txt -n 4 -p 3
```
The number of workers should be the same for each process.
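As a sanity check on the configuration above, the total number of workers is the number of processes times the workers per process. The sketch below (plain Rust, independent of timely) makes that arithmetic concrete; the global indexing scheme shown is only an illustration of one plausible assignment, not a statement about timely's internals:

```rust
// Illustration only: how `-n` (processes) and `-w` (workers per process)
// combine into a total worker count, mirroring the launch commands above
// (4 processes, 2 workers each). The global indexing formula is an
// assumption for illustration, not timely's documented behavior.

fn total_workers(processes: usize, workers_per_process: usize) -> usize {
    processes * workers_per_process
}

fn global_index(process: usize, workers_per_process: usize, local: usize) -> usize {
    process * workers_per_process + local
}

fn main() {
    let (processes, workers) = (4, 2); // -n 4, -w 2
    println!("total workers: {}", total_workers(processes, workers));
    for p in 0..processes {
        for w in 0..workers {
            println!("process {} worker {} -> global {}", p, w, global_index(p, workers, w));
        }
    }
}
```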
Timely dataflow is intended to support multiple levels of abstraction, from the lowest level manual dataflow assembly, to higher level "declarative" abstractions.
There are currently a few options for writing timely dataflow programs. Ideally this set will expand with time, as interested people write their own layers (or build on those of others).
Timely dataflow: Timely dataflow includes several primitive operators, including standard operators like `map`, `filter`, and `concat`. It also includes more exotic operators for tasks like entering and exiting loops (`enter` and `leave`), as well as generic operators whose implementations can be supplied using closures (`unary` and `binary`).
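If you have not used dataflow operators before, their behavior on each batch of records is close to Rust's standard iterator adapters. The following is only an analogy using `std` iterators over finite collections, not timely's streaming API:

```rust
// Iterator analogy for the primitive operators (illustration only; timely's
// operators apply to unbounded streams of records, not finite vectors).
fn main() {
    let a = vec![1, 2, 3, 4];
    let b = vec![10, 20];

    // `map`: transform each record.
    let mapped: Vec<i32> = a.iter().map(|x| x * 2).collect();
    println!("{:?}", mapped); // [2, 4, 6, 8]

    // `filter`: keep only records satisfying a predicate.
    let filtered: Vec<i32> = a.iter().cloned().filter(|x| x % 2 == 0).collect();
    println!("{:?}", filtered); // [2, 4]

    // `concat`: merge the records of two streams (order shown is illustrative).
    let concatenated: Vec<i32> = a.iter().chain(b.iter()).cloned().collect();
    println!("{:?}", concatenated); // [1, 2, 3, 4, 10, 20]
}
```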
Differential dataflow: A higher-level language built on timely dataflow, differential dataflow includes operators like `group_by`, `join`, and `iterate`. Its implementation is fully incrementalized, and the details are pretty cool (if mysterious).
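To give a sense of what `join` computes, here is a sketch over finite collections using only `std` types; differential dataflow performs this kind of keyed matching incrementally over changing collections, so this is an analogy of the operator's output, not its implementation:

```rust
use std::collections::HashMap;

// Illustration only: a relational join matches records from two keyed
// collections, emitting one output per pair of records that share a key.
fn join<K, A, B>(left: &[(K, A)], right: &[(K, B)]) -> Vec<(K, A, B)>
where
    K: std::hash::Hash + Eq + Clone,
    A: Clone,
    B: Clone,
{
    // index the right input by key
    let mut index: HashMap<K, Vec<B>> = HashMap::new();
    for (k, b) in right {
        index.entry(k.clone()).or_default().push(b.clone());
    }
    // for each left record, emit one output per matching right record
    let mut out = Vec::new();
    for (k, a) in left {
        if let Some(bs) = index.get(k) {
            for b in bs {
                out.push((k.clone(), a.clone(), b.clone()));
            }
        }
    }
    out
}

fn main() {
    let names = vec![(1, "a"), (2, "b")];
    let values = vec![(1, 10), (1, 11), (3, 30)];
    // key 1 matches twice, keys 2 and 3 have no partner
    println!("{:?}", join(&names, &values)); // [(1, "a", 10), (1, "a", 11)]
}
```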
There are also a few applications built on timely dataflow, including a streaming worst-case optimal join implementation and a PageRank implementation, both of which should provide helpful examples of writing timely dataflow programs.