zkmq.

rust zookeeper backed message queue. designed as a low throughput, high durability message queue using only
zookeeper. messages also support filtering: the producer can set filters, and the consumer can choose whether to
respect them.

note: this is still experimental and in development. don't assume anything will work or the api will remain the same


tasks are given a randomly generated id and written into /dir/tasks/<task>. once the task tree is written, a message
is inserted into /dir/queue.
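
a rough sketch of the producer side, assuming the rust zookeeper crate and the uuid crate. the names here (produce,
NoopWatcher, the "inserted" state payload, the "/zkmq" root) are illustrative assumptions, not the crate's actual api:

    use std::time::Duration;
    use uuid::Uuid;
    use zookeeper::{Acl, CreateMode, Watcher, WatchedEvent, ZkResult, ZooKeeper};

    struct NoopWatcher;
    impl Watcher for NoopWatcher {
        fn handle(&self, _e: WatchedEvent) {}
    }

    /// write the task tree first, then enqueue a pointer to it.
    /// assumes /dir/tasks and /dir/queue already exist.
    fn produce(zk: &ZooKeeper, dir: &str, data: Vec<u8>) -> ZkResult<String> {
        let id = Uuid::new_v4().to_string();
        let base = format!("{}/tasks/{}", dir, id);
        let acl = Acl::open_unsafe().clone();

        // 1. the task tree: data (and metadata/filters) live under /dir/tasks/<task>
        zk.create(&base, vec![], acl.clone(), CreateMode::Persistent)?;
        zk.create(&format!("{}/data", base), data, acl.clone(), CreateMode::Persistent)?;
        zk.create(&format!("{}/state", base), b"inserted".to_vec(), acl.clone(), CreateMode::Persistent)?;

        // 2. only once the tree is complete does the task become visible to workers
        zk.create(&format!("{}/queue/{}", dir, id), vec![], acl, CreateMode::Persistent)?;
        Ok(id)
    }

    fn main() -> ZkResult<()> {
        let zk = ZooKeeper::connect("127.0.0.1:2181", Duration::from_secs(15), NoopWatcher)?;
        produce(&zk, "/zkmq", b"hello".to_vec())?;
        Ok(())
    }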

a worker will try to read the next message from /dir/queue. once it has a task, it will try to create a lock at
/dir/tasks/<task>/claim. once the lock is in place, it starts processing the task.
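
the claim step could look roughly like this, again assuming the zookeeper crate. treating the claim node as ephemeral
(so it disappears if the consumer's session dies) is an assumption; claim_next is an illustrative name:

    use zookeeper::{Acl, CreateMode, ZkError, ZooKeeper};

    /// scan the pending queue and try to claim a task; returns the claimed task id, if any.
    fn claim_next(zk: &ZooKeeper, dir: &str, consumer_id: &str) -> Result<Option<String>, ZkError> {
        for task in zk.get_children(&format!("{}/queue", dir), false)? {
            let claim = format!("{}/tasks/{}/claim", dir, task);
            let acl = Acl::open_unsafe().clone();
            // the ephemeral claim node doubles as the lock
            match zk.create(&claim, consumer_id.as_bytes().to_vec(), acl, CreateMode::Ephemeral) {
                Ok(_) => return Ok(Some(task)),       // lock acquired, start processing
                Err(ZkError::NodeExists) => continue, // another consumer got there first
                Err(e) => return Err(e),
            }
        }
        Ok(None) // nothing pending, or everything is already claimed
    }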

a background thread will perform garbage collection at the interval configured in /dir/conf/task_gc_interval


filter example.

task has filter performance_class=standard
node has filter performance_class=high_performance
filter config is OrderedEnum(performance_class)=("low", "standard", "performance", "high_performance")

if filter_method=absolute, only a performance_class=standard node will be chosen
if filter_method=at_least, a performance_class=high_performance node can also be selected
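
a self-contained sketch of how an ordered enum filter might be evaluated. the type and function names are
illustrative, not the crate's actual api:

    /// ordered filter values: later variants satisfy "at_least" requirements for earlier ones.
    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    enum PerformanceClass {
        Low,
        Standard,
        Performance,
        HighPerformance,
    }

    enum FilterMethod {
        Absolute,
        AtLeast,
    }

    /// can a node offering `node_value` take a task that asks for `task_value`?
    fn matches(method: FilterMethod, task_value: PerformanceClass, node_value: PerformanceClass) -> bool {
        match method {
            FilterMethod::Absolute => node_value == task_value,
            FilterMethod::AtLeast => node_value >= task_value,
        }
    }

    fn main() {
        // task asks for standard, node offers high_performance
        let task = PerformanceClass::Standard;
        let node = PerformanceClass::HighPerformance;
        assert!(!matches(FilterMethod::Absolute, task, node)); // absolute: not an exact match
        assert!(matches(FilterMethod::AtLeast, task, node));   // at_least: high_performance >= standard
    }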


tree layout.

/dir
  /queue                Pending Messages
    /<task>
    /<task>
    /<task>

  /tasks
    /<task>
      /filters
        /<filter_name>  => filter value
      /state            => state enum
      /metadata         => json encoded metadata
      /data             => binary data

  /consumers            Active Consumers
    /<id>
      /current          Current Task

  /conf
    /task_gc_lifetime
    /task_gc_interval
  /lock



design.


messages.
messages are stored in two parts. first, the message is assigned a uuid and written to /<dir>/tasks/<id>/data, and
then the metadata & filters are written under /<dir>/tasks/<id>. once all of this is written, a node is inserted into
/<dir>/queue to represent the pending task.

this structure allows for two things.
1. limited tracking of the task state, from inserted, to claimed, to finished (see the sketch below).
2. tasks can be filtered based on consumer preferences.
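
for (1), the state node could be as simple as a small enum written into /<dir>/tasks/<id>/state. the exact encoding
isn't specified, so the payload below is a guess:

    /// lifecycle stored in /<dir>/tasks/<id>/state
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum TaskState {
        Inserted, // task tree written and queued
        Claimed,  // a consumer holds the claim lock
        Finished, // processing done, waiting for garbage collection
    }

    impl TaskState {
        /// byte payload for the znode (the on-wire format is an assumption)
        fn as_bytes(self) -> &'static [u8] {
            match self {
                TaskState::Inserted => b"inserted",
                TaskState::Claimed => b"claimed",
                TaskState::Finished => b"finished",
            }
        }
    }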

the main drawback is that there needs to be some garbage collection to remove completed tasks. this is configured by
the value in /<dir>/conf/task_gc_lifetime. tasks older than this value will be deleted. the polling interval is
governed by /<dir>/conf/task_gc_interval. every consumer will try to perform this garbage collection (unless
explicitly configured not to).

( TTL could be used to replace this on clusters that have TTL enabled and configured )
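
a single gc pass might look roughly like this, assuming the zookeeper crate. gc_pass and delete_tree are illustrative
names, and the age-only eligibility check just mirrors the text above; a real implementation might also check /state:

    use std::time::{Duration, SystemTime, UNIX_EPOCH};
    use zookeeper::{ZkError, ZooKeeper};

    /// delete a node and everything under it (plain delete is not recursive).
    fn delete_tree(zk: &ZooKeeper, path: &str) -> Result<(), ZkError> {
        for child in zk.get_children(path, false)? {
            delete_tree(zk, &format!("{}/{}", path, child))?;
        }
        zk.delete(path, None)
    }

    /// one garbage collection pass: drop task trees older than `lifetime`.
    fn gc_pass(zk: &ZooKeeper, dir: &str, lifetime: Duration) -> Result<(), ZkError> {
        let now_ms = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_millis() as i64;
        for task in zk.get_children(&format!("{}/tasks", dir), false)? {
            let path = format!("{}/tasks/{}", dir, task);
            if let Some(stat) = zk.exists(&path, false)? {
                // stat.ctime is the node creation time in ms since the epoch
                if now_ms - stat.ctime > lifetime.as_millis() as i64 {
                    delete_tree(zk, &path)?;
                }
            }
        }
        Ok(())
    }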


consumers.

the consumers also perform coordinated maintenance tasks, like cleanup and garbage collection. the reason this is
done on the consumer side is that the consumers are assumed to outlive the producers: the consumers will always be
around to read messages, even if none are being produced.
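
the /lock node in the layout above suggests that this coordination happens through a shared lock. a minimal sketch,
assuming an ephemeral znode is used as a coarse lock (the actual protocol isn't specified, and the function name is
illustrative):

    use zookeeper::{Acl, CreateMode, ZkError, ZooKeeper};

    /// run a maintenance closure only if this consumer wins /<dir>/lock.
    fn with_maintenance_lock<F>(zk: &ZooKeeper, dir: &str, work: F) -> Result<bool, ZkError>
    where
        F: FnOnce(&ZooKeeper) -> Result<(), ZkError>,
    {
        let lock = format!("{}/lock", dir);
        // an ephemeral node vanishes if the holder's session dies, so a crashed consumer can't wedge maintenance
        match zk.create(&lock, vec![], Acl::open_unsafe().clone(), CreateMode::Ephemeral) {
            Ok(_) => {
                let result = work(zk);
                zk.delete(&lock, None)?; // release whether or not the work succeeded
                result.map(|_| true)
            }
            Err(ZkError::NodeExists) => Ok(false), // another consumer already holds the lock
            Err(e) => Err(e),
        }
    }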