webrtc-sctp: A pure-Rust userspace SCTP implementation

Work In Progress

This crate is a work in progress. The existing features may be minimally implemented and not in compliance with the SCTP specification, while other mandatory features may not be implemented at all. Unless you are interested in contributing to the stack, you are urged to avoid using this code in either a production or a development environment at this time. (Until congestion control is properly implemented, it may actually be harmful to networks.)

Goal

The goal is to develop a simple pure-Rust user-space SCTP stack that provides the minimal feature set needed to implement WebRTC data channels.

Background

Peer-to-peer networking has been used to good effect over the past 20 years to improve services such as audio/video conferencing, online gaming, and overlay networks, in spite of NAT traversal requiring exotic techniques and reinventing the universe on top of UDP. WebRTC is an IETF standard that bundles together the best-known methods and several pre-existing standards for peer-to-peer networking to provide a common target for interoperability. Non-media data channels between peers are based on the Stream Control Transmission Protocol (SCTP), a "better TCP" originally invented by the telecommunications industry. However, implementing transport protocols is difficult enough that most (all?) implementations of WebRTC data channels re-use the same C library for SCTP. I think it would be interesting and educational to develop an SCTP implementation, and eventually a full WebRTC stack, in pure Rust. This would not only provide a second option for application developers, but also expand the ecosystem of network building blocks implemented in a safe programming language.

My SCTP implementation is based on Tokio and provides an asynchronous API that uses futures-based MPSC command queues to open and close associations, configure streams, and so on. For convenience and ease of testing, a synchronous API wraps the asynchronous one by running the Tokio reactor in its own thread. A configurable lower-layer protocol allows SCTP-over-UDP for testing interoperability with libusrsctp, SCTP over an in-process switching layer for network simulations in integration tests, and eventually SCTP-over-DTLS for WebRTC data channels.

My general development strategy is to avoid premature optimization until the SCTP state machine is complete and functional. In addition to hunting down needless Boxes and clone() calls, there are many design choices that will need to be reassessed. Is Nom really an efficient way to parse packets? Is std::collections::BinaryHeap really the best way to implement an ordered reassembly queue?

The complex nature of transport protocols allows for a great many bugs that the Rust compiler can't save us from, so testing and debugging will likely be a major effort even after functional completion. Additionally, stress testing has revealed at least one race condition in a dependency (futures::sync::mpsc) that will need to be addressed. The implementation has reached the point where short messages can be passed back and forth, but work on some critical functionality (e.g. congestion control, MTU probing) is still in progress.
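To make the reassembly-queue question concrete, here is a minimal sketch of a BinaryHeap-based ordered queue. The `Fragment` and `ReassemblyQueue` types are hypothetical, not this crate's own. Wrapping entries in `std::cmp::Reverse` turns Rust's max-heap into a min-heap keyed by TSN, and fragments are released only as a contiguous run. The sketch compares TSNs as plain `u32` values and so ignores TSN wraparound; serial-number comparison does not provide the total order that BinaryHeap relies on, which is one reason this choice deserves a second look.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical fragment type for illustration only. The derived `Ord`
/// compares `tsn` first, so the heap is effectively keyed by TSN.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Fragment {
    tsn: u32,
    payload: Vec<u8>,
}

/// Ordered reassembly queue: fragments may arrive in any order but are
/// released only as a contiguous run starting at `next_tsn`.
struct ReassemblyQueue {
    heap: BinaryHeap<Reverse<Fragment>>, // `Reverse` turns the max-heap into a min-heap
    next_tsn: u32,
}

impl ReassemblyQueue {
    fn new(initial_tsn: u32) -> Self {
        ReassemblyQueue { heap: BinaryHeap::new(), next_tsn: initial_tsn }
    }

    fn push(&mut self, fragment: Fragment) {
        self.heap.push(Reverse(fragment));
    }

    /// Removes and returns every fragment that continues the in-order sequence.
    /// NOTE: plain `u32` comparison; TSN wraparound is not handled.
    fn pop_ready(&mut self) -> Vec<Fragment> {
        let mut ready = Vec::new();
        loop {
            let top_tsn = match self.heap.peek() {
                Some(Reverse(fragment)) => fragment.tsn,
                None => break,
            };
            if top_tsn < self.next_tsn {
                // Duplicate of an already-delivered TSN: discard it.
                self.heap.pop();
            } else if top_tsn == self.next_tsn {
                if let Some(Reverse(fragment)) = self.heap.pop() {
                    ready.push(fragment);
                }
                self.next_tsn = self.next_tsn.wrapping_add(1);
            } else {
                // Gap: wait for the missing TSN to arrive.
                break;
            }
        }
        ready
    }
}
```

Each push and pop costs O(log n), but every fragment is still moved into the heap and out again, which is the kind of overhead the optimization pass would need to weigh.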

Motivation

- My personal motivation for this work is to become more proficient in Rust by developing a non-trivial software library, and also to gain more exposure to the details of developing network transport layers.
- Most of the world (at least Chrome, Firefox, and the FreeBSD kernel) seems to re-use the same SCTP stack, written in C. While it's great to have such a gold standard, there is value in increasing implementation diversity with an independent library.
- While the current proof-of-concept code is a long way from being efficient or reliable, the constrained feature set could some day, for certain use cases, yield a tighter, more efficient implementation with a smaller footprint and attack surface.
- Peer-to-peer software is a research interest of mine, and I'd love to see some basic building blocks enabling Rust programmers to easily experiment with new ideas using an open standard like WebRTC Data Channels.

Anti-motivation

- Implementing a network transport protocol like SCTP involves a huge number of moving parts that must all work seamlessly together, making this a fairly involved project. If I had realized how much work it would be, I might have settled on some other project.
- Research into network protocols has picked up considerably in recent years, and it's possible that SCTP may some day be superseded by a more modern protocol like QUIC. There is currently talk about QUIC becoming a first-class WebRTC transport, although it seems to lack explicit support for unordered/unreliable delivery, so it may not be a 100% drop-in replacement for SCTP in WebRTC data channels.

Remaining tasks

[ ] Remaining retransmission tasks
- [ ] When a TSN previously acknowledged in a gap ack block is no longer acknowledged (i.e. it was dropped by the peer), start the T3-rtx timer if it is not already running (RFC 4960, Section 6.3.2, rule R4).
- [ ] Update congestion control as needed -- adjust ssthresh according to Section 7.2.3, and set cwnd to the MTU.
[ ] Congestion control
[ ] Periodic heartbeats
[ ] Path MTU probing
[ ] Delayed SACK
[ ] Exceptional conditions:
- [ ] Better error returns
- [ ] Send ABORTs when needed
- [ ] Handle ABORTs
  - [ ] Close association, notify application layer
- [ ] Send/handle ERROR
- [ ] Handle Out-Of-The-Blue (OOTB) packets
[ ] Implement SCTP extensions required for full compliance with the WebRTC data channel standards:
- [ ] RFC 3758: SCTP Partial Reliability Extension
- [ ] RFC 7496: Additional Policies for the Partially Reliable SCTP Extension (in particular, the limited retransmission policy)
- [ ] RFC 6525: SCTP Stream Reconfiguration (for closing channels)
- [ ] RFC 5061: Dynamic Address Reconfiguration (partial -- only used to signal support for the stream reset extension)
- [ ] RFC 4820: Padding Chunk and Parameter for SCTP (for Path MTU probing)
- [ ] RFC 8260: Stream Schedulers and User Message Interleaving for SCTP (This is a "SHOULD")
[ ] Upgrade to latest dependencies
- [ ] Tokio-core is deprecated and should be replaced with tokio.
  - [ ] Think about the proper way of handling the SctpStack lifecycle. The impetus is partially due to the new default tokio::run() reactor behavior of terminating when all tasks are complete instead of when the main future completes. But we also need to consider how SctpStack works as a component within larger network machinery. (Since we don't spawn any tasks at present, maybe there's nothing to do?)
- [ ] Tokio-timer should be upgraded. (Use Tokio's embedded tokio::timer instead of upgrading the crate dependency directly.)
  - [ ] Update the retransmission code's CLOCK_GRANULARITY_NS to reflect the finer clock granularity.
[ ] Test/simulation tasks
- [ ] Test lots of associate+shutdown steps
- [ ] Test dropped packets & out-of-order (not random delay) packets
- [ ] Test large streams in ideal conditions
- [ ] Test large streams in hostile conditions (drop, delay, OoO, etc.)
- [ ] Devise a means of testing congestion control.
- [ ] Test Path MTU probing.
[ ] Revisit design decisions
- [ ] Is nom the best way to parse packets? Should we even be trying to parse packets at all, rather than just using the data fields in-place (converting from big endian as needed)?
- [ ] Reconsider the buffer management strategies. Are there ways of reducing copies?
[ ] Optimizations
- [ ] Consider alternatives for cases where Box and .clone() are used.
- [ ] Remove (or document) unwrap()'s.
- [ ] Implement benchmarks to find more opportunities for optimization.
[ ] Evaluate additional features for possible implementation
- [ ] Partial Delivery API
[ ] A considerable amount of additional testing, debugging, and optimization. In particular, we are looking for:
- [ ] Discovery and elimination of potential deadlock conditions
  - [ ] Could this happen when we receive a message that is larger than the recv buffer size?
- [ ] Correctness with regard to the specifications. This will require quite a bit of code auditing.
- [ ] Discovery and removal of bugs.
- [ ] Efficiency.
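Several of the retransmission and congestion-control items above revolve around RFC 4960's cwnd/ssthresh rules. A minimal sketch of the initial window (Section 7.2.1) and the two loss responses from Section 7.2.3 might look like this. The `CongestionControl` struct and its method names are hypothetical, and slow-start growth, congestion avoidance, and `partial_bytes_acked` tracking are deliberately left out.

```rust
/// Hypothetical per-association congestion state (all values in bytes).
/// Covers only the initial cwnd (RFC 4960, Section 7.2.1) and the two loss
/// responses in Section 7.2.3; window growth is not shown.
struct CongestionControl {
    cwnd: usize,
    ssthresh: usize,
    mtu: usize,
}

impl CongestionControl {
    /// `peer_rwnd` (the receiver's advertised window) is used as the initial
    /// ssthresh, which Section 7.2.1 permits.
    fn new(mtu: usize, peer_rwnd: usize) -> Self {
        // Initial cwnd = min(4*MTU, max(2*MTU, 4380 bytes)).
        let cwnd = (4 * mtu).min((2 * mtu).max(4380));
        CongestionControl { cwnd, ssthresh: peer_rwnd, mtu }
    }

    /// Loss detected through SACK gap reports (fast retransmit):
    /// ssthresh = max(cwnd/2, 4*MTU), cwnd = ssthresh.
    fn on_fast_retransmit(&mut self) {
        self.ssthresh = (self.cwnd / 2).max(4 * self.mtu);
        self.cwnd = self.ssthresh;
    }

    /// T3-rtx timer expiry: ssthresh = max(cwnd/2, 4*MTU), cwnd = 1*MTU,
    /// so the sender falls back to slow start.
    fn on_t3_rtx_timeout(&mut self) {
        self.ssthresh = (self.cwnd / 2).max(4 * self.mtu);
        self.cwnd = self.mtu;
    }
}
```

With a 1200-byte MTU, the initial cwnd is min(4800, max(2400, 4380)) = 4380 bytes, and a T3-rtx timeout from that point leaves ssthresh at 4800 and cwnd at 1200.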

Living with Path MTU

A classic problem we encounter when developing network transports is deciding how large a packet we can send to a peer without triggering IP fragmentation, which is best avoided. Each network link may have a different maximum transmission unit (MTU), and discovering the Path MTU (the lowest MTU of all the links between us and our peer) can be quite an involved task.

Traditionally, Path MTU is determined by setting the don't fragment (DF) bit on transmitted IP packets, and listening for ICMP messages indicating that the packets are too large. This isn't suitable for our SCTP stack for the following reasons:

- Even in the conventional case of SCTP-over-IP (or TCP-over-IP), hostile networks can prevent these ICMP messages from being generated or routed. This is known as the Path MTU Black Hole problem.
- Since our SCTP is expected to live further up the stack (e.g. SCTP-over-DTLS-over-UDP-over-IP in the WebRTC data channel case), implementing ICMP-based Path MTU discovery at the SCTP layer becomes problematic.
- Listening for ICMP messages associated with a UDP socket is a platform-specific problem. Linux provides a facility for this (IP_MTU_DISCOVER), but it's not clear how much work would be required (or even whether it's possible) to support this in a cross-platform fashion. (Mio does not currently provide a cross-platform means of setting DF.)
- The WebRTC Data Channel specification explicitly gives up on an ICMP method in favor of probing:

Incoming ICMP or ICMPv6 messages can't be processed by the SCTP layer, since there is no way to identify the corresponding association. Therefore SCTP MUST support performing Path MTU discovery without relying on ICMP or ICMPv6 as specified in [RFC4821] using probing messages specified in [RFC4820]. The initial Path MTU at the IP layer SHOULD NOT exceed 1200 bytes for IPv4 and 1280 for IPv6.
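To see why the layering matters, consider how much user data fits in one packet once every layer has taken its share of the 1200-byte (IPv4) or 1280-byte (IPv6) initial IP-layer MTU. The arithmetic below is a sketch under stated assumptions: no IP options or IPv6 extension headers, DTLS 1.2 with an AES-GCM cipher suite (13-byte record header, 8-byte explicit nonce, 16-byte authentication tag), and exactly one DATA chunk per packet. Other cipher suites or DTLS versions change the numbers, and the function name is hypothetical.

```rust
// Per-layer overheads in bytes (assumptions noted on each line).
const IPV4_HEADER: usize = 20; // no IP options
const IPV6_HEADER: usize = 40; // no extension headers
const UDP_HEADER: usize = 8;
const DTLS_OVERHEAD: usize = 13 + 8 + 16; // DTLS 1.2 record header + AES-GCM nonce + tag (assumed)
const SCTP_COMMON_HEADER: usize = 12; // RFC 4960, Section 3.1
const DATA_CHUNK_HEADER: usize = 16; // RFC 4960, Section 3.3.1

/// Largest user payload that fits in a single DATA chunk for a given
/// IP-layer MTU, or None if the MTU cannot even hold the headers.
fn max_single_chunk_payload(ip_mtu: usize, ipv6: bool) -> Option<usize> {
    let ip_header = if ipv6 { IPV6_HEADER } else { IPV4_HEADER };
    // Bytes left for SCTP chunks after all lower-layer and SCTP common headers.
    let chunk_space =
        ip_mtu.checked_sub(ip_header + UDP_HEADER + DTLS_OVERHEAD + SCTP_COMMON_HEADER)?;
    // Chunks are padded to a multiple of 4 bytes (RFC 4960, Section 3.2),
    // so round the available space down before subtracting the chunk header.
    let padded = chunk_space - chunk_space % 4;
    padded.checked_sub(DATA_CHUNK_HEADER)
}
```

Under these assumptions, a single DATA chunk can carry at most 1104 bytes of user data over IPv4 (1200-byte MTU) and 1164 bytes over IPv6 (1280-byte MTU).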

We should implement RFC 4820/RFC 4821 Path MTU probing. For reference, the libusrsctp approach is to start with the local network interface's MTU and step up or down through a fixed list of 18 common MTUs. (We'd start with 1200 or 1280 to comply with WebRTC, of course.)
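A table-driven prober in that spirit could look like the following hypothetical sketch. The five-entry `CANDIDATE_MTUS` table is an illustrative assumption, not libusrsctp's actual list. How probes are built and acknowledged (for example, a packet padded with RFC 4820's PAD chunk) is left out, as are lower-layer overheads and re-probe timers. The prober starts at the WebRTC initial values of 1200 bytes (IPv4) or 1280 bytes (IPv6), steps up when a probe is acknowledged, stops probing after a lost probe, and steps down when full-size packets keep disappearing.

```rust
/// Candidate path MTUs in bytes, ascending. Illustrative assumption only;
/// this is NOT libusrsctp's list of 18 values.
const CANDIDATE_MTUS: [usize; 5] = [1200, 1280, 1400, 1472, 1500];

/// Hypothetical table-driven prober in the spirit of RFC 4821 (PLPMTUD).
struct PmtuProber {
    current: usize, // index of the largest MTU confirmed so far
    floor: usize,   // index we never step below (the WebRTC initial value)
    ceiling: usize, // highest index still worth probing
}

impl PmtuProber {
    /// Start at 1200 bytes for IPv4 or 1280 bytes for IPv6.
    fn new(ipv6: bool) -> Self {
        let start = if ipv6 { 1 } else { 0 };
        PmtuProber { current: start, floor: start, ceiling: CANDIDATE_MTUS.len() - 1 }
    }

    /// The MTU currently used for full-size packets.
    fn mtu(&self) -> usize {
        CANDIDATE_MTUS[self.current]
    }

    /// Size of the next probe (the next larger candidate), if any remain.
    fn probe_size(&self) -> Option<usize> {
        if self.current < self.ceiling {
            Some(CANDIDATE_MTUS[self.current + 1])
        } else {
            None
        }
    }

    /// A probe of `size` bytes was acknowledged: adopt it.
    fn on_probe_acked(&mut self, size: usize) {
        if self.probe_size() == Some(size) {
            self.current += 1;
        }
    }

    /// A probe of `size` bytes was lost: stop probing at or above that size
    /// (a real implementation would re-probe later).
    fn on_probe_lost(&mut self, size: usize) {
        if self.probe_size() == Some(size) {
            self.ceiling = self.current;
        }
    }

    /// Repeated loss of full-size packets suggests a black hole: step down.
    fn on_black_hole(&mut self) {
        if self.current > self.floor {
            self.current -= 1;
        }
        self.ceiling = self.current;
    }
}
```

Because a lost probe only caps the ceiling instead of shrinking the MTU, probe loss is not treated as a sign that the current MTU is wrong, which is in keeping with RFC 4821's separation of probe failures from ordinary loss.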

Shortcomings

The first version of this library will have a number of notable shortcomings.

SCTP shortcomings:

- We do not implement any features related to multi-homing, as multi-homing is not required for WebRTC data channels.
- Limited configurability from the application layer.
- No Partial Delivery API.
- We do not currently follow any specific API guidelines (e.g. the Sockets API or the SCTP "Interface with Upper Layer" API from RFC 4960).

Performance sacrifices:

- Lots of clones and moves. While we make use of a simple reference-counted shared buffer scheme for payloads, the end-to-end data path needs to be audited for needless copies.
- We stick to the collection types available in the Rust standard library, even where custom algorithms could perform better. (And some of the standard collections/algorithms we use may be poor choices that need to be revisited.)
- In the UDP lower-layer protocol, sending and receiving datagrams with the standard sendto() and recvfrom() system calls means a transition into and out of kernel mode for each and every packet, which could add considerable overhead to high-volume streams. Linux supports the sendmmsg() and recvmmsg() system calls to send and receive multiple datagrams at once, and other operating systems may have a similar feature. However, Mio does not currently expose such a feature.
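On the copy-reduction side, one option raised in the task list is to read header fields directly from the received buffer instead of parsing them into owned structures. The sketch below shows a borrowed view over the 12-byte SCTP common header (RFC 4960, Section 3.1) that decodes big-endian fields only when they are accessed. The `CommonHeader` type is hypothetical, not part of this crate, and checksum (CRC32c) validation is left out.

```rust
/// Borrowed, zero-copy view of the SCTP common header (RFC 4960, Section 3.1).
/// Hypothetical type for illustration; fields are decoded from network
/// (big-endian) byte order only when accessed.
struct CommonHeader<'a> {
    bytes: &'a [u8],
}

impl<'a> CommonHeader<'a> {
    const LEN: usize = 12;

    /// Returns None if the packet is too short to contain a common header.
    fn new(packet: &'a [u8]) -> Option<Self> {
        if packet.len() < Self::LEN {
            return None;
        }
        Some(CommonHeader { bytes: &packet[..Self::LEN] })
    }

    fn source_port(&self) -> u16 {
        u16::from_be_bytes([self.bytes[0], self.bytes[1]])
    }

    fn destination_port(&self) -> u16 {
        u16::from_be_bytes([self.bytes[2], self.bytes[3]])
    }

    fn verification_tag(&self) -> u32 {
        let b = self.bytes;
        u32::from_be_bytes([b[4], b[5], b[6], b[7]])
    }
}
```

Compared with parsing into an owned struct, a view like this avoids copying, at the cost of a lifetime tied to the receive buffer.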

Reference

WebRTC:

* WebRTC Data Channels (draft-ietf-rtcweb-data-channel-13)

Major stack components:

* RFC 4960: Stream Control Transmission Protocol
* RFC 4347: Datagram Transport Layer Security Version 1.0
* RFC 6347: Datagram Transport Layer Security Version 1.2
* RFC 8261: Datagram Transport Layer Security (DTLS) Encapsulation of SCTP Packets
* RFC 5245: Interactive Connectivity Establishment (ICE)

Associated standards:

* WebRTC Data Channel Establishment Protocol (draft-ietf-rtcweb-data-protocol-09)
* RFC 5764: DTLS Extension to Establish Keys for SRTP

SCTP extensions:

* RFC 3758: SCTP Partial Reliability Extension
* RFC 7496: Additional Policies for the Partially Reliable SCTP Extension (in particular, the limited retransmission policy)
* RFC 6525: SCTP Stream Reconfiguration (for closing channels)
* RFC 5061: Dynamic Address Reconfiguration (partial -- only used to signal support for the stream reset extension)
* RFC 4820: Padding Chunk and Parameter for SCTP and RFC 4821: Packetization Layer Path MTU Discovery (for Path MTU probing)
* RFC 8260: Stream Schedulers and User Message Interleaving for SCTP ("SHOULD")

License

This crate is distributed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-MIT and LICENSE-APACHE for details.

Contributing

Unless you explicitly state otherwise, any contribution you intentionally submit for inclusion in the work, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.