A pure Rust baseball data aggregation and analytics library. It supports data aggregation from a number of sources, including the MLB Stats API and MLB Gameday files. Eventually, other sources such as RetroSheet and NCAA will be added.
BOSS is designed from the ground up to be extremely efficient. The challenge with baseball data isn't the computational complexity of data gathering, it is the sheer size of the data set, so one of BOSS' primary design goals is to be as efficient as possible. All text fields that can be converted to an enum have been carefully mapped.
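To illustrate why mapping text fields to enums helps with a large data set, here is a minimal sketch using only the standard library. The `BatSide` type, its variants, and the `"L"`/`"R"`/`"S"` codes are assumptions made for illustration; they are not taken from the BOSS source.

```rust
use std::mem::size_of;
use std::str::FromStr;

/// Hypothetical enum for a batter's stance. The variant names are
/// illustrative and are not taken from BOSS itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatSide {
    Left,
    Right,
    Switch,
}

impl FromStr for BatSide {
    type Err = String;

    /// Maps an assumed one-letter text code to its enum variant and
    /// rejects anything else instead of silently storing bad data.
    fn from_str(code: &str) -> Result<Self, Self::Err> {
        match code {
            "L" => Ok(BatSide::Left),
            "R" => Ok(BatSide::Right),
            "S" => Ok(BatSide::Switch),
            other => Err(format!("unknown bat side code: {other}")),
        }
    }
}

/// Bytes used per value: (the enum, an owned `String` handle).
/// The `String` figure excludes the heap allocation holding the text.
pub fn footprint() -> (usize, usize) {
    (size_of::<BatSide>(), size_of::<String>())
}
```

Calling `"L".parse::<BatSide>()` yields `Ok(BatSide::Left)`, and the enum occupies a single byte per value, whereas an owned `String` needs a pointer, a length, and a capacity before counting the text itself. Across millions of rows that difference adds up.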
TODO
```toml
[dependencies]
boss = "0.1"
```
baseballr by Bill Petti
pitchrx by Carson Sievert
Non-scientific benchmarks show the Rust version performs about 4X as fast as the R version, though this is difficult to measure precisely since the vast majority of the time is spent waiting on the network. Typical CPU usage is negligible using BOSS (peaked at less than 1% on my PC), though this may vary depending on your hardware.
Building a baseball data engine in Rust will enable everyday fans to perform data-intensive workloads, as well as efficient data gathering. Ambitiously, the aim is a baseball data platform that, from an analytics perspective, will rival what MLB clubs have internally. Clearly, MLB clubs will have access to more, and likely better, data.
This project is also a learning project for the author and should change a lot as the author hones his Rust skills.
BOSS relies on three crates for the bulk of its workload.
* Isahc handles all of the network requests to grab the JSON files. Isahc's powerful Futures support allowed for easy construction of asynchronous requests.
* serde_json, combined with Serde, handles all the JSON parsing through declarative deserialization. We simply tell serde_json what structure to expect, point it to a file and the rest is handled magically.
* Rayon is used to add parallelism. At some point, I'm hoping this evolves into Async Parallel Generators (or something like that) where Rayon is aware of all the yield points in any of its iterations so it can bounce around as needed.
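
The parallel step can be sketched without any external crates. The example below is a stand-in, not BOSS's actual code: it uses scoped threads from the standard library in place of Rayon, and `count_pitches` is a hypothetical placeholder for the real work of deserializing one game's payload.

```rust
use std::thread;

/// Hypothetical stand-in for processing one game's payload. Here it only
/// counts non-empty lines, pretending each line is one pitch record.
pub fn count_pitches(payload: &str) -> usize {
    payload
        .lines()
        .filter(|line| !line.trim().is_empty())
        .count()
}

/// Splits the payloads into roughly equal chunks, processes each chunk on
/// its own scoped thread, and returns the results in input order.
pub fn count_all(payloads: &[&str], workers: usize) -> Vec<usize> {
    if payloads.is_empty() {
        return Vec::new();
    }
    // Treat zero workers as one, then round the chunk size up so every
    // payload is covered.
    let workers = workers.max(1);
    let chunk_len = (payloads.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn one thread per chunk. Scoped threads may borrow `payloads`.
        let handles: Vec<_> = payloads
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&payload| count_pitches(payload))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With Rayon, the manual chunking and joining would typically collapse into a parallel iterator such as `payloads.par_iter().map(...).collect()`, with the work-stealing scheduler balancing the load.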