Biosphere

Simple, fast random forests.

Random forests with a runtime of O(n d log(n) + n_estimators d n max_depth) instead of O(n_estimators mtry n log(n) max_depth).

biosphere is available as a rust crate and as a Python package.

Benchmarks

Ran on an M1 Pro with n_jobs=4. Wall-time to fit a Random Forest including OOB score with 400 trees to the NYC Taxi dataset, minimum over 10 runs. After feature engineering, the dataset consists of 5 numerical and 7 one-hot encoded features.

| model | 1000 | 2000 | 4000 | 8000 | 16000 | 32000 | 64000 | 128000 | 256000 | 512000 | 1024000 | 2048000 | |:-------------|:-------|:-------|:-------|:-------|:--------|:--------|:--------|:---------|:---------|:---------|:----------|:----------| | biosphere | 0.04s | 0.08s | 0.15s | 0.32s | 0.65s | 1.40s | 2.97s | 6.48s | 15.53s | 37.91s | 96.69s | 231.82s | | scikit-learn | 0.28s | 0.34s | 0.46s | 0.69s | 1.23s | 2.47s | 4.99s | 10.49s | 22.11s | 51.04s | 118.95s | 271.03s |