FOSSlim stands for Free Open Source Software LIcense Matcher and it matches the text of the OSS license with SPDX id, but user can easily change & update training data with additional EULAs and license text;
It is designed to be modular and to provide many low-level high-speed utilities which libraries written in high-level languages like Ruby & Javascript could benefit; Which means you could take advantage of various models implemented here, but they alone are not enough to provide a response with high-confidence. This task is left for the RubyGem & NPM packages, which are cleaning up a raw-text and combining results from multiple models to increase the confidence of the match result;
It is still under active development, but it will be released as
... TBD = release time unknown: priority depends on interests from community 4. NodeJS library with example AWS lambda function, TBD 5. Rust Microservice, TBD 6. commandline tool to scan files, TBD
... in near future * TF/IDF models with Cosine similarity * Okapi25 model * Winnowing model * Simple probabilistic ML models ~ Naive Bayes, HMM, ...?
```rust use fosslim::index; use fosslim::document::Document; use fosslim::naivetf; // Simple wordbag model with Jaccard similarity ... let idxfilepath = "data/index.msgpack"; // it is pre-built index from SPDX data, includes ~300 licenses let mittxt = r#" Permission is hereby granted, free of charge, to any person obtaining a copy of this software \ and associated documentation files (the "Software"), to deal in the Software without restriction,\ including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,\ and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,\ subject to the following conditions:\ "#;
let doc1 = Document::new(0, "mit".tostring(), mittxt.to_string());
// matching document with SPDX label if let Ok(idx) = index::load(idxfilepath) { let mdl = naivetf::fromindex(&idx);
mdl::match_document(&doc1);
} ... ```
check tests
folder for more usage examples;
And yes, you can build your own index with index::build_from_path()
function; you just have to use same file structure
the JSON files in the data/licenses
folder;
here are some of alternatives you could use already now: