WhichLicense detection

This is a library to facilitate the detection of licenses in source code.

Usage

License Detection

Gaoya detection

```rust
let mut gaoya = GaoyaDetection {
    index: MinHashIndex::new(42, 3, 0.5),
    min_hasher: MinHasher32::new(42 * 3),
    shingle_text_size: 50,
};
gaoya.load_from_file("licenses.json");
// OR:
// for l in load_licenses_from_folder("./licenses/RAW") {
//     gaoya.add_plain(l.name, l.text);
// }
```
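MinHash indexes estimate the Jaccard similarity between sets of text shingles. As a rough illustration of what is being approximated, here is a minimal sketch using only the standard library. The helper names `shingles` and `jaccard` are hypothetical and not part of this library or of gaoya, and treating the shingle size as a character count is an assumption.

```rust
use std::collections::HashSet;

/// Splits `text` into overlapping character shingles of length `size`.
/// A non-empty text no longer than `size` yields one shingle holding the whole text.
/// (Hypothetical helper; character-based shingling is an assumption.)
fn shingles(text: &str, size: usize) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if size == 0 || chars.is_empty() {
        return HashSet::new();
    }
    if chars.len() <= size {
        return HashSet::from([chars.iter().collect::<String>()]);
    }
    chars.windows(size).map(|w| w.iter().collect()).collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B| between two shingle sets.
/// MinHash estimates this value without comparing the full sets.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}
```

Calling `jaccard(&shingles(a, 50), &shingles(b, 50))` on two license texts gives the exact score that a MinHash signature approximates. An index threshold such as the `0.5` above would then presumably act as a cutoff on that estimate.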

Fuzzyhash-rs Detection

```rust
let mut fuzzy = FuzzyDetection {
    licenses: vec![],
    min_confidence: 50,
    exit_on_exact_match: false,
};
fuzzy.load_from_file("licenses.json");
// OR:
// for l in load_licenses_from_folder("./licenses/RAW") {
//     fuzzy.add_plain(l.name, l.text);
// }
```

Pipeline System

The pipeline system was developed to automatically improve license detection results by allowing further processing when, for example, a confidence value is too low.
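To make the idea concrete, here is a minimal sketch of how a single pipeline stage could combine a trigger condition with an action. All types and functions in it (`TriggerCondition`, `ActionType`, `Trigger`, `Action`, `run_stage`) are hypothetical stand-ins written for illustration. The `LessThan` and `Subtract` variants are assumptions, and the library's real types and semantics may differ.

```rust
/// Hypothetical trigger conditions; `LessThan` is an assumed variant.
#[derive(Clone, Copy)]
enum TriggerCondition {
    Always,
    GreaterThan,
    LessThan,
}

/// Hypothetical action kinds; `Subtract` is an assumed variant.
#[derive(Clone, Copy)]
enum ActionType {
    Add,
    Subtract,
}

struct Trigger {
    condition: TriggerCondition,
    value: i32,
}

struct Action {
    action: ActionType,
    value: i32,
}

/// Decides whether the stage should run for the incoming confidence.
fn trigger_fires(trigger: &Trigger, confidence: i32) -> bool {
    match trigger.condition {
        TriggerCondition::Always => true, // `value` is ignored here
        TriggerCondition::GreaterThan => confidence > trigger.value,
        TriggerCondition::LessThan => confidence < trigger.value,
    }
}

/// Adjusts the confidence only when the trigger fires and the stage's
/// own check (`matched`, e.g. a regex match) succeeded.
fn run_stage(trigger: &Trigger, matched: bool, action: &Action, confidence: i32) -> i32 {
    if !(trigger_fires(trigger, confidence) && matched) {
        return confidence;
    }
    match action.action {
        ActionType::Add => confidence + action.value,
        ActionType::Subtract => confidence - action.value,
    }
}
```

In this sketch a stage that does not trigger, or whose check fails, passes the confidence through unchanged, so stages can be chained one after another.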

Diffing pipeline

The diffing pipeline works by taking only the modified parts of the license and putting them in a new string. This string is then checked against the provided regex to see whether the changes match it.

```rust
let regex_pipeline = DiffingPipeLine {
    regex: String::from(r"\d{4}-\d{2}-\d{2}"), // date finding regex
    original_license: String::from("this is a sample license created on [enterlicensecreationdatehere] copyright Some Company"),
    modified_license: String::from("this is a sample license created on 2014-01-01 copyright Some Company. and stuff"),
    run_condition: PipelineTriggerInstruction {
        // adjust this to the trigger condition you want
        condition: PipelineTriggerCondition::Always,
        // does not matter on always
        value: 10,
    },
    action: PipeLineAction {
        // what is the action you want to take when the regex matches?
        action: PipelineActionType::Add,
        value: 5,
    },
};

let result = regex_pipeline.run(10);
assert_eq!(result, 15);
```
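The "modified parts only" step can be approximated with a very small standard-library sketch. This is not the library's implementation: `changed_parts` uses a naive word-level set difference instead of a real diff algorithm, and `contains_iso_date` is a hand-written, ASCII-only stand-in for the `\d{4}-\d{2}-\d{2}` regex. Both helper names are hypothetical.

```rust
use std::collections::HashSet;

/// Collects the words of `modified` that never occur in `original`,
/// joined by single spaces. A naive stand-in for a real diff.
fn changed_parts(original: &str, modified: &str) -> String {
    let known: HashSet<&str> = original.split_whitespace().collect();
    modified
        .split_whitespace()
        .filter(|word| !known.contains(word))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns true if `text` contains an ASCII substring shaped like
/// YYYY-MM-DD (four digits, dash, two digits, dash, two digits).
fn contains_iso_date(text: &str) -> bool {
    text.as_bytes().windows(10).any(|window| {
        window.iter().enumerate().all(|(i, byte)| match i {
            4 | 7 => *byte == b'-',
            _ => byte.is_ascii_digit(),
        })
    })
}
```

Applied to the two license strings from the example above, `changed_parts` yields `2014-01-01 Company. and stuff`, and `contains_iso_date` finds the date in it. Matching against only the changed text keeps a date that already appears in the original license from counting as a match.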

Regex pipeline

The regex pipeline works by taking the entire (incoming) license text and checking whether it matches the provided regex.

```rust
let regex_pipeline = RegexPipeLine {
    regex: String::from("some text"),
    license_text: String::from("this is a sample license with some text"),
    run_condition: PipelineTriggerInstruction {
        condition: PipelineTriggerCondition::GreaterThan,
        value: 50,
    },
    action: PipeLineAction {
        action: PipelineActionType::Add,
        value: 5,
    },
};

let result = regex_pipeline.run(95);
assert_eq!(result, 100);
```

Attributions

ScanCode License data

The initial database was generated by making use of the license data from the ScanCode toolkit. You do not need to make use of this copyright notice in your project if you choose not to use the ScanCode license database. However, if you do make use of the ScanCode license database, you must include this copyright notice in your project.

Copyright (c) nexB Inc. and others. All rights reserved. ScanCode is a trademark of nexB Inc. SPDX-License-Identifier: CC-BY-4.0 See https://creativecommons.org/licenses/by/4.0/legalcode for the license text. See https://github.com/nexB/scancode-toolkit for support or download. See https://aboutcode.org for more information about nexB OSS projects.