This is the CLI application that consumes the Core library.
Full API documentation can be found on docs.rs.
This application takes a CSV file and parses text records to detect and extract drug mentions using string similarity algorithms to account for common misspellings.
In general, we expect users to know which string similarity algorithm they want and the limits on its interpretations. For more resources on string similarity algorithms see the main ToolBox page. When in doubt, the defaults in the interactive
command are quite reasonable.
If you are wondering about specific use cases, check out the Workflows section below!
Cargo is available as a part of the Rust toolchain and is readily available via curl + sh combo (see here).
To install the drug-extraction-cli application, simply:
bash
cargo install drug-extraction-cli
This will install an executable called extract-drugs
.
*IMPORTANT*: The cli package is
drug-extraction-cli
but the binary program is renamed toextract-drugs
for more intuitive commands.
This application has three primary commands: interactive
, simple-search
, and drug-search
. Both of the search commands share similar flags/options while the interactive command guides users through selecting options.
In any of the commands, you can set
max-edits
to 0 orthreshold
to 1.0 to return only exact matches 🙂
This will present you with a series of prompts to help you select correct options. Highly recommended for new users or one-off runs.
Usage:
bash
extract-drugs interactive
This helps users select the correct options by, for example, only providing an edit distance limit when an edit-distance metric is selected.
There is also a series of nice select prompts to select the desired id-column
and target-column
:
It also provides options for the desired output format.
The output file will always be named
extracted_drugs
and will be suffixed by either.csv
or.jsonl
depending on your selected format.
Example:
Output:
As you can see we even provide some useful comments on the data collected.
Simple Search works great if you have only a few drugs you are interested in and you want to search for them exclusively. As noted above, these search variants are better if you are more familiar with Linux/shell commands and more comfortable passing flags and/or using this for automation purposes or running on multiple files and combing results. For an example of the latter, see Workflows
BONUS: You can use this to search for MORE than just drugs. For example, we frequently use it to search for COVID-19 (and variants), Narcan/Naloxone, and other key words in the healthcare industry.
You should pass in your search-words separated by a "|" symbol.
The usage here is a bit more complicated:
bash
extract-drugs simple-search \
cli/data/records.csv \
--algorithm "l" \
--max-edits 1 \
--id-column "Case Number" \
--target-column "Primary Cause" \
--search-words "coacine|heroin|Fentanil" \
--format csv \
--analyze
Remember, you can always use --help
after any command to get help and more information.
Help:
For more direct access to drugs of interest, we provide interactivity with the popular RxNorm resource. We utilize RxClass in order to get a group of drugs as opposed to a single/few drugs (for which simple-search
should be used).
We specifically use this request, so if you need to know exactly what we are looking for you will find it there in the RxClass documentation.
This command requires knowledge of two(2) key data points regarding your target RxClass:
We can find these by using the NIH/NLM's RxNav explorer. This page contains all of the information that we will need. Below is a screenshot demonstrating usage of the navigator to find the correct parameters to pass to this drug-extraction tool.
A full list of RelaSources can be found here although I recommended just sticking to either ATC
or MESH
and then using the RxNav explorer to find the target Class ID.
The usage here is very similar to simple-search
but replacing search-words
with RxClass information.
bash
extract-drugs drug-search cli/data/records.csv \
--algorithm "d" \
--max-edits 2 \
--target-column "Primary Cause" \
--rx-class-id N02A \
--rx-class-relasource ATC \
--format jsonl \
--analyze
Remember, you can always use --help
after any command to get help and more information.
Help:
To get more help use the CLI:
bash
extract-drugs --help
We see two primary workflows for this tool. I do not considering one-offs as need a repeated "workflow" and that is why they are suggested to use the interactive
command 😃.
However, sometimes we want more. Maybe we want to run the tool using drug-search
on an RxClass and then run it again using simple-search
and combine the results. Or maybe we want to run the exact same command twice just switching the --target-column
to another text field.
A (very) general workflow:
mermaid
flowchart LR
title[General Data Flow]
A(Get your data) --> B(Preprocess your data)
B --> C2(Run other tools)
C2 --> D
B --> C(Run Drug-Extraction ToolBox)
B --> D(Analyze your data/output)
C --> D
D --> E(Report results)
We provide some convenience scripts on the main Toolbox page for a few key workflows we see. These are written in different languages (Python) but don't have any other dependencies besides the languages themselves and thus should run smoothly. Note that the tools is invoked from Python via the subprocess
module and the data is then manipulated inside Python. Python was chosen due to my personal profeciency with it and its commonality as a data science tool.
--id-column
to the tool.simple-search
so be sure to adapt it to drug-search
by switching both the command options and the column you access in the csv filedrug-search
then simple-search
and combining resultsIf you encounter any issues or need support please either contact @nanthony007 or open an issue.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
git checkout -b feature/AmazingFeature
)git commit -m 'Add some AmazingFeature'
)git push origin feature/AmazingFeature
)See CONTRIBUTING.md for more details.