
# finde-rs

finde-rs is a CLI tool written in Rust that indexes a 'directory' with a multi-threaded crawler. It is designed to be generic: currently only a filesystem implementation exists, but in the future anything that can be decomposed into 'directories' (items to be crawled) and 'files' (contents of 'directories') can be added by implementing the Resource trait:

```rust

pub enum Response<T> {
    DirFileResponse { dirs: Vec<T>, files: Vec<T> },
}

trait Resource<T>: Send + Sync {
    /// Return the directories and leaves for this resource.
    fn get_dirs_and_leaves(&self, path: &T) -> Response<T>;

    /// Get the path representation of the resource.
    fn get_path(&self) -> Result<T>;
}

```

A filesystem implementation of this trait exists in file_resource.rs and is used for input patterns starting with '/'.
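As an illustration, a minimal filesystem-backed implementation could look like the sketch below. It relies on the `Response` and `Resource` definitions above; the struct name, the io::Error-based `Result` alias, and the use of `std::fs::read_dir` are assumptions made for this example, not the crate's actual code.

```rust
use std::fs;
use std::path::PathBuf;

// Assumed error handling; the crate may use a different Result type.
type Result<T> = std::io::Result<T>;

// Hypothetical filesystem resource rooted at a starting directory.
struct FileResource {
    root: PathBuf,
}

impl Resource<PathBuf> for FileResource {
    // Split a directory listing into subdirectories (to crawl next)
    // and plain files (the leaves handed to the indexer).
    fn get_dirs_and_leaves(&self, path: &PathBuf) -> Response<PathBuf> {
        let (mut dirs, mut files) = (Vec::new(), Vec::new());
        if let Ok(entries) = fs::read_dir(path) {
            for entry in entries.flatten() {
                let p = entry.path();
                if p.is_dir() { dirs.push(p) } else { files.push(p) }
            }
        }
        Response::DirFileResponse { dirs, files }
    }

    fn get_path(&self) -> Result<PathBuf> {
        Ok(self.root.clone())
    }
}
```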

## Dependencies

It uses crossbeam for channels, threadpool for thread-pool support, and tantivy for full-text indexing.
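The relevant `[dependencies]` section of Cargo.toml would look roughly like this; the version numbers are illustrative, not the ones the crate actually pins:

```toml
[dependencies]
# Versions shown here are examples; see the repository's Cargo.toml for the real ones.
crossbeam = "0.8"    # channels shared between crawler threads
threadpool = "1.8"   # worker pool that the scheduler resizes
tantivy = "0.19"     # full-text index written by the indexer
```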

finde-rs has three main components, visible in the log output further below:

- a crawler that walks 'directories' using a pool of worker threads,
- an indexer that feeds the discovered 'files' into a tantivy full-text index, and
- a scheduler that grows or shrinks the thread pool based on the length of the work queue.
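The sketch below shows, roughly, how these pieces could fit together: directories flow through a crossbeam channel that serves as the work queue, and a pool of worker threads pops from it and pushes newly discovered subdirectories back. This is a simplified illustration under those assumptions, not the crate's actual wiring.

```rust
use std::path::PathBuf;
use std::time::Duration;

use crossbeam::channel::unbounded;
use threadpool::ThreadPool;

fn main() {
    // The work queue: directories still to be crawled.
    let (tx, rx) = unbounded::<PathBuf>();
    tx.send(PathBuf::from("/usr/lib")).unwrap();

    let pool = ThreadPool::new(6);
    for _ in 0..6 {
        let (tx, rx) = (tx.clone(), rx.clone());
        pool.execute(move || {
            // A worker pops a directory, queues its subdirectories, and exits
            // once the queue has stayed empty for a short while.
            while let Ok(dir) = rx.recv_timeout(Duration::from_millis(100)) {
                if let Ok(entries) = std::fs::read_dir(&dir) {
                    for entry in entries.flatten() {
                        let p = entry.path();
                        if p.is_dir() {
                            tx.send(p).unwrap();
                        } else {
                            // In finde-rs this is where a file path would be
                            // handed on to the tantivy indexer.
                        }
                    }
                }
            }
        });
    }
    drop(tx);
    pool.join();
}
```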

## Usage

```sh

git clone https://github.com/ronin13/finde-rs && cd finde-rs
cargo build --release

./target/release/finde-rs --help
finde-rs 0.1.3
CLI finder tool

USAGE:
    finde-rs [FLAGS] [OPTIONS]

FLAGS:
-h, --help
        Prints help information

-q, --quiet
        Pass many times for less log output

-V, --version
        Prints version information

-v, --verbose
        Pass many times for more log output

        By default, it'll only report errors. Passing `-v` one time also prints warnings, `-vv` enables info
        logging, `-vvv` debug, and `-vvvv` trace.

OPTIONS:
-I, --index-dir <index-dir>
        Root path to crawl from [default: /tmp/]

-i, --initial-threads <initial-threads>
        Initial number of threads to spawn

-m, --max-threads <max-threads>
        Maximum number of threads that threadpool can scale upto. Defaults to number of cpus

-p, --path <path>
        Root path to crawl from [default: /usr/lib]

```

## Running

```sh

./target/release/finde-rs -p $HOME/repo -v -i 6 -m 12 --index-dir /tmp
2020-02-15 14:10:59,683 INFO [finde_rs] Crawling /Users/raghu/repo
2020-02-15 14:10:59,684 INFO [finde_rs::indexer] Starting indexer
2020-02-15 14:10:59,684 INFO [finde_rs::crawler] Waiting on upto 12 crawler threads
2020-02-15 14:10:59,684 INFO [finde_rs::indexer] Index directory created in /tmp/5ryH1
2020-02-15 14:10:59,684 INFO [tantivy::indexer::segment_updater] save metas
2020-02-15 14:10:59,687 INFO [finde_rs::indexer] Iterating over results
2020-02-15 14:10:59,785 INFO [finde_rs::scheduler] Updating number of threads to 7, length of work queue 3818, pool size 6
2020-02-15 14:10:59,886 INFO [finde_rs::scheduler] Updating number of threads to 8, length of work queue 6883, pool size 6
2020-02-15 14:10:59,988 INFO [finde_rs::scheduler] Updating number of threads to 9, length of work queue 11192, pool size 6
2020-02-15 14:11:00,089 INFO [finde_rs::scheduler] Updating number of threads to 10, length of work queue 12956, pool size 6
2020-02-15 14:11:00,190 INFO [finde_rs::scheduler] Updating number of threads to 11, length of work queue 12857, pool size 6
2020-02-15 14:11:00,290 INFO [finde_rs::scheduler] Updating number of threads to 12, length of work queue 12607, pool size 6
2020-02-15 14:11:04,834 INFO [finde_rs::scheduler] Updating number of threads to 6, length of work queue 0, pool size 6
2020-02-15 14:11:05,739 INFO [finde_rs::file_resource] Crawling done in ThreadId(5), leaving, bye!
2020-02-15 14:11:05,740 INFO [finde_rs::file_resource] Crawling done in ThreadId(4), leaving, bye!
2020-02-15 14:11:05,740 INFO [finde_rs::file_resource] Crawling done in ThreadId(2), leaving, bye!
2020-02-15 14:11:05,740 INFO [finde_rs::file_resource] Crawling done in ThreadId(7), leaving, bye!
2020-02-15 14:11:05,740 INFO [finde_rs::file_resource] Crawling done in ThreadId(6), leaving, bye!
2020-02-15 14:11:05,740 INFO [finde_rs::file_resource] Crawling done in ThreadId(3), leaving, bye!
2020-02-15 14:11:05,740 INFO [finde_rs::indexer] Commiting the index
2020-02-15 14:11:05,740 INFO [tantivy::indexer::index_writer] Preparing commit
2020-02-15 14:11:05,757 INFO [finde_rs::scheduler] No more threads to schedule, I am done. Bye!
2020-02-15 14:11:05,899 INFO [tantivy::indexer::segment_updater] Starting merge - [Seg("8cc31b4d"), Seg("97576eb1"), Seg("2b7bcba3"), Seg("f1bbcb09"), Seg("4c3cf582"), Seg("699c0c3b"), Seg("4e08a0dd"), Seg("1e6b5009")]
2020-02-15 14:11:05,904 INFO [tantivy::indexer::index_writer] Prepared commit 500530
2020-02-15 14:11:05,904 INFO [tantivy::indexer::prepared_commit] committing 500530
2020-02-15 14:11:05,904 INFO [tantivy::indexer::segment_updater] save metas
2020-02-15 14:11:05,905 INFO [tantivy::indexer::segment_updater] Running garbage collection
2020-02-15 14:11:05,905 INFO [tantivy::directory::managed_directory] Garbage collect
2020-02-15 14:11:05,905 INFO [finde_rs::indexer] Index created in "/tmp/"
2020-02-15 14:11:05,905 INFO [finde_rs::indexer] Index has 12 segments
2020-02-15 14:11:05,906 INFO [finde_rs] Finished crawling /Users/raghu/repo, took 6s
./target/release/finde-rs -p $HOME/repo -v -i 6 -m 12 --index-dir  12.81s user 26.84s system 636% cpu 6.232 total

```
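The scheduler lines above show the pool growing by one thread per tick while the work queue is long, and dropping back to the initial size once the queue drains. The sketch below is a hedged illustration of that kind of decision loop; the function name, the 100 ms tick, and the use of threadpool's `set_num_threads` are assumptions for this example, not the crate's actual scheduler.

```rust
use std::path::PathBuf;
use std::thread;
use std::time::Duration;

use crossbeam::channel::Receiver;
use threadpool::ThreadPool;

// Grow the pool one thread at a time while work is queued, up to `max`,
// and wind down once the queue is empty (assumed behaviour).
fn schedule(pool: &ThreadPool, queue: &Receiver<PathBuf>, initial: usize, max: usize) {
    loop {
        let pending = queue.len();
        let threads = pool.max_count();
        if pending == 0 {
            pool.set_num_threads(initial);
            println!("No more threads to schedule, I am done. Bye!");
            break;
        }
        if threads < max {
            println!(
                "Updating number of threads to {}, length of work queue {}, pool size {}",
                threads + 1,
                pending,
                pool.active_count()
            );
            pool.set_num_threads(threads + 1);
        }
        thread::sleep(Duration::from_millis(100));
    }
}
```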

## Tests

```

cargo test
   Compiling finde-rs v0.1.1 (/Users/raghu/repo/finde-rs)
    Finished test [unoptimized] target(s) in 1.22s
     Running target/debug/deps/finde_rs-c62a74cfdff79a3e

running 3 tests
test scheduler::test::test_scale_with_bounds ... ok
test crawler::test::test_root_from_disconnected_channel ... ok
test crawler::test::test_root_from_channel ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

## Linting

```sh
cargo clippy
```