DawnSearch is an open source, distributed web search engine that searches by meaning (semantic search) instead of matching exact words. It can index Common Crawl data, embeds text with the all-MiniLM-L6-v2 model, and uses USearch for vector search. DawnSearch is written in Rust and licensed under AGPLv3.0+.
A public instance is available at dawnsearch.org.
The following steps build and run DawnSearch on a recent Ubuntu release, without GPU acceleration.
# Install build dependencies:
sudo apt-get update && sudo apt-get install -y build-essential libssl-dev pkg-config python3-pip
# Install Rust if you don't have it already:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install the CPU-only build of PyTorch:
pip3 install torch==2.0.0 --index-url https://download.pytorch.org/whl/cpu
Next, make sure the build system can find PyTorch. Look up where pip installed the package:
pip3 show torch
This prints the following:
Name: torch
Version: 2.0.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/ubuntu/.local/lib/python3.10/site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by:
Take the path from 'Location', append '/torch', and add the following two lines to ~/.bashrc:
export LIBTORCH=/home/ubuntu/.local/lib/python3.10/site-packages/torch
export LD_LIBRARY_PATH=${LIBTORCH}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
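The two exports above can also be derived instead of copied by hand. This is a minimal sketch, assuming 'pip3 show torch' prints a 'Location:' line like the one in the output shown earlier. The helper name 'libtorch_from_pip_show' is hypothetical and not part of DawnSearch.

```shell
# Hypothetical helper (not part of DawnSearch): reads `pip3 show torch`
# output on stdin and prints the Location path with '/torch' appended.
libtorch_from_pip_show() {
    awk -F': ' '/^Location: / { print $2 "/torch" }'
}

# Only export when pip3 and the torch package are actually present.
if command -v pip3 >/dev/null 2>&1 && pip3 show torch >/dev/null 2>&1; then
    LIBTORCH="$(pip3 show torch 2>/dev/null | libtorch_from_pip_show)"
    export LIBTORCH
    # Append the previous value only if it was set, to avoid a trailing ':'.
    export LD_LIBRARY_PATH="${LIBTORCH}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
fi
```

Deriving the path this way avoids hard-coding the Python version and home directory, both of which differ between machines.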
We can now load the new environment variables and build:
source ~/.bashrc
mv DawnSearch.toml.example DawnSearch.toml
cargo run --release
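If the build cannot find PyTorch, it can help to check the environment before rerunning cargo. The sketch below only assumes that the LIBTORCH directory contains a 'lib' subdirectory, as the LD_LIBRARY_PATH line above implies. The function name 'check_libtorch' is made up for this example and is not part of DawnSearch.

```shell
# Hypothetical pre-build check (not part of DawnSearch).
# Takes the LIBTORCH path as $1; returns 0 if <path>/lib exists, 1 otherwise.
check_libtorch() {
    if [ -z "${1:-}" ]; then
        echo "LIBTORCH is not set" >&2
        return 1
    fi
    if [ ! -d "$1/lib" ]; then
        echo "no lib directory under $1" >&2
        return 1
    fi
    echo "LIBTORCH looks usable: $1"
}
```

One way to use it is to run check_libtorch "${LIBTORCH:-}" in the same shell and only start the build when it succeeds.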
If you want to upgrade to GPU acceleration, try this:
pip3 install torch==2.0.0
cargo clean
cargo run --release
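After reinstalling PyTorch, you may want to confirm that the installed package can see a GPU at all. The sketch below calls 'torch.cuda.is_available()' from Python. It only reports what the Python side of PyTorch sees and does not prove that DawnSearch itself runs on the GPU. The function name 'torch_cuda_status' is hypothetical.

```shell
# Hypothetical helper (not part of DawnSearch): prints one of
# python3-missing, torch-missing, cuda-available, cpu-only.
torch_cuda_status() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3-missing"
        return 0
    fi
    python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch-missing")
else:
    # torch.cuda.is_available() is True when PyTorch can use a CUDA device.
    print("cuda-available" if torch.cuda.is_available() else "cpu-only")
EOF
}

torch_cuda_status
```

If this prints 'cpu-only' after the GPU install, the CPU-only wheel may still be in place, or no usable CUDA device or driver was found.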
Alternatively, follow the installation steps documented for the tch crate.
Feel free to open an issue if you encounter problems!
You can configure DawnSearch through DawnSearch.toml or through environment variables like DAWNSEARCH_INDEX_CC.
Please open issues or create pull requests. Note that DawnSearch is licensed under AGPLv3.0 or later, which is slightly unusual for a Rust project.