Edgesearch

Build a full text search API using Cloudflare Workers and WebAssembly.

Documentation is currently WIP.

At a glance

Features

Demos

Check out the demo folder for live demos with source code.

Usage

Get the CLI

Windows | macOS | Linux

Build the worker

```bash
edgesearch build \
  --documents documents.txt \
  --document-terms terms.txt \
  --maximum-query-results 20 \
  --output-dir dist/worker/
```

Deploy the worker

```bash
edgesearch deploy \
  --default-results default.txt \
  --account-id CF_ACCOUNT_ID \
  --account-email me@email.com \
  --global-api-key CF_GLOBAL_API_KEY \
  --name my-edgesearch \
  --output-dir dist/worker/ \
  --namespace CF_KV_NAMESPACE_ID \
  --upload-data
```

Calling the API

A browser client is available for querying a deployed Edgesearch worker:

```typescript
import * as edgesearch from 'edgesearch-client';

type Document = {
  id: string;
  title: string;
  description: string;
};

const client = new edgesearch.Client('my-edgesearch.me.workers.dev');
const query = new edgesearch.Query();
query.add(edgesearch.Mode.REQUIRE, 'world');
query.add(edgesearch.Mode.CONTAIN, 'hello', 'welcome', 'greetings');
query.add(edgesearch.Mode.EXCLUDE, 'bye', 'goodbye');
const results = await client.search(query);
```

How it works

Bit sets

Terms for each document are merged into a large set of possible search terms. A bit set is created for each possible search term, and for each document with ID n containing the term, the bit set has its nth bit set.
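As a rough illustration of the paragraph above, the following sketch builds one uncompressed bit set per term from a list of per-document terms. The names `createBitSet`, `setBit`, `hasBit`, and `buildIndex`, and the use of a plain `Uint32Array` as the bit set, are assumptions made for this example; they are not Edgesearch's actual build code or data format.

```typescript
// Hypothetical sketch: all names and data shapes here are assumptions.
type BitSet = Uint32Array;

// Allocate a bit set large enough to hold `size` bits.
function createBitSet(size: number): BitSet {
  return new Uint32Array(Math.ceil(size / 32));
}

// Set the nth bit.
function setBit(bits: BitSet, n: number): void {
  bits[n >>> 5] |= 1 << (n & 31);
}

// Test whether the nth bit is set.
function hasBit(bits: BitSet, n: number): boolean {
  return (bits[n >>> 5] & (1 << (n & 31))) !== 0;
}

// documentTerms[n] holds the terms of the document with ID n.
function buildIndex(documentTerms: string[][]): Map<string, BitSet> {
  const index = new Map<string, BitSet>();
  documentTerms.forEach((terms, docId) => {
    for (const term of terms) {
      let bits = index.get(term);
      if (bits === undefined) {
        bits = createBitSet(documentTerms.length);
        index.set(term, bits);
      }
      setBit(bits, docId);
    }
  });
  return index;
}

const termIndex = buildIndex([
  ['hello', 'world'],   // document 0
  ['goodbye', 'world'], // document 1
  ['hello', 'there'],   // document 2
]);
```

Here the bit set for `hello` has bits 0 and 2 set, and the bit set for `world` has bits 0 and 1 set.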

Each bit set is compressed using Roaring Bitmaps. CRoaring, compiled to WebAssembly, is used to deserialise and operate on the bit sets.
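To give an intuition for why Roaring Bitmaps compress well, the sketch below shows the core idea in simplified form: values are grouped by their high 16 bits, and each group stores its low 16 bits either as a sorted array (sparse groups) or as a fixed 65,536-bit bitmap (dense groups). This is an illustration only. `toContainers` and the `Container` type are hypothetical, run containers are omitted, and none of this reflects CRoaring's API or its serialised format.

```typescript
// Illustrative only: a simplified view of how Roaring partitions values.
// Not CRoaring's API or serialised format; run containers are omitted.
const ARRAY_CONTAINER_MAX = 4096;

type Container =
  | { kind: 'array'; values: Uint16Array }  // sorted low 16 bits, sparse chunk
  | { kind: 'bitmap'; words: Uint32Array }; // 65536 bits, dense chunk

function toContainers(sortedIds: number[]): Map<number, Container> {
  // Group the low 16 bits of each ID under its high 16 bits.
  const chunks = new Map<number, number[]>();
  for (const id of sortedIds) {
    const key = id >>> 16;
    const low = id & 0xffff;
    const chunk = chunks.get(key);
    if (chunk) {
      chunk.push(low);
    } else {
      chunks.set(key, [low]);
    }
  }

  // Pick the cheaper representation for each chunk.
  const containers = new Map<number, Container>();
  chunks.forEach((lows, key) => {
    if (lows.length <= ARRAY_CONTAINER_MAX) {
      containers.set(key, { kind: 'array', values: new Uint16Array(lows) });
    } else {
      const words = new Uint32Array(65536 / 32);
      for (const low of lows) {
        words[low >>> 5] |= 1 << (low & 31);
      }
      containers.set(key, { kind: 'bitmap', words });
    }
  });
  return containers;
}
```

A term that appears in only a few documents therefore costs a few bytes per document, while a very common term costs at most 8 KiB per 65,536 documents.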

Searching

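Searching is done by looking for terms in a document. There are three modes for each term: require (the document must contain the term), contain (the document must contain at least one of the terms in this mode), and exclude (the document must not contain the term).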

The results are generated by doing bitwise operations across multiple bit sets. The general computation can be summarised as:

```c
result = (req_a & req_b & req_c & ...)
       & (con_a | con_b | con_c | ...)
       & ~(exc_a | exc_b | exc_c | ...)
```

Each bit that is set in the resulting bit set is mapped to the entry at the corresponding position.
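The computation above, followed by the mapping from set bits to entries, can be sketched on plain `Uint32Array` bit sets as follows. This is a minimal sketch under stated assumptions: the `evaluate` function is hypothetical, all bit sets are assumed to have the same length, and an empty mode is assumed to place no restriction on the result. Edgesearch itself performs these operations on Roaring Bitmaps through CRoaring.

```typescript
// Hypothetical sketch on uncompressed bit sets of equal length.
type Bits = Uint32Array;

function evaluate(req: Bits[], con: Bits[], exc: Bits[], docCount: number): number[] {
  const words = Math.ceil(docCount / 32);
  const matches: number[] = [];
  for (let w = 0; w < words; w++) {
    // Start from all ones so an empty `req` list places no restriction.
    let word = 0xffffffff;
    for (const b of req) word &= b[w];

    // Assumption: an empty `con` list places no restriction either.
    if (con.length > 0) {
      let any = 0;
      for (const b of con) any |= b[w];
      word &= any;
    }

    let excluded = 0;
    for (const b of exc) excluded |= b[w];
    word &= ~excluded;

    // Each set bit's position is the ID of a matching document.
    for (let bit = 0; bit < 32; bit++) {
      const id = w * 32 + bit;
      if (id < docCount && (word & (1 << bit)) !== 0) matches.push(id);
    }
  }
  return matches;
}

const documents = ['hello world', 'goodbye world', 'hello there'];
const hello = new Uint32Array([0b101]);   // documents 0 and 2
const world = new Uint32Array([0b011]);   // documents 0 and 1
const goodbye = new Uint32Array([0b010]); // document 1

// Require "world", contain "hello", exclude "goodbye".
const hits = evaluate([world], [hello], [goodbye], documents.length)
  .map((id) => documents[id]);
// hits is ['hello world']
```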

Cloudflare

The entire app runs off a single JavaScript script + accompanying WASM code. It does not need any database or server, and uses Cloudflare Workers. This allows for some cool features:

Performance