Sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against.
Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query.
🇫🇷 Crafted in Nantes, France.
TODO: link to journal (below)
:newspaper: The Sonic project was initially announced in a post on my personal journal.
TODO: sonic hedgehog image (from personal drawing) (below)

| Crisp |
| :---: |

👋 You use Sonic and you want to be listed here? Contact me.
Sonic is built in Rust. To install it, either download a version from the Sonic releases page, use `cargo install` or pull the source code from `master`.
Install from source:
If you pulled the source code from Git, you can build it using `cargo`:
```bash
cargo build --release
```
You can find the built binaries in the `./target/release` directory.
Install from Cargo:
You can install Sonic directly with `cargo install`:
```bash
cargo install sonic-server
```
Ensure that your `$PATH` includes the directory where Cargo installs binaries (`~/.cargo/bin` by default), and then run Sonic using the `sonic` command.
Install from packages:
Debian & Ubuntu packages are also available. Refer to the How to install it on Debian & Ubuntu? section.
Install from Docker Hub:
You might find it convenient to run Sonic via Docker. You can find the pre-built Sonic image on Docker Hub as valeriansaliou/sonic.
First, pull the `valeriansaliou/sonic` image:
```bash
docker pull valeriansaliou/sonic:v1.0.0
```
Then, provide it with a configuration file and run it (replace `/path/to/your/sonic/config.cfg` with the path to your configuration file):
```bash
docker run -p 1491:1491 -v /path/to/your/sonic/config.cfg:/etc/sonic.cfg valeriansaliou/sonic:v1.0.0
```
In the configuration file, ensure that `channel.inet` is set to `0.0.0.0:1491` (this lets Sonic Channel be reached from outside the container). Sonic Channel will then be reachable from `tcp://localhost:1491`.
Use the sample `config.cfg` configuration file and adjust it to your own environment.
Available configuration options are commented below, with allowed values:

- `[server]`
  - `log_level` (type: string, allowed: `debug`, `info`, `warn`, `error`, default: `error`): Verbosity of logging; set it to `error` in production
- `[channel]`
  - `inet` (type: string, allowed: IPv4 / IPv6 + port, default: `[::1]:1491`): Host and TCP port Sonic Channel should listen on
  - `tcp_timeout` (type: integer, allowed: seconds, default: `300`): Timeout of idle/dead client connections to Sonic Channel
- `[channel.search]`
  - `query_limit_default` (type: integer, allowed: numbers, default: `10`): Default search results limit for a query command (if the `LIMIT` command modifier is not used when issuing a `QUERY` command)
  - `query_limit_maximum` (type: integer, allowed: numbers, default: `100`): Maximum search results limit for a query command (if the `LIMIT` command modifier is being used when issuing a `QUERY` command)
- `[store]`
  - `[store.kv]`
    - `path` (type: string, allowed: UNIX path, default: `./data/store/kv/`): Path to the Key-Value database store
  - `[store.kv.database]`
    - `compress` (type: boolean, allowed: `true`, `false`, default: `true`): Whether to compress the database or not (uses LZ4)
    - `parallelism` (type: integer, allowed: numbers, default: `2`): Limit on the number of compaction and flush threads that can run at the same time
    - `max_files` (type: integer, allowed: numbers, default: `1000`): Maximum number of database files kept open at the same time (this should be balanced)
    - `max_compactions` (type: integer, allowed: numbers, default: `1`): Limit on the number of concurrent database compaction jobs
    - `max_flushes` (type: integer, allowed: numbers, default: `1`): Limit on the number of concurrent database flush jobs
  - `[store.fst]`
    - `path` (type: string, allowed: UNIX path, default: `./data/store/fst/`): Path to the Finite-State Transducer database store

Sonic can be run as such:
```bash
./sonic -c /path/to/config.cfg
```
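To show how the options above fit together, here is a minimal Python sketch that renders a `config.cfg` from the documented defaults, with per-section overrides. It assumes the file uses TOML syntax (`[section]` headers and `key = value` pairs); `DEFAULTS`, `toml_value` and `render_config` are hypothetical names, not part of Sonic, and the sample `config.cfg` remains the reference.

```python
# Hypothetical helper, not part of Sonic: render a config.cfg from the documented
# defaults. Assumption: config.cfg uses TOML syntax ([section] / key = value).

DEFAULTS = {
    "server": {"log_level": "error"},
    "channel": {"inet": "[::1]:1491", "tcp_timeout": 300},
    "channel.search": {"query_limit_default": 10, "query_limit_maximum": 100},
    "store.kv": {"path": "./data/store/kv/"},
    "store.kv.database": {
        "compress": True,
        "parallelism": 2,
        "max_files": 1000,
        "max_compactions": 1,
        "max_flushes": 1,
    },
    "store.fst": {"path": "./data/store/fst/"},
}


def toml_value(value):
    """Format a scalar as TOML (bool is checked first, since bool subclasses int)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported config value: {value!r}")


def render_config(overrides=None):
    """Merge per-section overrides over DEFAULTS and return the config text."""
    # Copy each section so DEFAULTS itself is never modified.
    sections = {name: dict(options) for name, options in DEFAULTS.items()}
    for name, options in (overrides or {}).items():
        sections.setdefault(name, {}).update(options)

    # Sanity check added by this sketch (not necessarily enforced by Sonic).
    search = sections["channel.search"]
    if search["query_limit_default"] > search["query_limit_maximum"]:
        raise ValueError("query_limit_default exceeds query_limit_maximum")

    lines = []
    for name, options in sections.items():
        lines.append(f"[{name}]")
        lines.extend(f"{key} = {toml_value(value)}" for key, value in options.items())
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

For the Docker setup, `render_config({"channel": {"inet": "0.0.0.0:1491"}})` yields a file where Sonic Channel listens on all interfaces, while every other option keeps its documented default.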
Both searches and object management (i.e. data ingestion) are handled via the Sonic Channel protocol only. As we want to keep things simple with Sonic (similarly to how Redis does it), connecting to Sonic Channel is the way to go when you need to interact with the Sonic search database.
Sonic Channel can be accessed via the `telnet` utility from your computer. The very same system is also used by all Sonic Channel libraries (e.g. NodeJS).
- `START <mode>`: select the mode to use for the connection (either `search` or `ingest`)

Issuing any other command (e.g. `QUIT`) before a mode has been selected will abort the TCP connection, effectively resulting in a `QUIT` with the `ENDED not_recognized` response.
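As an illustration of this initial state, here is a minimal blocking Python client sketch: it reads the `CONNECTED` banner, sends `START <mode>` and checks for `STARTED`, mirroring lines T5 to T7 of the telnet transcripts in this document. `SonicChannel` is a hypothetical name, not one of the Sonic Channel libraries; terminating commands with CRLF (as telnet does) and only checking the leading keyword of each reply are assumptions of this sketch.

```python
import socket


class SonicChannel:
    """Minimal blocking Sonic Channel connection (hypothetical helper, not an official library)."""

    MODES = ("search", "ingest")

    def __init__(self, sock):
        self.sock = sock
        self.reader = sock.makefile("rb")  # buffered reader for line-by-line replies

    @classmethod
    def connect(cls, host="localhost", port=1491, timeout=10.0):
        return cls(socket.create_connection((host, port), timeout=timeout))

    def send(self, command):
        # One command per line; CRLF mirrors what telnet sends in the transcripts.
        self.sock.sendall(command.encode("utf-8") + b"\r\n")

    def read_line(self):
        raw = self.reader.readline()
        if not raw:
            raise ConnectionError("connection closed by Sonic")
        return raw.decode("utf-8").rstrip("\r\n")

    def start(self, mode):
        """Leave the initial state by selecting `search` or `ingest` mode."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        banner = self.read_line()  # e.g. "CONNECTED <sonic-server v1.0.0>"
        if not banner.startswith("CONNECTED"):
            raise ConnectionError(f"unexpected banner: {banner!r}")
        self.send(f"START {mode}")
        reply = self.read_line()  # "STARTED" on success
        if not reply.startswith("STARTED"):
            raise ConnectionError(f"could not start {mode} mode: {reply!r}")
        return reply

    def close(self):
        self.send("QUIT")  # Sonic answers "ENDED quit"; this sketch does not read it
        self.reader.close()
        self.sock.close()
```

Against a running server, `SonicChannel.connect("localhost", 1491).start("search")` would return the `STARTED` line; validating the mode locally avoids sending a command that would end the connection with `ENDED not_recognized`.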
The Sonic Channel Search mode is used for querying the search index. Once in this mode, you cannot switch to other modes or gain access to commands from other modes.
➡️ Available commands:

- `QUERY`: query database (syntax: `QUERY <collection> <bucket> "<terms>" [LIMIT(<count>)]? [OFFSET(<count>)]?`)
- `SUGGEST`: auto-complete sentence (syntax: `SUGGEST <collection> <bucket> "<sentence>"`)
- `PING`: ping server (syntax: `PING`)
- `HELP`: show help (syntax: `HELP [<manual>]?`)
- `QUIT`: stop connection (syntax: `QUIT`)

⏩ Syntax terminology:

- `<collection>`: index collection (i.e. what you search in, e.g. `messages`, `products`, etc.);
- `<bucket>`: index bucket name (i.e. user-specific search classifier in the collection if you have any, e.g. `user-1, user-2, ..`, otherwise use a common bucket name, e.g. `generic, default, common, ..`);
- `<terms>`: text for search terms (between quotes);
- `<count>`: a positive integer number; set within allowed maximum & minimum limits;
- `<manual>`: help manual to be shown (available manuals: `commands`);

Notice: the `bucket` terminology may confuse some Sonic users. As we are well aware that Sonic may be used in an environment where end-users may each hold their own search index graph in a given `collection`, we made it possible to manage per-end-user search graphs with `bucket`. If you only have a single index graph per `collection` (most Sonic users will), we advise you to use a static generic name for your `bucket`, for instance: `default`.
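The `QUERY` and `SUGGEST` syntax above can be assembled with a few string helpers. This Python sketch uses hypothetical helper names; since this section does not describe how to escape a double quote inside `<terms>`, it rejects quotes and line breaks rather than guessing an escaping rule.

```python
def quote_text(text):
    # Assumption: no escaping rule is documented here, so reject characters that
    # would end the quoted argument or the command line early.
    if '"' in text or "\n" in text or "\r" in text:
        raise ValueError("text must not contain double quotes or line breaks")
    return f'"{text}"'


def check_token(name, value):
    # <collection> and <bucket> are sent unquoted, so they must not contain spaces.
    if not value or any(ch.isspace() for ch in value):
        raise ValueError(f"{name} must be a non-empty word without whitespace")
    return value


def check_count(name, value):
    # Per the syntax terminology, <count> is a positive integer.
    if not isinstance(value, int) or isinstance(value, bool) or value < 1:
        raise ValueError(f"{name} must be a positive integer")
    return value


def build_query(collection, bucket, terms, limit=None, offset=None):
    """Build `QUERY <collection> <bucket> "<terms>" [LIMIT(<count>)]? [OFFSET(<count>)]?`."""
    parts = ["QUERY", check_token("collection", collection),
             check_token("bucket", bucket), quote_text(terms)]
    if limit is not None:
        parts.append(f"LIMIT({check_count('limit', limit)})")
    if offset is not None:
        parts.append(f"OFFSET({check_count('offset', offset)})")
    return " ".join(parts)


def build_suggest(collection, bucket, sentence):
    """Build `SUGGEST <collection> <bucket> "<sentence>"`."""
    return " ".join(["SUGGEST", check_token("collection", collection),
                     check_token("bucket", bucket), quote_text(sentence)])
```

For example, `build_query("helpdesk", "user:0dcde3a6", "law", limit=50, offset=200)` produces the command sent at T13 of the search flow example. When no per-user bucket applies, passing `"default"` as the bucket follows the advice in the notice above.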
⬇️ Search flow example (via `telnet`):
```bash
T1: telnet sonic.local 1491
T2: Trying ::1...
T3: Connected to sonic.local.
T4: Escape character is '^]'.
T5: CONNECTED <sonic-server v1.0.0>
T6: START search
T7: STARTED
T8: QUERY messages user:0dcde3a6 "valerian saliou" LIMIT(10)
T9: PENDING Bt2m2gYa
T10: EVENT QUERY Bt2m2gYa conversation:71f3d63b conversation:6501e83a
T11: QUERY helpdesk user:0dcde3a6 "gdpr" LIMIT(50)
T12: PENDING y57KaB2d
T13: QUERY helpdesk user:0dcde3a6 "law" LIMIT(50) OFFSET(200)
T14: PENDING CjPvE5t9
T15: PING
T16: PONG
T17: EVENT QUERY CjPvE5t9
T18: EVENT QUERY y57KaB2d article:28d79959
T19: SUGGEST messages user:0dcde3a6 "valerian"
T20: PENDING z98uDE0f
T21: EVENT SUGGEST z98uDE0f valerian saliou
T22: QUIT
T23: ENDED quit
T24: Connection closed by foreign host.
```
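This transcript shows that search results come back asynchronously and not necessarily in order: the second `helpdesk` query (marker `CjPvE5t9`, T14) completes at T17, before the first one (marker `y57KaB2d`, T12) completes at T18. A client therefore has to pair each `EVENT` line with the marker from its `PENDING` acknowledgement. Here is a minimal Python sketch of that bookkeeping; `EventTracker` is a hypothetical name, and it assumes each `QUERY`/`SUGGEST` is acknowledged by exactly one `PENDING` line, in the order the commands were sent, and that results are space-separated.

```python
from collections import deque


class EventTracker:
    """Pair asynchronous EVENT replies with the commands that produced them (hypothetical helper)."""

    def __init__(self):
        self.awaiting_ack = deque()  # commands sent, not yet acknowledged by PENDING
        self.pending = {}            # marker -> command still being processed
        self.completed = {}          # marker -> (command, list of result identifiers)

    def sent(self, command):
        # Call this only for QUERY and SUGGEST, the commands answered with PENDING.
        self.awaiting_ack.append(command)

    def feed(self, line):
        """Process one line received from Sonic and return its response kind."""
        kind, _, rest = line.strip().partition(" ")
        if kind == "PENDING":
            if not self.awaiting_ack:
                raise RuntimeError(f"unexpected acknowledgement: {line!r}")
            # Assumption: acknowledgements arrive in the order commands were sent.
            self.pending[rest] = self.awaiting_ack.popleft()
        elif kind == "EVENT":
            _event_type, marker, *results = rest.split()
            command = self.pending.pop(marker)  # KeyError for an unknown marker
            self.completed[marker] = (command, results)
        return kind
```

Other replies such as `PONG` pass through untouched, so the same reader loop can interleave `PING` with pending searches, as T15 and T16 do.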
Notes on what happens:

- T6: enter `search` mode (this is required to enable `search` commands);
- T8: query collection `messages`, in bucket for platform user `user:0dcde3a6` with search terms `valerian saliou` and a limit of `10` on returned results;
- T9: Sonic acknowledges the query with marker `Bt2m2gYa` (the marker is used to track the asynchronous response);
- T10: Sonic returns the response for marker `Bt2m2gYa` and sends 2 search results (those are conversation identifiers that refer to a primary key in an external database);
- T11 + T13: query collection `helpdesk` twice (in the example, this one is heavy, so processing of results takes more time);

The Sonic Channel Ingest mode is used for altering the search index (push, pop and flush). Once in this mode, you cannot switch to other modes or gain access to commands from other modes.
➡️ Available commands:

- `PUSH`: Push search data into the index (syntax: `PUSH <collection> <bucket> <object> "<text>"`)
- `POP`: Pop search data from the index (syntax: `POP <collection> <bucket> <object>`)
- `COUNT`: Count indexed search data (syntax: `COUNT <collection> [<bucket> [<object>]?]?`)
- `FLUSHC`: Flush all indexed data from a collection (syntax: `FLUSHC <collection>`)
- `FLUSHB`: Flush all indexed data from a bucket in a collection (syntax: `FLUSHB <collection> <bucket>`)
- `FLUSHO`: Flush all indexed data from an object in a bucket in a collection (syntax: `FLUSHO <collection> <bucket> <object>`)
- `PING`: ping server (syntax: `PING`)
- `HELP`: show help (syntax: `HELP [<manual>]?`)
- `QUIT`: stop connection (syntax: `QUIT`)

⏩ Syntax terminology:

- `<collection>`: index collection (i.e. what you search in, e.g. `messages`, `products`, etc.);
- `<bucket>`: index bucket name (i.e. user-specific search classifier in the collection if you have any, e.g. `user-1, user-2, ..`, otherwise use a common bucket name, e.g. `generic, default, common, ..`);
- `<object>`: object identifier that refers to an entity in an external database, where the searched object is stored (e.g. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case, the object identifier in Sonic will be the MySQL primary key for the CRM contact);
- `<text>`: search text to be indexed (can be a single word or a longer text, within maximum length safety limits; between quotes);
- `<manual>`: help manual to be shown (available manuals: `commands`);

Notice: the `bucket` terminology may confuse some Sonic users. As we are well aware that Sonic may be used in an environment where end-users may each hold their own search index graph in a given `collection`, we made it possible to manage per-end-user search graphs with `bucket`. If you only have a single index graph per `collection` (most Sonic users will), we advise you to use a static generic name for your `bucket`, for instance: `default`.
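Ingest commands can be built the same way. This Python sketch (hypothetical helper names) quotes `<text>`, enforces the nesting of `COUNT <collection> [<bucket> [<object>]?]?` (an object is only addressed within a bucket), and picks `FLUSHC`, `FLUSHB` or `FLUSHO` from the scope it is given. Because this section does not document an escaping rule, it rejects double quotes and line breaks in `<text>`.

```python
def quote_text(text):
    # Assumption: no escaping rule is documented here, so reject characters that
    # would end the quoted <text> or the command line early.
    if '"' in text or "\n" in text or "\r" in text:
        raise ValueError("text must not contain double quotes or line breaks")
    return f'"{text}"'


def check_token(name, value):
    # <collection>, <bucket> and <object> are sent unquoted: no whitespace allowed.
    if not value or any(ch.isspace() for ch in value):
        raise ValueError(f"{name} must be a non-empty word without whitespace")
    return value


def scope(collection, bucket=None, obj=None):
    """Return the [collection, bucket?, object?] arguments, enforcing their nesting."""
    if obj is not None and bucket is None:
        raise ValueError("an object can only be addressed within a bucket")
    args = [check_token("collection", collection)]
    if bucket is not None:
        args.append(check_token("bucket", bucket))
    if obj is not None:
        args.append(check_token("object", obj))
    return args


def object_scope(collection, bucket, obj):
    """PUSH and POP need all three identifiers."""
    if bucket is None or obj is None:
        raise ValueError("a bucket and an object are required")
    return scope(collection, bucket, obj)


def build_push(collection, bucket, obj, text):
    return " ".join(["PUSH", *object_scope(collection, bucket, obj), quote_text(text)])


def build_pop(collection, bucket, obj):
    return " ".join(["POP", *object_scope(collection, bucket, obj)])


def build_count(collection, bucket=None, obj=None):
    return " ".join(["COUNT", *scope(collection, bucket, obj)])


def build_flush(collection, bucket=None, obj=None):
    # FLUSHC, FLUSHB or FLUSHO depending on how narrow the scope is.
    args = scope(collection, bucket, obj)
    verb = {1: "FLUSHC", 2: "FLUSHB", 3: "FLUSHO"}[len(args)]
    return " ".join([verb, *args])
```

With these helpers, the unquoted `PUSH` that Sonic rejects at T9 of the ingest flow example cannot be produced: `build_push` always wraps `<text>` in quotes.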
⬇️ Ingest flow example (via `telnet`):
```bash
T1: telnet sonic.local 1491
T2: Trying ::1...
T3: Connected to sonic.local.
T4: Escape character is '^]'.
T5: CONNECTED <sonic-server v1.0.0>
T6: START ingest
T7: STARTED
T8: PUSH messages user:0dcde3a6 conversation:71f3d63b Hey Valerian
T9: ERR invalid_format(PUSH <collection> <bucket> <object> "<text>")
T10: PUSH messages user:0dcde3a6 conversation:71f3d63b "Hello Valerian Saliou, how are you today?"
T11: OK
T12: COUNT messages user:0dcde3a6
T13: RESULT 43
T14: COUNT messages user:0dcde3a6 conversation:71f3d63b
T15: RESULT 1
T16: POP messages user:0dcde3a6 conversation:71f3d63b
T17: RESULT 1
T18: FLUSHB messages user:0dcde3a6
T19: RESULT 42
T20: PING
T21: PONG
T22: QUIT
T23: ENDED quit
T24: Connection closed by foreign host.
```
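In this transcript, `PUSH` is acknowledged with `OK`, `COUNT`, `POP` and `FLUSHB` answer `RESULT <n>`, and the malformed `PUSH` gets an `ERR` line. The numbers are consistent with `<n>` being the count of affected items (43 counted in the bucket, 1 popped, then 42 flushed), though that reading is an inference from this example only. Here is a minimal Python sketch of a reply parser; `parse_reply` and `SonicError` are hypothetical names.

```python
class SonicError(Exception):
    """Raised when Sonic answers a command with an ERR line."""


def parse_reply(line):
    """Interpret one ingest-mode reply line (hypothetical helper).

    OK -> True, RESULT <n> -> int n, ERR <reason> -> SonicError,
    anything else (e.g. PONG, ENDED quit) -> the stripped line itself.
    """
    line = line.strip()
    kind, _, rest = line.partition(" ")
    if kind == "OK":
        return True
    if kind == "RESULT":
        return int(rest)
    if kind == "ERR":
        raise SonicError(rest)  # e.g. invalid_format(...) with the expected syntax
    return line
```

Raising on `ERR` keeps a malformed command, such as the one at T8, from being mistaken for a successful write.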
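Notes on what happens:

- T6: enter `ingest` mode (this is required to enable `ingest` commands);
- T8: try to push text `Hey Valerian` to the index, in collection `messages`, bucket `user:0dcde3a6` and object `conversation:71f3d63b` (the syntax that was used is invalid);
- T9: Sonic rejects the command (`<text>` should be quoted);
- T12: count indexed search data in collection `messages` and bucket `user:0dcde3a6`;
- T18: flush all indexed data from collection `messages` and bucket `user:0dcde3a6`;

📦 Sonic Channel Libraries: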
👉 Cannot find the library for your programming language? Build your own and be referenced here! (contact me)
Sonic was built for Crisp from the start. As Crisp was growing and indexing more and more search data into a full-text search SQL database, we decided it was time to switch to a proper search backend system. When reviewing Elasticsearch (ELS) and others, we found those were full-featured heavyweight systems that did not scale well with Crisp's freemium-based cost structure.
In the end, we decided to build our own search backend, designed to be simple and lightweight on resources. We ran some benchmarks on how Sonic behaves at scale.
TODO: benchmarks (graphs + tables + load tests)
If you find a vulnerability in Sonic, you are more than welcome to report it directly to @valeriansaliou by sending an encrypted email to valerian@valeriansaliou.name. Do not report vulnerabilities in public GitHub issues, as they may be exploited by malicious people to target production servers running an unpatched Sonic instance.
:warning: You must encrypt your email using @valeriansaliou's GPG public key: :key:valeriansaliou.gpg.pub.asc.
:gift: Based on the severity of the vulnerability, I may offer a $200 (US) bounty to whoever reported it.