Authenticated deterministic encryption for 64-bit integers based on the AES-CMAC-SIV construction.
AES-SID is a technique for deterministically encrypting 64-bit integers (e.g. database primary keys) as 128-bit ciphertexts which can be serialized as e.g. UUIDs.
While many schemes exist which offer these general properties, AES-SID detects forgeries via techniques adapted from misuse-resistant authenticated encryption (i.e. synthetic initialization vectors)
Both the design and Rust implementation of this scheme have received no external review. There are properties/limits of this scheme that need to be mathematically quantified which are presently undescribed.
This scheme uses deterministic encryption which, if applied improperly (e.g. naive inverted search index) can lead to catastrophic failures including full plaintext recovery.
Before attempting to experiment with this scheme, please make sure to read the full threat model section and make sure the cryptographic properties of this scheme actually apply to your intended threat model.
DO NOT USE THIS CODE IN PRODUCTION!
AES-SID encrypts 64-bit values as 128-bit values. However, if naively (mis)used as a general-purpose construction for encrypting 64-bit values, it can fail catastrophically (and similar constructions have in-practice, as described in the "Security Warning" section above).
The auto-incrementing primary key model utilized by many databases is extremely convenient for developers for many reasons. However, it comes at a cost: primary keys are easily guessable by attackers, who are able to enumerate and explore the entire key space. Some notable examples of this problem include an AT&T security flaw which exposed the email addresses of all iPad users.
More recently these low-entropy identifiers have enabled so-called "Zoom Bombing", where attackers are able to guess the identifiers of valid Zoom channels and thereby gain access to them.
These identifiers are a standard feature of all SQL databases, easily remembered, easy-to-communicate (in text or spoken form), and generally ubiquitous in many applications.
AES-SID is designed to allow developers to retrofit applications which use low-entropy auto-incrementing primary keys in such a way that they can be deterministically and reversibly mapped to 128-bit external/"masked" values (that can be serialized as e.g. a UUID), while ensuring that the "masked" values are randomly distributed and unguessable by an attacker (with greater-than-chance success in the 128-bit integer space, which is widely regarded as the baseline for symmetric cryptography).
One way to solve this problem is to use a (cryptographically) random UUID as a primary key instead of an auto-incrementing one. This is a perfectly valid approach, and one worth considering, but it comes at a price: UUIDs are long and high-entropy, which means they aren't easily spoken, or even remembered or manually typed by someone who has read them.
However, if applications are already leveraging auto-incrementing integer
identifiers, a migration to randomized UUIDs is potentially complex. That said,
even for greenfield applications, low-cardinality auto-incrementing IDs
starting at (0,1)
are extremely convenient from a developer experience
perspective: they're easy to remember, to type, and to speak.
For this reason, schemes for "masking" / encrypting low-entropy numerical developers have been developed. Historically, these schemes have at least one of these two problems:
AES-SID attempts to create a space-optimal authenticated identifier which includes a cryptographic MAC. While other schemes providing the same properties exist, this scheme is notable as being based on the SIV Mode of Operation as described by cryptographer Phil Rogaway.
SIV was originally designed for the purposes of "key wrapping" (i.e. encryption). AES-SID is a specialization of that notion intended for "primary key wrapping".
In addition to being guessable, primary keys leak information about the records they identify: they often expose the total cardinality of the record type they identify as well as a lexicographic ordering, almost certainly by insertion order, which often also exposes a creation date and allows an honest-but-curious attacker to potentially scrape and compute a complete graph of the record type of interest.
Apps which both utilize auto-incrementing low-entropy IDs and expose a creation timestamp are leaking valuable competitive intelligence to potential competitors/attackers.
An encrypted primary key may seem like a powerful building block for encrypted databases. For example, 64-bits is enough to store the UTF-8 encodings of most "short words" in most languages which can be represented in Unicode and/or the primary keys of encrypted documents. Naively it might seem deterministic ciphertexts of keywords could be used to build an encrypted "inverted index" providing fulltext search, however such systems are unsound and fail catastrophically in practice.
As noted in the "Security Warning" section above, attempting to abuse deterministic encryption for these purposes can have disastrous results including full plaintext recovery and for this reason many cryptography professionals may react adversely to the phrase "deterministic encryption" (and rightfully so!)
Problems like searchable symmetric encryption (SSE) and private information retrieval (PIR) are ongoing research areas which fraught with peril and an ongoing history of broken schemes, implementations, and compromises.
The most promising solutions in these spaces involve much more complex schemes than AES-SID, such as structured encryption and oblivious ram.
DO NOT USE AES-SID TO BUILD AN ENCRYPTED DATABASE!
AES-SID is a simplification of the AES-CMAC-SIV scheme as described in the paper The SIV Mode of Operation for Deterministic Authenticated-Encryption (Key Wrap) and Misuse-Resistant Nonce-Based Authenticated-Encryption by Phil Rogaway and later specified as [RFC 5297].
Below is a pseudocode description of AES-CMAC-SIV encryption:
enc_key = key[0..Kenclen]
prf_key = key[Kenclen..Ktotal]
siv = vPRF(prf_key, header0, header1, ... headerN, plaintext)
ciphertext = siv || AES-CTR(enc_key, siv, plaintext)
Where the terms are as follows:
key
: input encryption + PRF keyKenclen
: size of the encryption key in bytesKtotal
: size of the combined encryption + PRF key in bytesenc_key
: encryption keyprf_key
: (v)PRF keyvPRF
: vectorized pseudorandom function: a "keyed hash" which operates
over an arbitrary-sized vector of input messages. The AES-CMAC-SIV
construction function specifies a vPRF called "S2V" which is based on [CMAC].header1
.. headerN
: an arbitrary number of "additional associated data"
messages to authenticate along with the plaintext, typically a single AAD
string and a noncesiv
: synthetic initialization vector (SIV): a dual purpose IV and message
authentication tagAES-CTR
: the AES block cipher (with a Kenclen
key size) instantiated as a
stream cipher in counter mode (CTR)ciphertext
: authenticated ciphertext which is a concatenation of siv
and the AES-CTR encryption of the plaintextWhile naively this might appear to be a "MAC-then-encrypt" scheme, which are classically vulnerable to things like padding oracle attacks (e.g. BEAST, Lucky 13), SIV modes are provably secure from these attacks for two reasons:
Below is pseudocode of AES-SID and a comparison to AES-CMAC-SIV:
enc_key = KDF(key, 0, Kenclen)
prf_key = KDF(key, Kenclen, Ktotal)
siv = PRF(prf_key, plaintext)[0..8bytes]
ciphertext = siv || AES-CTR(enc_key, siv, plaintext)
Where the terms (not already described above) are as follows:
KDF
: key derivation function. AES-SID uses a [CTR_DRBG]-style KDF, namely
the one described in [RFC 8452 Section 4] as used by AES-GCM-SIVPRF
: pseudorandom function. AES-SID replaces the vectorized PRF used
above with a single-input PRF: [CMAC], making it deterministic.
AES-SID as instantiated with CMAC can be more specifically described as
AES-CMAC-SID. It could potentially be instantiated with another secure PRF
(e.g. HMAC-SHA-256).siv
: PRF output truncated to 8-bytes (64-bits)plaintext
: the little endian encoding of an unsigned 64-bit integerciphertext
: a 128-bit uniformly random deterministic encryption of the
plaintext value comprising a 64-bit dual purpose IV/message authenticator
and 64-bit AES-CTR encryption of the plaintextA1: Not yet. We are awaiting cryptographic review of the scheme, which we think we can encourage organically because we know a lot of cryptographers and they are naturally cantankerous people who will provide their opinions whether you want them or not.
A2: Many "masking" schemes which provide a 128-bit ciphertext for a 64-bit integer plaintext use naive (unauthenticated) methods like ECB mode.
AES-SID is notable in its use of a synthetic initialization vector (SIV) mode to solve the "masking" problem. which is both the only notable cryptographically secure way to realize the goals described above in the threat model in a deterministic encryption scheme.
A3: The test vectors are presently contained in the reference Rust implementation. We realize that isn't ideal and plan on extracting them out into a form which is more amenable to mechanical consumption.
A4: If there is sufficient interest, we intend to build out implementations of this scheme in Go, JavaScript, Ruby, and Python in addition to Rust. We would also be interested to hear from potential users in other languages.
Copyright © 2020 iqlusion
Licensed under either of:
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as above, without any additional terms or conditions.