AES ciphertext of an SSN is an opaque binary blob (at least one 128-bit block,
plus any IV or authentication tag); a downstream system expecting
NNN-NN-NNNN cannot validate it, index it, or route on it. In production
systems that is usually fine: decrypt at the boundary. But for
non-prod environments (CI, load tests, demos, vendor integration
testing) you need realistic-looking data that flows through schema validators,
regexes, and join keys without exploding.
Format-preserving encryption (FPE) encrypts a value into a ciphertext
with the same format: a 9-digit SSN in, a different 9-digit SSN out; a
16-digit credit-card number in, a different 16-digit number out (keeping the
output Luhn-valid takes an extra step; see section 6). NIST SP 800-38G
specifies two modes, FF1 and FF3, both built on AES; FF3-1 is the revised
FF3 proposed in the 2019 draft of Revision 1.
1. When FPE Is the Right Tool
Non-prod data — realistic test data without exposing
production PII.
Legacy systems with hard-coded length/format constraints that
cannot accept arbitrary ciphertext.
Partial exposure: you need to show a partially masked
value (last 4 digits of an account number), so you encrypt only the leading
digits and leave the visible suffix in the clear.
Join keys across systems where plaintext cannot be shared,
but a consistent pseudonym must match across databases.
2. FF1 vs FF3-1
FF1: 10-round Feistel construction; supports variable-length tweaks
and long inputs. Recommended default.
FF3-1: 8 rounds, so faster; fixed 56-bit tweak and a tighter maximum
length (56 digits at radix 10). The original FF3 was withdrawn after a 2017
cryptanalysis; FF3-1 is the corrected variant.
Both require a minimum domain size: at least 100 values under SP 800-38G,
raised to one million (6 decimal digits) in the draft Revision 1.
Both use AES with a 128-, 192-, or 256-bit key inside the round function. Known
attacks target small domains and mishandled tweaks, so keep the domain large and
vary the tweak per context; avoid rolling your own mode.
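
To make "Feistel construction" concrete, here is a toy sketch of an alternating Feistel network over decimal digits. It is not FF1 or FF3-1: the round function uses HMAC-SHA-256 instead of the AES-based PRF the standard defines, and the names `toy_encrypt`, `toy_decrypt`, and `_round_value` are made up for illustration. It only shows why the output keeps the input's length and alphabet; do not use it on real data.

```python
import hashlib
import hmac


def _round_value(key: bytes, tweak: bytes, round_no: int, half: str, width: int) -> int:
    """Pseudorandom integer in [0, 10**width) derived from the other half."""
    msg = tweak + b"|" + bytes([round_no]) + b"|" + half.encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def _check(digits: str, rounds: int) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("need at least 2 ASCII decimal digits")
    if rounds % 2:
        raise ValueError("round count must be even")


def toy_encrypt(key: bytes, tweak: bytes, digits: str, rounds: int = 10) -> str:
    _check(digits, rounds)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(rounds):
        width = u if i % 2 == 0 else v
        # Add a keyed function of one half to the other, modulo 10**width,
        # then swap. Modular addition keeps every value inside the digit domain.
        c = (int(a) + _round_value(key, tweak, i, b, width)) % (10 ** width)
        a, b = b, f"{c:0{width}d}"
    return a + b


def toy_decrypt(key: bytes, tweak: bytes, digits: str, rounds: int = 10) -> str:
    _check(digits, rounds)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(rounds)):
        width = u if i % 2 == 0 else v
        # Undo one round: subtract the same keyed value and swap back.
        prev_a = (int(b) - _round_value(key, tweak, i, a, width)) % (10 ** width)
        a, b = f"{prev_a:0{width}d}", a
    return a + b
```

Calling `toy_encrypt(key, b"ssn-v1", "123456789")` returns another 9-digit string, and `toy_decrypt` with the same key and tweak recovers the input. The tweak enters every round, which is why changing it changes the whole permutation.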
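3. Example: FFX-Style FPE in Python (pyffx)
Note that pyffx implements a generic FFX Feistel scheme with an HMAC-based round
function, not NIST FF1, and it exposes no tweak parameter. Treat it as a stand-in
for prototyping and use a vetted FF1 implementation where compliance matters.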
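import hashlib
import hmac
import string

from pyffx import Integer, String

KEY = bytes.fromhex("0123456789abcdef" * 2)  # 16-byte demo key; never hard-code real keys

def _field_key(tweak: bytes) -> bytes:
    # pyffx has no tweak input, so emulate domain separation by deriving
    # a per-field key from the master key and the tweak.
    return hmac.new(KEY, tweak, hashlib.sha256).digest()

def encrypt_ssn(ssn: str, tweak: bytes = b"ssn-v1") -> str:
    # Treat the SSN as a 9-digit integer so the ciphertext is also 9 digits.
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected an SSN of the form NNN-NN-NNNN")
    cipher = Integer(_field_key(tweak), length=9)
    enc_str = f"{cipher.encrypt(int(digits)):09d}"  # re-pad leading zeros
    return f"{enc_str[:3]}-{enc_str[3:5]}-{enc_str[5:]}"

def encrypt_account(name: str, tweak: bytes = b"acct-v1") -> str:
    # Alphabetic handle that must remain alphabetic (lowercased first).
    cipher = String(_field_key(tweak), alphabet=string.ascii_lowercase, length=len(name))
    return cipher.encrypt(name.lower())

print(encrypt_ssn("123-45-6789"))   # e.g. "482-19-3071"
print(encrypt_account("acmecorp"))  # e.g. "rjkqwhfm"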
Because FPE is deterministic for a given (key, tweak), the same SSN always produces
the same pseudonym — which is exactly what you want for joins. That also means
it leaks equality: if the attacker sees the ciphertext of a known SSN, they can
match other ciphertexts of the same value. See section 5 for when this matters.
4. Tweaks & Key Management
Tweak = domain separator. Use distinct tweaks per field
(ssn-v1, acct-v1) so the same plaintext in two columns
encrypts to different ciphertexts.
Per-environment keys. The key that seeds the pseudonyms in
staging must differ from any prod key; otherwise staging pseudonyms are identical
to prod pseudonyms, and anyone holding the staging key can decrypt prod data.
Envelope encryption. Wrap FPE keys with a KMS CMK; log every
key-release event.
Rotation. Rotation invalidates existing pseudonyms. Plan for
cutover windows or keep the old key available for decryption only.
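
The tweak and rotation rules above can be sketched as a small keyring. This is a minimal illustration under stated assumptions: `KeyVersion`, `Keyring`, and `field_tweak` are hypothetical names, the key bytes are assumed to have been unwrapped by your KMS already, and the key version is assumed to be stored next to each pseudonym so the right key can be found later.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyVersion:
    version: int
    key: bytes                  # raw FPE key, already unwrapped by the KMS
    decrypt_only: bool = False  # retired keys stay readable but never encrypt


class Keyring:
    def __init__(self, versions):
        self._by_version = {kv.version: kv for kv in versions}

    def for_encrypt(self) -> KeyVersion:
        active = [kv for kv in self._by_version.values() if not kv.decrypt_only]
        if len(active) != 1:
            raise ValueError("expected exactly one active encryption key")
        return active[0]

    def for_decrypt(self, version: int) -> KeyVersion:
        # Raises KeyError for unknown versions instead of guessing.
        return self._by_version[version]


def field_tweak(field: str, version: int) -> bytes:
    """Domain separator such as b"ssn-v1": distinct per field and scheme version."""
    return f"{field}-v{version}".encode()


# Placeholder keys for the example; real keys would come from the KMS.
ring = Keyring([
    KeyVersion(1, secrets.token_bytes(16), decrypt_only=True),
    KeyVersion(2, secrets.token_bytes(16)),
])
```

New pseudonyms are produced with `ring.for_encrypt()`, while rows written before the rotation are still reversible through `ring.for_decrypt(1)`. Joins only match between rows pseudonymized under the same key version and tweak, which is why a cutover window is needed.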
5. FPE vs Tokenization vs Deterministic AEAD
Tokenization (vault-based): random token, map stored
in a vault. The token has no mathematical relationship to the value, so nothing
can be recovered without the vault. Requires a lookup service on
the reversal path.
FPE — no vault needed, reversal is cryptographic.
Preserves format. Leaks equality (same input → same output).
Deterministic AEAD (AES-SIV) — preserves equality but
not format; ciphertext is longer than plaintext.
Choose FPE only when format preservation is the requirement. For most
production redaction, the vault-based tokenizer described on the
PII redaction page
is the safer default.
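
For contrast with FPE, here is a minimal sketch of the vault-based approach. The `TokenVault` class is hypothetical, and the in-memory dictionaries stand in for a real, access-controlled vault service. The point is that the mapping lives in storage rather than in a key, so reversal is a lookup, not a decryption.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a tokenization vault (illustrative only)."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        if not (value.isascii() and value.isdigit()):
            raise ValueError("this sketch only handles digit strings")
        # Reuse the existing token so the same value always maps the same way.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # Draw random digits of the same length until an unused token is found.
        # A real vault must also cap retries: tiny domains can run out of tokens.
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token not in self._value_by_token:
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Reversal needs the vault; there is no key that recovers the value.
        return self._value_by_token[token]
```

Because this sketch reuses one token per value, it still reveals equality between records, just as FPE does; what it removes is any cryptographic path from token back to value.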
6. Gotchas
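Small domains are weak: FPE on a 4-digit PIN has only
10 000 possible ciphertexts, so an attacker who can obtain plaintext/ciphertext
pairs can tabulate the whole mapping without ever recovering the key. The draft
SP 800-38G Revision 1 requires a domain of at least one million values.
Luhn preservation requires care: apply FF1 to the first 15
digits and recompute the check digit rather than applying FF1 to all 16; otherwise
roughly nine in ten outputs fail card validation. To decrypt, drop the check
digit, decrypt the 15 digits, and recompute it.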
Not a substitute for access control — FPE ciphertexts
are still regulated data. Treat them as PII in logs, backups, and third-party
integrations.
Audit every decryption. A decrypt-to-plaintext path exists;
that is the riskiest operation and must be logged with actor, tweak, and purpose.
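
The Luhn gotcha above can be sketched as follows. The Luhn arithmetic is standard; `encrypt_digits` and `decrypt_digits` are hypothetical callables standing in for whatever FPE routine you use on the 15-digit body, so this shows only the check-digit handling, not the encryption itself.

```python
from typing import Callable


def luhn_check_digit(payload: str) -> str:
    """Check digit for a digit string that does not yet include one."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # the rightmost payload digit is doubled, then every other one
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    return (
        len(number) >= 2
        and number.isascii()
        and number.isdigit()
        and luhn_check_digit(number[:-1]) == number[-1]
    )


def encrypt_pan(pan: str, encrypt_digits: Callable[[str], str]) -> str:
    if not luhn_valid(pan):
        raise ValueError("input is not a Luhn-valid card number")
    body = encrypt_digits(pan[:-1])  # FPE over everything except the check digit
    if len(body) != len(pan) - 1 or not body.isdigit():
        raise ValueError("FPE output did not preserve the digit format")
    return body + luhn_check_digit(body)


def decrypt_pan(token: str, decrypt_digits: Callable[[str], str]) -> str:
    if not luhn_valid(token):
        raise ValueError("token is not Luhn-valid")
    body = decrypt_digits(token[:-1])  # the check digit carries no information
    return body + luhn_check_digit(body)
```

Since the check digit is always recomputed, it adds no entropy: a 16-digit card number has an effective FPE domain of 15 digits, or fewer if you also keep the issuer prefix or last four digits in the clear.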