Automated Test Vectors for Cryptography Libraries

Filippo Valsorda recently published an article on Accumulated Test Vectors. It is a novel approach to test automation in a field where it seemed there was not much left to say, which makes it worth a closer look.

Looking deeper, this unique approach to test automation seems a surprisingly good fit for crypto libraries. Test automation specialists from other walks of life, however, may instinctively feel there is something wrong with the approach.

What Was Being Tested?

The new Post-Quantum Cryptography standard from NIST, ML-KEM (FIPS 203), is being implemented; the parameter set in question is ML-KEM-768. A little bird told me it could be in Go 1.24 in February ‘25. Fundamentally, it’s a new first-choice public key system for the key exchange in TLS. Exciting times, not seen since elliptic curve cryptography dropped in about 2013.

The algorithm is implemented from a published standard. Go’s implementation already exists, but better test automation makes refactoring safer, and allows independent implementations to be checked against one another for interoperability.
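
For orientation, here is a minimal sketch of what the caller’s side of a KEM looks like. It assumes the crypto/mlkem package as it appears in Go 1.24; the type and method names are my reading of that package’s documentation, not anything from Filippo’s article, so treat the sketch as illustrative.

package main

import (
    "bytes"
    "crypto/mlkem" // assumed: the ML-KEM package shipped with Go 1.24
    "fmt"
)

func main() {
    // The recipient generates a key pair and shares the public encapsulation key.
    dk, err := mlkem.GenerateKey768()
    if err != nil {
        panic(err)
    }
    ek := dk.EncapsulationKey()

    // The sender derives a fresh shared secret and a ciphertext to transmit.
    sharedKey, ciphertext := ek.Encapsulate()

    // The recipient recovers the same shared secret from the ciphertext.
    recovered, err := dk.Decapsulate(ciphertext)
    if err != nil {
        panic(err)
    }
    fmt.Println("shared secrets match:", bytes.Equal(sharedKey, recovered))
}

The point is simply that one party encapsulates a fresh shared secret against a public key, and the other recovers it from the ciphertext.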

What Is an Accumulated Test Vector?

This next part may seem a little wild for test automation aficionados, but there is method in the madness. To simplify, you run your ML-KEM implementation 10,000 times, with every input generated by a deterministic, variable-length hash function (SHAKE128), so every implementation sees exactly the same inputs.

Each output is then written into a second, accumulator hash, so every result is absorbed into one running digest.

After 10,000 iterations, your accumulated hash should come out as 8a518cc63da366322a8e7a818c7a0d63, the value obtained from the reference implementations.
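
As a minimal sketch of that pattern in Go, independent of ML-KEM (runAlgorithm is a hypothetical stand-in for whatever is under test; golang.org/x/crypto/sha3 provides SHAKE128):

package vectors

import (
    "encoding/hex"

    "golang.org/x/crypto/sha3"
)

// runAlgorithm is a hypothetical stand-in for the algorithm under test.
func runAlgorithm(input []byte) []byte { return input }

// AccumulatedDigest feeds deterministic inputs to the algorithm and folds
// every output into a single running SHAKE128 digest, returned as hex.
func AccumulatedDigest(iterations, inputLen int) string {
    inputs := sha3.NewShake128()      // deterministic source of test inputs
    accumulator := sha3.NewShake128() // running digest of every output

    buf := make([]byte, inputLen)
    for i := 0; i < iterations; i++ {
        inputs.Read(buf)                     // next deterministic input
        accumulator.Write(runAlgorithm(buf)) // absorb the output
    }

    digest := make([]byte, 16)
    accumulator.Read(digest)
    return hex.EncodeToString(digest)
}

The whole test then reduces to comparing that one hex string against the published value.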

Why This Seems So Wrong

Imagine an edge-case in your cryptography algorithm breaks during refactoring. What test output do you get? The stated purpose of the specific edge-case test that failed? A detailed description of why that test case was special? A clear indication of how many test cases are failing??

No, you get a different hash. What you do with that is your problem. Or perhaps the Go team’s problem.

Here are some natural objections that a testing library author or unit testing connoisseur might have:

  1. Many individual test cases aren’t interesting, and are testing the same thing
  2. Because each test case isn’t labeled with a reason for its existence, the error message doesn’t tell you what you broke, only that you broke something
  3. In the event of a test failure, you can’t even tell whether it was one test case that failed, or all of them - only that the accumulated result was different
  4. The output being 16 bytes in size, you can’t infer anything from it about what happened
  5. If the requirements change, there is no good way to alter the test suite to reflect new expectations as a first step.

Why It Is Really So Right

Those may sound like reasonable objections coming from a mere web developer like me, but let’s bear in mind what we’re testing: a cryptography library that is supposed to implement a certain standard. We might reach the following conclusions:

  1. The search space of inputs is massive, and this is giving pretty good coverage of any edge-cases
  2. That just means the cryptographer may have more debugging to do, but that’s their decision
  3. See 2.
  4. See 2.
  5. The requirements are not going to change.

As I follow the world’s cryptographers across various platforms, like a pigeon with one knobbly foot hungrily hobbling after some pedestrians who are eating crumbly pastries, their conclusion is clear: there is great value to this approach. Filippo himself said (and I don’t remember where) words to the effect of:

“My business is high assurance software. Checking I’m not introducing bugs is the priority. Helping me with my debugging is secondary to that.”

Test Algorithm in More Detail

For those who are interested, here is some Go-ish pseudocode explaining the test algorithm in more detail:

// `shakeRandom` is a SHAKE128 variable length hash with empty input
// Used as a Random source
var shakeRandom sha3.ShakeHash = sha3.NewShake128() 
shakeRandom.Write([]byte(""))

// `shakeAccumulator` is another SHAKE128 hash
// Used as an accumulator
var shakeAccumulator sha3.ShakeHash = sha3.NewShake128()

// The subject under test is an ML-KEM key encapsulation system
var mlkem MLKEM = mlkem.New()

// Accumulate 10,000 hashes into a single, testable hash
for i := 0; i < 10000; i++ {
    randomSeed := shakeRandom.Read(64) // 64 bytes: the d and z seeds from FIPS 203, 32 bytes each
    ek, dk := mlkem.KeyGen_internal(randomSeed) // encapsulation key, decapsulation key
    shakeAccumulator.Write(ek)
    message := shakeRandom.Read(32)
    ct, k := mlkem.Encaps_internal(ek, message) // ciphertext and shared key
    shakeAccumulator.Write(ct)
    shakeAccumulator.Write(k)
    result := mlkem.Decaps_internal(dk, ct)

    // Having encapsulated and decapsulated the key, should get the same value
    if string(result) != string(k) {
        t.Errorf("Expected %x but got %x", expected, output)
    }

    invalidCiphertext := shakeRandom.Read(1088) // random bytes the length of an ML-KEM-768 ciphertext
    rejectionKey := mlkem.Decaps_internal(dk, invalidCiphertext) // implicit rejection yields a pseudorandom key
    shakeAccumulator.Write(rejectionKey)
}

accumulatedResult := shakeAccumulator.Read(16) // 16 bytes = 32 hex characters
if fmt.Sprintf("%x", accumulatedResult) != "8a518cc63da366322a8e7a818c7a0d63" {
    t.Errorf("Accumulated hash was %x, want 8a518cc63da366322a8e7a818c7a0d63", accumulatedResult)
}
}
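
One practical note: sha3.ShakeHash in golang.org/x/crypto/sha3 exposes the standard io.Reader interface rather than the Read(n) convenience used above, so a real test (say, a func TestAccumulated(t *testing.T); the name is mine) would wrap it in a small helper along these lines:

// readN is a hypothetical helper: it squeezes exactly n bytes out of a SHAKE instance.
func readN(h sha3.ShakeHash, n int) []byte {
    buf := make([]byte, n)
    h.Read(buf) // ShakeHash.Read fills the whole slice and never returns an error
    return buf
}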

Info

Glossary:

  • SHAKE128: A hash function that lets you choose the length of the output (see the short example after this list)
  • ML-KEM-768: The subject under test, a post-quantum public key encapsulation mechanism
  • KeyGen_internal: An internal key generation function of the ML-KEM algorithm
  • Encaps_internal: An internal function for encapsulating keys within the ML-KEM algorithm
  • Decaps_internal: An internal decapsulation function that attempts to retrieve the encapsulated key using the private key
  • ek: Encapsulation Key
  • dk: Decapsulation Key
  • ct: Ciphertext
  • k: The shared secret key generated by the KEM process
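
As a quick illustration of what “choose the length of the output” means in practice, again using golang.org/x/crypto/sha3:

package main

import (
    "fmt"

    "golang.org/x/crypto/sha3"
)

func main() {
    // SHAKE128 absorbs the input once, then lets the caller squeeze out
    // as many output bytes as they like.
    h := sha3.NewShake128()
    h.Write([]byte("hello"))

    short := make([]byte, 8)
    long := make([]byte, 32)
    h.Read(short) // first 8 bytes of the output stream
    h.Read(long)  // the next 32 bytes, continuing where `short` left off
    fmt.Printf("%x\n%x\n", short, long)
}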

Filippo Valsorda has written a test vector collection called CCTV (Community Cryptography Test Vectors) to implement this properly.