Trevor Perrin

2017-06-27 16:40:05 UTC

There are different generalizations of Ed25519, e.g. "EdDSA" from the

Ed25519 authors, "EdDSA" from RFC 8032, XEdDSA/VXEdDSA, and SchnorrQ.

Despite all this I think there's room for improvement. So I'm

thinking of factoring the XEdDSA spec into a more generalized EdDSA

framework, with a few ideas:

(A) "Synthetic nonces" based on hashing the message, private key, and

some randomness, to combine the security benefits of randomization

with "deterministic"-style nonces.

(B) A signing function that takes a user-specified private scalar

(instead of Ed25519-style key derivation) to support extensions like

XEdDSA where signing uses an existing X25519 private key; or Bitcoin's

Hierarchical Deterministic key derivation.

(C) Features to help extension to other Schnorr signature variants (like VRFs).

(D) A naming scheme and simple rules for combining curves and hashes, e.g

- EdDSA_25519_SHA512 = compatible with Ed25519

- VEdDSA_25519_SHA512 = VRF extension

- EdDSA_448_BLAKE2b

- VEdDSA_P384_SHA3/512, etc

(E) XEdDSA would be a thin wrapper around this generalized EdDSA,

e.g. "XEdDSA_25519_SHA512" would just be "EdDSA_25519_SHA512" verified

with an X25519 public key, and created with an X25519 private key.

I'm looking for feedback, and wondering if anyone else would use this.

Below I'll quickly sketch the algorithms, then give more rationale.

Algorithm sketch

=================

Recall Ed25519:

# (K,k) : Signer's (public key, private scalar)

# X : Signer's master secret key, byte sequence of length 32

# M : Message, byte sequence of arbitrary length

# N : Nonce derivation key, byte sequence of length 32

# (R,r) : Nonce (R = public value aka "commitment", r = private value)

# B : Elliptic curve base point of order q

# q : Subgroup order

# h : Schnorr challenge

# hash() : SHA-512

ed25519_sign(K, X, M):

k || N = hash(X)

r = hash(N || M) (mod q)

R = r*B

h = hash(R || K || M) (mod q)

s = r + (k*h) (mod q)

return R || s

As a first step we'll turn this into what I'll call a "synthetic

nonce" EdDSA signing algorithm, for goals (A) and (B):

# hash() : Any cryptographic hash with HASHLEN at least 64 bits larger

than bitlen(q).

# Z : Output from an RNG, byte sequence of length 32

# empty_labelset : byte sequence of length 3 = (0x02, 0x00, 0x00),

explained later

# pad1, pad2: padding filled with zero bytes to align the next hash

input with a hash block (128-byte alignment for SHA-512).

synthetic_eddsa_sign((K,k), Z, M):

r = hash(B || empty_labelset || Z || pad1 || k || pad2 ||

empty_labelset || K || M) (mod q)

R = r*B

h = hash(R || K || M) (mod q)

s = r + (k*h) (mod q)

return R || s

So this is the same as Ed25519 except it takes a private scalar k as

argument, and performs a different nonce derivation. To roughly

explain the nonce derivation (more later):

* The secret nonce r must be uniformly random to an attacker, and

must take a different value whenever h does. For security redundancy,

this is done in two ways: first, by hashing random data (Z); second,

by hashing the same inputs that are hashed into h (K and M) plus the

secret key k. We'll discuss the value of this redundancy later.

* B is hashed at the beginning to provide an indepent random oracle

from the challenge hash h=hash(R...). Also explained more later.

* The labelset is used to provide independent random oracles in more

complex protocols. It can contain identifiers for the protocol, a

user-specified customization label, and identifiers for the hash

instance. For collision-resilience we hash the labelset a second time

after the secret values (k and Z) have been mixed in.

* k is hashed in its own hash block (due to padding) to provide a

"cascade" PRF as in HMAC or AMAC: If Z is a random secret value, then

r is the output of a PRF keyed by Z. If Z is public (e.g. a bad RNG)

then r is the output of a PRF keyed by k.

To make this more extensible, we'll rewrite it and expand the "labelset" idea:

generalized_eddsa_sign((K,k), Z, M, customization_label):

labelset = new_labelset("", customization_label)

R, r = commit(labelset, "", (K,k), Z, M)

h = challenge(labelset, "", R, K, M)

s = prove(r, k, h)

return R || s

commit(labelset, extra, (K,k), Z, M):

r = hash(B || labelset || Z || pad1 || k || pad2 || labelset || K

|| extra || M) (mod q)

R = r*B

return (R, r)

challenge(labelset, extra, R, K, M):

if is_labelset_empty(labelset):

return hash(R || K || M) (mod q)

else:

return hash(B || labelset || R || labelset || K || extra || M) (mod q)

prove(r, k, h):

return r + (k*h) (mod q)

new_labelset(protocol_name, customization_label):

labelset[0] = 2

labelset = labelset || len(protocol_name) || protocol_name

labelset = labelset || len(customization_label) || customization_label

is_labelset_empty(labelset):

return len(labelset) == 3

A labelset contains a sequence of byte-strings, with the "empty"

labelset containing two zero-length byte strings for the protocol name

and a customization label.

Generalized EdDSA would use an empty protocol name and customization

label by default, thus would be compatible with Ed25519, since the

challenge would be h=hash(R || K || M).

A non-empty labelset modifies the challenge hash: The labelset is

hashed following B for domain separation, and is repeated following R

for collision resilience. Thus even if an attacker finds 2 colliding

labelsets where hash(B || labelset1) == hash(B || labelset2), a

signature under labelset1 won't verify under labelset2.

Non-empty labelsets also indicate a more complex protocol that might

bind "extra" data into the Schnorr challenge. For example, below is a

"VRF" extension ("VEdDSA") which provides a signature with the same

security properties as EdDSA, but where every signature is associated

with a VRF output which is an unpredictable but verifiably-unique

function of the message and keypair.

# c = cofactor (i.e. 8 for Ed25519)

# v = VRF output, byte sequence of length 32

# Elligator2() maps a 256-bit value to an Ed25519 point

generalized_veddsa_sign((K,k), Z, M, customization_label):

labelset = new_labelset("VEdDSA_25519_SHA512_Elligator2",

customization_label)

labelset1 = add_label(labelset, "1")

labelset2 = add_label(labelset, "2")

labelset3 = add_label(labelset, "3")

labelset4 = add_label(labelset, "4")

Bv = c*Elligator2(hash(B || labelset1 || K || M)[...32])

Kv = k * Bv

R, r = commit(labelset2, (Bv || Kv), (K,k), Z, M)

Rv = r * Bv

h = challenge(labelset3, (Bv || Kv || Rv), R, K, M)

s = prove(r, k, h)

v = hash(B || labelset4 || c*Kv)[...32]

return (Kv || h || s), v

add_label(labelset, new_label):

labelset[0]++

return labelset || len(new_label) || new_label

Note that the added hashes for Bv and v use the form hash(B ||

labelset || ...), which allows them to be analyzed as independent

random oracles.

Note also that we're careful to bind the new values Bv, Kv, and Rv as

"extra" data in the Schnorr challenge. We also include Bv and Kv as

"extra" data in nonce derivation, following the "deterministic"

strategy of making the nonce depend on the same inputs as the

challenge hash. We could use extra_data for additional extensions,

e.g. adding a trapdoor commitment to add a "designated verifier"

property.

To extend all this another way: XEdDSA and "XVEdDSA" signatures could

be seen as EdDSA and VEdDSA signatures that are created and verified

with Montgomery-form keys (X25519 or X448):

generalized_xeddsa_sign(k_mont, Z, M, customization_label):

K, k = calculate_key_pair(k_mont)

return generalized_eddsa_sign((K,k), Z, M, customization_label)

generalized_xveddsa_sign(k_mont, Z, M, customization_label):

K, k = calculate_key_pair(k_mont)

return generalized_veddsa_sign((K,k), Z, M, customization_label)

I've skipped over a lot of things, so more rationales below:

Rationales

===========

Nonce randomization

--------------------

Traditionally discrete-log signatures choose r from an RNG. (By

"discrete-log signatures" meaning both Schnorr signatures like EdDSA,

and DSA/ECDSA which calculate s differently).

"Deterministic" signatures have been popularized by Ed25519 for

Schnorr, and RFC 6979 for DSA/ECDSA. In a deterministic signature the

nonce is a hash or PRF of the message and some secret key which is

unique for each signer (e.g. "N" in Ed25519, or the signer's private

scalar (our "k") in RFC 6979).

Deterministic signatures are intended to be more robust, since they're

not vulnerable to RNG failures. Even a small RNG failure could be

catastrophic:

(a) If a nonce r becomes known to an attacker, the signer's private

scalar is solvable as k = (s-r)/h.

(b) If r repeats with different h, the private scalar is solvable as

k = (s1-s2)/(h1-h2).

(c) If the attacker has a few bits of information about many nonces,

the private scalar might be solved with more advanced techniques

[NONCE].

RNG failures happen, so making the nonce a function of the message and

a secret key is a good idea. However randomization *also* provides

benefits:

(1) Faulty computations could be caused by attackers glitching clock

or power lines, or other tampering. Without randomization, any fault

in the calculation R=r*B, or h=hash(...), while signing the same

message repeatedly, would give multiple signatures with the same r but

different h. This would leak the private scalar as in (b), provided

the attacker can calculate or guess the faulty h [FAULT]. With

randomization, I believe fault attacks are less effective, for

example:

- Flipping one bit in the private scalar to leak the private scalar

in hundreds of faulty signatures [FAULT1]

- Randomizing one byte of the private scalar to leak the private

scalar in thousands of faulty signatures [FAULT2]

- Skipping scalarmult instructions to leak the private scalar in

tens of faulty ECDSA signatures [FAULT3]

(2) Without randomization, the algorithm is fragile to modifications

and misuse. In particular, modifying it to add an extra input to

h=... without also adding the input to r=... would leak the private

scalar if the same message is signed with a different extra input. So

would signing a message twice, once passing in an incorrect public key

K (though the synthetic-nonce algorithm fixes this separately by

hashing K into r).

(3) Without randomization, an attacker can trigger repeated

calculations using the same inputs. Consider a sidechannel attacker

recording traces of the signer's power consumption or EM radiation.

This attacker might request repeated signing of the same message in an

attempt to average out noise and get information about secret inputs

in the calculations of r, R, or s.

Of course, there are other ways besides randomization to mitigate

fault and sidechannel risk. Verifying a new signature before

returning it, or performing the signature twice and comparing, would

help against faults, but makes the signing operation a few times

slower. The R = r*B calculation could be "blinded" in different ways

to reduce sidechannel risk, but this probably involves either a

larger, slower scalar, e.g. R = (r+qj)B for random j; or calculating

new blinding / unblinding factors frequently. So random nonces appear

to be a simple and efficient way to increase protection.

Given this, I'll argue that randomized vs. deterministic is a false

dichotomy here. For private-key protection it's not important that

nonces be deterministic based on the message; nonces just have to

differ for different "h" values. Mixing (key, message) into the nonce

can provide this when an RNG fails, and an RNG can provide this when

mixing (key, message) fails. Combining these approaches yields what

we could call a "synthetic nonce" (by analogy with "synthetic IV"

authenticated encryption).

Combining these approaches isn't novel. Adam Langley switched

OpenSSL's DSA/ECDSA to combined hashing of RNG output and private

scalar in 2013, and this remains in OpenSSL plus the BoringSSL and

LibreSSL forks [OPENSSL].

Let's consider two counter-arguments:

* DJB has pointed out that a malicious RNG might inspect memory and

choose RNG outputs to bias the nonce and leak the private key as in

(c) above [ENTROPY]. I'd argue that's a far-fetched threat which is

outweighed by the above risks.

* It could be argued that determinism is valuable apart from

private-key protection, e.g. to help testing. The above signing

functions specify exactly how the random input Z is used, so are

deterministic and testable.

Hashing private scalar into nonce

----------------------------------

Having decided to mix a function of (secret key, message) into the

nonce, what secret key to use?

* The signer's private scalar k as in OpenSSL, XEdDSA, RFC 6979, and

libsecp256k1?

* A separate key N, as in Ed25519?

Using N has a clearer security analysis: We can consider nonce

derivation as a PRF with key N, so there's not the "circularity" of

deriving r and h from k, then using r to mask s=r+kh.

However dealing with N complicates the API, since the signer has to

either store and pass N to the signing function, or pass in a single

"master key" which derives both k and N. Ed25519 uses such a master

key. However, that means Ed25519 can't be used with an external

private scalar, such as an X25519 scalar for XEdDSA, or a derived

scalar as in Bitcoin's BIP32 Hierarchical Deterministic key

derivation.

For a generalized EdDSA I think a simple and flexible API which

supports external scalars is best, thus the signing function mixes in

k. In practice, I think this has security equivalent to using N:

* The main Schnorr security proof is the Pointcheval-Stern proof in

the Random Oracle Model. In the ROM it doesn't matter whether we

query the Random Oracle with k or N.

* Hashing the scalar has seen wide deployment for DSA/ECDSA (e.g.

OpenSSL, and Bitcoin's libsecp256k1). While DSA/ECDSA isn't identical

to Schnorr, the main relevant difference seems to be lack of a ROM

proof for DSA/ECDSA. If people are comfortable with this construct in

DSA/ECDSA (without a proof), I think they should be comfortable in the

Schnorr case (with a proof).

* With synthetic nonces, nonce derivation is a PRF keyed by the

random input Z. Thus mixing k (or N) can be viewed as a backup

strategy, which is only relevant if the RNG fails.

* I'm finding it hard to come up with a plausible hash weakness that

is likely to be insecure with k but secure with N. (Anyone?)

If someone prefers using a nonce derivation key N, that's still easy

with a pre-processing step:

Z' = hash(N || pad || Z)[...32]

generalized_eddsa_sign(..., Z', ...)

For example, here's a synthetic-nonce Ed25519 which uses an Ed25519

"master key" X:

generalized_ed25519_sign(K, X, Z, M):

k || N = hash(X)

Z' = hash(N || pad || Z)[...32]

return generalized_eddsa_sign((K,k), Z', M, "")

Choice of hash

---------------

Ed25519 and the RFC 8032 variants tie each curve to a single hash

function (e.g. SHA-512 for Ed25519, and SHAKE256 for Ed448). They

also specify hash outputs of double the public key size, requiring an

unnecessary and unusual 912-bit hash for Ed448 signatures.

I think it would be better to allow flexible combination of hashes and

EC groups (and even non-EC groups, such as DSA domain parameters).

The hash would be required to have output length at least 64 bits

larger than the subgroup order q. The extra 64 bits would ensure that

any bias in r and h, after the hash output is reduced by q, is very

small. This would also match the nonce-derivation requirement in FIPS

186-4 section B.5.1. Finally, this would allow common 512-bit hash

functions with Curve448 and smaller curves.

Naming

-------

The "EdDSA" name should probably be kept as part of a generalized

naming scheme. This name has become associated with Schnorr plus

hashing the message into nonce.

Specifying the group and hash in the name makes the combinations

explicit. Consider the below group and hash names, and examples.

"25519" = Edwards encoding of Curve25519 from RFC 7448

"448" = Edwards encoding of Curve448 from RFC 7448

"Ed448" = Edwards encoding of Edwards448 from RFC 7448

"Ed448decaf" = Decaf encoding of Ed448

"FourQ" = Compressed encoding of MSR's FourQ

"secp256k1" = Compressed encoding of Bitcoin's secp256k1

"P256" = Compressed encoding of NIST P-256

"P384" = Compressed encoding of NIST P-384

"P521" = Compressed encoding of NIST P-521

"SHA512" = SHA-512

"BLAKE2b" = BLAKE2b with output 512 bits

"SHA3/512" = SHA3-512

"SHAKE256/N" = SHAKE256 with output N bits

Examples:

EdDSA_25519_SHA512 (compatible with Ed25519)

EdDSA_FourQ_BLAKE2b

XEdDSA_448_SHA512

VEdDSA_Ed448decaf_SHAKE256/512

XVEdDSA_25519_BLAKE2b

EdDSA_P384_SHA512

VEdDSA_P521_SHAKE256/640

Labelsets

----------

Labelsets have a few uses:

* A user might want to create EdDSA, VEdDSA, and other types of

signatures from the same key pair. These different variants should

use hashes that can be modelled as independent random oracles, so that

signatures of one type can't be interpreted as, or affect the security

of, other types.

* A protocol might involve several hash calls which should be

independent random oracles (as in the VRF case).

* A user might want to customize signatures so they are bound to a

specific application context.

The "customization labels" use (3rd bullet) has debatable value. RFC

8032 supports them ("contexts"), but I'm not aware of protocols

planning to use them. Because we'd have a label mechanism it would be

easy to experiment with customization labels, and leave them empty if

they don't prove useful.

Labelsets aren't compatible with RFC 8032, but I don't think the RFC

8032 "context" mechanism is that good:

* RFC 8032 contexts are inconsistent between Ed25519 and Ed448:

Ed25519 doesn't support contexts; Ed448 does, and is analogous to

Ed25519ctx (not Ed25519), except with Ed25519ctx one "SHOULD NOT" use

zero-length contexts. Also, while Ed25519ctx and Ed448 both use

special prefixes for domain separation in the hash function, the

details are different.

* The RFC 8032 mechanism doesn't provide a space for protocol names

or additional labels.

* RFC 8032 contexts have weaker security than message contents

because they are hashed before R, and thus a hash collision between a

pair of contexts (C1, C2) would allow a message signed under context

C1 to be verified under context C2. Schnorr is normally more

resilient to hash collisions due to R randomizing the hash

calculation.

Using hash(B || labelset || ...) with a unique labelset provides an

independent random oracle, because:

* Uniqueness of labelsets provides domain-separation between hashes

of the form hash(B || labelset || ...).

* Assuming hash(B || labelset || ...) is used for all new hashing in

these signatures, the only non-domain-separated case is the EdDSA

challenge hash(R || K || M), and hash(B || labelset || ...) if B=R.

These ROM hash queries can be answered by the latter (non-empty

labelset) oracles without affecting the EdDSA ROM security proof for

hash(R || K || M), because:

- A simulator for the EdDSA signing oracle won't need to program

the hash oracle for the case R=B, since the simulator can choose R at

random.

- An extractor for the EdDSA private key will also never need to

program this case, since a forgery with R=B allows the private key k

to be extracted directly via:

R = sB - hK (verification equation)

B = sB - hK (R = B)

1 = s - hk

k = (s-1)/h

Conclusion

===========

I'd love any feedback on why people would or wouldn't use this,

thoughts, ideas for improvement, etc.

Thanks to feedback from this list, in particular Brian Smith, for

ideas and motivation about refactoring.

Thanks also to David Wu, Henry Corrigan-Gibbs, and Samuel Neves, for

feedback and discussion.

References

===========

[NONCE]

https://eprint.iacr.org/2014/434.pdf

https://eprint.iacr.org/2013/346.pdf

[FAULT]

Observation due to Benedikt Schmdit, 2016, private conversation.

Andrey Jivsov had earlier made a similar, though not identical,

observation:

https://www.ietf.org/mail-archive/web/cfrg/current/msg06759.html

Another paper in 2016 made a similar observation, but mistakenly

thought Ed25519 was resistant since the nonce key N wouldn't leak:

https://link.springer.com/chapter/10.1007%2F978-3-319-44524-3_11

[FAULT1]

https://www.researchgate.net/publication/221291537_Breaking_Public_Key_Cryptosystems_on_Tamper_Resistant_Devices_in_the_Presence_of_Transient_Faults

[FAULT2]

"Fault attacks on Signature Schemes"

ftp://nozdr.ru/biblio/kolxo3/Cs/CsLn/Information%20Security%20and%20Privacy,%209%20conf.,%20ACISP%202004(LNCS3108,%20Springer,%202004)(ISBN%203540223797)(504s).pdf#page=489

[FAULT3]

"A Fault Attack on ECDSA"

https://dl.acm.org/citation.cfm?id=1731211

[OPENSSL]

https://github.com/openssl/openssl/commit/8a99cb29d1f0013243a532bccc1dc70ed678eebe

https://github.com/openssl/openssl/commit/190c615d4398cc6c8b61eb7881d7409314529a75

[ENTROPY] https://blog.cr.yp.to/20140205-entropy.html