Björn Haase

2017-08-01 21:35:52 UTC

Thank you, Mike, for your reply!

conclusion that the speedup factor of ~2 reported by Liu, Longa,

Pereira, Reparaz and Seo [LLPRS] in https://eprint.iacr.org/2017/434.pdf

are realistic for the AVR and MSP430. I moreover expect that this might

hold also for the M0 target (not reported by LLPRS).

It must, however, be clear that one buys this with significantly larger

RAM requirements (that is most precious, particularly in the very small

targets) and I believe that it will also be much harder to protect FourQ

against side channel attacks, especially regarding fault injection,

without loosing the speed advantage.

targets also an interresting choice for PAKE.

I don't believe, however that the claimed huge speedup factor for the M4

from LLPRS actually holds. The Curve25519 figures taken for the

comparision with FourQ for the M4 (~1.4 million cycles cor X25519) are

in my perception much less optimized than the LLPRS implementation for

their "complex" composed field.

I expect that actual speed advantage of FourQ on the M4 might in fact

only be in the range of ~25%. For the M4, the UMAAL instruction gives

you carry accumulation essentially for free and I know that

constant-time field arithmetics on the M4 for the highly regular fe25519

field is actually a factor of ~1.5 faster for Curve25519 than the values

reported for FourQ by LLPRS. I think that this will eat up most of the

advantage from the endomorphisms. Actually this might hold also for

other "larger" targets bigger than the M4, where the regular 255 bit

field could be implemented ay faster than the more complex "complex"

field of FourQ?

for the solinas prime in constant time. At least, I did try it and I did

not find a way to implement that efficiently.

than for 2^255 - 1 will be the reason that prevents significant speedups

in comparison to Curve25519.

expired this spring. We had chosen PACE predominantly because our first

products came on the market way earlier than the expiration date. I

think also that it is a good idea to make both parties contribute

entropy for generating a session-specific ephemeral generator, such as

done in PACE.

When using such an ephemeral generator, one will need to mix in the

entropy from both parties together with the password somehow. I think

that the method used by PACE for doing this is not so bad. We do want

predominantly protect the password. Note that regarding side channels it

might be beneficial to actually use the password only in a symmetric

encryption primitive (such as in PACE). I expect that it will be much

easier to protect a symmetric cypher against side channel leakage (e.g.

by masking) than e.g. a SHA variant function used for mixing the entropies.

augmented version of PACE for our bluetooth product line. It's not yet

published but we are working on a paper on that topic.

details, but this will indeed give an additional speedup for any SPEKE

and PACE implementation running on an curve for that Elligator2 is suitable.

Thank you again, Mike, for your valuable feedback.

Yours,

Björn.

In addition to FourQ and Curve19119, other fast-ish options include

2^252-2^232-1 (but maybe not on tiny micros);NIST’s 2^192-2^64-1 (but again maybe not on the M0?); and the

Goldi-like 2^216-2^108-1. But I agree that youshould stick with Curve25519 if you can afford it.

I've looked a bit closer at FourQ in the last days. I came to theconclusion that the speedup factor of ~2 reported by Liu, Longa,

Pereira, Reparaz and Seo [LLPRS] in https://eprint.iacr.org/2017/434.pdf

are realistic for the AVR and MSP430. I moreover expect that this might

hold also for the M0 target (not reported by LLPRS).

It must, however, be clear that one buys this with significantly larger

RAM requirements (that is most precious, particularly in the very small

targets) and I believe that it will also be much harder to protect FourQ

against side channel attacks, especially regarding fault injection,

without loosing the speed advantage.

Elligator 2 works for any elliptic curve (over a field of odd characteristic IIRC) with at least one point of

order 2. So you can use it on FourQ.

Thank you for pointing this out. I think that this makes FourQ for theseorder 2. So you can use it on FourQ.

targets also an interresting choice for PAKE.

I don't believe, however that the claimed huge speedup factor for the M4

from LLPRS actually holds. The Curve25519 figures taken for the

comparision with FourQ for the M4 (~1.4 million cycles cor X25519) are

in my perception much less optimized than the LLPRS implementation for

their "complex" composed field.

I expect that actual speed advantage of FourQ on the M4 might in fact

only be in the range of ~25%. For the M4, the UMAAL instruction gives

you carry accumulation essentially for free and I know that

constant-time field arithmetics on the M4 for the highly regular fe25519

field is actually a factor of ~1.5 faster for Curve25519 than the values

reported for FourQ by LLPRS. I think that this will eat up most of the

advantage from the endomorphisms. Actually this might hold also for

other "larger" targets bigger than the M4, where the regular 255 bit

field could be implemented ay faster than the more complex "complex"

field of FourQ?

In addition to FourQ and Curve19119, other fast-ish options include

NIST’s 2^192-2^64-1 (but again maybe not on the M0?);

The problem in my opinion is to implement the many conditional additionsNIST’s 2^192-2^64-1 (but again maybe not on the M0?);

for the solinas prime in constant time. At least, I did try it and I did

not find a way to implement that efficiently.

and the Goldi-like 2^216-2^108-1 or 2^252-2^232-1 (but maybe not on tiny micros);

Here I expect that the fact that the field is not really much smallerthan for 2^255 - 1 will be the reason that prevents significant speedups

in comparison to Curve25519.

I realized at some point that “SPAKE2-EE” is actually an existing protocol called “PAK”, which is why I didn’t ever write it up.

On a constrained device, a SPEKE variant (like PACE) is a faster and simpler choice. Also I thought the SPEKE patent expired, so why not use that? SPAKE2 (-EE and otherwise) has a stronger security proof, supports augmentation, and is potentially faster with precomputation.

Actually I did mix up EKE and SPEKE regarding the patent that hasOn a constrained device, a SPEKE variant (like PACE) is a faster and simpler choice. Also I thought the SPEKE patent expired, so why not use that? SPAKE2 (-EE and otherwise) has a stronger security proof, supports augmentation, and is potentially faster with precomputation.

expired this spring. We had chosen PACE predominantly because our first

products came on the market way earlier than the expiration date. I

think also that it is a good idea to make both parties contribute

entropy for generating a session-specific ephemeral generator, such as

done in PACE.

When using such an ephemeral generator, one will need to mix in the

entropy from both parties together with the password somehow. I think

that the method used by PACE for doing this is not so bad. We do want

predominantly protect the password. Note that regarding side channels it

might be beneficial to actually use the password only in a symmetric

encryption primitive (such as in PACE). I expect that it will be much

easier to protect a symmetric cypher against side channel leakage (e.g.

by masking) than e.g. a SHA variant function used for mixing the entropies.

I don’t know if there is an augmented version of SPEKE.

Yes it is possible to augment SPEKE. We have ourselves developed anaugmented version of PACE for our bluetooth product line. It's not yet

published but we are working on a paper on that topic.

It is possible to compute Elligator straight to affine with only one exponentiation, using the inverse square root trick.

You want x = -A/(1+ur^2), or -A-that, so you need to simultaneously divide by 1+ur^2 and compute the Legendre symbol of the putative y^2 = x(x^2+Ax+1). If you compute that projectively to get y^2 = a/b, then L(a/b) = L(ab), so you don’t need to do the division first. Let c=ab, d=1+ur^2. Then you can compute

s := (cd^2) ^ ((p-3)/2)

Then (s^2 * cd^2) = (cd^2)^(p-2) = 1/cd^2, so that 1/d = (s^2 * cd^2) * cd unless c=0 or d=0. But if c=0 then the point is (0,0) or infinity, and if d=0 then it’s infinity, so in either case you should return 0 anyway. At the same time, (s * cd^2) = (cd^2)^((p-1)/2) = L(cd^2) = L(c) unless d=0. So that gives you the inverse and Legendre symbol at the same time. A similar trick with (cd^2)^((p-5)/8) can give you the square root, but you don’t actually need that here if you’re using an x-only ladder.

Thank you for pointing this out. I did not yet find time to analyze theYou want x = -A/(1+ur^2), or -A-that, so you need to simultaneously divide by 1+ur^2 and compute the Legendre symbol of the putative y^2 = x(x^2+Ax+1). If you compute that projectively to get y^2 = a/b, then L(a/b) = L(ab), so you don’t need to do the division first. Let c=ab, d=1+ur^2. Then you can compute

s := (cd^2) ^ ((p-3)/2)

Then (s^2 * cd^2) = (cd^2)^(p-2) = 1/cd^2, so that 1/d = (s^2 * cd^2) * cd unless c=0 or d=0. But if c=0 then the point is (0,0) or infinity, and if d=0 then it’s infinity, so in either case you should return 0 anyway. At the same time, (s * cd^2) = (cd^2)^((p-1)/2) = L(cd^2) = L(c) unless d=0. So that gives you the inverse and Legendre symbol at the same time. A similar trick with (cd^2)^((p-5)/8) can give you the square root, but you don’t actually need that here if you’re using an x-only ladder.

details, but this will indeed give an additional speedup for any SPEKE

and PACE implementation running on an curve for that Elligator2 is suitable.

Thank you again, Mike, for your valuable feedback.

Yours,

Björn.