Discussion:
libsecp256k1's novel(?) ECDSA verification optimization
Brian Smith
2016-03-23 12:16:03 UTC
Hi,

[I am not sure if boring topics like ECDSA are appropriate for this list. I
hope this is interesting enough.]

ECDSA signature verification is quite expensive. A big part of why it is
expensive is the two inversions--one mod q, one mod n--that are typically
used.

A while back I stumbled across an interesting optimization [1] in
libsecp256k1. The optimization completely avoids the second inversion
during verification.

The comments in the code explain how, but here's a rough summary: Normally
we convert the Jacobian coordinates (X, Y, Z) of the point multiplication
result to affine (X, Y) so that the affine X coordinate can be compared to
the signature's R component. The conversion to affine coordinates requires
the inversion of Z. But, instead of doing that, we can simply multiply the
signature's R component by Z**2 and then compare it with the *Jacobian* X
coordinate, avoiding any inversion.
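As a rough illustration of the comparison (my own sketch in Python, not
the libsecp256k1 code; the helper names are mine), using the secp256k1
field prime and base point:

```python
# Classic check vs. the inversion-free trick, for a Jacobian point (X, Y, Z)
# where the affine x-coordinate is X / Z^2 mod p.
p = 2**256 - 2**32 - 977  # secp256k1 field prime
gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def check_with_inversion(X, Z, r):
    """Classic check: convert X/Z^2 to affine (costs a field inversion)."""
    z_inv = pow(Z, p - 2, p)              # Fermat inversion mod p
    return (X * z_inv * z_inv) % p == r % p

def check_without_inversion(X, Z, r):
    """The trick: compare r * Z^2 against the Jacobian X instead."""
    return (r * Z * Z) % p == X

# Re-randomize the affine point (gx, gy) into Jacobian form with some Z.
Z = 0xABCDEF
X = (gx * Z * Z) % p

assert check_with_inversion(X, Z, gx)
assert check_without_inversion(X, Z, gx)
assert not check_without_inversion(X, Z, (gx + 1) % p)
```

Both checks agree because r * Z^2 == X (mod p) holds exactly when
r == X / Z^2 (mod p), Z being invertible.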

I asked Greg Maxwell, the author of that code, about it and he didn't know
of anybody else using this optimization.

The optimization has two important properties:
1. It makes verification notably (but not hugely) faster.
2. It reduces the amount of code required by an enjoyable amount, if one
is writing prime-specific specialized inversion routines.

Two questions:

1. Does anybody know of prior published software or papers documenting this?

2. Does anybody know why it would be a bad idea to use this technique? I.e.
am I overlooking some reason why it doesn't actually work?

[1]
https://github.com/bitcoin/secp256k1/blob/269d4227038b188128353235a272a8f030c307b1/src/ecdsa_impl.h#L225-L253
(shortened: https://git.io/vad3K)

Thanks,
Brian
--
https://briansmith.org/
Rene Struik
2016-03-23 13:01:35 UTC
Hi Brian:

With ECDSA, one has R:=(1/s)(eG+rQ), where e:=h(m), and r = x(R) mod n.

If R=(X, Y, Z) in Jacobian coordinates, then x(R)=X/(Z^2), where
computations are over GFp.

One has x(R) Z^2 = X, which is equivalent to r Z^2 = X only if the
modular reduction mod n does not do anything. For secp256k1, one has
n<p, so for the tiny fraction of x(R)'s in the interval [n,p-1], this
yields the wrong result.

The equation would always be correct had ECDSA been defined with
r=x(R), i.e., without the mod n reduction step to compute r.

Please note that if x(R) is in the interval [n,p-1], then r=x(R) mod n
is in the interval [0,p-n-1]. So one could still apply the trick in the
vast majority of cases, by simply testing whether r > p-n-1 and, if so,
applying the trick safely.
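For concreteness, the size of that tiny fraction for secp256k1 can be
checked in a couple of lines (my own back-of-the-envelope sketch):

```python
# How many x(R) values fall in [n, p-1] for secp256k1?
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

gap = p - n                 # count of field elements in [n, p-1]
assert gap.bit_length() == 129
# The affected fraction is gap/p, roughly 2^-128: negligible to hit by
# chance, but easy to construct on purpose with a contrived signature.
```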

Best regards, Rene
_______________________________________________
Curves mailing list
https://moderncrypto.org/mailman/listinfo/curves
--
email: ***@gmail.com | Skype: rstruik
cell: +1 (647) 867-5658 | US: +1 (415) 690-7363
Ruggero SUSELLA
2016-03-23 13:16:31 UTC
Hello Rene,

Please note that in the code they check *two* conditions (using your
notation, with r already reduced mod n):

(r * Z^2 mod p == X) || (r + n < p && (r + n) * Z^2 mod p == X)

where X is the Jacobian X-coordinate. This works for all curves where
n < p. It also works when n > p, but in that case the first condition
alone is enough.
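A sketch of this two-branch check, together with a constructed corner
case where x(R) lands in [n, p-1], might look like the following (my own
illustration with secp256k1 parameters, not the library's code):

```python
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def r_matches_jacobian_x(r, X, Z):
    zz = (Z * Z) % p
    if (r * zz) % p == X:
        return True
    # Second branch: r may really be x(R) - n, for the few x(R) in [n, p-1].
    return r + n < p and ((r + n) * zz) % p == X

# Find a curve point y^2 = x^3 + 7 with x >= n, by testing x^3 + 7 for
# being a quadratic residue (Euler's criterion).
x = n + 1
while pow(x**3 + 7, (p - 1) // 2, p) != 1:
    x += 1

r = x - n                   # what a signer would put in the signature
Z = 12345                   # any nonzero Z; Jacobian X = x * Z^2 mod p
X = (x * Z * Z) % p

assert (r * Z * Z) % p != X          # a naive single check would reject
assert r_matches_jacobian_x(r, X, Z) # the two-branch check accepts
```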

B.t.w. I never saw this trick.

Best Regards,
Ruggero

Gregory Maxwell
2016-03-23 17:01:17 UTC
Post by Brian Smith
Hi,
[I am not sure if boring topics like ECDSA are appropriate for this list. I
hope this is interesting enough.]
It's no less useful for Schnorr (just even more obvious there), and in
that case it permits a verification to be implemented completely
without any modular inversion.

For libsecp256k1 it's roughly a 4% speedup; performance would depend
on how optimized everything else is compared to your modular
inversion, of course.

As far as I know the optimization is novel, though it seems too
obvious to be new... (unless it's broken, of course, :-) )

As noted by Rene Struik the implementation must check all conditions
that result from modular reduction, and as noted by Ruggero SUSELLA,
ours does. Libsecp256k1 also includes test vectors that try the
conditions/decisions of these branches, so if any were omitted the
tests would fail. I would encourage other implementers who adopt
this optimization to also include specific tests for it passing and
failing each branch-- to avoid a later developer optimizing away an
'impossible' case. Though these cases are 'rare', the triggering
conditions can be generated by backwards computing public keys from
constructed signatures where R.x is a very high value.
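Greg's backwards construction can be sketched in a few lines of Python.
This is my own illustration, not libsecp256k1's test code: pick a point
R with x(R) >= n, choose arbitrary s and hash e, and recover
Q = (1/r)(sR - eG). Plain affine arithmetic, fine for a test-vector
generator but not for production use.

```python
# secp256k1 parameters
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def inv(a, m):
    return pow(a, m - 2, m)   # m is prime here

def add(P, Q):                # affine addition; None is the point at infinity
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = 3 * x1 * x1 * inv(2 * y1, p) % p
    else:
        lam = (y2 - y1) * inv((x2 - x1) % p, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, P):                # double-and-add scalar multiplication
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

# 1. Find a point R whose x-coordinate is >= n (p = 3 mod 4, so sqrt by power).
x = n + 1
while pow(x**3 + 7, (p - 1) // 2, p) != 1:
    x += 1
R = (x, pow(x**3 + 7, (p + 1) // 4, p))
r, s, e = x - n, 0x1234, 0x5678       # r = x(R) mod n; arbitrary s, hash e

# 2. Recover Q so that (r, s) is a valid signature on hash e under Q.
sR_minus_eG = add(mul(s, R), mul(n - e, G))   # sR - eG (negation via n - e)
Q = mul(inv(r, n), sR_minus_eG)

# 3. A standard verifier recomputes R' = (e/s)G + (r/s)Q and lands on
#    a point with x(R') >= n, triggering the rare branch.
w = inv(s, n)
Rp = add(mul(e * w % n, G), mul(r * w % n, Q))
assert Rp == R and Rp[0] >= n and Rp[0] % n == r
```

Since sR - eG = rQ by construction, verification reproduces R exactly,
so x(R') is the high value we chose.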

For ECDSA the approach loses its utility as the trace/cofactor grows,
since there end up being many cases to check, so it may not be
worthwhile for some curves.

Going outside of your likely domain of interest -- one limitation we
have had with it is that for our future applications we want batchable
Schnorr signatures which behave identically in batch and non-batch
verifiers. Batching requires that r's sign be provided or specified.
If this trick were used naively in the non-batch verifier then it
would admit signatures that the batch verifier would reject (ones with
an incorrect sign), violating our consistency requirement. If the sign
is checked in either of the obvious ways (using either a sqrt and the
curve equation on R.x, or a modular inverse to project R.y back to
affine, which could have been batched with the projection of R.x),
then the performance gain from this improvement is lost. We do have a
cute trick to solve this now, whose efficiency is specific to some
fields (basically, by specifying that the signer choose R.y to be a
quadratic residue, the sign check can be done by the non-batch
verifier in projective coordinates with just a Legendre symbol).
Brian Smith
2016-06-14 21:49:10 UTC
Post by Gregory Maxwell
It's no less useful for Schnorr (just even more obvious there), and in
that case it permits a verification to be implemented completely
without any modular inversion.
Yes, that's a good point. I will be implementing it for EdDSA,
specifically Ed25519, soon. Since Ed25519 has cofactor > 1, the trick
must be generalized to try more than two multiples of `r`.
Post by Gregory Maxwell
As noted by Rene Struik the implementation must check all conditions
that result from modular reduction, and as noted by Ruggero SUSELLA,
ours does. Libsecp256k1 also includes test vectors that try the
conditions/decisions of these branches, so if any were omitted the
tests would fail. I would encourage other implementers who adopt
this optimization to also include specific tests for it passing and
failing each branch-- to avoid a later developer optimizing away an
'impossible' case. Though these cases are 'rare', the triggering
conditions can be generated by backwards computing public keys from
constructed signatures where R.x is a very high value.
Good point. I already had to help another person fix their implementation.

I have attached some test vectors to this message for ECDSA P-256 and
P-384 that were based on what you suggested above: Choose an (r, s)
where r is in the range we want to test, and then recover the public
key from the signature. However, I wasn't able to generate test
vectors for the case where r == n this way, since `n` is not a valid
x-coordinate on these curves (the corresponding y^2 value is not a
square). So, if anybody has test vectors for the r == n case,
and/or r == n - 1 and r == n + 1, for P-256 and P-384 in particular, I
would love to get them.
Post by Gregory Maxwell
For ECDSA the approach loses its utility as the trace/cofactor grows,
since there end up being many cases to check, so it may not be
worthwhile for some curves.
Also, for Ed25519, your suggested method of generating test vectors
doesn't work, because public keys can't be recovered from EdDSA
signatures. So, I am hoping somebody can share test vectors for
Ed25519 that cover all the cases: r < q - 8n, r < q - 7n, ... r < q -
n, r == n, n < r < q, or suggest a realistic method for finding some.

Thanks for sharing!

Cheers,
Brian
--
https://briansmith.org/
Rene Struik
2016-06-14 23:02:57 UTC
Hi Brian:

With Ed25519, the signature is (r,s), where R=sG + hQ, h=H(r,Q,m), and
where r is a compressed version of R (with the y-coordinate of R expressed in GFp).

Thus, if one verifies such a signature (r,s) w.r.t. public key Q, one
computes h=H(r,Q,m), R':=sG + hQ, and can do the check r ~ R' in GFp
(without corner cases that arise in ECDSA because of the Zp to Zn
conversion). So, I do not see how one would end up with conditions
depending on whether r < q-8n, r < q-7n, ..., in the EdDSA case.
Doesn't the "projective/affine test backwards" always work in GFp
here? (See my March 23 note below.) But perhaps I am missing
something ...

As a final note, with ECDSA and EdDSA, one can also speed up verification
by massaging the equation R=sG + h Q and turning this into an equation
of the type v R=s0 G0 + s1 G1 + u Q, where each scalar u, v, s0, s1 is
half-size and performing a multiple-point multiplication (here, G0:=G is
base point, G1:= tG, t=2^{128}, and (u,v) can be computed from h=u/v via
the Extended Euclidean Algorithm). {Obviously, this does require some Zn
arithmetic for EEA and point decompression for R, so the roughly 30%
speed-up has some trade-offs}. Details are in an old SAC 2005 paper.
Some verification in V2V networks can use this method (P1609.2). With
the "bitcoin" curve secp256k1, one can use endomorphisms to achieve
similar acceleration as in the SAC2005 paper, via the GLV method (this
also results in half-size scalars).
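The (u, v) computation Rene refers to can be sketched as follows. This
is my own illustration of the half-size split only (not the full joint
multiplication); the helper name and the concrete h are mine, and the
SAC 2005 paper has the complete method.

```python
# Given h, find (u, v) with h = u/v (mod n) and u, v roughly half-size,
# by running the extended Euclidean algorithm on (n, h) and stopping
# once the remainder drops below sqrt(n).
import math

n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def half_size_split(h):
    r0, r1 = n, h % n
    t0, t1 = 0, 1
    bound = math.isqrt(n)
    # Invariant: r_i = s_i*n + t_i*h, so r_i = t_i*h (mod n) at every step.
    while r1 >= bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    return r1, t1          # u, v (v may be negative)

h = (2**200 + 987654321) % n   # an arbitrary example scalar
u, v = half_size_split(h)
assert (u - v * h) % n == 0            # h = u/v (mod n)
assert abs(u) < 2**129 and abs(v) < 2**129
```

The bound on v follows from the standard EEA identity |t_i| * r_{i-1} <= n:
when the loop exits, the previous remainder was at least sqrt(n), so
|v| <= n / sqrt(n) = sqrt(n).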

Best regards, Rene

[excerpt email RS as of March 23, 2016, 9.01am EST]
The equation would always be correct had ECDSA been defined with
r=x(R), i.e., without the mod n reduction step to compute r.
Brian Smith
2016-06-14 23:30:42 UTC
Post by Rene Struik
Thus, if one verifies such a signature (r,s) w.r.t. public key Q, one
computes h=H(r,Q,m), R':=sG + hQ, and can do the check r ~ R' in GFp
(without corner cases that arise in ECDSA because of Zp to Zn conversion).
So, I do not see how one would end up having conditions that check depending
on whether r< q- 8n, r<q-7n, .... in the EdDSA case. Doesn't the
"projective/affine test backwards" test always work in GFp here? (see my
March 23th note below). But, perhaps, I am missing something ...
I think the test vectors are still interesting because they would
catch the case where people are wrongly trying to use the trick by
also testing r + i*n for i > 1. For example, in the case of ECDSA, one
implementation was already wrongly testing r + n == X (mod q) even
when r > q - n. You can imagine how such bugs could happen when people
try to generalize the logic too much. In fact, I think such bugs are
likely.
Post by Rene Struik
As final note, with ECDSA and EdDSA, one can also speed-up verification by
massaging the equation R=sG + h Q and turning this into an equation of the
type v R=s0 G0 + s1 G1 + u Q, where each scalar u, v, s0, s1 is half-size
and performing a multiple-point multiplication (here, G0:=G is base point,
G1:= tG, t=2^{128}, and (u,v) can be computed from h=u/v via the Extended
Euclidean Algorithm). {Obviously, this does require some Zn arithmetic for
EEA and point decompression for R, so the roughly 30% speed-up has some
trade-offs}. Details are in an old SAC 2005 paper.
I believe you may be referring to
http://cacr.uwaterloo.ca/techreports/2005/cacr2005-28.pdf?

Thanks,
Brian
Rene Struik
2016-06-15 00:11:36 UTC
Apologies for not providing the link to the SAC 2005 paper myself. Your
link is correct.

I do agree that testing corner cases to catch flaws has merit. (I just
did not see how the inverse of the mapping from Zp to Zn comes into play
with EdDSA.)

Best regards, Rene