Kyle Butt
2017-09-09 04:21:46 UTC
TLDR: How much do square roots matter for ECC primes?
I've been working on finding Granger Moss primes where t fits in a 32 bit
integer.
Along the way, I worked out tighter bounds for the min and max values after
reduction.
In the paper, they aren't explicit about how much extra room is needed to
handle additions for curve operations. For edwards curves, you need to
account for a factor of 6. This changes their formula
ceil(log_2(m / 2)) + 2k + 5 <= 2w
to:
ceil(log_2(3 * m) + 2k + 5 <= 2w
for m + 1 = 17 and 19, this requires k < 26.5
Similarly for their reduction algorithm, they work in bits, but I did the
math on the actual bounds.
Assuming z_i = 2^63 - 1, after 1 reduction the max value is 2^(63 - l) - 1
+ (t - c)
t - c = (b - 1) * c, the max value carried from z_(i+1), assuming c < b,
after 2 reductions the max value is:
2^(63 - 2*l) - 1 + c + (t - c). = 2^(63 - 2*l) + t - 1. The first c occurs
because t/b = c.
The minimum value is easier to calculate. In this case, the worst case
carry is 0.
so the min value is 2^(63 - 2 * l).
Upon multiplying these values are subtracted. If their difference is less
than 2^27.5, then we can adjust the formula above to:
ceil(log_2(3 * m) + 2k + 4 <= 2w
This does place a greater constraint on l, but it yields larger primes.
because we know that the result of the product takes 1 less bit.
This allows us to construct larger primes for m + 1 = 17, 19.
For m+1 = 19 we can construct a prime with 483 bits, specifically:
p = phi 18 t, where t = (2^3 - 1) * (2^24); b = 2^24; l = 24
Unfortunately, these primes are not fast square root primes, by
construction. It should be clear that p % (2^24) = 1. How much does this
matter? S is 24 in this case, so Tonelli Shanks should be reasonably fast.
Equality checking is also a little tricky due to the redundancy, but a
subtraction single round of modular carries can be done in parallel. Then
you can verify that all the limbs are the same value.
I found an edwards curve that satisfies the safecurves criteria, the
constant is unusual but rigid:
It is the least d in absolute value such that x^2 + y^2 = 1 + (d/b)x^2*y^2
has cofactor {4, 8} and so does its twist. The reason for choosing such a d
is that coefficient multiplication requires reduction, which means that
even for small constants, they must be in the montgomery domain, taking up
at least 2 limbs. But for a constant d/b it can be a single limb.
the least such d is: -56904
for that curve the trace
is: 1805876552616329139365429358604192390835234313183868353172046570215054410
I prototyped the code in Haskell, for quick turnaround and so that I have
something where I can print the intermediate values as test vectors for
other implementations. I can tidy it up and share it if there's interest.
It's not written for speed but for verification.
I plan to try and implement a cuda or opencl version, to try and take
advantage of the parallel nature of the primes.
Thoughts or suggestions? Would it be worthwhile to write any of this up in
a more formal setting?
I think at a minimum borrowing the half bit by computing tighter bounds is
pretty cool.
Kyle
I've been working on finding Granger Moss primes where t fits in a 32 bit
integer.
Along the way, I worked out tighter bounds for the min and max values after
reduction.
In the paper, they aren't explicit about how much extra room is needed to
handle additions for curve operations. For edwards curves, you need to
account for a factor of 6. This changes their formula
ceil(log_2(m / 2)) + 2k + 5 <= 2w
to:
ceil(log_2(3 * m) + 2k + 5 <= 2w
for m + 1 = 17 and 19, this requires k < 26.5
Similarly for their reduction algorithm, they work in bits, but I did the
math on the actual bounds.
Assuming z_i = 2^63 - 1, after 1 reduction the max value is 2^(63 - l) - 1
+ (t - c)
t - c = (b - 1) * c, the max value carried from z_(i+1), assuming c < b,
after 2 reductions the max value is:
2^(63 - 2*l) - 1 + c + (t - c). = 2^(63 - 2*l) + t - 1. The first c occurs
because t/b = c.
The minimum value is easier to calculate. In this case, the worst case
carry is 0.
so the min value is 2^(63 - 2 * l).
Upon multiplying these values are subtracted. If their difference is less
than 2^27.5, then we can adjust the formula above to:
ceil(log_2(3 * m) + 2k + 4 <= 2w
This does place a greater constraint on l, but it yields larger primes.
because we know that the result of the product takes 1 less bit.
This allows us to construct larger primes for m + 1 = 17, 19.
For m+1 = 19 we can construct a prime with 483 bits, specifically:
p = phi 18 t, where t = (2^3 - 1) * (2^24); b = 2^24; l = 24
Unfortunately, these primes are not fast square root primes, by
construction. It should be clear that p % (2^24) = 1. How much does this
matter? S is 24 in this case, so Tonelli Shanks should be reasonably fast.
Equality checking is also a little tricky due to the redundancy, but a
subtraction single round of modular carries can be done in parallel. Then
you can verify that all the limbs are the same value.
I found an edwards curve that satisfies the safecurves criteria, the
constant is unusual but rigid:
It is the least d in absolute value such that x^2 + y^2 = 1 + (d/b)x^2*y^2
has cofactor {4, 8} and so does its twist. The reason for choosing such a d
is that coefficient multiplication requires reduction, which means that
even for small constants, they must be in the montgomery domain, taking up
at least 2 limbs. But for a constant d/b it can be a single limb.
the least such d is: -56904
for that curve the trace
is: 1805876552616329139365429358604192390835234313183868353172046570215054410
I prototyped the code in Haskell, for quick turnaround and so that I have
something where I can print the intermediate values as test vectors for
other implementations. I can tidy it up and share it if there's interest.
It's not written for speed but for verification.
I plan to try and implement a cuda or opencl version, to try and take
advantage of the parallel nature of the primes.
Thoughts or suggestions? Would it be worthwhile to write any of this up in
a more formal setting?
I think at a minimum borrowing the half bit by computing tighter bounds is
pretty cool.
Kyle