Kyle Butt

2017-09-09 04:21:46 UTC

TLDR: How much do square roots matter for ECC primes?

I've been working on finding Granger Moss primes where t fits in a 32 bit

integer.

Along the way, I worked out tighter bounds for the min and max values after

reduction.

In the paper, they aren't explicit about how much extra room is needed to

handle additions for curve operations. For edwards curves, you need to

account for a factor of 6. This changes their formula

ceil(log_2(m / 2)) + 2k + 5 <= 2w

to:

ceil(log_2(3 * m) + 2k + 5 <= 2w

for m + 1 = 17 and 19, this requires k < 26.5

Similarly for their reduction algorithm, they work in bits, but I did the

math on the actual bounds.

Assuming z_i = 2^63 - 1, after 1 reduction the max value is 2^(63 - l) - 1

+ (t - c)

t - c = (b - 1) * c, the max value carried from z_(i+1), assuming c < b,

after 2 reductions the max value is:

2^(63 - 2*l) - 1 + c + (t - c). = 2^(63 - 2*l) + t - 1. The first c occurs

because t/b = c.

The minimum value is easier to calculate. In this case, the worst case

carry is 0.

so the min value is 2^(63 - 2 * l).

Upon multiplying these values are subtracted. If their difference is less

than 2^27.5, then we can adjust the formula above to:

ceil(log_2(3 * m) + 2k + 4 <= 2w

This does place a greater constraint on l, but it yields larger primes.

because we know that the result of the product takes 1 less bit.

This allows us to construct larger primes for m + 1 = 17, 19.

For m+1 = 19 we can construct a prime with 483 bits, specifically:

p = phi 18 t, where t = (2^3 - 1) * (2^24); b = 2^24; l = 24

Unfortunately, these primes are not fast square root primes, by

construction. It should be clear that p % (2^24) = 1. How much does this

matter? S is 24 in this case, so Tonelli Shanks should be reasonably fast.

Equality checking is also a little tricky due to the redundancy, but a

subtraction single round of modular carries can be done in parallel. Then

you can verify that all the limbs are the same value.

I found an edwards curve that satisfies the safecurves criteria, the

constant is unusual but rigid:

It is the least d in absolute value such that x^2 + y^2 = 1 + (d/b)x^2*y^2

has cofactor {4, 8} and so does its twist. The reason for choosing such a d

is that coefficient multiplication requires reduction, which means that

even for small constants, they must be in the montgomery domain, taking up

at least 2 limbs. But for a constant d/b it can be a single limb.

the least such d is: -56904

for that curve the trace

is: 1805876552616329139365429358604192390835234313183868353172046570215054410

I prototyped the code in Haskell, for quick turnaround and so that I have

something where I can print the intermediate values as test vectors for

other implementations. I can tidy it up and share it if there's interest.

It's not written for speed but for verification.

I plan to try and implement a cuda or opencl version, to try and take

advantage of the parallel nature of the primes.

Thoughts or suggestions? Would it be worthwhile to write any of this up in

a more formal setting?

I think at a minimum borrowing the half bit by computing tighter bounds is

pretty cool.

Kyle

I've been working on finding Granger Moss primes where t fits in a 32 bit

integer.

Along the way, I worked out tighter bounds for the min and max values after

reduction.

In the paper, they aren't explicit about how much extra room is needed to

handle additions for curve operations. For edwards curves, you need to

account for a factor of 6. This changes their formula

ceil(log_2(m / 2)) + 2k + 5 <= 2w

to:

ceil(log_2(3 * m) + 2k + 5 <= 2w

for m + 1 = 17 and 19, this requires k < 26.5

Similarly for their reduction algorithm, they work in bits, but I did the

math on the actual bounds.

Assuming z_i = 2^63 - 1, after 1 reduction the max value is 2^(63 - l) - 1

+ (t - c)

t - c = (b - 1) * c, the max value carried from z_(i+1), assuming c < b,

after 2 reductions the max value is:

2^(63 - 2*l) - 1 + c + (t - c). = 2^(63 - 2*l) + t - 1. The first c occurs

because t/b = c.

The minimum value is easier to calculate. In this case, the worst case

carry is 0.

so the min value is 2^(63 - 2 * l).

Upon multiplying these values are subtracted. If their difference is less

than 2^27.5, then we can adjust the formula above to:

ceil(log_2(3 * m) + 2k + 4 <= 2w

This does place a greater constraint on l, but it yields larger primes.

because we know that the result of the product takes 1 less bit.

This allows us to construct larger primes for m + 1 = 17, 19.

For m+1 = 19 we can construct a prime with 483 bits, specifically:

p = phi 18 t, where t = (2^3 - 1) * (2^24); b = 2^24; l = 24

Unfortunately, these primes are not fast square root primes, by

construction. It should be clear that p % (2^24) = 1. How much does this

matter? S is 24 in this case, so Tonelli Shanks should be reasonably fast.

Equality checking is also a little tricky due to the redundancy, but a

subtraction single round of modular carries can be done in parallel. Then

you can verify that all the limbs are the same value.

I found an edwards curve that satisfies the safecurves criteria, the

constant is unusual but rigid:

It is the least d in absolute value such that x^2 + y^2 = 1 + (d/b)x^2*y^2

has cofactor {4, 8} and so does its twist. The reason for choosing such a d

is that coefficient multiplication requires reduction, which means that

even for small constants, they must be in the montgomery domain, taking up

at least 2 limbs. But for a constant d/b it can be a single limb.

the least such d is: -56904

for that curve the trace

is: 1805876552616329139365429358604192390835234313183868353172046570215054410

I prototyped the code in Haskell, for quick turnaround and so that I have

something where I can print the intermediate values as test vectors for

other implementations. I can tidy it up and share it if there's interest.

It's not written for speed but for verification.

I plan to try and implement a cuda or opencl version, to try and take

advantage of the parallel nature of the primes.

Thoughts or suggestions? Would it be worthwhile to write any of this up in

a more formal setting?

I think at a minimum borrowing the half bit by computing tighter bounds is

pretty cool.

Kyle