Join us on Discord!
You can help CodeWalrus stay online by donating here.

Topic: Fast 3-digit Squaring, Toom-Cook

Started by Zeda, June 26, 2018, 06:36:55 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Zeda

Hi all, I realized that I have a lot of unshared results that I've worked out over the years! (Then again, I have a lot of shared results :P )
Today's is dealing with squaring 3-digit numbers.
If you aren't familiar with the Toom-Cook algorithm, read the spoiler! It's cool stuff. Otherwise, continue down.
[spoiler=Prerequisite]
The Toom-Cook algorithm basically generalizes multiplying integers as multiplying polynomials.For example, if we want to multiply 47*81, we can think of it as multiplying (7+4x)*(1+8x) and evaluating at x=10 (since our numbers were in base 10). This seems like a lot of unnecessary complexity, and it is for such small numbers, but let's look at a few key ideas.

If you are given 3 points on a plane, you can construct a unique quadratic function that passes through each point. The key here is that it is unique meaning there are no other degree-2 polynomials that pass through each point-- there is only one and that one must exists. On the other hand given 4 points, there might be a parabola that passes through each point, but it is not a guarantee and is in fact unlikely. As well, given three points, there exist infinitely many cubic polynomials (or quartic or quintic, etc.) that pass through each point. In general, if you have n+1 points, you are guaranteed to be able to find a unique polynomial of degree n that is the only such polynomial that passes through each point.

Guarantees and uniqueness are cool and all, but actually constructing the polynomials from those points can be tricky (but still "quickly" doable in a computational sense)! Luckily for our case, the points won't be totally arbitrary. Let's look back at our Toom-Cook example of multiplying 47*81. The polynomial product is (7+4x)*(1+8x), which we know should be a quadratic equation. Then if we evaluate this product at three easy points, we should be able to reconstruct the result. Specifically, let's evaluate at x=-1, x=0, and x=1:
(7-4)*(1-8)=-21
7*1=7
(7+4)*(1+8)=99

The easiest way to reconstruct the polynomial by hand, in my opinion, is to use matrices to represent a system of equations. In this case, with ai being the points we evaluated at -1,0, and 1:

[[  1   (-1)1    (-1)2   a0]
[  1   01       02      a1]
[  1   11       12      a2]]


   [[  1  -1   1   a0]
=   [  1   0   0   a1]
    [  1   1   1   a2]]

And now we reduce this to reduced row-eschelon form.

   [[  0   1   0   (a2-a0)/2]
=>  [  1   0   0   a1]
    [  1   0   1   (a2+a0)/2]]


   [[  0   1   0   (a2-a0)/2]
=>  [  1   0   0   a1]
    [  0   0   1   (a2+a0)/2-a1]]


Now that we know the resulting coefficients, let's construct our polynomial:
a1+(a2-a0)x/2+((a2+a0)/2-a1)x2
=7+60x+32x2
Let's evaluate this at x=10 and we have 3200+600+7 = 3807 which is indeed 47*81.

The nice thing about this is, since we've already solved the system of equations, we can reuse this and we don't need to compute it every time. We just compute the product at -1, 0, and , then plug it into our formula et voila. In fact, let's make a BASIC program that multiplies two ten-digit numbers and returns a 20-digit result. Input is passed through two 2-element lists, L1 and L2, and use those elements as 5-digit 'digits' to compute the product.

{1,-1
sum(L1Ans)sum(L2Ans->A
L1(1)L2(1->B
sum(L1)sum(L2->C
.5(C-A
{B,Ans,Ans+A-B->L1
0->C
L1e-5->L1    ;engineering E
For(K,1,3
C+L1(k->L1(K
e-5int(Ans->C
End
C->L1(4
e5fPart(L1->L1

The output is a 20-digit integer comprised of four 5-digit numbers. The first element is the lowest 5 digits, the fourth element is the upper 5 digits.
[/spoiler]
To square a 3-digit number, we'll result in a degree-4 polynomial. Consequently, we'll need to evaluate at 5 points. The points I chose were {0,1,i,-1,-i} where 'i' is the imaginary unit. It may seem like evaluating at a complex point would be, well, complex, but we'll see. First, let's write our polynomial as a0+a1x+a2x2.
Then:
b0=f(0)=a02
b1=f(1)=(a0+a1+a2)2
b2=f(i)=(a0+a1i-a2)2
b3=f(-1)=(a0-a1+a2)2
b4=f(-i)=(a0-ia1-a2)2
Squaring complex numbers is easy though-- (a+bi)2= a2-b2+2abi = (a-b)(a+b)+2abi
So simplifying:
b2=(a0-a2+a1)(a0-a2-a1)+2i(a0-a2)a1
b4=(a0-a2+a1)(a0-a2-a1)-2i(a0-a2)a1

Noticing symmetry, let's define instead:
c3=(a0-a2+a1)(a0-a2-a1)
c4=2(a0-a2)a1

Then:
b2=c3+ic4
b4=c3-ic4


    1       0       0       0       0   b0
    1       1       1       1       1   b1
    1       i      -1      -i       1   c3+ic4
    1      -1       1      -1       1   b3
    1      -i      -1       i       1   c3-ic4


    1       0       0       0       0   b0
    1       0       1       0       1   (b1+b3)/2 = c1
=>  0       1       0       1       0   (b1-b3)/2 = c2
    1       0      -1       0       1   c3
    0       1       0      -1       0   c4


    1       0       0       0       0   b0
    1       0       0       0       1   (c1+c3)/2 = d1
=>  0       0       1       0       0   (c1-c3)/2 = d2
    0       1       0       0       0   (c2+c4)/2 = d3
    0       0       0       1       0   (c2-c4)/2 = d4


Then our polynomial is:
b0+d3x+d2x2+d4x3+(d1-b0)x4


So for an example, let's find 348*348
Then we have:
a0=8
a1=4
a2=3
b0=64
b1=(8+4+3)2=225
b3=(8-4+3)2=49
c3=(8-3+4)(8-3-4)=9
c4=2(8-3)*4=40
c1=(225+49)/2 = 137
c2=c1-49 = 88
d1=(137+9)/2 = 73
d2=d1-9 = 64
d3=(88+40)/2 = 64
d4=d3-40 = 24

So then, our result is:
64+64x+64x2+24x3+9x4
Evaluating at x=10 eyelids yields:
64+640+6400+24000+90000 = 121104, which is indeed 384*384


Now let's generate some Python code:

def sqr3(a,b,c):
    d=a+b+c    #roughly 75% of cases exceed the original bit-width
    d*=d
    e=a-b+c    #roughly 1/6 of the time this will exceed the original bit-width. However, whenever this exceeds the bit-width, then d also exceeds it.
    e*=e
    d=(d+e)/2
    e=d-e
    f=(a+b-c)*(a-b-c)   #it is never the case that both multiplicands exceed  their original word width, but about 1/3 of the time, one of them exceeds it (~2/3 of the time, neither exceed).
    b*=(a-c)        #neither multiplicand exceeds the original bit-width (though the second operand may become negative).
    b+=b
    a*=a
    e=(e+b)/2
    b=e-b
    d=(d-f)/2
    f+=d-a
    return [a,e,d,b,f]

Note that three of the sub-multiplications are just squaring the input. Also note that all divisions are by a power of two, and all results and operations are integer operations, not complex (assuming all inputs are integers).

Powered by EzPortal