
Is there any algebraic representation of natural numbers that allows for parallel addition?

Natural numbers can be represented in logarithmic space using a binary representation (here, in little-endian):

-- The type of binary numbers; little-endian; O = zero bit, I = one bit, E = end
data Bin = O Bin | I Bin | E

Addition of a and b can then be implemented, for example, by calling the successor function ( O(log(N)) ) a times on b . The problem with this implementation is that it is inherently sequential: to add 2 numbers, the suc calls are chained one after another. Other implementations of add (such as ripple-carry) suffer from the same issue, so it seems that addition cannot be implemented in parallel with this representation. Is there any representation of natural numbers, using algebraic datatypes, which takes logarithmic space, and on which addition can be done in parallel?

Code for illustration:

-- The usual fold  
fold :: Bin -> (t -> t) -> (t -> t) -> t -> t
fold (O bin) zero one end = zero (fold bin zero one end)
fold (I bin) zero one end = one (fold bin zero one end)
fold E zero one end       = end

-- Successor of `Bin` - `O(log(N))`. The width is fixed, so a carry
-- past `E` is silently dropped (addition wraps around on overflow).
suc :: Bin -> Bin
suc (O bin) = I bin
suc (I bin) = O (suc bin)
suc E       = E

-- Calls a function `a` times
times :: Bin -> (t -> t) -> t -> t
times a f x     = fold a zero one end f where 
    one bin fs  = fs (bin (fs . fs))
    zero bin fs = bin (fs . fs)
    end fs      = x

-- Adds 2 binary numbers
add :: Bin -> Bin -> Bin
add a b = (a `times` suc) b

-- 1001 + 1000 = 0101 (little-endian: 9 + 1 = 10)
main = print $ add (I (O (O (I E)))) (I (O (O (O E))))

There are many parallel adder architectures. An excellent review is given in Thomas Walker Lynch's master's thesis at The University of Texas at Austin (1996). See section 9.1, where he summarizes the worst-case path lengths.

The Lynch and Swartzlander adder (L&S) has a worst-case path length of 2*ceil(log4(N))+2, where N is the number of bits. The architecture is presented in their paper "A Spanning Tree Carry Lookahead Adder".

You can find excellent explanations about many simple architectures by googling "fast adder".

You didn't say that each natural number has to have a unique representation. So, here is another option. (I didn't invent it, but I can't remember what it's called, so I had to reconstruct how it works.)

Represent numbers as strings of digits in base 2 like in binary, except that rather than being restricted to the digits 0 and 1, we are additionally allowed to use the digit 2. So for example, the number 2 has two representations, 10 and 2.

To add two numbers represented in this way, just add them digitwise without carrying. Clearly we can do this efficiently in parallel (with a linear size, constant depth circuit). Well, now we have a problem: the resulting number has the right base-2 value, but its digits won't necessarily be 0, 1 or 2, but might be as large as 4.
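As a sketch of this first step (my own illustration, not code from the answer), representing numbers as big-endian lists of `Int` digits, the carry-free digitwise addition is just a `zipWith`; every output digit depends only on the two digits at its own position, which is what makes it a constant-depth, linear-size circuit:

```haskell
-- Digits of a number in base 2, big-endian; after a digitwise add the
-- digits may be as large as 4, which later carry passes repair.
type Digits = [Int]

-- Add two equal-length digit strings position by position, no carrying.
addDigitwise :: Digits -> Digits -> Digits
addDigitwise = zipWith (+)
```

For the answer's example, `addDigitwise [2,1,2,0,0,1] [1,0,2,1,0,1]` yields `[3,1,4,1,0,2]`, i.e. 314102.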

So let's fix that with the following carry propagation pass: Write each digit of the result as a two-digit binary number where the second digit is 0 or 1, and the first digit can be 0, 1 or 2. (So 0 -> 00, 1 -> 01, 2 -> 10, 3 -> 11, 4 -> 20.) Now "carry" the first digit of each of these two-digit numbers to the left, leaving behind the second digit, but without performing any other carrying on the result. For example, if we started with the number

 314102    (perhaps from the original problem of computing 212001 + 102101)

we perform the sum

11       = 3
 01      = 1
  20     = 4
   01    = 1
    00   = 0
     10  = 2
-------
1130110

The new number has the same base-2 value, and its digits are now in the range 0 to 3, since they are formed from the sum of a digit that was 0, 1 or 2 and a digit that was 0 or 1. Moreover each digit of the result only depends on the corresponding digit and the digit to its right in the number from the previous step, so this is implementable by another linear-size, constant-depth circuit.

This isn't quite good enough, so let's do the same carry propagation pass one more time. Now, 4 is no longer a possible input digit, so the digit that we carry can never be 2, only 0 or 1. So this time the procedure will result in a base 2 representation that uses only the digits 0, 1 and 2, which was what we wanted. In the example from above, the result will be 1210110 . (Of course, any other representation with the same base-2 value would be correct as well.)
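The carry propagation pass can be sketched in Haskell as follows (my own reconstruction of the answer's procedure, on big-endian digit lists): split each digit d into (d `div` 2, d `mod` 2), shift the high halves one position to the left, and add, with no further carrying. Note that each pass prepends one extra (possibly zero) leading digit, so the final result below is 01210110 rather than the answer's 1210110; both have the same value.

```haskell
-- One carry-propagation pass over a big-endian digit string.
carryPass :: [Int] -> [Int]
carryPass ds = zipWith (+) (highs ++ [0]) (0 : lows)
  where
    highs = map (`div` 2) ds  -- first digit of each two-digit expansion
    lows  = map (`mod` 2) ds  -- second digit, left behind in place

-- Full addition: digitwise sum, then two passes; digits end up in {0,1,2}.
addRedundant :: [Int] -> [Int] -> [Int]
addRedundant a b = carryPass (carryPass (zipWith (+) a b))
```

Checking against the worked example: `carryPass [3,1,4,1,0,2]` gives `[1,1,3,0,1,1,0]` (1130110), and `addRedundant [2,1,2,0,0,1] [1,0,2,1,0,1]` gives `[0,1,2,1,0,1,1,0]`.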

(For other combinations of base and maximum digit, you might only need one carry propagation pass. For example if you represent numbers in base 3 using the digits 0, 1, 2, 3 and 4, the largest digit appearing in a sum is 8, so both digits involved in the carry propagation pass will be in the range 0, 1, 2 and their sum will already be at most 4.)
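The base-3 variant can be sketched the same way (again my own illustration): with input digits 0 to 8, the carried half `d div 3` is at most 2 and the remainder `d mod 3` is at most 2, so one pass already brings every digit back into the range 0 to 4.

```haskell
-- One carry-propagation pass in base 3, big-endian digit list.
carryPass3 :: [Int] -> [Int]
carryPass3 ds = zipWith (+) (map (`div` 3) ds ++ [0]) (0 : map (`mod` 3) ds)
```

For example, `carryPass3 [8,5]` gives `[2,3,2]`, and 2*9 + 3*3 + 2 = 29 = 8*3 + 5, so the base-3 value is preserved while all digits are at most 4.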


Now if you want to do other operations on this representation, such as comparing two numbers for equality, one option is to convert to binary, either serially in linear time, or in parallel by using a fast adder as described in Lior Kogan's answer. In fact, converting this representation to binary is essentially equivalent to the problem of adding numbers in binary, since we can consider a representation like 212001 as a "formal addition" 101000 + 111001 . However, you probably cannot test for equality in this representation in constant depth like you can for binary. I imagine that the "hardness" in this sense of either addition or equality testing is essential given your other constraints, though I don't know for sure.
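The "formal addition" view is easy to make concrete (a sketch of my own): split each digit of a {0,1,2}-string into two binary digits, giving the two ordinary binary summands; converting back to standard binary is then exactly one binary addition.

```haskell
-- Split a {0,1,2}-digit string into two ordinary binary digit strings
-- whose (formal) sum it represents.
split :: [Int] -> ([Int], [Int])
split ds = (map (min 1) ds, map (\d -> max 0 (d - 1)) ds)
```

For the answer's example, `split [2,1,2,0,0,1]` gives `([1,1,1,0,0,1],[1,0,1,0,0,0])`, i.e. 212001 viewed as 111001 + 101000.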

One simple way to describe parallel addition of natural numbers is this: If you look at the full adder circuit, it has 3 input bits and it outputs a 2-bit number saying how many of the input bits were 1. This is also why it's sometimes called a 3:2 compressor. From these we can create a circuit that adds 3 binary numbers in parallel to produce 2 binary numbers. This is also called a carry-save adder circuit, because instead of propagating the carry bits, we keep them as another, separate number. Each number is then (redundantly) represented as a pair of binary numbers. And when adding k natural numbers, at each step we can reduce triplets to pairs, requiring only O(log k) time steps.
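A minimal sketch of one carry-save step on little-endian bit lists (my own illustration of the 3:2 compression the answer describes): every bit position is compressed independently, and the carry word is shifted one position left.

```haskell
-- Bits of a number, little-endian, each 0 or 1.
type Bits = [Int]

-- A full adder: 3 input bits -> (sum bit, carry bit), i.e. the 2-bit
-- count of how many inputs were 1.
fullAdder :: Int -> Int -> Int -> (Int, Int)
fullAdder a b c = ((a + b + c) `mod` 2, (a + b + c) `div` 2)

-- 3:2 compression: reduce three equal-length numbers to two numbers
-- with the same total; each position is handled independently.
compress32 :: Bits -> Bits -> Bits -> (Bits, Bits)
compress32 xs ys zs = (map fst sc, 0 : map snd sc)
  where sc = zipWith3 fullAdder xs ys zs
```

For example, `compress32 [1,1] [1,0] [1,1]` (the numbers 3, 1 and 3) gives `([1,0],[0,1,1])`, the numbers 1 and 6, preserving the total 7.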

But the problem is that if we're restricted to ADTs, we have a fixed set of constructors, each constructor having a finite number of fields. So no matter what, the depth of such a structure will be O(log n), and we have to perform O(log n) steps just to traverse the structure.
