
efficient methods to do summation

Are there any efficient techniques to do the following summation?

Given a finite set A containing n integers, A={X1,X2,…,Xn}, where each Xi is an integer. Now there are n subsets of A, denoted A1, A2, ..., An. We want to calculate the sum of each subset. Are there efficient techniques for this?

(Note that n is typically larger than the average size of the subsets A1, ..., An.)

For example, if A={1,2,3,4,5,6,7,9}, A1={1,3,4,5}, A2={2,3,4}, A3 = ... A naive way of computing the sums of A1 and A2 needs 5 additions:

Sum(A1)=1+3+4+5=13

Sum(A2)=2+3+4=9

...
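
For reference, here is a minimal sketch of this naive summation in C (the array layout and names are purely illustrative):

#include <stdio.h>

/* Naive approach: sum each subset independently, |Ai| - 1 additions per subset. */
long subset_sum(const int *subset, int len)
{
    long sum = subset[0];
    for (int i = 1; i < len; i++)
        sum += subset[i];
    return sum;
}

int main(void)
{
    int a1[] = {1, 3, 4, 5};
    int a2[] = {2, 3, 4};

    printf("Sum(A1) = %ld\n", subset_sum(a1, 4)); /* 13, using 3 additions */
    printf("Sum(A2) = %ld\n", subset_sum(a2, 3)); /* 9, using 2 additions */
    return 0;
}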

Now, if we compute 3+4 first and record its result 7, we only need 3 additions:

Sum(A1)=1+7+5=13

Sum(A2)=2+7=9

...

What about the general case? Are there any efficient methods to speed up the calculation? Thanks!

Assuming that 'addition' isn't simply an ADD operation but instead some very intensive function of two integer operands, an obvious approach would be to cache the results.

You could achieve that via a suitable data structure, for example a key-value dictionary whose keys are formed from the two operands and whose values are the results.

But as you specified C in the question, the simplest approach would be an n-by-n array of integers, where the result of x + y is stored at array[x][y].

You can then repeatedly iterate over the subsets, and for each pair of operands you check the appropriate position in the array. If no value is present then it must be calculated and placed in the array. The value then replaces the two operands in the subset and you iterate.

If the operation is commutative then the operands should be sorted prior to looking up the array (i.e. so that the first index is always the smaller of the two operands), as this will maximise "cache" hits.
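
A minimal sketch of that memoisation scheme, assuming the operands are small non-negative integers (so they can index the table directly) and using a placeholder expensive_add() in place of the real intensive operation; MAX_VAL and the names are illustrative assumptions:

#include <limits.h>

#define MAX_VAL 1024                /* assumed upper bound on operand values */
#define NOT_COMPUTED INT_MIN        /* sentinel meaning "no cached result yet" */

static int cache[MAX_VAL][MAX_VAL]; /* cache[x][y] holds expensive_add(x, y) */

/* Stand-in for the expensive operation; replace with the real function. */
static int expensive_add(int x, int y)
{
    return x + y;
}

void cache_init(void)
{
    for (int i = 0; i < MAX_VAL; i++)
        for (int j = 0; j < MAX_VAL; j++)
            cache[i][j] = NOT_COMPUTED;
}

/* Look up x (+) y, computing and caching it on a miss.
 * Operands are ordered so that x <= y, which maximises cache hits
 * when the operation is commutative. */
int cached_add(int x, int y)
{
    if (x > y) { int t = x; x = y; y = t; }
    if (cache[x][y] == NOT_COMPUTED)
        cache[x][y] = expensive_add(x, y);
    return cache[x][y];
}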

For some choices of subsets there are ways to speed up the computation, if you don't mind doing some (potentially expensive) precomputation, but not for all. For instance, suppose your subsets are {1,2}, {2,3}, {3,4}, {4,5}, ..., {n-1,n}, {n,1}; then the naive approach uses one arithmetic operation per subset, and you obviously can't do better than that. On the other hand, if your subsets are {1}, {1,2}, {1,2,3}, {1,2,3,4}, ..., {1,2,...,n} then you can get by with n-1 arithmetic ops, whereas the naive approach is much worse.

Here's one way to do the precomputation. It will not always find optimal results. For each pair of subsets X and Y, define the transition cost to be min(|X △ Y|, |Y| − 1), where X △ Y is the symmetric difference of X and Y (the set of things that are in X or Y but not both). So the transition cost is the number of arithmetic operations you need to do to compute the sum of Y's elements, given the sum of X's. Add the empty set to your list of subsets, and compute a minimum-cost directed spanning tree using Edmonds' algorithm (http://en.wikipedia.org/wiki/Edmonds%27_algorithm) or one of the faster but more complicated variations on that theme. Now make sure that whenever your spanning tree has an edge X -> Y, you compute X before Y. (This is a "topological sort" and can be done efficiently.)
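
As a concrete illustration of the transition cost, here is a small sketch that represents each subset as a bitmask over the n elements (an assumption made here for brevity, requiring n <= 64; not part of the answer itself) and evaluates min(|X △ Y|, |Y| − 1):

#include <stdint.h>

/* Number of one-bits in a 64-bit mask (portable popcount). */
static int popcount64(uint64_t m)
{
    int c = 0;
    while (m) {
        m &= m - 1;
        c++;
    }
    return c;
}

/* Cost of deriving Sum(Y) from Sum(X): either patch the known sum using the
 * elements in the symmetric difference, or add up Y from scratch with
 * |Y| - 1 additions. From the empty set (x == 0) this gives |Y| - 1. */
int transition_cost(uint64_t x, uint64_t y)
{
    int sym_diff = popcount64(x ^ y);
    int from_scratch = popcount64(y) - 1;
    return sym_diff < from_scratch ? sym_diff : from_scratch;
}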

This will give distinctly suboptimal results when, e.g., you have {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}. After deciding your order of operations using the procedure above you could then do an optimization pass where you find cheaper ways to evaluate each set's sum given the sums already computed, and this will probably give fairly decent results in practice.

I suspect, but have made no attempt to prove, that finding an optimal procedure for a given set of subsets is NP-hard or worse. (It is certainly computable; the set of possible computations you might do is finite. But, on the face of it, it may be awfully expensive: potentially you might be keeping track of about 2^n partial sums, be adding any one of them to any other at each step, and have up to about n^2 steps, for a super-naive cost of (2^(2n))^(n^2) = 2^(2n^3) operations to try every possibility.)

A common optimization technique is to pre-compute intermediate results. In your case, you might pre-compute all sums with 2 summands from A and store them in a lookup table. This will result in |A|·(|A|+1)/2 table entries, where |A| is the cardinality of A.
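
A minimal sketch of that precomputation step, storing the |A|·(|A|+1)/2 pairwise results in a triangular table indexed by element positions in A; the table layout and the expensive_add() placeholder are illustrative assumptions:

#include <stdlib.h>

/* Stand-in for the expensive pairwise operation. */
static long expensive_add(long x, long y)
{
    return x + y;
}

/* Precompute all |A|*(|A|+1)/2 pairwise sums of elements of A.
 * The result for the pair (i, j) with i <= j ends up at
 * table[i * n - i * (i - 1) / 2 + (j - i)]. */
long *precompute_pairs(const long *a, int n)
{
    long *table = malloc((size_t)n * (n + 1) / 2 * sizeof *table);
    size_t k = 0;
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)
            table[k++] = expensive_add(a[i], a[j]);
    return table;
}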

In order to compute the element sum of Ai, you:

  • look up the sum of the first two elements of Ai and save it in tmp
  • while there is an element x left in Ai:
  • look up the sum of tmp and x and save it in tmp

In order to compute the element sum of A1 = {1,3,4,5} from your example, you do the following:

  • lookup(1,3) = 4
  • lookup(4,4) = 8
  • lookup(8,5) = 13

Note that computing the sum of any given Ai doesn't require any actual summation, since all the work has already been done while pre-computing the lookup table.

If you store the lookup table in a hash table, then lookup() is in O(1).


Possible optimizations to this approach:

  • construct the lookup table while computing the summation results; hence, you only compute those summations that you actually need. Your lookup table is now a cache.
  • if your addition operation is commutative, you can save half of your cache size by storing only those summations where the smaller summand comes first. Then modify lookup() such that lookup(a,b) = lookup(b,a) if a > b .

If summation is assumed to be a time-consuming operation, you can find the LCS (largest common subset) of every pair of subsets (assuming they are sorted, as mentioned in the comments; if not, sort them first). Then compute the sum of the LCS of maximum length over all pairs, replace its elements in the related subsets with that value, update the LCSs, and continue this way until no LCS with more than one element remains. This is certainly not optimal, but it is better than the naive algorithm (fewer summations). You could also do backtracking to find the best solution.

e.g. for your sample input:

A1={1,3,4,5}, A2={2,3,4}

LCS(A1,A2) = {3,4} ==> 7 ==> replace it:

A1={1,5,7}, A2={2,7} ==> LCS = {7}; the maximum LCS length is now 1, so calculate the remaining sums directly.

You can still improve it by calculating the sum of two random numbers, then taking the LCS again, and so on.
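
A small sketch of the common-subset step, assuming the subsets are kept as sorted arrays (the function name is illustrative):

/* Writes the common elements of two sorted arrays a and b into out.
 * Returns the number of common elements (the "LCS" used above). */
int common_subset(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {
            out[k++] = a[i];
            i++;
            j++;
        }
    }
    return k;
}

For A1={1,3,4,5} and A2={2,3,4} this yields {3,4}; its sum 7 then replaces those two elements in both subsets, as in the example above.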

No. There is no efficient technique.

This is an NP-complete problem, and no efficient (polynomial-time) solution is known for such problems.

Why is it NP-complete?
We could use an algorithm for this problem to solve the set cover problem, just by adding an extra set containing all the elements.

Example: we have the sets of elements
A1={1,2}, A2={2,3}, A3={3,4} and we want to solve the set cover problem.

We add a set containing all the elements: A4 = {1,2,3,4}.

We use the algorithm that John Smith is asking for and check which sets the sum of A4 is built from. We would then have solved an NP-complete problem.
