
Efficient methods to do summation

Are there any efficient techniques for computing the following sums?

Consider a finite set A containing n integers, A = {X1, X2, …, Xn}, where each Xi is an integer. There are n subsets of A, denoted A1, A2, …, An, and we want to calculate the sum of each subset. Are there efficient techniques for this?

(Note that n is typically larger than the average size of the subsets A1, …, An.)

For example, let A = {1,2,3,4,5,6,7,9}, A1 = {1,3,4,5}, A2 = {2,3,4}, A3 = ... . A naive way of computing the sums for A1 and A2 needs 5 additions (flops):

Sum(A1) = 1 + 3 + 4 + 5 = 13

Sum(A2) = 2 + 3 + 4 = 9

...

Now, if we compute 3 + 4 first and record its result, 7, the two sums need only 3 additions (plus the single addition that produces 7, which both sums share):

Sum(A1) = 1 + 7 + 5 = 13

Sum(A2) = 2 + 7 = 9

...

What about the general case? Are there efficient methods to speed up the calculation? Thanks!

Assuming that 'addition' isn't simply an ADD instruction but some very expensive function of two integer operands, an obvious approach would be to cache the results.

You could do that with a suitable data structure, for example a key-value dictionary whose keys are formed from the two operands and whose values are the results.

But since you specified C in the question, the simplest approach would be an n-by-n array of integers (with n large enough to index every operand value, including intermediate results), where the result of x + y is stored at array[x][y].

You can then iterate repeatedly over the subsets, and for each pair of operands check the corresponding position in the array. If no value is present, it must be calculated and stored in the array. The value then replaces the two operands in the subset, and you iterate.

If the operation is commutative, the operands should be sorted before looking up the array (i.e. so that the first index is always the smaller of the two operands), as this maximises "cache" hits.
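A minimal Python sketch of this caching idea, assuming non-negative integer operands and using a hypothetical `expensive_add` as a stand-in for the costly two-operand operation. The nested list plays the role of the C array; it is sized to the largest subset sum, because partial results are also used as indices. Operands are ordered before each lookup, as suggested for a commutative operation.

```python
def expensive_add(x: int, y: int) -> int:
    # Hypothetical stand-in for the costly commutative operation.
    return x + y


def cached_sums(subsets):
    """Fold each subset pairwise, caching every (smaller, larger) operand pair.

    Assumes non-negative integers, so every operand and partial result lies
    in 0..bound and can index the 2-D cache directly.
    Returns (list of subset results, number of expensive_add calls).
    """
    bound = max((sum(s) for s in subsets), default=0)
    cache = [[None] * (bound + 1) for _ in range(bound + 1)]
    calls = 0
    results = []
    for subset in subsets:
        items = list(subset)
        acc = items[0] if items else 0
        for y in items[1:]:
            # Order the operands so (a, b) and (b, a) share one cache slot.
            lo, hi = (acc, y) if acc <= y else (y, acc)
            if cache[lo][hi] is None:
                cache[lo][hi] = expensive_add(lo, hi)
                calls += 1
            # The cached value replaces the two operands; keep folding.
            acc = cache[lo][hi]
        results.append(acc)
    return results, calls
```

For example, `cached_sums([[1, 2, 3], [1, 2, 4], [2, 1, 3]])` returns `([6, 7, 6], 3)`: three calls to the expensive operation instead of the six a cache-free fold would make.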

For some choices of subsets there are ways to speed up the computation, if you don't mind doing some (potentially expensive) precomputation, but not for all. For instance, suppose your subsets are {1,2}, {2,3}, {3,4}, {4,5}, ..., {n-1,n}, {n,1}; then the naive approach uses one arithmetic operation per subset, and you obviously can't do better than that. On the other hand, if your subsets are {1}, {1,2}, {1,2,3}, {1,2,3,4}, ..., {1,2,...,n}, you can get by with n-1 arithmetic operations, whereas the naive approach is much worse: it needs 0 + 1 + ... + (n-1) = n(n-1)/2.
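For the nested case the saving is just a running (prefix) sum: each subset's sum is the previous one plus one new element. A small Python sketch that also counts additions, so it can be compared with the naive n(n-1)/2; `values` stands for x1, ..., xn in order.

```python
def nested_sums(values):
    """Sums of {x1}, {x1, x2}, ..., {x1, ..., xn}, reusing each previous sum.

    Returns (sums, additions used); additions is n - 1 for n >= 1.
    """
    sums = []
    additions = 0
    for x in values:
        if sums:
            sums.append(sums[-1] + x)  # one addition per new subset
            additions += 1
        else:
            sums.append(x)  # the singleton {x1} needs no addition
    return sums, additions


def naive_additions(n):
    """Additions when each prefix subset is summed from scratch: 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2
```

With `nested_sums([1, 2, 3, 4])` the result is `([1, 3, 6, 10], 3)`, against `naive_additions(4) == 6`.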

Here's one way to do the precomputation. It will not always find optimal results. For each pair of subsets X and Y, define the transition cost from X to Y to be min(size of the symmetric difference of X and Y, |Y| - 1). (The symmetric difference of X and Y is the set of elements that are in X or Y but not both.) So the transition cost is the number of arithmetic operations you need to compute the sum of Y's elements, given the sum of X's: either add the elements of Y not in X and subtract those of X not in Y, or sum Y from scratch. Add the empty set to your list of subsets, and compute a minimum-cost directed spanning tree rooted at it, using Edmonds' algorithm (http://en.wikipedia.org/wiki/Edmonds%27_algorithm) or one of the faster but more complicated variations on that theme. Then make sure that when your spanning tree has an edge X -> Y, you compute X before Y. (This is a "topological sort" and can be done efficiently.)
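Implementing Edmonds' algorithm is beyond a short example, so the Python sketch below swaps in a simpler Prim-style greedy: starting from the empty set, it repeatedly attaches the cheapest not-yet-computed subset to an already-computed one, using the transition cost defined above. The resulting tree is valid (each subset is computed after its parent, so insertion order is already a topological order) but, unlike Edmonds' algorithm, it is not guaranteed to be minimum-cost. It assumes non-empty subsets and that subtraction is available, as the symmetric-difference cost implies.

```python
def transition_cost(x, y):
    """Operations to get sum(y) from sum(x): add y - x and subtract x - y,
    or sum y from scratch with len(y) - 1 additions, whichever is cheaper.
    Assumes y is non-empty."""
    return min(len(x ^ y), len(y) - 1)


def greedy_plan(subsets):
    """Prim-style greedy stand-in for Edmonds' algorithm (not optimal).

    Returns (sums in input order, total operations under the cost model).
    """
    targets = [frozenset(s) for s in subsets]
    computed = [frozenset()]      # root: the empty set, whose sum is 0
    known = [0]                   # known[k] is the sum of computed[k]
    sums = [None] * len(targets)
    pending = set(range(len(targets)))
    total_ops = 0
    while pending:
        # Cheapest edge from an already-computed set to a pending one.
        cost, k, i = min(
            (transition_cost(computed[k], targets[i]), k, i)
            for k in range(len(computed))
            for i in pending
        )
        x, y = computed[k], targets[i]
        if len(x ^ y) <= len(y) - 1:
            s = known[k] + sum(y - x) - sum(x - y)  # incremental update from parent
        else:
            s = sum(y)                              # cheaper to sum from scratch
        sums[i] = s
        computed.append(y)
        known.append(s)
        pending.discard(i)
        total_ops += cost
    return sums, total_ops
```

On the nested example, `greedy_plan([{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}])` returns `([1, 3, 6, 10], 3)`, i.e. n - 1 operations.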

This will give distinctly suboptimal results when, e.g., you have {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}: each set gets a single parent in the tree, so {1,2,3,4} costs two operations instead of the single addition Sum({1,2}) + Sum({3,4}). After deciding your order of operations with the procedure above, you could then do an optimization pass in which you look for cheaper ways to evaluate each set's sum given the sums already computed; this will probably give fairly decent results in practice.
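A hedged Python sketch of one possible optimization pass (the specific rule is an illustration, not necessarily what the author had in mind): walk the sets in evaluation order and, besides the single-parent costs, allow a set to be formed with one addition when it is the disjoint union of two sets already computed. That is exactly the case the spanning tree misses in the example above.

```python
def plan_cost(order, use_pairs=True):
    """Total operations to evaluate the sets in `order`, reusing earlier sums.

    Single-parent rule: from scratch (len(y) - 1) or from any earlier x
    (len(x ^ y)).  If use_pairs, also allow one addition when y is the
    disjoint union of two earlier sets.  Assumes non-empty sets.
    """
    done = []
    total = 0
    for s in order:
        y = frozenset(s)
        best = len(y) - 1
        for x in done:
            best = min(best, len(x ^ y))
        if use_pairs:
            earlier = set(done)
            # x is a proper subset of y and the rest of y was also computed.
            if any(x < y and (y - x) in earlier for x in done):
                best = min(best, 1)
        total += best
        done.append(y)
    return total
```

For the order {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}, the single-parent rule costs 8 operations and the extra pair rule brings it down to 6.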

I suspect, but have made no attempt to prove, that finding an optimal procedure for a given collection of subsets is NP-hard or worse. (It is certainly computable: the set of possible computations you might do is finite. But on the face of it, it may be awfully expensive: you might be keeping track of about 2^n partial sums, adding any one of them to any other at each step, for up to about n^2 steps, for a super-naive cost of (2^(2n))^(n^2) = 2^(2n^3) operations to try every possibility.)
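Spelled out, the super-naive bound comes from choosing an ordered pair out of roughly 2^n partial sums at each of roughly n^2 steps:

```latex
\left(2^{n} \cdot 2^{n}\right)^{n^{2}}
  = \left(2^{2n}\right)^{n^{2}}
  = 2^{2n \cdot n^{2}}
  = 2^{2n^{3}}
```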

A common optimization technique is to precompute intermediate results. In your case, you might precompute all sums of 2 summands from A and store them in a lookup table. This results in |A|*(|A|+1)/2 table entries, where |A| is the cardinality of A.

To compute the element sum of Ai, you:

  • look up the sum of the first two elements of Ai and save it in tmp
  • while there is an element x left in Ai:
      • look up the sum of tmp and x, and save it in tmp

To compute the element sum of A1 = {1,3,4,5} from your example, you do the following:

  • lookup(1,3) = 4
  • lookup(4,4) = 8
  • lookup(8,5) = 13
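A minimal Python sketch of the precomputed table and the lookup loop, assuming plain integer addition. One caveat shows up on the example itself: the last step looks up (5, 8), but 8 is not an element of A, so a table built from pairs of A alone has no such entry. The sketch counts those misses and adds directly instead of failing.

```python
from itertools import combinations_with_replacement


def build_table(A):
    """Precompute a + b for every pair a <= b drawn from A (|A|*(|A|+1)/2 entries)."""
    return {(a, b): a + b for a, b in combinations_with_replacement(sorted(A), 2)}


def subset_sum(table, subset):
    """Fold `subset` with table lookups: tmp = lookup(tmp, x) for each x.

    Returns (sum, misses).  A miss happens when a partial sum is not itself
    an element of A, so the precomputed table has no entry for it; the
    sketch then adds directly.
    """
    items = list(subset)
    if not items:
        return 0, 0
    tmp, misses = items[0], 0
    for x in items[1:]:
        key = (tmp, x) if tmp <= x else (x, tmp)  # table keys have a <= b
        if key in table:
            tmp = table[key]
        else:
            tmp += x
            misses += 1
    return tmp, misses
```

With A = {1,2,3,4,5,6,7,9}, the table has 36 entries; `subset_sum(table, [1, 3, 4, 5])` gives `(13, 1)` and `subset_sum(table, [2, 3, 4])` gives `(9, 0)`.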

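Note that computing the sum of a given Ai then requires no additions, since all the work was already done while precomputing the lookup table. This holds only if every partial sum you look up is itself a key in the table; in lookup(8,5) above, 8 is not an element of A, so a table of pairs drawn from A alone would not contain it.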

If you store the lookup table in a hash table, lookup() runs in O(1) expected time.


Possible optimizations to this approach:

  • Construct the lookup table while computing the sums; that way you only compute the pairwise sums you actually need. Your lookup table is now a cache.
  • If your addition operation is commutative, you can halve the size of the cache by storing only those sums where the smaller summand comes first. Then modify lookup() so that lookup(a,b) = lookup(b,a) if a > b.

If you assume summation is a time-consuming operation, you can find the LCS (longest common subsequence) of every pair of subsets (assuming they are sorted, as mentioned in the comments; if they are not, sort them). Then compute the sum of the longest LCS over all pairs, replace its elements with that value in every subset that contains them, update the LCSs, and continue this way until no LCS has more than one number. This is certainly not optimal, but it is better than the naive algorithm (fewer additions). You can also use backtracking to find the best solution.

E.g., for your sample input:

A1 = {1,3,4,5}, A2 = {2,3,4}

LCS(A1, A2) = {3,4} ==> 7 ==> replace it:

A1 = {1,5,7}, A2 = {2,7} ==> LCS = {7}; the maximum LCS length is 1, so compute the sums.
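A Python sketch of this greedy heuristic, under two stated assumptions. First, for sorted subsets of distinct numbers the LCS of two subsets is exactly their intersection, so the sketch uses set intersection. Second, each merged group becomes a fresh symbolic term rather than its numeric value, so that a new sum such as 7 cannot be confused with an element 7 that is already in A.

```python
def lcs_greedy(subsets):
    """Repeatedly merge the largest common part of any two subsets.

    Returns (sums in input order, total additions).
    """
    values = {}      # term id -> numeric value
    term_of = {}     # original element -> term id
    groups = []      # each subset as a set of term ids
    for s in subsets:
        ids = set()
        for x in s:
            if x not in term_of:
                term_of[x] = len(values)
                values[term_of[x]] = x
            ids.add(term_of[x])
        groups.append(ids)

    additions = 0
    while True:
        # Largest common part over all pairs of subsets.
        best = set()
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                common = groups[i] & groups[j]
                if len(common) > len(best):
                    best = common
        if len(best) < 2:
            break
        new_id = len(values)
        values[new_id] = sum(values[t] for t in best)
        additions += len(best) - 1
        # Replace the shared terms by their sum in every subset containing them.
        for g in groups:
            if best <= g:
                g -= best
                g.add(new_id)

    sums = []
    for g in groups:
        sums.append(sum(values[t] for t in g))
        additions += max(len(g) - 1, 0)
    return sums, additions
```

For the sample input it returns `([13, 9], 4)`: one addition for 3 + 4 and three more for the two sums, against 5 for the naive approach.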

You can improve it further by computing the sum of two random numbers, then taking the LCS again, ...

No. There is no efficient technique.

Because it is an NP-complete problem, and no efficient (polynomial-time) algorithms are known for such problems.

Why is it NP-complete?
We could use an algorithm for this problem to solve the set cover problem, just by adding to the collection one extra set containing all the elements.

Example: we have the sets A1 = {1,2}, A2 = {2,3}, A3 = {3,4}, and we want to solve the set cover problem.

We add to this collection a set containing all the elements, A4 = {1,2,3,4}.

We use the algorithm John Smith is asking for and check what A4 is represented with in its solution. We have then solved an NP-complete problem.
