
Subset sum problem

Recently I became interested in the subset-sum problem, which asks for a zero-sum subset of a given set of integers. I found some solutions on SO, and I also came across a particular solution that uses a dynamic programming approach. I translated that solution into Python based on its author's qualitative description. I'm trying to optimize it for larger lists, because it currently eats up a lot of memory. Can someone recommend optimizations or other techniques for this particular problem? Here's my attempt in Python:

import random
from time import time
from itertools import product

time0 = time()

# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
    return [[0]*b for x in xrange(a)]

# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower,upper+1) for i in range(num)]

# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]

# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    try:
        return table[n][m - N]
    except:
        return 0

# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value

# input array
#A = [1, -3, 2, 4]
A = random_ints(200)

[N, P] = split_sum(A)

# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1;
table = create_zero_matrix(m, n)

# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)

# iterate through each table element
#for i in xrange(1, m): #row
#    for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1

# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])

print "Solution: ",solution

time1 = time()

print "Time execution: ", time1 - time0

I'm not quite sure whether your solution is exact or a PTA (polynomial-time approximation).

But, as someone pointed out, this problem is indeed NP-complete.

That means every known exact algorithm takes time exponential in the size of the input in the worst case.

For example, if you can process one operation in 0.1 nanoseconds (10,000,000,000 operations per second), then checking every subset of a list of 59 elements takes:

2^59 ops --> 2^59 / 10,000,000,000 seconds (about 2^26 seconds) --> 2^26 / (3600 x 24 x 365) years --> about 2 years
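The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one operation per subset and 10^10 operations per second (the rate implied by the denominator in the estimate above), and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the brute-force estimate.
# Assumptions: one operation per subset, 10**10 operations per second.
OPS_PER_SECOND = 10**10
SECONDS_PER_YEAR = 3600 * 24 * 365

def brute_force_years(n, ops_per_second=OPS_PER_SECOND):
    """Years needed to enumerate all 2**n subsets of an n-element list."""
    return 2**n / ops_per_second / SECONDS_PER_YEAR

for n in (30, 40, 50, 59):
    print(f"n = {n}: {brute_force_years(n):.3g} years")
```

Each extra element doubles the running time, which is why the 200-element list in the question is far out of reach for plain enumeration.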

You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.

On the other hand, if you restrict the problem by bounding the values of the numbers in the set, the complexity reduces to polynomial time. But even then the memory consumed will be a polynomial of VERY high order. It will be much larger than the few gigabytes you have in memory, and even much larger than the few terabytes on your hard drive.

(And that's for small values of the bound on the elements in the set.)

Maybe this is the case with your dynamic programming algorithm.

It seemed to me that you were using a bound of 1000 when building your initialization matrix.

You can try a smaller bound. That is, if your input consistently consists of small values.
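To see how the bound feeds into the table size for the layout used in the question (one row per element, one column per sum from N to P), here is a rough sketch. The helper name is hypothetical, and the 8 bytes per slot is an assumption (one pointer per list slot on 64-bit CPython) that ignores all other object overhead.

```python
def dp_table_cells(num_elements, bound):
    """Worst-case number of cells in an m-by-(P - N + 1) table.

    Assumes every value lies in [-bound, bound], so P - N is at most
    num_elements * bound.
    """
    max_columns = num_elements * bound + 1
    return num_elements * max_columns

BYTES_PER_SLOT = 8  # assumption: one pointer per list slot on 64-bit CPython

for bound in (10, 100, 1000):
    cells = dp_table_cells(200, bound)
    mib = cells * BYTES_PER_SLOT / 2**20
    print(f"bound={bound}: {cells:,} cells, ~{mib:.0f} MiB")
```

The cell count grows linearly with the bound and quadratically with the number of elements, so shrinking either one helps.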

Good luck!

Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in Python :):

def subset_summing_to_zero (activities):
  subsets = {0: []}
  for (activity, cost) in activities.iteritems():
      old_subsets = subsets
      subsets = {}
      for (prev_sum, subset) in old_subsets.iteritems():
          subsets[prev_sum] = subset
          new_sum = prev_sum + cost
          new_subset = subset + [activity]
          if 0 == new_sum:
              new_subset.sort()
              return new_subset
          else:
              subsets[new_sum] = new_subset
  return []

I spent a few minutes with it and it worked very well.
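For readers on Python 3, where dict.iteritems() no longer exists, the same idea might look roughly like this. The function name and the choice to take a plain list of integers instead of an activity-to-cost dict are my own assumptions, not part of the Hacker News post. Unlike the original, it also keeps the first subset found for each sum rather than the last.

```python
def zero_sum_subset(numbers):
    """Return a non-empty sublist of numbers that sums to 0, or [] if none."""
    # Maps each reachable sum to one subset that produces it.
    subsets = {0: []}
    for x in numbers:
        # Snapshot the items so the dict can grow while we loop.
        for prev_sum, subset in list(subsets.items()):
            new_sum = prev_sum + x
            if new_sum == 0:
                return subset + [x]
            # Keep the first subset found for each sum.
            subsets.setdefault(new_sum, subset + [x])
    return []

print(zero_sum_subset([1, -3, 2, 4]))
```

Memory is bounded by the number of distinct reachable sums rather than by the full N..P range, which can help when the sums are sparse.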

An interesting article on optimizing Python code is available here. Basically, the main result is that you should inline the code in your hot loops. In your case this means that, instead of calling get_element twice per iteration, you put the body of that function directly inside the loop to avoid the function-call overhead.
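As a sketch of what that inlining could look like for the table-filling loop in the question (written for Python 3; the function name fill_table is made up, and the explicit range check stands in for the try/except inside get_element):

```python
def fill_table(A):
    """Build the question's table with the get_element logic inlined.

    Assumes A is a non-empty list of integers.
    """
    N = sum(x for x in A if x < 0)
    P = sum(x for x in A if x > 0)
    width = P - N + 1
    table = [[0] * width for _ in A]
    table[0][A[0] - N] = 1              # Q(1, s) := (x1 == s)
    for i in range(1, len(A)):
        prev, row, a = table[i - 1], table[i], A[i]
        for col in range(width):        # col is s - N
            src = col - a               # column of s - A[i]
            if prev[col] or col + N == a or (0 <= src < width and prev[src]):
                row[col] = 1
    return table, N

table, N = fill_table([1, -3, 2, 4])
print(table[-1][0 - N])                 # entry for sum 0 in the last row
```

Binding the previous row, the current row and A[i] to local names once per outer iteration also avoids repeated list lookups in the inner loop.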

Hope that helps! Cheers

First thing that caught my eye:

def split_sum(A):
  N_list = 0
  P_list = 0
  for x in A:
    if x < 0:
        N_list+=x
    elif x > 0:
        P_list+=x
  return [N_list, P_list]

Some advice:

  1. Try to use a 1D list, and use bitarray (http://pypi.python.org/pypi/bitarray) to reduce the memory footprint to a minimum, so you will only need to change the get / set functions. This should reduce your memory footprint by a factor of at least 64 (an integer in a list is a pointer to an integer object with type information, so the factor can be as much as 3*32).

  2. Avoid using try - catch and figure out the proper ranges at the beginning instead; you might find that you gain a huge speedup.
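A minimal sketch of the one-row, one-bit-per-sum idea from point 1. To stay self-contained it swaps bitarray for a plain Python integer used as a bit set, which is my own substitution. It only answers whether a zero-sum subset exists; recovering the subset itself would need extra bookkeeping. The function name is hypothetical.

```python
def has_zero_sum_subset(A):
    """True if some non-empty subset of the integers in A sums to 0."""
    N = sum(x for x in A if x < 0)   # smallest possible subset sum
    reach = 0                        # bit (s - N) is set <=> sum s is reachable
    for x in A:
        # Adding x to every reachable sum is a shift of the whole bit set.
        shifted = reach << x if x >= 0 else reach >> -x
        # Also mark the subset consisting of x alone.
        reach |= shifted | (1 << (x - N))
    return bool(reach >> -N & 1)     # test the bit for sum 0

print(has_zero_sum_subset([1, -3, 2, 4]))
```

Because every subset sum lies between N and P, the shifts never push a valid sum out of range, so no try/except or bounds check is needed.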

The following code works for Python 3.3+. I have used the itertools module, which has some great functions.

from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))

for i, combo in enumerate(powerset(nums), 1):
    total = 0
    for num in combo:
        total += int(num)
    if total == inputSum:
        print(combo)

The input and output are as follows:

Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
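The same enumeration can be wrapped in a function that takes integers directly, which makes it easier to reuse and test. This is only a sketch with a made-up function name. It skips the empty subset and, like any power-set approach, examines up to 2^n combinations, so it is only practical for small inputs.

```python
from itertools import chain, combinations

def subsets_with_sum(numbers, target):
    """Return a generator of the non-empty combinations summing to target."""
    all_combos = chain.from_iterable(
        combinations(numbers, r) for r in range(1, len(numbers) + 1)
    )
    return (combo for combo in all_combos if sum(combo) == target)

print(list(subsets_with_sum([1, 2, 3, 4], 5)))  # [(1, 4), (2, 3)]
```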

Just change the values in your set w, make an array x with the same length as w, and then pass the sum for which you want subsets as the last argument of the subsetsum function, and you are done (if you want to check it with your own values).

def subsetsum(cs,k,r,x,w,d):
    x[k]=1
    if(cs+w[k]==d):
        for i in range(0,k+1):

            if x[i]==1:
                print (w[i],end=" ")
        print()

    elif cs+w[k]+w[k+1]<=d :
        subsetsum(cs+w[k],k+1,r-w[k],x,w,d)

    if((cs +r-w[k]>=d) and (cs+w[k]<=d)) :
        x[k]=0
        subsetsum(cs,k+1,r-w[k],x,w,d)
#driver for the above code
w=[2,3,4,5,0]
x=[0,0,0,0,0]

subsetsum(0,0,sum(w),x,w,7)     
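For comparison, here is a sketch of the same backtracking idea that returns the subsets as lists instead of printing them. It is not the code above rewritten line by line: the function name is hypothetical, it assumes all weights are positive integers, and it sorts them first so that the overshoot check is valid.

```python
def subsets_summing_to(weights, target):
    """Return all subsets of positive weights that sum to target."""
    w = sorted(weights)          # ascending order makes the overshoot cut valid
    results = []
    chosen = []

    def search(k, current_sum, remaining):
        # Invariant: remaining == sum(w[k:])
        if current_sum == target:
            results.append(list(chosen))
            return               # positive weights: any extension overshoots
        if k == len(w):
            return
        # Prune if even the smallest remaining weight overshoots, or if
        # taking everything that is left still falls short of the target.
        if current_sum + w[k] > target or current_sum + remaining < target:
            return
        chosen.append(w[k])      # branch 1: include w[k]
        search(k + 1, current_sum + w[k], remaining - w[k])
        chosen.pop()             # branch 2: exclude w[k]
        search(k + 1, current_sum, remaining - w[k])

    search(0, 0, sum(w))
    return results

print(subsets_summing_to([2, 3, 4, 5], 7))  # [[2, 5], [3, 4]]
```

The two pruning tests play the same role as the cs + w[k] <= d and cs + r - w[k] >= d conditions in the code above.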
