How to find a combination of rows from a table where each column sums to a specific number (or range)?

Question

I have a table with three columns. Let's say the first column holds people's names. The second and third columns are numbers representing the amounts they spent. I want to build another table from a subset of those people such that the sum of each numeric column in the new table equals a specific value. How can I do that in Python?

Example: This is my table

Col1       Col2   Col3
John       10     100
Andrew     5      50
Martha     8      20
Ana        2      5

Let's say I want a combination where the second column sums to 20 and the third sums to 125. The result would be:

Col1       Col2   Col3
John       10     100
Martha     8      20
Ana        2      5

Note: Of course, sometimes it may be impossible to hit the sum exactly. If the code accepts an approximation, for example anything from 0.9X to 1.1X where X is the sum I want, that would be fine. Also, I don't need a specific number of rows. It can be a combination of 2, 3, ..., n rows.

Answer 1

This is an algorithmic task: finding the combination of values that matches the required criteria. For simple cases you can use the following script, which removes one row at a time from the dataframe and checks whether the column sums match the criteria. However, the script needs to be extended if you want to keep removing rows (i.e. removing two rows when no match is found after removing a single row). At that point a specific algorithm has to be implemented (i.e. which two rows to remove, and in which order?), and the number of combinations can become very large depending on the size of your data.



import pandas as pd

# sample dataframe
d = {'Column1': ["John", "Andrew", "Martha", "Ana"], 'Column2': [10, 5, 8, 2], 'Column3': [100, 50, 20, 5]}
df = pd.DataFrame(data=d)

# function to check if the sums of the columns match the requirements
def checkRequirements(totalColumn2, totalColumn3):
  return totalColumn2 == 20 and totalColumn3 == 125  # target sums of each column

# iterate through the dataframe, dropping one row at a time and checking for a match
for label in df.index:
  df1 = df.drop(label)
  if checkRequirements(df1['Column2'].sum(), df1['Column3'].sum()):
    print(df1)
    break
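
To give a rough sense of how quickly the number of combinations mentioned above grows, here is a small sketch that counts the non-empty row subsets of an n-row table. The helper name `count_subsets` is made up for illustration and is not part of the original answer.

```python
from math import comb

def count_subsets(n):
    """Number of non-empty row subsets of an n-row table."""
    # Sum over every subset size r; this equals 2**n - 1.
    return sum(comb(n, r) for r in range(1, n + 1))

for n in (4, 10, 20):
    print(n, count_subsets(n))
```

For the 4-row example there are only 15 subsets to try, but the count roughly doubles with every extra row, so a brute-force search is only practical for small tables.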

Answer 2

Extending @stanna's solution: we can create all possible combinations of rows to drop using itertools.combinations() and check whether each one satisfies our requirements.

from itertools import combinations
import pandas as pd

# sample data, using the column names from the question
df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"], 'Col2': [10, 5, 8, 2], 'Col3': [100, 50, 20, 5]})

def checkRequirements(sum1, sum2):
  return sum1 == 20 and sum2 == 125

# first check if the df as a whole satisfies the requirement
if checkRequirements(df['Col2'].sum(), df['Col3'].sum()):
    print(df)
else:
    # drop every combination of r rows and check whether the remaining rows satisfy the requirement
    for r in range(1, len(df.index)):
        for idx in combinations(df.index, r):
            temp_df = df.drop(list(idx))
            if checkRequirements(temp_df['Col2'].sum(), temp_df['Col3'].sum()):
                print(temp_df)
                break  # exits only the inner loop; larger drop sizes are still tried

Output:

     Col1  Col2  Col3
0    John    10   100
2  Martha     8    20
3     Ana     2     5

Remove the break statement at the end if you want to print all the matching subsets.
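
The question also allows an approximate match (anything from 0.9X to 1.1X of each target X) and combinations of 2 or more rows. A minimal sketch of that variant follows, using plain tuples instead of a dataframe so it runs with the standard library only. The names `within`, `find_subsets`, `tol`, and `min_rows` are assumptions introduced here for illustration, not part of either answer.

```python
from itertools import combinations

# (name, Col2, Col3) rows from the question's example table
rows = [("John", 10, 100), ("Andrew", 5, 50), ("Martha", 8, 20), ("Ana", 2, 5)]

def within(total, target, tol):
    """True if total lies within +/- tol (as a fraction) of target."""
    return abs(total - target) <= tol * abs(target)

def find_subsets(rows, targets, tol=0.1, min_rows=2):
    """Yield row combinations whose column sums are each within tol of targets."""
    for r in range(min_rows, len(rows) + 1):
        for subset in combinations(rows, r):
            # Column i of targets corresponds to position i + 1 in each row tuple.
            sums = [sum(row[i + 1] for row in subset) for i in range(len(targets))]
            if all(within(s, t, tol) for s, t in zip(sums, targets)):
                yield subset

matches = list(find_subsets(rows, targets=(20, 125)))
for m in matches:
    print([name for name, *_ in m])
```

With the default 10% tolerance this prints `['John', 'Martha']` (sums 18 and 120) and `['John', 'Martha', 'Ana']` (sums 20 and 125). Passing `tol=0` keeps only the exact match. Like the answers above, this is an exhaustive search, so it is only suitable for small tables.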


Answer 1
1 Accepted 2020-03-13 10:17:26

Answer 2
1 2020-03-13 11:56:32
