How can I find combinations of rows in a table whose columns each sum to a specific number (or range)?

Question

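I have a table with three columns. The first column contains people's names. The second and third columns are numbers representing the amounts they spent. I want to build another table from a subset of these people so that each column of the new table sums to a specific value. How can I do this in Python?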

Example: this is my table

Col1       Col2   Col3
John       10     100
Andrew     5      50
Martha     8      20
Ana        2      5

Suppose I want a combination where the second column sums to 20 and the third column sums to 125. The result would be:

Col1       Col2   Col3
John       10     100
Martha     8      20
Ana        2      5

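Note: of course, it may sometimes be impossible to hit the sums exactly. It is fine if the code accepts an approximation, for example anything from 0.9X to 1.1X, where X is the sum I want. Also, I do not need a specific number of rows. It can be a combination of 2, 3, ..., n rows.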

Answer 1

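This is an algorithmic task: finding a combination of values that matches the required criteria. For simple cases, you can use the script below, which drops one row at a time from the dataframe and checks whether the column sums of the remaining rows meet the requirement. However, if you want to keep removing rows (that is, drop two rows when no match is found after dropping one), the script needs to be extended. A specific algorithm would have to be implemented here (which two rows to drop, and in what order?), and there may be a large number of combinations depending on the complexity of the data.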



import pandas as pd

# sample dataframe
d = {'Column1': ["John", "Andrew", "Martha", "Ana"], 'Column2': [10, 5, 8, 2], 'Column3': [100, 50, 20, 5]}
df = pd.DataFrame(data=d)

# function to check whether the column sums match the requirements
def checkRequirements(totalColumn2, totalColumn3):
  return totalColumn2 == 20 and totalColumn3 == 125  # target sums of each column

# iterate over the dataframe, dropping one row at a time and checking for a match
for idx in df.index:
  df1 = df.drop(idx)
  if checkRequirements(df1['Column2'].sum(), df1['Column3'].sum()):
    print(df1)
    break
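
To put a number on the remark above about the amount of combinations: with n rows there are 2^n - 1 non-empty subsets to test in the worst case. The following is only an illustrative sketch; `count_subsets` is a hypothetical helper name, not part of the script above.

```python
from math import comb


def count_subsets(n_rows, min_rows=1):
    """Count the ways to keep at least min_rows out of n_rows rows."""
    return sum(comb(n_rows, r) for r in range(min_rows, n_rows + 1))


# The count roughly doubles with every extra row in the table.
for n in (4, 10, 20, 30):
    print(n, count_subsets(n))
```

For the four-row sample this is only 15 subsets, so brute force is fine there; at a few dozen rows an exhaustive search is already impractical.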

Answer 2

Extending @stanna's solution: we can use itertools.combinations() to create every possible combination of rows to drop and check whether the requirement is satisfied.

from itertools import combinations

import pandas as pd

# same sample data as in the question
df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"], 'Col2': [10, 5, 8, 2], 'Col3': [100, 50, 20, 5]})

def checkRequirements(sum1, sum2):
  return sum1 == 20 and sum2 == 125

# first check whether the df as a whole satisfies the requirement
if checkRequirements(df['Col2'].sum(), df['Col3'].sum()):
    print(df)
else:
    # build every combination of rows to drop and check whether the remaining rows satisfy the requirement
    for r in range(1, len(df.index)):
        for idx in combinations(df.index, r):
            temp_df = df.drop(list(idx))
            if checkRequirements(temp_df['Col2'].sum(), temp_df['Col3'].sum()):
                print(temp_df)
                break  # stops the search for this r only; the outer loop continues

Output:

     Col1  Col2  Col3
0    John    10   100
2  Martha     8    20
3     Ana     2     5

To print all matching subsets, remove the final break statement.
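
The question also allows an approximate match (from 0.9X to 1.1X) and at least two rows, which the exact-equality check does not cover. Below is a minimal sketch of one way to handle that, assuming positive target sums. The function `find_subsets` and its parameters `targets`, `tol` and `min_rows` are hypothetical names introduced here for illustration. It selects the rows to keep rather than the rows to drop, which is equivalent for a brute-force search.

```python
from itertools import combinations

import pandas as pd


def find_subsets(df, targets, tol=0.1, min_rows=2):
    """Return every subset of rows whose column sums fall within tolerance.

    targets maps a column name to its desired sum X (assumed positive).
    A subset matches when each listed column sums to a value in
    [(1 - tol) * X, (1 + tol) * X].
    """
    matches = []
    for r in range(min_rows, len(df) + 1):
        for keep in combinations(df.index, r):
            subset = df.loc[list(keep)]
            if all(
                (1 - tol) * x <= subset[col].sum() <= (1 + tol) * x
                for col, x in targets.items()
            ):
                matches.append(subset)
    return matches


df = pd.DataFrame({
    "Col1": ["John", "Andrew", "Martha", "Ana"],
    "Col2": [10, 5, 8, 2],
    "Col3": [100, 50, 20, 5],
})

# tol=0 demands exact sums, as in the answers above.
exact = find_subsets(df, {"Col2": 20, "Col3": 125}, tol=0)
print(exact[0])

# The 10% tolerance from the question can admit additional subsets.
approx = find_subsets(df, {"Col2": 20, "Col3": 125}, tol=0.1)
print(len(approx))
```

This is still an exhaustive search, so it is only practical for small tables.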

Answer 1
1 accepted 2020-03-13 10:17:26

Answer 2
1 2020-03-13 11:56:32
