How can I find a combination of rows from a table where each column sums to a specific number (or range)?

Question

I have a table with three columns. Let's say the first column holds people's names, and the second and third hold numbers representing the amounts they spent. I want to build another table from a subset of those people such that the sum of each numeric column in the new table equals a specific value. How can I do that in Python?

Example: This is my table

Col1       Col2   Col3
John       10     100
Andrew     5      50
Martha     8      20
Ana        2      5

Let's say I wanted a combination where the second column sum is 20 and the third is 125. The result would be:

Col1       Col2   Col3
John       10     100
Martha     8      20
Ana        2      5

Note: Of course, it may sometimes be impossible to hit the sum exactly. If the code accepts an approximation, such as anything from 0.9X to 1.1X, where X is the sum I want, that would be just fine. Also, I don't need a specific number of rows. It can be a combination of 2, 3, ..., n.

Answer 1

This is an algorithmic task: find the combination of values that matches the required criteria. For simple cases you can use the following script, which removes one row at a time from the dataframe and checks whether the column sums of the remaining rows match the criteria. However, the script needs to be extended if you want to keep removing rows (i.e. removing two rows when no match is found after removing one). A specific algorithm would have to be implemented for that (i.e. which two rows to remove, and in which order?), and the number of combinations can become very large depending on how many rows your data has.



import pandas as pd

# sample dataframe
d = {'Column1': ["John", "Andrew", "Martha", "Ana"], 'Column2': [10, 5, 8, 2], 'Column3': [100, 50, 20, 5]}
df = pd.DataFrame(data=d)

# function to check if the sums of the columns match the requirements
def checkRequirements(totalColumn2, totalColumn3):
  return totalColumn2 == 20 and totalColumn3 == 125  # target sums of each column

# iterate through the dataframe, dropping one row at a time and checking for a match
for label in df.index:
  df1 = df.drop(label)
  totalColumn2 = df1['Column2'].sum()
  totalColumn3 = df1['Column3'].sum()
  if checkRequirements(totalColumn2, totalColumn3):
    print(df1)
    break
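
To give a rough sense of how quickly the number of combinations grows, here is a small sketch (not part of the original answer) that counts the non-empty row subsets of a table with n rows. The name count_subsets is a hypothetical helper introduced only for illustration, and math.comb assumes Python 3.8 or newer.

```python
from math import comb


def count_subsets(n_rows):
    """Count the non-empty subsets of n_rows rows (hypothetical helper)."""
    # Sum of C(n, k) for k = 1..n, which equals 2**n - 1.
    return sum(comb(n_rows, k) for k in range(1, n_rows + 1))


for n in (4, 10, 20, 30):
    print(n, count_subsets(n))
```

The four-row example has only 15 candidate subsets, but a 30-row table already has more than a billion, so an exhaustive search is only practical for small tables.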

Answer 2

Extending @stanna's solution: we can create all possible combinations of the rows to be dropped using itertools.combinations() and check whether our requirements are satisfied.

import pandas as pd
from itertools import combinations

# sample dataframe from the question
df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"], 'Col2': [10, 5, 8, 2], 'Col3': [100, 50, 20, 5]})

def checkRequirements(sum1, sum2):
  return sum1 == 20 and sum2 == 125

# first check if the df as a whole satisfies the requirement
if checkRequirements(df['Col2'].sum(), df['Col3'].sum()):
    print(df)
else:
    # build every combination of r rows to drop, drop them, and check whether the remaining rows satisfy the requirement
    for r in range(1, len(df.index)):
        for idx in combinations(df.index, r):
            temp_df = df.drop(list(idx))
            if checkRequirements(temp_df['Col2'].sum(), temp_df['Col3'].sum()):
                print(temp_df)
                break  # only leaves the inner loop; larger values of r are still tried

Output:

     Col1  Col2  Col3
0    John    10   100
2  Martha     8    20
3     Ana     2     5

Remove the break statement at the end if you want to print every matching subset; as written, it only stops the search within the current number of dropped rows.
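
The question also allows an approximate match (from 0.9X to 1.1X). A minimal sketch of that variant follows, assuming positive target sums and a table small enough for exhaustive search. It selects rows instead of dropping them, and the helper name find_subsets and its parameters targets, tol and min_rows are hypothetical, not taken from either answer.

```python
from itertools import combinations

import pandas as pd


def find_subsets(df, targets, tol=0.1, min_rows=2):
    """Yield every row subset whose column sums fall within +/- tol of the targets.

    targets maps a column name to its desired sum X; a subset matches when
    each column sum lies in [(1 - tol) * X, (1 + tol) * X].
    Hypothetical helper; assumes every X is positive.
    """
    for r in range(min_rows, len(df) + 1):
        for labels in combinations(df.index, r):
            subset = df.loc[list(labels)]
            if all(
                (1 - tol) * x <= subset[col].sum() <= (1 + tol) * x
                for col, x in targets.items()
            ):
                yield subset


df = pd.DataFrame({
    'Col1': ["John", "Andrew", "Martha", "Ana"],
    'Col2': [10, 5, 8, 2],
    'Col3': [100, 50, 20, 5],
})

# Print every subset whose sums are within 10% of 20 and 125.
for match in find_subsets(df, {'Col2': 20, 'Col3': 125}, tol=0.1):
    print(match, end="\n\n")
```

With tol=0.1 the sample data gives two matches, John and Martha (sums 18 and 120) and John, Martha and Ana (sums 20 and 125). Passing tol=0 keeps only the exact match.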

solution1
1 ACCEPTED 2020-03-13 10:17:26

solution2
1 2020-03-13 11:56:32
