I have a table with three columns. Let's say the first column is filled with people's names. The second and third are numbers representing the amounts they spent. I want to build another table with a subset of those people such that the sum of each column in the new table equals a specific value. How can I do that in Python?
Example: This is my table
Col1 Col2 Col3
John 10 100
Andrew 5 50
Martha 8 20
Ana 2 5
Let's say I wanted a combination where the second column sum is 20 and the third is 125. The result would be:
Col1 Col2 Col3
John 10 100
Martha 8 20
Ana 2 5
Note: Of course, sometimes it might be impossible to get exactly the sum. If the code accepts some approximation, like from 0.9X to 1.1X, X being the sum I want, that would be just fine. Also, I don't need a specific number of rows. It can be a combination of 2, 3, ..., n.
This is an algorithmic task: finding the combination of values that matches the required criteria. For simple cases you can use the following script, which removes one row at a time from the dataframe and checks whether the column sums match the criteria. However, the script needs to be extended if you want to keep removing rows (i.e. removing two rows when no match was found after removing one). That requires a specific algorithm (i.e. which two rows to remove, and in which order?), and the number of combinations can become very large depending on the complexity of your data.
import pandas as pd

# sample dataframe
d = {'Column1': ["John", "Andrew", "Martha", "Ana"], 'Column2': [10, 5, 8, 2], 'Column3': [100, 50, 20, 5]}
df = pd.DataFrame(data=d)

# function to check if the sums of the columns match the requirements
def checkRequirements(totalColumn2, totalColumn3):
    return totalColumn2 == 20 and totalColumn3 == 125

# iterate through the dataframe, dropping one row at a time and checking for a match
for label in df.index:
    df1 = df.drop(label)
    if checkRequirements(df1['Column2'].sum(), df1['Column3'].sum()):
        print(df1)
        break
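To give a rough sense of how quickly the number of candidate combinations grows: a table with n rows has 2^n - 1 non-empty subsets of rows. The sketch below counts them with math.comb; the helper name count_subsets is only for illustration.

```python
from math import comb

def count_subsets(n):
    """Number of non-empty subsets of n rows: sum of C(n, r) for r = 1..n."""
    return sum(comb(n, r) for r in range(1, n + 1))

# Print the subset count for a few table sizes.
for n in (4, 20, 40):
    print(n, count_subsets(n))
```

For the four-row sample this is only 15 subsets, but at 40 rows it already exceeds a trillion, so a brute-force search is realistic only for small tables.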
Extending @stanna's solution: we can create all possible combinations of the rows to be dropped using itertools.combinations() and check whether our requirements are satisfied.
from itertools import combinations

# df is assumed to be the sample dataframe with columns Col1, Col2, Col3
def checkRequirements(sum1, sum2):
    return sum1 == 20 and sum2 == 125

# first check if the df as a whole satisfies the requirement
if checkRequirements(df['Col2'].sum(), df['Col3'].sum()):
    print(df)
else:
    # create every combination of rows to drop and check whether the remaining rows satisfy the requirement
    for r in range(1, len(df.index)):
        for idx in combinations(df.index, r):
            temp_df = df.drop(list(idx))
            if checkRequirements(temp_df['Col2'].sum(), temp_df['Col3'].sum()):
                print(temp_df)
                break
Output:
Col1 Col2 Col3
0 John 10 100
2 Martha 8 20
3 Ana 2 5
Remove the break statement at the end if you want to print all the matching subsets.
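Neither script above handles the approximate match the question asks for (sums between 0.9X and 1.1X). A minimal sketch of one way to do it follows. It builds the kept rows directly instead of dropping rows, and it works on plain tuples so it runs without pandas. The function name find_subsets and the tol parameter are assumptions made for this illustration, not part of either answer.

```python
from itertools import combinations

def find_subsets(rows, target2, target3, tol=0.1):
    """Yield every subset of rows whose column sums fall within
    [(1 - tol) * target, (1 + tol) * target] for both numeric columns.

    rows: sequence of (name, value2, value3) tuples.
    tol=0.1 is an assumption matching the 0.9X to 1.1X range in the question.
    """
    def close(total, target):
        return (1 - tol) * target <= total <= (1 + tol) * target

    # Try subsets of every size from 1 to n (brute force, exponential in n).
    for r in range(1, len(rows) + 1):
        for subset in combinations(rows, r):
            sum2 = sum(row[1] for row in subset)
            sum3 = sum(row[2] for row in subset)
            if close(sum2, target2) and close(sum3, target3):
                yield subset

rows = [("John", 10, 100), ("Andrew", 5, 50), ("Martha", 8, 20), ("Ana", 2, 5)]

# Exact match only (tol=0) for the targets from the question.
exact = list(find_subsets(rows, 20, 125, tol=0))
print(exact)

# Subsets within 10% of each target.
approx = list(find_subsets(rows, 20, 125))
print(approx)
```

With a dataframe, the rows could be obtained with list(df.itertuples(index=False, name=None)), and a matching subset turned back into a table with pd.DataFrame(subset, columns=df.columns). Because find_subsets is a generator, calling next() on it stops at the first match instead of enumerating every subset.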