How can I find a combination of rows from a table where each column sums to a specific number (or range)?
I have a table with three columns. Let's say the first column is filled with people's names. The second and third columns contain numbers representing the amounts they spent. I want to build another table with a subset of those people, where each numeric column of the new table sums to a specific target value. How can I do that in Python?
Example: this is my table:
Col1 Col2 Col3
John 10 100
Andrew 5 50
Martha 8 20
Ana 2 5
Let's say I want a combination where the second column sums to 20 and the third column sums to 125. The result would be:
Col1 Col2 Col3
John 10 100
Martha 8 20
Ana 2 5
Note: of course, it may sometimes be impossible to hit the sum exactly. If the code accepts some approximation, for example anything from 0.9X to 1.1X where X is the sum I want, that would be just fine. Also, I don't need a specific number of rows: it can be a combination of 2, 3, ..., n.
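Stated more formally (the symbols here are introduced for illustration and are not from the question): for rows i = 1, ..., n with Col2 value a_i and Col3 value b_i, and target sums X_2 and X_3, the goal is a subset S of the rows with 2 ≤ |S| ≤ n such that

```latex
0.9\,X_2 \le \sum_{i \in S} a_i \le 1.1\,X_2
\quad\text{and}\quad
0.9\,X_3 \le \sum_{i \in S} b_i \le 1.1\,X_3
```

With exact targets X_2 = 20 and X_3 = 125, the example is solved by S = {John, Martha, Ana}, since 10 + 8 + 2 = 20 and 100 + 20 + 5 = 125.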
This is an algorithmic task: finding the combination of values that matches the required criteria. For simple cases you can use the following script, which drops the rows of the dataframe one at a time and checks whether the resulting column sums match the criteria. However, the script would need to be extended if you want to keep removing rows (i.e. remove two rows if removing a single row did not produce a match). That requires a specific algorithm (which two rows should be removed, and in which order?), and the number of combinations can become very large depending on the size of your data.
import pandas as pd

# sample dataframe
d = {'Column1': ["John", "Andrew", "Martha", "Ana"],
     'Column2': [10, 5, 8, 2],
     'Column3': [100, 50, 20, 5]}
df = pd.DataFrame(data=d)

# check whether the column sums match the requirements
def checkRequirements(total2, total3):
    return total2 == 20 and total3 == 125  # target sums of Column2 and Column3

# iterate through the dataframe, dropping one row at a time and checking for a match
for i in df.index:
    df1 = df.drop(i)
    if checkRequirements(df1['Column2'].sum(), df1['Column3'].sum()):
        print(df1)
        break
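To put a number on the "very large number of combinations" mentioned above: dropping every possible combination of r rows out of n, for each r from 1 to n-1, yields the sum of C(n, r) over those r, which equals 2^n - 2 candidate tables (add one more if the full table is also checked). The helper name `count_candidate_tables` is hypothetical; this is only a small sketch to show how fast the search space grows.

```python
from math import comb

def count_candidate_tables(n):
    """Hypothetical helper: number of sub-tables obtained by dropping r of n rows, for r = 1 .. n-1."""
    return sum(comb(n, r) for r in range(1, n))

for n in (4, 20, 40):
    print(n, count_candidate_tables(n))
# 4 14
# 20 1048574
# 40 1099511627774
```

For the four-row example this is trivial, but every extra row doubles the work, so brute force is only practical for small tables.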
Extending @stanna's solution: we can use itertools.combinations() to generate every possible combination of rows to drop, and check whether the requirements are satisfied:
import pandas as pd
from itertools import combinations

# sample dataframe (same data as in the question)
df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"],
                   'Col2': [10, 5, 8, 2],
                   'Col3': [100, 50, 20, 5]})

def checkRequirements(sum1, sum2):
    return sum1 == 20 and sum2 == 125

# first check if the df as a whole satisfies the requirement
if checkRequirements(df['Col2'].sum(), df['Col3'].sum()):
    print(df)
else:
    # drop every combination of r rows (r = 1 .. n-1) and check whether the remaining rows satisfy the requirement
    for r in range(1, len(df.index)):
        for idx in combinations(df.index, r):
            temp_df = df.drop(list(idx))
            if checkRequirements(temp_df['Col2'].sum(), temp_df['Col3'].sum()):
                print(temp_df)
                break
Output:
Col1 Col2 Col3
0 John 10 100
2 Martha 8 20
3 Ana 2 5
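Remove the break statement at the end if you want to print all the matching subsets. Note that break only exits the inner loop, so as written the script prints the first match found for each number of dropped rows rather than stopping at the very first match.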
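Neither script handles the approximate match the question also asks for (anything from 0.9X to 1.1X). A minimal sketch of that, assuming the targets are passed as a dict mapping column name to target sum, could enumerate the rows to keep (equivalent to dropping their complement) and yield every subset whose sums fall inside the window. The names `find_subsets`, `targets`, `tol` and `min_rows` are hypothetical and not part of either answer.

```python
import pandas as pd
from itertools import combinations

def find_subsets(df, targets, tol=0.1, min_rows=1):
    """Hypothetical helper: yield every subset of rows whose sum in each target
    column lies within [(1 - tol) * target, (1 + tol) * target]."""
    index = list(df.index)
    for r in range(min_rows, len(index) + 1):
        for keep in combinations(index, r):  # rows to keep, smallest subsets first
            sub = df.loc[list(keep)]
            if all((1 - tol) * t <= sub[col].sum() <= (1 + tol) * t
                   for col, t in targets.items()):
                yield sub

df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"],
                   'Col2': [10, 5, 8, 2],
                   'Col3': [100, 50, 20, 5]})

# targets from the question: Col2 -> 20, Col3 -> 125, accepting +/- 10 %
matches = list(find_subsets(df, {'Col2': 20, 'Col3': 125}))
for m in matches:
    print(m)
    print()
```

On the example table this finds two subsets: John + Martha (18 and 120, within 10 % of both targets) and John + Martha + Ana (exactly 20 and 125). Pass `tol=0` for exact matches or `min_rows=2` to enforce the "2, 3, ..., n" rows from the question. The loop still visits up to 2^n - 1 subsets, so it is only practical for small tables.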