
How can I find a combination of rows from a table where each column sums to a specific number (or range)?

I have a table with three columns. Let's say the first column holds people's names. The second and third columns are numbers representing the amounts they spent. I want to build another table with a subset of those people such that the sum of each column in this new table equals a specific value. How can I do that in Python?

Example: this is my table:

Col1       Col2   Col3
John       10     100
Andrew     5      50
Martha     8      20
Ana        2      5

Let's say I want a combination where the second column sums to 20 and the third sums to 125. The result would be:

Col1       Col2   Col3
John       10     100
Martha     8      20
Ana        2      5
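A quick check that the expected result really meets both targets. This is a minimal pandas sketch, assuming the table is loaded into a DataFrame with the column names shown above:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"],
                   'Col2': [10, 5, 8, 2],
                   'Col3': [100, 50, 20, 5]})

# keep only the rows from the expected result and sum the numeric columns
result = df[df['Col1'].isin(["John", "Martha", "Ana"])]
print(result[['Col2', 'Col3']].sum())  # Col2: 10 + 8 + 2 = 20, Col3: 100 + 20 + 5 = 125
```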

Note: of course, sometimes it might be impossible to get the sum exactly. If the code accepts some approximation, like anything from 0.9X to 1.1X, where X is the sum I want, that would be just fine. Also, I don't need a specific number of rows. It can be a combination of 2, 3, ..., n rows.
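The 0.9X to 1.1X tolerance can be expressed as a simple range check. A minimal sketch; the helper name `within_tolerance` and the default `tol=0.1` are illustrative choices, not part of the original post:

```python
def within_tolerance(total, target, tol=0.1):
    """Return True if total lies in [(1 - tol) * target, (1 + tol) * target]."""
    return (1 - tol) * target <= total <= (1 + tol) * target

# with tol=0.1, a Col2 target of 20 accepts sums of roughly 18 to 22,
# and a Col3 target of 125 accepts roughly 112.5 to 137.5
print(within_tolerance(20, 20))    # True (exact match)
print(within_tolerance(19, 20))    # True
print(within_tolerance(130, 125))  # True
print(within_tolerance(150, 125))  # False
```

Since the bounds are computed in floating point, sums that land exactly on a boundary can be sensitive to rounding; compare against integer bounds if that matters for your data.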

This is an algorithmic task: find a combination of values that matches the required criteria. For simple cases you can use the following script, which drops one row at a time from the DataFrame and checks whether the column sums of the remaining rows match the criteria. However, the script needs more work if you want to keep removing rows (i.e., remove two rows if removing a single row did not produce a match). That requires a specific algorithm (i.e., which two rows to remove, and in which order?), and depending on your data there can be a very large number of combinations to check.



import pandas as pd

# sample dataframe
d = {'Column1': ["John", "Andrew", "Martha", "Ana"], 'Column2': [10, 5, 8, 2], 'Column3': [100, 50, 20, 5]}
df = pd.DataFrame(data=d)

# check whether the column sums match the requirements
def checkRequirements(total2, total3):
  return total2 == 20 and total3 == 125  # target sums for Column2 and Column3

# iterate through the dataframe, dropping one row at a time and checking for a match
for idx in df.index:
  df1 = df.drop(idx)
  if checkRequirements(df1['Column2'].sum(), df1['Column3'].sum()):
    print(df1)
    break
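To see why extending this to every possible combination gets expensive: dropping r of n rows can be done in C(n, r) ways, so trying every r from 1 to n-1 means checking 2^n - 2 subsets. A small sketch using the standard library's `math.comb` (Python 3.8+); the helper name `subsets_to_check` is just for illustration:

```python
from math import comb

def subsets_to_check(n):
    """Number of ways to drop between 1 and n-1 rows from an n-row table."""
    return sum(comb(n, r) for r in range(1, n))

for n in (4, 10, 20, 30):
    print(n, subsets_to_check(n))  # equals 2**n - 2
```

For the 4-row example that is only 14 subsets, but a 30-row table already has over a billion, so brute force is only practical for small tables.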

Extending @stanna's solution: we can generate every possible combination of rows to drop with itertools.combinations() and check whether the remaining rows satisfy the requirements:

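import pandas as pd
from itertools import combinations

# same sample data, with the column names used in the question
df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"],
                   'Col2': [10, 5, 8, 2],
                   'Col3': [100, 50, 20, 5]})

def checkRequirements(sum1, sum2):
    return sum1 == 20 and sum2 == 125

# first check if the df as a whole satisfies the requirement
if checkRequirements(df['Col2'].sum(), df['Col3'].sum()):
    print(df)
else:
    # drop every combination of r rows (r = 1 .. n-1) and check whether the rest satisfy the requirement
    for r in range(1, len(df.index)):
        for idx in combinations(df.index, r):
            temp_df = df.drop(list(idx))
            if checkRequirements(temp_df['Col2'].sum(), temp_df['Col3'].sum()):
                print(temp_df)
                break
        else:
            continue  # no match with r rows dropped: try dropping more
        break  # a match was found: stop the outer loop as well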

Output:

     Col1  Col2  Col3
0    John    10   100
2  Martha     8    20
3     Ana     2     5

Remove the break statement that follows print(temp_df) if you want to print all the matching subsets instead of stopping at the first one.
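Neither script handles the 0.9X to 1.1X tolerance the question allows. A minimal sketch that combines the itertools.combinations() idea with a range check, enumerating the rows to keep (at least two, as the question asks) instead of the rows to drop. The function name `find_subsets` and its parameters are illustrative, and the search is still exhaustive, so it is only practical for small tables:

```python
from itertools import combinations

import pandas as pd

def find_subsets(df, targets, tol=0.1, min_rows=2):
    """Yield sub-DataFrames whose column sums fall within +/- tol of each target.

    targets maps a column name to its desired sum, e.g. {'Col2': 20, 'Col3': 125}.
    """
    for r in range(min_rows, len(df) + 1):
        for idx in combinations(df.index, r):
            subset = df.loc[list(idx)]
            if all((1 - tol) * t <= subset[col].sum() <= (1 + tol) * t
                   for col, t in targets.items()):
                yield subset

df = pd.DataFrame({'Col1': ["John", "Andrew", "Martha", "Ana"],
                   'Col2': [10, 5, 8, 2],
                   'Col3': [100, 50, 20, 5]})

for match in find_subsets(df, {'Col2': 20, 'Col3': 125}):
    print(match, end="\n\n")
```

With the sample data this yields John + Martha (Col2 = 18, Col3 = 120), which sits on the lower edge of the Col2 range, as well as the exact match John + Martha + Ana. Passing `tol=0` restricts it to exact matches.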
