
Pandas - find and iterate rows with matching values in multiple columns and multiply value in another column

This question is a follow-up to my previous one:

I edited the table so that it causes less confusion.

First, suppose we have the dataframe below:

import pandas as pd

# id, C and D hold numbers, so they are stored as integers (D * D must be a numeric product)
data = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
                     'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
                     'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]})

It looks like this:

      A  C   D  id
0   foo 10  10  1
1   bar 10  9   2
2   foo 10  8   3
3   bar 50  7   4
4   foo 50  6   5
5   bar 50  5   6
6   foo 50  4   7
7   foo 8   3   8
8   foo 10  2   9
9   bar 20  1   10

What I would like to do is find matching rows and then do some calculation.

from itertools import combinations

# compare every unordered pair of rows (x, y)
for (_, x), (_, y) in combinations(data.iterrows(), 2):
    if x.A == y.A and x.C == y.C:
        result = int(x.D) * int(y.D)

and then generate a new dataframe with three columns: ['id'], ['A'] and ['result'].

@Jon Clements♦ answered my previous question with the very neat code below:

df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)
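For reference, here is a small self-contained sketch of what that snippet does on the sample data. It assumes the numeric columns are stored as integers, and it writes the `count > 1` filter as a plain boolean mask instead of the lambda. The names `stats` and `merged` are only illustrative. Note that the whole foo/10 group (ids 1, 3 and 9) is multiplied together.

```python
import pandas as pd

# Sample data from the question; id, C and D stored as integers (an assumption)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
    'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
    'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

# Product and size of every (A, C) group, keeping groups with more than one row
stats = df.groupby(['A', 'C']).D.agg(['prod', 'count'])
stats = stats[stats['count'] > 1]

# Attach the group-level product and count to each member row
merged = df.merge(stats, left_on=['A', 'C'], right_index=True)
print(merged[['id', 'A', 'prod', 'count']])
```

Every row of the foo/10 group gets the product of all three D values (10 * 8 * 2 = 160) with a count of 3, which is the behaviour the new goal wants to avoid.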

New goal:

Now I am wondering whether there is a way to not iterate over row_a again once it has been matched with row_b. In other words, I consider two matching rows to be a pair. Once row_a and row_b have become a pair, the rest of the loop ignores row_a (but not row_b, until row_b is matched with another row).

Take groupby().agg(['prod', 'count']) as an example: I want the 'count' of every generated result to be 2 (not just a filter on ['count'] == 2). I don't think this can be done with groupby(), so I am thinking a mechanism like a for-loop may solve this. Or is there a better method?

So the expected result is now the following. Because id1 and id3 have become a pair, id1 is not aggregated with id9, and for the rest of the iteration id3 is not matched with id1 again. So in the table below the result of row one is 80 rather than 160, and the result of row two is not 160 either:

     id   A   result   
0    1   foo   80   
1    3   foo   16
2    4   bar   35
3    5   foo   24
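To make the pairing rule concrete, here is a minimal for-loop sketch of how I read it: within each (A, C) group, every row is paired only with the next row in id order. It assumes the numeric columns are integers, and the names `rows` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question; id, C and D stored as integers (an assumption)
data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
    'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
    'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

rows = []
# Within each (A, C) group, pair every row with the next row in id order
for (a, c), grp in data.sort_values('id').groupby(['A', 'C']):
    ids = grp['id'].tolist()
    ds = grp['D'].tolist()
    for i in range(len(ids) - 1):
        rows.append({'id': ids[i], 'A': a, 'result': ds[i] * ds[i + 1]})

result = pd.DataFrame(rows).sort_values('id').reset_index(drop=True)
print(result)
```

On the sample data this reproduces the four rows of the table above: groups with a single row produce no pair, and the last row of each group is never the first element of a pair.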

My English is not that good, so I am not sure whether I have explained my question clearly. Please ask if anything is unclear.

Thanks for any help.

This is a bit of a long-winded solution and nowhere near as elegant as Jon Clements' original solution to your first problem, but I have come up with a solution that does not need a for-loop.

# sort by A, C, id so that rows with matching A and C sit next to each other
# (data is the dataframe from the question)
df = data.sort_values(['A', 'C', 'id'])
# find the rows whose A and C both equal those of the next row
s = (df[['A', 'C']] == df[['A', 'C']].shift(-1)).all(axis=1)

# take D where A and C match the next row and multiply it by the next D;
# since the frame is sorted, the next row is the next A, C match
d = df['D'].astype('int64')
new_d = (d * d.shift(-1))[s].astype('int64')
new_d.name = 'results'

print(new_d)
Output:
3    35
0    80
2    16
4    24
Name: results, dtype: int64

Taking the above, we simply create a new column in df and assign new_d to it:

# create a new column in df and assign it to new_d
df['results'] = new_d

df.dropna()[['id','A','results']].sort_values('id')

Output:

    id  A   results
0   1   foo 80.0
2   3   foo 16.0
3   4   bar 35.0
4   5   foo 24.0
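As a possible shorter variant of the same idea (not part of the approach described above), the "next matching row" can be looked up per group with a grouped shift, so that no comparison against the neighbouring row is needed. This is a sketch that assumes the numeric columns are integers; the names `nxt` and `out` are only illustrative.

```python
import pandas as pd

# Sample data from the question; id, C and D stored as integers (an assumption)
data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
    'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
    'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

# D of the next row (in id order) within the same (A, C) group; NaN if there is none
nxt = data.sort_values('id').groupby(['A', 'C'])['D'].shift(-1)

# The multiplication aligns on the index; rows without a partner become NaN and are dropped
out = data.assign(results=data['D'] * nxt).dropna(subset=['results'])
out = out[['id', 'A', 'results']]
print(out)
```

The results column is floating point because the shift introduces NaN for rows without a later match; cast it with `astype('int64')` after the `dropna` if integers are wanted.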
