Pandas - find and iterate over rows with matching values in multiple columns, and multiply values in another column

Question

This question is a follow-up to my previous one.

I have edited the table to make it less confusing.

First, suppose we have the dataframe below:

import pandas as pd

# numeric values so that D can be multiplied and id/C sort numerically
data = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
                     'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
                     'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]})

It looks like this:

      A  C   D  id
0   foo 10  10  1
1   bar 10  9   2
2   foo 10  8   3
3   bar 50  7   4
4   foo 50  6   5
5   bar 50  5   6
6   foo 50  4   7
7   foo 8   3   8
8   foo 10  2   9
9   bar 20  1   10

What I would like to do is find matching rows and then do a calculation on them:

from itertools import combinations

# compare every unordered pair of rows
for (_, idx), (_, idy) in combinations(data.iterrows(), 2):
    if idx.A == idy.A and idx.C == idy.C:
        result = int(idx.D) * int(idy.D)

and then generate a new dataframe with three columns: 'id', 'A' and 'result'.

@Jon Clements♦ answered my previous question with the very neat code below:

   df.merge(
        df.groupby(['A', 'C']).D.agg(['prod', 'count'])
        [lambda r: r['count'] > 1],
        left_on=['A', 'C'],
        right_index=True
    )
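For reference, here is a minimal runnable sketch of what that merge produces on the sample data. It assumes the dataframe is named df and that id, C and D hold integers rather than strings. The intermediate name stats is only for illustration.

```python
import pandas as pd

# Sample data from the question, with numeric columns (an assumption of this sketch).
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
    'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
    'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

# Product and size of every (A, C) group, keeping only groups with more than one row.
stats = df.groupby(['A', 'C']).D.agg(['prod', 'count'])
stats = stats[stats['count'] > 1]

# Attach the group-level product and count to every row of the group.
merged = df.merge(stats, left_on=['A', 'C'], right_index=True)
print(merged[['id', 'A', 'C', 'prod', 'count']])
```

The (foo, 10) group contains ids 1, 3 and 9, so its product is 10 * 8 * 2 = 160 with a count of 3. This whole-group aggregation is what the new goal needs to avoid.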

New goal:

Now I am wondering whether there is a way to stop iterating over row_a once it has been matched with row_b. In other words, I consider two matching rows to be a pair. Once row_a and row_b have become a pair, the rest of the loop ignores row_a. row_b is not ignored until it, in turn, has been matched with another row.

Taking groupby().agg(['prod', 'count']) as an example, I want the 'count' behind every generated result to be 2 (not just a filter on ['count'] == 2). I don't think this can be done with groupby(), so I am thinking that a mechanism like a for-loop may solve it. Or is there a better method?

So the expected result is now the table below. Because id1 and id3 have become a pair, id9 is not aggregated into that pair, and for the rest of the iteration id3 will not be matched with id1 again. The result in the first row is therefore 80 (10 * 8), not 160, and the second row is 16 (8 * 2, id3 paired with id9), not 160 either:

     id   A   result   
0    1   foo   80   
1    3   foo   16
2    4   bar   35
3    5   foo   24
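The pairing rule described above can be sketched as a plain loop. This is only an illustration, not a definitive implementation. It assumes that "the next match" means the next row by id within the same (A, C) group and that id, C and D are integers. The names rows and result are made up for the sketch.

```python
import pandas as pd

# Sample data from the question, with numeric columns (an assumption of this sketch).
data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
    'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
    'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

rows = []
# Walk each (A, C) group in id order and pair every row with the one after it.
for (a, c), group in data.sort_values('id').groupby(['A', 'C'], sort=False):
    ids = group['id'].tolist()
    ds = group['D'].tolist()
    # The last row of a group has no later partner, so it produces no result.
    for i in range(len(ids) - 1):
        rows.append({'id': ids[i], 'A': a, 'result': ds[i] * ds[i + 1]})

result = pd.DataFrame(rows).sort_values('id').reset_index(drop=True)
print(result)
```

Under those assumptions, every result is the product of exactly two rows: id1 pairs with id3, and id3 then pairs with id9.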

My English is not that good, so I am not sure whether I have explained my question clearly. Please ask if anything is unclear.

Thanks for any help.

Answer 1

This is a bit of a long-winded solution, and nowhere near as elegant as the original solution by Jon Clements for your first problem. But I have come up with a solution that does not need a for-loop.

# df is the dataframe from the question (named data there), with numeric id, C and D
# sort values by A, C, id
df = df.sort_values(['A', 'C', 'id'])
# find where A and C are equal to the A and C of the next row
s = (df[['A', 'C']] == df[['A', 'C']].shift(-1)).all(axis=1)

# create a new series where we take the value of D where A and C match the next row
# and multiply it with the next value - since it's sorted, that is the next A,C match
new_d = (df.D * df.D.shift(-1))[s].astype('int64')
new_d.name = 'results'

print(new_d)

Output:
3    35
0    80
2    16
4    24
Name: results, dtype: int64

Taking the above, we simply create a new column in df and assign new_d to it. The assignment aligns on the index, so rows without a pair get NaN and are dropped:

# create a new column in df and assign new_d to it
df['results'] = new_d

df.dropna()[['id','A','results']].sort_values('id')

Output:

    id  A   results
0   1   foo 80.0
2   3   foo 16.0
3   4   bar 35.0
4   5   foo 24.0
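A more compact variant of the same idea might use a grouped shift instead of sorting the whole frame. This is a sketch under stated assumptions: rows are already ordered by id, D is numeric, and the names next_d and out are illustrative only.

```python
import pandas as pd

# Sample data from the question, with numeric columns (an assumption of this sketch).
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
    'C': [10, 10, 10, 50, 50, 50, 50, 8, 10, 20],
    'D': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
})

# Within each (A, C) group, fetch D from the following row (NaN for the group's last row).
next_d = df.groupby(['A', 'C'])['D'].shift(-1)

# Multiply each D by its partner's D and drop the rows that have no partner.
out = df.assign(results=df['D'] * next_d).dropna(subset=['results'])
out = out[['id', 'A', 'results']].astype({'results': 'int64'})
print(out)
```

Because the shift happens inside each group, a row is only ever paired with the next row that shares its A and C values, and the original row order is preserved.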
