Pandas - Find and iterate over rows with matching values in multiple columns and multiply the values in another column

Question

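This question goes a step further than my previous one.

I have edited the table to make it less confusing.

First, suppose we have a DataFrame like this: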

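import pandas as pd

data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8','9','10'], 
                 'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo','foo','bar'],  
                 'C':['10','10','10','50','50','50','50','8','10','20'], 
                 'D':['10','9','8','7','6','5','4','3','2','1']})
# D is created as strings above; convert it so the values can be multiplied
data['D'] = data['D'].astype(int)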

which looks like this:

      A  C   D  id
0   foo 10  10  1
1   bar 10  9   2
2   foo 10  8   3
3   bar 50  7   4
4   foo 50  6   5
5   bar 50  5   6
6   foo 50  4   7
7   foo 8   3   8
8   foo 10  2   9
9   bar 20  1   10

What I want to do is find the matching rows and then do some calculation on them.

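from itertools import combinations

# compare every unordered pair of rows; D is cast in case it is stored as text
for (_, idx), (_, idy) in combinations(data.iterrows(), 2):
    if idx.A == idy.A and idx.C == idy.C:
        result = int(idx.D) * int(idy.D)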

and then generate a new DataFrame with three columns: ['id'], ['A'] and ['result'].

@Jon Clements♦ answered my previous question with the very concise code below:

   df.merge(
        df.groupby(['A', 'C']).D.agg(['prod', 'count'])
        [lambda r: r['count'] > 1],
        left_on=['A', 'C'],
        right_index=True
    )
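For reference, here is a minimal sketch of what that earlier answer produces on the sample data. It assumes D is first converted to integers (the frame above builds it from strings) and uses `.loc` with a callable for the filter. The foo/10 group has three rows (ids 1, 3 and 9), so its product is 10 * 8 * 2 = 160 and its count is 3:

```python
import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
                     'C': ['10', '10', '10', '50', '50', '50', '50', '8', '10', '20'],
                     'D': ['10', '9', '8', '7', '6', '5', '4', '3', '2', '1']})

# Assumption: D has to be numeric before it can be multiplied.
df = data.astype({'D': int})

# Product and size of every (A, C) group.
grouped = df.groupby(['A', 'C'])['D'].agg(['prod', 'count'])

# Keep only groups with more than one row and attach them to each member row.
merged = df.merge(grouped.loc[lambda r: r['count'] > 1],
                  left_on=['A', 'C'], right_index=True)

print(merged[['id', 'A', 'prod', 'count']])
```

Every row of a group receives the product of the whole group, which is the behaviour the new goal below wants to change.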

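New goal:

Now I would like to know whether there is a way to stop iterating over row_a once it has been matched with row_b. In other words, I treat two matched rows as a pair. Once row_a and row_b form a pair, later iterations skip row_a (but not row_b, which stays available until it is matched with another row).

Taking the groupby().agg(['prod', 'count']) call as an example, I want every generated result to have a 'count' of 2 (not merely a filter on ['count'] == 2). I don't think groupby() can do this, so could a mechanism such as a for loop solve it? Or is there a better way?

So the expected result is now the table below. Because id1 and id3 have formed a pair, the product does not go on to include id9, and in the remaining iterations id3 is not matched with id1 again. The result for the first row is therefore 80 rather than 160, and the second row likewise covers only its own pair (id3 and id9):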

     id   A   result   
0    1   foo   80   
1    3   foo   16
2    4   bar   35
3    5   foo   24
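One reading of the pairing rule described above, written as an explicit loop: each row is paired with the next later row that has the same A and C, and the product is reported against the earlier row. This is only a sketch under assumptions: D is converted to integers, and the names `last_seen` and `rows` are made up for illustration. The `key=` argument of `sort_values` needs pandas 1.1 or later.

```python
import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
                     'C': ['10', '10', '10', '50', '50', '50', '50', '8', '10', '20'],
                     'D': ['10', '9', '8', '7', '6', '5', '4', '3', '2', '1']})
df = data.astype({'D': int})  # assumption: D must be numeric

last_seen = {}  # (A, C) -> index label of the most recent row with that key
rows = []       # one entry per completed pair

for label, row in df.iterrows():
    key = (row['A'], row['C'])
    if key in last_seen:
        # Pair the previous row for this key with the current one.
        first = df.loc[last_seen[key]]
        rows.append({'id': first['id'], 'A': first['A'],
                     'result': first['D'] * row['D']})
    # The current row now waits for the next match; the older one is dropped.
    last_seen[key] = label

result = (pd.DataFrame(rows, columns=['id', 'A', 'result'])
            .sort_values('id', key=lambda s: s.astype(int))
            .reset_index(drop=True))
print(result)
```

On the sample data this yields the four pairs from the expected table: ids 1, 3, 4 and 5 with results 80, 16, 35 and 24.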

My English is not very good, so I am not sure I have explained the question clearly. If anything is unclear, please ask.

Thank you for your help.

Answer 1

This is a rather long-winded solution, nowhere near as elegant as the one Jon Clements gave for your first question, but it works without a for loop.

# work on a copy with numeric D (the question builds D from strings)
df = data.astype({'D': int})
# sort values by A, C, id so that matching rows become neighbours
df = df.sort_values(['A', 'C', 'id'])
# find where A and C both equal the values in the next row
s = (df[['A', 'C']] == df[['A', 'C']].shift(-1)).all(axis=1)

# create a new series: where A and C match the next row, multiply D
# with the next row's D - since it's sorted, that is the next A,C match
new_d = (df['D'] * df['D'].shift(-1))[s].astype(int)
new_d.name = 'results'

print(new_d)
Output >
3    35
0    80
2    16
4    24
Name: results, dtype: int64

With that in place, we just create a new column in df and assign new_d to it:

# create a new column in df and assign it to new_d
df['results'] = new_d

df.dropna()[['id','A','results']].sort_values('id')

Output:

    id  A   results
0   1   foo 80.0
2   3   foo 16.0
3   4   bar 35.0
4   5   foo 24.0
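A more compact variant of the same idea, offered as a sketch rather than a tested replacement: `groupby(...).shift(-1)` fetches the next D inside each (A, C) group directly, so no row-by-row comparison is needed. It assumes id and D are converted to integers, so that sorting by id is numeric, and the names `next_d` and `out` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'bar'],
                     'C': ['10', '10', '10', '50', '50', '50', '50', '8', '10', '20'],
                     'D': ['10', '9', '8', '7', '6', '5', '4', '3', '2', '1']})

# Assumption: numeric id and D, so ordering and multiplication behave as intended.
df = data.astype({'id': int, 'D': int}).sort_values(['A', 'C', 'id'])

# D of the next row within the same (A, C) group; NaN for the last row of a group.
next_d = df.groupby(['A', 'C'])['D'].shift(-1)

out = (df.assign(result=df['D'] * next_d)
         .dropna(subset=['result'])        # rows without a later match drop out
         .astype({'result': int})
         .sort_values('id')[['id', 'A', 'result']]
         .reset_index(drop=True))
print(out)
```

Because the shift happens per group, a row can never be multiplied with a neighbour from a different (A, C) combination, and groups with a single row produce no output.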

Solution 1
1 2018-08-13 17:12:39
