简体   繁体   中英

Pandas - find rows with matching values in two columns and multiply value in another column

First suppose we have a dataframe below:

import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 
                     'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  
                     'C':['10','10','10','30','50','60','50','8'], 
                     'D':['9','8','7','6','5','4','3','2']})
print(data)

    A   C   D   id
0   foo 10  9   1
1   bar 10  8   2
2   foo 10  7   3
3   bar 30  6   4
4   foo 50  5   5
5   bar 60  4   6
6   foo 50  3   7
7   foo 8   2   8

What I would like to do is find match rows and then do some calculation.

for any two ids(idx, idy) in data.iterrows():
       if idx.A == idy.A and idx.C = idy.C:
       result = idx.D * idy.D

and then generate a new dataframe with three columns ['id'] , ['A'] and ['result'] .

So a few rows of expected result is:

     id   A   result   
0    1   foo   63   
1    3   foo   63   
2    5   foo   15
3    7   foo   15

I have tried but the results are either wrong logic or wrong code/data format. Can someone give me a hand please?

One way is to groupby A + C, take the product and count, filter out those that only have a single item in the group, then inner merge back on A + C to your original frame, eg:

df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)

Gives you:

     A   C  D  id  prod  count
0  foo  10  9   1    63      2
2  foo  10  7   3    63      2
4  foo  50  5   5    15      2
6  foo  50  3   7    15      2

Then drop/rename columns as appropriate.

You can use self-join technique:

data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)
joint = pd.merge(data, data, on=('A', 'C'))
joint = joint.loc[join['id_x'] != join['id_y']]
joint['result'] = joint['D_x'] * joint['D_y']
result = joint[['id_x', 'A', 'result']]
result.columns = ['id', 'A', 'result']

Result :

   id    A  result
1   1  foo      63
2   3  foo      63
7   5  foo      15
8   7  foo      15
import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 
                     'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  
                     'C':['10','10','10','30','50','60','50','8'], 
                     'D':['9','8','7','6','5','4','3','2']})

First convert relevant columns to numeric

data[['C', 'D', 'id']] = data[['C', 'D', 'id']].apply(pd.to_numeric)

Create empty DataFrame to append to

finalDataFrame = pd.DataFrame()

groupby two columns, and then find product of column D within group and append it.

group = data.groupby(['A', 'C'])
for x, y in group:


    product = (y[["D"]].product(axis=0).values[0])


    for row in y.index:
        y.at[row, 'D'] = product

    finalDataFrame = finalDataFrame.append(y, ignore_index=True)

output = finalDataFrame[['id', 'A', 'D']]
output = output.rename(columns = {'D': 'result'})
print(output)

gives you

   id    A  result
0   2  bar       8
1   4  bar       6
2   6  bar       4
3   8  foo       2
4   1  foo      63
5   3  foo      63
6   5  foo      15
7   7  foo      15

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM