
Compare columns in two dataframes

I have two dataframes, df2012 and df2013. Each has two columns, ID and Total. I need to compare the ID columns of both dataframes and, where the IDs match, compare the Total values: if the ratio df2012['Total'] / df2013['Total'] is less than 0.8, I should drop that row. For example:

df2012:            df2013:       
ID  Total          ID Total
01   10            04  36
02   28            01  13
03   2             06  45

In this case, I should drop 01 from df2012, since 10 / 13 ≈ 0.77 < 0.8.

First, set the index of df2013 to ID:

df2013 = df2013.set_index('ID')

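Next, do an inner join. Both frames have a Total column, so suffixes are needed to tell them apart:

df = df2012.join(df2013, on='ID', how='inner', lsuffix='_2012', rsuffix='_2013')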

Now you can filter and compare the columns. Depending on your needs, you might opt for a left, right, or outer join instead.
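The filtering step might look something like the sketch below. It assumes the column names ID and Total from the question; the suffixes `_2012` and `_2013` and the variable names `joined`, `ids_to_drop`, and `filtered` are chosen here for illustration only.

```python
import pandas as pd

# Sample data from the question
df2012 = pd.DataFrame({'ID': ['01', '02', '03'], 'Total': [10, 28, 2]})
df2013 = pd.DataFrame({'ID': ['04', '01', '06'], 'Total': [36, 13, 45]})

# Inner join keeps only IDs present in both years;
# the suffixes disambiguate the two Total columns (names are an assumption).
joined = df2012.join(df2013.set_index('ID'), on='ID', how='inner',
                     lsuffix='_2012', rsuffix='_2013')

# IDs whose 2012/2013 ratio falls below the threshold
ratio = joined['Total_2012'] / joined['Total_2013']
ids_to_drop = joined.loc[ratio < 0.8, 'ID']

# Remove those IDs from df2012; IDs without a match in df2013 are left alone
filtered = df2012[~df2012['ID'].isin(ids_to_drop)]
print(filtered)
```

With the sample data, only ID 01 appears in both frames, and its ratio is below 0.8, so it is the only row removed.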

IIUC, you could do:

import pandas as pd

# setup
df2012 = pd.DataFrame(data=[['01', 10], ['02', 28], ['03', 2]], columns=['id', 'total'])
df2013 = pd.DataFrame(data=[['04', 36], ['01', 13], ['06', 45]], columns=['id', 'total'])

# left merge on id keeps every df2012 row; ffill(axis=1) copies total_x into a
# missing total_y, so ids with no match in df2013 get a ratio of 1.0 and are kept
m = df2012.merge(df2013, on='id', how='left').ffill(axis=1)

# filter
res = df2012[(m['total_x'] / m['total_y']) >= 0.8]
print(res)

Output

   id  total
1  02     28
2  03      2
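If relying on a row-wise forward fill feels too implicit, an alternative sketch is to look up the 2013 total with `Series.map` and treat missing matches explicitly. This is not the answer's method, just another way to get the same result; it assumes the ids in df2013 are unique, and the names `total_2013` and `ratio` are illustrative.

```python
import pandas as pd

# Same setup as above
df2012 = pd.DataFrame(data=[['01', 10], ['02', 28], ['03', 2]], columns=['id', 'total'])
df2013 = pd.DataFrame(data=[['04', 36], ['01', 13], ['06', 45]], columns=['id', 'total'])

# Look up each 2012 id's 2013 total; ids missing from df2013 become NaN
# (assumes df2013['id'] has no duplicates)
total_2013 = df2012['id'].map(df2013.set_index('id')['total'])

ratio = df2012['total'] / total_2013

# Keep rows with no 2013 counterpart (NaN ratio) or a ratio of at least 0.8
res = df2012[ratio.isna() | (ratio >= 0.8)]
print(res)
```

Because `map` preserves the index of df2012, the boolean mask lines up with the original rows without any merge.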
