
pandas retrieve values from one dataframe and do calculation on another dataframe

I have two DataFrames:

df1
cur    dec_pl
JPY    2
HKD    1
GBP    0

df2
cur    amount
JPY    10
HKD    5
USD    100
GBP    10

I would like to check whether each cur in df2 exists in the cur column of df1 and, if so, get the corresponding dec_pl value. For example, the dec_pl for JPY is 2, which means 10 to the power of 2, i.e. 10 * 10; multiplying that by the JPY amount in df2 gives 1000. The result goes into a new column, converted_amount. If a cur in df2 has no corresponding entry in df1, its converted_amount equals its amount. So the result should look like this:

cur    amount    converted_amount
JPY    10        1000
HKD    5         50
USD    100       100
GBP    10        10 

I am wondering what the best way to do this is.
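For reference, the two tables above can be rebuilt and the stated rule checked with a minimal sketch like the following. The plain-Python loop is only an illustration of the arithmetic (amount * 10 ** dec_pl, with an exponent of 0 when the currency is missing), not a recommended pandas solution; the names dec_pl and expected are made up for this example.

```python
import pandas as pd

# The two tables from the question, rebuilt as DataFrames.
df1 = pd.DataFrame({"cur": ["JPY", "HKD", "GBP"], "dec_pl": [2, 1, 0]})
df2 = pd.DataFrame({"cur": ["JPY", "HKD", "USD", "GBP"], "amount": [10, 5, 100, 10]})

# Lookup table: currency -> number of decimal places (plain Python ints).
dec_pl = dict(zip(df1["cur"].tolist(), df1["dec_pl"].tolist()))

# Rule from the question: amount * 10**dec_pl, or the amount unchanged
# (exponent 0) when the currency has no entry in df1.
expected = [
    amt * 10 ** dec_pl.get(cur, 0)
    for cur, amt in zip(df2["cur"].tolist(), df2["amount"].tolist())
]
print(expected)  # [1000, 50, 100, 10]
```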

This should be a merge (join) plus a conditional calculation; see the code below.

import pandas as pd

df1 = pd.Series([2,1],index=['JPY','HKD'],name='dec_pl')
df2 = pd.DataFrame({'amount':[10,5,100]}, pd.Index(['JPY','HKD','USD'],name='cur'))

Here df1 is built as a Series and df2 as a DataFrame, both indexed by cur, which makes the join easier.

Method 1

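df2['converted_amount'] = (df2['amount'] * 10**df1).fillna(df2['amount']).astype(int)

You can compute df2['amount'] * 10**df1 even though the two objects don't have the same shape: pandas aligns them on the index. Currencies missing from df1 come out as NaN, which fillna then replaces with the original amount.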
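The index alignment mentioned above can be seen in isolation with a small sketch. The names power, amount, scaled and converted are illustrative only, and the multiplication by 10 ** power follows the rule stated in the question.

```python
import pandas as pd

power = pd.Series([2, 1], index=["JPY", "HKD"], name="dec_pl")
amount = pd.Series(
    [10, 5, 100],
    index=pd.Index(["JPY", "HKD", "USD"], name="cur"),
    name="amount",
)

# Arithmetic between two Series aligns on index labels, not on position.
# USD has no entry in `power`, so its result is NaN.
scaled = amount * 10 ** power
print(scaled)

# Fill the unmatched labels with the original amount and restore integers.
converted = scaled.fillna(amount).astype(int)
print(converted)
```

Because the NaN for USD forces a float result, the final astype(int) is what brings the column back to integers.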

Method 2

(Left) join the two datasets and multiply by the power of 10; if dec_pl is missing, the default exponent is 0, so the amount is unchanged.

(df2.join(df1)
    .assign(converted_amount=lambda x: (x.amount * 10 ** x.dec_pl.fillna(0)).astype(int)))

Output

cur amount  dec_pl  converted_amount
JPY 10       2.0    1000
HKD 5        1.0    50
USD 100      NaN    100

I didn't drop dec_pl; to drop it, append .drop('dec_pl', axis=1) to the chain.

An interesting note: when you join a Series to a DataFrame on their indexes, you can use plain column assignment instead,

df2['dec_pl'] = df1

which gives the same result as df2.join(df1), except that it modifies df2 in place rather than returning a new DataFrame.
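A quick way to convince yourself that the two forms agree is the sketch below. It uses the same setup as this answer; the names joined and assigned are made up for the example.

```python
import pandas as pd

df1 = pd.Series([2, 1], index=["JPY", "HKD"], name="dec_pl")
df2 = pd.DataFrame(
    {"amount": [10, 5, 100]},
    index=pd.Index(["JPY", "HKD", "USD"], name="cur"),
)

# join returns a new DataFrame and leaves df2 untouched.
joined = df2.join(df1)

# Column assignment aligns df1 on the index and mutates the target,
# so a copy is used here to keep df2 unchanged for the comparison.
assigned = df2.copy()
assigned["dec_pl"] = df1

print(joined.equals(assigned))
```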

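Here df1 and df2 are the DataFrames from the question, with cur as an ordinary column. First add a column amount to df1 that holds 10 raised to the power of dec_pl. Then merge df1 with df2 on cur using a right join, so that every row of df2 is kept. Multiply the two amount columns, using .fillna(1) for currencies that had no match, and you have the res that you want. The last step is to rename and drop the unnecessary columns.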

df1['amount'] = pd.Series([10]*len(df1)).pow(df1['dec_pl'])
res = df1.merge(df2, on='cur', how='right')
res['converted_amount'] = res['amount_x'].fillna(1).mul(res['amount_y'])
res = res.rename(columns={'amount_y': 'amount'}).drop(['dec_pl', 'amount_x'], axis=1)

Output:

    cur amount  converted_amount
0   JPY 10      1000.0
1   HKD 5       50.0
2   USD 100     100.0
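The same steps can be run end to end on the full data from the question (including GBP) with a sketch like this one. It deviates slightly from the code above, as an assumption of convenience: the power-of-10 column is built on a copy named factors (a made-up name) so that df1 is not modified, and the result is cast back to integers.

```python
import pandas as pd

df1 = pd.DataFrame({"cur": ["JPY", "HKD", "GBP"], "dec_pl": [2, 1, 0]})
df2 = pd.DataFrame({"cur": ["JPY", "HKD", "USD", "GBP"], "amount": [10, 5, 100, 10]})

# Work on a copy so the lookup table df1 is left unchanged.
factors = df1.assign(amount=10 ** df1["dec_pl"])

# The right join keeps every row of df2; unmatched currencies get NaN factors.
res = factors.merge(df2, on="cur", how="right")

# Missing factor -> multiply by 1, then cast the float result back to int.
res["converted_amount"] = res["amount_x"].fillna(1).mul(res["amount_y"]).astype(int)
res = res.rename(columns={"amount_y": "amount"}).drop(columns=["dec_pl", "amount_x"])
print(res)
```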

You could do a left join of df2 with df1, then replace NaN in the dec_pl column with 0 (since 10 ** 0 is 1, unmatched amounts stay unchanged). Here is the code to do that:

df = pd.merge(df2, df1, how='left')
df['dec_pl'] = df['dec_pl'].fillna(0)
df['converted_amount'] = df['amount'] * 10 ** df['dec_pl']
df.drop(['dec_pl'], axis=1, inplace=True)
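One detail worth noting: after the left join, dec_pl becomes a float column because of the NaN for USD, so converted_amount comes out as floats. A hedged variant that casts the exponent back to int and names the join key explicitly might look like this, using the data from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"cur": ["JPY", "HKD", "GBP"], "dec_pl": [2, 1, 0]})
df2 = pd.DataFrame({"cur": ["JPY", "HKD", "USD", "GBP"], "amount": [10, 5, 100, 10]})

# The left join keeps every row of df2, in df2's order.
df = pd.merge(df2, df1, on="cur", how="left")

# USD has no match, so dec_pl is NaN there and the column is float.
# Filling with 0 and casting to int keeps the arithmetic in integers.
df["dec_pl"] = df["dec_pl"].fillna(0).astype(int)
df["converted_amount"] = df["amount"] * 10 ** df["dec_pl"]
df = df.drop(columns="dec_pl")
print(df)
```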
