简体   繁体   中英

Pandas: Faster way to iterate through a dataframe and add new data based on operations

I want to pandas to look into values in 2 columns in each row of df1, look for the match in another df2, and paste this in a new column in df1 in the same row and continue

alp=list("ABCDEFGHIJKLMNOPQRTSUVQXYZ")
df1['NewCol'] = (np.random.choice(alp)) #create new col and input random values

for i in range(len(df1['code1'])):
    a = df1['code2'].iloc[i].upper()
    b = df1['code1'].str[-3:].iloc[i]
    df1['NewCol'].iloc[i] = df2.loc[b,a]
df1['code3'] = df1[['code3','NewCol']].max(axis=1)
df1 =df1.drop('NewCol',axis=1)

My inputs as below: df1:

    code1 code2  code3
0  XXXHYG     a     12
1  XXXTBG     a     23
2  XXXECT     b     34
3  XXXKOL     b     45
4  XXXBTW     c     56

df2:

    A   B   C   D   E
HYG 33  38  40  41  30
TBG 20  46  41  43  45
ECT 53  42  39  34  45
KOL 45  51  54  47  30
BTW 37  36  49  48  58

output needed:

    code1 code2  code3
0  XXXHYG     a     33
1  XXXTBG     a     23
2  XXXECT     b     42
3  XXXKOL     b     51
4  XXXBTW     c     56

When I do this over just 4200 rows in df1, it takes 222 seconds for just the loop.. there has got to be a way to utilize the power of pandas to do this faster?

thanks a lot for your time!

You could use apply ( docs ), but a faster way to do this if you have a lot of data would be to create a version of df2 with a MultiIndex ( docs ) using stack ( docs ) and look up values on this new DataFrame.

df3 = df1.copy()
tups = list(zip(df1['code1'].str[-3:], df1['code2'].str.upper()))
df3['code3'] = df2.stack()[tups].values
print(df3)

outputs

    code1 code2  code3
0  XXXHYG     a     33
1  XXXTBG     a     20
2  XXXECT     b     42
3  XXXKOL     b     51
4  XXXBTW     c     49

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM