What is the most efficient approach in Python to create a new data frame column df1['Description'] by matching the strings in df1['a'] against the strings in df2['b']? When df1['a'] equals df2['b'], the corresponding string in df2['Description'] should be assigned to df1['Description']. df1 and df2 are large data frames (~1/2 million rows) of unequal sizes.
df1:
a b
Z0 1
Z1 2
A7 3
df2:
b Description
W2 asadsde
Z0 evrverve
A7 eveveerv
I would like:
df1
a b Description
Z0 1 evrverve
Z1 2 jsbdbcje
A7 3 eveveerv
Use pandas.merge:
import pandas as pd
df1 = pd.DataFrame([['Z0', 1],['Z1', 2], ['A7', 3]], columns=['a', 'b'])
a b
0 Z0 1
1 Z1 2
2 A7 3
df2 = pd.DataFrame([['W2', 'asadsde'], ['Z0', 'evrverve'], ['A7', 'eveveerv'], ['Z1', 'jsbdbcje']], columns=['a', 'Description'])
a Description
0 W2 asadsde
1 Z0 evrverve
2 A7 eveveerv
3 Z1 jsbdbcje
df3 = pd.merge(left=df1, right=df2, on='a')
a b Description
0 Z0 1 evrverve
1 Z1 2 jsbdbcje
2 A7 3 eveveerv
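The frames in the question name the key differently (df1['a'] versus df2['b']), and df1 presumably should keep all of its rows even when no match exists. A minimal sketch under those assumptions follows; the intermediate name lookup is only illustrative, and the df2 used here is the one from the question, which has no Z1 row.

```python
import pandas as pd

df1 = pd.DataFrame([['Z0', 1], ['Z1', 2], ['A7', 3]], columns=['a', 'b'])
df2 = pd.DataFrame(
    [['W2', 'asadsde'], ['Z0', 'evrverve'], ['A7', 'eveveerv']],
    columns=['b', 'Description'],
)

# Rename df2's key column so it lines up with df1['a']
# and does not clash with the unrelated df1['b'] column.
lookup = df2.rename(columns={'b': 'a'})

# how='left' keeps every row of df1; rows without a match
# get NaN in 'Description' instead of being dropped.
df3 = df1.merge(lookup, on='a', how='left')
print(df3)
```

With the default how='inner', the Z1 row would be dropped here because df2 has no matching key.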
Considering these are your dataframes:
df1
a b
Z0 1
Z1 2
A7 3
df2
b Description
W2 asadsde
Z0 evrverve
A7 eveveerv
Z1 jsbdbcje
Code to achieve your desired output using map and assign:
df1.assign(description = df1['a'].map(dict(df2.values)))
a b description
0 Z0 1 evrverve
1 Z1 2 jsbdbcje
2 A7 3 eveveerv
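If you want only the matched rows, drop the rows where the lookup returned NaN. Note that assign returns a new data frame and leaves df1 unchanged, so call dropna on its result rather than on df1:
df1.assign(description=df1['a'].map(dict(df2.values))).dropna(subset=['description'])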
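dict(df2.values) relies on df2 having exactly two columns. As a hedged alternative, the lookup can be built as a Series indexed by the key, which works regardless of how many other columns df2 carries. The names lookup, result and matched are illustrative, and dropping duplicate keys is an assumption about how repeated values in df2['b'] should be handled (the first occurrence wins).

```python
import pandas as pd

df1 = pd.DataFrame([['Z0', 1], ['Z1', 2], ['A7', 3]], columns=['a', 'b'])
df2 = pd.DataFrame(
    [['W2', 'asadsde'], ['Z0', 'evrverve'], ['A7', 'eveveerv']],
    columns=['b', 'Description'],
)

# Build a lookup Series: the index is the key (df2['b']),
# the values are the descriptions. Series.map needs a unique index,
# so duplicate keys are dropped first (assumption: keep the first one).
lookup = df2.drop_duplicates(subset='b').set_index('b')['Description']

# map returns NaN for keys in df1['a'] that are absent from the lookup.
result = df1.assign(Description=df1['a'].map(lookup))

# Keep only the rows that found a match.
matched = result.dropna(subset=['Description'])
print(matched)
```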
import pandas as pd
df1 = pd.DataFrame([['Z0', 1],['Z1', 2], ['A7', 3]], columns=['a', 'b'])
df2 = pd.DataFrame([['W2', 'asadsde'], ['Z0', 'evrverve'], ['A7', 'eveveerv'], ['Z1', 'jsbdbcje']], columns=['b', 'Description'])
After the initialization, you can join the data frames on df1's a column by setting df2's b column as its index. The code is:
df1.join(df2.set_index('b'),on='a')
And your desired output will be:
a b Description
0 Z0 1 evrverve
1 Z1 2 jsbdbcje
2 A7 3 eveveerv
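Two details of join are easy to miss: it returns a new data frame rather than modifying df1, and it performs a left join by default, so the row count only stays equal to len(df1) when the keys in df2['b'] are unique. A small sketch that makes both explicit (the name joined is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([['Z0', 1], ['Z1', 2], ['A7', 3]], columns=['a', 'b'])
df2 = pd.DataFrame(
    [['W2', 'asadsde'], ['Z0', 'evrverve'], ['A7', 'eveveerv'], ['Z1', 'jsbdbcje']],
    columns=['b', 'Description'],
)

# Duplicate keys in df2['b'] would multiply rows in the result,
# so check uniqueness before joining.
if not df2['b'].is_unique:
    raise ValueError("df2['b'] contains duplicate keys")

# join returns a new DataFrame; assign it to keep the result.
joined = df1.join(df2.set_index('b'), on='a')
print(joined)
```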