How to compare two DataFrames on a column and replace with another column's value
I have two DataFrames, df1 (first table) and df2 (second table):
id first last size
A 1978-01-01 1979-01-01 2
B 2000-01-01 2000-01-01 1
C 1998-01-01 2000-01-01 3
D 1998-01-01 1998-01-01 1
E 1999-01-01 2000-01-01 2
id token
A ZA.00
B As.11
C SD.34
Expected output:
id first last size
ZA.00 1978-01-01 1979-01-01 2
As.11 2000-01-01 2000-01-01 1
SD.34 1998-01-01 2000-01-01 3
D 1998-01-01 1998-01-01 1
E 1999-01-01 2000-01-01 2
If a df1 id is present in df2, that id in df1 should be replaced with the corresponding token value. How can I achieve this?
Using merge and combine_first:
df = df1.merge(df2,how='outer')
df['id'] = df['token'].combine_first(df['id'] )
df.drop('token',inplace=True,axis=1)
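A self-contained sketch of the merge approach, rebuilding the question's sample data. It assumes a left join is what is wanted: how='left' keeps exactly the rows of df1, whereas how='outer' would also append any df2 ids that are missing from df1. The variable name merged is illustrative.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "id": ["A", "B", "C", "D", "E"],
    "first": ["1978-01-01", "2000-01-01", "1998-01-01", "1998-01-01", "1999-01-01"],
    "last": ["1979-01-01", "2000-01-01", "2000-01-01", "1998-01-01", "2000-01-01"],
    "size": [2, 1, 3, 1, 2],
})
df2 = pd.DataFrame({"id": ["A", "B", "C"], "token": ["ZA.00", "As.11", "SD.34"]})

# Left join keeps every row of df1; token is NaN where df2 has no match
merged = df1.merge(df2, on="id", how="left")

# Prefer the token where present, otherwise fall back to the original id
merged["id"] = merged["token"].combine_first(merged["id"])
merged = merged.drop(columns="token")
print(merged)
```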
Another way is to use replace with a dictionary built from df2.values; here df1 itself is changed:
df1['id'] = df1['id'].replace(dict(df2.values))  # assign back; chained inplace=True is unreliable under copy-on-write
id first last size
0 ZA.00 1978-01-01 1979-01-01 2
1 As.11 2000-01-01 2000-01-01 1
2 SD.34 1998-01-01 2000-01-01 3
3 D 1998-01-01 1998-01-01 1
4 E 1999-01-01 2000-01-01 2
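Note that dict(df2.values) only works because df2 has exactly two columns, in (id, token) order. A minimal sketch that builds the mapping by column name instead, so it does not depend on column order; the name mapping and the trimmed-down df1 are illustrative, not part of the original answer.

```python
import pandas as pd

# Trimmed-down version of the question's data (only the columns needed here)
df1 = pd.DataFrame({"id": ["A", "B", "C", "D", "E"], "size": [2, 1, 3, 1, 2]})
df2 = pd.DataFrame({"id": ["A", "B", "C"], "token": ["ZA.00", "As.11", "SD.34"]})

# Build the id -> token lookup explicitly from the named columns
mapping = dict(zip(df2["id"], df2["token"]))

# Values without an entry in the mapping (D, E) are left unchanged by replace
df1["id"] = df1["id"].replace(mapping)
print(df1)
```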
Use map and fillna:
df1['id'] = df1['id'].map(df2.set_index('id')['token']).fillna(df1['id'])
df1
Output:
id first last size
0 ZA.00 1978-01-01 1979-01-01 2
1 As.11 2000-01-01 2000-01-01 1
2 SD.34 1998-01-01 2000-01-01 3
3 D 1998-01-01 1998-01-01 1
4 E 1999-01-01 2000-01-01 2
You can use map with a Series as an argument.
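To see why the fillna step is needed, here is a small sketch of the two stages on plain Series. The names ids, lookup, mapped and result are illustrative; lookup plays the role of df2.set_index('id')['token'].

```python
import pandas as pd

ids = pd.Series(["A", "B", "C", "D", "E"])
# Series indexed by id whose values are the tokens
lookup = pd.Series(["ZA.00", "As.11", "SD.34"], index=["A", "B", "C"])

# map looks each id up in the index of lookup; ids with no entry become NaN
mapped = ids.map(lookup)

# fillna restores the original id wherever the lookup found nothing
result = mapped.fillna(ids)
print(result.tolist())
```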
If you do not want to merge your DataFrames, you could use apply instead. Convert the small DataFrame to a dictionary and map it onto the other DataFrame.
from io import StringIO #used to get string to df
import pandas as pd
id_ =list('ABC')
token = 'ZA.00 As.11 SD.34'.split()
dt = pd.DataFrame(list(zip(id_,token)),columns=['id','token'])
a ='''
id first last size
A 1978-01-01 1979-01-01 2
B 2000-01-01 2000-01-01 1
C 1998-01-01 2000-01-01 3
D 1998-01-01 1998-01-01 1
E 1999-01-01 2000-01-01 2
'''
df =pd.read_csv(StringIO(a), sep=' ')
# These last two lines are all you need
mp = dict(zip(dt['id'], dt['token']))  # id -> token lookup
df['id'] = df['id'].apply(lambda x: mp.get(x, x))  # assign back; apply returns a new Series