Compare two dataframes and return common values
I have 2 dataframes and need to get the System_Type column by matching on the Name column.
df1 has 500,000 rows in this format:
Name  Timestamp   usage
AXCS  2022-01-01  5
BGXD  2022-02-01  70
HFSD  2022-03-01  45
AEVC  2022-01-01  25
BHRF  2022-02-01  12
and df2 has 550,000 rows in this format:
Name  System_Type
HFSD  Dev
BHRF  Test
BGXD  Prod
AEVC  Prod
AXCS  Test
I used the following code:
pd.merge(df1, df2, on="Name")
It takes a long time to process. Is there another way to do this? Please advise.
You can use df2 as a dict-like mapping:
df1['System_Type'] = df1['Name'].map(df2.set_index('Name')['System_Type'])
print(df1)
# Output
Name Timestamp usage System_Type
0 AXCS 2022-01-01 5 Test
1 BGXD 2022-02-01 70 Prod
2 HFSD 2022-03-01 45 Dev
3 AEVC 2022-01-01 25 Prod
4 BHRF 2022-02-01 12 Test
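One caveat with the mapping approach: when a Series is passed to `map`, its index (here `Name`) has to be unique, otherwise the lookup is ambiguous and pandas raises an error. Below is a minimal sketch, assuming df2 may contain repeated names and that keeping the first System_Type per name is acceptable. That policy, and the sample rows (including the unmatched name `ZZZZ`), are assumptions for illustration and not part of the question.

```python
import pandas as pd

# Hypothetical sample data; 'ZZZZ' has no match and 'AXCS' is repeated in df2
df1 = pd.DataFrame({
    'Name': ['AXCS', 'BGXD', 'HFSD', 'ZZZZ'],
    'usage': [5, 70, 45, 1],
})
df2 = pd.DataFrame({
    'Name': ['HFSD', 'BGXD', 'AXCS', 'AXCS'],
    'System_Type': ['Dev', 'Prod', 'Test', 'Test'],
})

# Keep one row per Name so the lookup index is unique (assumed policy: first wins)
lookup = df2.drop_duplicates(subset='Name').set_index('Name')['System_Type']

# Names that are missing from df2 end up as NaN, similar to a left join
df1['System_Type'] = df1['Name'].map(lookup)
print(df1)
```

If only the rows with a match are wanted, `df1.dropna(subset=['System_Type'])` can be applied afterwards.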
You can do it like this:
import pandas as pd
df1 = pd.DataFrame({
'System_Name':['AXCS','BGXD','HFSD','AEVC', 'BHRF'],
'Timestamp':['2022-01-01','2022-02-01','2022-03-01','2022-01-01', '2022-01-01'],
'usage':[5,70,45,25,12],
})
df2 = pd.DataFrame({
'System_Name':['HFSD','BHRF','BGXD','AEVC', 'AXCS'],
'System_Type':['Dev','Test','Prod','Prod', 'Test'],
})
# Keep only the rows present in both dataframes.
# Without `on`, merge joins on all shared columns (here only System_Name).
df3 = pd.merge(df1, df2, how='outer', indicator='Exist')
df3 = df3.loc[df3['Exist'] == 'both']
# Or name the key column explicitly
df3 = pd.merge(df1, df2, on="System_Name", how='outer', indicator='Exist')
df3 = df3.loc[df3['Exist'] == 'both']
print(df3)
# Output:
System_Name Timestamp usage System_Type Exist
0 AXCS 2022-01-01 5 Test both
1 BGXD 2022-02-01 70 Prod both
2 HFSD 2022-03-01 45 Dev both
3 AEVC 2022-01-01 25 Prod both
4 BHRF 2022-01-01 12 Test both
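Filtering an outer merge on `Exist == 'both'` keeps the same key rows as an inner merge, so the indicator column can be skipped when only the common values are needed. The sketch below compares the two on made-up data (the unmatched names `XXXX` and `YYYY` are hypothetical); it is a simplification for illustration, not a benchmark.

```python
import pandas as pd

# Hypothetical sample data with one unmatched name on each side
df1 = pd.DataFrame({
    'System_Name': ['AXCS', 'BGXD', 'HFSD', 'XXXX'],
    'usage': [5, 70, 45, 9],
})
df2 = pd.DataFrame({
    'System_Name': ['HFSD', 'BGXD', 'AXCS', 'YYYY'],
    'System_Type': ['Dev', 'Prod', 'Test', 'Prod'],
})

# Outer merge plus indicator, then keep rows found on both sides
via_indicator = pd.merge(df1, df2, on='System_Name', how='outer', indicator='Exist')
via_indicator = via_indicator.loc[via_indicator['Exist'] == 'both']

# Inner merge: only keys present in both dataframes survive
via_inner = pd.merge(df1, df2, on='System_Name', how='inner')

# Compare the surviving keys, ignoring row order
common_a = sorted(via_indicator['System_Name'])
common_b = sorted(via_inner['System_Name'])
print(common_a == common_b)
```

One difference to keep in mind: the outer merge fills unmatched rows with NaN before the filter runs, so an integer column such as `usage` may come back as float, while the inner merge leaves its dtype alone.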