Compare two dataframes based on a key
I have two dataframes, df1 and df2, which have exactly the same columns and, most of the time, the same values for each key.
df1:

```
Country    A         B        C   D    E            F    G    H  Key
Argentina  xylo    262     4632   0    0        26.12    2    0  Argentinaxylo
Argentina  phone  6860   155811  48    0      4375.87  202    0  Argentinaphone
Argentina  land    507  1803728   2  117  7165.810566    3  154  Argentinaland
Australia  xylo   7650   139472  69    0     16858.42  184    0  Australiaxylo
Australia  mink   1284  2342788   1    0     39287.71   53    0  Australiamink
```

df2:

```
Country    A         B        C   D    E            F    G    H  Key
Argentina  xylo    262     4632   0    0        26.12    2    0  Argentinaxylo
Argentina  phone  6860   155811  48    0      4375.87  202    0  Argentinaphone
Argentina  land    507  1803728   2  117  7165.810566    3  154  Argentinaland
Australia  xylo   7650   139472  69    0     16858.42  184    0  Australiaxylo
Australia  mink   1284  2342788   1    0     39287.71   53    0  Australiamink
```
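I would like a snippet that compares the keys (Key = column Country + column A) in each dataframe against each other and calculates the percentage difference, if there is any, for each of columns B through H. If there is no difference, it should output nothing.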
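A minimal sketch of the comparison the question describes, assuming that "percentage difference" means (df2 - df1) / df1 * 100 for each of columns B through H, that Key is unique within each dataframe, and that only keys present in both dataframes should be compared. The helper `pct_diff_by_key` and the two-row sample data are hypothetical, made up for illustration.

```python
import pandas as pd

# Columns B through H from the question
VALUE_COLS = ['B', 'C', 'D', 'E', 'F', 'G', 'H']


def pct_diff_by_key(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: percentage difference of df2 relative to df1, per key.

    Returns only the keys (rows) and columns that changed; an empty
    DataFrame means there is nothing to report.
    """
    # Key = Country + A; assumed to be unique within each dataframe
    left = df1.assign(Key=df1['Country'] + df1['A']).set_index('Key')[VALUE_COLS]
    right = df2.assign(Key=df2['Country'] + df2['A']).set_index('Key')[VALUE_COLS]
    # Inner alignment: compare only keys (and columns) present in both frames
    left, right = left.align(right, join='inner')
    # Relative change in percent; a change away from 0 gives inf instead of raising
    pct = (right - left) / left * 100
    # Blank out cells whose values are identical, then drop all-empty rows/columns
    pct = pct.where(right != left)
    return pct.dropna(how='all').dropna(axis=1, how='all')


# Hypothetical sample: two rows from the question, with one value changed in df2
df1 = pd.DataFrame({'Country': ['Argentina', 'Australia'], 'A': ['xylo', 'xylo'],
                    'B': [262, 7650], 'C': [4632, 139472], 'D': [0, 69], 'E': [0, 0],
                    'F': [26.12, 16858.42], 'G': [2, 184], 'H': [0, 0]})
df2 = df1.copy()
df2.loc[1, 'B'] = 9180  # Australiaxylo: B goes from 7650 to 9180 (+20%)

result = pct_diff_by_key(df1, df2)
if not result.empty:  # "if there is no difference, output nothing"
    print(result)  # one cell: Australiaxylo, column B, 20.0
```

An empty result means the two dataframes agree on every shared key, which covers the "output nothing" case. Keys that exist in only one dataframe are dropped by the inner alignment, so they are not reported by this sketch.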
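Hopefully the code below helps solve your problem. For each dataframe I compute one value per row from columns B and H, (B - H) / 100, stored as df1BH and df2BH. I then merge the two dataframes on the Key column, subtract the two values, and put the final result in the df3diff column of df3.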
```python
import pandas as pd

columns = ['Country', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'Key']

df1 = pd.DataFrame([['Argentina', 'xylo',    262,    4632,  0,   0,       26.12,   2,   0, 'Argentinaxylo'],
                    ['Argentina', 'phone',  6860,  155811, 48,   0,     4375.87, 202,   0, 'Argentinaphone'],
                    ['Argentina', 'land',    507, 1803728,  2, 117, 7165.810566,   3, 154, 'Argentinaland'],
                    ['Australia', 'xylo',   7650,  139472, 69,   0,    16858.42, 184,   0, 'Australiaxylo'],
                    ['Australia', 'mink',   1284, 2342788,  1,   0,    39287.71,  53,   0, 'Australiamink']],
                   columns=columns)
# One value per row derived from columns B and H
df1['df1BH'] = (df1['B'] - df1['H']) / 100.00
print(df1)

# Same data, except that Australiaxylo has different B and F values
df2 = pd.DataFrame([['Argentina', 'xylo',    262,    4632,  0,   0,       26.12,   2,   0, 'Argentinaxylo'],
                    ['Argentina', 'phone',  6860,  155811, 48,   0,     4375.87, 202,   0, 'Argentinaphone'],
                    ['Argentina', 'land',    507, 1803728,  2, 117, 7165.810566,   3, 154, 'Argentinaland'],
                    ['Australia', 'xylo',  97650,  139472, 69,   0,    96858.42, 184,   0, 'Australiaxylo'],
                    ['Australia', 'mink',   1284, 2342788,  1,   0,    39287.71,  53,   0, 'Australiamink']],
                   columns=columns)
df2['df2BH'] = (df2['B'] - df2['H']) / 100.00
print(df2)

# Outer merge on Key keeps keys that exist in only one dataframe (their diff is NaN).
# Note: pandas >= 2.2 sorts the keys of an outer merge, so rows may appear in alphabetical Key order.
df3 = pd.merge(df1[['Key', 'df1BH']], df2[['Key', 'df2BH']], on=['Key'], how='outer')
df3['df3diff'] = df3['df1BH'] - df3['df2BH']
print(df3)
```

Output:

```
              Key  df1BH   df2BH  df3diff
0   Argentinaxylo   2.62    2.62      0.0
1  Argentinaphone  68.60   68.60      0.0
2   Argentinaland   3.53    3.53      0.0
3   Australiaxylo  76.50  976.50   -900.0
4   Australiamink  12.84   12.84      0.0
```
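The question also asks for no output when nothing differs. Following the answer's approach, one way to do that is to filter df3 after the merge. This sketch rebuilds tiny hypothetical versions of the answer's df1/df2 (only the Key and df1BH/df2BH columns, with a made-up "Brazilxylo" key) so it runs on its own, and uses `pd.merge`'s `indicator=True` to flag keys that exist in only one dataframe.

```python
import pandas as pd

# Hypothetical minimal inputs with the columns the answer's merge uses
df1 = pd.DataFrame({'Key': ['Argentinaxylo', 'Australiaxylo', 'Australiamink'],
                    'df1BH': [2.62, 76.50, 12.84]})
df2 = pd.DataFrame({'Key': ['Argentinaxylo', 'Australiaxylo', 'Brazilxylo'],
                    'df2BH': [2.62, 976.50, 1.00]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
df3 = pd.merge(df1, df2, on='Key', how='outer', indicator=True)
df3['df3diff'] = df3['df1BH'] - df3['df2BH']

# Keep keys whose values differ, plus keys found in only one dataframe.
# The != 0 test is exact; computed floats may need a tolerance instead.
changed = df3[(df3['_merge'] != 'both') | (df3['df3diff'] != 0)]
if not changed.empty:
    print(changed)
```

Here Argentinaxylo is filtered out because its two values match, while Australiaxylo (different values), Australiamink (only in df1) and Brazilxylo (only in df2) are printed.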