Using VLOOKUP with merge in Python
I have this pandas DataFrame with almost 540000 rows:
df1.head()
username hour totalCount
0 lowi 00:00 12
1 klark 00:00 0
2 sturi 00:00 2
3 nukr 00:00 10
4 irore 00:00 2
I also have another pandas DataFrame with almost 52000 rows, some of them duplicated:
df2.head()
username community
0 klark 0
1 irore 2
2 sturi 2
3 sturi 2
4 sturi 2
I want to add the 'community' column of df2 to df1, placing each value in the row with the matching username. I have used this code:
df_merge = df1.merge(df2, on='username')
df_merge
But I get the following DataFrame with almost 1205880 rows, including duplicated ones:
username hour totalCount community
0 lowi 00:00 12 2
1 lowi 00:00 12 2
2 lowi 00:00 12 2
3 lowi 01:00 9 2
4 lowi 01:00 9 2
The expected output would be this:
df_merge.head()
username hour totalCount community
0 lowi 00:00 12 2
1 klark 00:00 0 0
2 sturi 00:00 2 2
3 nukr 00:00 10 1 (not shown in the example)
4 irore 00:00 2 1 (not shown in the example)
Using pandas.Series.map:
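# Drop exact duplicate rows, then index by username so df2['community'] acts as a lookup table
df2 = df2.drop_duplicates().set_index('username')
# Look up each username from df1; usernames missing from df2 become NaN
df1['community'] = df1['username'].map(df2['community'])
print(df1)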
Output:
username hour totalCount community
0 lowi 00:00 12 NaN
1 klark 00:00 0 0.0
2 sturi 00:00 2 2.0
3 nukr 00:00 10 NaN
4 irore 00:00 2 2.0
Note that lowi and nukr weren't in the example df2, so their community is NaN.
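For comparison, the same lookup can probably also be done with merge, as long as df2 is first reduced to one row per username. The sketch below is a minimal illustration and not a tested solution for the full data: the sample frames are rebuilt by hand from the heads shown in the question, the name lookup is hypothetical, and it assumes each username belongs to exactly one community (if not, drop_duplicates(subset=...) silently keeps the first one). The final cast to the nullable Int64 dtype is optional; it only avoids the float values (0.0, 2.0) that appear once NaN is present.

```python
import pandas as pd

# Sample data rebuilt from the heads shown in the question (illustrative only).
df1 = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df2 = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# Keep one row per username so each df1 row can match at most one df2 row.
# Assumption: a username never maps to two different communities.
lookup = df2.drop_duplicates(subset="username")

# how="left" keeps every df1 row, like VLOOKUP; validate raises a MergeError
# if the right-hand keys turn out not to be unique.
df_merge = df1.merge(lookup, on="username", how="left", validate="many_to_one")

# Optional: nullable integer dtype keeps 0 and 2 as integers next to missing values.
df_merge["community"] = df_merge["community"].astype("Int64")

print(df_merge)
```

With how="left" the result has the same number of rows as df1, whereas the default inner join on a df2 with repeated usernames multiplies the matching rows, which is where the extra rows in the question come from.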