Vlookup in Pandas - Join or Merge?

I am trying to replicate Excel's VLOOKUP function using pandas. I have used join and merge, and both methods give me wrong results.

Df1 has 15 columns with integer and text values, and Df2 has 6 columns with mostly text.

I am trying to bring user details from Df2 into Df1 using the column label 'Created By'.

Df1 looks like this:

CA#     CreatedBy   $
9xxx12  User 1      10
9xxx13  User 2      20
9xxx14  User 3      25

Df2 looks like this:

CreatedBy     Role
User 1         Sales
User 2         Maintenance
User 3         Operations

My expected result, DfMerged, would be:

CA#     CreatedBy   $   User Role
9xxx12  User 1      10  Sales
9xxx13  User 2      20  Maintenance
9xxx14  User 3      25  Operations

I tried the following code variations, but they don't match all user IDs: some rows in Df1 are left blank even though Df2 contains matching data.

   merged = data_fr1.merge(data_fr2, on='Created By', how='left')

   merged2 = pd.merge(data_fr1, data_fr2, left_on='Created By',
                      right_on='Created By', how='left')

Someone pointed to this post for an answer: Pandas Merging 101

But I'm still not getting the right results. The 'CreatedBy' field is not populating for all users in Df1. This field is a mix of text and numbers, e.g. User1, User2, etc. I wonder if the datatype is interfering with the results.

Doesn't a plain merge get you what you want? I'm unsure why your output has a null Role column with everything under User, but you can rename the columns.

print('df')
print(df)
print('df2')
print(df2)
print('out_df')
print(out_df)

df.merge(df2[['By', 'Role']], on='By')
df
      CA# Created  By   $
0  9xxx12    User   1  10
1  9xxx13    User   2  20
2  9xxx14    User   3  25
df2
  Created  By         Role
0    User   1        Sales
1    User   2  Maintenance
2    User   3   Operations
out_df
      CA# Created  By   $         User  Role
0  9xxx12    User   1  10        Sales   NaN
1  9xxx13    User   2  20  Maintenance   NaN
2  9xxx14    User   3  25   Operations   NaN
Out[40]: 
      CA# Created  By   $         Role
0  9xxx12    User   1  10        Sales
1  9xxx13    User   2  20  Maintenance
2  9xxx14    User   3  25   Operations
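For reference, here is a minimal self-contained sketch of the same merge using the sample tables from the question. The frame names `df1`, `df2` and `df_merged` are my own, and the key column is assumed to be spelled `CreatedBy` in both frames, as in the tables. The real data may differ.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (assumed column names).
df1 = pd.DataFrame({
    "CA#": ["9xxx12", "9xxx13", "9xxx14"],
    "CreatedBy": ["User 1", "User 2", "User 3"],
    "$": [10, 20, 25],
})
df2 = pd.DataFrame({
    "CreatedBy": ["User 1", "User 2", "User 3"],
    "Role": ["Sales", "Maintenance", "Operations"],
})

# A left merge keeps every row of df1 and looks up Role by CreatedBy,
# which is the closest analogue of VLOOKUP here.
df_merged = (
    df1.merge(df2, on="CreatedBy", how="left")
       .rename(columns={"Role": "User Role"})
)
print(df_merged)
```

Unlike VLOOKUP, a merge returns every match, so duplicate keys in the lookup frame would duplicate rows in the result. Passing `validate="many_to_one"` to `merge` should raise an error in that case instead of silently growing the output.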

Edit: Sorry, some of the issue is the clipboard parsing. The logic still applies. If you're still having issues, can you provide examples of rows that are not joining properly?
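One way to find the rows that fail to join is the `indicator=True` option of `merge`. The sketch below uses made-up data in which the keys differ only by a trailing space or by case. That is a common cause of blanks after a merge, but only a guess about the actual data. The helper `normalize_key` and the temporary `key` column are hypothetical names.

```python
import pandas as pd

# Hypothetical data: keys that look alike but differ in whitespace or case.
df1 = pd.DataFrame({
    "CreatedBy": ["User 1", "User 2 ", "user 3"],
    "$": [10, 20, 25],
})
df2 = pd.DataFrame({
    "CreatedBy": ["User 1", "User 2", "User 3"],
    "Role": ["Sales", "Maintenance", "Operations"],
})

# indicator=True adds a "_merge" column; "left_only" marks df1 rows
# that found no partner in df2.
check = df1.merge(df2, on="CreatedBy", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "CreatedBy"]
print(unmatched.map(repr).tolist())  # repr makes stray spaces visible


def normalize_key(s: pd.Series) -> pd.Series:
    """Cast to string, trim surrounding whitespace, and lowercase."""
    return s.astype(str).str.strip().str.lower()


# Merge on a cleaned copy of the key, then drop the helper column.
df1["key"] = normalize_key(df1["CreatedBy"])
df2["key"] = normalize_key(df2["CreatedBy"])
fixed = df1.merge(df2[["key", "Role"]], on="key", how="left").drop(columns="key")
print(fixed)
```

If the unmatched keys look identical even under `repr`, comparing `df1["CreatedBy"].dtype` with `df2["CreatedBy"].dtype` would be the next thing to check, since the question suspects a datatype mismatch.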
