Pandas - Match a column across two DataFrames and create a new column in df1

Question

I have two DataFrames.

DF1

Srlno  id  image
1      3   image1.jpg
2      3   image2.jpg
3      3   image2.jpg

DF2

Srlno  id  image
1      1   image1.jpg
2      2   image2.jpg
3      3   image3.jpg

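I want to match the two DataFrames on the image column and bring id from df2 into df1 as a new column. The image names in df2 are unique, while the image names in df1 contain many duplicates. I want to keep the duplicated image names in df1 but fill in the correct id from df2 for each image.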

The expected output is:

Srlno  id  image       newids
1      3   image1.jpg  1
2      3   image2.jpg  2
3      3   image2.jpg  2

I tried:

df1['newids'] = df1['image'].map(df2.set_index('image')['id'])

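This raises InvalidIndexError ('Reindexing only valid with uniquely valued Index objects'). I understand that the duplicates in df1 are causing this error, but I don't know how to fix it.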
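A note on the error message: it complains about the index of the lookup Series, which is built from df2, so it usually points to repeated image names in df2 rather than in df1. The sketch below shows one way to check that. The df2 here is hypothetical (an extra row repeating image2.jpg has been added for illustration) and is not the data from the question.

```python
import pandas as pd

# Hypothetical df2 in which 'image2.jpg' appears twice (not the question's data).
df2 = pd.DataFrame({
    "Srlno": [1, 2, 3, 4],
    "id": [1, 2, 3, 4],
    "image": ["image1.jpg", "image2.jpg", "image3.jpg", "image2.jpg"],
})

# False here means df2.set_index('image') would have a non-unique index.
lookup_is_unique = df2["image"].is_unique

# All rows whose image name occurs more than once in df2.
dupes = df2[df2["image"].duplicated(keep=False)]

print(lookup_is_unique)
print(dupes)
```

If the check reports duplicates, the lookup has to be made unique before it is passed to map, which is what the answers below do.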

Answer 1

Another solution, using dict(zip()):

df1['newids']=df1.image.map(dict(zip(df2.image,df2.id)))
print(df1)

   Srlno  id       image  newids
0      1   3  image1.jpg       1
1      2   3  image2.jpg       2
2      3   3  image2.jpg       2
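For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the same dict(zip()) mapping. The variable name image_to_id is only illustrative.

```python
import pandas as pd

# The two frames as given in the question.
df1 = pd.DataFrame({
    "Srlno": [1, 2, 3],
    "id": [3, 3, 3],
    "image": ["image1.jpg", "image2.jpg", "image2.jpg"],
})
df2 = pd.DataFrame({
    "Srlno": [1, 2, 3],
    "id": [1, 2, 3],
    "image": ["image1.jpg", "image2.jpg", "image3.jpg"],
})

# Plain dict from image name to id; a dict cannot hold duplicate keys,
# so the lookup is always unique.
image_to_id = dict(zip(df2["image"], df2["id"]))

# Repeated image names in df1 each receive the matching id.
df1["newids"] = df1["image"].map(image_to_id)
print(df1)
```

Any image in df1 that has no entry in the dict would get NaN in newids.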

Answer 2

Use drop_duplicates so that map only sees unique image values:

# default: keep the first duplicate
s = df2.drop_duplicates('image').set_index('image')['id']
df1['newids'] = df1['image'].map(s)

# keep the last duplicate
s = df2.drop_duplicates('image', keep='last').set_index('image')['id']
df1['newids'] = df1['image'].map(s)

# dict also keeps the last duplicate (later keys overwrite earlier ones)
d = dict(zip(df2['image'], df2['id']))
df1['newids'] = df1['image'].map(d)
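The three variants only differ when df2 actually contains a repeated image name. The sketch below compares them on a hypothetical df2 where image2.jpg appears with two different ids. That data is an assumption for illustration and is not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"image": ["image1.jpg", "image2.jpg", "image2.jpg"]})

# Hypothetical df2: 'image2.jpg' is listed twice, with ids 2 and 4.
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "image": ["image1.jpg", "image2.jpg", "image3.jpg", "image2.jpg"],
})

# Lookup that keeps the first occurrence of each image name.
first = df2.drop_duplicates("image").set_index("image")["id"]
# Lookup that keeps the last occurrence of each image name.
last = df2.drop_duplicates("image", keep="last").set_index("image")["id"]
# Dict lookup: a later duplicate key overwrites the earlier one.
as_dict = dict(zip(df2["image"], df2["id"]))

ids_first = df1["image"].map(first).tolist()
ids_last = df1["image"].map(last).tolist()
ids_dict = df1["image"].map(as_dict).tolist()

print(ids_first, ids_last, ids_dict)
```

Which variant is right depends on which of the duplicated rows in df2 should win, so it is worth looking at the duplicates before choosing.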


Answer 1
5 2019-02-21 07:14:13

Answer 2
3 accepted 2019-02-21 07:10:50
