Pandas: new column based on a string found in a DataFrame

Question

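I am trying to match the ID values from one DataFrame against a string column in another DataFrame in order to create a new ID field.

I have two DataFrames. The first has only a text ID column: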

DF1

ID
elf
orc
panda

The second has different IDs, but its text column contains the ID values from the first DataFrame (DF1):

DF2

AltID Text
1     The orc killed the dwarf
2     The elf lives in the woods
3     The panda eats bamboo
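For anyone who wants to reproduce the problem, the two frames above could be built as follows. This is only a minimal sketch; the variable names df1 and df2 are assumed to match the ones used in the answers.

```python
import pandas as pd

# DF1: the text IDs to search for
df1 = pd.DataFrame({"ID": ["elf", "orc", "panda"]})

# DF2: its own numeric IDs plus a free-text column that mentions a DF1 ID
df2 = pd.DataFrame({
    "AltID": [1, 2, 3],
    "Text": [
        "The orc killed the dwarf",
        "The elf lives in the woods",
        "The panda eats bamboo",
    ],
})
```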

I want to create a "NewID" column in the second DataFrame (DF2) that holds whichever ID is found in the text, so that it looks like this:

NewID
orc
elf
panda

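Should I use a lambda function or np.where()?

Thanks in advance.

Edit:

What if an exact match is required? For example, I have this row of text but do not want it to match 'orc':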

AltID  Text
4      The orchestra played too long

and would like NewID to come out as "None", N/A, or something of that nature?

Answer 1

Use str.extract directly, with a pattern that alternates over all the IDs:

df2['New ID'] = df2.Text.str.extract('({})'.format('|'.join(df1.ID)), expand=False)

df2

   AltID                        Text New ID
0      1    The orc killed the dwarf    orc
1      2  The elf lives in the woods    elf
2      3       The panda eats bamboo  panda
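For the exact-match case raised in the question's edit, one possible variation is to wrap the alternation in word boundaries (\b) so that 'orc' no longer matches inside 'orchestra'. The sketch below is an assumption about what "exact match" means here (whole words, case-sensitive), and it also escapes each ID with re.escape in case an ID contains regex metacharacters. Rows with no match come out as NaN.

```python
import re
import pandas as pd

df1 = pd.DataFrame({"ID": ["elf", "orc", "panda"]})
df2 = pd.DataFrame({
    "AltID": [1, 2, 3, 4],
    "Text": [
        "The orc killed the dwarf",
        "The elf lives in the woods",
        "The panda eats bamboo",
        "The orchestra played too long",
    ],
})

# Escape each ID, join them into one alternation, and require a word
# boundary on both sides so only whole words can match.
pattern = r"\b({})\b".format("|".join(map(re.escape, df1["ID"])))

# str.extract returns the first capture group; rows without a match get NaN.
df2["New ID"] = df2["Text"].str.extract(pattern, expand=False)
```

If a placeholder string is preferred over NaN, chaining .fillna("None") onto the result should do it.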

Answer 2

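A small trick: replace each text with the index of the ID it contains, then map that index back to the ID.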

df2.Text.replace(dict(zip(df1.ID,df1.index)),regex=True).map(df1.ID)
Out[1004]: 
0      orc
1      elf
2    panda
Name: Text, dtype: object
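As a point of comparison with the two regex-based answers, the apply/lambda-style approach the question asks about might look roughly like this. It is a sketch, not a recommendation: the helper name first_id is hypothetical, matching is assumed to be case-sensitive and on whole words, and apply is generally slower than the vectorized string methods on large frames.

```python
import re
import pandas as pd

df1 = pd.DataFrame({"ID": ["elf", "orc", "panda"]})
df2 = pd.DataFrame({
    "AltID": [1, 2, 3, 4],
    "Text": [
        "The orc killed the dwarf",
        "The elf lives in the woods",
        "The panda eats bamboo",
        "The orchestra played too long",
    ],
})

# A set gives constant-time membership checks for each word.
ids = set(df1["ID"])

def first_id(text):
    """Return the first whole word in `text` that is a known ID, else None."""
    for word in re.findall(r"\w+", text):
        if word in ids:
            return word
    return None

# Apply the helper row by row; rows with no known ID get a missing value.
df2["New ID"] = df2["Text"].apply(first_id)
```

Because the text is split into words before the lookup, 'orchestra' is never confused with 'orc'.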

2 answers

Answer 1: 2, accepted, 2018-02-20 19:34:14

Answer 2: 2, 2018-02-20 19:40:49
