Pandas: new column based on a string found in a DataFrame
I'm trying to match an ID value in one DataFrame against a string column in another DataFrame to create a new ID field.
I have two DataFrames, one with only a text ID column:
DF1
ID
elf
orc
panda
And another DataFrame with a different ID, plus a text column that contains an ID value from the first DataFrame (DF1):
DF2
AltID Text
1 The orc killed the dwarf
2 The elf lives in the woods
3 The panda eats bamboo
I want to create a NewID column in the second DataFrame (DF2) that would look like this when the text is found:
NewID
orc
elf
panda
Should I use a lambda function or np.where()?
Thanks in advance.
EDIT:
What if it needs to be an exact match? For instance, I have this row of text but don't want it to match 'orc':
AltID Text
4 The orchestra played too long
and I want NewID to be 'None', N/A, or something of that nature for that row.
This is straightforward using str.extract:
df2['New ID'] = df2.Text.str.extract('({})'.format('|'.join(df1.ID)), expand=False)
df2
AltID Text New ID
0 1 The orc killed the dwarf orc
1 2 The elf lives in the woods elf
2 3 The panda eats bamboo panda
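For the exact-match case raised in the question's edit, one possible variation is to wrap the alternation in `\b` word boundaries, so that 'orc' no longer matches inside 'orchestra'. This is a minimal sketch rather than part of the original answer: the frames are rebuilt here from the question's sample data, and the `re.escape` call is an added precaution (an assumption on my part) in case an ID contains regex metacharacters.

```python
import re
import pandas as pd

# Sample data rebuilt from the question, including the 'orchestra' row.
df1 = pd.DataFrame({"ID": ["elf", "orc", "panda"]})
df2 = pd.DataFrame({
    "AltID": [1, 2, 3, 4],
    "Text": [
        "The orc killed the dwarf",
        "The elf lives in the woods",
        "The panda eats bamboo",
        "The orchestra played too long",
    ],
})

# Escape each ID so it is matched literally, then require a word
# boundary on both sides of the captured alternative.
pattern = r"\b({})\b".format("|".join(map(re.escape, df1["ID"])))

# Rows without a whole-word match get a missing value (NaN).
df2["New ID"] = df2["Text"].str.extract(pattern, expand=False)
print(df2)
```

Rows with no whole-word match come back as missing values, which can be filled afterwards with `fillna` if a literal placeholder such as 'None' is preferred.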
A small trick: replace each text that contains an ID with that ID's index in df1, then map the index back to the ID.
df2.Text.replace(dict(zip(df1.ID,df1.index)),regex=True).map(df1.ID)
Out[1004]:
0 orc
1 elf
2 panda
Name: Text, dtype: object
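The question also asks about a lambda-style approach. A hedged sketch of that route is a plain function passed to `apply`; the helper name `find_id` is hypothetical, the data is rebuilt from the question, and the word-boundary pattern is an assumption added to cover the exact-match case. It is likely slower than the vectorized string methods on large frames, but the logic is easy to follow.

```python
import re
import pandas as pd

# Sample data rebuilt from the question, including the 'orchestra' row.
df1 = pd.DataFrame({"ID": ["elf", "orc", "panda"]})
df2 = pd.DataFrame({
    "AltID": [1, 2, 3, 4],
    "Text": [
        "The orc killed the dwarf",
        "The elf lives in the woods",
        "The panda eats bamboo",
        "The orchestra played too long",
    ],
})

# Compile one whole-word pattern covering every ID in df1.
id_pattern = re.compile(
    r"\b(?:{})\b".format("|".join(map(re.escape, df1["ID"])))
)

def find_id(text):
    """Return the first ID found as a whole word in text, else None."""
    match = id_pattern.search(text)
    return match.group(0) if match else None

# Apply the helper row by row to build the new column.
new_id = df2["Text"].apply(find_id)
print(new_id)
```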