[英]Python Pandas: check if Series contains a string from list
I am trying to identify if a column Blaze[Info]
contains within the text a string from a list (and create a new Boolean column with that information).我试图确定Blaze[Info]
列是否在文本中包含来自列表的字符串(并使用该信息创建一个新的 Boolean 列)。
The DataFrame looks like: DataFrame 看起来像:
Word Info
0 Aam Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...
1 aard-vark Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)
2 aard-wolf Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)
When I state the term directly I get the answer I want:当我直接使用 state 这个词时,我得到了我想要的答案:
Blaze['Noun'] = np.where((Blaze['Info'].str.contains('n.')),True,False)
Blaze['Verb'] = np.where((Blaze['Info'].str.contains('v.')),True,False)
Blaze['Noun'] = np.where((Blaze['Info'].str.contains('n.')),True,False)
Blaze['Verb'] = np.where((Blaze['Info'].str.contains('v.')),True,False)
Word Info Noun Verb
0 Aam Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham... True False
1 aard-vark Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.) True False
2 aard-wolf Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.) True False
but this is not scalable as I have 100+ features to search for.但这不可扩展,因为我有 100 多个要搜索的功能。
When I iterate through the list abbreviation
:当我遍历列表abbreviation
时:
abbreviation=['n'., 'v.']
col_name=['Noun','Verb']
for i in range(len(abbreviation)):
Blaze[col_name[i]] = np.where((Blaze['Info'].str.contains(abbreviation[i])), True, False)
I am returned DataFrame full of 'FALSE' entries:我被退回 DataFrame 充满了'FALSE'条目:
Word Info Noun Verb
0 Aam Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham... False False
1 aard-vark Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.) False False
2 aard-wolf Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.) False False
I can see several answers for doing something similar but grouping the answer in a single row: Check if each row in a pandas series contains a string from a list using apply?我可以看到几个做类似事情的答案,但将答案分组在一行中: 检查 pandas 系列中的每一行是否包含使用应用的列表中的字符串?
Scalable solution for str.contains with list of strings in pandas str.contains 的可扩展解决方案,包含 pandas 中的字符串列表
but I don't think these solve the above.但我认为这些不能解决上述问题。
Is anyone able to explain how I am going wrong?有人能解释我怎么错了吗?
You can loop through the lists simultaneously with zip
.您可以使用zip
同时遍历列表。 Make sure to pass regex=False
to str.contains
as .
确保将regex=False
传递给str.contains
as .
is a regex character.是一个正则表达式字符。
abbreviation=['n.', 'v.']
col_name=['Noun','Verb']
for a, col in zip(abbreviation, col_name):
Blaze[col] = np.where(Blaze['Info'].str.contains(a, regex=False),True,False)
Blaze
Out[1]:
Word Info Noun Verb
0 Aam Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham... True False
1 aard-vark Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.) True False
2 aard-wolf Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.) True False
If required, str.contains
also has a case
parameter, so you can specify case=False
to search case-insensitively.如果需要, str.contains
也有一个case
参数,因此您可以指定case=False
以不区分大小写搜索。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.