Python Pandas: Check whether a Series contains strings from a list

Question

I am trying to identify whether the text in column Blaze['Info'] contains a string from a list (and to create a new Boolean column with that information).

The DataFrame looks like:

       Word          Info
0      Aam           Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...
1      aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)
2      aard-wolf     Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)

When I specify the term directly, I get the answer I want:

Blaze['Noun'] = np.where((Blaze['Info'].str.contains('n.')),True,False)
Blaze['Verb'] = np.where((Blaze['Info'].str.contains('v.')),True,False)

       Word          Info                                                Noun   Verb
0      Aam           Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...   True   False
1      aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)      True   False
2      aard-wolf     Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)       True   False

But this is not scalable, as I have 100+ features to search for.

When I iterate through the list abbreviation:

abbreviation=['n.', 'v.']
col_name=['Noun','Verb']

for i in range(len(abbreviation)):
    Blaze[col_name[i]] = np.where((Blaze['Info'].str.contains(abbreviation[i])), True, False)

I get back a DataFrame full of False entries:

       Word          Info                                                Noun   Verb
0      Aam           Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...   False  False
1      aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)      False  False
2      aard-wolf     Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)       False  False

I can see several answers that do something similar but group the result in a single row:

Check if each row in a pandas series contains a string from a list using apply?

Scalable solution for str.contains with list of strings in pandas

but I don't think these solve the above.

Is anyone able to explain where I am going wrong?

Answer 1

You can loop through the lists simultaneously with zip. Make sure to pass regex=False to str.contains, as . is a regex character.

abbreviation=['n.', 'v.']
col_name=['Noun','Verb']
for a, col in zip(abbreviation, col_name):
    Blaze[col] = np.where(Blaze['Info'].str.contains(a, regex=False),True,False)
Blaze
Out[1]: 
        Word                                               Info  Noun   Verb
0        Aam  Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...  True  False
1  aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)  True  False
2  aard-wolf      Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)  True  False
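To illustrate why regex=False matters here, the following is a minimal sketch on a small made-up Series (the sample strings are assumptions, not the asker's data). With the default regex=True, the pattern "n." means "n followed by any character", so it also matches text such as "no". Escaping the pattern with re.escape is an alternative that keeps regex mode on.

```python
import re

import pandas as pd

# Hypothetical sample data, chosen to show the difference
info = pd.Series([
    "Aam, n. Etym: [D. aam]",
    "Nab, v. t. to seize",
    "no abbreviations here",
])

# Default regex=True: "." matches any character, so "n." also matches "no"
as_regex = info.str.contains("n.")

# regex=False treats "n." as literal text
as_literal = info.str.contains("n.", regex=False)

# Alternative: stay in regex mode but escape the special characters
as_escaped = info.str.contains(re.escape("n."))

print(pd.DataFrame({"regex": as_regex, "literal": as_literal, "escaped": as_escaped}))
```

The third row is flagged only in the regex column, which is the kind of false positive that literal matching avoids.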

If required, str.contains also has a case parameter, so you can specify case=False to search case-insensitively.
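As a rough sketch of how this could scale to many features, the column names and abbreviations can be kept in a dict and the Boolean columns built in one pass. The features mapping and the sample rows below are hypothetical. Since str.contains already returns Booleans when there are no missing values, np.where is left out here.

```python
import pandas as pd

# Hypothetical sample data; the third row uses an upper-case abbreviation
Blaze = pd.DataFrame({
    "Word": ["Aam", "aard-vark", "nab"],
    "Info": [
        "Aam, n. Etym: [D. aam]",
        'Aard"-vark`, n. Etym: [D., earth-pig.]',
        "Nab, V. T. to seize",
    ],
})

# Hypothetical mapping: new column name -> abbreviation to look for
features = {"Noun": "n.", "Verb": "v."}

# One Boolean column per feature: literal match, ignoring case
flags = pd.DataFrame({
    col: Blaze["Info"].str.contains(abbr, regex=False, case=False)
    for col, abbr in features.items()
})

Blaze = Blaze.join(flags)
print(Blaze[["Word", "Noun", "Verb"]])
```

With case=False the upper-case "V." in the third row is still found; with the default case=True it would not be.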

Answer 1
1 accepted 2020-12-10 23:01:31
