Using str.contains() to filter a df with a list when some list items contain 2 or more words

Question

I am filtering a column using a list, and I have been using:

str.contains("".format("|".join(towns)))

This works on towns like "Atlanta", but not "New York", as it is searching for "New" and "York" separately. Is there a way around this?

Reproducible example (they all return True):

array = ["New Jersey", "Atlanta", "New York", "Washington"]
df = pd.DataFrame({"col1": array})

towns = ["Atlanta", "New York"]

df["col1"].str.contains("".format("|".join(towns)))

Answer 1

For your example data, Series.isin works.

>>> df["col1"].isin(towns)
0    False
1     True
2     True
3    False
Name: col1, dtype: bool

If the Series is a bit different and you need to use a regular expression:

>>> dg = pd.DataFrame({"col1": ["New Jersey","Atlanta","New York",
                                "Washington", "The New York Times"]})
>>> dg
                 col1
0          New Jersey
1             Atlanta
2            New York
3          Washington
4  The New York Times
>>>
>>> rex = "|".join(towns)
>>> dg['col1'].str.contains(rex)
0    False
1     True
2     True
3    False
4     True
Name: col1, dtype: bool

>>> df
         col1
0  New Jersey
1     Atlanta
2    New York
3  Washington

Answer 2

Try this:

import pandas as pd 
array = ["New Jersey", "Atlanta", "New York", "Washington","New York City"]
df = pd.DataFrame({"col1": array})

towns = ["Atlanta", "New York"]

df["Town Check"] = df['col1'].apply(lambda x: len([i for i in towns if i in x]))
df1 = df[df["Town Check"] > 0]
del df1["Town Check"]
df1.index = range(0,df1.shape[0])

Output of df1:

            col1
0        Atlanta
1       New York
2  New York City
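
The same filter can also be written without apply or a helper column. This is a rough sketch rather than a replacement for the answer above: it assumes literal substring matching is what is wanted (as `i in x` does), and re.escape and reset_index(drop=True) are choices made here, not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({"col1": ["New Jersey", "Atlanta", "New York",
                            "Washington", "New York City"]})
towns = ["Atlanta", "New York"]

# One alternation of escaped names, so each is matched as a literal substring.
pattern = "|".join(map(re.escape, towns))

# Boolean mask, filter, then renumber the index from 0.
df1 = df[df["col1"].str.contains(pattern)].reset_index(drop=True)
print(df1)
```

This keeps the original df unchanged and avoids creating and deleting the "Town Check" column.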

Answer 1
1 (accepted) 2022-10-03 16:46:55

Answer 2
0 2022-10-03 15:28:57
