str.contains() when filtering a df with a list, but some of the list items contain 2 or more words
I am filtering a column using a list, and I have been using:
str.contains("".format("|".join(towns)))
This works on towns like "Atlanta", but not "New York", as it is searching for "New" and "York" separately. Is there a way around this?
Reproducible example (they all return True):
import pandas as pd
array = ["New Jersey", "Atlanta", "New York", "Washington"]
df = pd.DataFrame({"col1": array})
towns = ["Atlanta", "New York"]
df["col1"].str.contains("".format("|".join(towns)))
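A likely reason every row returns True, assuming the expression is exactly as posted: `"".format(...)` has no `{}` placeholder, so it evaluates to an empty string, and an empty pattern matches every value. A minimal sketch of that behaviour (the names `empty_pattern`, `joined`, `all_true` and `mask` are illustrative, not from the original post):

```python
import pandas as pd

towns = ["Atlanta", "New York"]
df = pd.DataFrame({"col1": ["New Jersey", "Atlanta", "New York", "Washington"]})

# "" contains no {} placeholder, so format() discards its argument
empty_pattern = "".format("|".join(towns))

# An empty regular expression matches any string, so every row is True
all_true = df["col1"].str.contains(empty_pattern)

# With a placeholder (or simply "|".join(towns)) the pattern is "Atlanta|New York"
joined = "{}".format("|".join(towns))
mask = df["col1"].str.contains(joined)
```

Under this reading, the space in "New York" is not the problem: the joined pattern treats "New York" as one alternative, so "New Jersey" is not matched.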
For your example data, Series.isin works.
>>> df["col1"].isin(towns)
0    False
1     True
2     True
3    False
Name: col1, dtype: bool
If the Series is a bit different and you need to use a regular expression:
>>> dg = pd.DataFrame({"col1": ["New Jersey", "Atlanta", "New York",
...                             "Washington", "The New York Times"]})
>>> dg
                 col1
0          New Jersey
1             Atlanta
2            New York
3          Washington
4  The New York Times
>>>
>>> rex = "|".join(towns)
>>> dg['col1'].str.contains(rex)
0    False
1     True
2     True
3    False
4     True
Name: col1, dtype: bool
>>> df
         col1
0  New Jersey
1     Atlanta
2    New York
3  Washington
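If the town names could ever contain regular-expression metacharacters (for example "St. Louis"), the joined pattern above may match more than intended. A minimal sketch, assuming you want the names treated literally, is to escape each one with `re.escape` before joining; anchoring the pattern then gives a whole-value match similar to `isin`. The names `pattern`, `substring_mask` and `exact_mask` are illustrative only:

```python
import re
import pandas as pd

towns = ["Atlanta", "New York"]
dg = pd.DataFrame({"col1": ["New Jersey", "Atlanta", "New York",
                            "Washington", "The New York Times"]})

# Escape each town so characters such as "." or "(" are matched literally
pattern = "|".join(re.escape(t) for t in towns)

# Substring match: also flags "The New York Times"
substring_mask = dg["col1"].str.contains(pattern)

# Whole-value match: anchor the alternation at both ends
exact_mask = dg["col1"].str.contains(rf"^(?:{pattern})$")
```

Which mask is appropriate depends on whether rows like "The New York Times" should count as a hit.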
Try this:
import pandas as pd
array = ["New Jersey", "Atlanta", "New York", "Washington","New York City"]
df = pd.DataFrame({"col1": array})
towns = ["Atlanta", "New York"]
df["Town Check"] = df['col1'].apply(lambda x: len([i for i in towns if i in x]))
df1 = df[df["Town Check"] > 0]
del df1["Town Check"]
df1.index = range(0,df1.shape[0])
Output of df1:
            col1
0        Atlanta
1       New York
2  New York City
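The same plain substring check can be sketched without the temporary "Town Check" column, by building a boolean mask directly and resetting the index in one step. This is only an alternative arrangement of the approach above, not a different technique; `mask` is an illustrative name:

```python
import pandas as pd

array = ["New Jersey", "Atlanta", "New York", "Washington", "New York City"]
df = pd.DataFrame({"col1": array})
towns = ["Atlanta", "New York"]

# True where at least one town occurs as a substring of the value
mask = df["col1"].apply(lambda x: any(t in x for t in towns))

# Keep the matching rows and renumber the index from 0
df1 = df[mask].reset_index(drop=True)
```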