Check if a String in a List of Strings is in a Pandas DataFrame Column

Question

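I have a question about matching strings from a list against a column in a df.

I read the question "Check if String in List of Strings is in Pandas DataFrame Column" and understand it, but my requirement is slightly different.

Code: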

import numpy as np
import pandas as pd

Cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', np.nan],
        'Price': [22000, 25000, 27000, 35000, 29000],
        'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}

df = pd.DataFrame(Cars, columns=['Brand', 'Price', 'Liscence Plate'])

search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2019']
# Join the search values into one regex alternation: 'Honda|Toy|Ford Focus|Audi A4 2019'
pattern = '|'.join(search_for_these_values)

# Substring match against the pattern; NaN rows are treated as False
df['Match'] = df["Brand"].str.contains(pattern, na=False)
print(df)

Output I get:

            Brand  Price Liscence Plate  Match
0  Honda Civic     22000  ABC 123        True 
1  Toyota Corolla  25000  XYZ 789        True 
2  Ford Focus      27000  CBA 321        True 
3  Audi A4         35000  ZYX 987        False
4  NaN             29000  DEF 456        False

Output I want:

            Brand  Price Liscence Plate  Match
0  Honda Civic     22000  ABC 123        True 
1  Toyota Corolla  25000  XYZ 789        False
2  Ford Focus      27000  CBA 321        True 
3  Audi A4         35000  ZYX 987        True
4  NaN             29000  DEF 456        False

Answer 1

One approach, using whole-word matching:

pat = "|".join(search_for_these_values).replace(" ", "|")
match = df["Brand"].str.findall(r"\b(%s)\b" % pat)

Output:

0          [Honda]
1               []
2    [Ford, Focus]
3       [Audi, A4]
4              NaN
Name: Brand, dtype: object

You can then assign it back:

df["match"] = match.str.len().ge(1)

Final output:

            Brand  Price Liscence Plate  match
0     Honda Civic  22000        ABC 123   True
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987   True
4             NaN  29000        DEF 456  False
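
As a self-contained sketch of the same word-boundary idea, the variant below goes straight to a boolean column with str.contains and escapes each word with re.escape, on the assumption that search values might contain regex metacharacters. The names words and whole_word_match are illustrative and not part of the answer above. Note that, like the answer, it would also match a Brand consisting only of "2019".

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4", np.nan]})
search_for_these_values = ["Honda", "Toy", "Ford Focus", "Audi A4 2019"]

# Split every search phrase into words and escape each one for safe use in a regex.
words = sorted({re.escape(w) for phrase in search_for_these_values for w in phrase.split()})

# Non-capturing group wrapped in word boundaries, so "Toy" does not match "Toyota".
pat = r"\b(?:%s)\b" % "|".join(words)

# na=False turns the NaN row into False instead of propagating NaN.
whole_word_match = df["Brand"].str.contains(pat, na=False)
print(whole_word_match.tolist())  # [True, False, True, True, False]
```

Using a non-capturing group (?:...) avoids the warning pandas raises when str.contains is given a pattern with capture groups.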

Answer 2

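If we apply the rule you outlined, "if one word matches, the row is True", then any row in the Brand column that contains "2019" would also return True, which I don't believe is what we want.

With that said, you can use a list comprehension to build a new list from your search_for_these_values, split into individual words with split() and excluding the year, and then use isin together with any: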

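# list comprehension: split each search value into words, dropping 4-digit years
import re
s = [word for cars in search_for_these_values for word in cars.split() if not re.search(r'\d{4}', word)]

# Assign True / False: split Brand into one word per column, then check each word against s
df['Match'] = df['Brand'].str.split(expand=True).isin(s).any(axis=1)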

This prints:

            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123   True
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987   True
4             NaN  29000        DEF 456  False
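
To illustrate the concern about the year, here is a minimal sketch that swaps isin for a plain set-membership check applied with map. It is a variation on the approach above, not the answer's exact code. The "BMW 2019" row and the names allowed, has_allowed_word and result are hypothetical, added only to show that a row sharing nothing but the year does not match.

```python
import re

import numpy as np
import pandas as pd

# "BMW 2019" is a hypothetical extra row, not part of the original data.
df = pd.DataFrame(
    {"Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4", "BMW 2019", np.nan]}
)
search_for_these_values = ["Honda", "Toy", "Ford Focus", "Audi A4 2019"]

# Keep every word from the search phrases except 4-digit years.
allowed = {
    word
    for phrase in search_for_these_values
    for word in phrase.split()
    if not re.fullmatch(r"\d{4}", word)
}


def has_allowed_word(brand):
    # Missing values (NaN) are not strings, so they never match.
    return isinstance(brand, str) and not allowed.isdisjoint(brand.split())


result = df["Brand"].map(has_allowed_word)
print(result.tolist())  # [True, False, True, True, False, False]
```

Because the comparison is exact per word, "Toy" does not match "Toyota", and the year is ignored on the search side.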
