Check if a String in a List of Strings is in a Pandas DataFrame

Question

I have a question about matching the strings in a list against a column in a df.

I read the question Check if String in List of Strings is in Pandas DataFrame Column and understood it, but my requirement is slightly different.

Code:

import numpy as np
import pandas as pd

Cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', np.nan],
        'Price': [22000, 25000, 27000, 35000, 29000],
        'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}

df = pd.DataFrame(Cars, columns=['Brand', 'Price', 'Liscence Plate'])

search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2019']
# "Honda|Toy|Ford Focus|Audi A4 2019": each alternative is matched as a substring
pattern = '|'.join(search_for_these_values)

# na=False makes the NaN brand produce False instead of NaN
df['Match'] = df["Brand"].str.contains(pattern, na=False)
print(df)

Output I get:

            Brand  Price Liscence Plate  Match
0  Honda Civic     22000  ABC 123        True 
1  Toyota Corolla  25000  XYZ 789        True 
2  Ford Focus      27000  CBA 321        True 
3  Audi A4         35000  ZYX 987        False
4  NaN             29000  DEF 456        False

Output I want:

            Brand  Price Liscence Plate  Match
0  Honda Civic     22000  ABC 123        True 
1  Toyota Corolla  25000  XYZ 789        False
2  Ford Focus      27000  CBA 321        True 
3  Audi A4         35000  ZYX 987        True
4  NaN             29000  DEF 456        False

Answer 1

One approach using whole-word matching:

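# Split every phrase into single words: "Honda|Toy|Ford|Focus|Audi|A4|2019"
pat = "|".join(search_for_these_values).replace(" ", "|")
# \b restricts each alternative to whole words, so "Toy" no longer matches "Toyota"
match = df["Brand"].str.findall(r"\b(%s)\b" % pat)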

Output:

0          [Honda]
1               []
2    [Ford, Focus]
3       [Audi, A4]
4              NaN
Name: Brand, dtype: object

You can then assign it back:

# True when at least one word matched; the NaN row has NaN length, which compares as False
df["match"] = match.str.len().ge(1)

Final output:

            Brand  Price Liscence Plate  match
0     Honda Civic  22000        ABC 123   True
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987   True
4             NaN  29000        DEF 456  False
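A possible variant of the same whole-word idea, sketched under the assumption that you only need the True/False column: put the words in a non-capturing group and pass the pattern straight to str.contains, which skips the findall and length steps. Escaping each word with re.escape is an addition that is not part of the answer above; it only matters if a search term could contain regex metacharacters such as "." or "+".

```python
import re

import numpy as np
import pandas as pd

Cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', np.nan],
        'Price': [22000, 25000, 27000, 35000, 29000],
        'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}
df = pd.DataFrame(Cars)

search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2019']

# Split every search phrase into words and escape each one so that any
# regex metacharacters in a term are matched literally.
words = [re.escape(w) for phrase in search_for_these_values for w in phrase.split()]

# (?:...) is a non-capturing group, which avoids pandas' "match groups" warning;
# \b anchors every alternative to whole-word boundaries.
pattern = r"\b(?:%s)\b" % "|".join(words)

# na=False turns the NaN brand into False
df["match"] = df["Brand"].str.contains(pattern, na=False)
print(df)
```

Like the findall version, this treats "2019" as an ordinary search word, so a brand containing "2019" would also be flagged.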

Answer 2

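If we apply the rule you outlined, “True if any one word is true”, then a row in the Brand column containing “2019” would also return True, which I don't think we want.

With that in mind, you can use a list comprehension to build a new list from search_for_these_values in which each phrase is split() into words and four-digit years are dropped, and then combine isin with any: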

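import re

# list comprehension: individual words of the search phrases, skipping four-digit years
s = [word for cars in search_for_these_values for word in cars.split() if not re.search(r'\d{4}', word)]

# Assign True / False: a row is True if any of its words appears in s
df['Match'] = df['Brand'].str.split(expand=True).isin(s).any(axis=1)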

Printing df gives:

            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123   True
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987   True
4             NaN  29000        DEF 456  False
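If you would rather not build the temporary wide DataFrame that str.split(expand=True) creates (one column per word), the same word-overlap check could be done row by row with a set. This is a minimal sketch that assumes words are whitespace-separated, exactly as split() does in the answer; the helper has_common_word is a hypothetical name introduced only for illustration.

```python
import re

import numpy as np
import pandas as pd

Cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', np.nan],
        'Price': [22000, 25000, 27000, 35000, 29000],
        'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}
df = pd.DataFrame(Cars)

search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2019']

# Same filtering as the list comprehension above, but collected into a set
s = {word for cars in search_for_these_values for word in cars.split()
     if not re.search(r'\d{4}', word)}


def has_common_word(brand) -> bool:
    """Hypothetical helper: True if brand is a string sharing a whole word with s."""
    if not isinstance(brand, str):  # NaN (a float) and other non-strings give False
        return False
    return not s.isdisjoint(brand.split())


df['Match'] = df['Brand'].map(has_common_word)
print(df)
```

The set lookup keeps memory proportional to one row at a time, at the cost of a Python-level loop over the rows.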


2 solutions

Solution 1
1 2021-12-06 08:33:52

Solution 2
0 2021-12-06 09:51:27
