Python: select DataFrame rows that contain any string from a search list

Question

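I have a list of strings that I want to search for in each row, dropping the rows that do not match.

The code below runs without errors.

However, the result only reflects the search for the last element in the list.

I am trying to capture the matches for every search string against the list "l".

Please see the input and expected output below.

Code: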

import pandas as pd

l = ['Testing','Goals are met','Mathematics subject','tesTed prototype','Some Test']
df = pd.DataFrame(l)
df.columns = ['l']

Input data:

    l
0   Testing
1   Goals are met
2   Mathematics subject
3   tesTed prototype
4   Some Test

Code to capture the rows that contain the strings:

select_list = ["Math",'Test']

for s in select_list:
    # keeping into a dataframe
    df1 = df[df.l.str.contains(s,case=False)]

df1

Expected output (note that the code above does not pick up the rows matching 'Math'):

    l
0   Testing
2   Mathematics subject
3   tesTed prototype
4   Some Test

Answer 1

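The reason is that you reassign df1 on every iteration of the for loop, so only the result for the last search string survives.

Instead of doing that, you should use a regular expression: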

filtered_df = df[df['l'].str.contains('|'.join(select_list), case=False)]

Output:

                     l
0              Testing
2  Mathematics subject
3     tesTed prototype
4            Some Test

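The .join call above produces the string 'Math|Test', which, when passed to .str.contains, tells it to find all rows that contain at least one of 'Math' and 'Test'. If you add more strings to select_list, it will look for those as well.

Note that this approach may need to be modified in some cases, for example if the strings in select_list contain characters that have a special meaning in a regular expression (such as '.').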
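One way to handle the special-character case is to escape each search string before joining. The following is a minimal sketch, not part of the original answer: the extra row 'version 1x0' and the search string '1.0' are made up for illustration, and re.escape from the standard library is used so that '.' is matched literally.

```python
import re
import pandas as pd

# Sample data from the question, plus one hypothetical row ('version 1x0')
# that an unescaped '1.0' pattern would match by accident.
df = pd.DataFrame({'l': ['Testing', 'Goals are met', 'Mathematics subject',
                         'tesTed prototype', 'Some Test', 'version 1x0']})

# '1.0' is a hypothetical search string containing the metacharacter '.'
select_list = ['Math', 'Test', '1.0']

# Escape every search string so it is treated as literal text,
# then join the escaped strings with '|' (regex alternation).
pattern = '|'.join(re.escape(s) for s in select_list)

filtered_df = df[df['l'].str.contains(pattern, case=False)]
print(filtered_df)
```

Without the escaping step, '1.0' would be read as "1, any character, 0" and the row 'version 1x0' would be selected as well.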

Answer 2

Please try this:

select_list = ["Math",'Test']
df1 =  pd.DataFrame([], columns = ['l'])
for s in select_list:
    df1 = pd.merge(df1, df[df.l.str.contains(s,case=False)], how='outer')

Alternative: instead of using a dataframe inside the loop, you can also collect the results in a list and build the dataframe from it:

l2 = []
for s in select_list:
    l2.extend(df[df.l.str.contains(s,case=False)].values.tolist())

df3 = pd.DataFrame(l2)
df3.columns = ['l']
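One thing to watch with the list version is that a row matching more than one search string is appended once per match, and the original index is lost. A possible variation, sketched here under the assumption that you want each matching row exactly once with its original index, accumulates a boolean mask in the loop instead. The row 'Math Test paper' is hypothetical and only added to show the duplicate case; regex=False makes each search string a plain substring.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row that matches
# both search strings.
df = pd.DataFrame({'l': ['Testing', 'Goals are met', 'Mathematics subject',
                         'tesTed prototype', 'Some Test', 'Math Test paper']})
select_list = ['Math', 'Test']

# Start with "no row selected" and OR in the matches for each search string.
mask = pd.Series(False, index=df.index)
for s in select_list:
    # regex=False: treat s as literal text; case=False: ignore case
    mask |= df['l'].str.contains(s, case=False, regex=False)

# Each matching row appears once, in its original order and with its index.
df4 = df[mask]
print(df4)
```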
