Remove a multi-word substring from a string if the substring is in a list in a dataframe column

Question

This is a follow-up to my earlier question: Remove substring from string if substring is in list in dataframe column

I have the following dataframe df1

       string             lists
0      I HAVE A PET DOG   ['fox', 'pet dog', 'cat']
1      there is a cat     ['dog', 'house', 'car']
2      hello EVERYONE     ['hi', 'hello', 'everyone']
3      hi my name is Joe  ['name', 'was', 'is Joe']

I am trying to return a dataframe df2 that looks like this

       string             lists                         new_string
0      I HAVE A PET DOG   ['fox', 'pet dog', 'cat']     I HAVE A
1      there is a cat     ['dog', 'house', 'car']       there is a cat
2      hello EVERYONE     ['hi', 'hello', 'everyone']   
3      hi my name is Joe  ['name', 'was', 'is Joe']     hi my

The solution I am using does not work when the substring is made of multiple words, such as pet dog or is Joe

df['new_string'] = df['string'].apply(lambda x: ' '.join([word for word in x.split() if word.lower() not in df['lists'][df['string'] == x].values[0]]))

Answer 1

This question is roughly similar to the previous one, but still quite different.

In this case we use re.sub over the row axis (axis=1):

import re
df["new_string"] = df.apply(lambda row: re.sub("|".join(row["lists"]), "", row["string"], flags=re.I), axis=1)

              string                  lists      new_string
0   I HAVE A PET DOG    [fox, pet dog, cat]       I HAVE A 
1     there is a cat      [dog, house, car]  there is a cat
2     hello EVERYONE  [hi, hello, everyone]                
3  hi my name is Joe    [name, was, is Joe]         hi my

Breaking it down:

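df.apply with axis=1 applies the function to each row.
re.sub is the regular-expression variant of str.replace.
We use "|".join to build a "|"-delimited string, which acts as the or operator in a regular expression, so any one of these substrings is matched and removed.
flags=re.I makes the match ignore upper and lower case.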
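
The steps above can be put together as a self-contained sketch that rebuilds the frame from the question. Two things here are not part of the answer as posted and are assumptions on my part: each substring is passed through re.escape so that characters such as . or + are matched literally, and the leftover whitespace is collapsed so the result matches the desired df2. The helper name remove_substrings is hypothetical.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "string": ["I HAVE A PET DOG", "there is a cat",
               "hello EVERYONE", "hi my name is Joe"],
    "lists": [["fox", "pet dog", "cat"], ["dog", "house", "car"],
              ["hi", "hello", "everyone"], ["name", "was", "is Joe"]],
})


def remove_substrings(row):
    # Hypothetical helper: escape every substring, then join with "|"
    # so the pattern matches any one of them.
    pattern = "|".join(re.escape(s) for s in row["lists"])
    # Remove every match, ignoring case.
    stripped = re.sub(pattern, "", row["string"], flags=re.I)
    # Assumption: collapse the whitespace left behind by the removals.
    return " ".join(stripped.split())


df["new_string"] = df.apply(remove_substrings, axis=1)
print(df)
```

Note that this matches substrings anywhere in the text, not whole words only, so a list entry like hi would also be removed from inside a longer word.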

Note: since we are using apply over the row axis, this is basically a loop in the background and therefore not very optimized.
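
One possible way to trim some of that overhead, offered as a sketch and not as part of the original answer, is to walk the two columns with zip in a list comprehension. It is still a Python-level loop, but it avoids building a Series object for every row, which usually makes it lighter than apply with axis=1. The sample frame is rebuilt from the question, and the regular expression is the same one used in the answer.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "string": ["I HAVE A PET DOG", "there is a cat",
               "hello EVERYONE", "hi my name is Joe"],
    "lists": [["fox", "pet dog", "cat"], ["dog", "house", "car"],
              ["hi", "hello", "everyone"], ["name", "was", "is Joe"]],
})

# Pair each string with its own list and apply the same re.sub call.
df["new_string"] = [
    re.sub("|".join(subs), "", text, flags=re.I)
    for text, subs in zip(df["string"], df["lists"])
]

# Optional: trim the spaces left where substrings were removed.
df["new_string"] = df["new_string"].str.strip()
print(df)
```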
