Trying to remove matching strings from a list

Question

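I have two sets of strings: one contains sentences and the other contains a list of names. See the comments in the code for their format.

I am trying to go through a column of my dataframe and remove all the names from the sentences.

After calling the function, the sentences don't appear to change.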

with open('names.txt', 'r') as f:
    NAMES = set(f.read().splitlines())
NAMES = [name.lower() for name in NAMES]

def remove_names(df, col, NAMES):
    for idx in range(df.shape[0]):
        print("\r", idx, df.shape[0], idx/df.shape[0], end="\r")
        # your list of texts
        texts=df[col][idx]
        #texts looks like
        #['explain', 'decided', 'make', 'coverage', 'area', 'rubbish', 'online', 'checker', 'correct', 'sky', 'account', 'connection']
        holder_list = []
        for word in texts:
            #NAMES looks like
            # ['pascha', 'lang', 'desaray', 'camielle', 'marquasha', 'trasha', 'shaquila',...
            for name in NAMES:
                if name == word or name == word + "'s":
                    continue
                else:
                    holder_list.append(word)
        df[col][idx] = holder_list.copy()
    return df[col]
df_norm['Full Text'] = remove_names(df_norm, 'Full Text', NAMES)

Answer 1

I updated your remove_names function:

def remove_names(df_list, NAMES):
    new_list = [x for x in df_list if x not in NAMES]
    return new_list


df_norm['Full Text'] = df_norm['Full Text'].apply(remove_names, args=(NAMES,))

print(df_norm)
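
For comparison, here is a minimal sketch of why the nested loop in the question misbehaves, using made-up data (the `names` and `words` values below are hypothetical). Because the `else` branch runs once for every name that does not match, each word is appended many times, and a matching word is still appended once for every other name. A single membership test per word avoids that.

```python
names = ['alice', 'bob', 'carol']  # hypothetical stand-in for NAMES
words = ['alice', 'gives', 'bob', 'an', 'apple']

# Logic from the question: the word is appended once per non-matching name.
nested = []
for word in words:
    for name in names:
        if name == word or name == word + "'s":
            continue
        else:
            nested.append(word)

# One membership test per word instead of one comparison per name.
filtered = [word for word in words if word not in names]

print(len(nested))  # 13
print(filtered)     # ['gives', 'an', 'apple']
```

With three names, every non-name word lands in `nested` three times and every name twice, which is why the output never looks cleaned.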

If you want to get rid of the remove_names function altogether, you can use a lambda function, which updates the column in a single line of code:

df_norm['Full Text'] = df_norm['Full Text'].apply(lambda df_list: [x for x in df_list if x not in NAMES])

Note:

The code above assumes that each cell of your df_norm['Full Text'] column is a list of words, as in the question's comment.
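
As a hypothetical illustration of that shape, assuming each cell holds a list of lowercase tokens like the one in the question's comment (the sample rows and the two-entry NAMES below are invented for the example):

```python
import pandas as pd

NAMES = ['pascha', 'lang']  # invented subset; the real list comes from names.txt

# Each cell of 'Full Text' is a list of tokens (assumed layout).
df_norm = pd.DataFrame({
    'Full Text': [
        ['pascha', 'explain', 'coverage', 'area'],
        ['online', 'checker', 'lang', 'correct'],
    ]
})

df_norm['Full Text'] = df_norm['Full Text'].apply(
    lambda df_list: [x for x in df_list if x not in NAMES]
)

print(df_norm['Full Text'].tolist())
# [['explain', 'coverage', 'area'], ['online', 'checker', 'correct']]
```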

Answer 2

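Since you repeatedly need to test whether a word belongs to NAMES, you should make NAMES a set rather than a list. Testing membership in a set is much faster than testing membership in a list: a set lookup takes roughly constant time on average, whereas a list is scanned element by element.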
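
A rough way to see the difference is to time both containers. This is only a sketch: the 20,000 synthetic names are made up, and absolute timings depend on the machine, so no numbers are quoted here.

```python
import timeit

# Synthetic names, invented purely for the measurement.
names_list = [f"name{i}" for i in range(20_000)]
names_set = set(names_list)

word = "zzz"  # not a name, so the list has to be scanned to the end

list_time = timeit.timeit(lambda: word in names_list, number=200)
set_time = timeit.timeit(lambda: word in names_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

In the question, NAMES starts out as a set, but the list comprehension on the next line turns it back into a list. A set comprehension such as `NAMES = {name.lower() for name in NAMES}` would keep it a set.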

You can use pandas' apply to apply a function to each row of a dataframe column.

If each row of the dataframe is a list of words, you can implement the function to apply to each row like this:

def remove_names(list_of_words, set_of_names):
    return [word for word in list_of_words if word not in set_of_names]

# TEST:
print( remove_names(['Alice', 'gives', 'Bob', 'an', 'apple'], {'Alice', 'Bob'}) )
# ['gives', 'an', 'apple']

If each row of your dataframe is a sentence, i.e. a string of words separated by spaces, you can implement the function to apply to each row like this:

def remove_names(sentence, set_of_names):
    return ' '.join(word for word in sentence.split() if word not in set_of_names)

# TEST:
print( remove_names('Alice gives Bob an apple', {'Alice', 'Bob'}) )
# gives an apple

Then apply it to a column of the dataframe:

import pandas as pd

df = pd.DataFrame({'id':[47, 28], 'sentence': ['Alice gives Bob an apple', 'An apple is given to Alice']})
df['nonames'] = df['sentence'].apply(remove_names, args=({'Alice', 'Bob'},))

print(df)
#    id                    sentence               nonames
# 0  47    Alice gives Bob an apple        gives an apple
# 1  28  An apple is given to Alice  An apple is given to
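
Note that both versions compare tokens exactly, so 'alice' or "Alice's" would not be removed in the example above. If that matters, one possible variant (an assumption about the desired behaviour, not something stated in the question) lowercases each word and strips a trailing "'s" before the lookup. The helper name `remove_names_loose` is made up for this sketch.

```python
def remove_names_loose(sentence, set_of_names):
    """Drop words whose lowercase form, minus a trailing 's, is a known name."""
    lowered_names = {name.lower() for name in set_of_names}
    kept = []
    for word in sentence.split():
        key = word.lower()
        if key.endswith("'s"):
            key = key[:-2]  # "bob's" -> "bob"
        if key not in lowered_names:
            kept.append(word)  # keep the word in its original form
    return ' '.join(kept)

print(remove_names_loose("alice borrows Bob's apple", {'Alice', 'Bob'}))
# borrows apple
```

Punctuation attached to a word (for example "Alice,") is still not handled; that would need a tokenizer rather than a plain split.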

2 answers

Answer 1
0 2022-12-14 16:51:46

Answer 2
0 2022-12-14 16:58:54
