Remove strings not in a list from a column in each row of a data frame

Question

Say I have a list of words:

listOfWords = ['Apple','Orange','Banana','Potato']

And my data frame looks like this:

In:

ColumnA:
['Apple','Turnip','Banana','Potato']
['Apple','Orange','Banana','Potato']
['Apple','Orange','Pastry','Potato']
['Melon','Orange','Banana','Potato']
['Apple','Orange','Banana','Sandwich']

I am currently running the following code to retrieve the desired output:

for index, row in df.iterrows():
    for word in df['Column']:
        if word not in listOfWords:
            word.replace(word,"")



Out:

ColumnA:
    ['Apple','Banana','Potato']
    ['Apple','Orange','Banana','Potato']
    ['Apple','Orange','Potato']
    ['Orange','Banana','Potato']
    ['Apple','Orange','Banana']

I am currently running this on 12,000 records with a list of 12,000 words. It has been running without errors for a few hours, but I am unsure whether this is the most efficient way to do it.

Answer 1

Use a list comprehension inside apply, or a nested list comprehension:

df['ColumnA'] = df['ColumnA'].apply(lambda x: [y for y in x if y in listOfWords])
# alternative: nested list comprehension
# df['ColumnA'] = [[y for y in x if y in listOfWords] for x in df['ColumnA']]
print(df)
                           ColumnA
0          [Apple, Banana, Potato]
1  [Apple, Orange, Banana, Potato]
2          [Apple, Orange, Potato]
3         [Orange, Banana, Potato]
4          [Apple, Orange, Banana]
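
With a 12,000-word list, every `y in listOfWords` check scans the list from the start, which is likely a large part of the cost. A minimal sketch, assuming the words are plain hashable strings, that looks words up in a set while still keeping each row's order (the name `allowed` is illustrative and not from the original post):

```python
import pandas as pd

listOfWords = ['Apple', 'Orange', 'Banana', 'Potato']
df = pd.DataFrame({'ColumnA': [
    ['Apple', 'Turnip', 'Banana', 'Potato'],
    ['Apple', 'Orange', 'Banana', 'Potato'],
    ['Apple', 'Orange', 'Pastry', 'Potato'],
    ['Melon', 'Orange', 'Banana', 'Potato'],
    ['Apple', 'Orange', 'Banana', 'Sandwich'],
]})

# Set membership is O(1) on average, versus O(n) for a list.
allowed = set(listOfWords)  # hypothetical name, for illustration only

# Keep each row's original order; drop only the words not in the set.
df['ColumnA'] = [[w for w in row if w in allowed] for row in df['ColumnA']]
print(df)
```

The filtering logic is the same as above; only the lookup structure changes, so the result should match the first output.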

Or, if order is not important, use sets with intersection:

s = set(listOfWords)
df['ColumnA'] = df['ColumnA'].apply(lambda x: list(set(x) & s))
print(df)
                           ColumnA
0          [Banana, Potato, Apple]
1  [Banana, Potato, Orange, Apple]
2          [Potato, Orange, Apple]
3         [Banana, Potato, Orange]
4          [Banana, Orange, Apple]
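
Note that the intersection approach also drops repeated words within a row, not just their order. A small sketch of the difference, using a made-up `row` that is not part of the original data:

```python
listOfWords = ['Apple', 'Orange', 'Banana', 'Potato']
s = set(listOfWords)

# Hypothetical row containing a repeated word and one unwanted word.
row = ['Potato', 'Apple', 'Turnip', 'Apple']

# Intersection: unique words only, in arbitrary order.
as_set = set(row) & s

# List comprehension: original order and repeats are kept.
as_list = [w for w in row if w in s]

print(sorted(as_set))  # ['Apple', 'Potato']
print(as_list)         # ['Potato', 'Apple', 'Apple']
```

If rows can contain duplicates that matter, the list comprehension with a set lookup is probably the safer choice.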

Solution 1
1 2019-03-10 19:24:51
