
Remove all but certain words from a list (like a white list)

Imagine a list that has random words:

words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']

And I want to remove all but the following words (like a whitelist):

colors = ['red', 'green', 'blue', 'orange', 'white']

And I want to produce the following list (order matters):

filtered = ['blue', 'white', 'red']

I've thought about something like this (which works fine):

filtered = filter (lambda a: a == 'red' or a == 'green' or a == 'blue' or a == 'orange' or a == 'white', words)

But is this really the best / most efficient way?

If you want to keep the order and filter out non-colors efficiently, create a set of colors so that membership checks (in) are fast, then go through all the words and keep only those that are in the set:

words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']
colors = set(['red', 'green', 'blue', 'orange', 'white'])
print([word for word in words if word in colors])

Output:

['blue', 'white', 'red']

words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']
filterset = frozenset(['red', 'green', 'blue', 'orange', 'white'])
filtered = [x for x in words if x in filterset]

This solution has the advantage that it stays relatively fast even for a relatively large filterset, and it doesn't assume that the words list contains only unique entries.

You could leave filterset as a plain list (your filterlist), but this will hurt performance because every membership test then has to scan the list, especially if the list is large.
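To get a rough feel for the difference, here is a minimal timing sketch. The filler entries and the repetition count are assumptions chosen only to make the whitelist large enough to measure; actual numbers will vary by machine.

```python
import timeit

words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']

# Hypothetical large whitelist: the five colors plus made-up filler entries.
filterlist = ['red', 'green', 'blue', 'orange', 'white'] + ['filler%d' % i for i in range(2000)]
filterset = frozenset(filterlist)

def filter_with(container):
    # Order-preserving filter; only the membership test depends on the container type.
    return [x for x in words if x in container]

list_time = timeit.timeit(lambda: filter_with(filterlist), number=100)
set_time = timeit.timeit(lambda: filter_with(filterset), number=100)

print('list: %.4fs, frozenset: %.4fs' % (list_time, set_time))
```

Both containers give the same filtered list; only the lookup cost differs.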

filtered = filter(lambda a: a in whitelist, words)

should do the trick.

This can also be written as a list comprehension:

filtered = [x for x in words if x in whitelist]

As mentioned in the other answers, you can use the set type to make sure that every word in the whitelist is unique. This is useful when your whitelist is not hardcoded but generated, for example from records in a database.
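As an illustration of a generated whitelist, here is a small sketch. The rows list is a hypothetical stand-in for records fetched from a database; it is not part of the original question.

```python
# Hypothetical rows standing in for database records; note the duplicate 'red'.
rows = [('red',), ('blue',), ('red',), ('white',)]

# A set comprehension removes the duplicates while building the whitelist.
whitelist = {name for (name,) in rows}

words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']
filtered = [x for x in words if x in whitelist]

print(filtered)  # ['blue', 'white', 'red']
```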

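Use set operations (note that the result is a set, so the original order and any duplicates are not preserved):

words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']
colors = ['red', 'green', 'blue', 'orange', 'white']
# intersection keeps the words that are also colors; difference would keep the non-colors
filtered = set(words).intersection(colors)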
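Since the question says order matters, keep in mind that a set-based result carries no ordering and drops duplicates. One possible way to recover the order of first appearance, sketched here under the assumption that duplicates are not needed, is to sort the result by each word's position in the original list:

```python
words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon', 'blue']
colors = ['red', 'green', 'blue', 'orange', 'white']

# Set intersection: whitelisted words only, but unordered and deduplicated.
unordered = set(words).intersection(colors)

# Restore first-occurrence order. words.index is a linear scan,
# so this is only reasonable for short lists.
ordered = sorted(unordered, key=words.index)

print(ordered)  # ['blue', 'white', 'red']
```

If duplicates must be kept, the list comprehension with a set lookup is the simpler choice.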

Although list comprehensions are often considered more Pythonic, I do like the functional filter when it can be written without a lambda:

>>> list(filter(set(colors).__contains__, words))
['blue', 'white', 'red']
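One caveat worth sketching: in Python 3, filter returns a lazy iterator rather than a list, so the result has to be consumed (or wrapped in list) before it can be printed or indexed. The name is_color is introduced here only for illustration.

```python
words = ['elephant', 'dog', 'blue', 'sam', 'white', 'red', 'sun', 'moon']
colors = ['red', 'green', 'blue', 'orange', 'white']

# Bound method of a set built once; used as the filter predicate.
is_color = set(colors).__contains__

lazy = filter(is_color, words)   # nothing is evaluated yet in Python 3
first = next(lazy)               # pulls the first match
rest = list(lazy)                # consumes the remaining matches

print(first, rest)  # blue ['white', 'red']
```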
