
How do I remove common elements from two lists?

I have two lists like the examples below (in reality, a is longer), and I would like to remove all common elements, in this case the punctuation marks given in the list punctuation.

a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member', 'comical', 'family', 'Intelligences', '.'],['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

If you need to preserve the order, filter item by item, testing each word for membership in the blacklist.

cleaned = [word for word in words if word not in blacklist] 
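As a minimal sketch of how that one-liner might be applied to the question's data, assume that words stands for each sublist of a and blacklist for punctuation. The helper name remove_blacklisted is hypothetical and only used for illustration.

```python
a = [
    ['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
     'comical', 'family', 'Intelligences', '.'],
    ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.'],
]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]


def remove_blacklisted(words, blacklist):
    """Return the items of `words` that are not in `blacklist`, in order."""
    return [word for word in words if word not in blacklist]


# Apply the filter to every sublist so the nested structure is kept.
cleaned = [remove_blacklisted(sublist, punctuation) for sublist in a]
print(cleaned)
```

Only items that are exactly equal to a blacklist entry are dropped, so a token such as 'man,' stays as it is.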

When the order is not important:

You can use a set difference, but first you have to flatten the nested list a (the flattening idiom is taken from "Making a flat list out of list of lists in Python"):

b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))

cleaned is a list that looks like this (the order is arbitrary and may differ between runs): ['A', 'hug', 'heads', 'family', 'Intelligences', 'becomes', 'Jeans', 'lengthen', 'member', 'turn', 'mankind', 'view,', 'legs', 'man,', 'hips', 'comical']
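One side effect of the set approach is worth checking before using it: a set keeps each value only once, so repeated words collapse. The sketch below uses the question's data to show this; the name unique_words is an assumption introduced for illustration, and sorted() is just one way to get a repeatable order.

```python
a = [
    ['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
     'comical', 'family', 'Intelligences', '.'],
    ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.'],
]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Flatten the nested list, then subtract the punctuation as a set.
b = [item for sublist in a for item in sublist]
unique_words = set(b) - set(punctuation)

# 'mankind' appears twice in the input but only once in the set.
print(b.count('mankind'), 'mankind' in unique_words)

# Sets are unordered; sort the result if a stable order is needed.
print(sorted(unique_words))
```

If duplicates or the original order matter, the order-preserving variants are the safer choice.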

When the order is important:

Simply use a list comprehension over the flattened list b, which is probably slower than the set approach:

cleaned = [x for x in b if x not in punctuation]

cleaned looks like ['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']
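If the speed of the list comprehension is a concern, one possible tweak is to test membership against a set instead of a list, since set lookups do not scan every element. This is a sketch under that assumption, not a measured benchmark; punctuation_set is a name introduced here.

```python
a = [
    ['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
     'comical', 'family', 'Intelligences', '.'],
    ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.'],
]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Build the lookup set once, outside the comprehension.
punctuation_set = set(punctuation)

# Flatten, then filter; order and duplicates are preserved.
b = [item for sublist in a for item in sublist]
cleaned = [x for x in b if x not in punctuation_set]
print(cleaned)
```

With a blacklist of only eleven entries the difference is likely small; it should matter more as the blacklist grows.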

You can do this, but the list order might change.

[list(set(sublist)-set(punctuation)) for sublist in a]

Using sets, you can remove the punctuation entries and convert the result back to a list. Use a list comprehension to do this for each sublist in the list.


If keeping the order is important, you can do this:

[[x for x in sublist if x not in punctuation] for sublist in a]

You can do:

>>> from itertools import chain
>>> # In Python 3, filter() returns a lazy iterator, so wrap it in list()
>>> list(filter(lambda e: e not in punctuation, chain(*a)))
['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']

Or, if you want to maintain your sublist structure:

>>> [list(filter(lambda e: e not in punctuation, sub)) for sub in a]
[['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences'], ['Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']]
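Since the real list a is said to be longer, a lazy variant may be useful when the words are only iterated once. The sketch below uses itertools.chain.from_iterable, which flattens without unpacking a into arguments; the generator name iter_words is hypothetical.

```python
from itertools import chain

a = [
    ['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
     'comical', 'family', 'Intelligences', '.'],
    ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.'],
]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]


def iter_words(nested, blacklist):
    """Lazily yield the items of a list of lists that are not in `blacklist`."""
    blocked = frozenset(blacklist)
    for item in chain.from_iterable(nested):
        if item not in blocked:
            yield item


# Nothing is computed until the generator is consumed.
words = iter_words(a, punctuation)
first = next(words)   # first non-punctuation item
rest = list(words)    # everything after it, in order
print(first, len(rest))
```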

