How to filter out the lists in a list of lists that don't contain elements from another list?
I'm trying to exclude the lists that don't contain specific POS tags from the small list below, but I haven't been able to.
a = ['VBG', 'RB', 'NNP']
I only want the lists that contain all of the above tags from the list of lists of tuples below (the tags may not be correct; they are for illustration only):
data = [[('User', 'NNP'),
('is', 'VBG'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('Arfter', 'NNP'),
('transferring', 'VBG'),
('the', 'DT'),
('articles', 'NNS'),
('from', 'IN'),
('COALA', 'NNP'),
('to', 'TO'),
('SRM', 'VB'),
('the', 'DT'),
('Category', 'NNP'),
('S9901', 'NNP'),
('Dummy', 'NNP'),
('is', 'VBZ'),
('maintained', 'VBN')],
[('Due', 'JJ'),
('to', 'TO'),
('this', 'DT'),
('the', 'DT'),
('user', 'NN'),
('is', 'VBZ'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('the', 'DT'),
('product', 'NN')],
[('All', 'DT'),
('other', 'JJ'),
('users', 'NNS'),
('can', 'MD'),
('order', 'NN'),
('these', 'DT'),
('articles', 'NNS')],
[('She', 'PRP'),
('can', 'MD'),
('order', 'NN'),
('other', 'JJ'),
('products', 'NNS'),
('from', 'IN'),
('a', 'DT'),
('POETcatalog', 'NNP'),
('without', 'IN'),
('any', 'DT'),
('problems', 'NNS')],
[('Furtheremore', 'IN'),
('she', 'PRP'),
('is', 'VBZ'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('the', 'DT'),
('Vendor', 'NNP'),
('VWR', 'NNP'),
('through', 'IN'),
('COALA', 'NNP')],
[('But', 'CC'),
('articles', 'NNP'),
('from', 'VBG'),
('all', 'RB'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'NNP'),
('not', 'VBG'),
('orderable', 'RB')],
[('I', 'PRP'),
('already', 'RB'),
('spoke', 'VBD'),
('to', 'TO'),
('anic', 'VB'),
('who', 'WP'),
('maintain', 'VBP'),
('the', 'DT'),
('catalog', 'NN'),
('COALA', 'NNP'),
('and', 'CC'),
('they', 'PRP'),
('said', 'VBD'),
('that', 'IN'),
('the', 'DT'),
('reason', 'NN'),
('should', 'MD'),
('be', 'VB'),
('the', 'DT'),
('assignment', 'NN'),
('of', 'IN'),
('the', 'DT'),
('plant', 'NN')],
[('User', 'NNP'),
('is', 'VBZ'),
('a', 'DT'),
('assinged', 'JJ'),
('to', 'TO'),
('Universitaet', 'NNP'),
('Regensburg', 'NNP'),
('in', 'IN'),
('Scout', 'NNP'),
('but', 'CC'),
('in', 'IN'),
('P17', 'NNP'),
('table', 'NN'),
('YESRMCDMUSER01', 'NNP'),
('she', 'PRP'),
('is', 'VBZ'),
('assigned', 'VBN'),
('to', 'TO'),
('company', 'NN'),
('001500', 'CD'),
('Merck', 'NNP'),
('KGaA', 'NNP')],
[('Please', 'NNP'),
('find', 'VB'),
('attached', 'JJ'),
('some', 'DT'),
('screenshots', 'NNS')]]
My expected output is:
data = [[('User', 'NNP'),
('is', 'VBG'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('But', 'CC'),
('articles', 'NNP'),
('from', 'VBG'),
('all', 'RB'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'NNP'),
('not', 'VBG'),
('orderable', 'RB')]]
I tried to do this with the code below, but it doesn't work:
list1=[]
for i in data:
    list2 = []
    a = ['VBG', 'RB', 'NNP']
    for j in i:
        if all(i in j[1] for i in a):
            list2.append(j)
    list1.append(list2)
list1
This returns a list of empty lists. Can anybody provide simple, understandable code to get my expected output? Thanks.
Your condition here:
if all(i in j[1] for i in a):
This asks whether all of the tags in a are in j[1], which is a single tag string, and then appends only that item. At most one tag will be (given your data), which is why you get only empty lists. Rather, you want:
In [32]: from operator import itemgetter
    ...: list1=[]
    ...: a = ['VBG', 'RB', 'NNP']
    ...: for sub in data:
    ...:     tags = set(map(itemgetter(1), sub))
    ...:     if all(s in tags for s in a):
    ...:         list1.append(sub)
    ...:
This checks whether all the items in a are in the set of tags from the sublist:
In [33]: list1
Out[33]:
[[('User', 'NNP'),
('is', 'VBG'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('But', 'CC'),
('articles', 'NNP'),
('from', 'VBG'),
('all', 'RB'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'NNP'),
('not', 'VBG'),
('orderable', 'RB')]]
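To see why the original per-tuple condition never matches, note that j[1] is one tag string, so the in test there is a substring check against that single string. Below is a minimal sketch of both behaviours on a shortened, hypothetical sample (not the full data from the question); the helper name filter_by_tags is only illustrative.

```python
def filter_by_tags(sentences, required):
    """Keep each sentence (a list of (word, tag) tuples) whose tags
    include every tag in `required`."""
    required = set(required)
    return [sent for sent in sentences
            if required.issubset(tag for _, tag in sent)]

# Shortened, hypothetical sample in the same shape as the question's data
sample = [
    [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB')],
    [('All', 'DT'), ('users', 'NNS')],
]
a = ['VBG', 'RB', 'NNP']

# The original test, applied to each tuple of the first sentence:
# `t in tag` on strings is a substring check, so no single tag passes.
per_tuple = [all(t in tag for t in a) for _, tag in sample[0]]
print(per_tuple)  # [False, False, False]

# Testing the whole sentence's tags keeps only the first sentence.
kept = filter_by_tags(sample, a)
print(kept)
```

Building the required tags as a set once, outside the loop, also avoids re-creating the list a on every iteration as the original code did.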
This solution may look totally weird, but it works:
a = set(a)
def match(x):
    words,tags = zip(*x)
    return set(tags) & a == a
list(filter(match,data))
#[[('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ'),
#  ('to', 'TO'), ('order', 'NN'), ('products', 'NNS'), ('from', 'IN'),
#  ('iShopCatalog', 'NN'), ('Coala', 'NNP'), ('excluding', 'VBG'),
#  ('articles', 'NNS'), ('from', 'IN'), ('VWR', 'NNP')],
# [('But', 'CC'), ('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB'),
#  ('other', 'JJ'), ('suppliers', 'NNS'), ('are', 'NNP'), ('not', 'VBG'),
#  ('orderable', 'RB')]]
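The intersection test in match is a subset check written out by hand: set(tags) & a == a holds exactly when every element of a is in set(tags). A sketch of the same filter using the subset operator, run on a small hypothetical sample; the names match_subset and match_intersection are illustrative and not part of the answer above.

```python
required = {'VBG', 'RB', 'NNP'}

def match_subset(sentence):
    """True if the sentence's tags include every required tag."""
    tags = {tag for _, tag in sentence}
    return required <= tags  # subset test

def match_intersection(sentence):
    """Same check, phrased as an intersection as in the answer."""
    tags = {tag for _, tag in sentence}
    return tags & required == required

# Small hypothetical sample, including an empty sentence
sample = [
    [('But', 'CC'), ('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB')],
    [('Please', 'NNP'), ('find', 'VB')],
    [],
]

result = list(filter(match_subset, sample))
print(result)

# Both formulations agree on every sentence in the sample.
agree = all(match_subset(s) == match_intersection(s) for s in sample)
print(agree)  # True
```

Collecting the tags with a set comprehension rather than zip(*x) also handles an empty sublist, where unpacking zip(*x) into two names would raise a ValueError.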