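How to filter out the lists in a list of lists that don't contain all elements from another list?
I'm trying to exclude, from the list of lists below, the sublists that don't contain certain POS tags, but I can't get it to work.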
a = ['VBG', 'RB', 'NNP']
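From the following list of lists of tuples, I want the output to keep only the sublists that contain all of the tags above. (The tags below may be incorrect; they are for illustration only.)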
data = [[('User', 'NNP'),
('is', 'VBG'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('Arfter', 'NNP'),
('transferring', 'VBG'),
('the', 'DT'),
('articles', 'NNS'),
('from', 'IN'),
('COALA', 'NNP'),
('to', 'TO'),
('SRM', 'VB'),
('the', 'DT'),
('Category', 'NNP'),
('S9901', 'NNP'),
('Dummy', 'NNP'),
('is', 'VBZ'),
('maintained', 'VBN')],
[('Due', 'JJ'),
('to', 'TO'),
('this', 'DT'),
('the', 'DT'),
('user', 'NN'),
('is', 'VBZ'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('the', 'DT'),
('product', 'NN')],
[('All', 'DT'),
('other', 'JJ'),
('users', 'NNS'),
('can', 'MD'),
('order', 'NN'),
('these', 'DT'),
('articles', 'NNS')],
[('She', 'PRP'),
('can', 'MD'),
('order', 'NN'),
('other', 'JJ'),
('products', 'NNS'),
('from', 'IN'),
('a', 'DT'),
('POETcatalog', 'NNP'),
('without', 'IN'),
('any', 'DT'),
('problems', 'NNS')],
[('Furtheremore', 'IN'),
('she', 'PRP'),
('is', 'VBZ'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('the', 'DT'),
('Vendor', 'NNP'),
('VWR', 'NNP'),
('through', 'IN'),
('COALA', 'NNP')],
[('But', 'CC'),
('articles', 'NNP'),
('from', 'VBG'),
('all', 'RB'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'NNP'),
('not', 'VBG'),
('orderable', 'RB')],
[('I', 'PRP'),
('already', 'RB'),
('spoke', 'VBD'),
('to', 'TO'),
('anic', 'VB'),
('who', 'WP'),
('maintain', 'VBP'),
('the', 'DT'),
('catalog', 'NN'),
('COALA', 'NNP'),
('and', 'CC'),
('they', 'PRP'),
('said', 'VBD'),
('that', 'IN'),
('the', 'DT'),
('reason', 'NN'),
('should', 'MD'),
('be', 'VB'),
('the', 'DT'),
('assignment', 'NN'),
('of', 'IN'),
('the', 'DT'),
('plant', 'NN')],
[('User', 'NNP'),
('is', 'VBZ'),
('a', 'DT'),
('assinged', 'JJ'),
('to', 'TO'),
('Universitaet', 'NNP'),
('Regensburg', 'NNP'),
('in', 'IN'),
('Scout', 'NNP'),
('but', 'CC'),
('in', 'IN'),
('P17', 'NNP'),
('table', 'NN'),
('YESRMCDMUSER01', 'NNP'),
('she', 'PRP'),
('is', 'VBZ'),
('assigned', 'VBN'),
('to', 'TO'),
('company', 'NN'),
('001500', 'CD'),
('Merck', 'NNP'),
('KGaA', 'NNP')],
[('Please', 'NNP'),
('find', 'VB'),
('attached', 'JJ'),
('some', 'DT'),
('screenshots', 'NNS')]]
My expected output is:
data = [[('User', 'NNP'),
('is', 'VBG'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('But', 'CC'),
('articles', 'NNP'),
('from', 'VBG'),
('all', 'RB'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'NNP'),
('not', 'VBG'),
('orderable', 'RB')]]
I tried to do this with the following code, but it doesn't work:
list1 = []
for i in data:
    list2 = []
    a = ['VBG', 'RB', 'NNP']
    for j in i:
        if all(i in j[1] for i in a):
            list2.append(j)
    list1.append(list2)
list1
This returns a list of empty lists. Can anyone provide simple, easy-to-understand code that produces my expected output? Thanks.
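Your condition here:
if all(i in j[1] for i in a):
asks whether all of the tags in a are in j[1], and appends the item only if they are. But j[1] is a single tag, so at most one of the tags in a can match it (given your data), which is why you get empty lists. Instead, you want: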
In [32]: from operator import itemgetter
    ...: list1 = []
    ...: a = ['VBG', 'RB', 'NNP']
    ...: for sub in data:
    ...:     tags = set(map(itemgetter(1), sub))
    ...:     if all(s in tags for s in a):
    ...:         list1.append(sub)
    ...:
This checks whether *all* of the items in a are in the set of tags built from the sublist:
In [33]: list1
Out[33]:
[[('User', 'NNP'),
('is', 'VBG'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('But', 'CC'),
('articles', 'NNP'),
('from', 'VBG'),
('all', 'RB'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'NNP'),
('not', 'VBG'),
('orderable', 'RB')]]
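To make the difference between the two checks concrete, here is a minimal sketch. The names sample, required, single_tag, original_check and kept are made up for illustration, and sample is a shortened stand-in for the question's data, not the full list. It first reproduces the original per-tuple check on a single tag, then filters whole sublists with set.issubset, which should behave like the all(...) loop above.

```python
# Shortened, hypothetical stand-in for the question's `data`.
sample = [
    [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB')],
    [('All', 'DT'), ('other', 'JJ'), ('users', 'NNS')],
    [('articles', 'NNP'), ('from', 'VBG')],
]
required = {'VBG', 'RB', 'NNP'}

# The original check looked at one tuple at a time: j[1] is a single tag
# string, so `tag in j[1]` is a substring test, and one tag cannot contain
# all three required tags.
single_tag = 'NNP'
original_check = all(tag in single_tag for tag in required)  # False

# Per-sublist check: keep a sublist only if its tags cover every required tag.
# set.issubset accepts any iterable, so a generator of tags is enough.
kept = [sub for sub in sample if required.issubset(tag for _, tag in sub)]
```

Here kept holds only the first sublist of sample, because it is the only one with all three tags; the third has NNP and VBG but no RB.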
This solution may look odd, but it works:
a = set(a)
def match(x):
    words, tags = zip(*x)
    return set(tags) & a == a
list(filter(match, data))
# [[('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ'), ('to', 'TO'),
#   ('order', 'NN'), ('products', 'NNS'), ('from', 'IN'), ('iShopCatalog', 'NN'),
#   ('Coala', 'NNP'), ('excluding', 'VBG'), ('articles', 'NNS'), ('from', 'IN'),
#   ('VWR', 'NNP')],
#  [('But', 'CC'), ('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB'),
#   ('other', 'JJ'), ('suppliers', 'NNS'), ('are', 'NNP'), ('not', 'VBG'),
#   ('orderable', 'RB')]]
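Two details of the snippet above are worth spelling out. The expression set(tags) & a == a is a subset test, equivalent to a <= set(tags). And words, tags = zip(*x) raises ValueError if a sublist happens to be empty, because there is nothing to unpack. The following is a sketch of a variant that avoids the unpacking; the names has_all_tags, required, sample and result are illustrative and not part of the original answer.

```python
required = {'VBG', 'RB', 'NNP'}

def has_all_tags(sentence, required_tags=required):
    """Return True if every tag in required_tags occurs in sentence."""
    # Build the tag set directly from the (word, tag) pairs; unlike
    # `words, tags = zip(*sentence)`, this also works for an empty list.
    tags = {tag for _, tag in sentence}
    # `<=` on sets is the subset test: all required tags must be present.
    return required_tags <= tags

# Hypothetical sample, including an empty sublist.
sample = [
    [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ')],
    [],
    [('Please', 'NNP'), ('find', 'VB')],
]
result = list(filter(has_all_tags, sample))
```

With this sample, result contains only the first sublist; the empty sublist is rejected rather than raising an error.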