簡體   English   中英

如何過濾出不包含其他列表元素的列表列表?

[英]How to filter out list of lists which doesn't contain elements from other list?

我正在嘗試從下面的小列表中排除不包含特定POS標簽的列表,但不能這樣做。

a = ['VBG', 'RB', 'NNP']

我只希望包含輸出中元組列表的以下列表中包含上述標簽的列表:(以下標簽可能不正確,但僅用於表示目的)

  data = [[('User', 'NNP'),
      ('is', 'VBG'),
      ('not', 'RB'),
      ('able', 'JJ'),
      ('to', 'TO'),
      ('order', 'NN'),
      ('products', 'NNS'),
      ('from', 'IN'),
      ('iShopCatalog', 'NN'),
      ('Coala', 'NNP'),
      ('excluding', 'VBG'),
      ('articles', 'NNS'),
      ('from', 'IN'),
      ('VWR', 'NNP')],
     [('Arfter', 'NNP'),
      ('transferring', 'VBG'),
      ('the', 'DT'),
      ('articles', 'NNS'),
      ('from', 'IN'),
      ('COALA', 'NNP'),
      ('to', 'TO'),
      ('SRM', 'VB'),
      ('the', 'DT'),
      ('Category', 'NNP'),
      ('S9901', 'NNP'),
      ('Dummy', 'NNP'),
      ('is', 'VBZ'),
      ('maintained', 'VBN')],
     [('Due', 'JJ'),
      ('to', 'TO'),
      ('this', 'DT'),
      ('the', 'DT'),
      ('user', 'NN'),
      ('is', 'VBZ'),
      ('not', 'RB'),
      ('able', 'JJ'),
      ('to', 'TO'),
      ('order', 'NN'),
      ('the', 'DT'),
      ('product', 'NN')],
     [('All', 'DT'),
      ('other', 'JJ'),
      ('users', 'NNS'),
      ('can', 'MD'),
      ('order', 'NN'),
      ('these', 'DT'),
      ('articles', 'NNS')],
     [('She', 'PRP'),
      ('can', 'MD'),
      ('order', 'NN'),
      ('other', 'JJ'),
      ('products', 'NNS'),
      ('from', 'IN'),
      ('a', 'DT'),
      ('POETcatalog', 'NNP'),
      ('without', 'IN'),
      ('any', 'DT'),
      ('problems', 'NNS')],
     [('Furtheremore', 'IN'),
      ('she', 'PRP'),
      ('is', 'VBZ'),
      ('able', 'JJ'),
      ('to', 'TO'),
      ('order', 'NN'),
      ('products', 'NNS'),
      ('from', 'IN'),
      ('the', 'DT'),
      ('Vendor', 'NNP'),
      ('VWR', 'NNP'),
      ('through', 'IN'),
      ('COALA', 'NNP')],
     [('But', 'CC'),
      ('articles', 'NNP'),
      ('from', 'VBG'),
      ('all', 'RB'),
      ('other', 'JJ'),
      ('suppliers', 'NNS'),
      ('are', 'NNP'),
      ('not', 'VBG'),
      ('orderable', 'RB')],
     [('I', 'PRP'),
      ('already', 'RB'),
      ('spoke', 'VBD'),
      ('to', 'TO'),
      ('anic', 'VB'),
      ('who', 'WP'),
      ('maintain', 'VBP'),
      ('the', 'DT'),
      ('catalog', 'NN'),
      ('COALA', 'NNP'),
      ('and', 'CC'),
      ('they', 'PRP'),
      ('said', 'VBD'),
      ('that', 'IN'),
      ('the', 'DT'),
      ('reason', 'NN'),
      ('should', 'MD'),
      ('be', 'VB'),
      ('the', 'DT'),
      ('assignment', 'NN'),
      ('of', 'IN'),
      ('the', 'DT'),
      ('plant', 'NN')],
     [('User', 'NNP'),
      ('is', 'VBZ'),
      ('a', 'DT'),
      ('assinged', 'JJ'),
      ('to', 'TO'),
      ('Universitaet', 'NNP'),
      ('Regensburg', 'NNP'),
      ('in', 'IN'),
      ('Scout', 'NNP'),
      ('but', 'CC'),
      ('in', 'IN'),
      ('P17', 'NNP'),
      ('table', 'NN'),
      ('YESRMCDMUSER01', 'NNP'),
      ('she', 'PRP'),
      ('is', 'VBZ'),
      ('assigned', 'VBN'),
      ('to', 'TO'),
      ('company', 'NN'),
      ('001500', 'CD'),
      ('Merck', 'NNP'),
      ('KGaA', 'NNP')],
     [('Please', 'NNP'),
      ('find', 'VB'),
      ('attached', 'JJ'),
      ('some', 'DT'),
      ('screenshots', 'NNS')]]

我的預期輸出是:

data = [[('User', 'NNP'),
  ('is', 'VBG'),
  ('not', 'RB'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('iShopCatalog', 'NN'),
  ('Coala', 'NNP'),
  ('excluding', 'VBG'),
  ('articles', 'NNS'),
  ('from', 'IN'),
  ('VWR', 'NNP')],
  [('But', 'CC'),
  ('articles', 'NNP'),
  ('from', 'VBG'),
  ('all', 'RB'),
  ('other', 'JJ'),
  ('suppliers', 'NNS'),
  ('are', 'NNP'),
  ('not', 'VBG'),
  ('orderable', 'RB')]

我試圖通過編寫以下代碼來做到這一點,但無法做到:

list1=[]
for i in data:
    list2 = []
    a = ['VBG', 'RB', 'NNP']
    for j in i:
        if all(i in j[1] for i in a):
            list2.append(j)
    list1.append(list2)
list1

這將返回列表的空列表。 任何人都可以提供一個簡單易懂的代碼來獲得我的預期輸出。 謝謝。

您的條件在這里:

if all(i in j[1] for i in a):

正在詢問a中的所有標記是否都在j[1] !中,然后僅追加該項目 但是最多只能有一個 (根據您的數據),這就是為什么您得到一個空列表的原因。 相反,您想要:

In [32]: from operator import itemgetter
    ...: list1=[]
    ...: a = ['VBG', 'RB', 'NNP']
    ...: for sub in data:
    ...:     tags = set(map(itemgetter(1), sub))
    ...:     if all(s in tags for s in a):
    ...:         list1.append(sub)
    ...:

此檢查,如果*所有的項目a是一組tags構成的子列表...

In [33]: list1
Out[33]:
[[('User', 'NNP'),
  ('is', 'VBG'),
  ('not', 'RB'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('iShopCatalog', 'NN'),
  ('Coala', 'NNP'),
  ('excluding', 'VBG'),
  ('articles', 'NNS'),
  ('from', 'IN'),
  ('VWR', 'NNP')],
 [('But', 'CC'),
  ('articles', 'NNP'),
  ('from', 'VBG'),
  ('all', 'RB'),
  ('other', 'JJ'),
  ('suppliers', 'NNS'),
  ('are', 'NNP'),
  ('not', 'VBG'),
  ('orderable', 'RB')]]

該解決方案可能看起來很奇怪,但是它可以工作:

a = set(a)
def match(x):
  words,tags = zip(*x)
  return set(tags) & a == a
list(filter(match,data))
#[[('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('Coala', 'NNP'), 
#  ('excluding', 'VBG'), ('VWR', 'NNP')], [('Arfter', 'NNP'),     
#  ('transferring', 'VBG'), ('COALA', 'NNP'), ('Category', 'NNP'), 
#  ('S9901', 'NNP'), ('Dummy', 'NNP')], [('not', 'RB')], [], 
#  [('POETcatalog', 'NNP')], [('Vendor', 'NNP'), ('VWR', 'NNP'), 
#  ('COALA', 'NNP')], [('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB'), 
#  ('are', 'NNP'), ('not', 'VBG'), ('orderable', 'RB')], [('already', 'RB'), 
#  ('COALA', 'NNP')], [('User', 'NNP'), ('Universitaet', 'NNP'), 
#  ('Regensburg', 'NNP'), ('Scout', 'NNP'), ('P17', 'NNP'), 
#  ('YESRMCDMUSER01', 'NNP'), ('Merck', 'NNP'), ('KGaA', 'NNP')], 
#  [('Please', 'NNP')]]

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM