How do I filter out the lists in a list of lists that do not contain the elements of another list?

Question

I'm trying to exclude the lists that don't contain specific POS tags from the small list below, but I couldn't get it to work.

a = ['VBG', 'RB', 'NNP']

From the list of lists of tuples below, I only want the lists that contain all of the above tags (the tags below may not be correct; they are only for illustration):

data = [[('User', 'NNP'),
      ('is', 'VBG'),
      ('not', 'RB'),
      ('able', 'JJ'),
      ('to', 'TO'),
      ('order', 'NN'),
      ('products', 'NNS'),
      ('from', 'IN'),
      ('iShopCatalog', 'NN'),
      ('Coala', 'NNP'),
      ('excluding', 'VBG'),
      ('articles', 'NNS'),
      ('from', 'IN'),
      ('VWR', 'NNP')],
     [('Arfter', 'NNP'),
      ('transferring', 'VBG'),
      ('the', 'DT'),
      ('articles', 'NNS'),
      ('from', 'IN'),
      ('COALA', 'NNP'),
      ('to', 'TO'),
      ('SRM', 'VB'),
      ('the', 'DT'),
      ('Category', 'NNP'),
      ('S9901', 'NNP'),
      ('Dummy', 'NNP'),
      ('is', 'VBZ'),
      ('maintained', 'VBN')],
     [('Due', 'JJ'),
      ('to', 'TO'),
      ('this', 'DT'),
      ('the', 'DT'),
      ('user', 'NN'),
      ('is', 'VBZ'),
      ('not', 'RB'),
      ('able', 'JJ'),
      ('to', 'TO'),
      ('order', 'NN'),
      ('the', 'DT'),
      ('product', 'NN')],
     [('All', 'DT'),
      ('other', 'JJ'),
      ('users', 'NNS'),
      ('can', 'MD'),
      ('order', 'NN'),
      ('these', 'DT'),
      ('articles', 'NNS')],
     [('She', 'PRP'),
      ('can', 'MD'),
      ('order', 'NN'),
      ('other', 'JJ'),
      ('products', 'NNS'),
      ('from', 'IN'),
      ('a', 'DT'),
      ('POETcatalog', 'NNP'),
      ('without', 'IN'),
      ('any', 'DT'),
      ('problems', 'NNS')],
     [('Furtheremore', 'IN'),
      ('she', 'PRP'),
      ('is', 'VBZ'),
      ('able', 'JJ'),
      ('to', 'TO'),
      ('order', 'NN'),
      ('products', 'NNS'),
      ('from', 'IN'),
      ('the', 'DT'),
      ('Vendor', 'NNP'),
      ('VWR', 'NNP'),
      ('through', 'IN'),
      ('COALA', 'NNP')],
     [('But', 'CC'),
      ('articles', 'NNP'),
      ('from', 'VBG'),
      ('all', 'RB'),
      ('other', 'JJ'),
      ('suppliers', 'NNS'),
      ('are', 'NNP'),
      ('not', 'VBG'),
      ('orderable', 'RB')],
     [('I', 'PRP'),
      ('already', 'RB'),
      ('spoke', 'VBD'),
      ('to', 'TO'),
      ('anic', 'VB'),
      ('who', 'WP'),
      ('maintain', 'VBP'),
      ('the', 'DT'),
      ('catalog', 'NN'),
      ('COALA', 'NNP'),
      ('and', 'CC'),
      ('they', 'PRP'),
      ('said', 'VBD'),
      ('that', 'IN'),
      ('the', 'DT'),
      ('reason', 'NN'),
      ('should', 'MD'),
      ('be', 'VB'),
      ('the', 'DT'),
      ('assignment', 'NN'),
      ('of', 'IN'),
      ('the', 'DT'),
      ('plant', 'NN')],
     [('User', 'NNP'),
      ('is', 'VBZ'),
      ('a', 'DT'),
      ('assinged', 'JJ'),
      ('to', 'TO'),
      ('Universitaet', 'NNP'),
      ('Regensburg', 'NNP'),
      ('in', 'IN'),
      ('Scout', 'NNP'),
      ('but', 'CC'),
      ('in', 'IN'),
      ('P17', 'NNP'),
      ('table', 'NN'),
      ('YESRMCDMUSER01', 'NNP'),
      ('she', 'PRP'),
      ('is', 'VBZ'),
      ('assigned', 'VBN'),
      ('to', 'TO'),
      ('company', 'NN'),
      ('001500', 'CD'),
      ('Merck', 'NNP'),
      ('KGaA', 'NNP')],
     [('Please', 'NNP'),
      ('find', 'VB'),
      ('attached', 'JJ'),
      ('some', 'DT'),
      ('screenshots', 'NNS')]]

My expected output is:

data = [[('User', 'NNP'),
  ('is', 'VBG'),
  ('not', 'RB'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('iShopCatalog', 'NN'),
  ('Coala', 'NNP'),
  ('excluding', 'VBG'),
  ('articles', 'NNS'),
  ('from', 'IN'),
  ('VWR', 'NNP')],
  [('But', 'CC'),
  ('articles', 'NNP'),
  ('from', 'VBG'),
  ('all', 'RB'),
  ('other', 'JJ'),
  ('suppliers', 'NNS'),
  ('are', 'NNP'),
  ('not', 'VBG'),
  ('orderable', 'RB')]]

I tried to do this with the code below, but it doesn't work:

list1=[]
for i in data:
    list2 = []
    a = ['VBG', 'RB', 'NNP']
    for j in i:
        if all(i in j[1] for i in a):
            list2.append(j)
    list1.append(list2)
list1

which returns a list of empty lists. Can anybody provide simple, understandable code that produces my expected output? Thanks.

Answer 1

Your condition here:

if all(i in j[1] for i in a):

is asking whether all of the tags in a are in j[1], and then appending only that item. But j[1] is a single tag, so at most one of the tags in a will match (given your data), which is why you are getting only empty lists. Rather, you want:

In [32]: from operator import itemgetter
    ...: list1=[]
    ...: a = ['VBG', 'RB', 'NNP']
    ...: for sub in data:
    ...:     tags = set(map(itemgetter(1), sub))
    ...:     if all(s in tags for s in a):
    ...:         list1.append(sub)
    ...:

This checks whether all the items in a are in the set of tags from the sublist, and keeps the whole sublist if they are:

In [33]: list1
Out[33]:
[[('User', 'NNP'),
  ('is', 'VBG'),
  ('not', 'RB'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('iShopCatalog', 'NN'),
  ('Coala', 'NNP'),
  ('excluding', 'VBG'),
  ('articles', 'NNS'),
  ('from', 'IN'),
  ('VWR', 'NNP')],
 [('But', 'CC'),
  ('articles', 'NNP'),
  ('from', 'VBG'),
  ('all', 'RB'),
  ('other', 'JJ'),
  ('suppliers', 'NNS'),
  ('are', 'NNP'),
  ('not', 'VBG'),
  ('orderable', 'RB')]]
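
To make the difference between the two conditions concrete, here is a minimal sketch. The name `sample` is a hypothetical two-sentence stand-in for data, made up for illustration; the set comparison is just one possible way to write the per-sentence check.

```python
a = ['VBG', 'RB', 'NNP']

# Hypothetical miniature of `data`, for illustration only.
sample = [
    [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ')],
    [('All', 'DT'), ('other', 'JJ'), ('users', 'NNS')],
]

# Original condition: each tag is a single string, so `t in tag` is a
# substring test, and no single tag contains all three required tags.
per_token = [all(t in tag for t in a) for _, tag in sample[0]]
print(per_token)  # [False, False, False, False]

# Per-sentence check: collect the sentence's tags first, then test that
# every required tag is among them.
kept = [sent for sent in sample if set(a) <= {tag for _, tag in sent}]
print(kept == [sample[0]])  # True
```

The first check is False for every token, which is why the original loop only ever built empty lists; the second keeps or drops each sentence as a whole.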

Answer 2

This solution may look odd, but it works:

a = set(a)  # the required tags, as a set

def match(x):
    # Unzip the (word, tag) pairs of one sublist into two tuples.
    words, tags = zip(*x)
    # True only if every required tag occurs among the sublist's tags.
    return set(tags) & a == a

list(filter(match, data))
# Returns the two complete sublists from the expected output:
# the one starting with ('User', 'NNP'), ('is', 'VBG') and
# the one starting with ('But', 'CC').
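
The same idea can be wrapped in a small reusable function. This is only a sketch: the function name `filter_by_tags` and the `sample` data are hypothetical, not from the original post. It uses `set.issubset`, which accepts any iterable, so the tags do not have to be unzipped first.

```python
def filter_by_tags(sentences, required):
    """Keep only the sentences whose tags include every tag in `required`."""
    required = set(required)
    # Each sentence is a list of (word, tag) pairs; only the tags matter here.
    return [s for s in sentences if required.issubset(tag for _, tag in s)]


# Hypothetical miniature input, for illustration only.
sample = [
    [('But', 'CC'), ('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB')],
    [('She', 'PRP'), ('can', 'MD'), ('order', 'NN')],
    [],  # an empty sentence is dropped rather than raising an error
]

result = filter_by_tags(sample, ['VBG', 'RB', 'NNP'])
print(result == [sample[0]])  # True
```

One design difference worth noting: `words, tags = zip(*x)` fails with a ValueError when a sublist is empty, while the generator passed to `issubset` simply yields no tags in that case.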

2 answers

Answer 1: score 3, accepted, 2017-05-17 23:29:03

Answer 2: score 2, 2017-05-17 23:33:02
