简体   繁体   English

从列表中删除子列表(如果包含给定元素)

[英]Removing sublist from list if it contains given elements

I have a list like this [1]: 我有一个这样的列表[1]:

[['a1', 'b1', 'c1'], ['a1', 'b1', 'c2'], ['a1', 'b1', 'c3'], 
 ['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'], 
 ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'], 
 ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'], 
 ['a2', 'b2', 'c1'], ['a2', 'b2', 'c2'], ['a2', 'b2', 'c3'], 
 ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'], 
 ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'], 
 ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'], 
 ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]

And one like this [2]: 像这样的人[2]:

[['a1', 'b1'], ['a2', 'b2']]

And I want to remove sublists of [1] which contain ALL elements in EITHER sublist of [2]. 我想删除[1]的子列表,其中包含[2]的EITHER子列表中的所有元素。 In other words, if a sublist of [1] contains 'a1' and 'b1' or 'a2' and 'b2' , it should be removed (only for full matches of the strings). 换句话说,如果[1]的子列表包含'a1' and 'b1''a2' and 'b2' ,则应将其删除(仅适用于字符串的完全匹配)。

List [1] should look like this: 列表[1]应该如下所示:

[['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'], 
 ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'], 
 ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'], 
 ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'], 
 ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'], 
 ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'], 
 ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]

I've tried a similar method to this: 我已经尝试过类似的方法:

https://stackoverflow.com/a/17934810/6278576 https://stackoverflow.com/a/17934810/6278576

However, I can't figure out how to adapt it to remove sublists from a list when several criteria are met. 但是,当满足多个条件时,我无法弄清楚如何对其进行调整以从列表中删除子列表。

How can this be done? 如何才能做到这一点?

You can use a function that filters your list for each group of items in your second list. 您可以使用一个功能来过滤第二个列表中每个项目组的列表。

def filterall(list_in, *filter_iterables):
    out = list_in.copy()
    for it in filter_iterables:
        out = [x for x in out if not all(i in x for i in it)]
    return out

x = [['a1', 'b1', 'c1'], ['a1', 'b1', 'c2'], ['a1', 'b1', 'c3'], 
 ['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'], 
 ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'], 
 ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'], 
 ['a2', 'b2', 'c1'], ['a2', 'b2', 'c2'], ['a2', 'b2', 'c3'], 
 ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'], 
 ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'], 
 ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'], 
 ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]

filterall(x, ['a1', 'b1'], ['a2', 'b2'])
# returns:
[['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'],
 ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'],
 ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'],
 ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'],
 ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'],
 ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'],
 ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]

You can still use list comprehension, and just nest your predicates: 您仍然可以使用列表推导,仅嵌套谓词:

list1 = [['a1', 'b1', 'c1'], ['a1', 'b1', 'c2'], ['a1', 'b1', 'c3'], 
         ['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'], 
         ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'], 
         ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'], 
         ['a2', 'b2', 'c1'], ['a2', 'b2', 'c2'], ['a2', 'b2', 'c3'], 
         ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'], 
         ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'], 
         ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'], 
         ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]
list2 = [['a1', 'b1'], ['a2', 'b2']]

print [sublist1 for sublist1 in list1 if not any([all([item2 in sublist1 for item2 in sublist2]) for sublist2 in list2])]

Prints for me: 为我打印:

[['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'],
 ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'], 
 ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'], 
 ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'], 
 ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'], 
 ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'], 
 ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]

If you want to run this on larger data (ie very long lists), you may want to turn your lists into sets to allow faster intersections. 如果要对较大的数据(即很长的列表)运行此列表,则可能需要将列表变成集合以允许更快的交集。

filter + lambda + all combinations filter + lambda + all组合

l1 = [['a1', 'b1', 'c1'], ['a1', 'b1', 'c2'], ['a1', 'b1', 'c3'], ['a1', 'b2', 'c1'], ['a1', 'b2', 'c2'], ['a1', 'b2', 'c3'], ['a1', 'b3', 'c1'], ['a1', 'b3', 'c2'], ['a1', 'b3', 'c3'], ['a2', 'b1', 'c1'], ['a2', 'b1', 'c2'], ['a2', 'b1', 'c3'], ['a2', 'b2', 'c1'], ['a2', 'b2', 'c2'], ['a2', 'b2', 'c3'], ['a2', 'b3', 'c1'], ['a2', 'b3', 'c2'], ['a2', 'b3', 'c3'], ['a3', 'b1', 'c1'], ['a3', 'b1', 'c2'], ['a3', 'b1', 'c3'], ['a3', 'b2', 'c1'], ['a3', 'b2', 'c2'], ['a3', 'b2', 'c3'], ['a3', 'b3', 'c1'], ['a3', 'b3', 'c2'], ['a3', 'b3', 'c3']]
l2 = [['a1', 'b1'], ['a2', 'b2']]
list(filter(lambda x: all(not all(j in x for j in i) for i in l2), l1))

Output: 输出:

[['a1', 'b2', 'c1'],
 ['a1', 'b2', 'c2'],
 ['a1', 'b2', 'c3'],
 ['a1', 'b3', 'c1'],
 ['a1', 'b3', 'c2'],
 ['a1', 'b3', 'c3'],
 ['a2', 'b1', 'c1'],
 ['a2', 'b1', 'c2'],
 ['a2', 'b1', 'c3'],
 ['a2', 'b3', 'c1'],
 ['a2', 'b3', 'c2'],
 ['a2', 'b3', 'c3'],
 ['a3', 'b1', 'c1'],
 ['a3', 'b1', 'c2'],
 ['a3', 'b1', 'c3'],
 ['a3', 'b2', 'c1'],
 ['a3', 'b2', 'c2'],
 ['a3', 'b2', 'c3'],
 ['a3', 'b3', 'c1'],
 ['a3', 'b3', 'c2'],
 ['a3', 'b3', 'c3']]

Well you can easily do that with for nested loops ...but my guess is that your teacher is trying to make you think how to optimize it. 好吧,您可以轻松地使用嵌套循环来做到这一点...但是我的猜测是您的老师正在尝试让您考虑如何对其进行优化。

I would sort each of the arrays. 我将对每个数组进行排序。 First each one containing strings and then the top level arrays containing arrays. 首先每个包含字符串,然后是包含数组的顶级数组。

With that the task becomes m.log(n) where m is the size of the 2nd array and n is the size of the 1st array. 这样,任务就变成了m.log(n),其中m是第二个数组的大小,n是第一个数组的大小。

Is this making sense to you ? 这对您有意义吗?

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM