
Quickly remove tuples that contain tuples from another list

I would like to remove all the tuples in List A that contain a tuple from List B, i.e. every tuple in A that includes all the elements of some tuple in B.

This would normally be trivial, but List A has 10 million records and List B has 200K. My current script (below) is very slow: it scans List A once per tuple in List B, and each scan takes about 10 seconds.

Example:

# Input:
listA = [(1,2,3,4,5),(1,2,4,5,6),(1,2,3,7,55),(8,21,22,24,37),...]  # 10 million records
listB = [(1,2,4),(1,4,6),(21,24,37),...]  # 200K records

# Desired Output (filtered listA):
listA = [(1,2,3,7,55),...]

Current script that is slow:

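listA = [(1,2,3,4,5),(1,2,4,5,6),(1,2,3,7,55),(8,21,22,24,37)]
listB = [(1,2,4),(1,4,6),(21,24,37)]
listATemp = []

# One full pass over listA for every tuple b in listB
for b in listB:
    for a in listA:
        # Keep a only if it does not contain every element of b
        if not set(b).issubset(a):
            listATemp.append(a)
    listA = listATemp  # the next pass only scans the survivors
    listATemp = []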
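As a rough estimate of why the script above is slow, take the ~10 seconds per scan from the question and ignore the fact that listA shrinks as matches are removed. The script makes one pass over listA per tuple in listB, and each pass does one subset test per record (building set(b) anew for every record):

```latex
\begin{aligned}
\text{passes} \times \text{time per pass} &\approx 2\times10^{5} \times 10\,\text{s} = 2\times10^{6}\,\text{s} \approx 23\ \text{days} \\
\text{subset tests} &\approx |A|\cdot|B| = 10^{7} \times 2\times10^{5} = 2\times10^{12}
\end{aligned}
```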

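Answer: use itertools.combinations and frozenset. Each tuple in listB becomes a frozenset in a hash set, and a tuple in listA is dropped if any of its n-element combinations is in that set: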

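from itertools import combinations
# frozensets make the lookup ignore element order
setB = set(map(frozenset, listB))
n = len(listB[0])  # assumes every tuple in listB has the same length
# Keep a only if none of its n-element combinations is a tuple from listB
listA = [a for a in listA if not any(frozenset(c) in setB for c in combinations(a, n))]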
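A minimal, self-contained sketch of the same frozenset technique, written as a hypothetical helper remove_containing (the name is illustrative, not from the answer). The one-liner takes n from listB[0], so it assumes every tuple in listB has the same length; this version groups listB by length so mixed lengths also work. With the example's shapes (5-element tuples in listA, 3-element tuples in listB), each record needs only C(5, 3) = 10 hash lookups, roughly 10^8 for 10 million records, instead of one subset test per tuple in listB.

```python
from collections import defaultdict
from itertools import combinations


def remove_containing(list_a, list_b):
    """Return the tuples of list_a that contain no tuple of list_b as a subset.

    Hypothetical helper: groups list_b by (deduplicated) length, so list_b
    may mix pairs, triples, etc.
    """
    # Map each length n to the set of n-element frozensets from list_b.
    by_len = defaultdict(set)
    for b in list_b:
        fb = frozenset(b)
        by_len[len(fb)].add(fb)
    lengths = sorted(by_len)

    result = []
    for a in list_a:
        items = frozenset(a)  # repeated elements in a don't affect subset tests
        hit = any(
            frozenset(c) in by_len[n]
            for n in lengths
            for c in combinations(items, n)  # yields nothing if n > len(items)
        )
        if not hit:
            result.append(a)
    return result


listA = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 6), (1, 2, 3, 7, 55), (8, 21, 22, 24, 37)]
listB = [(1, 2, 4), (1, 4, 6), (21, 24, 37)]
print(remove_containing(listA, listB))  # [(1, 2, 3, 7, 55)]
```

The cost per record is the sum of C(k, n) over the distinct lengths n in listB, where k is the record's length, so this pays off when records are short; for long records the number of combinations grows quickly.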

Or, assuming every tuple is sorted (if not, you can sort them first), the tuples of listB can go into the set directly, because combinations of a sorted tuple come out sorted:

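from itertools import combinations
setB = set(listB)  # tuples assumed sorted, so they compare directly
n = len(listB[0])
# combinations() of a sorted tuple yields sorted tuples, matching entries of setB
listA = [a for a in listA if setB.isdisjoint(combinations(a, n))]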
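A sketch of the sorted-tuple variant that also does the "sort them first" step, as a hypothetical helper remove_containing_sorted. It sorts a copy of each record for the lookup but returns the original tuples. Like the answer, it assumes all tuples in listB have the same length; it also assumes the elements are mutually comparable (so they can be sorted) and that no tuple repeats an element, since with repeats this matching no longer behaves like a set-subset test.

```python
from itertools import combinations


def remove_containing_sorted(list_a, list_b):
    """Return the tuples of list_a that contain no tuple of list_b.

    Hypothetical helper: sorts every tuple once so that sorted combinations
    of a record can be compared directly with the sorted tuples of list_b.
    """
    set_b = {tuple(sorted(b)) for b in list_b}
    n = len(list_b[0])  # requires a non-empty list_b of equal-length tuples
    # Look up sorted combinations, but keep the original tuple a in the output.
    return [a for a in list_a if set_b.isdisjoint(combinations(sorted(a), n))]


print(remove_containing_sorted([(5, 4, 3, 2, 1), (55, 7, 3, 2, 1)], [(4, 2, 1)]))
# [(55, 7, 3, 2, 1)]
```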
