Removing non-matching items from datasets

I have two datasets, each a list of nested lists, such that each item looks like list1[i] = [a, x, y, b] and list2[j] = [c, x, y, d]. The two lists do not necessarily have the same length. I'd like to go through the lists, preserve their order, and eliminate any sub-lists that do not contain matching x values. In the end, I want two lists of identical length where, for each index, the x value is the same in the corresponding sub-lists.
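To make the goal concrete, here is a small made-up example of the input shape and the result being asked for. The values are hypothetical and not taken from the original data; the only thing assumed from the question is that the x value sits at index 1 of every sub-list.

```python
# Hypothetical sample data: x is at index 1 of every sub-list.
list1 = [['a0', 1, 10, 'b0'], ['a1', 2, 20, 'b1'],
         ['a2', 3, 30, 'b2'], ['a3', 4, 40, 'b3']]
list2 = [['c0', 2, 21, 'd0'], ['c1', 4, 41, 'd1']]

# Desired result: only rows whose x appears in both lists, original order kept.
expected1 = [['a1', 2, 20, 'b1'], ['a3', 4, 40, 'b3']]
expected2 = [['c0', 2, 21, 'd0'], ['c1', 4, 41, 'd1']]

# The two results have the same length and agree on x at every index.
assert len(expected1) == len(expected2)
assert [row[1] for row in expected1] == [row[1] for row in expected2]
```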

Right now I have somewhat messy code that assumes the set of x values in list2 is a subset of those in list1 (true at the moment) and then removes items where the x values don't match.

    # Assumes every x value in list2 also appears in list1, in the same order.
    len_diff = len(list1) - len(list2)
    if len_diff > 0:
        removed = []
        for counter, row in enumerate(list2):
            # Drop list1 rows until the x values line up at this index.
            while list1[counter][1] != row[1]:
                removed.append(list1.pop(counter))
        new_len_diff = len(list1) - len(list2)
        if new_len_diff < 0:
            raise IndexError('Data sets do not completely overlap')
        else:
            # Trim the tail of list1 that extends past the end of list2.
            for i in range(new_len_diff):
                removed.append(list1.pop())

So basically I'm removing any items from list1 whose x values don't match until they start matching again, and then removing the end of list1 beyond the x values in list2 (raising an exception if I've cut too much out of list1).

Is there a better way to do this?

I don't necessarily need to relax the assumption that all x values in list2 are in list1 at the moment, but doing so would make this code more useful to me in the future for other data manipulations. The biggest hole in my code now is that if there is a gap in my list1 data, I'll remove my entire list.
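The gap problem can be reproduced with a few rows. The sketch below wraps the matching loop from the question in a hypothetical helper, align_in_place (the name and the sample values are made up for illustration, and the length guard is left out). When list2 contains an x value that list1 lacks, the while loop pops every remaining list1 row and then indexes past the end.

```python
def align_in_place(list1, list2):
    """Hypothetical wrapper around the matching loop from the question."""
    removed = []
    for counter, row in enumerate(list2):
        # Pops list1 rows until the x values agree at this index.
        while list1[counter][1] != row[1]:
            removed.append(list1.pop(counter))
    return removed


# x = 2 is present in list2 but missing from list1 (the "gap").
list1 = [['a', 1, 0, 'b'], ['a', 3, 0, 'b'], ['a', 4, 0, 'b'], ['a', 5, 0, 'b']]
list2 = [['c', 1, 0, 'd'], ['c', 2, 0, 'd'], ['c', 3, 0, 'd']]

try:
    align_in_place(list1, list2)
    hit_gap = False
except IndexError:
    # Everything after the last matching row has been popped by now.
    hit_gap = True
```

After this runs, list1 holds only the row with x = 1: every row from the gap onward was discarded before the IndexError was raised.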

You should try this:

    list1 = list2 = [x for x in list1 if x[1] in zip(*list2)[1]]

EDIT

Based on the comments, the OP adapted this answer to do what was wanted:

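    # Indexing zip(...) directly works in Python 2 only;
    # in Python 3, wrap it first: list(zip(*list2))[1].
    list1 = [x for x in list1 if x[1] in zip(*list2)[1]]
    list2 = [x for x in list2 if x[1] in zip(*list1)[1]]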
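For Python 3, the same idea can be written with a set of shared x values, which also avoids rebuilding the zip for every row. This is only a sketch: keep_common_x and its key parameter are hypothetical names, the x values are assumed to be hashable, and each x is assumed to occur at most once per list and in the same relative order in both. If x values repeat or are ordered differently, the filtered lists may not line up index by index.

```python
def keep_common_x(list1, list2, key=1):
    """Return filtered copies of both lists, keeping rows whose x is in both.

    `key` is the index of the x value inside each sub-list (1 in the question).
    """
    common = {row[key] for row in list1} & {row[key] for row in list2}
    filtered1 = [row for row in list1 if row[key] in common]
    filtered2 = [row for row in list2 if row[key] in common]
    return filtered1, filtered2


# Hypothetical sample data; neither list's x values are a subset of the other's.
list1 = [['a0', 1, 10, 'b0'], ['a1', 2, 20, 'b1'], ['a2', 4, 40, 'b2']]
list2 = [['c0', 2, 21, 'd0'], ['c1', 3, 31, 'd1'], ['c2', 4, 41, 'd2']]

new1, new2 = keep_common_x(list1, list2)
```

Because the function returns new lists instead of popping from the inputs, the original data stays available for checking which rows were dropped.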
