Removing non-matching items from datasets
I have two datasets, each a list of nested lists, such that each item looks like list1[i] = [a, x, y, b] and list2[j] = [c, x, y, d], and where the lengths of the two lists do not necessarily match. I'd like to go through the lists, preserve their order, and eliminate any of the sub-lists that do not contain matching x values. In the end, I want two lists of identical length where, for each index, the x value is the same in the corresponding sub-lists.
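To make the goal concrete, here is a small illustration with made-up values. The sample rows, the `wanted1`/`wanted2` names, and the `aligned` helper are all hypothetical and only show what "matching x values at every index" means.

```python
# Hypothetical sample data; only index 1 (the x value) matters for matching.
list1 = [["a0", 1, 10, "b0"], ["a1", 2, 20, "b1"],
         ["a2", 3, 30, "b2"], ["a3", 4, 40, "b3"]]
list2 = [["c0", 2, 20, "d0"], ["c1", 4, 40, "d1"]]

# The result I am after: rows with x = 1 and x = 3 dropped from list1,
# list2 unchanged because all of its x values also occur in list1.
wanted1 = [["a1", 2, 20, "b1"], ["a3", 4, 40, "b3"]]
wanted2 = [["c0", 2, 20, "d0"], ["c1", 4, 40, "d1"]]


def aligned(rows1, rows2):
    """Return True when both lists have equal length and equal x values per index."""
    return len(rows1) == len(rows2) and all(
        r1[1] == r2[1] for r1, r2 in zip(rows1, rows2)
    )
```

With this data, `aligned(list1, list2)` is false for the raw input and `aligned(wanted1, wanted2)` is true for the desired output.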
Right now I have some messy code that assumes that the set of x values in list2 is a subset of those in list1 (true at the moment) and then removes items where the x values don't match.
len_diff = len(list1) - len(list2)
if len_diff > 0:
    removed = []
    for (counter, row) in enumerate(list2):
        # Drop list1 rows until the x value at this index matches list2's.
        while list1[counter][1] != list2[counter][1]:
            removed.append(list1.pop(counter))
    new_len_diff = len(list1) - len(list2)
    if new_len_diff < 0:
        raise IndexError('Data sets do not completely overlap')
    else:
        # Trim the tail of list1 that extends past the end of list2.
        for i in range(new_len_diff):
            removed.append(list1.pop())
So basically I'm removing any items whose x values don't match until they start matching again, and then removing the end of list1 beyond the x values in list2 (raising an exception if I've cut too much out of list1).
Is there a better way to do this?
I don't necessarily need to relax the assumption that all x values in list2 are in list1 at the moment, but doing so would make this code more useful to me in the future for other data manipulations. The biggest hole in my code now is that if there is a gap in my list1 data, I'll remove my entire list.
You should try this:
list1 = list2 = [x for x in list1 if x[1] in list(zip(*list2))[1]]
EDIT
Based on the comments below, the OP adapted this answer to do what was wanted by doing:
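# list(...) is needed on Python 3, where zip returns an iterator; it is harmless on Python 2.
list1 = [x for x in list1 if x[1] in list(zip(*list2))[1]]
list2 = [x for x in list2 if x[1] in list(zip(*list1))[1]]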
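The two comprehensions above rebuild the tuple of x values for every row they test. A possible variant, offered only as a sketch, collects the shared x values into a set once and filters both lists against it. The function name `keep_common_x` and its `key_index` parameter are hypothetical. The sketch assumes that x values are hashable, that each x value occurs at most once per list, and that both lists are already ordered the same way by x; without those assumptions the results are not guaranteed to line up index by index.

```python
def keep_common_x(rows1, rows2, key_index=1):
    """Keep only rows whose x value occurs in both lists, preserving order.

    Assumes x values are hashable, unique within each list, and that both
    lists are ordered consistently by x.
    """
    # x values present in both datasets
    common = ({row[key_index] for row in rows1}
              & {row[key_index] for row in rows2})
    kept1 = [row for row in rows1 if row[key_index] in common]
    kept2 = [row for row in rows2 if row[key_index] in common]
    return kept1, kept2


# Made-up data: list1 has a gap at x = 3, list2 lacks x = 1.
list1 = [["a", 1, 10, "b"], ["a", 2, 20, "b"], ["a", 4, 40, "b"]]
list2 = [["c", 2, 20, "d"], ["c", 3, 30, "d"], ["c", 4, 40, "d"]]

new1, new2 = keep_common_x(list1, list2)
print([row[1] for row in new1])  # [2, 4]
print([row[1] for row in new2])  # [2, 4]
```

Because the filter is symmetric, this does not rely on the x values of list2 being a subset of those in list1, and a gap in list1 only drops the corresponding row from list2 instead of emptying a list.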