How to find a list of elements in an array using Python

I have two sets of data whose values both refer to parts of a larger dataset (points in an unstructured mesh).

The two smaller datasets contain vectors, each carrying the global id of the point it references in the larger dataset. Something like:

Large set of data:

0 0 0
0 0 1
0 1 0
1 0 0
1 1 0
1 0 1
0 1 1
1 1 1 

Smaller sets of data:

A

0 1
3 5
4 5 
6 7 
7 2 

B

0 10
4 12
7 60

The first column of each smaller dataset is a reference to the (zero-based) line number in the larger dataset. The second column is just example data.

It is also worth mentioning that the first column of B is always a subset of the first column of A.
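
Because every id in B is guaranteed to appear in A, each one can be located by binary search over a sorted view of A's ids without having to handle misses. A minimal NumPy sketch of that idea, assuming integer ids that are unique within A (true for the example, but not stated in the question); `locate_rows` is a hypothetical helper name:

```python
import numpy as np

def locate_rows(a_ids, b_ids):
    """Return the row index in A for each id in b_ids (hypothetical helper).

    Assumes the ids in a_ids are unique and every id in b_ids occurs in a_ids.
    """
    # Permutation that sorts A's ids; A itself does not need to be sorted.
    order = np.argsort(a_ids)
    # Binary-search each B id within the sorted view of a_ids.
    pos = np.searchsorted(a_ids, b_ids, sorter=order)
    # Translate positions in the sorted view back to original row indices of A.
    return order[pos]

# First columns of A and B from the example.
a_ids = np.array([0, 3, 4, 6, 7])
b_ids = np.array([0, 4, 7])
print(locate_rows(a_ids, b_ids))  # [0 2 4]
```

The result follows B's row order, so `a_ids[ind]` equals `b_ids` by construction. If an id were missing from A, `searchsorted` would return a neighbouring or out-of-range position instead of failing cleanly, which is why the subset guarantee matters for this approach.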

What I need are the row indices of A whose point ids match those in B.

In this case, that would be:

ind = [0,2,4]

i.e. A[ind,0] = B[:,0]
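
As a point of reference, `ind` can be computed with a plain nested loop: for each row of B, scan A until the ids match. This is a hypothetical sketch (the original loop is not shown; `match_rows_loop` is an illustrative name), with the example data held as lists of integer tuples. Its cost grows as len(A) × len(B), which makes it impractical at large sizes.

```python
# Example data: column 0 is the point id, column 1 is sample data.
A = [(0, 1), (3, 5), (4, 5), (6, 7), (7, 2)]
B = [(0, 10), (4, 12), (7, 60)]

def match_rows_loop(A, B):
    """Naive search: for each row of B, scan A for the row with the same id."""
    ind = []
    for b_id, _ in B:
        for i, (a_id, _) in enumerate(A):
            if a_id == b_id:
                ind.append(i)
                break  # stop at the first match for this id
    return ind

ind = match_rows_loop(A, B)
print(ind)  # [0, 2, 4]
# The defining property: A[ind, 0] equals B[:, 0].
assert [A[i][0] for i in ind] == [b[0] for b in B]
```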

I have previously done this with a loop, but the datasets have now grown past 10 million in size and the loop is far too slow. Can anyone suggest a faster method?
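
One way to avoid a Python-level loop, assuming the data can be loaded as NumPy arrays, is a vectorized membership test with `np.isin` followed by `np.flatnonzero`. This is a different technique from the set-based answer, sketched here on the example data:

```python
import numpy as np

# Example data as integer arrays (column 0 = point id).
A = np.array([[0, 1], [3, 5], [4, 5], [6, 7], [7, 2]])
B = np.array([[0, 10], [4, 12], [7, 60]])

# Boolean mask over A's rows: True where A's id also appears in B's id column.
mask = np.isin(A[:, 0], B[:, 0])

# Row indices of A where the mask is True, in ascending order.
ind = np.flatnonzero(mask)
print(ind)  # [0 2 4]
```

Like any mask-based approach, this returns indices in A's row order. That satisfies `A[ind, 0] == B[:, 0]` only if B lists its ids in the same relative order as A and A's ids are unique, as in the example.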

Putting the first-column values of B into a set should speed things up: a membership test against a set takes constant time on average, so A only has to be scanned once. Assuming that A and B are lists of tuples (or lists), try this:

>>> A
[('0', '1'), ('3', '5'), ('4', '5'), ('6', '7'), ('7', '2')]
>>> B
[('0', '10'), ('4', '12'), ('7', '60')]
>>> bkeys = {i[0] for i in B}
>>> [i for i,v in enumerate(A) if v[0] in bkeys]
[0, 2, 4]
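
The set filter returns indices in A's row order, which lines up with `B[:,0]` only when both lists share that order. If B's order can differ, a dictionary from id to row index (a variant, not part of the answer itself) gives indices aligned with B using one pass over each list; the subset guarantee means every lookup succeeds. A sketch assuming ids are unique within A:

```python
A = [('0', '1'), ('3', '5'), ('4', '5'), ('6', '7'), ('7', '2')]
B = [('0', '10'), ('4', '12'), ('7', '60')]

# Map each id in A's first column to its row index (last occurrence wins on duplicates).
row_of = {row[0]: i for i, row in enumerate(A)}

# Look up each B id; the result follows B's row order.
ind = [row_of[row[0]] for row in B]
print(ind)  # [0, 2, 4]
```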

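Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.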

© 2020-2024 STACKOOM.COM