Python: What is the fastest way to check for and edit an element in a structured array, if it exists?
I have some issues with very large data sets. I need a reliable and fast way to find/replace entries in my structured array. I am looking for a solution that does not loop over all entries. I know there are fast solutions in C, but I do not know how to approach this in Python. I also wonder whether there is a NumPy function for this very purpose!

I am using Python 2.7.13 and NumPy 1.12.1.
TASK: Set the positions of all orphans to the positions of their centrals, by looking up the haloid of each orphan (from data_orphans) in the list of centrals in data_centrals.
```python
import numpy as np

# data: a structured numpy.ndarray of shape (189258912,) with this dtype
dt = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
      ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]
```
EDIT: A subsample of data with 200 objects can be downloaded here! Its structure is given by dt: first column -> hostid, second -> z_pos, etc. It can be copied and pasted as-is into a Python shell or script.
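The linked subsample is not reproduced on this page, so for experimenting, here is a hypothetical five-row array with the same dt. The values are invented, not taken from the real data: three centrals (haloid == hostid) and two orphans (orphan == 2) whose positions are still 0.0. It also assumes, as the loop in the question implies, that each orphan's hostid is unique in data.

```python
import numpy as np

# Field layout copied from the question
dt = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
      ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]

# Hypothetical toy rows (NOT the real data set):
# three centrals (haloid == hostid) and two orphans (orphan == 2)
# whose haloid points at a central and whose positions are unset
data = np.array([
    (1, 1.0, 10.0, 100.0, 1, 0),
    (2, 2.0, 20.0, 200.0, 2, 0),
    (11, 0.0, 0.0, 0.0, 2, 2),
    (12, 0.0, 0.0, 0.0, 1, 2),
    (3, 3.0, 30.0, 300.0, 3, 0),
], dtype=dt)

print(data.dtype.names)
# haloids of the orphans, i.e. the centrals they should copy from
print(data['haloid'][data['orphan'] == 2])
```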
Below you can find the code for setting the positions.
QUESTION: Is there a smart way to search for the haloids and set the positions without looping over all entries of data_orphans?
```python
import numpy as np

# Boolean masks select (copies of) the matching rows
data_centrals = data[data['haloid'] == data['hostid']]  # (111958237,)
data_orphans = data[data['orphan'] == 2]                # (61870681,)

for a in range(len(data_orphans)):
    # check where in data_centrals the haloid of the orphan can be found
    # (assumes exactly one match; [0][0] takes the first index)
    position = np.where(data_centrals['haloid'] == data_orphans['haloid'][a])[0][0]
    # find the row of this orphan in data, matched by its hostid
    position_data = np.where(data['hostid'] == data_orphans['hostid'][a])[0][0]
    # set the positions (data['x_pos'] etc. are views, so data is modified)
    for col in ('x_pos', 'y_pos', 'z_pos'):
        data[col][position_data] = data_centrals[col][position]
```
If your data structure is a plain, unordered list or array, then the answer is no: finding a specific element takes linear time, O(n). If the list/array is sorted, you can do a binary search in O(log n) time.
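One way to apply the binary-search idea with NumPy is to sort the centrals' haloids once and then look up all orphan haloids in a single vectorized call to np.searchsorted, instead of one np.where scan per orphan. This is a sketch, not a tested drop-in for the 189M-row array: the helper name copy_central_positions and the toy rows are hypothetical, and it assumes (as in the question) that centrals have haloid == hostid, orphans have orphan == 2, and every orphan has a central. It also allocates several index arrays the size of the centrals, so memory may be a concern at full scale.

```python
import numpy as np

DT = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
      ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]


def copy_central_positions(data):
    """Hypothetical helper: give every orphan its central's x/y/z, in place.

    Assumes centrals are rows with haloid == hostid, orphans are rows
    with orphan == 2, and each orphan's haloid matches a central.
    """
    central_idx = np.flatnonzero(data['haloid'] == data['hostid'])
    orphan_idx = np.flatnonzero(data['orphan'] == 2)
    if orphan_idx.size == 0:
        return data  # nothing to update

    # Sort the centrals' haloids once: O(n log n)
    central_keys = data['haloid'][central_idx]
    order = np.argsort(central_keys)
    sorted_keys = central_keys[order]

    # Binary-search every orphan haloid at once: O(m log n)
    orphan_keys = data['haloid'][orphan_idx]
    pos = np.searchsorted(sorted_keys, orphan_keys)

    # searchsorted returns an insertion point, so check for an exact match
    in_range = pos < sorted_keys.size
    found = np.zeros(orphan_keys.size, dtype=bool)
    found[in_range] = sorted_keys[pos[in_range]] == orphan_keys[in_range]
    if not found.all():
        raise ValueError("some orphans have no matching central")

    # Row numbers in data of the matching centrals, then copy coordinates
    source = central_idx[order[pos]]
    for col in ('x_pos', 'y_pos', 'z_pos'):
        data[col][orphan_idx] = data[col][source]
    return data


# Hypothetical toy data: three centrals and two orphans with unset positions
data = np.array([
    (1, 1.0, 10.0, 100.0, 1, 0),
    (2, 2.0, 20.0, 200.0, 2, 0),
    (11, 0.0, 0.0, 0.0, 2, 2),
    (12, 0.0, 0.0, 0.0, 1, 2),
    (3, 3.0, 30.0, 300.0, 3, 0),
], dtype=DT)
copy_central_positions(data)
print(data['x_pos'])
```

The orphans are addressed by their row numbers (orphan_idx) directly, which replaces the per-orphan hostid search in the question's loop.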
You may also consider alternative data structures with better lookup times, such as a balanced BST or a Python dictionary, but whether such an approach is appropriate depends on the structure of your data.
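The dictionary option can be sketched as a hash map from each central's haloid to its row number, followed by one lookup per orphan. Each lookup is O(1) on average instead of a full scan, so the total work is roughly linear, but it is still a Python-level loop over the orphans, and a dict with around 10^8 entries can easily need many gigabytes of RAM. The helper name and toy rows below are hypothetical, with the same assumptions as the question (centrals have haloid == hostid, orphans have orphan == 2).

```python
import numpy as np

DT = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
      ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]


def copy_central_positions_dict(data):
    """Hypothetical helper: give every orphan its central's x/y/z, in place.

    Raises KeyError if an orphan's haloid has no central.
    """
    central_idx = np.flatnonzero(data['haloid'] == data['hostid'])
    # Hash map: central haloid -> row number in data (last row wins on duplicates)
    lookup = dict(zip(data['haloid'][central_idx].tolist(), central_idx.tolist()))

    orphan_idx = np.flatnonzero(data['orphan'] == 2)
    # One O(1)-average dict lookup per orphan, in a Python-level loop
    source = np.array([lookup[h] for h in data['haloid'][orphan_idx].tolist()],
                      dtype=np.intp)
    for col in ('x_pos', 'y_pos', 'z_pos'):
        data[col][orphan_idx] = data[col][source]
    return data


# Hypothetical toy data: three centrals and two orphans with unset positions
data = np.array([
    (1, 1.0, 10.0, 100.0, 1, 0),
    (2, 2.0, 20.0, 200.0, 2, 0),
    (11, 0.0, 0.0, 0.0, 2, 2),
    (12, 0.0, 0.0, 0.0, 1, 2),
    (3, 3.0, 30.0, 300.0, 3, 0),
], dtype=DT)
copy_central_positions_dict(data)
print(data['x_pos'])
```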