PYTHON: What is the fastest way of checking and editing an element in a structured array if it exists?

I have some issues with very large data sets. I need to find a solid and fast way to find/replace entries in my structured array. I am looking for a solution that does not loop over all entries. I know there are fast solutions in C, but I do not know how to approach this in Python. I am also wondering whether there is a numpy function for this very purpose!

I am using Python 2.7.13 and numpy 1.12.1!

TASK: Set the positions of all orphans to the positions of their centrals, by looking up the haloid of each orphan in data_orphans among the haloids of the centrals in data_centrals.

import numpy as np

# data is a structured array:
#     class:  ndarray
#     shape:  (189258912,)

dt = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
     ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]
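
As a small illustration of this layout, a structured array with the same dtype can be built and queried by field name. This is only a sketch: the three rows and the name `sample` are made up for the example and are not taken from the actual data set.

```python
import numpy as np

dt = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
      ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]

# Hypothetical rows, in the field order given by dt:
# (hostid, z_pos, x_pos, y_pos, haloid, orphan)
sample = np.array([(10, 0.7, 0.1, 0.4, 10, 0),   # central: haloid == hostid
                   (10, 0.0, 0.0, 0.0, 11, 0),   # neither central nor orphan
                   (5,  0.0, 0.0, 0.0, 10, 2)],  # orphan: orphan flag == 2
                  dtype=dt)

# Fields are accessed by name; comparisons give boolean masks.
is_central = sample['haloid'] == sample['hostid']
is_orphan = sample['orphan'] == 2

print(sample['haloid'][is_central])
print(sample['haloid'][is_orphan])
```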

EDITED: A subsample of the data with 200 objects can be downloaded here. Its structure is given by dt: the first column is hostid, the second is z_pos, etc. It can be copied and pasted as is into a Python shell or script.

Below you can find the code for setting the positions.

QUESTION: Is there a smart way of searching for the haloids and setting the positions without looping over all entries of data_orphans?

# centrals are the entries whose haloid equals their hostid
data_centrals = data[np.where(data['haloid'] == data['hostid'])]  # (111958237,)

# orphans are the entries flagged with orphan == 2
data_orphans = data[np.where(data['orphan'] == 2)]                # (61870681,)

a = 0
while a < len(data_orphans):

    # check where in data_centrals the haloid of the orphan can be found

    # find the position of data_orphans['haloid'][a] in data

    # set the positions

    a += 1

If your data structure is a plain, unordered list or array, then the answer is no: finding a specific element takes linear time, O(n). If the list/array is ordered, you can do a binary search in O(log n) time. You may also consider alternative data structures with better search times, such as a balanced BST or a Python dictionary, but whether such an approach is appropriate depends on the structure of your data.
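
In numpy, the binary-search idea can be applied to all orphans at once: sort the haloids of the centrals once, then look up every orphan haloid with np.searchsorted. The following is a minimal sketch, not a tested solution for the full data set. The function name `set_orphan_positions` is hypothetical, and the sketch assumes that the haloids of the centrals are unique and that orphans without a matching central should be left unchanged.

```python
import numpy as np

dt = [('hostid', '<u8'), ('z_pos', '<f8'), ('x_pos', '<f8'),
      ('y_pos', '<f8'), ('haloid', '<u8'), ('orphan', 'i1')]


def set_orphan_positions(data):
    """Copy each matching central's position onto its orphan, in place.

    Assumptions (not confirmed by the question): central haloids are
    unique, and orphans without a matching central stay untouched.
    """
    # Integer indices into data, so that we can write back to data itself.
    central_idx = np.flatnonzero(data['haloid'] == data['hostid'])
    orphan_idx = np.flatnonzero(data['orphan'] == 2)
    if central_idx.size == 0 or orphan_idx.size == 0:
        return data

    # Sort the central haloids once.
    central_ids = data['haloid'][central_idx]
    order = np.argsort(central_ids)
    sorted_ids = central_ids[order]

    # One vectorized binary search for all orphans.
    orphan_ids = data['haloid'][orphan_idx]
    pos = np.searchsorted(sorted_ids, orphan_ids)
    pos[pos == len(sorted_ids)] = 0          # keep indices in range
    found = sorted_ids[pos] == orphan_ids    # True where a central exists

    src = central_idx[order[pos[found]]]     # rows of the matching centrals
    dst = orphan_idx[found]                  # rows of the matched orphans
    for col in ('x_pos', 'y_pos', 'z_pos'):
        data[col][dst] = data[col][src]
    return data


# Tiny made-up example: rows 0 and 1 are centrals, rows 2 to 4 are orphans.
demo = np.zeros(5, dtype=dt)
demo['hostid'] = [10, 20, 5, 6, 7]
demo['haloid'] = [10, 20, 10, 20, 77]
demo['orphan'] = [0, 0, 2, 2, 2]
demo['x_pos'] = [1.0, 2.0, 0.0, 0.0, 9.0]
set_orphan_positions(demo)
print(demo['x_pos'])
```

Sorting costs O(m log m) for m centrals, and the lookups cost O(k log m) for k orphans, with the loops running inside numpy rather than in Python. The sketch writes through integer indices into data because data[mask] returns a copy, so assigning to data_orphans would not change data.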

