
Python: Fastest way to iterate this through a large file

Right, I'm iterating through a large binary file.

I need to minimise the time of this loop:

import math
import numpy as np

def NB2(self, ID_LEN):
    r1=np.fromfile(ReadFile.fid,dTypes.NB_HDR,1)
    num_receivers=r1[0][0]
    num_channels=r1[0][1]
    num_samples=r1[0][5]

    blockReturn = np.zeros((num_samples,num_receivers,num_channels))

    for rec in range(0,num_receivers):
        for chl in range(0,num_channels):
            for smpl in range(0,num_samples):
                r2_iq=np.fromfile(ReadFile.fid,np.int16,2)
                blockReturn[smpl,rec,chl] = np.sqrt(math.fabs(r2_iq[0])*math.fabs(r2_iq[0]) + math.fabs(r2_iq[1])*math.fabs(r2_iq[1]))

    return blockReturn

So, what's going on is as follows: r1 is the header of the file, dTypes.NB_HDR is a type I made:

NB_HDR= np.dtype([('f3',np.uint32),('f4',np.uint32),('f5',np.uint32),('f6',np.int32),('f7',np.int32),('f8',np.uint32)])

That gets all the information about the forthcoming data block, and nicely puts us in the right position within the file (the start of the data block!).

In this data block there are: 4096 samples per channel, 4 channels per receiver, 9 receivers.

So num_receivers, num_channels, num_samples will always be the same (at the moment anyway), but as you can see this is a fairly large amount of data. Each 'sample' is a pair of int16 values that I want to find the magnitude of (hence Pythagoras).

This NB2 code is executed for each 'Block' in the file; for a 12GB file (which is how big they are) there are about 20,900 Blocks, and I've got to iterate through 1000 of these files (so, 12TB overall). Any speed advantage, even if it's milliseconds, would be massively appreciated.
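As a quick sanity check on those numbers (everything below is computed from the NB_HDR dtype and the dimensions given above, not from any format spec):

import numpy as np

NB_HDR = np.dtype([('f3',np.uint32),('f4',np.uint32),('f5',np.uint32),('f6',np.int32),('f7',np.int32),('f8',np.uint32)])

header_bytes = NB_HDR.itemsize            # 6 fields x 4 bytes = 24
samples = 9 * 4 * 4096                    # receivers x channels x samples = 147,456
data_bytes = samples * 2 * 2              # each sample is 2 x int16 = 4 bytes
block_bytes = header_bytes + data_bytes   # 589,848 bytes, roughly 576 KB per block
print(block_bytes * 20900 / 1e9)          # ~12.3, consistent with a 12GB file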

EDIT: Actually it might be of help to know how I'm moving around inside the file. I have a function as follows:

def navigateTo(self, blockNum, indexNum):
    ReadFile.fid.seek(ReadFile.fileIndex[blockNum][indexNum],0)
    ReadFile.currentBlock = blockNum
    ReadFile.index = indexNum

Before I run all this code I scan the file and make a list of index locations at ReadFile.fileIndex that I browse using this function, and then 'seek' to the absolute location - is this efficient?
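For context, the pre-scan that builds such an index might look roughly like this. This is a sketch only: build_index is a made-up name, it assumes blocks are laid out back to back with the fixed header described above, and it records one offset per block, whereas ReadFile.fileIndex is evidently two-level.

import numpy as np

def build_index(fid, nb_hdr):
    # Walk the file once, recording the absolute offset of every block header
    index = []
    fid.seek(0, 2)                    # jump to the end to learn the file size
    file_size = fid.tell()
    fid.seek(0, 0)
    while fid.tell() < file_size:
        index.append(fid.tell())
        hdr = np.fromfile(fid, nb_hdr, 1)
        if hdr.size == 0:
            break
        n_rec, n_chl, n_smp = int(hdr[0][0]), int(hdr[0][1]), int(hdr[0][5])
        fid.seek(n_rec * n_chl * n_smp * 4, 1)   # skip the int16 I/Q pairs

    return index

Seeking to an absolute offset is cheap; after a pre-scan like this, the cost is dominated by the actual reads rather than the navigation.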

Cheers

import numpy as np
def NB2(self, ID_LEN):
    r1=np.fromfile(ReadFile.fid,dTypes.NB_HDR,1)
    num_receivers=r1[0][0]
    num_channels=r1[0][1]
    num_samples=r1[0][5]

    # first, match your array bounds to the way you are walking the file
    blockReturn = np.zeros((num_receivers,num_channels,num_samples))

    for rec in range(0,num_receivers):
        for chl in range(0,num_channels):
            # second, read in all the samples at once if you have enough memory
            r2_iq=np.fromfile(ReadFile.fid,np.int16,2*num_samples)
            r2_iq.shape = (-1,2) # tell numpy that it is an array of two values

            # promote to float before squaring: int16 samples up to 32767
            # would overflow if squared in int16
            r2_iq = r2_iq.astype(np.float64)
            # create dot product vector by squaring data elementwise, and then
            # adding those elements together.  Result is of length num_samples
            r2_iq = r2_iq * r2_iq
            r2_iq = r2_iq[:,0] + r2_iq[:,1]
            # get the distance by performing the square root "into" blockReturn
            np.sqrt(r2_iq, out=blockReturn[rec,chl,:])

    return blockReturn

This should help your performance. Two main ideas are at work in the numpy version. First, your result array's dimensions should match the way your loop dimensions are crafted, for memory locality.
Second, numpy is FAST. I've beaten hand-coded C with numpy, simply because it uses LAPACK and vector acceleration. However, to get that power you have to let it manipulate more data at a time. That is why your sample loop has been collapsed to read in the full set of samples for the receiver and channel in one large read. Then use the supreme vector powers of numpy to calculate your magnitude by dot product.

There is a little more optimization to be had in the magnitude calculation, but numpy recycles buffers for you, making it less important than you might think. I hope this helps!
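For what it's worth, that "little more optimization" could look like the sketch below, which uses explicit out= buffers so the square, sum, and root allocate no extra temporaries (magnitudes_inplace is my name for it, not part of the answer above):

import numpy as np

def magnitudes_inplace(iq_int16, out):
    # iq_int16: (num_samples, 2) int16 pairs; out: (num_samples,) float64 view
    iq = iq_int16.astype(np.float64)      # promote first: int16 squares overflow
    np.multiply(iq, iq, out=iq)           # square in place
    np.add(iq[:, 0], iq[:, 1], out=out)   # sum of squares written straight into out
    np.sqrt(out, out=out)                 # final sqrt, no extra temporary
    return out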

Because you know the length of a block after you read the header, read the whole block at once. Then reshape the array (very fast, only affects metadata) and use the np.hypot ufunc:

blockData = np.fromfile(ReadFile.fid, np.int16, num_receivers*num_channels*num_samples*2)
blockData = blockData.reshape((num_receivers, num_channels, num_samples, 2))
return np.hypot(blockData[:,:,:,0], blockData[:,:,:,1])

On my machine it runs in 11ms per block.
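If you want to reproduce a per-block timing like that, a rough harness might look like this (synthetic data in place of a real file; numbers will vary by machine):

import timeit
import numpy as np

num_receivers, num_channels, num_samples = 9, 4, 4096
blockData = np.random.randint(-2**15, 2**15,
                              size=(num_receivers, num_channels, num_samples, 2),
                              dtype=np.int16)

t = timeit.timeit(lambda: np.hypot(blockData[..., 0], blockData[..., 1]), number=100)
print('%.2f ms per block' % (t / 100 * 1e3))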

I'd try to use as few loops and as many constants as possible. Everything that can be done in a linear fashion should be done that way. If values don't change, use constants to reduce lookups and such, because that eats up CPU cycles.

This is from a theoretical point of view ;-)

If possible, use highly optimised libraries. I don't exactly know what you are trying to achieve, but I'd rather use an existing FFT lib than write one myself :>

One more thing: http://en.wikipedia.org/wiki/Big_O_notation (can be an eye-opener)

Most importantly, you shouldn't do file access at the lowest level of a triple nested loop, whether you do this in C or Python. You've got to read in large chunks of data at a time.

So to speed this up, read in large chunks of data at a time, and process that data using numpy indexing (that is, vectorize your code). This is particularly easy in your case since all your sample data is int16. Just read in big chunks of data, and reshape the data into an array that reflects the (receiver, channel, sample) structure, then use the appropriate indexing to multiply and add things for Pythagoras, and the 'sum' command to add up the terms in the resulting array.
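Put together, that advice might look something like the sketch below (read_block_magnitudes is a made-up name; fid is assumed to be positioned just past the block header):

import numpy as np

def read_block_magnitudes(fid, num_receivers, num_channels, num_samples):
    # one big read per block instead of one tiny read per sample
    raw = np.fromfile(fid, np.int16, num_receivers * num_channels * num_samples * 2)
    # reshape to reflect the (receiver, channel, sample, I/Q) structure,
    # promoting to float so the squares cannot overflow
    iq = raw.reshape(num_receivers, num_channels, num_samples, 2).astype(np.float64)
    # Pythagoras: square elementwise, sum over the I/Q axis, take the root
    return np.sqrt((iq * iq).sum(axis=-1))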

This is an observation rather than a solution, but porting this function to C++ and loading it with the Python API would get you a lot of the speed gain before you even start optimizing the loop.
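A minimal sketch of that idea, using ctypes rather than the full C API (the library name libmag.so and the function block_magnitudes are hypothetical; the C++ side would loop over the raw int16 buffer and fill the float64 output):

import ctypes
import numpy as np

lib = ctypes.CDLL('./libmag.so')          # hypothetical compiled C++ library
lib.block_magnitudes.argtypes = [
    ctypes.POINTER(ctypes.c_int16),       # interleaved I/Q input
    ctypes.POINTER(ctypes.c_double),      # magnitude output
    ctypes.c_size_t,                      # number of samples
]
lib.block_magnitudes.restype = None

def magnitudes(iq_int16):
    # iq_int16: 1-D interleaved I/Q array of even length
    out = np.empty(iq_int16.size // 2, dtype=np.float64)
    lib.block_magnitudes(
        iq_int16.ctypes.data_as(ctypes.POINTER(ctypes.c_int16)),
        out.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
        out.size,
    )
    return out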
