Python algorithm speed up / Performance tips

I'm working with big file manipulation (over 2 GB) and I have a lot of processing functions to deal with the data. My problem is that it is taking a lot (A LOT) of time to finish the processing. Of all the functions, the one that seems to take the longest is this one:

def BinLsb(data):
    Len = len(data)
    databin = [0] * (int(Len))
    num_of_bits = 8
    ### convert the octets to binary, LSB first
    for i in range(Len):
        newdatabin = bin(int(data[i], 16))[2:].zfill(num_of_bits)[::-1]
        databin[i] = newdatabin
    ### group into 14 bits and reverse to LSB order again
    databin = ''.join(databin)
    composite_list = [databin[x:x + 14] for x in range(0, len(databin), 14)]
    LenComp = len(composite_list)
    for i in range(LenComp):
        composite_list[i] = (int(str(composite_list[i])[::-1], 2))
    return composite_list

I'd really appreciate some performance tips or another approach to this algorithm in order to save me some time. Thanks in advance!

You can hunt performance issues by profiling the software, but you'll probably be well served by using logic which takes advantage of a faster language wrapped by Python. This could look like using a scientific library like numpy, using some FFI (foreign function interface), or creating and calling a custom program.
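For the profiling part, here is a minimal sketch using the standard library's cProfile and pstats modules. The helper name profile_call and the stand-in workload to_bits are made up for illustration; you would pass your own function and data instead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()


def to_bits(data):
    # Hypothetical stand-in resembling the first loop of the question's function.
    return [bin(int(octet, 16))[2:].zfill(8)[::-1] for octet in data]


result, report = profile_call(to_bits, ["ff", "01", "80"] * 1000)
print(report)
```

The report lists call counts and time per function, which should show which step dominates before you decide what to rewrite.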

More specifically, Python is natively very slow in computing terms, as each operation carries a lot of baggage with it (such as the infamous GIL). Passing this work off to another language lets you pay this overhead cost less often, rather than at every possible point in every loop!

Scientific libraries can do this for you by at least:

  • behaving like Python logic (which is friendly for you!) while doing many known steps per action (rather than one at a time)
  • possibly vectorizing operations (making better use of processing time by performing many actions in the same processor step)
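As a rough sketch of what that could look like with NumPy for the function in the question: the name bin_lsb_numpy is hypothetical, the code assumes every item in data is a two-digit hex string without a 0x prefix, and it needs NumPy 1.17 or newer for the bitorder argument of unpackbits.

```python
import numpy as np


def bin_lsb_numpy(data, group_bits=14):
    """Regroup hex octets into group_bits-wide values, least significant bit first."""
    # Assumption: each item is exactly two hex digits, e.g. "0a" or "ff".
    octets = np.frombuffer(bytes.fromhex("".join(data)), dtype=np.uint8)
    # One 0/1 entry per bit, with the LSB of each octet first.
    bits = np.unpackbits(octets, bitorder="little")
    # Zero-pad so the bit stream splits evenly; padding does not change the last value.
    pad = (-bits.size) % group_bits
    bits = np.concatenate([bits, np.zeros(pad, dtype=np.uint8)])
    # Bit k of each group contributes 2**k.
    weights = (1 << np.arange(group_bits)).astype(np.uint32)
    return bits.reshape(-1, group_bits).astype(np.uint32) @ weights


print(bin_lsb_numpy(["ff", "01", "80"]).tolist())  # [511, 512]
```

It returns a NumPy array rather than a list. The unpacked bit array takes one byte per bit, so for a file of several gigabytes you would probably feed it in chunks whose length is a multiple of 7 octets (56 bits, which is exactly four 14-bit groups). No timing is claimed here; measure it on your own data.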

Basic analysis of your function: time and space complexity are both O(n), but with a constant factor of about 3, because you loop over the data three times. My suggestion is to loop once and use a generator, which should cost roughly 1/3 of the time and space.

I upgraded your code and removed some useless variables by using a generator:

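def binLsb(data):
    databin = ""
    num_of_bits = 8
    for octet in data:
        # hex octet -> 8-bit binary string, reversed so the LSB comes first
        databin += bin(int(octet, 16))[2:].zfill(num_of_bits)[::-1]
        # emit every complete 14-bit group as soon as it is available
        while len(databin) >= 14:
            yield int(databin[:14][::-1], 2)
            databin = databin[14:]
    # emit the leftover bits, as the original function does
    if databin:
        yield int(databin[::-1], 2)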
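The same single-pass idea can also be sketched with integer arithmetic instead of string slicing. This is only an illustration: the name bin_lsb_stream is hypothetical, and it assumes every item in data is a hex string for a single octet (0x00 to 0xff).

```python
def bin_lsb_stream(data, group_bits=14):
    """Yield group_bits-wide values from hex octets, least significant bit first."""
    mask = (1 << group_bits) - 1
    buffer = 0   # bits not yet emitted; the oldest bit sits in position 0
    pending = 0  # number of valid bits currently held in buffer
    for octet in data:
        # Append the new octet above the bits that are still waiting.
        buffer |= int(octet, 16) << pending
        pending += 8
        # Emit every complete group as soon as it is available.
        while pending >= group_bits:
            yield buffer & mask
            buffer >>= group_bits
            pending -= group_bits
    # Emit the final partial group, if any, as the question's function does.
    if pending:
        yield buffer


print(list(bin_lsb_stream(["ff", "01", "80"])))  # [511, 512]
```

Because the buffer never holds more than 21 bits, memory use stays constant no matter how large the input is.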

Enjoy!

Oliver
