
Optimizing Python code for converting list of strings to integers and floats

I'm trying to optimize my Python 2.7.x code. I'm going to perform one operation inside a for loop, possibly millions of times, so I want it to be as quick as possible.

My operation takes a list of 10 strings and converts them to 2 integers followed by 8 floats.

Here is an MWE of my attempts:

    import timeit

    words = ["1"] * 10

    start_time = timeit.default_timer()
    for ii in range(1000000):
        values = map(float, words)
        values[0] = int(values[0])
        values[1] = int(values[1])
    print "1", timeit.default_timer() - start_time

    start_time = timeit.default_timer()
    for ii in range(1000000):
        values = map(int, words[:2]) + map(float, words[2:])
    print "2", timeit.default_timer() - start_time

    start_time = timeit.default_timer()
    local_map = map
    for ii in range(1000000):
        values = local_map(float, words)
        values[0] = int(values[0])
        values[1] = int(values[1])
    print "3", timeit.default_timer() - start_time

    1 2.86574220657
    2 3.83825802803
    3 2.86320781708

The first block of code is the fastest I've managed. The map function seems much quicker than using a list comprehension. But there's still some redundancy because I map everything to a float, then change the first two items to integers.

Is there anything quicker than my code?

Why doesn't making the map function local (local_map = map) improve the speed in the third block of code?

I haven't found anything faster, but your fastest code is actually going to be wrong in some cases. The problem is that Python float (which is a C double) has limited precision: for values beyond 2 ** 53, it can't represent all integer values. By contrast, Python int is arbitrary precision; if you have the memory, it can represent effectively unbounded values.
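A quick way to see the precision issue is to parse an integer just above 2 ** 53 both ways. This is only a minimal sketch; the variable names are illustrative.

```python
# 2 ** 53 + 1 is the first positive integer a C double cannot represent exactly.
text = str(2 ** 53 + 1)

direct = int(text)            # parsed directly: exact
via_float = int(float(text))  # parsed as a float first: rounded to a nearby double

print(direct)               # 9007199254740993
print(via_float)            # 9007199254740992
print(direct == via_float)  # False
```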

You'd want to change:

    values[0] = int(values[0])
    values[1] = int(values[1])

to:

    values[0] = int(words[0])
    values[1] = int(words[1])

to avoid that. The reparsing would make this more dependent on the length of the string being parsed (because converting multiple times costs more for longer inputs).
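Put together, the corrected conversion might look like the sketch below. The helper name `convert` and the sample input are hypothetical, and the `list(...)` call is only strictly needed on Python 3, where `map` returns an iterator rather than a list.

```python
def convert(words):
    """Convert a list of strings into 2 ints followed by floats."""
    # list() makes a cheap copy on Python 2 and is required on Python 3.
    values = list(map(float, words))
    # Re-parse the original strings so large integers keep full precision.
    values[0] = int(words[0])
    values[1] = int(words[1])
    return values


# Hypothetical input: one integer too large for a double, then ordinary values.
sample = [str(2 ** 53 + 1), "7"] + ["1.5"] * 8
print(convert(sample))
```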

An alternative that, at least on my Python (3.5), works fairly fast is to preconstruct the set of converters so you can call the correct function directly. For example:

    words = ["1"] * 10
    converters = (int,) * 2 + (float,) * 8

    values = [f(v) for f, v in zip(converters, words)]

You'd want to test with both versions of zip to see whether the list-generating built-in or the generator-based itertools.izip is faster (for short inputs like these, I really can't say). In Python 3.5 (where zip is always lazy, like Py2's itertools.izip) this took about 10% longer than your fastest solution for the same inputs (I used min() of a timeit.repeat run rather than the hand-rolled version you used); it might do better if the inputs are larger (and therefore parsing twice would be more expensive).
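A comparison along those lines could be set up roughly as follows. This is a sketch, not the exact benchmark used in this answer: the function names, the repeat counts, and the `izip` fallback are assumptions, and timings will vary by machine and Python version.

```python
import timeit

try:
    from itertools import izip  # Python 2: generator-based zip
except ImportError:
    izip = zip                  # Python 3: zip is already lazy

words = ["1"] * 10
converters = (int,) * 2 + (float,) * 8


def map_then_fix(words=words):
    # Parse everything as float, then re-parse the first two as int.
    values = list(map(float, words))
    values[0] = int(words[0])
    values[1] = int(words[1])
    return values


def per_item_converters(words=words, converters=converters, izip=izip):
    # Call the right converter for each position directly.
    return [f(v) for f, v in izip(converters, words)]


for func in (map_then_fix, per_item_converters):
    # Take the minimum of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(func, repeat=3, number=100000))
    print("%s %.3f" % (func.__name__, best))
```

Binding `words`, `converters`, and `izip` as default arguments turns them into true locals inside each function, which is the setting where the "make it local" trick can actually save a lookup.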
