Python string formatting too slow

I use the following code to log a map. It is fast when the map contains only zeroes, but as soon as there is actual data in it, it becomes unbearably slow. Is there any way to do this faster?

log_file = open('testfile', 'w')
for i, x in ((i, start + i * interval) for i in range(length)):
    log_file.write('%-5d %8.3f %13g %13g %13g %13g %13g %13g\n' % (i, x,
        map[0][i], map[1][i], map[2][i], map[3][i], map[4][i], map[5][i]))

I suggest you run your code under the cProfile module and post-process the results as described at http://docs.python.org/library/profile.html . This will tell you exactly how much time is spent on the str.__mod__ string formatting and how much is spent on other things, like writing the file and doing the __getitem__ lookups for map[0][i] and so on.
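A minimal sketch of that profiling run, assuming Python 3. The data and the names columns, start, interval and length are hypothetical stand-ins for the asker's variables, and an io.StringIO stands in for the log file (pass the real file object instead to include disk writes in the profile). As far as I know, cProfile only records function calls, and the % operator is not one, so the sketch wraps the formatting in a small hypothetical helper, format_line, to give it its own row in the report.

```python
import cProfile
import io
import pstats
import random

# Hypothetical stand-ins for the question's start, interval, length and map
# (renamed "columns" here); substitute the real data when profiling.
start, interval, length = 0, 0.1, 10000
columns = [[random.random() * 500 for _ in range(length)] for _ in range(6)]
FMT = '%-5d %8.3f %13g %13g %13g %13g %13g %13g\n'


def format_line(i, x):
    # The % operator is not a function call, so cProfile has no separate row
    # for it; wrapping it in a function makes its cost show up in the report.
    return FMT % (i, x, columns[0][i], columns[1][i], columns[2][i],
                  columns[3][i], columns[4][i], columns[5][i])


def write_log(out):
    for i in range(length):
        out.write(format_line(i, start + i * interval))


buf = io.StringIO()  # in-memory stand-in for the real log file
profiler = cProfile.Profile()
profiler.enable()
write_log(buf)
profiler.disable()

# Sort by cumulative time so the expensive parts of the loop come first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(10)
print(report.getvalue())
```

Comparing the tottime of format_line with that of the write method shows how the loop's time splits between formatting and writing.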

First I compared % against backquotes. % is faster. Then I compared % (with a tuple) against 'string'.format(). An initial bug in my test made me think format() was faster, but no: % is faster.

So, you are already doing your massive pile of float-to-string conversions the fastest way you can do it in Python.

The demo code below is ugly demo code. Please don't lecture me on xrange versus range or other pedantry. KThxBye.

My ad-hoc and highly unscientific testing indicates that, for the test code below, % (1.234,) operations on Python 2.5 on Linux are faster than % (1.234, ...) operations on Python 2.6 on Linux, with the proviso that the 'string'.format() test won't work on Python versions before 2.6. And so on.

# this code should never be used in production.
# should work on linux and windows now.

import random
import timeit
import os
import tempfile


start = 0
interval = 0.1

amap = [] # list of lists
tmap = [] # list of tuples

def r():
    return random.random()*500

for i in xrange(0, 10000):
    amap.append([r(), r(), r(), r(), r(), r()])

for i in xrange(0, 10000):
    tmap.append((r(), r(), r(), r(), r(), r()))




def testme_percent():
    log_file = tempfile.TemporaryFile()
    try:
        for qmap in amap:
            s = '%g %g %g %g %g %g \n' % (qmap[0], qmap[1], qmap[2], qmap[3], qmap[4], qmap[5])
            log_file.write(s)
    finally:
        log_file.close()

def testme_tuple_percent():
    log_file = tempfile.TemporaryFile()
    try:
        for qtup in tmap:
            s = '%g %g %g %g %g %g \n' % qtup
            log_file.write(s)
    finally:
        log_file.close()

def testme_backquotes_rule_yeah_baby():
    log_file = tempfile.TemporaryFile()
    try:
        for qmap in amap:
            s = `qmap` + '\n'
            log_file.write(s)
    finally:
        log_file.close()

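def testme_the_new_way_to_format():
    log_file = tempfile.TemporaryFile()
    try:
        for qmap in amap:
            # note: bare '{0}' fields format each float like str(), not with
            # %g rules, so this writes different text than the % tests
            s = '{0} {1} {2} {3} {4} {5} \n'.format(qmap[0], qmap[1], qmap[2], qmap[3], qmap[4], qmap[5])
            log_file.write(s)
    finally:
        log_file.close()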

# python 2.5 helper (timeit.timeit() only exists from 2.6 on)
default_number = 50
def _xtimeit(stmt="pass", timer=timeit.default_timer,
             number=default_number):
    """quick and dirty"""
    if stmt != "pass":
        stmtcall = stmt + "()"
        ssetup = "from __main__ import " + stmt
    else:
        stmtcall = stmt
        ssetup = "pass"
    t = timeit.Timer(stmtcall, setup=ssetup, timer=timer)
    try:
        return t.timeit(number)
    except:
        t.print_exc()


# the "pass" run is a baseline with no formatting operation at all

print "now timing variations on a theme"

#times = []
#for i in range(0,10):

n0 = _xtimeit("pass", number=50)
print "pass = ", n0

n1 = _xtimeit("testme_percent", number=50)
print "old style % formatting=", n1

n2 = _xtimeit("testme_tuple_percent", number=50)
print "old style % formatting with tuples=", n2

n3 = _xtimeit("testme_backquotes_rule_yeah_baby", number=50)
print "backquotes=", n3

n4 = _xtimeit("testme_the_new_way_to_format", number=50)
print "new str.format conversion=", n4


#        times.append( n);




print "done"    
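The demo above is Python 2 code (backquotes, print statements and xrange do not exist in Python 3). Here is a minimal Python 3 sketch of the same % versus str.format() comparison using timeit. The data and the run count are arbitrary assumptions, and which approach wins can vary across interpreter versions, so run it on your own setup rather than trusting any particular number.

```python
import random
import timeit

# Arbitrary test data (an assumption): 10,000 rows of six floats in 0..500.
rows = [tuple(random.random() * 500 for _ in range(6)) for _ in range(10000)]

PERCENT_FMT = '%g %g %g %g %g %g\n'
# '{:g}' follows the same rules as '%g', so both functions build identical
# text; bare '{0}' fields would format the floats differently.
BRACE_FMT = '{:g} {:g} {:g} {:g} {:g} {:g}\n'


def with_percent():
    return [PERCENT_FMT % row for row in rows]


def with_format():
    return [BRACE_FMT.format(*row) for row in rows]


for fn in (with_percent, with_format):
    seconds = timeit.timeit(fn, number=20)
    print('%-13s %.3f s for 20 runs' % (fn.__name__, seconds))
```

Using '{:g}' rather than '{0}' keeps the comparison fair: both sides do the same float-to-text work and differ only in the formatting machinery.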

I think you could optimize your code by building TUPLES of floats in the first place, wherever you build that map, and then applying fmt_str % tuple to each one, like this:

for tup in mytups:
    log_file.write(fmt_str % tup)

I was able to shave the 8.7 seconds down to 8.5 seconds by moving the tuple-building out of the for loop. Which ain't much. The big boy there is floating-point formatting, which I believe is always going to be expensive.
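A runnable sketch of the tuple-first idea, assuming the data arrives as six equal-length columns: zip(*columns) transposes them into row tuples once, and the (i, x) prefix is prepended so each tuple matches the question's format string. The names columns, start and interval are hypothetical stand-ins (columns replaces the asker's map), the tiny data set is made up, and an io.StringIO stands in for the log file.

```python
import io

# Hypothetical stand-ins: six columns of five floats each, plus the
# question's start/interval for the x column.
columns = [[0.5 * c + i for i in range(5)] for c in range(6)]
start, interval = 0.0, 0.1

fmt_str = '%-5d %8.3f %13g %13g %13g %13g %13g %13g\n'

# Build every tuple up front: zip(*columns) turns six columns into row tuples,
# and (i, x) is prepended so each tuple has all eight formatting arguments.
mytups = [(i, start + i * interval) + row
          for i, row in enumerate(zip(*columns))]

log_file = io.StringIO()  # stand-in for the real log file
for tup in mytups:
    log_file.write(fmt_str % tup)

print(log_file.getvalue())
```

Per the timing above, this saves little on its own; the %g conversions still dominate.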

Alternative:

Have you considered NOT writing such huge logs as text, and instead saving them with the fastest "persistence" method available, then writing a short utility to dump them to text when needed? Some people use NumPy with very large numeric data sets, and it does not seem they would use a line-by-line text dump to store their stuff. See:

http://thsant.blogspot.com/2007/11/saving-numpy-arrays-which-is-fastest.html
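A minimal sketch of the binary-first approach with NumPy, assuming the data fits in a single 2-D float array of shape (rows, 6); that layout and the random test data are assumptions. np.save writes the raw binary array with no float-to-text conversion, and np.savetxt produces the text dump later, only when someone actually needs to read it. In-memory buffers stand in for real files.

```python
import io

import numpy as np

# Hypothetical data: 10,000 rows of six float columns, i.e. the asker's
# "map" stored row-wise as one 2-D array.
data = np.random.random((10000, 6)) * 500

# Logging step: write the raw binary array (no float-to-text conversion).
store = io.BytesIO()  # stand-in for a file opened in 'wb' mode
np.save(store, data)

# Later, only when a human-readable dump is actually needed:
store.seek(0)
restored = np.load(store)
text = io.StringIO()  # stand-in for a text file opened in 'w' mode
np.savetxt(text, restored, fmt='%13g')
```

The expensive %g work still happens, but only in the separate dump utility, not in the code path that produces the data.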

Without wishing to wade into the optimize-this-code morass, I would have written the code more like this:

log_file = open('testfile', 'w')
x = start
map_iter = zip(range(length), map[0], map[1], map[2], map[3], map[4], map[5])
fmt = '%-5d %8.3f %13g %13g %13g %13g %13g %13g\n'
for i, m0, m1, m2, m3, m4, m5 in map_iter:
    s = fmt % (i, x, m0, m1, m2, m3, m4, m5)
    log_file.write(s)
    x += interval
log_file.close()

But I will weigh in with the recommendation that you not name variables after Python builtins, like map.
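A small illustration of why: a module-level variable named map hides the built-in map() for the rest of that module, so a later call to it fails. The data is made up for the example.

```python
# Module-level data that (hypothetically) reuses the name of the built-in.
map = [[1.0, 2.0], [3.0, 4.0]]

try:
    map(str, [1, 2])  # the built-in map() is now hidden by the list
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # 'list' object is not callable

del map  # drop the global name; the built-in is visible again
restored = list(map(str, [1, 2]))
print(restored)  # ['1', '2']
```

Renaming the data to something like columns avoids the problem entirely.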

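Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.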

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM