Correct way to append to string in Python

I've read this reply, which explains that CPython has an optimization to do an in-place append without copying when appending to a string using a = a + b or a += b. I've also read this PEP 8 recommendation:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such). For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
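A minimal sketch of why PEP 8 calls this optimization fragile, assuming CPython: the two hypothetical functions below differ only in that the second keeps an extra reference (`alias`, an illustrative name) to the string being extended. CPython can only resize a string in place when nothing else refers to it, so the second version is expected to copy the whole string on every `+=` and slow down sharply as `n` grows. Exact timings depend on the machine and interpreter version.

```python
import timeit


def append_plain(n: int) -> str:
    s = ""
    for i in range(n):
        # Only `s` refers to the string, so CPython may resize it in place.
        s += str(i)
    return s


def append_with_alias(n: int) -> str:
    s = ""
    alias = s
    for i in range(n):
        # `alias` holds a second reference to the same object, so CPython
        # cannot modify it in place and has to build a new, longer string.
        s += str(i)
        alias = s
    return s


if __name__ == "__main__":
    n = 20_000
    for func in (append_plain, append_with_alias):
        seconds = timeit.timeit(lambda: func(n), number=1)
        print(f"{func.__name__}: {seconds:.4f} s")
```

Both functions return the same string; only the cost of building it differs.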

So if I understand correctly, instead of doing a += b + c in order to trigger this CPython optimization which does the replacement in-place, the proper way is to call a = ''.join([a, b, c])?

But then why is the join form significantly slower than the += form in this example? (In loop1 I'm deliberately using a = a + b + c so as not to trigger the CPython optimization.)

import time

if __name__ == "__main__":
    start_time = time.time()
    print("begin: %s " % (start_time))
    s = ""
    for i in range(100000):
        s = s + str(i) + '3'
    time1 = time.time()
    print("end loop1: %s " % (time1 - start_time))

    s2 = ""
    for i in range(100000):
        s2 += str(i) + '3'

    time2 = time.time()
    print("end loop2: %s " % (time2 - time1))

    s3 = ""
    for i in range(100000):
        s3 = ''.join([s3, str(i), '3'])

    time3 = time.time()
    print("end loop3: %s " % (time3 - time2))

The results show that join is significantly slower in this case:

~/testdir$ python --version
Python 3.10.6
~/testdir$ python concatenate.py 
begin: 1675268345.0761461 
end loop1: 3.9019 
end loop2: 0.0260 
end loop3: 0.9289 

Is my version with join wrong?
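For reference, a slightly more careful version of the question's benchmark might look like the sketch below. Each loop is wrapped in a function (loop1/loop2/loop3 mirror the labels printed above), the outputs are checked for equality before timing, and time.perf_counter() is used because it is intended for timing short intervals, unlike time.time(). n is set lower than in the question to keep the run short; results will vary by machine and Python version.

```python
import time


def loop1(n: int) -> str:
    s = ""
    for i in range(n):
        s = s + str(i) + "3"  # s + str(i) copies the whole of s every iteration
    return s


def loop2(n: int) -> str:
    s = ""
    for i in range(n):
        s += str(i) + "3"  # may be extended in place by CPython
    return s


def loop3(n: int) -> str:
    s = ""
    for i in range(n):
        s = "".join([s, str(i), "3"])  # builds a brand-new string every iteration
    return s


if __name__ == "__main__":
    n = 50_000
    # All three strategies must produce the same string.
    assert loop1(1_000) == loop2(1_000) == loop3(1_000)
    for func in (loop1, loop2, loop3):
        start = time.perf_counter()
        func(n)
        print(f"{func.__name__}: {time.perf_counter() - start:.4f} s")
```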

In "loop3" you throw away most of the benefit of join() by calling it on every iteration: each call creates a new string and copies the entire accumulated string into it, so the loop is still quadratic. It is better to build up the full list of pieces and then call join() once.
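A rough way to see the difference is to count characters written. The sketch below is a simplified cost model, not a measurement: it ignores allocation overhead, and the function names are hypothetical. It compares doing s = ''.join([s, p]) for every piece p with a single ''.join(pieces).

```python
def chars_copied_join_each(pieces: list[str]) -> int:
    """Characters written when doing s = ''.join([s, p]) for every piece p."""
    total = 0
    current_len = 0
    for p in pieces:
        current_len += len(p)
        total += current_len  # the new string holds all of s plus p
    return total


def chars_copied_join_once(pieces: list[str]) -> int:
    """Characters written by a single ''.join(pieces)."""
    return sum(len(p) for p in pieces)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        pieces = ["3"] * n
        print(n, chars_copied_join_each(pieces), chars_copied_join_once(pieces))
```

For n single-character pieces the first count is n(n+1)/2 and the second is n, so ten times as many iterations means roughly a hundred times as much copying for the per-iteration join.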

Compare the following variants:

import time

iterations = 100_000

##----------------
s = ""
start_time = time.time()
for i in range(iterations):
    s = s + "." + '3'
end_time = time.time()
print("end loop1: %s " % (end_time - start_time))
##----------------

##----------------
s = ""
start_time = time.time()
for i in range(iterations):
    s += "." + '3'
end_time = time.time()
print("end loop2: %s " % (end_time - start_time))
##----------------

##----------------
s = ""
start_time = time.time()
for i in range(iterations):
    s = ''.join([s, ".", '3'])
end_time = time.time()
print("end loop3: %s " % (end_time - start_time))
##----------------

##----------------
s = []
start_time = time.time()
for i in range(iterations):
    s.append(".")
    s.append("3")
s = "".join(s)
end_time = time.time()
print("end loop4: %s " % (end_time - start_time))
##----------------

##----------------
s = []
start_time = time.time()
for i in range(iterations):
    s.extend((".", "3"))
s = "".join(s)
end_time = time.time()
print("end loop5: %s " % (end_time - start_time))
##----------------

Just to be clear, you can run this with:

iterations = 10_000_000

if you like; just be sure to remove "loop1" and "loop3", as they become dramatically slower beyond roughly 300k iterations.

When I run this with 10 million iterations I see:

end loop2: 16.977502584457397 
end loop4: 1.6301295757293701 
end loop5: 1.0435805320739746

So, clearly there is a way to use join() that is fast :-)
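Applied to the loop from the question, the same idea can be written without an explicit list: str.join accepts any iterable, so a generator expression works (CPython's join turns it into a sequence internally before joining). io.StringIO from the standard library is another option that avoids re-copying the accumulated text on each append. A sketch, with hypothetical function names:

```python
import io


def build_with_join(n: int) -> str:
    # One join over all pieces: each character is copied into the result once.
    return "".join(str(i) + "3" for i in range(n))


def build_with_stringio(n: int) -> str:
    # StringIO accumulates writes in an internal buffer; getvalue() builds the result.
    buf = io.StringIO()
    for i in range(n):
        buf.write(str(i))
        buf.write("3")
    return buf.getvalue()


if __name__ == "__main__":
    assert build_with_join(100_000) == build_with_stringio(100_000)
```

Both produce the same string as loop2 in the question, but without relying on the CPython-specific in-place optimization.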

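Note: technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.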

 