Why does "".join() appear to be slower than +=?

Despite the question "Why is ''.join() faster than += in Python?" and its answers, and this great explanation of the code behind the curtain: https://paolobernardi.wordpress.com/2012/11/06/python-string-concatenation-vs-list-join/
my tests suggest otherwise and I am baffled.
Am I doing something simple incorrectly? I'll admit that I'm fudging the creation of x a bit, but I don't see how that would affect the outcome.

import time
x="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
for i in range(10000):
    y+=x
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473524757.681939, 1473524757.68521, '=', 0.0032711029052734375)

import time
x="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
for i in range(10000):
    y=y+x
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473524814.544177, 1473524814.547544, '=', 0.0033669471740722656)

import time
x=10000*"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
y= "".join(x)
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473524861.949515, 1473524861.978755, '=', 0.029239892959594727)

As can be seen, the "".join() version is much slower, and yet we're told that it's meant to be quicker.
These values are very similar in both Python 2.7 and Python 3.4.

Edit: Ok fair enough.

The "one huge string" thing is the kicker.

import time
x=[]
for i in range(10000):
    x.append("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
y=""
t1 = (time.time())
y= "".join(x)
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473526344.55748, 1473526344.558409, '=', 0.0009288787841796875)

An order of magnitude quicker. Mea Culpa!

You called ''.join() on one huge string, not a list (multiplying a string produces a larger string). This forces str.join() to iterate over that huge string, joining 740k individual 'x' characters. In other words, your join test does 74 times more work than your += tests.
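A minimal sketch of that difference (the names piece, huge and pieces are illustrative, not from the question): iterating over a string yields one-character strings, so str.join() sees one element per character, while a list of strings yields one element per string.

```python
piece = "x" * 74          # 74 characters, the length the answer uses per test string
huge = 10000 * piece      # multiplying a string gives ONE long string, not a list
pieces = [piece] * 10000  # a list of 10000 references to the same short string

# str.join() iterates over its argument; count the elements it would see.
elements_from_string = sum(1 for _ in huge)
elements_from_list = sum(1 for _ in pieces)

print(elements_from_string)  # 740000 one-character strings
print(elements_from_list)    # 10000 strings of 74 characters each

# Both inputs join to the same result; only the amount of work differs.
assert "".join(huge) == "".join(pieces)
```

The ratio between the two counts is the factor of 74 mentioned above.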

To conduct a fair trial, you need to start with the same inputs for both, and use the timeit module to reduce the influence of garbage collection and other processes on your system.

That means both approaches need to work from a list of strings (your assignment examples rely on repeatedly adding a string literal, stored as a constant):

from timeit import timeit

testlist = ['x' * 74 for _ in range(100)]

def strjoin(testlist):
    return ''.join(testlist)

def inplace(testlist):
    result = ''
    for element in testlist:
        result += element
    return result

def concat(testlist):
    result = ''
    for element in testlist:
        result = result + element
    return result

for f in (strjoin, inplace, concat):
    timing = timeit('f(testlist)', 'from __main__ import f, testlist',
                    number=100000)
    print('{:>7}: {}'.format(f.__name__, timing))

On my MacBook Pro, on Python 3.5, this produces:

strjoin: 0.09923043003072962
inplace: 1.0032496969797648
 concat: 1.0027298880158924

On 2.7, I get:

strjoin: 0.118290185928
inplace: 0.85814499855
 concat: 0.867822885513

str.join() is still the winner here.
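If the numbers fluctuate between runs, one option is timeit.repeat(), taking the minimum of several runs as the least disturbed measurement. The following is only a sketch under that assumption, reusing two of the functions above with a renamed parameter. Passing a lambda adds a small constant call overhead to every measurement, so only the relative difference is meaningful.

```python
from timeit import repeat

testlist = ['x' * 74 for _ in range(100)]

def strjoin(pieces):
    return ''.join(pieces)

def inplace(pieces):
    result = ''
    for element in pieces:
        result += element
    return result

# Sanity check: both approaches must build the same string.
assert strjoin(testlist) == inplace(testlist)

results = {}
for f in (strjoin, inplace):
    # repeat() returns one total time per run of `number` calls.
    timings = repeat(lambda: f(testlist), repeat=5, number=10000)
    results[f.__name__] = min(timings)
    print('{:>7}: {:.4f}s'.format(f.__name__, results[f.__name__]))
```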

You are not comparing the same operation: your first test added the whole 74-character string on every iteration, while join added every character of the huge string separately. (See also @MartijnPieters' answer.)

If I run a comparison, I get completely different timings, suggesting that str.join is much faster:

x = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

def join_inplace_add(y, x, num):
    for _ in range(num):
        y += x
    return y

def join_by_join(x, num):
    return ''.join([x for _ in range(num)])

%timeit join_by_join(x, 1000)
# 10000 loops, best of 3: 91 µs per loop
%timeit join_inplace_add('', x, 1000)
# 1000 loops, best of 3: 325 µs per loop
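%timeit is an IPython magic. Outside IPython, roughly the same comparison can be run with the standard timeit module. This is a sketch, not the exact measurement above: x is rebuilt as 'x' * 74 (an assumed length), and the arguments are passed in the order the function signatures expect.

```python
from timeit import timeit

x = "x" * 74  # assumed length; stands in for the string literal used above

def join_inplace_add(y, x, num):
    for _ in range(num):
        y += x
    return y

def join_by_join(x, num):
    return ''.join([x for _ in range(num)])

# Both functions should produce the same string before we compare speed.
assert join_by_join(x, 1000) == join_inplace_add('', x, 1000)

runs = 1000
t_join = timeit(lambda: join_by_join(x, 1000), number=runs)
t_add = timeit(lambda: join_inplace_add('', x, 1000), number=runs)

# Report the average time per call in microseconds.
print('join_by_join:     {:.1f} us per call'.format(t_join / runs * 1e6))
print('join_inplace_add: {:.1f} us per call'.format(t_add / runs * 1e6))
```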
