
Python str.translate VS str.replace

Why is replace ~1.5x quicker than translate in Python?

In [188]: s = '1 a  2'

In [189]: s.replace(' ','')
Out[189]: '1a2'

In [190]: s.translate(None,' ')
Out[190]: '1a2'

In [191]: %timeit s.replace(' ','')
1000000 loops, best of 3: 399 ns per loop

In [192]: %timeit s.translate(None,' ')
1000000 loops, best of 3: 614 ns per loop
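The session above only runs on Python 2, where str.translate accepts a deletechars argument. For anyone reproducing the comparison on Python 3, here is a minimal sketch. It assumes that a deletion table built with str.maketrans('', '', ' ') is the closest equivalent of s.translate(None, ' '). The absolute numbers, and possibly which method wins, will depend on the interpreter version and the machine.

```python
import timeit

s = '1 a  2'

# Python 3 has no deletechars argument; a table that maps ' ' to None
# is assumed here as the closest equivalent of s.translate(None, ' ').
delete_space = str.maketrans('', '', ' ')

replaced = s.replace(' ', '')
translated = s.translate(delete_space)

# Time each call; results vary by machine and Python version.
n = 100000
t_replace = timeit.timeit(lambda: s.replace(' ', ''), number=n)
t_translate = timeit.timeit(lambda: s.translate(delete_space), number=n)

print('replace:   %.0f ns per call' % (t_replace / n * 1e9))
print('translate: %.0f ns per call' % (t_translate / n * 1e9))
```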

Assuming Python 2.7 (I had to flip a coin, since the question doesn't say), we can find the source code for string.translate and string.replace in string.py:

>>> import inspect
>>> import string
>>> inspect.getsourcefile(string.translate)
'/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py'
>>> inspect.getsourcefile(string.replace)
'/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py'
>>>

Oh, we can't, as string.py starts with:

"""A collection of string operations (most are no longer used).

Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object.

I upvoted you because you started down the path of profiling, so let's continue down that thread:

from cProfile import run

s = '1 a  2'

def _replace():
    for x in range(5000000):
        s.replace(' ', '')

def _translate():
    for x in range(5000000):    
        s.translate(None, ' ')

for replace:

run("_replace()")
         5000004 function calls in 2.059 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.976    0.976    2.059    2.059 <ipython-input-3-9253b3223cde>:8(_replace)
        1    0.000    0.000    2.059    2.059 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5000000    1.033    0.000    1.033    0.000 {method 'replace' of 'str' objects}
        1    0.050    0.050    0.050    0.050 {range}

and for translate:

run("_translate()")

         5000004 function calls in 1.785 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.977    0.977    1.785    1.785 <ipython-input-3-9253b3223cde>:12(_translate)
        1    0.000    0.000    1.785    1.785 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5000000    0.756    0.000    0.756    0.000 {method 'translate' of 'str' objects}
        1    0.052    0.052    0.052    0.052 {range}

The number of function calls is the same. More function calls don't necessarily mean a slower run, but it's typically a good place to look. What's fun is that translate ran faster than replace on my machine! Chalk that up to the fun of not testing changes in isolation; it doesn't matter here, because we are only concerned with explaining why there could be a difference.

In any case, we now know that there may be a performance difference, and that it shows up inside the string object's method itself (see tottime). The translate docstring suggests that there's a translation table in play, while the replace docstring only mentions old-to-new substring replacement.

Let's turn to our old buddy dis for hints:

from dis import dis

replace:

def dis_replace():
    '1 a  2'.replace(' ', '')

dis(dis_replace)

  3           0 LOAD_CONST               1 ('1 a  2')
              3 LOAD_ATTR                0 (replace)
              6 LOAD_CONST               2 (' ')
              9 LOAD_CONST               3 ('')
             12 CALL_FUNCTION            2
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        

and translate , which ran faster for me:

def dis_translate():
    '1 a  2'.translate(None, ' ')

dis(dis_translate)


  2           0 LOAD_CONST               1 ('1 a  2')
              3 LOAD_ATTR                0 (translate)
              6 LOAD_CONST               0 (None)
              9 LOAD_CONST               2 (' ')
             12 CALL_FUNCTION            2
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        

Unfortunately, the two look identical to dis, which means that we should start looking at the string's C source (found by going to the Python source code for the version of Python I'm using right now): https://hg.python.org/cpython/file/a887ce8611d2/Objects/stringobject.c
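The "identical to dis" observation can also be checked programmatically rather than by eye. This is a minimal sketch, assuming Python 3.4 or later for dis.get_instructions. It compares only the opcode names and ignores the operands. Neither function is ever called, which matters because the two-argument translate form would fail at runtime on Python 3.

```python
import dis

def dis_replace():
    '1 a  2'.replace(' ', '')

def dis_translate():
    # Never called: this two-argument form only runs on Python 2.
    '1 a  2'.translate(None, ' ')

def opnames(func):
    """Return the opcode names of func, ignoring operands."""
    return [ins.opname for ins in dis.get_instructions(func)]

# True when the two functions differ only in their operands.
same_shape = opnames(dis_replace) == opnames(dis_translate)
print(same_shape)
```

If same_shape is True, the bytecode gives no hint about the timing difference, which is why the search moves on to the C implementation.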

The source for translate is string_translate in that file.
If you go through the comments, you can see that there are several replace helper functions, chosen based on the lengths of the arguments.

Our options for substring replacement are:

replace_substring_in_place

/* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring_in_place(PyStringObject *self,

and replace_substring :

/* len(self)>=1, len(from)>=2, len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring(PyStringObject *self,

and replace_delete_single_character :

/* Special case for deleting a single character */
/* len(self)>=1, len(from)==1, to="", maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_delete_single_character(PyStringObject *self,
                                char from_c, Py_ssize_t maxcount)

'1 a  2'.replace(' ', '') has len(self)==6 and replaces a single character with an empty string, so it is handled by replace_delete_single_character.

You can check out the function body for yourself, but the answer is: the C function body of replace_delete_single_character runs faster than that of string_translate for this specific input.
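To give a feel for why the two C paths could cost different amounts on a tiny input, here is a rough Python model of the two strategies as I understand them. It is an illustration only, not a transcription of the C code: the function names delete_char_by_runs and delete_chars_by_table are hypothetical, and the table version assumes single-byte characters, as in a Python 2 str.

```python
def delete_char_by_runs(s, ch):
    """Model of the delete-one-character path: find each occurrence
    and copy the run of text before it in one step."""
    parts = []
    start = 0
    while True:
        hit = s.find(ch, start)      # stands in for a memchr-style scan
        if hit == -1:
            break
        parts.append(s[start:hit])   # stands in for a block copy
        start = hit + 1
    parts.append(s[start:])
    return ''.join(parts)

def delete_chars_by_table(s, deletechars):
    """Model of the translate path: build a 256-entry table, then
    look up every character of the input one at a time.
    Assumes every character has ord(c) < 256."""
    keep = [True] * 256
    for c in deletechars:
        keep[ord(c)] = False
    return ''.join(c for c in s if keep[ord(c)])

print(delete_char_by_runs('1 a  2', ' '))
print(delete_chars_by_table('1 a  2', ' '))
```

In this model the table version pays a fixed setup cost for the 256 entries regardless of input length. That is one plausible reason it would lose on a six-character string, although the profile run earlier shows the outcome can flip between machines.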

Thank you for asking this question.

translate will likely be faster as N and M increase, where N is the number of distinct character replacements and M is the length of the string being translated.

import random
import string
import timeit
import re

def do_translation(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    trans_tab = string.maketrans(*map("".join,trans_map))
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    return s.translate(trans_tab)

def do_resub(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    trans_tab = dict(zip(*trans_map))
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    return re.sub("([%s])"%("".join(trans_map[0]),),lambda m:trans_tab.get(m.group(0),m.group(0)),s)

def do_replace(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    for k,v in zip(*trans_map):
       s = s.replace(k,v)
    return s


data = {}
for i in range(2,20,2):
    for j in range(10,200,10):
        data[(i,j)] = {
            "translate":timeit.timeit("do_translation(%s,%s)"%(i,j),"from __main__ import do_translation,string,random",number=100),
            "re.sub":timeit.timeit("do_resub(%s,%s)"%(i,j),"from __main__ import do_resub,re,random",number=100),
            "replace":timeit.timeit("do_replace(%s,%s)"%(i,j),"from __main__ import do_replace,random",number=100)}

print data

This will show you several different timings, including that translate can be faster in several of these cases. (I considered adding some plots here, but I've already invested more time in this question than I really should have :P)
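The benchmark above targets Python 2 (string.maketrans, print data) and times the random input generation together with the replacement itself. As a complement, here is a minimal Python 3 sketch that builds the inputs once and times only the translate and chained replace calls. The mapping, string length, and helper names (with_translate, with_replace) are assumptions chosen for illustration. Lowercase-to-uppercase is used so that the two approaches are guaranteed to produce the same result.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so repeated runs use the same input

# Lowercase-to-uppercase keeps sources and targets disjoint, so chained
# replace calls cannot re-map a character that was already replaced.
sources = string.ascii_lowercase[:10]
targets = sources.upper()
table = str.maketrans(sources, targets)
text = ''.join(random.choice(string.ascii_lowercase) for _ in range(200))

def with_translate():
    return text.translate(table)

def with_replace():
    out = text
    for k, v in zip(sources, targets):
        out = out.replace(k, v)
    return out

results_match = with_translate() == with_replace()
t_translate = timeit.timeit(with_translate, number=1000)
t_replace = timeit.timeit(with_replace, number=1000)
print(results_match, t_translate, t_replace)
```

Varying the number of mapped characters and the length of text here corresponds to varying N and M in the benchmark above.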
