
Replacing different characters in Python

Suppose you have a string that you want to normalize into a specific format, that is, replace every ' ', '.', '-', and so on with '_'.

I know that I could do this:

>s = "Hello----.....    World"
>s = s.replace('-','_').replace('.', '_').replace(' ', '_')
>print(s)
>Hello_____________World

This gives me what I want, but is there a cleaner, more Pythonic way? I tried passing a list as the first argument of replace, but that didn't work.

Use regular expressions.

Example:

import re

s = "Hello----.....    World"
print(re.sub(r"[ .-]", "_", s))

Here is the Python tutorial.
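Since the question mentions trying to pass a list of characters, here is a minimal sketch of building the character class from such a list. The names chars and pattern are illustrative and not part of the original answer; re.escape is used so that characters such as '-' stay literal inside the brackets.

```python
import re

s = "Hello----.....    World"
chars = [' ', '.', '-']  # characters to replace (illustrative name)

# re.escape keeps characters like '-' and '.' literal inside [...]
pattern = re.compile("[" + "".join(re.escape(c) for c in chars) + "]")

result = pattern.sub("_", s)
print(result)
```

Compiling the pattern once is only a convenience here; it matters mainly when the same substitution is applied to many strings.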

Use re

>>> import re
>>> print(re.sub(r' |\.|-', '_', "Hello----.....    World"))
Hello_____________World

Bonus solution not using regex:

>>> keys = [' ', '.', '-']
>>> print(''.join('_' if c in keys else c for c in "Hello----.....    World"))
Hello_____________World

You can do it with str.translate and string.maketrans, which is the most efficient approach and avoids chaining calls. Note that string.maketrans exists only in Python 2; Python 3 provides str.maketrans instead:

In [6]: from string import maketrans

In [7]: s = "Hello----.....    World"

In [8]: table = maketrans(' .-',"___")

In [9]: print(s.translate(table))
Hello_____________World
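The session above relies on the Python 2 import. As a sketch of the same idea on Python 3, assuming only the built-in str.maketrans, the table can be built either from two equal-length strings or from a dict (table2 is an illustrative name):

```python
s = "Hello----.....    World"

# Python 3: maketrans is a static method on str.
# With two string arguments, both must have the same length.
table = str.maketrans(" .-", "___")
translated = s.translate(table)
print(translated)

# A dict argument avoids repeating the replacement character by hand.
table2 = str.maketrans(dict.fromkeys(" .-", "_"))
translated2 = s.translate(table2)
```

The dict form is handy when the set of characters is long or is built at runtime.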

The timings:

In [12]: %%timeit
   ....: s = "Hello----.....    World"
   ....: table = maketrans(' .-',"___")
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 1.14 µs per loop

In [13]: %timeit s.replace('-','_').replace('.', '_').replace(' ', '_')
100000 loops, best of 3: 2.2 µs per loop

In [14]: %%timeit
   ....: text = "Hello----.....    World"
   ....: for ch in [' ', '.', '-']:
   ....:     if ch in text:
   ....:         text = text.replace(ch,'_')
   ....: 

100000 loops, best of 3: 3.51 µs per loop

In [18]: %%timeit
   ....: s = "Hello----.....    World"
   ....: re.sub(r"[ .-]", "_", s)
   ....: 

100000 loops, best of 3: 11 µs per loop

Even with the pattern pre-compiled, the regex still takes around 10 µs, so it is by far the least efficient approach:

In [20]: patt = re.compile(r"[ .-]")

In [21]: %%timeit
   ....: s = "Hello----.....    World"
   ....: patt.sub("_", s)
   ....: 

100000 loops, best of 3: 9.98 µs per loop

Creating the table ahead of time brings the translate call down to nanoseconds:

In [22]: %%timeit
   ....: s = "Hello----.....    World"
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 590 ns per loop
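The figures above come from an IPython session that used the Python 2 import, so they may not carry over to other versions or machines. A rough sketch for re-running the comparison with the standard timeit module on Python 3 follows; the candidates dict and the loop count are illustrative choices, and the ranking you get may differ from the one shown above.

```python
import re
import timeit

s = "Hello----.....    World"
table = str.maketrans(" .-", "___")  # Python 3 spelling of maketrans
patt = re.compile(r"[ .-]")

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "translate": lambda: s.translate(table),
    "chained replace": lambda: s.replace("-", "_").replace(".", "_").replace(" ", "_"),
    "compiled regex": lambda: patt.sub("_", s),
}

# Check that all approaches agree before comparing their speed.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

loops = 100000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=loops)
    print("%s: %.3f us per call" % (name, seconds / loops * 1e6))
```

Calling through a lambda adds a small constant overhead to every candidate, so the absolute numbers are only useful for comparing the approaches with each other.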

This answer lays out a variety of different ways to accomplish this task, contrasting different functions and inputs by speed.

If you are replacing only a few characters, the fastest way is the one in your question, chaining multiple replace calls; regular expressions are the slowest.

If you want to make this more 'pythonic', the best way to balance speed and readability is to make a list of the characters you want to replace and loop through it.

text = "Hello----.....    World"
for ch in [' ', '.', '-']:
    if ch in text:
        text = text.replace(ch,'_')
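If the same clean-up is needed in more than one place, the loop can be wrapped in a small function. This is only a sketch: the name replace_chars and its default arguments are hypothetical, not something from the original answer.

```python
def replace_chars(text, chars=(' ', '.', '-'), replacement='_'):
    """Return text with every character in chars swapped for replacement."""
    for ch in chars:
        if ch in text:  # skip the replace call when the character is absent
            text = text.replace(ch, replacement)
    return text


print(replace_chars("Hello----.....    World"))
```

Because strings are immutable, each replace call builds a new string, which is why this style suits a short list of characters rather than a long one.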
