
Replacing different characters in Python

Suppose you have a string that you want to convert to a specific format. That means: replace all ' ', '.', '-', etc. with '_'.

I know that I could do this:

>>> s = "Hello----.....    World"
>>> s = s.replace('-', '_').replace('.', '_').replace(' ', '_')
>>> print(s)
Hello_____________World

And get what I want. But is there a cleaner, more Pythonic way? I tried passing a list as the first argument of replace, but that didn't work.

Use regular expressions.

Example:

import re

s = "Hello----.....    World"
print(re.sub(r"[ .-]", "_", s))

Here is the Python tutorial.
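As a rough sketch of how this relates to the list the question tried to pass to replace, the character class can be built from a list of characters. The helper name `replace_chars` is hypothetical and only for illustration; it assumes Python 3 and a non-empty list of single characters. `re.escape` is used so that characters such as `.` and `-` are matched literally inside the brackets.

```python
import re


def replace_chars(text, chars, replacement="_"):
    """Replace any character listed in `chars` with `replacement`.

    Hypothetical helper for illustration; assumes `chars` is non-empty.
    """
    # re.escape keeps '.', '-' and similar characters literal inside [...]
    pattern = "[" + re.escape("".join(chars)) + "]"
    return re.sub(pattern, replacement, text)


s = "Hello----.....    World"
result = replace_chars(s, [" ", ".", "-"])
print(result)
```

In the hand-written pattern `[ .-]`, the hyphen is literal because it sits at the end of the class; building the class with `re.escape` avoids having to think about its position.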

Use re:

>>> import re
>>> print(re.sub(r' |\.|-', '_', "Hello----.....    World"))
Hello_____________World

Bonus solution not using regex:

>>> keys = [' ', '.', '-']
>>> print(''.join('_' if c in keys else c for c in "Hello----.....    World"))
Hello_____________World

You can do it using str.translate and string.maketrans (Python 2; in Python 3 the equivalent is str.maketrans), which avoids chaining calls and will be the most efficient approach:

In [6]: from string import maketrans

In [7]: s = "Hello----.....    World"

In [8]: table = maketrans(' .-',"___")

In [9]: print(s.translate(table))
Hello_____________World
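The session above is Python 2, where `maketrans` lives in the `string` module. A minimal sketch of the same approach on Python 3, where the table is built with the built-in `str.maketrans` instead:

```python
s = "Hello----.....    World"

# Two-argument form: each character in the first string maps to the
# character at the same position in the second string.
table = str.maketrans(" .-", "___")
translated = s.translate(table)
print(translated)

# Dict form: handy when every character maps to the same replacement.
table_from_dict = str.maketrans(dict.fromkeys(" .-", "_"))
translated_from_dict = s.translate(table_from_dict)
```

Both tables produce the same result; the dict form avoids having to repeat the replacement once per source character.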

The timings:

In [12]: %%timeit
   ....: s = "Hello----.....    World"
   ....: table = maketrans(' .-',"___")
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 1.14 µs per loop

In [13]: timeit  s.replace('-','_').replace('.', '_').replace(' ', '_')
100000 loops, best of 3: 2.2 µs per loop

In [14]: %%timeit
   ....: text = "Hello----.....    World"
   ....: for ch in [' ', '.', '-']:
   ....:     if ch in text:
   ....:         text = text.replace(ch,'_')
   ....: 
100000 loops, best of 3: 3.51 µs per loop

In [18]: %%timeit
   ....: s = "Hello----.....    World"
   ....: re.sub(r"[ .-]", "_", s)
   ....: 
100000 loops, best of 3: 11 µs per loop

Even pre-compiling the pattern still takes around 10 µs, so the regex is by far the least efficient approach.

In [20]: patt = re.compile(r"[ .-]")

In [21]: %%timeit
   ....: s = "Hello----.....    World"
   ....: patt.sub("_", s)
   ....: 
100000 loops, best of 3: 9.98 µs per loop

Creating the table in advance gets us down to nanoseconds:

In [22]: %%timeit
   ....: s = "Hello----.....    World"
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 590 ns per loop
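The timings above come from an IPython session on Python 2. To repeat the comparison outside IPython, something like the following sketch could be used with the standard `timeit` module on Python 3. The dictionary name `candidates` and the loop count are arbitrary choices for this example, and absolute numbers will differ by machine and Python version.

```python
import re
import timeit

s = "Hello----.....    World"
table = str.maketrans(" .-", "___")   # built once, outside the timed code
pattern = re.compile(r"[ .-]")        # pre-compiled, outside the timed code

candidates = {
    "translate": lambda: s.translate(table),
    "chained replace": lambda: s.replace("-", "_").replace(".", "_").replace(" ", "_"),
    "compiled regex": lambda: pattern.sub("_", s),
}

# Sanity check: every approach should give the same string
results = {name: func() for name, func in candidates.items()}

loops = 100_000
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=loops)
    print(f"{name}: {seconds / loops * 1e6:.2f} us per call")
```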

This answer lays out a variety of different ways to accomplish this task, contrasting different functions and inputs by speed.

If you are replacing only a few characters, the fastest way is the one in your question, chaining multiple replace calls, with regular expressions being the slowest.

If you want to make this more 'pythonic', the best way to balance speed and readability is to make a list of the characters you want to replace and loop through them.

text = "Hello----.....    World"
for ch in [' ', '.', '-']:
    if ch in text:
        text = text.replace(ch,'_')
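The loop can also be wrapped in a small function so the list of characters and the replacement are passed in. This is only a sketch, and `replace_each` is a name invented for this example. It assumes the replacement is not itself one of the characters being replaced.

```python
def replace_each(text, chars, replacement="_"):
    """Replace each character in `chars` with `replacement`, one at a time.

    Hypothetical helper for illustration.
    """
    for ch in chars:
        # Skip the replace() call when the character is not present
        if ch in text:
            text = text.replace(ch, replacement)
    return text


text = replace_each("Hello----.....    World", [" ", ".", "-"])
print(text)
```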
