
How to ignore characters other than [a-z][A-Z]

How can I ignore characters other than [a-z][A-Z] in an input string in Python, and what will the string look like after applying the method?

Do I need to use regular expressions?

If you need to use a regex, use a negative character class ([^...]):

re.sub(r'[^a-zA-Z]', '', inputtext)

A negative character class matches any character not named in the class.

Demo:

>>> import re
>>> inputtext = 'The quick brown fox!'
>>> re.sub(r'[^a-zA-Z]', '', inputtext)
'Thequickbrownfox'

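But using str.translate() is much faster. The two-argument form shown here is Python 2 only:

import string
# Code points of the ASCII letters a-z and A-Z
ascii_letters = set(map(ord, string.ascii_letters))
# Every other character in the 0-255 range, collected for deletion
non_letters = ''.join(chr(i) for i in range(256) if i not in ascii_letters)
# Python 2: a table of None maps nothing; the second argument lists characters to delete
inputtext.translate(None, non_letters)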
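The two-argument translate() call is specific to Python 2. For Python 3, where str.translate() takes a single mapping from code points to replacements, one possible sketch is a dict subclass whose `__missing__` returns None, so every code point that is not listed gets deleted. The names `_DeleteUnlisted`, `KEEP_LETTERS`, and `keep_ascii_letters` are made up for this illustration and are not part of the original answer.

```python
import string


class _DeleteUnlisted(dict):
    """Translation table: listed code points are kept, all others are deleted."""

    def __missing__(self, codepoint):
        # str.translate() removes a character when its table entry is None.
        return None


# Map each ASCII letter's code point to itself; anything else falls through
# to __missing__ and is dropped, including characters above U+00FF.
KEEP_LETTERS = _DeleteUnlisted((ord(c), c) for c in string.ascii_letters)


def keep_ascii_letters(text):
    """Return text with every character outside a-z and A-Z removed."""
    return text.translate(KEEP_LETTERS)


if __name__ == '__main__':
    print(keep_ascii_letters('The quick brown fox!'))
```

Unlike a deletion list built from range(256), this table does not need to enumerate the characters to remove, so it also strips accented letters and other non-Latin-1 characters.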

Using str.translate() is more than 10 times faster than a regular expression:

>>> import timeit, re
>>> from functools import partial
>>> ascii_only = partial(re.compile(r'[^a-zA-Z]').sub, '')
>>> timeit.timeit('f(t)', 'from __main__ import ascii_only as f, inputtext as t')
7.903045892715454
>>> timeit.timeit('t.translate(None, m)', 'from __main__ import inputtext as t, non_letters as m')
0.5990171432495117

Using Jakub's method is slower still:

>>> timeit.timeit("''.join(c for c in t if c not in l)", 'from __main__ import inputtext as t; import string; l = set(string.letters)')
9.960685968399048
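The timings above come from a Python 2 session, so the absolute numbers and even the ratios may differ on a current interpreter. A rough sketch for re-running the same three-way comparison on Python 3 follows; `APPROACHES` and `run_benchmark` are hypothetical names, and the translate table only covers code points 0-255, matching the scope of the original snippet.

```python
import re
import string
import timeit
from functools import partial

text = 'The quick brown fox!'
letters = frozenset(string.ascii_letters)

# Approach 1: precompiled negated character class.
regex_strip = partial(re.compile(r'[^a-zA-Z]').sub, '')

# Approach 2: translate() with a table that deletes every non-letter
# among the first 256 code points.
delete_table = str.maketrans(
    '', '', ''.join(chr(i) for i in range(256) if chr(i) not in letters))


def translate_strip(s):
    return s.translate(delete_table)


# Approach 3: generator expression with set membership.
def join_strip(s):
    return ''.join(c for c in s if c in letters)


APPROACHES = {
    'regex': regex_strip,
    'translate': translate_strip,
    'join': join_strip,
}


def run_benchmark(number=100000):
    """Return the seconds each approach takes for `number` calls on `text`."""
    return {name: timeit.timeit(partial(func, text), number=number)
            for name, func in APPROACHES.items()}


if __name__ == '__main__':
    for name, seconds in run_benchmark().items():
        print('{:<10} {:.3f}s'.format(name, seconds))
```

Results depend on the interpreter version and on the input, so it is worth measuring with text that resembles the real workload.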

You can use a regex:

re.compile(r'[^a-zA-Z]').sub('', your_string)

You could also manage without regular expressions (e.g., if you had regex phobia):

import string
# Keep only the characters that are ASCII letters
new_string = ''.join(c for c in old_string
                     if c in set(string.ascii_letters))

Although I would use a regex, this example has additional educational value: it shows set, a comprehension, and the string library. Note that set is not strictly needed here.
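To illustrate the remark about set: the `in` operator also works directly on a string, so the set only matters for lookup speed, and calling set(...) inside the condition rebuilds it for every character. A minimal sketch of both variants, each keeping only ASCII letters, might look like this (the function names and the `LETTERS` constant are assumptions made for this example):

```python
import string

# Built once and reused, instead of being rebuilt for every character.
LETTERS = frozenset(string.ascii_letters)


def keep_letters(old_string):
    """Keep ASCII letters using set membership."""
    return ''.join(c for c in old_string if c in LETTERS)


def keep_letters_no_set(old_string):
    """Same result without a set: membership is tested against the string."""
    return ''.join(c for c in old_string if c in string.ascii_letters)


if __name__ == '__main__':
    print(keep_letters('The quick brown fox!'))
    print(keep_letters_no_set('The quick brown fox!'))
```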
