Removing everything except letters and spaces from string in Python3.3
I have this example string:
happy t00 go 129.129
and I want to keep only the spaces and letters. All I have been able to come up with so far that is pretty efficient is:
import re
print(re.sub(r"\d", "", 'happy t00 go 129.129'.replace('.', '')))
but it is only specific to my example string. How can I remove all characters other than letters and spaces?
whitelist = set('abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ')
myStr = "happy t00 go 129.129$%^&*("
answer = ''.join(filter(whitelist.__contains__, myStr))
Output:
>>> answer
'happy t go '
Use a complemented (negated) character set:
re.sub(r'[^a-zA-Z ]+', '', 'happy t00 go 129.129')
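For reuse, the same pattern can be compiled once and wrapped in a small helper. This is only a sketch: the names `NON_LETTER_OR_SPACE` and `keep_letters_and_spaces` are made up for illustration, and the character class only covers ASCII letters, so accented or non-Latin letters would be stripped as well.

```python
import re

# Matches any run of characters that are NOT ASCII letters or spaces.
# (Hypothetical name, chosen for this example only.)
NON_LETTER_OR_SPACE = re.compile(r'[^a-zA-Z ]+')

def keep_letters_and_spaces(text):
    """Return text with everything except ASCII letters and spaces removed."""
    return NON_LETTER_OR_SPACE.sub('', text)

result = keep_letters_and_spaces('happy t00 go 129.129')
print(repr(result))  # 'happy t go '
```

Note that the spaces around the removed digits are kept, which is why the result ends with a trailing space.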
Slight variation on inspectorG4dget's method - import from string & generator comprehension:
from string import ascii_letters
allowed = set(ascii_letters + ' ')
myStr = 'happy t00 go 129.129'
answer = ''.join(l for l in myStr if l in allowed)
answer
# >>> 'happy t go '
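If letters outside the ASCII range should survive too, one possible variant of the generator approach is to test each character with `str.isalpha()` instead of a whitelist. This is not part of the answer above, just a sketch under that assumption; the function name is hypothetical.

```python
def keep_unicode_letters_and_spaces(text):
    """Keep any character that str.isalpha() accepts, plus plain spaces."""
    # str.isalpha() is True for Unicode letters such as 'é', not only a-z/A-Z.
    return ''.join(ch for ch in text if ch.isalpha() or ch == ' ')

print(repr(keep_unicode_letters_and_spaces('happy t00 go 129.129')))  # 'happy t go '
print(repr(keep_unicode_letters_and_spaces('héllo w0rld!')))          # 'héllo wrld'
```

For pure ASCII input this gives the same result as the `ascii_letters` whitelist.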
(I made myStr a bit longer and pre-compiled the regex to make things a bit more interesting)
import re
from string import ascii_letters
myStr = 'happy t00 go 129.129'*20
allowed = set(ascii_letters + ' ')
# Generator
%timeit answer = ''.join(l for l in myStr if l in allowed)
# filter/__contains__
%timeit answer = ''.join(filter(allowed.__contains__, myStr))
# Regex
pat = re.compile(r'[^a-zA-Z ]+')
%timeit answer = re.sub(pat, '', myStr)
53 µs ± 6.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
43.3 µs ± 7.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
26 µs ± 509 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
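`%timeit` is an IPython magic, so the comparison above will not run in a plain Python script. A rough equivalent using the standard `timeit` module might look like the sketch below; the wrapper function names and the loop count of 1000 are arbitrary choices for this example, and the numbers it prints will differ from machine to machine.

```python
import re
import timeit
from string import ascii_letters

my_str = 'happy t00 go 129.129' * 20
allowed = set(ascii_letters + ' ')
pat = re.compile(r'[^a-zA-Z ]+')

def with_generator():
    return ''.join(ch for ch in my_str if ch in allowed)

def with_filter():
    return ''.join(filter(allowed.__contains__, my_str))

def with_regex():
    return pat.sub('', my_str)

loops = 1000  # arbitrary; raise it for more stable numbers
for func in (with_generator, with_filter, with_regex):
    total_seconds = timeit.timeit(func, number=loops)
    # Report the average time of a single call in microseconds.
    print(f'{func.__name__}: {total_seconds / loops * 1e6:.1f} µs per loop')
```

All three functions return the same string, so the only difference being measured is speed.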