
Python Regex: Remove optional characters

I have a regex pattern with optional characters, but I want to remove those optional characters from the output. Example:

import re

string = 'a2017a12a'
pattern = re.compile("((20[0-9]{2})(.?)(0[1-9]|1[0-2]))")
result = pattern.search(string)
print(result)

I can get a match like this, but what I want as output is:

desired output = '201712'

Thank you.

You've already captured the intended data in groups, so you can use re.sub to replace the whole match with just the contents of group 1 and group 2.

Try this modified version of your code:

import re

string = 'a2017a12a'
pattern = re.compile(r".*(20[0-9]{2}).?(0[1-9]|1[0-2]).*")
result = pattern.sub(r'\1\2', string)
print(result)

Notice that I've added .* on both sides of the pattern, so any extra characters around your data are matched and removed. I've also removed the extra parentheses that were not needed. This also works with strings that have other digits surrounding the target text, such as hello123 a2017a12a some other 99 numbers

Output:

201712
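
As an alternative to re.sub, the captured groups can be read straight off the match object. The sketch below reuses the pattern above without the surrounding .* (search already scans the string); the helper name extract_year_month is hypothetical and only for illustration. One difference worth knowing: re.sub returns the input unchanged when nothing matches, while this version returns None, which makes a failed match easier to detect.

```python
import re

# Same groups as the answer's pattern; no leading/trailing .* because
# search() looks for the pattern anywhere in the string.
pattern = re.compile(r"(20[0-9]{2}).?(0[1-9]|1[0-2])")

def extract_year_month(text):
    """Hypothetical helper: year + month from the first match, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # group(1) is the year, group(2) the month; the optional
    # separator matched by .? is not captured, so it is dropped.
    return match.group(1) + match.group(2)

print(extract_year_month('a2017a12a'))
print(extract_year_month('hello123 a2017a12a some other 99 numbers'))
print(extract_year_month('no date here'))
```

Note that search returns the first match in the string, whereas the greedy .* prefix in the re.sub version prefers the last one, so the two can differ on input with several candidates.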


You can just use re.sub with the pattern \D (anything that is not a digit):

>>> import re
>>> string = 'a2017a12a'
>>> re.sub(r'\D', '', string)
'201712'
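
A caveat, shown as a small sketch: \D strips every non-digit, so any other digits in the string are kept too. That is fine for the exact input in the question, but it may give more than the year and month on noisier input. The helper name digits_only is hypothetical.

```python
import re

def digits_only(text):
    """Hypothetical helper: drop every character that is not a digit."""
    return re.sub(r'\D', '', text)

print(digits_only('a2017a12a'))  # 201712
# Surrounding digits (123 and 99) survive as well:
print(digits_only('hello123 a2017a12a some other 99 numbers'))  # 12320171299
```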

Try this one:

import re
string = 'a2017a12a'
digit_runs = re.findall(r"(\d+)", string)  # captures each run of digits: ['2017', '12']
print("".join(digit_runs))  # join the runs into one string

Output:

201712

If you want to remove all letters from the string, you can do this:

import re
string = 'a2017a12a'
re.sub(r'[A-Za-z]+', '', string)

Output:

'201712'
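
Keep in mind that [A-Za-z] only covers ASCII letters. Since the optional character in the question's pattern is .?, the separator could just as well be punctuation, and that would be left in place. A minimal sketch (the helper name strip_ascii_letters is hypothetical):

```python
import re

def strip_ascii_letters(text):
    """Hypothetical helper: remove runs of ASCII letters only."""
    return re.sub(r'[A-Za-z]+', '', text)

print(strip_ascii_letters('a2017a12a'))  # 201712
# A non-letter separator such as '-' is not removed:
print(strip_ascii_letters('a2017-12a'))  # 2017-12
```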

You can use the re module's functions to get the required output, for example:

    import re

    # method 1: delete every non-digit character
    string = 'a2017a12a'
    print(re.sub(r'\D', '', string))

    # method 2: collect the digit runs and join them
    digit_runs = re.findall(r"(\d+)", string)
    print("".join(digit_runs))

You can also refer to the documentation below for more details.

https://docs.python.org/3/library/re.html
