
How to search/extract patterns in a string?

I have a pattern I want to search for in my message. The patterns are:

1. "aaa-b3-c"
2. "a3-b6-c"
3. "aaaa-bb-c"

I know how to search for one of the patterns, but how do I search for all 3?

Also, how do you identify and extract dates in this format: 5/21 or 5/21/2019?

found = re.findall(r'.{3}-.{2}-.{1}', message)

The first part could use the quantifier {2,4} instead of {3}. The dot matches any character except a newline, whereas [a-zA-Z0-9] matches only an uppercase or lowercase letter a-z or a digit:

\b[a-zA-Z0-9]{2,4}-[a-zA-Z0-9]{2}-[a-zA-Z0-9]\b


You could add word boundaries \b or anchors ^ and $ on either side if the characters should not be part of a longer word.
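As a minimal sketch of how the word-boundary pattern above could be applied to all three codes at once, assuming a made-up sample message (the `message` text is hypothetical; only the three codes come from the question):

```python
import re

# Hypothetical sample message; the three codes are the ones from the question.
message = "Codes: aaa-b3-c, a3-b6-c and aaaa-bb-c were logged."

# 2-4 alphanumerics, a hyphen, 2 alphanumerics, a hyphen, 1 alphanumeric,
# with word boundaries so the match is not part of a longer word.
pattern = re.compile(r"\b[a-zA-Z0-9]{2,4}-[a-zA-Z0-9]{2}-[a-zA-Z0-9]\b")

found = pattern.findall(message)
print(found)  # ['aaa-b3-c', 'a3-b6-c', 'aaaa-bb-c']
```

Because the character class is broad, this would also match other strings of the same shape, so it may need tightening for real data.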

For the dates you could use \d with a quantifier to match the digits, plus an optional group to match the part with / and 4 digits:

\d{1,2}/\d{2}(?:/\d{4})?
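A short sketch of extracting both date forms with this pattern, assuming a hypothetical sample sentence. The year group is non-capturing, so `findall` returns the full matched text:

```python
import re

# Hypothetical sample text containing both date forms from the question.
message = "Meet on 5/21 or reschedule to 5/21/2019."

# 1-2 digits, a slash, 2 digits, then an optional "/yyyy" part.
date_pattern = re.compile(r"\d{1,2}/\d{2}(?:/\d{4})?")

dates = date_pattern.findall(message)
print(dates)  # ['5/21', '5/21/2019']
```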


Note that the pattern does not validate the date itself; for example, it would also match 13/45. Perhaps this page can help you create or customize a more specific date format.
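One possible way to add validation is to hand each regex candidate to `datetime.strptime` and keep only those that parse. This is a sketch under assumptions not stated in the question: the dates are month/day order, and a fallback year (the hypothetical `default_year` parameter) is used when none is given. The helper name `extract_valid_dates` is also made up for illustration.

```python
import re
from datetime import datetime

def extract_valid_dates(message, default_year=2019):
    """Return date-like substrings that are real calendar dates.

    Assumes month/day order; default_year is a hypothetical fallback
    used only to validate candidates that carry no year.
    """
    candidates = re.findall(r"\d{1,2}/\d{2}(?:/\d{4})?", message)
    valid = []
    for text in candidates:
        # Append the assumed year when the candidate has none.
        full = text if text.count("/") == 2 else f"{text}/{default_year}"
        try:
            datetime.strptime(full, "%m/%d/%Y")
        except ValueError:
            continue  # not a real date, e.g. month 13 or February 30
        valid.append(text)
    return valid

result = extract_valid_dates("Due 5/21, not 13/45, maybe 2/30/2019 or 5/21/2019.")
print(result)  # ['5/21', '5/21/2019']
```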

Try this:

found = re.findall(r'a{2,4}-b{2}-c', message)

You could use

a{2,4}-bb-c

as a pattern.


Now you need to check whether the search actually produced a match:

if (match := re.search(pattern, string)) is not None:
    print(match.group())  # do something with the match here

Note that the walrus operator (:=) used here requires Python 3.8 or later.

Try this:

re.findall(r'a.*-b.*-c',message)
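One caveat with this pattern: `.*` is greedy, so when a message contains several codes it can swallow everything between the first `a` and the last `-c`. The sketch below illustrates this on a hypothetical message and compares it with a tighter variant. The `[^-\s]*` class is a swapped-in suggestion, not part of the original answer.

```python
import re

# Hypothetical message containing two of the codes from the question.
message = "aaa-b3-c and a3-b6-c"

# Greedy .* spans from the first "a" to the last "-c": one long match.
greedy = re.findall(r"a.*-b.*-c", message)
print(greedy)   # ['aaa-b3-c and a3-b6-c']

# Assumed alternative: forbid hyphens and whitespace inside each part,
# so each code is matched separately.
tighter = re.findall(r"a[^-\s]*-b[^-\s]*-c", message)
print(tighter)  # ['aaa-b3-c', 'a3-b6-c']
```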

Here, we might just want to write three expressions, one per pattern, scanning each input from left to right to be safe, and then connect them with logical ORs. If we had more patterns, we could simply add them as further alternatives, similar to:

([a-z]+-[a-z]+[0-9]+-[a-z]+)
([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])
([a-z]+-[a-z]+-[a-z])

which combine into:

([a-z]+-[a-z]+[0-9]+-[a-z]+)|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])|([a-z]+-[a-z]+-[a-z])

Then, we might want to bound it with start and end anchors:

^([a-z]+-[a-z]+[0-9]+-[a-z]+)$|^([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])$|^([a-z]+-[a-z]+-[a-z])$

or

^(([a-z]+-[a-z]+[0-9]+-[a-z]+)|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])|([a-z]+-[a-z]+-[a-z]))$
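As a rough sketch of how the anchored alternation might be used in Python: because of ^ and $, it tests whole strings rather than searching inside a longer message, so the example checks a hypothetical list of candidate strings one by one.

```python
import re

# The anchored alternation from above, split across lines for readability.
combined = re.compile(
    r"^(([a-z]+-[a-z]+[0-9]+-[a-z]+)"
    r"|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])"
    r"|([a-z]+-[a-z]+-[a-z]))$"
)

# Hypothetical candidates: the three codes from the question plus two non-matches.
candidates = ["aaa-b3-c", "a3-b6-c", "aaaa-bb-c", "5/21", "aaa-b3"]

matched = [c for c in candidates if combined.match(c)]
print(matched)  # ['aaa-b3-c', 'a3-b6-c', 'aaaa-bb-c']
```

To search inside a longer message instead, the anchors would need to be dropped or replaced with word boundaries.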


RegEx

If this expression isn't what you want, it can be modified or changed on regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.
