How to search/extract patterns in a string?

I have a pattern I want to search for in my message. The patterns are:

1. "aaa-b3-c"
2. "a3-b6-c"
3. "aaaa-bb-c"

I know how to search for one of the patterns, but how do I search for all 3?

Also, how do you identify and extract dates in this format: 5/21 or 5/21/2019?

found = re.findall(r'.{3}-.{2}-.{1}', message)

The first part could use a quantifier {2,4} instead of {3}. The dot matches any character except a newline, whereas [a-zA-Z0-9] matches an uppercase or lowercase character a-z or a digit:

\b[a-zA-Z0-9]{2,4}-[a-zA-Z0-9]{2}-[a-zA-Z0-9]\b

You could add word boundaries \b or anchors ^ and $ on either side if the characters should not be part of a longer word.
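As a rough sketch of how the pattern above behaves with re.findall (the sample message is made up for illustration):

```python
import re

# Pattern from the answer above: 2-4 alphanumerics, then 2, then 1,
# separated by hyphens and wrapped in word boundaries.
CODE_RE = re.compile(r"\b[a-zA-Z0-9]{2,4}-[a-zA-Z0-9]{2}-[a-zA-Z0-9]\b")

# Made-up sample message, for illustration only.
message = "ids: aaa-b3-c, a3-b6-c and aaaa-bb-c."

found = CODE_RE.findall(message)
print(found)  # ['aaa-b3-c', 'a3-b6-c', 'aaaa-bb-c']

# The word boundaries prevent a match that starts inside a longer word.
inside_longer_word = CODE_RE.findall("xaaaaa-bb-c")
print(inside_longer_word)  # []
```

Without the \b on each side, the second call would pick up "aaaa-bb-c" from the middle of the longer token.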

For the second pattern you could also use \d with a quantifier to match digits, and an optional group to match the part with / and 4 digits:

\d{1,2}/\d{2}(?:/\d{4})?

Note that this pattern does not validate the date itself. Perhaps this page can help you create or customize a more specific date format.
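A minimal sketch of extracting the candidates first and validating them afterwards. The helper is_real_date is hypothetical, and it assumes month/day order and fills in a leap year (2000) when the year is missing; neither assumption comes from the question.

```python
import re
from datetime import datetime

# Pattern from the answer above.
DATE_RE = re.compile(r"\d{1,2}/\d{2}(?:/\d{4})?")

def is_real_date(candidate: str) -> bool:
    # Hypothetical helper: assumes month/day[/year] order. A missing year
    # is replaced by 2000 (a leap year) so that 2/29 is accepted.
    if candidate.count("/") == 1:
        candidate += "/2000"
    try:
        datetime.strptime(candidate, "%m/%d/%Y")
    except ValueError:
        return False
    return True

# Made-up sample text, for illustration only.
text = "Due 5/21, shipped 5/21/2019, bogus 13/45."

candidates = DATE_RE.findall(text)
valid = [c for c in candidates if is_real_date(c)]

print(candidates)  # ['5/21', '5/21/2019', '13/45']
print(valid)       # ['5/21', '5/21/2019']
```

The regex happily matches 13/45 because it only checks the shape; the strptime step is what rejects it.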

Try this:

found = re.findall(r'a{2,4}-b{2}-c', message)

You could use

a{2,4}-bb-c

as a pattern.


Now you need to check the match for truthiness:

match = re.search(pattern, string)
if match:
    ...  # do something here

As of Python 3.8 you can use the walrus operator instead:

if (match := re.search(pattern, string)) is not None:
    ...  # do something here
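Put together, a small sketch of the search-then-check idea could look like this. The function name first_code and the sample strings are made up for illustration.

```python
import re

# Pattern from the answer above.
pattern = r"a{2,4}-bb-c"

def first_code(message):
    """Return the first match as a string, or None if nothing matches."""
    match = re.search(pattern, message)
    if match:  # re.search returns None when there is no match
        return match.group()
    return None

print(first_code("ref aaaa-bb-c ok"))  # aaaa-bb-c
print(first_code("ref a-bb-c ok"))     # None (only one 'a')
print(first_code("aaaaaa-bb-c"))       # aaaa-bb-c
```

Note the last call: without word boundaries, re.search finds the pattern inside a longer run of "a" characters.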

Try this:

re.findall(r'a.*-b.*-c',message)
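One thing to watch with this pattern: .* is greedy, so when a line contains more than one code the match can run from the first "a" to the last "-c". A quick sketch on a made-up message, alongside a tighter variant that is only an assumption about what is wanted (no hyphens or whitespace inside a part):

```python
import re

# Made-up sample message, for illustration only.
message = "aaa-b3-c and a3-b6-c"

# Pattern from the answer above: greedy .* spans across both codes.
greedy = re.findall(r'a.*-b.*-c', message)
print(greedy)  # ['aaa-b3-c and a3-b6-c']

# Assumed alternative: each part may not contain '-' or whitespace.
tighter = re.findall(r'a[^-\s]*-b[^-\s]*-c', message)
print(tighter)  # ['aaa-b3-c', 'a3-b6-c']
```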

Here, we might just write three expressions, one per pattern, each matching its input from left to right, and connect them with logical ORs. If more patterns turn up, we can simply add another alternative, similar to:

([a-z]+-[a-z]+[0-9]+-[a-z]+)
([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])
([a-z]+-[a-z]+-[a-z])

which combine into:

([a-z]+-[a-z]+[0-9]+-[a-z]+)|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])|([a-z]+-[a-z]+-[a-z])

Then, we might want to bound it with start and end anchors:

^([a-z]+-[a-z]+[0-9]+-[a-z]+)$|^([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])$|^([a-z]+-[a-z]+-[a-z])$

or

^(([a-z]+-[a-z]+[0-9]+-[a-z]+)|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])|([a-z]+-[a-z]+-[a-z]))$
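A rough sketch of building that alternation in Python from a list, so that adding a pattern is a one-line change. The names PARTS, COMBINED and which_alternative are made up for illustration.

```python
import re

# The three expressions from above, one per input shape.
PARTS = [
    r"[a-z]+-[a-z]+[0-9]+-[a-z]+",
    r"[a-z]+[0-9]+-[a-z]+[0-9]+-[a-z]",
    r"[a-z]+-[a-z]+-[a-z]",
]

# Each part gets its own capturing group; the whole alternation is anchored.
COMBINED = re.compile("^(?:" + "|".join(f"({p})" for p in PARTS) + ")$")

def which_alternative(s):
    """Return the 1-based index of the alternative that matched, or None."""
    m = COMBINED.match(s)
    if m is None:
        return None
    # Exactly one of the capturing groups is filled in on a match.
    return next(i for i, g in enumerate(m.groups(), 1) if g is not None)

for s in ["aaa-b3-c", "a3-b6-c", "aaaa-bb-c", "aaa-b3-c!"]:
    print(s, which_alternative(s))
# aaa-b3-c 1
# a3-b6-c 2
# aaaa-bb-c 3
# aaa-b3-c! None
```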

RegEx

If this expression isn't what you wanted, it can be modified or changed on regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.
