
Regular expression to find the text following a pattern

I want to get the text "My Text Content" that immediately follows AB.00.000.

I was able to match AB.00.000 itself using the regular expression below:

([A-Z]{2,3}\.[0-9]{2}\.[0-9]{3})

How do I get the text that follows AB.00.000 in Python?

Here is the input string:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard 

AB.00.000 My Text Content

$!#"!

23:50

My Phone

It seems you want to get the rest of the line after your pattern is found.

You may use

r'\b[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)'

See the regex demo. Note that \b is a word boundary: it matches between a word char (a letter, digit or _) and a non-word char, or between a word char and the start/end of the string. The \s*(.*) part is what your pattern is missing:

  • \s* - zero or more whitespace chars
  • (.*) - Capturing group #1: zero or more chars other than line break chars, as many as possible, i.e. the rest of the line.
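As a minimal sketch of how the pieces above fit together, the pattern can be run against a shortened copy of the input from the question. The variable names (text, pattern, rest_of_line) are only illustrative.

```python
import re

# Shortened copy of the input from the question (illustrative only)
text = """Lorem Ipsum is simply dummy text of the printing and typesetting industry.

AB.00.000 My Text Content

$!#"!

23:50

My Phone"""

# Code such as AB.00.000, then optional whitespace, then the rest of the line
pattern = re.compile(r'\b[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)')

m = pattern.search(text)
# group(1) is the part captured by (.*); None if the code is not found
rest_of_line = m.group(1) if m else None
print(rest_of_line)  # My Text Content
```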

If the pattern must appear at the beginning of a line, the regex to extract the text you need will look like

r'(?m)^[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)'

See another regex demo. (?m) (the inline form of the re.M option) makes ^ match at the start of every line, not only at the start of the whole string.
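The effect of the multiline flag can be checked with a small sketch. The three-line sample string below is made up for illustration: the first line mentions a code mid-line, the other two start with one.

```python
import re

# Made-up sample: one inline mention, two lines that start with a code
text = "see AB.00.000 inline mention\nAB.11.222 First item\nXYZ.33.444 Second item"

inline_flag = r'(?m)^[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)'
no_flag = r'^[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)'

# (?m) inside the pattern and flags=re.M are two spellings of the same option
with_inline = re.findall(inline_flag, text)
with_re_m = re.findall(no_flag, text, flags=re.M)

# Without the flag, ^ only matches at the very start of the string
without_flag = re.findall(no_flag, text)

print(with_inline)   # ['First item', 'Second item']
print(with_re_m)     # ['First item', 'Second item']
print(without_flag)  # []
```

With a single capturing group, re.findall returns the captured texts directly, which is convenient when several lines start with a code.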

Python:

import re

# text is assumed to hold the input string shown in the question
m = re.search(r'\b[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)', text)
if m:
    print(m.group(1))

Note that to get the first (and here the only) parenthesized part of the match, you need to access the match group via .group(1); .group(0) or .group() returns the whole match, including the AB.00.000 code.
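One caveat that may be worth checking against your data: \s also matches line breaks, so if nothing follows the code on its own line, \s* runs on to the next line and group 1 captures that line instead. The sketch below uses a made-up two-line string, and [ \t]* is one possible replacement (an assumption that only spaces and tabs separate the code from the text).

```python
import re

# Made-up input: the code is alone on its line
text = "AB.00.000\nNext line"

# \s* consumes the newline, so (.*) captures the following line
greedy_ws = re.search(r'\b[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b\s*(.*)', text)

# [ \t]* stays on the same line, so (.*) captures an empty string
same_line = re.search(r'\b[A-Z]{2,3}\.[0-9]{2}\.[0-9]{3}\b[ \t]*(.*)', text)

print(repr(greedy_ws.group(1)))  # 'Next line'
print(repr(same_line.group(1)))  # ''
```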


 