
How to extract a substring using regex in Python?

I have a string, and I want to extract the substring that starts with a number and ends with a number.

My string is "05/24/2019 04:33 PM 582 atm1.py".

I tried the pattern below: ^\d.+\s+\d$

import re
i = "05/24/2019  04:33 PM               582 atm1.py"
print(re.match(r"^\d.+\s+\d$", i))

Expected output: "05/24/2019 04:33 PM 582". Actual output: I am getting the entire string.

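A very strict pattern, tied to this exact layout (including the literal PM):

print(re.match(r"\d+/\d+/\d+\s+\d+:\d+\s+PM\s+\d+", i).group(0))

Or use a greedy match that runs up to the last run of whitespace:

print(re.match(r".+\s+", i).group(0))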

Output:

05/24/2019  04:33 PM               582
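
One detail of the second pattern is easy to miss: because the pattern ends with \s+, the match itself includes the whitespace that follows 582. The sketch below makes that visible and trims it with rstrip(); the variable names are only illustrative.

```python
import re

s = "05/24/2019  04:33 PM               582 atm1.py"

# Greedy ".+" first runs to the end of the line, then backtracks until
# "\s+" can match, so the match ends just after the last run of whitespace.
m = re.match(r".+\s+", s)
prefix = m.group(0)

# The matched text still carries the trailing whitespace; rstrip() drops it.
trimmed = prefix.rstrip()

print(repr(prefix))
print(repr(trimmed))
```

This approach only works when the unwanted part (here the file name) contains no whitespace of its own; a name such as "my file.py" would move the cut point.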

Try the following regex: \d[\d\s:APM/]*\d

import re

s = "05/24/2019  04:33 PM               582 atm1.py"
pattern = r"\d[\d\s:APM/]*\d"
print(re.match(pattern, s).group(0))

Regex breakdown:

1. \d : a decimal digit (0-9).
2. [\d\s:APM/]* : the * means zero or more of the characters inside the square brackets. Inside the brackets are \d for digits (0-9), \s for whitespace, and :APM/ as literal characters (: for the time, A, P, and M for AM and PM, and / for the date).
3. \d : a decimal digit (0-9).

Outputs: 05/24/2019  04:33 PM               582
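
The match keeps the original runs of spaces, whereas the question shows the expected output with single spaces. If that single-spaced form is what is wanted, a minimal sketch along these lines may help. The helper name extract_listing_prefix is hypothetical, and collapsing the whitespace is an assumption about the desired result, not something the pattern does on its own.

```python
import re

# Same character-class pattern as above, compiled once for reuse.
PREFIX_RE = re.compile(r"\d[\d\s:APM/]*\d")

def extract_listing_prefix(line):
    """Hypothetical helper: return the date/time/size prefix, or None."""
    m = PREFIX_RE.match(line)
    if m is None:
        return None
    # split() with no arguments splits on any run of whitespace,
    # so joining with a single space collapses the padding.
    return " ".join(m.group(0).split())

s = "05/24/2019  04:33 PM               582 atm1.py"
print(extract_listing_prefix(s))
```

Note that the character class stops at the first character it does not list, so a file name that begins with a digit would be partly pulled into the match.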


If you want to get, from a longer string, the substring that starts at the first number that is a whole word and ends at the last number that is a whole word, you may use

r'\b\d+\b.*\b\d+\b'

Details

  • \b\d+\b - a word boundary, one or more digits, and a word boundary (no digit, letter, or underscore is allowed immediately before or after the number)
  • .* - any zero or more chars, as many as possible (without the re.DOTALL or re.S flag, only non-linebreak chars are matched)
  • \b\d+\b - a word boundary, one or more digits, and a word boundary (no digit, letter, or underscore is allowed immediately before or after the number)

In Python, use

import re
i="05/24/2019  04:33 PM               582 atm1.py"
m = re.search(r'\b\d+\b.*\b\d+\b', i)
if m:
    print(m.group()) # => 05/24/2019  04:33 PM               582
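
To illustrate the "longer string" case, here is a small sketch with made-up input lines (the "Listing:" prefix and the file name 2.py are assumptions for demonstration only). It shows that re.search finds the span even when text precedes it, that re.match does not, and that a standalone number inside the file name extends the match.

```python
import re

PATTERN = r"\b\d+\b.*\b\d+\b"

# Hypothetical longer line with text before the first number.
line = "Listing: 05/24/2019 04:33 PM 582 atm1.py"

found = re.search(PATTERN, line)      # scans the whole string
anchored = re.match(PATTERN, line)    # only tries position 0
print(found.group())
print(anchored)

# Caveat: a file name that starts with a whole-word number moves the end
# of the match, because "2" in "2.py" is bounded by a space and a dot.
tricky = re.search(PATTERN, "05/24/2019 04:33 PM 582 2.py")
print(tricky.group())
```

The 1 in atm1.py is not picked up because it follows a letter, so there is no word boundary before it.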


