I have a string. I want to extract the substring that starts with a number and ends with a number.
My string is "05/24/2019 04:33 PM 582 atm1.py"
I tried the pattern below: ^\d.+\s+\d$
import re
i = "05/24/2019 04:33 PM 582 atm1.py"
print(re.match(r"^\d.+\s+\d$", i))
Expected output: "05/24/2019 04:33 PM 582"
Actual output: I get the entire string.
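For reference, re.match returns a match object or None, not a substring. Run against the string as written, the pattern above returns None: $ requires the final \d to be the last character of the string, and this string ends in "py". The minimal sketch below (assuming Python 3) checks that, and tries one possible adjustment that is not from the question: drop $ and let the last number have several digits with \d+, so that the greedy .+ backtracks to the last whitespace followed by digits.

```python
import re

i = "05/24/2019 04:33 PM 582 atm1.py"

# Original attempt: "$" forces the final \d to be the last character of the
# string, but the string ends with "y", so there is no match at all.
original = re.match(r"^\d.+\s+\d$", i)
print(original)  # None

# Assumed variant (not from the question): drop "$" and use \d+ so the last
# number can have several digits. The greedy .+ first runs to the end of the
# string, then backtracks until \s+\d+ can match, i.e. the last whitespace
# followed by digits (" 582").
variant = re.match(r"^\d.+\s+\d+", i)
print(variant.group(0))  # 05/24/2019 04:33 PM 582
```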
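A very strict pattern that spells out the date, the time and the PM marker:
print(re.match(r"\d+/\d+/\d+\s+\d+:\d+\s+PM\s+\d+", i).group(0))
Or, to simply drop the last whitespace-separated token (the file name), use:
print(re.match(r".+\s+", i).group(0).rstrip())  # the match ends with the space before "atm1.py", so strip it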
Output:
05/24/2019 04:33 PM 582
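The first pattern is strict in a concrete way: it hard-codes PM, so a line with an AM time will not match at all. The sketch below uses one hypothetical extra listing line to show this, and an assumed relaxation, [AP]M, that accepts either marker.

```python
import re

# The first line is from the question; the second is hypothetical.
lines = [
    "05/24/2019 04:33 PM 582 atm1.py",
    "05/25/2019 09:15 AM 1024 notes.txt",
]

strict_pm = r"\d+/\d+/\d+\s+\d+:\d+\s+PM\s+\d+"
# Assumed relaxation: accept either AM or PM.
am_or_pm = r"\d+/\d+/\d+\s+\d+:\d+\s+[AP]M\s+\d+"

pm_results = [re.match(strict_pm, line) for line in lines]
ampm_results = [re.match(am_or_pm, line).group(0) for line in lines]

print(pm_results[0].group(0))  # 05/24/2019 04:33 PM 582
print(pm_results[1])           # None: the literal "PM" does not match "AM"
print(ampm_results)
```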
Try the following regex: \d[\d\s:APM/]*\d
import re
s = "05/24/2019 04:33 PM 582 atm1.py"
pattern = r"\d[\d\s:APM/]*\d"
print(re.match(pattern, s).group(0))
Regex breakdown:
1. \d: a digit (0-9)
2. [\d\s:APM/]*: the * means any number (including zero) of the characters inside the square brackets. Inside the brackets we have \d for digits (0-9), \s for whitespace, and :APM/ for those literal characters (: for the time, the letters A, P and M for AM and PM, and / for the date).
3. \d: a digit (0-9)
Output: 05/24/2019 04:33 PM 582
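Because the character class only lists digits, whitespace, ":", "/" and the letters A, P and M, the match stops at the first other character, which is why "atm1.py" is cut off (lowercase "a" is not in the class). A sketch applying the same pattern to a few hypothetical listing lines shows that it handles AM times, and also one limitation: if a file name starts with a digit, that digit is swallowed too.

```python
import re

pattern = r"\d[\d\s:APM/]*\d"

# Only the first line comes from the question; the others are hypothetical.
lines = [
    "05/24/2019 04:33 PM 582 atm1.py",
    "06/01/2019 09:05 AM 17 run.sh",
    "06/02/2019 11:40 PM 90 2fa.py",
]

results = [re.match(pattern, line).group(0) for line in lines]
for r in results:
    print(r)
# The last result is "06/02/2019 11:40 PM 90 2": the leading "2" of the
# file name is a digit, so the class keeps going and the final \d matches it.
```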
If you want the substring that starts with the first number as a whole word and ends with the last number as a whole word in a longer string, you may use
r'\b\d+\b.*\b\d+\b'
Details
- \b\d+\b - a word boundary, one or more digits and a word boundary (no digits, letters or underscores are allowed immediately before or after)
- .* - any 0+ chars, as many as possible (without the re.DOTALL or re.S flag, only non-linebreak chars are matched)
- \b\d+\b - a word boundary, one or more digits and a word boundary, as in the first part
In Python, use
import re
i="05/24/2019 04:33 PM 582 atm1.py"
m = re.search(r'\b\d+\b.*\b\d+\b', i)
if m:
    print(m.group())  # => 05/24/2019 04:33 PM 582
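The word boundaries are what keep "atm1.py" out of the match: the 1 is preceded by the letter m, so \b cannot match before it. The sketch below (with hypothetical extra inputs) checks that, shows that a file name starting with a digit such as "2fa.py" is also excluded, and shows the effect of re.DOTALL on a two-line string.

```python
import re

pattern = r"\b\d+\b.*\b\d+\b"

# The question's string plus a hypothetical file name that starts with a
# digit. In "2fa.py" the "2" is followed by a letter, so there is no word
# boundary after it and it cannot end the match; likewise the "1" in
# "atm1.py" has no word boundary before it.
single = [
    "05/24/2019 04:33 PM 582 atm1.py",
    "06/02/2019 11:40 PM 90 2fa.py",
]
single_results = [re.search(pattern, s).group() for s in single]
print(single_results)

# Hypothetical two-line input: without re.DOTALL, "." does not match "\n",
# so the match stays on the first line; with re.DOTALL it runs on to the
# last stand-alone number on the second line.
text = "05/24/2019 04:33 PM 582 atm1.py\n05/25/2019 10:00 AM 7 b.py"
no_dotall = re.search(pattern, text).group()
with_dotall = re.search(pattern, text, re.DOTALL).group()
print(repr(no_dotall))
print(repr(with_dotall))
```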