
How to extract a substring using regex in Python?

I have a string. I want to extract the substring that starts with a number and ends with a number.

My string is "05/24/2019 04:33 PM 582 atm1.py"

I tried the pattern below: ^\d.+\s+\d$

import re
i = "05/24/2019  04:33 PM               582 atm1.py"
print(re.match(r"^\d.+\s+\d$", i))

Expected output: "05/24/2019 04:33 PM 582". Actual output: I am getting the entire string.

A very strict pattern:

print(re.match(r"\d+/\d+/\d+\s+\d+:\d+\s+PM\s+\d+", i).group(0))

Or use:

print(re.match(r".+\s+", i).group(0))

Output:

05/24/2019  04:33 PM               582
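
One detail that printed output hides: the second pattern, `.+\s+`, also captures the whitespace that follows 582. The sketch below makes this visible with `repr()` and trims it with `str.rstrip()`. The variable names `raw` and `trimmed` are illustrative and not part of the original answer.

```python
import re

i = "05/24/2019  04:33 PM               582 atm1.py"

# Greedy .+ first runs to the end of the string, then backtracks
# until \s+ can match the last run of whitespace.
m = re.match(r".+\s+", i)
raw = m.group(0)
print(repr(raw))      # the space after 582 is part of the match

# Remove the trailing whitespace so the result ends with a digit.
trimmed = raw.rstrip()
print(repr(trimmed))
```

If the trailing whitespace matters for later processing, trimming the match (or using one of the patterns that ends on a digit) avoids the problem.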

Try the following regex: "\d[\d\s:APM/]*\d"

import re

s = "05/24/2019  04:33 PM               582 atm1.py"
pattern = r"\d[\d\s:APM/]*\d"
print(re.match(pattern, s).group(0))

Regex breakdown:

1. \d : a decimal digit (0-9).
2. [\d\s:APM/]* : any number of the characters inside the square brackets (the * means zero or more). Inside the brackets, \d stands for digits (0-9), \s for whitespace, and :APM/ for those literal characters (: for the time, A, P and M for AM and PM, and / for the date).
3. \d : a decimal digit (0-9).

Outputs: 05/24/2019  04:33 PM               582
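
As a quick check of the breakdown above, the same pattern can be tried on more than one line. This is only a sketch: the second and third sample strings are made up for illustration and do not come from the question.

```python
import re

pattern = r"\d[\d\s:APM/]*\d"

samples = [
    "05/24/2019  04:33 PM               582 atm1.py",     # from the question
    "11/02/2020  09:15 AM              1024 notes.txt",   # hypothetical AM line
    "05/24/2019  04:33 PM               582 1file.py",    # hypothetical: name starts with a digit
]

# re.match anchors at the start of each string.
results = [re.match(pattern, s).group(0) for s in samples]
for r in results:
    print(repr(r))
```

The third sample shows a limitation: because the character class allows digits and whitespace, a file name that begins with a digit is pulled into the match.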

Demo

If you want to get a substring from a longer string that starts with the first number as a whole word and ends with the last number as a whole word, you may use

r'\b\d+\b.*\b\d+\b'

Details

  • \b\d+\b - a word boundary, one or more digits, and a word boundary (no digits, letters or underscores are allowed immediately before or after)
  • .* - any 0+ chars (without the re.DOTALL or re.S flag, only non-linebreak chars are matched), as many as possible
  • \b\d+\b - a word boundary, one or more digits, and a word boundary (no digits, letters or underscores are allowed immediately before or after)

In Python, use

import re
i="05/24/2019  04:33 PM               582 atm1.py"
m = re.search(r'\b\d+\b.*\b\d+\b', i)
if m:
    print(m.group()) # => 05/24/2019  04:33 PM               582

See the Python demo.
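
To illustrate why the word boundaries matter, the sketch below compares the pattern with and without `\b`. Without them, the `1` inside `atm1.py` counts as the last digit, so the match runs into the file name. The names `with_boundaries` and `without_boundaries` are illustrative only.

```python
import re

i = "05/24/2019  04:33 PM               582 atm1.py"

# With \b, the final number must stand alone as a whole word,
# so the "1" in "atm1" (preceded by a letter) is rejected.
with_boundaries = re.search(r'\b\d+\b.*\b\d+\b', i).group()

# Without \b, greedy .* backtracks only as far as the last digit
# anywhere in the string, which is the "1" in "atm1".
without_boundaries = re.search(r'\d+.*\d+', i).group()

print(repr(with_boundaries))
print(repr(without_boundaries))
```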
