I was trying to scrape length information from a website using the following simple code:
import re

# Raw string so that \s and \d reach the regex engine unchanged.
matches = re.findall(r'(?<=Length:\s\s)[:\d]+', response.text)
if len(matches) > 0:
    data['Length'] = matches[0]
else:
    data['Length'] = '00:00'
However, it only captures the length when it is under one hour. For example, it gets 51:00 but not 01:08:47. I checked the page source for entries both shorter and longer than one hour. Here is how they look. It seems that for lengths over one hour there is one fewer white space. So I tried the following, but this time the result only contains white space. Does anybody know how to get both the short and the long values? Thank you very much!
matches = re.findall(r'(?<=Length:)[\s:\d]+', response.text)
if len(matches) > 0:
    data['Length'] = matches[0]
else:
    data['Length'] = '00:00'
You need '(?<=Length:)\\s*(\\d\\d[\\s*:\\s*\\d\\d]+)'.
Try this Regex and extract whatever is present in group 1:
Length\s*:\s*(\d+\s*(?::\s*\d+\s*){1,2})
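As a rough sketch of how this pattern might be applied in Python: the helper name extract_length, the default value, and the sample strings below are assumptions made for illustration, not part of the original answer. Because the final \s* can pull trailing white space into group 1, the sketch strips the captured text.

```python
import re

# Pattern from the answer above; group 1 holds the duration.
LENGTH_RE = re.compile(r'Length\s*:\s*(\d+\s*(?::\s*\d+\s*){1,2})')


def extract_length(text, default='00:00'):
    """Return the first duration found after 'Length:' in text, or default.

    Hypothetical helper for illustration only.
    """
    match = LENGTH_RE.search(text)
    if match is None:
        return default
    # The trailing \s* may capture white space after the digits, so strip it.
    return match.group(1).strip()


# Hypothetical sample inputs: two spaces before a short duration,
# one space before a long one, and a string with no duration at all.
print(extract_length('Length:  51:00'))
print(extract_length('Length: 01:08:47'))
print(extract_length('no duration here'))
```

In the question's code this would replace the findall call, for example data['Length'] = extract_length(response.text). Note that any white space around the inner colons is kept as captured; only the ends are stripped.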
Explanation:
Length\s* - matches Length literally, followed by 0+ occurrences of a white-space, as many as possible
:\s* - matches a : followed by 0+ white-spaces
\d+\s* - matches 1+ occurrences of a digit followed by 0+ white-spaces. We start capturing the text from here in Group 1. We capture until the end of the match.
(?::\s*\d+\s*){1,2} - matches either 1 or 2 occurrences of the pattern (?::\s*\d+\s*)
    (?:) - indicates a non-capturing group
    :\s* - matches a : followed by 0+ occurrences of a white-space
    \d+ - matches 1+ occurrences of a digit
    \s* - matches 0+ occurrences of a white-space
Alternative Regex (without any group):