Python RegEx: unable to capture all the data (Python 3.6, Scrapy)

Question

I was trying to scrape length information from a website using the following simple code:

import re

# Take the first duration that follows "Length:" plus two white-space characters.
matches = re.findall(r'(?<=Length:\s\s)[:\d]+', response.text)
if matches:
    data['Length'] = matches[0]
else:
    data['Length'] = '00:00'

However, it only gets the information if the length is less than one hour. For example, it gets 51:00 but not 01:08:47. I checked the page source for lengths both shorter and longer than one hour, and it seems that for lengths of more than one hour there is one fewer white-space character. So I tried the code below, but this time the returned list contains only white space. Does anybody know how to get both the short and the long lengths? Thank you very much!

# Second attempt: allow white space inside the character class.
matches = re.findall(r'(?<=Length:)[\s:\d]+', response.text)
if matches:
    data['Length'] = matches[0]
else:
    data['Length'] = '00:00'

Answer 1

You need r'(?<=Length:)\s*(\d\d[\s*:\s*\d\d]+)'.
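
A minimal sketch of how this pattern might be applied, assuming the page text resembles the hypothetical samples below (the real markup is not shown in the question). Note that the square brackets form a character class, so [\s*:\s*\d\d]+ matches any run of white space, "*", ":" and digits; the captured value can therefore carry trailing white space, which the helper strips. The function name extract_length and the sample strings are illustrative assumptions.

```python
import re

# Pattern from the answer above; group 1 holds the duration.
LENGTH_RE = re.compile(r'(?<=Length:)\s*(\d\d[\s*:\s*\d\d]+)')


def extract_length(text, default='00:00'):
    """Return the first duration found after 'Length:', or the default."""
    match = LENGTH_RE.search(text)
    # The character class also accepts white space, so strip the capture.
    return match.group(1).strip() if match else default


# Hypothetical samples: two spaces for the short form, one for the long form.
short_text = 'Length:  51:00 Views: 10'
long_text = 'Length: 01:08:47 Views: 10'

print(extract_length(short_text))
print(extract_length(long_text))
print(extract_length('no duration here'))
```

In a Scrapy callback, response.text would presumably be passed in place of the sample strings.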

Answer 2

Try this regex and extract whatever is present in group 1:

Length\s*:\s*(\d+\s*(?::\s*\d+\s*){1,2})


Explanation:

Length\s* - matches "Length" literally, followed by 0+ white-space characters (as many as possible)
:\s* - matches a ":" followed by 0+ white-space characters
\d+\s* - matches 1+ digits followed by 0+ white-space characters. Group 1 starts capturing here and continues to the end of the match.
(?::\s*\d+\s*){1,2} - matches 1 or 2 occurrences of the sub-pattern (?::\s*\d+\s*)
- (?:) - indicates a non-capturing group
- :\s* - matches a ":" followed by 0+ white-space characters
- \d+ - matches 1+ digits
- \s* - matches 0+ white-space characters
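
The explanation above can be sketched as a small helper. This is only an illustration under assumptions: the function name parse_length and the sample strings are hypothetical, and the real page markup may differ. Because group 1 can include white space around the colons or after the last number, the sketch removes it before returning.

```python
import re

# Group 1 captures MM:SS or HH:MM:SS; white space around each colon is optional.
LENGTH_RE = re.compile(r'Length\s*:\s*(\d+\s*(?::\s*\d+\s*){1,2})')


def parse_length(text, default='00:00'):
    """Return the duration in group 1 with white space removed, or the default."""
    match = LENGTH_RE.search(text)
    if match is None:
        return default
    # Drop any white space captured inside or after the duration.
    return re.sub(r'\s+', '', match.group(1))


# Hypothetical inputs covering both spacings described in the question.
print(parse_length('Length:  51:00\n'))
print(parse_length('Length: 01:08:47</td>'))
print(parse_length('Length : 1 : 02 : 03'))
print(parse_length('nothing to see'))
```

Since the white space after "Length:" is matched with \s* outside any lookbehind, the same pattern covers the two-space and one-space cases.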

Alternative regex (without a capturing group):

(?<=Length:\s\s)\d+\s*(?::\s*\d+\s*){1,2}
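
One caveat worth checking before relying on the lookbehind variant: Python's standard re module only accepts fixed-width lookbehinds, so (?<=Length:\s\s) requires exactly two white-space characters immediately before the digits and cannot be loosened to \s*. The sketch below compares the two patterns on hypothetical sample strings (the real page source is not shown in the question).

```python
import re

# Lookbehind variant: no capturing group, so findall returns whole matches.
LOOKBEHIND_RE = re.compile(r'(?<=Length:\s\s)\d+\s*(?::\s*\d+\s*){1,2}')
# Group variant: findall returns the contents of group 1.
GROUP_RE = re.compile(r'Length\s*:\s*(\d+\s*(?::\s*\d+\s*){1,2})')

# Hypothetical samples mirroring the spacing difference the question describes.
samples = {
    'two spaces': 'Length:  51:00',
    'one space': 'Length: 01:08:47',
}

results = {}
for name, text in samples.items():
    results[name] = (LOOKBEHIND_RE.findall(text), GROUP_RE.findall(text))
    print(name, results[name])
```

If the longer durations really are preceded by one fewer white-space character, the group-based pattern is likely the safer choice of the two.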

Answer 1
1 Accepted 2018-01-14 03:45:27

Answer 2
1 2018-01-14 04:44:19
