Python - Using Regex to extract yyyyMMddhhmmss from a file

Question

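I am trying to use a regular expression to extract a date (in the format yyyymmddhhmmss) from a string, but I cannot find the right pattern.

I am trying the following code: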

import re
string = "date file /20190529050003/folder "
regex = re.compile(r'\b\d{4}\d{2}\d{2}\s\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0],
print(result)

But I get the following error:

result = regex.findall(string)[0],
IndexError: list index out of range

How can I use a regular expression to return "20190529050003" from the string in my script?

Thanks!

Answer 1

If the date appears right after a slash, we can simply use this expression:

.+\/(\d{4})(\d{2})(\d{2}).+

Then, if necessary, we can add more boundaries, for example by also capturing the hour, minute, and second fields:

.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}).+

Or, anchored to the whole line and requiring a slash on both sides:

^.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\/.+$

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r".+\/(\d{4})(\d{2})(\d{2}).+"

test_str = "date file /20190529050003/folder "

subst = "\\1-\\2-\\3"

# count=0 replaces every match; pass a positive count to limit the replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
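
As a hedged sketch, the six-group variant of the expression shown earlier can also be used with `re.search` to read the fields directly instead of substituting them. The group names and the helper name `split_timestamp` are illustrative assumptions, not part of the original answer.

```python
import re

# Named groups are an illustrative choice; the answer itself uses numbered groups.
TIMESTAMP = re.compile(
    r"/(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})/"
)


def split_timestamp(text):
    """Return the six timestamp fields as a dict, or None if nothing matches."""
    match = TIMESTAMP.search(text)
    return match.groupdict() if match else None


fields = split_timestamp("date file /20190529050003/folder ")
print(fields)
```

Unlike `re.sub`, which returns the input unchanged when the pattern does not match, this helper returns `None` in that case, so a failed match is easy to detect.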

If we want to capture all the digits between the slashes, we can use another expression:

.+\/(\d+)\/.+

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r".+\/(\d+)\/.+"

test_str = "date file /20190529050003/folder "

subst = "\\1"

# count=0 replaces every match; pass a positive count to limit the replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Answer 2

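Your regex pattern is off because the target timestamp contains no whitespace, yet the pattern requires a \s in the middle. Here is a simple way to do the search: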

import re

string = "date file /20190529050003/folder "
matches = re.findall(r'\b\d{14}\b', string)
print(matches)

This prints:

['20190529050003']

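We could try to make the pattern more targeted, for example by allowing only valid values for the hour, minute, and other fields. That would take more work, though, and if you do not expect any 14-digit numbers in the text that are not timestamps, I suggest not making the pattern more complicated than it has to be.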
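
If field validation does turn out to be needed, one option that keeps the pattern simple is to let `datetime.strptime` reject impossible values after the regex has found a 14-digit candidate. This is a minimal sketch under that assumption; the helper name `find_timestamps` is hypothetical.

```python
import re
from datetime import datetime


def find_timestamps(text):
    """Return datetimes for every 14-digit run that is a valid yyyyMMddhhmmss value."""
    found = []
    for candidate in re.findall(r"\b\d{14}\b", text):
        try:
            found.append(datetime.strptime(candidate, "%Y%m%d%H%M%S"))
        except ValueError:
            # 14 digits, but not a real date/time (for example month 13); skip it.
            continue
    return found


print(find_timestamps("date file /20190529050003/folder "))
```

The regex stays as short as in the answer, and the calendar rules (month lengths, hour ranges) are delegated to the standard library rather than encoded in the pattern.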

Answer 3

Remove the \s from the expression.

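import re

string = "date file /20190529050003/folder "
# Same pattern as in the question, minus the \s between the date and time parts
regex = re.compile(r'\b\d{4}\d{2}\d{2}\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0]
print(result)  # 20190529050003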

Answer 4

I suggest splitting the line that causes the error into two lines:

matches = regex.findall(string)
result = matches[0]

Now you can inspect matches to see what it contains.
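
A small sketch of that idea, reusing the pattern from the question: once the call and the indexing are separated, the empty list can be checked before it is indexed. The fallback to `None` is an illustrative choice, not something the original answer prescribes.

```python
import re

string = "date file /20190529050003/folder "
# The original pattern from the question: \s requires whitespace that the input lacks.
regex = re.compile(r"\b\d{4}\d{2}\d{2}\s\d{2}\d{2}\d{2}\b")

matches = regex.findall(string)
if matches:
    result = matches[0]
else:
    # An empty list means the pattern never matched, so matches[0] would raise IndexError.
    result = None

print(matches, result)
```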

4 answers

Answer 1: 3, accepted, 2019-05-29 16:48:09
Answer 2: 1, 2019-05-29 16:45:20
Answer 3: 1, 2019-05-29 16:48:25
Answer 4: 0, 2019-05-29 16:45:42
