Python - Extract yyyyMMddhhmmss from file using Regex

I am trying to use a regex to get a date (in yyyymmddhhmmss format) out of a string, but I can't work out which pattern to use.

I am trying the following code:

import re
string = "date file /20190529050003/folder "
regex = re.compile(r'\b\d{4}\d{2}\d{2}\s\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0],
print(result)

But I get the following error:

result = regex.findall(string)[0],
IndexError: list index out of range

How can I use a regex to return "20190529050003" from the string in my script?

Thanks!

If the date comes right after a slash, we can simply use an expression such as:

.+\/(\d{4})(\d{2})(\d{2}).+

Then, if necessary, we can add more boundaries, for example by capturing the time fields as well:

.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}).+

Or:

^.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\/.+$
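As a rough sketch of how the grouped expression could be used for extraction rather than substitution, the version below gives each field a name and reads it back with `re.search`. The group names and the `fields` variable are illustrative assumptions, not part of the answer above.

```python
import re

# Named groups make each field of the yyyyMMddhhmmss timestamp addressable.
# The group names are illustrative choices, not taken from the original answer.
pattern = re.compile(
    r"/(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})/"
)

test_str = "date file /20190529050003/folder "

match = pattern.search(test_str)
# Convert every captured field to int; fall back to an empty dict on no match.
fields = {name: int(value) for name, value in match.groupdict().items()} if match else {}
print(fields)
```

Because `re.search` scans the string, the leading `.+` and trailing `.+` are not needed here; the two slashes alone anchor the timestamp.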

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r".+\/(\d{4})(\d{2})(\d{2}).+"

test_str = "date file /20190529050003/folder "

subst = "\\1-\\2-\\3"

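# count=0 replaces every match; change it to limit the number of replacements.
# count and flags are passed by keyword, since passing them positionally is deprecated as of Python 3.13.
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)  # 2019-05-29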

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

If we want to get all of the digits, we can use another expression:

.+\/(\d+)\/.+

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r".+\/(\d+)\/.+"

test_str = "date file /20190529050003/folder "

subst = "\\1"

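# count=0 replaces every match; change it to limit the number of replacements.
# count and flags are passed by keyword, since passing them positionally is deprecated as of Python 3.13.
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)  # 20190529050003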

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
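One thing worth noting about the `re.sub` approach: when nothing matches, `re.sub` returns the input string unchanged, so the `if result:` check cannot tell a successful extraction from a failed one. A minimal sketch of the same idea using `re.search` is shown below; the helper name `extract_digits` is hypothetical.

```python
import re

# Digits enclosed in slashes; search() scans, so no surrounding .+ is needed.
pattern = re.compile(r"/(\d+)/")

def extract_digits(text):
    """Return the first run of digits enclosed in slashes, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(extract_digits("date file /20190529050003/folder "))  # 20190529050003
print(extract_digits("no timestamp here"))                   # None

# re.sub, by contrast, hands back the input untouched when nothing matches.
print(re.sub(r".+\/(\d+)\/.+", r"\1", "no timestamp here"))  # no timestamp here
```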

RegEx Circuit

jex.im visualizes regular expressions:

[Figure: jex.im visualization of the expression]

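Your regex pattern is off because the target timestamp contains no whitespace, yet the pattern requires \s in the middle. Here is a simple way to do the search:

import re

string = "date file /20190529050003/folder "
matches = re.findall(r'\b\d{14}\b', string)
print(matches)

This prints: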

['20190529050003']

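We could try to make the pattern more targeted, for example by only allowing valid values for the hour, minute, and other fields. However, that would take more work, and if you don't expect the text to contain any 14-digit numbers that are not timestamps, I would suggest not making the pattern more complicated than it has to be.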
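If stricter validation does turn out to be needed, one alternative to a more elaborate pattern is to keep the simple `\d{14}` regex and let `datetime.strptime` reject impossible values. This is only a sketch of that idea, not something the answer above proposes; the helper name `find_timestamps` is made up for illustration.

```python
import re
from datetime import datetime

def find_timestamps(text):
    """Return datetimes for every 14-digit run that parses as yyyyMMddhhmmss."""
    found = []
    for candidate in re.findall(r"\b\d{14}\b", text):
        try:
            found.append(datetime.strptime(candidate, "%Y%m%d%H%M%S"))
        except ValueError:
            # 14 digits, but not a real date or time (e.g. month 13)
            continue
    return found

print(find_timestamps("date file /20190529050003/folder "))
print(find_timestamps("id /20191399999999/ is not a timestamp"))
```

The first call yields one `datetime` for 2019-05-29 05:00:03; the second yields an empty list because month 13 does not parse.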

Removed the \s from the expression:

import re

string = "date file /20190529050003/folder "
regex = re.compile(r'\b\d{4}\d{2}\d{2}\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0]
print(result)  # 20190529050003

I suggest splitting the line that causes the error into two lines:

matches = regex.findall(string)
result = matches[0]

Now you can inspect matches to see what it contains.
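Building on that split, a guard around the index avoids the IndexError altogether when the list is empty. The sketch below assumes the corrected 14-digit pattern; the function name `first_timestamp` is hypothetical. Note also that the trailing comma in the question's `result = regex.findall(string)[0],` would wrap the value in a one-element tuple, so it is left out here.

```python
import re

regex = re.compile(r"\b\d{14}\b")

def first_timestamp(text):
    """Return the first 14-digit run in text, or None if there is none."""
    matches = regex.findall(text)
    if not matches:
        # Nothing matched: indexing matches[0] here would raise IndexError.
        return None
    return matches[0]

print(first_timestamp("date file /20190529050003/folder "))  # 20190529050003
print(first_timestamp("date file /2019-05-29/folder "))      # None
```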
