Python - Extract yyyyMMddhhmmss from file using Regex
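I am trying to use a regular expression to extract a date (in yyyymmddhhmmss format) from a string, but I can't find the right pattern.
I am trying the following code: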
import re
string = "date file /20190529050003/folder "
regex = re.compile(r'\b\d{4}\d{2}\d{2}\s\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0],
print(result)
But I get the following error:
result = regex.findall(string)[0],
IndexError: list index out of range
How can I use a regular expression to return "20190529050003" from the string in my script?
Thanks!
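If the date always comes right after a slash, we can simply use the following expression, which captures the year, month, and day: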
.+\/(\d{4})(\d{2})(\d{2}).+
Then, if necessary, we can add more boundaries, for example by also capturing the hour, minute, and second:
.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}).+
Or:
^.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\/.+$
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r".+\/(\d{4})(\d{2})(\d{2}).+"
test_str = "date file /20190529050003/folder "
subst = "\\1-\\2-\\3"
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
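The six-group expression above is only shown as a pattern, so here is a minimal sketch of how its groups could be turned into a datetime object. The helper name parse_timestamp is hypothetical, and the sketch assumes the timestamp is wrapped in slashes as in the sample string.

```python
import re
from datetime import datetime

# Six capture groups: year, month, day, hour, minute, second.
PATTERN = re.compile(r"/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/")


def parse_timestamp(text):
    """Hypothetical helper: return a datetime for the first /yyyymmddhhmmss/ found, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    year, month, day, hour, minute, second = map(int, match.groups())
    # datetime() raises ValueError if a field is out of range (e.g. month 13).
    return datetime(year, month, day, hour, minute, second)


timestamp = parse_timestamp("date file /20190529050003/folder ")
print(timestamp.isoformat())  # 2019-05-29T05:00:03
```

Using search() instead of sub() means the leading and trailing .+ are not needed, since only the slash-delimited part has to match.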
If we want to capture all the digits between the slashes, we can use another expression:
.+\/(\d+)\/.+
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r".+\/(\d+)\/.+"
test_str = "date file /20190529050003/folder "
subst = "\\1"
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
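One caveat with the re.sub approach: when the pattern does not match, re.sub returns the input string unchanged, so the "if result:" check cannot tell a hit from a miss. A minimal alternative sketch with re.search follows; the function name extract_digits is hypothetical.

```python
import re


def extract_digits(text):
    """Hypothetical helper: return the first run of digits enclosed in slashes, or None."""
    match = re.search(r"/(\d+)/", text)
    # group(1) is the digit run without the surrounding slashes.
    return match.group(1) if match else None


print(extract_digits("date file /20190529050003/folder "))  # 20190529050003
print(extract_digits("no timestamp here"))                   # None
```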
The expressions above can be visualized with jex.im.
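Your regex pattern is off because the target timestamp contains no whitespace, yet the pattern requires \s between the date and the time. Here is a simple way to do the search: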
import re
string = "date file /20190529050003/folder "
matches = re.findall(r'\b\d{14}\b', string)
print(matches)
This prints:
['20190529050003']
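We could try to make the pattern more targeted, for example by only allowing valid values for the hour, minute, and other fields. However, that would take more work, and if you don't expect any 14-digit numbers in your text that are not timestamps, I would suggest not making the pattern more complicated than it needs to be.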
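If stricter checking is ever needed, one option that keeps the pattern simple is to let datetime.strptime do the validation after the match. This is only a sketch of that idea, and the function name find_timestamps is hypothetical.

```python
import re
from datetime import datetime


def find_timestamps(text):
    """Hypothetical helper: return datetimes for every 14-digit run that parses as yyyymmddhhmmss."""
    found = []
    for candidate in re.findall(r"\b\d{14}\b", text):
        try:
            found.append(datetime.strptime(candidate, "%Y%m%d%H%M%S"))
        except ValueError:
            # 14 digits, but not a valid calendar date and time, so skip it.
            continue
    return found


print(find_timestamps("date file /20190529050003/folder "))
# [datetime.datetime(2019, 5, 29, 5, 0, 3)]
print(find_timestamps("id 99999999999999 is not a date"))
# []
```

The regex stays at \d{14}, and invalid values such as month 99 are rejected by strptime rather than by a longer pattern.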
I removed the \s from the expression.
string = "date file /20190529050003/folder "
regex = re.compile(r'\b\d{4}\d{2}\d{2}\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0]
'20190529050003'
I suggest splitting the line that raises the error into two lines:
matches = regex.findall(string)
result = matches[0]
Now you can inspect matches to see what it contains.
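Building on that split, here is a small sketch that checks the list before indexing it, so an empty result no longer raises IndexError. The function name first_timestamp is hypothetical. Note also that the trailing comma in the question's "result = regex.findall(string)[0]," would make result a one-element tuple rather than a string, so it is dropped here.

```python
import re


def first_timestamp(text):
    """Hypothetical helper: return the first 14-digit run in text, or None if there is none."""
    matches = re.findall(r"\b\d{14}\b", text)
    # An empty list means the pattern never matched; indexing it would raise IndexError.
    if not matches:
        return None
    return matches[0]


print(first_timestamp("date file /20190529050003/folder "))  # 20190529050003
print(first_timestamp("date file /2019-05-29/folder "))      # None
```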