Python RegEx: Extract timestamp between square brackets
I have source data, shown below:
14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
I want to extract the data between the square brackets, for example 21/Jun/2019:15:45:24 -0700.
I have written a regex, but it does not look optimal. Is there a better way to achieve the desired result?
re.findall(r"([0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\s-[0-9]{4})", data)
I have also tried the lookarounds ?<= and ?=, but the problem is the special characters in the data. Any suggestion or input will be appreciated.
I would simplify your regex pattern and just match a leading IP address, followed by a dash, a username, and then a timestamp term inside square brackets.
import re

inp = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048"""
timestamps = re.findall(r'^\d+(?:\.\d+){3} - \w+ \[(.*?)\]', inp, flags=re.M)
print(timestamps)
This prints:
[
'21/Jun/2019:15:45:24 -0700',
'21/Jun/2019:15:45:25 -0700',
'21/Jun/2019:15:45:27 -0700',
'21/Jun/2019:15:45:28 -0700'
]
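As a rough sketch of how the pieces of that pattern line up (leading address, dash, username, bracketed timestamp), the same regex can be written in verbose mode with named groups. The names LOG_LINE, user, and timestamp are made up here for illustration and are not part of the answer above; the sample uses the first two lines of the question's data.

```python
import re

# First two lines of the sample data from the question.
inp = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554"""

# Hypothetical name; same structure as the one-line pattern, spelled out.
# In verbose mode a literal space has to be escaped as "\ ".
LOG_LINE = re.compile(
    r"""
    ^\d+(?:\.\d+){3}            # four dot-separated numbers at the start of a line
    \ -\                        # the " - " separator
    (?P<user>\w+)               # username
    \ \[(?P<timestamp>.*?)\]    # everything between the square brackets
    """,
    re.MULTILINE | re.VERBOSE,
)

# Collect (user, timestamp) pairs, one per matching line.
rows = [(m["user"], m["timestamp"]) for m in LOG_LINE.finditer(inp)]
print(rows)
```

Anchoring on the start of the line means brackets that appear later in a line (for example inside the request path) would not be picked up by mistake.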
This might be what you've been looking for: re.findall(r"(?<=\[).*?(?=\])", data); it returns ['21/Jun/2019:15:45:24 -0700'] for your first line.
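A minimal sketch of the lookaround approach, run on the first two lines of the question's data. The "special characters" issue from the question goes away once the brackets are escaped as \[ and \]. For comparison, it also shows a plain capture group, which is an alternative spelling rather than something this answer proposed; the variable names are illustrative.

```python
import re

# First two lines of the sample data from the question.
data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554"""

# Lookarounds: the escaped brackets must surround the match
# but are not included in the returned text.
with_lookarounds = re.findall(r"(?<=\[).*?(?=\])", data)

# Capture group: the brackets are matched, and findall returns only the group.
with_group = re.findall(r"\[(.*?)\]", data)

print(with_lookarounds)
print(with_group)
```

On this data both calls return the same list of timestamps; the capture-group form is often a little easier to read.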
Another option would be to use .split(), like data.split('[')[1].split(']')[0]. Note that this returns only the first bracketed value in data.
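To get every timestamp with this approach, the split can be applied line by line. The sketch below assumes one log entry per line; bracketed_field is a hypothetical helper name, and it returns None for lines without a bracketed field instead of raising IndexError.

```python
# First two lines of the sample data from the question.
data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554"""


def bracketed_field(line):
    """Return the text between the first '[' and the next ']', or None."""
    if "[" not in line:
        return None
    # Everything after the first opening bracket.
    rest = line.split("[", 1)[1]
    if "]" not in rest:
        return None
    # Everything before the first closing bracket that follows.
    return rest.split("]", 1)[0]


timestamps = [bracketed_field(line) for line in data.splitlines()]
print(timestamps)
```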