Python RegEx: Extract timestamp between square brackets

I have the following source data:

14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048

I want to extract the data between the square brackets, for example 21/Jun/2019:15:45:24 -0700.

I have written a regex, but it does not look optimal. Is there a better way to achieve the desired result?

re.findall(r"([0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\s-[0-9]{4})", data)

I have also tried ?<= and ?=, but the special characters in the data cause problems. Any suggestions or input would be appreciated.

I would simplify your regex pattern and just match a leading IP address, followed by a dash, a username, and then a timestamp term inside square brackets.

import re

inp = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048"""

# re.M lets ^ match at the start of every line; the lazy group captures the text inside [...]
timestamps = re.findall(r'^\d+(?:\.\d+){3} - \w+ \[(.*?)\]', inp, flags=re.M)
print(timestamps)

This prints:

[
    '21/Jun/2019:15:45:24 -0700',
    '21/Jun/2019:15:45:25 -0700',
    '21/Jun/2019:15:45:27 -0700',
    '21/Jun/2019:15:45:28 -0700'
]
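
If the IP address and username are also of interest, the same pattern can be written with named groups. This is only a sketch of that idea: the names LOG_LINE, ip, user and timestamp are illustrative and do not come from the original answer, and the sample is shortened to two lines.

```python
import re

# Hypothetical names: LOG_LINE and the group names are illustrative only.
LOG_LINE = re.compile(
    r'^(?P<ip>\d+(?:\.\d+){3}) - (?P<user>\w+) \[(?P<timestamp>[^\]]*)\]',
    flags=re.M,  # ^ matches at the start of each line
)

inp = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
187.109.797.1798 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554"""

# One dict per matching line, keyed by group name.
records = [m.groupdict() for m in LOG_LINE.finditer(inp)]
for record in records:
    print(record["ip"], record["user"], record["timestamp"])
```

Lines that do not follow the "IP - user [timestamp]" layout are simply skipped, which may or may not be what you want for real log files.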

This might be what you've been looking for: re.findall(r"(?<=\[).*?(?=\])", data); it returns ['21/Jun/2019:15:45:24 -0700'] for your first line.
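
A small runnable sketch of the lookaround approach on multi-line input, next to a capture-group variant that should give the same result for this data. The variable names and the shortened two-line sample are assumptions for illustration. The brackets are regex metacharacters, so they are escaped, which may be the "special characters" problem mentioned in the question.

```python
import re

data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
190.392.905.549 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048"""

# Lookbehind/lookahead: the brackets are asserted but not part of the match.
with_lookarounds = re.findall(r"(?<=\[).*?(?=\])", data)

# Capture-group variant: match the brackets, keep only what is between them.
with_group = re.findall(r"\[([^\]]+)\]", data)

print(with_lookarounds)
print(with_group)
```

Both calls return one timestamp per line here. Neither pattern checks that the bracketed text is actually a timestamp, so any other [...] in the input would be picked up as well.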

Another option would be to use .split(), like data.split('[')[1].split(']')[0].
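
Note that on a multi-line string this expression returns only the first timestamp. A minimal sketch of applying it line by line, assuming each log entry sits on its own line (the guard and the variable names are illustrative additions, and the sample is shortened to two lines):

```python
data = """14.284.2.1572 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
16.197.978.107 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701"""

timestamps = []
for line in data.splitlines():
    # Skip lines without brackets so the [1] index cannot raise IndexError.
    if "[" in line and "]" in line:
        timestamps.append(line.split("[")[1].split("]")[0])

print(timestamps)
```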
