Python - Search for a specific time range in text file (sed -n equivalent)
I'm trying to write a Python script that outputs a specific time range from a log file (similar to the sed command listed below):
sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
2017-01-26 18:00:00
2017-01-26 18:01:01
2017-01-26 18:01:02
2017-01-26 18:01:09
2017-01-26 18:01:09
2017-01-26 18:01:11
2017-01-26 18:02:01
My Python script searches for a fixed string rather than a range like the sed command above (I suspect I am doing something wrong, but I can't find the error; please check the code below):
Please point out where the code should be changed; any advice on improving it is also welcome. Thanks!
#!/usr/bin/python
import datetime, time, os, sys, re
from datetime import timedelta
counter = 0
avgtime = 0
now = datetime.datetime.utcnow()
pasttime = now - datetime.timedelta(minutes=5)
timestamp = now.strftime("%y%m%d")
fiveago = now - timedelta(minutes=5,seconds=now.second)
current = now.strftime("%Y-%m-%d %H:%M")
pasttime = fiveago.strftime("%Y-%m-%d %H:%M")
pattern = str(current + "|" + pasttime)
f = open('/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log', 'r')
for line in f:
    if "POST" in line:
        if re.search(pattern, line, re.IGNORECASE):
            date = line.split(' ')[1]
            time = line.split(' ')[14]
            avgtime += int(time)
            counter += 1
            print(date,time)
f.close()
print(pattern)
print("Total amount of time: ",counter)
print("Total scan time: ",avgtime)
print("Average scan time: ",avgtime / counter)
I do not see what the problem is, but you are asking for the sed equivalent of your command, so here is a direct translation to Python:
import sys, re
use = False
for line in open('/logfile.log'):
    if re.search('2017-01-26 18:00', line): use = True
    if use: sys.stdout.write(line)
    if re.search('2017-01-26 18:02', line): use = False
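The same idea can be wrapped in a small generator so that it is reusable and easy to test. This is only a minimal sketch and not part of the original answer: the name `sed_range` and the sample lines are hypothetical. Unlike the loop above, it checks the end pattern only on lines after the one that opened the range, which is how sed treats a /start/,/end/ address.

```python
import re

def sed_range(lines, start, end):
    """Yield lines from a match of `start` through the next match of `end`, inclusive."""
    printing = False
    for line in lines:
        if not printing:
            # Outside a range: only the start pattern can open one.
            if re.search(start, line):
                printing = True
                yield line
        else:
            # Inside a range: emit the line, then close the range if it matches `end`.
            yield line
            if re.search(end, line):
                printing = False

# Hypothetical sample data in the same timestamp format as the question.
sample = [
    "2017-01-26 17:59:58 a\n",
    "2017-01-26 18:00:00 b\n",
    "2017-01-26 18:01:01 c\n",
    "2017-01-26 18:02:01 d\n",
    "2017-01-26 18:02:05 e\n",
]
selected = list(sed_range(sample, "2017-01-26 18:00", "2017-01-26 18:02"))
```

With a real file, `sed_range(open('/logfile.log'), ...)` could be passed to `sys.stdout.writelines`.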
IIUC, you need the entries from the log that fall between the timestamps you pass.
import datetime, sys
from datetime import timedelta
now = datetime.datetime.utcnow()
timestamp = now.strftime("%y%m%d")
fiveago = now - timedelta(minutes=5,seconds=now.second)
current = now.strftime("%Y-%m-%d %H:%M")
pasttime = fiveago.strftime("%Y-%m-%d %H:%M")
print("Start time:", pasttime, "End time:", current, "\n\n")
filename = '/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log'
with open(filename, 'r') as f:
    contents = f.readlines()
for line in contents:
    if "POST" in line:
        # Field positions match the sample input shown below;
        # adjust them to the layout of the real log.
        date = line.split(' ')[1]
        time = line.split(' ')[2]
        logdatetime = date + " " + time
        # Zero-padded timestamps in this format sort as plain strings.
        if logdatetime <= current and logdatetime >= pasttime:
            print("yes, within the interval :", logdatetime)
Output:
Start time: 2017-01-26 20:23 End time: 2017-01-26 20:28
yes, within the interval : 2017-01-26 20:23:20
yes, within the interval : 2017-01-26 20:23:01
yes, within the interval : 2017-01-26 20:23:02
Input used for this:
POST 2017-01-26 20:23:20 XX
POST 2017-01-26 20:23:01 XC
POST 2017-01-26 20:23:02 CV
POST 2017-01-26 20:20:09 DAF
POST 2017-01-26 20:20:09 fASF
POST 2017-01-26 20:20:11 Sfas
POST 2017-01-26 20:20:01 fsAf
POST 2017-01-26 20:20:02 asf
POST 2017-01-26 20:20:03 asf
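Comparing the timestamps as strings works only because the format is zero-padded, and a line stamped inside the final minute (for example 20:28:30) compares greater than the end string "20:28" and is dropped. A hedged alternative is to parse each timestamp with `datetime.strptime` and compare `datetime` objects. The helper name `in_window`, the fixed window, and the field positions below are assumptions based on the sample input above, not on the real log layout.

```python
from datetime import datetime, timedelta

def in_window(line, start, end):
    """Return True if the timestamp in fields 1 and 2 of `line` lies in [start, end]."""
    fields = line.split()
    stamp = datetime.strptime(fields[1] + " " + fields[2], "%Y-%m-%d %H:%M:%S")
    return start <= stamp <= end

# Fixed window for illustration; a real script would derive `end` from the current time.
end = datetime(2017, 1, 26, 20, 28, 45)
start = end - timedelta(minutes=5)

lines = [
    "POST 2017-01-26 20:28:30 XX",
    "POST 2017-01-26 20:24:01 XC",
    "POST 2017-01-26 20:20:09 DAF",
]
matches = [l for l in lines if in_window(l, start, end)]
```

Here the 20:28:30 entry is kept because the comparison is made on full timestamps rather than on truncated strings.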
The problem with your solution is that you only look for the two "edge times". In your 3-minute time range example these were 18:00 and 18:02.
What the sed command does is:
sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
It loops over the lines without printing (-n).
As soon as sed finds 2017-01-26 18:00, it starts printing all lines.
As soon as sed finds 2017-01-26 18:02, it stops printing.
In your example your regex pattern is:
2017-01-26 18:00|2017-01-26 18:02
This will only match lines stamped either 18:00 or 18:02. So, what you can do is one of these:
Extend your regex so it also searches for the times in between:
pattern = "|".join([(now-timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M") for i in range(6)])
This will produce, e.g.:
'2016-01-26 18:00|2016-01-26 17:59|2016-01-26 17:58|2016-01-26 17:57|2016-01-26 17:56|2016-01-26 17:55'
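To see the difference between the two patterns, the following sketch runs both against a few sample lines. It pins `now` to a fixed, hypothetical value instead of calling `utcnow()` so the result is reproducible; the variable names and log lines are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Fixed "now" for illustration; the original script uses datetime.datetime.utcnow().
now = datetime(2017, 1, 26, 18, 2, 30)
fmt = "%Y-%m-%d %H:%M"

# Pattern with only the two edge minutes, as in the question.
edges_only = "|".join([
    now.strftime(fmt),
    (now - timedelta(minutes=2)).strftime(fmt),
])

# Pattern with one alternative per minute in the range.
every_minute = "|".join(
    (now - timedelta(minutes=i)).strftime(fmt) for i in range(3)
)

log = [
    "2017-01-26 17:59:59",
    "2017-01-26 18:00:00",
    "2017-01-26 18:01:01",
    "2017-01-26 18:02:01",
]
hits_edges = [l for l in log if re.search(edges_only, l)]
hits_all = [l for l in log if re.search(every_minute, l)]
```

The edge-only pattern skips the 18:01 line, while the per-minute pattern picks up every line from 18:00 through 18:02.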