Trying to create a Python script to extract data from .log files
I'm trying to create a Python script, but I'm a bit stuck and can't find what I'm looking for with a Google search because it's quite specific.
I need to run a script on two .log files (auth.log and access.log) to view the following information:
Find how many attempts were made with the bin account
That is, how many times the bin account was used to try to get into the server.
The logs come from a server that was hacked, and I need to identify how it happened and who is responsible.
Would anyone be able to give me some help with how to go about this? I can provide more information if needed.
Thanks in advance.
Edit:
I've managed to print every line in which 'bin' appears in the log, which is one way of doing it. Does anyone know how I can also count how many times 'bin' appears?
with open("auth.log") as f:
    for line in f:
        # Print every line that contains the substring "bin"
        if "bin" in line:
            print(line)
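To answer the counting question directly, here is a minimal sketch that tallies matching lines instead of printing them. The function name `count_lines_containing` and the sample lines are made up for illustration. Note that a plain substring test also matches text such as `/usr/sbin`, so the number may overstate real login attempts.

```python
def count_lines_containing(lines, needle="bin"):
    """Count the lines that contain `needle` as a plain substring."""
    return sum(1 for line in lines if needle in line)


# Hypothetical sample lines, invented for illustration only.
sample = [
    "Failed password for bin from 10.0.0.5 port 4022 ssh2",
    "Accepted password for alice from 10.0.0.7 port 5022 ssh2",
    "Failed password for invalid user bin from 10.0.0.5 port 4023 ssh2",
]
print(count_lines_containing(sample))

# With the real file, pass the open file object instead of a list:
# with open("auth.log") as f:
#     print(count_lines_containing(f))
```

Because the function accepts any iterable of strings, the same call works on a list, a file object, or a generator.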
If you want to use a tool, you can use the ELK stack (Elasticsearch, Logstash and Kibana). If not, you have to read the log file first and then apply a regex according to your requirements.
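As a rough sketch of the regex route, the following counts failed logins for one account and groups them by source address. It assumes the log lines look like typical OpenSSH entries ("Failed password for bin from <ip> port ..."); that format is an assumption here, so check it against your own auth.log. The names `FAILED_LOGIN` and `failed_attempts_by_ip` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (verify against your own log):
#   "... sshd[123]: Failed password for bin from 10.0.0.5 port 4022 ssh2"
#   "... sshd[123]: Failed password for invalid user bin from 10.0.0.5 port 4023 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)


def failed_attempts_by_ip(lines, user="bin"):
    """Return a Counter mapping source address -> failed attempts for `user`."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match and match.group("user") == user:
            counts[match.group("ip")] += 1
    return counts


# Hypothetical sample lines, invented for illustration only.
sample_log = [
    "Nov 27 11:21:09 host sshd[101]: Failed password for bin from 10.0.0.5 port 4022 ssh2",
    "Nov 27 11:21:12 host sshd[102]: Failed password for invalid user bin from 10.0.0.5 port 4023 ssh2",
    "Nov 27 11:21:15 host sshd[103]: Failed password for bin from 10.0.0.9 port 4100 ssh2",
    "Nov 27 11:21:20 host sshd[104]: Failed password for root from 10.0.0.9 port 4101 ssh2",
]
attempts = failed_attempts_by_ip(sample_log)
print(sum(attempts.values()), dict(attempts))
```

The total answers "how many attempts", and the per-address breakdown is a starting point for "who".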
Given that you work with system logs and their format is known and stable, my approach would be something like:
You could use shell tools (like grep, cut and/or awk) to pre-process the log and extract the relevant lines from it (I assume you only need, e.g., error entries).
You can use something like this as a starting point.
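For readers who would rather stay in Python, the same filter-then-split idea can be sketched as below. This is not a replacement for grep, cut or awk, only an illustration of the pre-processing step. It assumes the classic syslog layout "Mon DD HH:MM:SS host process: message", which may not match your files; the helper names `grep` and `cut_fields` are hypothetical.

```python
def grep(lines, keyword):
    """Yield only the lines containing `keyword` (roughly `grep keyword`)."""
    return (line for line in lines if keyword in line)


def cut_fields(line):
    """Split an assumed syslog-style line into (timestamp, host, process, message).

    Returns None when the line has fewer fields than expected.
    """
    parts = line.split(None, 5)  # split on any whitespace, at most 5 times
    if len(parts) < 6:
        return None
    month, day, clock, host, process, message = parts
    return (f"{month} {day} {clock}", host, process.rstrip(":"), message.rstrip())


# Hypothetical sample lines, invented for illustration only.
raw = [
    "Nov 27 11:21:09 myhost sshd[1234]: Failed password for bin from 10.0.0.5 port 4022 ssh2\n",
    "Nov 27 11:22:00 myhost CRON[1300]: session opened for user root\n",
]
relevant = [cut_fields(line) for line in grep(raw, "Failed password")]
print(relevant)
```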
In case you are interested in extracting some data and saving it to a .txt file, the following sample code might be helpful:
import re

expDate = '2018-11-27'
expTime = '11-21-09'
infile = r"/home/xenial/Datasets/CIVIT/Nov_27/rover/NMND17420010S_" + expDate + "_" + expTime + ".LOG"
keep_phrases = ["FINESTEERING"]

# Read the whole log into memory
with open(infile) as f:
    lines = f.readlines()

with open('/home/xenial/Datasets/CIVIT/Nov_27/rover/GPS_' + expDate + '_' + expTime + '.txt', 'w') as file:
    file.write("gpsWeek,gpsSOW\n")
    for line in lines:
        for phrase in keep_phrases:
            if phrase in line:
                # First number after the keyword is the GPS week
                gpsWeek = re.findall(r'FINESTEERING,(\d+)', line)[0]
                # The decimal number right after the week is the seconds of week
                gpsSOW = re.findall(r'FINESTEERING,' + gpsWeek + r',(\d+\.\d*)', line)[0]
                file.write(gpsWeek + ',' + gpsSOW + '\n')
                break
print("------------------------------------")
In my case, FINESTEERING was an interesting keyword in my .log file for extracting numbers, including GPS_Week and GPS_Seconds_of_Weeks. You may modify this code to suit your own application.
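One possible way to adapt the idea is to capture both numbers with a single compiled pattern and let the csv module handle the output. The sketch below assumes, as the code above does, that each relevant line contains "FINESTEERING,<week>,<seconds>"; the sample line and the names `GPS_TIME`, `extract_gps_times` and `write_gps_csv` are hypothetical.

```python
import csv
import io
import re

# Assumed field layout: "FINESTEERING,<gps week>,<seconds of week>"
GPS_TIME = re.compile(r"FINESTEERING,(\d+),(\d+\.\d*)")


def extract_gps_times(lines):
    """Return a list of (gps_week, gps_seconds_of_week) string pairs."""
    rows = []
    for line in lines:
        match = GPS_TIME.search(line)
        if match:
            rows.append(match.groups())
    return rows


def write_gps_csv(rows, out):
    """Write a header plus the extracted rows to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gpsWeek", "gpsSOW"])
    writer.writerows(rows)


# Hypothetical sample lines, invented for illustration only.
sample_lines = [
    "#EXAMPLE,COM1,0,55.0,FINESTEERING,2029,213678.000,02000000\n",
    "a line without the keyword\n",
]
rows = extract_gps_times(sample_lines)
buffer = io.StringIO()
write_gps_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.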