
Trying to create a Python script to extract data from .log files

I'm trying to create a Python script, but I'm a bit stuck and can't find what I'm looking for in a Google search, as it's quite specific.

I need to run a script on two .log files (auth.log and access.log) to extract the following information:

Find how many attempts were made with the bin account

That is, how many attempts were made using the bin account to try and get into the server.

The logs come from a machine that was hacked, and I need to identify how the attack happened and who is responsible.

Would anyone be able to give me some help with how to go about doing this? I can provide more information if needed.

Thanks in advance.

Edit:

I've managed to print every line in which 'bin' appears in the log, which is one way of doing it. Does anyone know if I can also count how many times 'bin' appears?

with open("auth.log") as f:
    for line in f:
        if "bin" in line:
            print(line)
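To count the matches as well as print them, the same membership test can be tallied instead. A minimal sketch — the sample entries below are hypothetical stand-ins for real auth.log lines, since the actual log contents aren't shown:

```python
# Hypothetical sample entries standing in for lines read from auth.log
sample_lines = [
    "Nov 27 11:21:09 host sshd[123]: Failed password for bin from 10.0.0.1",
    "Nov 27 11:21:10 host sshd[124]: Failed password for root from 10.0.0.1",
    "Nov 27 11:21:11 host sshd[125]: Invalid user bin from 10.0.0.2",
]

# Same "bin" membership test as the print loop, tallied instead of printed
count = sum(1 for line in sample_lines if "bin" in line)
print(count)
```

With the real file, `sample_lines` would be replaced by the file object from `open("auth.log")`, which yields one line per iteration.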

If you want to use a tool, you can use the ELK stack (Elasticsearch, Logstash, and Kibana). If not, you have to read the log file first and then apply a regex according to your requirements.
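As a sketch of the regex approach: the line format below is an assumption (real auth.log entries vary by distribution and sshd version), but a pattern along these lines would pull the user name and source IP out of a failed-login entry:

```python
import re

# Hypothetical sshd failure line; the exact format is an assumption
line = "Nov 27 11:21:09 host sshd[123]: Failed password for bin from 10.0.0.1 port 4321 ssh2"

match = re.search(r"Failed password for (\S+) from (\d+\.\d+\.\d+\.\d+)", line)
if match:
    user, ip = match.groups()
    print(user, ip)
```

Applied line by line over the whole file, the captured groups can be counted or grouped however the analysis requires.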

Given that you are working with system logs whose format is known and stable, my approach would be something like:

  • identify a set of keywords (either common to all logs, or one per log)
  • for each log, iterate line by line
  • once a keyword matches, add the relevant information from the line to, e.g., a dictionary
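The steps above might be sketched as follows; the keyword set and sample lines are assumptions for illustration, not taken from the actual logs:

```python
keywords = ["bin", "Failed password"]  # hypothetical keyword set
hits = {}  # keyword -> list of matching lines

# Stand-in for iterating over open("auth.log") line by line
log_lines = [
    "Nov 27 11:21:09 host sshd[123]: Failed password for bin from 10.0.0.1",
    "Nov 27 11:21:10 host sshd[124]: Accepted password for alice from 10.0.0.9",
]

for line in log_lines:
    for kw in keywords:
        if kw in line:
            # Collect the relevant information keyed by the matched keyword
            hits.setdefault(kw, []).append(line.strip())

print({kw: len(matched) for kw, matched in hits.items()})
```

From the resulting dictionary it is then easy to count attempts per keyword (e.g. `len(hits["bin"])`).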

You could use shell tools (like grep, cut, and/or awk) to pre-process the logs and extract the relevant lines (I assume you only need, e.g., the error entries).

You can use something like the following as a starting point.

In case you are interested in extracting some data and saving it to a .txt file, the following sample code might be helpful:

import re

expDate = '2018-11-27'
expTime = '11-21-09'

infile = r"/home/xenial/Datasets/CIVIT/Nov_27/rover/NMND17420010S_" + expDate + "_" + expTime + ".LOG"
outpath = '/home/xenial/Datasets/CIVIT/Nov_27/rover/GPS_' + expDate + '_' + expTime + '.txt'

keep_phrases = ["FINESTEERING"]

with open(infile) as f:
    lines = f.readlines()

with open(outpath, 'w') as outfile:
    outfile.write("gpsWeek,gpsSOW\n")
    for line in lines:
        for phrase in keep_phrases:
            if phrase in line:
                # Capture the GPS week and seconds-of-week in a single pass
                match = re.search(r'FINESTEERING,(\d+).*?,(\d+\.\d*)', line)
                if match:
                    gpsWeek, gpsSOW = match.groups()
                    outfile.write(gpsWeek + ',' + gpsSOW + '\n')
                break

print("------------------------------------")

In my case, FINESTEERING was the interesting keyword in my .log file for extracting numbers, including the GPS week and GPS seconds of week. You may modify this code to suit your own application.
