How to read and truncate the snmptrapd log file without restarting the daemon

I have written a Python script that performs a Nagios check. The script is simple: it parses a log and matches some information, which is used to construct the Nagios check output. The log is an snmptrapd log, which records the traps from other servers in /var/log/snmptrapd; I then parse them with the script. To see only the latest traps, I erase the log from Python each time after reading it. To preserve the information, I have a cron job that copies the content of the log into another log at an interval slightly shorter than the Nagios check interval. What I don't understand is why the log grows so much (the messages log, which I'd guess holds 1000 times more information, is smaller). The log contains a lot of special characters like ^@, and I think this is caused by the way I'm manipulating the file from Python, but with only about three weeks of Python experience I can't figure out the problem.

The script code is the following:

import sys, os, re

validstring = "OK"
filename = "/var/log/snmptrapd.log"

if os.stat(filename)[6] == 0:
        print validstring
        sys.exit()

else:
        f = open(filename,"r")
        sharestring = ""
        line1 = []
        patte0 = re.compile("[0-9]+-[0-9]+-[0-9]+")
        patte2 = re.compile("NG: [a-zA-Z\s=0-9]+.*")
        for line in f:
                line1 = line.split(" ")
                if re.search(patte0,line1[0]):
                        sharestring = sharestring + line1[1] + " "
                        continue
                result2 = re.search(patte2,line)
                if result2:
                        result22 = result2.group()
                        result22 = result22.replace("NG:","")
                        sharestring = sharestring + result22 + " "
        f.close()
        f1 = open(filename,"w")
        f1.close()
        print sharestring
        sys.exit(2)

The log looks like:

2012-07-11 04:17:16 Some IP(via UDP: [this is an ip]:port) TRAP, SNMP v1, community somestring
    SNMPv2-SMI::enterprises.OID Some info which is not necesarry
    SNMPv2-MIB::sysDescrOID = STRING: info which i'm matching

I'm pretty sure that it has something to do with my way of erasing the file, but I can't figure it out. If you have any ideas I would be really interested. Thank you.

As for the size: the log has 93 lines (according to Vim) and occupies 161K, which is not right because the lines are quite short.

OK, it has nothing to do with the way I read and erased the file. Something in the snmptrapd daemon does this when I erase its log file. I have modified my code so that it now sends SIGSTOP to snmptrapd right before opening the file, makes its modifications to the file, and then sends SIGCONT when done, but I seem to experience the same behavior. The new code looks like this (only the parts that differ):

else:
    command = "pidof snmptrapd"
    p=subprocess.Popen(shlex.split(command),stdout=subprocess.PIPE)
    pidstring = p.stdout.readline()
    patte1 = re.compile("[0-9]+")
    pidnr = re.search(patte1,pidstring)
    pid = pidnr.group()
    os.kill(int(pid), SIGSTOP)
    time.sleep(0.5)
    f = open(filename,"r+")
    sharestring = ""

and

                  sharestring = sharestring + result22 + " "
    f.truncate(0)
    f.close()
    time.sleep(0.5)
    os.kill(int(pid), SIGCONT)
    print sharestring

I'm thinking of stopping the daemon, erasing the file, then recreating it with the proper permissions and starting the daemon again.

I don't think you can, but here are some things to try.

Truncating a File

f1 = open(filename, 'w')
f1.close()

is a hacky, side-effect way of deleting a file's contents and will probably cause undesired side effects, depending on the underlying OS, if other applications have that file open.

Using the File Object method truncate()
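One plausible source of the ^@ (NUL) characters, assuming the daemon keeps the log open in a plain write mode rather than append mode (an assumption about snmptrapd, not something confirmed here): after another process truncates the file, the writer keeps its old file offset, so its next write lands past the new end of file and the gap is filled with zero bytes. A minimal Python 3 sketch on a Unix-like system, where `demo_truncate_while_open` is a hypothetical helper and the writer handle stands in for the daemon:

```python
import os
import tempfile


def demo_truncate_while_open(append):
    """Return the file's bytes after a second handle truncates it mid-write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Stand-in for the daemon: it keeps the log open across the truncation.
        writer = open(path, "ab" if append else "wb")
        writer.write(b"old trap\n")
        writer.flush()

        # Stand-in for the check script: empty the file through a second handle.
        open(path, "w").close()

        # The writer carries on, unaware that the file was emptied.
        writer.write(b"new trap\n")
        writer.flush()
        writer.close()

        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Plain write mode: the old offset survives, leaving a run of NUL bytes.
print(repr(demo_truncate_while_open(append=False)))
# Append mode: every write goes to the current end of file, so no NULs.
print(repr(demo_truncate_while_open(append=True)))
```

If this is what is happening, the file is largely a zero-filled gap, which would match a 93-line log that reports a size of 161K.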

truncate([size])

Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. The current file position is not changed. Note that if a specified size exceeds the file's current size, the result is platform-dependent: possibilities include that the file may remain unchanged, increase to the specified size as if zero-filled, or increase to the specified size with undefined new content. Availability: Windows, many Unix variants.

Probably the only deterministic way to do this is to stop the snmptrapd process at the start of the script, use the proper os module function remove, then recreate the file and restart the snmptrapd daemon at the end of the script.
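As a minimal Python 3 sketch of reading and emptying a file through a single handle with truncate(): `read_and_empty` is a hypothetical helper name, and the explicit seek(0) is there because, as the documentation quoted above says, truncating does not move the file position. This only tidies up the script's side; it does nothing about the offset held by any other process that has the file open.

```python
def read_and_empty(path):
    """Read the whole file, then empty it through the same handle."""
    with open(path, "r+") as f:
        content = f.read()
        # Rewind first so that truncate() cuts at position 0 and the
        # handle's position matches the new (empty) file.
        f.seek(0)
        f.truncate()
    return content
```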

os.remove(path)

Remove (delete) the file path. If path is a directory, OSError is raised; see rmdir() below to remove a directory. This is identical to the unlink() function documented below. On Windows, attempting to remove a file that is in use causes an exception to be raised; on Unix, the directory entry is removed but the storage allocated to the file is not made available until the original file is no longer in use.

Shared resource concern
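The stop, remove, recreate and restart sequence could be sketched roughly as follows in Python 3. Everything here is illustrative: `drain_log` is a hypothetical helper, and the stop and start commands are passed in by the caller because the right way to stop and start snmptrapd depends on the system's init scripts, which are not specified here.

```python
import os
import subprocess


def drain_log(path, stop_cmd, start_cmd, mode=0o644):
    """Stop the writer, read and recreate the log, then start the writer again.

    stop_cmd and start_cmd are argument lists for subprocess; they are
    placeholders for whatever stops and starts the daemon on your system.
    """
    subprocess.check_call(stop_cmd)
    try:
        with open(path, "r") as f:
            content = f.read()
        os.remove(path)
        # Recreate an empty file with explicit permissions; the effective
        # mode is still subject to the process umask.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
        os.close(fd)
    finally:
        # Always try to bring the daemon back, even if reading failed.
        subprocess.check_call(start_cmd)
    return content
```

Restarting matters on Unix because, as the quoted documentation notes, a process that still has the removed file open keeps writing to the old, now unlinked file. Traps that arrive while the daemon is stopped would presumably be lost, so the stopped window should be kept as short as possible.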

You still might have problems with two processes fighting to write to a single file without some kind of locking mechanism, and with non-deterministic things happening to the file. I bet you can send a signal to your daemon process and get it to re-open the file or something similar (SIGHUP is the conventional choice for this in Unix daemons, whereas SIGINT normally terminates a process); check your documentation.

Manipulating shared resources, especially file resources, without exclusive locking is going to be trouble, especially with filesystem caching and application caching of data.
