
Limit Python Output File Size

I have a Python program running on Debian that writes data through a file object. I would like to limit how large the file can grow, but I don't want to stop writing to it; I just want to remove the oldest line (at the top of the file). My data is written at irregular intervals as packets arrive from clients (think web logging).

I know it works, but is it a good idea to do this by checking File.tell() and then running the following system command whenever the file exceeds the limit?

sed -i '1 d' filename 

Once the file reaches the size limit, sed would run on every write. Is there a better way?

There is a reason that no logging system uses this strategy. You can't remove the first line from a file without rewriting the whole file, so it's extremely slow on a large file. Also, you can't safely write new data to the file while you're rewriting it.

The normal strategy is to start writing to a new file when the current one becomes too big. You can then delete files that are older than a threshold. This is the "log rotation" that others have mentioned.

If you really want to create a queue where you remove one line of data as you add a new one, I'd suggest using a database instead. MongoDB and other database managers support arrays, but you could do something similar with an SQL database if required.
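As a rough illustration of the SQL variant, here is a minimal sketch using Python's built-in sqlite3 module. The table name `log`, the helper names, and the row cap are all assumptions made for the example, and the cap counts rows rather than bytes.

```python
import sqlite3

MAX_ROWS = 1000  # hypothetical cap on the number of stored lines


def open_log(path=":memory:"):
    """Open (or create) a database holding the capped log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT NOT NULL)"
    )
    return conn


def append_line(conn, line, max_rows=MAX_ROWS):
    """Insert a line, then delete everything but the newest max_rows rows."""
    with conn:  # both statements commit together as one transaction
        conn.execute("INSERT INTO log (line) VALUES (?)", (line,))
        conn.execute(
            "DELETE FROM log WHERE id NOT IN "
            "(SELECT id FROM log ORDER BY id DESC LIMIT ?)",
            (max_rows,),
        )


def read_lines(conn):
    """Return the stored lines, oldest first."""
    return [row[0] for row in conn.execute("SELECT line FROM log ORDER BY id")]
```

Deleting the oldest row here is an indexed operation inside the database, so nothing has to be rewritten wholesale the way a flat file would be.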

It seems that you are unaware of logrotate. You are looking for a similar implementation. Check this out:

The reason Python's logging module does not use this strategy is the performance penalty it entails. If log files rotated according to size or age simply are not acceptable, then as I see it you have two basic choices: overwrite the log file in place, or write a temp file and then replace the original.

If overwriting the log file in place, you would first choose the integer offset in the file (the position of the first \n byte plus one, say) that will become the 'new zero' (call it X). Then choose a block size, maybe 32K. Then start counting blocks from zero. Seek to X + block size * block number and read one block. Seek to block size * block number and write the block back. When you reach EOF while reading, truncate the file to the total number of bytes written back (the original length minus X), since the last block is usually shorter than the block size.
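The block-shifting procedure above could be sketched roughly as follows. The function name and default block size are assumptions for illustration, and the sketch assumes no other process is writing to the file while it runs.

```python
BLOCK_SIZE = 32 * 1024  # block size suggested above; any positive value works


def drop_first_line_in_place(path, block_size=BLOCK_SIZE):
    """Remove the first line of a file by shifting the rest toward the start."""
    with open(path, "r+b") as f:
        first = f.readline()
        if not first.endswith(b"\n"):
            # Empty file or a single unterminated line: nothing remains.
            f.truncate(0)
            return
        x = f.tell()  # the "new zero": offset just past the first newline
        block_number = 0
        written = 0
        while True:
            # Read one block from its old position...
            f.seek(x + block_size * block_number)
            block = f.read(block_size)
            if not block:  # EOF reached
                break
            # ...and write it back X bytes earlier.
            f.seek(block_size * block_number)
            f.write(block)
            written += len(block)
            block_number += 1
        # Cut off the stale tail left over from the original contents.
        f.truncate(written)
```

Every call still touches every byte of the file, which is the performance penalty mentioned earlier.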

If using a temp file, find the 'new zero', copy the remainder of the file to a temp file, then rename the temp file to the original name. This is easier than the above, I guess (easier to explain, anyway), but it uses more space.
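A minimal sketch of the temp-file variant, assuming the temp file can be created in the same directory as the log so that the final rename stays on one filesystem. The function name is hypothetical.

```python
import os
import shutil
import tempfile


def drop_first_line_via_temp(path):
    """Copy everything after the first line to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        src.readline()  # skip the oldest line; we are now at the "new zero"
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as dst:
                shutil.copyfileobj(src, dst)
        except BaseException:
            os.unlink(tmp_path)  # do not leave a stray temp file behind
            raise
    # Replace the original with the shortened copy.
    os.replace(tmp_path, path)
```

Note that a file object opened on the old file before the swap keeps pointing at the old contents, so the writer has to reopen the path afterwards. Also, mkstemp creates the file readable and writable only by its owner, so the original permissions are not carried over in this sketch.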

Following all that, write the new data and close the file. This whole procedure has to happen for every log message. Good luck!

You should check out the Python logging module, and more specifically the RotatingFileHandler class. This allows you to write to a file whose size is capped. It doesn't, however, let you limit the number of lines.
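For reference, a minimal sketch of wiring up RotatingFileHandler. The logger name, size limit, and backup count are placeholder values chosen for the example, not recommendations.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_logger(path, name="packet_log", max_bytes=1_000_000, backup_count=3):
    """Return a logger that rolls over to a new file near max_bytes."""
    logger = logging.getLogger(name)  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With these settings the handler keeps the current file plus up to three older ones (path.1 through path.3) and deletes the oldest on each rollover, so total disk use stays bounded at roughly (backup_count + 1) * max_bytes.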

Unless you need near real-time access to the file from another process, I would probably write each log line to a collections.deque of a fixed size. You could implement a method that syncs the items (rows) from the collections.deque to lines in a log file on demand.
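One way this could look, as a small sketch: the class name `LineBuffer` and its methods are made up for the example, and the limit is a line count rather than a byte size.

```python
from collections import deque


class LineBuffer:
    """Keep only the newest max_lines lines in memory; write them out on demand."""

    def __init__(self, max_lines):
        # A deque with maxlen silently drops the oldest item when full.
        self._lines = deque(maxlen=max_lines)

    def append(self, line):
        self._lines.append(line.rstrip("\n"))

    def sync(self, path):
        """Overwrite the log file with the current contents of the buffer."""
        with open(path, "w") as f:
            f.writelines(line + "\n" for line in self._lines)
```

Typical use would be to call `append` for every packet and `sync` on a timer or at shutdown. Anything appended since the last `sync` is lost if the process dies.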


 