Garbage in file after truncate(0) in Python

Assume there is a file test.txt containing the string 'test'.

Now, consider the following Python code:

f = open('test.txt', 'r+')
f.read()
f.truncate(0)
f.write('passed')
f.flush()

I expect test.txt to now contain 'passed', but it also contains some strange symbols!

Update: calling flush after truncate does not help.
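To see what the question describes, the sketch below replays the same steps on a scratch file and reads the result back in binary mode, so the "strange symbols" show up as raw bytes. The scratch location (created with tempfile) and the function name reproduce_question are illustrative assumptions, not part of the question.

```python
import os
import tempfile


def reproduce_question() -> bytes:
    """Replay the question's steps on a scratch file and return its raw bytes."""
    # A temporary directory stands in for the asker's working directory (assumption).
    path = os.path.join(tempfile.mkdtemp(), 'test.txt')
    with open(path, 'w') as f:
        f.write('test')

    with open(path, 'r+') as f:
        f.read()           # the position is now at the end of the file (offset 4)
        f.truncate(0)      # the file is emptied, but the position stays at offset 4
        f.write('passed')  # so the text lands at offset 4, leaving a 4-byte gap
    # Leaving the with-block closes (and therefore flushes) the file.

    # Read the result back in binary mode so the gap bytes are visible.
    with open(path, 'rb') as f:
        return f.read()


print(reproduce_question())  # b'\x00\x00\x00\x00passed' on Linux
```

The four leading bytes are NUL: the file really was emptied, but the write still happened at the old offset, and the gap in front of it reads back as zeros.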

Yes, it's true that truncate() doesn't move the position, but that said, the fix is dead simple:

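f = open('test.txt', 'r+')
f.read()
f.seek(0)          # move the position back to the start first...
f.truncate(0)      # ...then truncate, so the next write starts at offset 0
f.write('passed')
f.close()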

This works perfectly. ;)
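A minimal sketch of this seek-then-truncate fix, wrapped in a hypothetical helper overwrite() and run against a scratch file (both are assumptions for illustration):

```python
import os
import tempfile


def overwrite(path: str, text: str) -> str:
    """Read the old contents, then replace them with `text` (hypothetical helper)."""
    with open(path, 'r+') as f:
        old = f.read()   # leaves the position at the end of the file
        f.seek(0)        # go back to the start before writing
        f.truncate(0)    # drop the old contents; the position is already 0
        f.write(text)
    return old


path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('test')

print(overwrite(path, 'passed'))  # test
with open(path, 'rb') as f:
    print(f.read())               # b'passed'
```

Because the position is already 0 after seek(0), a plain f.truncate() (which truncates at the current position) would do the same job.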

This is because truncate doesn't change the stream position.

When you read() the file, you move the position to the end, so subsequent write calls write to the file from that position. However, when you call flush(), it seems that it not only tries to write the buffer to the file but also does some error checking and fixes the current file position. When flush() is called after truncate(0), it writes nothing (the buffer is empty), then checks the file size and places the position at the first applicable place (which is 0).

UPDATE

Python's file functions are NOT just wrappers around their C standard library equivalents, but knowing the C functions helps in understanding more precisely what is happening.

From the ftruncate man page:

The value of the seek pointer is not modified by a call to ftruncate().
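The same behaviour can be observed one level down, using the thin os wrappers around the system calls. This sketch (a hypothetical helper working on a scratch file) writes 4 bytes, calls os.ftruncate(), checks the offset with os.lseek(), and then writes again; it assumes a platform where os.ftruncate() is available.

```python
import os
import tempfile


def ftruncate_keeps_offset() -> tuple:
    """Show at the file-descriptor level that ftruncate() leaves the offset alone."""
    path = os.path.join(tempfile.mkdtemp(), 'test.txt')
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.write(fd, b'test')                  # the offset is now 4
        os.ftruncate(fd, 0)                    # the size becomes 0...
        offset = os.lseek(fd, 0, os.SEEK_CUR)  # ...but the offset is still 4
        os.write(fd, b'passed')                # so this lands at offset 4
    finally:
        os.close(fd)
    with open(path, 'rb') as f:
        return offset, f.read()


print(ftruncate_keeps_offset())  # (4, b'\x00\x00\x00\x00passed') on Linux
```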

From the fflush man page:

If stream points to an input stream or an update stream into which the most recent operation was input, that stream is flushed if it is seekable and is not already at end-of-file. Flushing an input stream discards any buffered input and adjusts the file pointer such that the next input operation accesses the byte after the last one read.

This means that if you put flush before truncate, it has no effect. I checked, and that was indeed the case.

But for putting flush after truncate:

If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() causes any unwritten data for that stream to be written to the file, and the st_ctime and st_mtime fields of the underlying file are marked for update.

The man page doesn't mention the seek pointer when explaining output streams whose most recent operation was not input. (Here our most recent operation is truncate.)

UPDATE 2

I found something in the Python source code: Python-3.2.2\Modules\_io\fileio.c:837

#ifdef HAVE_FTRUNCATE
static PyObject *
fileio_truncate(fileio *self, PyObject *args)
{
    PyObject *posobj = NULL; /* the new size wanted by the user */
#ifndef MS_WINDOWS
    Py_off_t pos;
#endif

...

#ifdef MS_WINDOWS
    /* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
       so don't even try using it. */
    {
        PyObject *oldposobj, *tempposobj;
        HANDLE hFile;

////// THIS LINE //////////////////////////////////////////////////////////////
        /* we save the file pointer position */
        oldposobj = portable_lseek(fd, NULL, 1);
        if (oldposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }

        /* we then move to the truncation position */
        ...

        /* Truncate.  Note that this may grow the file! */
        ...

////// AND THIS LINE //////////////////////////////////////////////////////////
        /* we restore the file pointer position in any case */
        tempposobj = portable_lseek(fd, oldposobj, 0);
        Py_DECREF(oldposobj);
        if (tempposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }
        Py_DECREF(tempposobj);
    }
#else

...

#endif /* HAVE_FTRUNCATE */

Look at the two lines I marked with ////// THIS LINE ////// and ////// AND THIS LINE //////. If your platform is Windows, the code saves the file position and restores it after the truncate.
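For readers who don't read C, the Windows branch above boils down to a save, seek, truncate, restore sequence. The sketch below mirrors those four steps in Python at the file-descriptor level; truncate_restoring_offset() is a hypothetical helper, and os.ftruncate() stands in for the elided Windows-specific truncation call. On POSIX the extra seeks are unnecessary, since ftruncate() leaves the offset alone anyway.

```python
import os
import tempfile


def truncate_restoring_offset(fd: int, size: int) -> None:
    """Mirror of the save/seek/truncate/restore steps (hypothetical helper)."""
    old = os.lseek(fd, 0, os.SEEK_CUR)  # save the file pointer position
    os.lseek(fd, size, os.SEEK_SET)     # move to the truncation position
    os.ftruncate(fd, size)              # truncate (this may also grow the file)
    os.lseek(fd, old, os.SEEK_SET)      # restore the file pointer position


path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
fd = os.open(path, os.O_RDWR | os.O_CREAT)
os.write(fd, b'hello world')            # the offset is now 11
truncate_restoring_offset(fd, 5)
pos, size = os.lseek(fd, 0, os.SEEK_CUR), os.fstat(fd).st_size
os.close(fd)
print(pos, size)  # 11 5
```

The offset survives the truncation, which is exactly the promise that leads to the garbage in the question.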

To my surprise, most of the flush functions inside Python 3.2.2 either did nothing or did not call the C fflush function at all. The 3.2.2 truncate code was also poorly documented. However, I did find something interesting in the Python 2.7.2 sources. First, I found this in Python-2.7.2\Objects\fileobject.c:812, in the truncate implementation:

 /* Get current file position.  If the file happens to be open for
 * update and the last operation was an input operation, C doesn't
 * define what the later fflush() will do, but we promise truncate()
 * won't change the current position (and fflush() *does* change it
 * then at least on Windows).  The easiest thing is to capture
 * current pos now and seek back to it at the end.
 */

So, to summarize, I think this is entirely platform dependent. I checked with the default Python 3.2.2 for Windows x64 and got the same results as you. I don't know what happens on *nix systems.

Truncate doesn't change the file position.

Note also that even if the file is opened for both reading and writing, you cannot just switch between the two types of operation: with C stdio, which Python 2 file objects are built on, a seek is required between a read and a following write, and a seek or flush between a write and a following read.

If anyone is in the same boat as me, here is my problem and its solution:

  • I have a program that is always on, i.e. it never stops: it keeps polling data and writing it to a log file.
  • The problem is that I want to split the main file as soon as it reaches the 10 MB mark, so I wrote the program below.
  • I also found the solution to the problem where truncate appeared to be writing null values to the file, causing further problems.

Below is an illustration of how I solved this issue.

import datetime
import os
from random import randint
from shutil import copyfile

f1 = open('client.log', 'wb')  # binary mode, because os.urandom() returns bytes
nowTime = datetime.datetime.now().time()
f1.write(os.urandom(1024 * 1024 * 15))  # add 15 MB of random bytes
f1.flush()  # push buffered bytes to disk before checking the size and copying
if int(os.path.getsize('client.log') / 1048576) > 10:  # more than 10 whole MB
    print('File size limit exceeded, needs trimming')
    dst = 'client_' + str(randint(0, 999999)) + '.log'
    copyfile('client.log', dst)  # copy the current content to another file
    print('Copied content to ' + dst)
    print('Erasing current file')
    f1.truncate(0)  # truncates the data, but leaves the position at the old end
    f1.seek(0)      # very important after truncate, so new data starts at offset 0
    print('File truncated successfully')
    f1.write(b'This is fresh content')  # dummy content
f1.close()
print('All Job Processed')
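If the goal is only to cap a log file's size, the standard library's logging.handlers.RotatingFileHandler performs the rollover itself, which sidesteps the truncate/seek bookkeeping entirely. This is a swapped-in technique, not the approach above: it renames the full file to client.log.1 (and so on) instead of copying it to a randomly named file. The logger name 'client', the 1 KB demo limit, and the dummy messages are assumptions.

```python
import logging
import logging.handlers
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), 'client.log')

# A tiny 1 KB limit keeps the demo fast; the 10 MB limit would be 10 * 1024 * 1024.
# backupCount must be at least 1, otherwise the handler never rolls over.
handler = logging.handlers.RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
logger = logging.getLogger('client')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

for i in range(100):
    logger.info('poll %d: %s', i, 'x' * 40)  # stand-in for the polled data

logger.removeHandler(handler)
handler.close()

# client.log holds the newest lines; client.log.1 to client.log.3 hold older ones.
print(sorted(os.listdir(os.path.dirname(log_path))))
```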

I suspect the following is the code you meant to write:

with open('test.txt') as f:
    f.read()
with open('test.txt', 'w') as f:  # 'w' truncates the file and starts writing at 0
    f.write('passed')

It depends. If you want to keep the file open and access it without closing it, then flush will force the data to be written to the file. If you're closing the file right after flushing, then no, you don't need it, because close will flush for you. That's my understanding from the docs.
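A small sketch of that difference, assuming CPython's default buffering for regular files: a second, independent handle reads the file before and after flush(). The scratch path and the helper visible_contents() are illustrative.

```python
import os
import tempfile


def visible_contents(path: str) -> bytes:
    """Read the file through a second, independent handle."""
    with open(path, 'rb') as reader:
        return reader.read()


path = os.path.join(tempfile.mkdtemp(), 'test.txt')
f = open(path, 'w')
f.write('passed')
before_flush = visible_contents(path)  # the data is still in Python's buffer
f.flush()
after_flush = visible_contents(path)   # flush() handed it to the OS
f.close()                              # close() would have flushed as well
print(before_flush, after_flush)       # b'' b'passed'
```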

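Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM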