Garbage in file after truncate(0) in Python
Assume there is a file test.txt containing the string 'test'.
Now, consider the following Python code:
f = open('test.txt', 'r+')
f.read()
f.truncate(0)
f.write('passed')
f.flush()
I expect test.txt to contain 'passed' now; however, there are additionally some strange symbols!
Update: flush after truncate does not help.
Yeah, it's true that truncate() doesn't move the position, but that said, the fix is dead simple:
f.read()
f.seek(0)
f.truncate(0)
f.close()
This works perfectly ;)
This is because truncate doesn't change the stream position.
When you read() the file, you move the position to the end, so subsequent writes go to the file from that position. However, when you call flush(), it seems that it not only tries to write the buffer to the file, but also does some error checking and fixes the current file position. When flush() is called after truncate(0), it writes nothing (the buffer is empty), then checks the file size and places the position at the first applicable place (which is 0).
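To make the position issue concrete, here is a minimal sketch. It assumes Python 3 and opens the file in binary mode so the bytes are easy to inspect; the helper names rewrite and demo are made up for this illustration. Without the seek(0), the write lands at the old offset 4, and the gap before it is what shows up as the strange symbols (zero bytes on most systems).

```python
import os
import tempfile


def rewrite(path, seek_first):
    """Read the file, truncate it to zero length, then write b'passed'."""
    with open(path, "r+b") as f:
        f.read()                        # moves the position to the end of the file
        f.truncate(0)                   # file is now empty, position is unchanged
        pos_after_truncate = f.tell()
        if seek_first:
            f.seek(0)                   # go back to the start before writing
        f.write(b"passed")
    with open(path, "rb") as f:
        return pos_after_truncate, f.read()


def demo():
    """Run the experiment without and with seek(0), on a fresh 4-byte file."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.txt")
        for seek_first in (False, True):
            with open(path, "wb") as f:
                f.write(b"test")
            results[seek_first] = rewrite(path, seek_first)
    return results


if __name__ == "__main__":
    for seek_first, (pos, data) in demo().items():
        print("seek(0) first:", seek_first, "| tell() after truncate:", pos, "| file:", data)
```

In both runs tell() still reports the old offset right after truncate(0); only the run that seeks back to 0 ends up with a file containing just the new text.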
UPDATE
Python's file functions are NOT just wrappers around the C standard library equivalents, but knowing the C functions helps in understanding more precisely what is happening.
From the ftruncate man page:
The value of the seek pointer is not modified by a call to ftruncate().
From the fflush man page:
If stream points to an input stream or an update stream into which the most recent operation was input, that stream is flushed if it is seekable and is not already at end-of-file. Flushing an input stream discards any buffered input and adjusts the file pointer such that the next input operation accesses the byte after the last one read.
This means that if you put flush before truncate, it has no effect. I checked and it was so.
But for putting flush after truncate:
If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() causes any unwritten data for that stream to be written to the file, and the st_ctime and st_mtime fields of the underlying file are marked for update.
The man page doesn't mention the seek pointer when explaining output streams whose last operation was not input. (Here our last operation is truncate.)
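The ftruncate statement quoted above can be checked from Python through the thin os-level wrappers. This is only a sketch, assuming a POSIX system where os.ftruncate maps onto the C call; the function name offset_around_ftruncate is invented for the example.

```python
import os
import tempfile


def offset_around_ftruncate():
    """Return (offset before, offset after, file size) around os.ftruncate(fd, 0)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"test")                   # the descriptor offset is now 4
        before = os.lseek(fd, 0, os.SEEK_CUR)   # query the current offset
        os.ftruncate(fd, 0)                     # shrink the file to zero bytes
        after = os.lseek(fd, 0, os.SEEK_CUR)    # the offset is not reset by ftruncate
        size = os.fstat(fd).st_size
        return before, after, size
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(offset_around_ftruncate())
```

The offset stays where it was while the size drops to zero, which is exactly the situation in which a later write leaves a gap at the start of the file.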
UPDATE 2
I found something in the Python source code, at Python-3.2.2\Modules\_io\fileio.c:837
#ifdef HAVE_FTRUNCATE
static PyObject *
fileio_truncate(fileio *self, PyObject *args)
{
    PyObject *posobj = NULL; /* the new size wanted by the user */
#ifndef MS_WINDOWS
    Py_off_t pos;
#endif
    ...
#ifdef MS_WINDOWS
    /* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
       so don't even try using it. */
    {
        PyObject *oldposobj, *tempposobj;
        HANDLE hFile;
        ////// THIS LINE //////////////////////////////////////////////////////////////
        /* we save the file pointer position */
        oldposobj = portable_lseek(fd, NULL, 1);
        if (oldposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }
        /* we then move to the truncation position */
        ...
        /* Truncate. Note that this may grow the file! */
        ...
        ////// AND THIS LINE //////////////////////////////////////////////////////////
        /* we restore the file pointer position in any case */
        tempposobj = portable_lseek(fd, oldposobj, 0);
        Py_DECREF(oldposobj);
        if (tempposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }
        Py_DECREF(tempposobj);
    }
#else
    ...
#endif /* HAVE_FTRUNCATE */
Look at the two lines I marked (////// THIS LINE //////). If your platform is Windows, the code saves the file position and restores it after the truncate.
To my surprise, most of the flush functions in the Python 3.2.2 sources either did nothing or did not call the fflush C function at all. The 3.2.2 truncate part was also largely undocumented. However, I did find something interesting in the Python 2.7.2 sources. First, I found this at Python-2.7.2\Objects\fileobject.c:812, in the truncate implementation:
/* Get current file position. If the file happens to be open for
* update and the last operation was an input operation, C doesn't
* define what the later fflush() will do, but we promise truncate()
* won't change the current position (and fflush() *does* change it
* then at least on Windows). The easiest thing is to capture
* current pos now and seek back to it at the end.
*/
So, to summarize, I think this is entirely platform dependent. I checked on the default Python 3.2.2 for Windows x64 and got the same results as you. I don't know what happens on *nixes.
Truncate doesn't change the file position.
Note also that even if the file is opened for reading and writing, you cannot just switch between the two types of operation (a seek operation is required to switch from read to write or vice versa).
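As a rough illustration of that rule, the sketch below puts an explicit seek between the read and the write, and another one before going back to reading. The function name read_then_write and the file name data.bin are made up for the example. The requirement itself comes from C stdio update streams; Python 3's io layer may well cope without the extra seeks, but they are harmless and make the intended position explicit.

```python
import os
import tempfile


def read_then_write():
    """Read two bytes, overwrite the next two, then read the whole file back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"abcdef")
        with open(path, "r+b") as f:
            head = f.read(2)
            f.seek(f.tell())    # explicit positioning call between read and write
            f.write(b"XY")      # overwrites bytes 2 and 3
            f.seek(0)           # and again before switching back to reading
            data = f.read()
    return head, data


if __name__ == "__main__":
    print(read_then_write())
```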
If anyone is in the same boat as me, here is my problem and its solution.
Below is an illustration of how I solved this issue.
import datetime
import os
from random import randint
from shutil import copyfile

f1 = open('client.log', 'wb')  # binary mode, since random bytes are written
nowTime = datetime.datetime.now().time()
f1.write(os.urandom(1024 * 1024 * 15))  # add 15 MB of random bytes
f1.flush()  # make sure the size on disk is current before checking it
if int(os.path.getsize('client.log') / 1048576) > 10:  # file has grown past 10 MB
    print('File size limit exceeded, needs trimming')
    dst = 'client_' + str(randint(0, 999999)) + '.log'
    copyfile('client.log', dst)  # copy the contents to another file
    print('Copied content to ' + dst)
    print('Erasing current file')
    f1.truncate(0)  # empties the file, but leaves the position at the old end
    f1.seek(0)  # very important after truncate, so that new data begins at 0
    print('File truncated successfully')
f1.write(b'This is fresh content')  # dummy content
f1.close()
print('All jobs processed')
I expect the following is the code you meant to write:
open('test.txt').read()
open('test.txt', 'w').write('passed')
It depends. If you want to keep the file open and access it without closing it, then flush will force the data to be written to the file. If you're closing the file right after the flush, then no, you don't need it, because close will flush for you. That's my understanding from the docs.
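A small sketch of that difference, with made-up names (visible_sizes, out.txt): it checks the size of the file on disk after an explicit flush while the file is still open, and again after close with no flush of its own.

```python
import os
import tempfile


def visible_sizes():
    """Return the on-disk size after flush() and after close()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        f = open(path, "w")
        f.write("passed")
        f.flush()                           # push buffered text to the file, keep it open
        after_flush = os.path.getsize(path)
        f.write(" again")                   # buffered again, no flush this time
        f.close()                           # close flushes whatever is still pending
        after_close = os.path.getsize(path)
    return after_flush, after_close


if __name__ == "__main__":
    print(visible_sizes())
```

The second write is never flushed explicitly, yet it is on disk after close, so a flush immediately before close adds nothing.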