Garbage in file after truncate(0) in Python

Assume there is a file test.txt containing the string 'test'.

Now, consider the following Python code:

f = open('test.txt', 'r+')
f.read()           # moves the position to the end of the file
f.truncate(0)      # empties the file
f.write('passed')
f.flush()

I expect test.txt to contain 'passed' now, but it also contains some strange extra symbols!

Update: flush after truncate does not help.

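Yes, it's true that truncate() doesn't move the position. That said, the fix is dead simple:

f = open('test.txt', 'r+')
f.read()
f.seek(0)          # rewind first...
f.truncate(0)      # ...then empty the file; the position is already 0
f.write('passed')
f.close()

This works perfectly ;)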

This is because truncate doesn't change the stream position.

When you read() the file, you move the position to the end, so subsequent writes go to the file from that position. However, when you call flush(), it seems that it not only writes the buffer to the file but also does some error checking and fixes the current file position. When flush() is called after truncate(0), it writes nothing (the buffer is empty), then checks the file size and places the position at the first valid place (which is 0).
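A minimal sketch of the position behavior described above, assuming Python 3 and binary mode (so `tell()` returns a plain byte offset rather than a text-mode cookie). The helper `rewrite()` and its temporary file are hypothetical, made up for illustration. It only shows what happens to the position and to the file contents; it does not test the flush() part of the explanation.

```python
import os
import tempfile


def rewrite(seek_first):
    """Repeat the question's steps; return (position after truncate, final bytes)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'wb') as f:
            f.write(b'test')
        with open(path, 'r+b') as f:
            f.read()              # position is now 4, the end of the file
            if seek_first:
                f.seek(0)         # the fix: rewind before truncating
            f.truncate(0)         # file is now empty; the position is NOT changed
            pos = f.tell()
            f.write(b'passed')    # written starting at `pos`
        with open(path, 'rb') as f:
            return pos, f.read()
    finally:
        os.remove(path)


print(rewrite(seek_first=False))  # (4, b'\x00\x00\x00\x00passed') on a typical Linux system
print(rewrite(seek_first=True))   # (0, b'passed')
```

The zero bytes in the first result are the "strange symbols" from the question: the write lands past the new end of the file, and the gap reads back as NUL bytes on most systems.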

UPDATE

Python's file functions are NOT just wrappers around their C standard library equivalents, but knowing the C functions helps in understanding more precisely what is happening.

From the ftruncate man page:

The value of the seek pointer is not modified by a call to ftruncate().
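On POSIX systems, `os.ftruncate()` and `os.lseek()` are thin wrappers over the corresponding calls, so this sentence can be checked directly from Python. A minimal sketch, assuming a POSIX system; the function name `offsets_around_ftruncate()` is hypothetical.

```python
import os
import tempfile


def offsets_around_ftruncate():
    """Return (offset before, offset after, size after) for os.ftruncate(fd, 0)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b'test')                   # offset is now 4
        before = os.lseek(fd, 0, os.SEEK_CUR)   # query the offset without moving it
        os.ftruncate(fd, 0)                     # shrink the file to 0 bytes
        after = os.lseek(fd, 0, os.SEEK_CUR)    # offset after truncating
        size = os.fstat(fd).st_size
        return before, after, size
    finally:
        os.close(fd)
        os.remove(path)


print(offsets_around_ftruncate())  # (4, 4, 0)
```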

From the fflush man page:

If stream points to an input stream or an update stream into which the most recent operation was input, that stream is flushed if it is seekable and is not already at end-of-file. Flushing an input stream discards any buffered input and adjusts the file pointer such that the next input operation accesses the byte after the last one read.

This means that putting flush before truncate has no effect. I checked, and that is indeed the case.

But for flush after truncate:

If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() causes any unwritten data for that stream to be written to the file, and the st_ctime and st_mtime fields of the underlying file are marked for update.

The man page doesn't mention the seek pointer when describing output streams whose most recent operation was not input (here, our most recent operation is truncate).

UPDATE 2

I found something in the Python source code: Python-3.2.2\Modules\_io\fileio.c:837

#ifdef HAVE_FTRUNCATE
static PyObject *
fileio_truncate(fileio *self, PyObject *args)
{
    PyObject *posobj = NULL; /* the new size wanted by the user */
#ifndef MS_WINDOWS
    Py_off_t pos;
#endif

...

#ifdef MS_WINDOWS
    /* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
       so don't even try using it. */
    {
        PyObject *oldposobj, *tempposobj;
        HANDLE hFile;

////// THIS LINE //////////////////////////////////////////////////////////////
        /* we save the file pointer position */
        oldposobj = portable_lseek(fd, NULL, 1);
        if (oldposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }

        /* we then move to the truncation position */
        ...

        /* Truncate.  Note that this may grow the file! */
        ...

////// AND THIS LINE //////////////////////////////////////////////////////////
        /* we restore the file pointer position in any case */
        tempposobj = portable_lseek(fd, oldposobj, 0);
        Py_DECREF(oldposobj);
        if (tempposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }
        Py_DECREF(tempposobj);
    }
#else

...

#endif /* HAVE_FTRUNCATE */

Look at the two lines I marked (////// THIS LINE ////// and ////// AND THIS LINE //////). If your platform is Windows, the code saves the file position before truncating and restores it afterwards.
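The same save / move / truncate / restore sequence, transliterated into Python's `os` module as a sketch. This is not CPython's code: `os.ftruncate()` stands in for the platform call elided ("...") in the excerpt, and the function names are hypothetical. Per the comments in the excerpt, the Windows branch moves the file pointer to the truncation position before truncating, which is why it has to save and restore it; on POSIX the restore is redundant because ftruncate() leaves the offset alone.

```python
import os
import tempfile


def truncate_and_restore(fd, size):
    """Truncate the file behind fd to `size` bytes without changing its offset."""
    old = os.lseek(fd, 0, os.SEEK_CUR)   # save the file pointer position
    os.lseek(fd, size, os.SEEK_SET)      # move to the truncation position
    try:
        os.ftruncate(fd, size)           # truncate; note this may also grow the file
    finally:
        os.lseek(fd, old, os.SEEK_SET)   # restore the position in any case


def demo_truncate_and_restore():
    """Return (offset, size) after truncating a 4-byte file to 0 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b'test')
        truncate_and_restore(fd, 0)
        return os.lseek(fd, 0, os.SEEK_CUR), os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)


print(demo_truncate_and_restore())  # (4, 0)
```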

To my surprise, most of the flush functions in the Python 3.2.2 sources either did nothing or did not call the C fflush function at all. The 3.2.2 truncate code was also poorly documented. However, I did find something interesting in the Python 2.7.2 sources, in the truncate implementation in Python-2.7.2\Objects\fileobject.c:812:

 /* Get current file position.  If the file happens to be open for
 * update and the last operation was an input operation, C doesn't
 * define what the later fflush() will do, but we promise truncate()
 * won't change the current position (and fflush() *does* change it
 * then at least on Windows).  The easiest thing is to capture
 * current pos now and seek back to it at the end.
 */

To summarize, I think this behavior is entirely platform dependent. I checked with the default Python 3.2.2 for Windows x64 and got the same results as you. I don't know what happens on *nix systems.

Truncate doesn't change the file position.

Note also that even if the file is opened in read+write you cannot just switch between the two types of operation (a seek operation is required to be able to switch from read to write or vice versa).
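Python 2's file objects sit on top of C stdio, so this rule applies to them directly. With Python 3's `io` classes, seeking to the current position is at worst a no-op, so `f.seek(f.tell())` is a cheap way to keep the switch explicit and portable. A minimal sketch, assuming binary mode; the function name `read_then_write()` is hypothetical.

```python
import os
import tempfile


def read_then_write():
    """Read two bytes, then overwrite the next two, with an explicit seek in between."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'wb') as f:
            f.write(b'test')
        with open(path, 'r+b') as f:
            head = f.read(2)      # position is now 2
            f.seek(f.tell())      # explicit positioning between the read and the write
            f.write(b'XY')        # overwrites bytes 2 and 3
        with open(path, 'rb') as f:
            return head, f.read()
    finally:
        os.remove(path)


print(read_then_write())  # (b'te', b'teXY')
```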

If anyone is in the same boat as me, here is my problem and its solution:

  • I have a program that is always on, i.e. it never stops; it keeps polling data and writing it to a log file.
  • The problem is that I want to split the main file as soon as it reaches the 10 MB mark, so I wrote the program below.
  • I also found a solution to the problem where truncate was writing null values to the file, causing further problems.

Below is an illustration of how I solved this issue.

import os
from random import randint
from shutil import copyfile

f1 = open('client.log', 'wb')  # binary mode, since os.urandom() returns bytes
f1.write(os.urandom(1024 * 1024 * 15))  # adding random values worth 15 MB
f1.flush()  # push buffered data out before checking the size and copying the file
if os.path.getsize('client.log') >= 10 * 1024 * 1024:  # checking if file size is 10 MB and above
    print('File size limit exceeded, needs trimming')
    dst = 'client_' + str(randint(0, 999999)) + '.log'
    copyfile('client.log', dst)  # copying file to another one
    print('Copied content to ' + dst)
    print('Erasing current file')
    f1.truncate(0)  # truncating data; this works, but leaves the position at the old end
    f1.seek(0)  # very important after truncate, so that new data begins at offset 0
    print('File truncated successfully')
    f1.write(b'This is fresh content')  # dummy content
f1.close()
print('All Job Processed')

I expect the following is the code you meant to write:

with open('test.txt') as f:
    f.read()
with open('test.txt', 'w') as f:  # 'w' truncates the file and starts at position 0
    f.write('passed')

It depends. If you want to keep the file open and keep accessing it without closing it, flush() forces the buffered data to be written to the file. If you close the file right after flushing, you don't need flush(), because close() flushes for you. That's my understanding from the docs.
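A small check of that understanding: after `flush()`, a second handle opened on the same path already sees the data while the writer is still open, and closing the writer leaves the same contents. Note that `flush()` only hands Python's buffer to the operating system; forcing it onto the disk additionally needs `os.fsync()`. A minimal sketch; the function name `contents_seen_by_second_reader()` is hypothetical.

```python
import os
import tempfile


def contents_seen_by_second_reader():
    """Write without closing, flush, and read the file back through a separate handle."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w') as writer:
            writer.write('passed')
            writer.flush()                # push Python's buffer to the OS; file stays open
            with open(path) as reader:
                seen_while_open = reader.read()
        # leaving the `with` block closes `writer`, which flushes again
        with open(path) as reader:
            seen_after_close = reader.read()
        return seen_while_open, seen_after_close
    finally:
        os.remove(path)


print(contents_seen_by_second_reader())  # ('passed', 'passed')
```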

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.