
Python script that writes result to txt file - why the lag?

I'm using Windows 7 and I have a super-simple script that goes over a directory of images, checking a specified condition for each image (in my case, whether there's a face in the image, using dlib), while writing the paths of images that fulfilled the condition to a text file:

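import os

import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # assumed: dlib's frontal face detector
txt_output = 'faces.txt'  # assumed output path (defined elsewhere in the real script)


def process_dir(dir_path):
    i = 0
    with open(txt_output, 'a') as f:
        for filename in os.listdir(dir_path):
            # loading image to check whether dlib detects a face:
            image_path = os.path.join(dir_path, filename)
            opencv_img = cv2.imread(image_path)
            if opencv_img is None:  # skip files OpenCV cannot read as images
                continue
            dets = detector(opencv_img, 1)
            if len(dets) > 0:
                f.write(image_path)
                f.write('\n')
                i = i + 1
                print i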

Now the following thing happens: there seems to be a significant lag in appending lines to files. For example, I can see the script has "finished" checking several images (i.e., the console prints ~20, meaning 20 files that fulfill the condition have been found) but the .txt file is still empty. At first I thought there was a problem with my script, but after waiting a while I saw that they were in fact added to the file, only it seems to be updated in "batches".

This may not seem like the most crucial issue (and it's definitely not), but still I'm wondering - what explains this behavior? As far as I understand, every time the f.write(image_path) line is executed the file is changed - then why do I see the update with a lag?

Have you tried using a buffer size of 0, i.e. open(txt_output, 'a', 0)?

I'm not sure about Windows (please, someone correct me here if I'm wrong), but I believe this is because of how the write buffer is handled. Although you are requesting a write, the buffer is only written out every so often (when it is full) and when the file is closed. You can open the file with no buffer at all (a buffer size of 0):

with open(txt_output, 'a', 0) as f:

or manually flush it at the end of the loop:

if len(dets) > 0:
    f.write(image_path)
    f.write('\n')
    f.flush()
    i = i + 1
    print i

I would personally recommend flushing manually when you need to.
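Put together, the loop from the question with a manual flush after each match might look like the sketch below. It is written for Python 3, and to keep it self-contained it takes a hypothetical is_match(path) predicate in place of the dlib face check; neither detail comes from the original answer.

```python
import os


def process_dir(dir_path, txt_output, is_match):
    """Append every path in dir_path for which is_match(path) is true.

    is_match is a hypothetical stand-in for the dlib face check.
    Returns the number of matching files.
    """
    count = 0
    with open(txt_output, 'a') as f:
        # sorted() only makes the order deterministic for this sketch
        for filename in sorted(os.listdir(dir_path)):
            image_path = os.path.join(dir_path, filename)
            if is_match(image_path):
                f.write(image_path + '\n')
                # Push Python's buffer to the OS so other programs
                # (e.g. a text editor) can see the line right away.
                f.flush()
                count += 1
                print(count)
    return count
```

Flushing once per match costs one extra system call per matching file, which is usually negligible next to running a face detector on each image.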

Data written to a file object won't necessarily show up on disk immediately.

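In the interests of efficiency, writes are buffered: Python's file object (like most I/O libraries) keeps the data in memory and only passes it on to the operating system once a certain amount has accumulated (typically a few kilobytes, such as 4K). The operating system may cache it further before it actually reaches the disk.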

If you want your data written out right now, call the file's flush() method, as others have said.
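A minimal Python 3 sketch of the difference, using a throwaway file in a temporary directory (the file name is arbitrary): a second handle sees nothing until flush() is called, and os.fsync() is the additional step if the data must actually be committed to the disk rather than just handed to the operating system.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.txt")

with open(path, "a") as f:
    f.write("img_001.jpg\n")

    # The line is still sitting in Python's in-memory buffer.
    with open(path) as reader:
        before_flush = reader.read()

    f.flush()  # hand the buffered data to the operating system

    with open(path) as reader:
        after_flush = reader.read()

    # Optionally ask the OS to commit its own cache to the disk as well.
    os.fsync(f.fileno())

print(repr(before_flush))  # ''
print(repr(after_flush))   # 'img_001.jpg\n'
```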

It sounds like you're running into file stream buffering.

In short, writing to a file is a very slow process (relative to other sorts of things that the processor does). Modifying the hard disk is about the slowest thing you can do, other than maybe printing to the screen.

Because of this, most file I/O libraries will "buffer" your output, meaning that as you write to the file the library will save your data in an in-memory buffer instead of modifying the hard disk right away. Only when the buffer fills up will it "flush" the buffer (write the data to disk), after which point it starts filling the buffer again. This often reduces the number of actual write operations by quite a lot.
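The effect of a buffer on the number of real writes can be sketched in Python 3 with io.BufferedWriter wrapped around a tiny in-memory raw stream that simply counts its write calls. The CountingRaw class is made up for illustration and stands in for the disk; the 4096-byte buffer size is an arbitrary choice.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream: stores bytes in memory and counts write calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data.extend(b)
        return len(b)


raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=4096)

# 1000 small writes, one per "matching" image path.
for i in range(1000):
    buffered.write(("img_%04d.jpg\n" % i).encode("ascii"))
buffered.flush()

print(len(raw.data), "bytes written in", raw.calls, "raw write calls")
```

The 1000 small writes reach the raw stream as a handful of large ones, which is exactly the batching the question describes seeing in the text file.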

Before answering your question, the first thing to ask is: do you really need to append to the file immediately every time you find a face? Doing so will probably slow down your processing noticeably, especially if you're processing a large number of files.

If you really do need to update immediately, you basically have two options:

  1. Manually flush the write buffer each time you write to the file. In Python, this usually means calling f.flush(), as @JamieCounsell pointed out.
  2. Tell Python to just not use a buffer, or more accurately to use a buffer of size 0. As @VikasMadhusudana pointed out, you can tell Python how big of a buffer to use with a third argument to open(): open(txt_output, 'a', 0) for a 0-byte buffer.
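A minimal Python 3 sketch of option 2, assuming a throwaway temp file in place of txt_output. The 0-byte buffer is a Python 2 idiom: Python 3 only allows unbuffered mode for binary files, so passing 0 for a text file raises ValueError. Line buffering (buffering=1), which flushes whenever a newline is written, is the usual text-mode substitute.

```python
import os
import tempfile

txt_output = os.path.join(tempfile.mkdtemp(), "faces.txt")  # stand-in path

try:
    open(txt_output, "a", 0).close()  # Python 2 idiom
    zero_buffer_ok = True
except ValueError:
    zero_buffer_ok = False  # Python 3: "can't have unbuffered text I/O"

with open(txt_output, "a", buffering=1) as f:  # line-buffered text file
    f.write("img_001.jpg\n")  # the newline triggers a flush
    with open(txt_output) as reader:
        seen = reader.read()

print(zero_buffer_ok, repr(seen))
```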

Again, you probably don't need this; the only case I can think of that might require this sort of thing is if you have some other external operation that's watching the file and triggers off of new data being added to it.

Hope that helps!

It's flush-related; try:

print(image_path, file=f)  # Python 3

or

print >>f, image_path  # Python 2

instead of:

f.write(image_path)
f.write('\n')

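Note that print only flushes if you ask it to: in Python 3.3+ pass flush=True, as in print(image_path, file=f, flush=True). In Python 2, print >>f does not flush, so you still need f.flush() afterwards.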

Another good thing about print is that it gives you the newline for free.
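A small Python 3 sketch of this answer's approach with the explicit flush=True argument (available since Python 3.3); the temporary output path and the image path are hypothetical values for the demonstration.

```python
import os
import tempfile

txt_output = os.path.join(tempfile.mkdtemp(), "faces.txt")  # stand-in path
image_path = r"C:\images\img_001.jpg"  # hypothetical matching image

with open(txt_output, "a") as f:
    # print() appends the newline itself; flush=True pushes it to the OS.
    print(image_path, file=f, flush=True)
    with open(txt_output) as reader:
        seen = reader.read()

print(repr(seen))
```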
