
Forcing os.walk to stop if it takes too long

I want to find all files in a directory tree with a given file extension. However, some folders are really large, so I want to stop the search if it takes too long (say, more than 1 second). My current code looks something like this:

import os
import time

start_time = time.time()
file_ext = '.txt'
path = 'C:/'
file_list = []
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(file_ext):
            relDir = os.path.relpath(root, path)
            relFile = os.path.join(relDir, file)
            file_list.append(relFile)
        if time.time() - start_time > 1:
            break
    if time.time() - start_time > 1:
        break

The problem with this code is that when I reach a really large subfolder, the loop does not break until that folder has been completely traversed. If the folder contains many files, this can take much longer than I would like. Is there any way to make sure that the code does not run for much longer than the allotted time?

Edit: Note that while it is certainly helpful to find ways to speed up the code (for instance by using os.scandir), this question deals primarily with how to kill a process that is running.
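
For reference, here is a minimal sketch of the os.scandir idea mentioned above. It is not a way to kill a running process; it only checks the clock more often. The function name find_files, its timeout parameter, and the (found, timed_out) return value are made up for illustration.

```python
import os
import time


def find_files(path, file_ext, timeout=1.0):
    """Collect paths (relative to `path`) ending in `file_ext` until `timeout` seconds pass.

    Hypothetical helper: returns (found, timed_out).
    """
    deadline = time.monotonic() + timeout
    found = []
    stack = [path]  # directories still to visit
    while stack:
        current = stack.pop()
        try:
            entries = os.scandir(current)
        except OSError:
            continue  # unreadable or vanished directory: skip it
        with entries:
            for entry in entries:
                # os.scandir yields entries lazily, so the clock can be
                # checked per entry instead of once per directory
                if time.monotonic() > deadline:
                    return found, True
                try:
                    is_dir = entry.is_dir(follow_symlinks=False)
                except OSError:
                    continue
                if is_dir:
                    stack.append(entry.path)
                elif entry.name.endswith(file_ext):
                    found.append(os.path.relpath(entry.path, path))
    return found, False
```

Calling find_files('C:/', '.txt', timeout=1.0) would return whatever was found before the deadline. A single slow system call (for example on a network drive) can still overrun the deadline, which is why killing a separate process remains the only hard limit.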

You can do the walk in a subprocess and kill that. Options include multiprocessing.Process, but on Windows the multiprocessing library may do a fair amount of setup work that you don't need. Instead, you can pipe the walker code into a Python subprocess and go from there.

import os
import sys
import threading
import subprocess as subp

walker_script = """
import os
import sys
path = os.environ['TESTPATH']
file_ext = os.environ['TESTFILEEXT']

# let parent know we are going; flush because stdout is a pipe and
# would otherwise be block-buffered
print('started', flush=True)

for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(file_ext):
            relDir = os.path.relpath(root, path)
            relFile = os.path.join(relDir, file)
            print(relFile, flush=True)  # flush so results survive a kill
"""

file_ext = '.txt'
path = 'C:/'

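# the child writes to a pipe, so it encodes its output with the
# locale's preferred encoding rather than sys.getdefaultencoding()
import locale
encoding = locale.getpreferredencoding(False)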

# subprocess reads directories... additional python flags seek to
# speed python initialization. On a Unix-like system, forking would
# be a good option.

# copy the parent's environment, then add the walker's parameters
env = dict(os.environ, TESTPATH=path, TESTFILEEXT=file_ext)
proc = subp.Popen([sys.executable, '-E', '-s', '-S', '-'], stdin=subp.PIPE,
    stdout=subp.PIPE, env=env)

# write walker script
proc.stdin.write(walker_script.encode('utf-8'))
proc.stdin.close()

# wait for start marker
next(proc.stdout)

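# timer kills directory traversal when bored
timer = threading.Timer(1, proc.kill)
timer.start()

file_list = [line.decode(encoding).strip() for line in proc.stdout]
timer.cancel()  # no-op if the timer already fired
proc.wait()     # reap the child process
print(file_list)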
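
For comparison, the multiprocessing.Process option mentioned at the top could look roughly like the sketch below. The names walker and find_files, the None sentinel, and the (found, timed_out) return value are assumptions made for illustration. Note that the timeout here also covers process startup, which can be noticeable on Windows.

```python
import multiprocessing as mp
import os
import queue
import time


def walker(path, file_ext, out):
    """Runs in the child process: put each matching relative path on the queue."""
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(file_ext):
                out.put(os.path.relpath(os.path.join(root, name), path))
    out.put(None)  # sentinel (an assumption of this sketch): walk finished


def find_files(path, file_ext, timeout=1.0):
    """Hypothetical helper: returns (found, timed_out)."""
    out = mp.Queue()
    proc = mp.Process(target=walker, args=(path, file_ext, out), daemon=True)
    proc.start()
    deadline = time.monotonic() + timeout
    found = []
    timed_out = True
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            item = out.get(timeout=remaining)
        except queue.Empty:
            break
        if item is None:  # child reported a complete walk
            timed_out = False
            break
        found.append(item)
    if proc.is_alive():
        proc.terminate()  # stop the traversal, even in the middle of a large folder
    proc.join()
    return found, timed_out


if __name__ == '__main__':
    # the guard is required on Windows, where child processes re-import this module
    files, timed_out = find_files('C:/', '.txt', timeout=1.0)
    print(len(files), 'files found; timed out:', timed_out)
```

The queue is not used again after terminate(), since a queue shared with a killed process may be left in an inconsistent state.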
