Python threading.Timer: setting a time limit when a program runs out of time
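I have a question about setting a maximum running time for a function in Python. Specifically, I want to use pdfminer to convert .pdf files to .txt.
The problem is that some files often cannot be decoded and take a very long time. So I want to use threading.Timer() to limit the conversion time for each file to 5 seconds. Also, I am running on Windows, so I cannot use the signal module for this.
I got the conversion code running successfully with pdfminer.convert_pdf_to_txt() ("c" in my code), but I am not sure that threading.Timer() in the code below works. (I don't think it properly limits the time of each conversion.)
In summary, I want to:
Convert pdf to txt
Limit each conversion to 5 seconds; if the time runs out, throw an exception and save an empty file
Save all txt files under the same folder
If there is any exception/error, still save the file, but with empty content.
Here is the current code: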
import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0,iftimesout)
            timer.start()
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])
            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                print("yes")
            timer.cancel()
        except KeyboardInterrupt:
            raise
        except:
            for name in files:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")
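A note on why the Timer in the code above cannot enforce the limit: threading.Timer only schedules a callback on a separate thread after the delay; it does not interrupt code that is already running. The following is a minimal sketch of that behaviour. The names slow_task and on_timeout are hypothetical, and the short delays stand in for the conversion and the 5-second limit.

```python
import threading
import time

events = []  # records the order in which things happen

def on_timeout():
    # Runs on the timer's own thread once the delay elapses.
    events.append("timer fired")

def slow_task(seconds):
    # Hypothetical stand-in for a slow convert_pdf_to_txt() call.
    time.sleep(seconds)
    events.append("task finished")

timer = threading.Timer(0.2, on_timeout)
timer.start()
slow_task(1.0)   # keeps running even after the timer fires
timer.cancel()   # no effect here: the callback has already run
timer.join()

print(events)
```

The callback runs, but the slow task still runs to completion, so the conversion would not be cut off at the limit.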
I finally figured it out!
First, define a function that calls another function with a limited timeout:
import multiprocessing

def call_timeout(timeout, func, args=(), kwargs={}):
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
    elif not callable(func):
        print("{} is not callable!".format(type(func)))
    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        p.join(timeout)
        if p.is_alive():
            p.terminate()
            return False
        else:
            return True
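One detail of the function above: with invalid arguments it prints a message and implicitly returns None, which a caller testing "if finished:" cannot tell apart from a timeout. A possible variation, sketched here under the assumption that raising is acceptable, is to validate the arguments with exceptions. The helper name check_timeout_args is hypothetical and not part of the answer.

```python
def check_timeout_args(timeout, func):
    """Raise instead of printing when the arguments are unusable."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)):
        raise TypeError(
            "timeout must be a number, got {}".format(type(timeout).__name__))
    if timeout <= 0:
        raise ValueError("timeout must be positive, got {}".format(timeout))
    if not callable(func):
        raise TypeError("{} is not callable".format(type(func).__name__))

check_timeout_args(5, print)        # valid: returns silently
try:
    check_timeout_args(-1, print)   # invalid: non-positive timeout
except ValueError as exc:
    print(exc)
```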
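What does the function do?
It blocks the program for up to timeout seconds (p.join()) and lets the function run during that time.
After the timeout expires, it checks whether the function is still running:
If it is, it terminates the process and returns False.
If it is not, there was no timeout and it returns True.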
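The join-then-check pattern described above can be seen in isolation with a thread, since threading.Thread offers the same join(timeout) and is_alive() methods as multiprocessing.Process. This is only an illustrative sketch; the name run_with_deadline is hypothetical. A thread has no terminate() method, so unlike the process-based version this can only report the timeout, not stop the work.

```python
import threading
import time

def run_with_deadline(timeout, func, args=()):
    """Return True if func finished within timeout seconds, else False."""
    # daemon=True so a worker that is still running does not block exit
    worker = threading.Thread(target=func, args=args, daemon=True)
    worker.start()
    worker.join(timeout)          # wait at most `timeout` seconds
    return not worker.is_alive()  # still alive means the deadline passed

print(run_with_deadline(1.0, time.sleep, (0.1,)))  # finishes in time
print(run_with_deadline(0.2, time.sleep, (2.0,)))  # still running at the deadline
```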
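We can test it with time.sleep():
import time

finished = call_timeout(2, time.sleep, args=(1, ))
if finished:
    print("No timeout")
else:
    print("Timeout")
We run a function that takes one second to finish, with the timeout set to two seconds:
No timeout
If we run time.sleep(10) and set the timeout to two seconds:
finished = call_timeout(2, time.sleep, args=(10, ))
Result:
Timeout
Note that the program stops after two seconds without finishing the called function.
Your final code will look like this: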
import converter as c
import os
import timeit
import time
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs={}):
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
    elif not callable(func):
        print("{} is not callable!".format(type(func)))
    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        p.join(timeout)
        if p.is_alive():
            p.terminate()
            return False
        else:
            return True

def convert(root, name, g, t):
    # Runs in the child process: convert one PDF and write the text file
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

# This guard is needed on Windows, where each child process re-imports this module
if __name__ == "__main__":
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            try:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                finished = call_timeout(5, convert, args=(root, name, g, t))
                if finished:
                    print("yes")
                else:
                    # Timed out: overwrite the output with an empty file
                    print("no")
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")
            except KeyboardInterrupt:
                raise
            except:
                for name in files:
                    t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                    a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                    g=str(a.split("\\")[1])
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")
The code should be easy to understand; if it is not, feel free to ask.
I really hope this helps (it took us a while to get it right ;) )!
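A different route to the same goal, not taken from the answer above, is to run the work in a separate interpreter with subprocess.run() and its timeout parameter, which also works on Windows. The sketch below makes two assumptions: the helper name run_python_snippet is hypothetical, and the one-line snippets stand in for a script that would call convert_pdf_to_txt() and print the text.

```python
import subprocess
import sys

def run_python_snippet(code, timeout):
    """Run `code` in a fresh interpreter; return its stdout, or None on timeout or error."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception
        return None
    # A non-zero exit code means the snippet raised or exited with an error
    return done.stdout if done.returncode == 0 else None

print(run_python_snippet("print('converted text')", timeout=30))
print(run_python_snippet("import time; time.sleep(30)", timeout=1))
```

When the helper returns None, the caller could write the empty file itself, so the timeout and error cases would be handled in one place.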
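Check the following code and let me know if there are any problems. Also let me know whether you still want to use the forced termination feature (KeyboardInterrupt).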
import os
import time
from multiprocessing import Process

# Assumption: convert_pdf_to_txt comes from the "converter" module used in the question
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs" # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5 # seconds
TIME_TO_CHECK = 1 # seconds

# Save PDF content into text file or save empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(path_to + '\\' + my_pdf))
        except:
            print("Error. %s file wasn't converted" % my_pdf)

# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    return path_to_text + (pdf_file.split('.')[0]+ ".txt")

if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that text file created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if not run out of $TIMEOUT and file still empty: wait for $TIME_TO_CHECK,
                # else: close file and start new iteration
                if count < TIMEOUT and os.stat(text_file_name(my_file)).st_size == 0:
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break
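The loop above measures time by counting sleep intervals of TIME_TO_CHECK seconds. A possible variation, sketched here with a hypothetical helper named wait_until, polls against a monotonic-clock deadline instead, so it returns soon after the condition becomes true. In the code above the condition would presumably be that os.stat(...).st_size is no longer 0; that pairing is an assumption, not part of the answer.

```python
import time

def wait_until(predicate, timeout, interval=0.05):
    """Poll predicate() until it is true or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline

start = time.monotonic()
# Condition becomes true after about 0.2 seconds, well inside the timeout
print(wait_until(lambda: time.monotonic() - start > 0.2, timeout=2))
# Condition never becomes true, so the call gives up after the timeout
print(wait_until(lambda: False, timeout=0.3))
```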