Python threading.Timer: set a time limit when the program runs out of time
I have some questions about setting the maximum running time of a function in Python. Specifically, I would like to use pdfminer to convert .pdf files to .txt.
The problem is that, very often, some files cannot be decoded and the conversion takes an extremely long time. So I want to use threading.Timer() to limit the conversion time for each file to 5 seconds. In addition, I am running under Windows, so I cannot use the signal module for this.
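As a quick check of that constraint, the sketch below tests whether the alarm-based timeout from the signal module is available on the current platform. The helper name can_use_sigalrm is made up for illustration; on Windows it is expected to return False, since signal.alarm is documented as Unix-only.

```python
import signal


def can_use_sigalrm():
    """Return True if both signal.alarm and signal.SIGALRM exist here."""
    return hasattr(signal, "alarm") and hasattr(signal, "SIGALRM")


if __name__ == "__main__":
    print("signal.alarm available:", can_use_sigalrm())
```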
I succeeded in running the conversion with pdfminer.convert_pdf_to_txt() (in my code the module is imported as "c"), but I am not sure that threading.Timer() works in the following code. (I don't think it properly constrains the time for each conversion.)
In summary, I want to:
Convert the pdf to txt
Limit each conversion to 5 seconds; if it runs out of time, throw an exception and save an empty file
Save all the txt files under the same folder
If there are any exceptions/errors, still save the file but with empty content.
Here is the current code:
import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0,iftimesout)
            timer.start()
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])
            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                print("yes")
            timer.cancel()
        except KeyboardInterrupt:
            raise
        except:
            for name in files:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")
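A small experiment, independent of pdfminer, suggests why the timer may not constrain anything: the Timer callback runs in its own thread, while the main thread keeps working until the slow call returns. This is only a minimal sketch in Python 3; slow_task is a hypothetical stand-in for the conversion, and run_with_timer is a made-up helper name.

```python
import threading
import time


def slow_task(duration, log):
    """Hypothetical stand-in for a slow conversion (not pdfminer)."""
    time.sleep(duration)
    log.append("task finished")
    return "done"


def run_with_timer(limit, duration):
    """Start a Timer, run the slow task, and record what happened in order."""
    log = []
    timer = threading.Timer(limit, lambda: log.append("timer fired"))
    timer.start()
    # The timer callback runs in another thread; it does not interrupt this call.
    result = slow_task(duration, log)
    timer.cancel()  # no effect if the timer has already fired
    return result, log


if __name__ == "__main__":
    print(run_with_timer(0.05, 0.3))
```

In this sketch the task still runs to completion after the timer has fired, which matches the suspicion that the time limit is not enforced.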
I finally figured it out!
First of all, define a function to call another function with a limited timeout:
import multiprocessing

def call_timeout(timeout, func, args=(), kwargs={}):
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
    elif not callable(func):
        print("{} is not callable!".format(type(func)))
    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        p.join(timeout)
        if p.is_alive():
            p.terminate()
            return False
        else:
            return True
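What does the function do?
Check that the timeout and the function are valid
Start the given function in a new process
Block the program for up to timeout seconds (p.join()) and allow the function to be executed in this time
After the timeout expires, check if the function is still running
Yes: terminate it and return False
No: no timeout, return True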
We can test it with time.sleep():
import time

if __name__ == "__main__":  # guard needed by multiprocessing on Windows
    finished = call_timeout(2, time.sleep, args=(1, ))
    if finished:
        print("No timeout")
    else:
        print("Timeout")
We run a function that needs one second to finish, with the timeout set to two seconds:
No timeout
If we run time.sleep(10) and set the timeout to two seconds:
finished = call_timeout(2, time.sleep, args=(10, ))
Result:
Timeout
Notice that the program stops waiting after two seconds and the called function is terminated without finishing.
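One thing a plain True/False result does not reveal is whether the function raised an exception inside the child process, because that exception is not propagated to the parent. A possible variant, sketched here under the assumption that a nonzero Process.exitcode indicates a failed child, returns a status string instead. The names run_with_status and fail are hypothetical and only serve the illustration.

```python
import multiprocessing
import time


def run_with_status(timeout, func, args=(), kwargs=None):
    """Run func in a child process and return 'ok', 'error', or 'timeout'."""
    p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
    p.start()
    p.join(timeout)
    if p.is_alive():
        # Still running after the time limit: stop it and reap the process.
        p.terminate()
        p.join()
        return "timeout"
    # An exit code of 0 means the function returned normally in the child.
    return "ok" if p.exitcode == 0 else "error"


def fail():
    """Hypothetical function that simulates a failing conversion."""
    raise ValueError("simulated conversion failure")


if __name__ == "__main__":
    print(run_with_status(2, time.sleep, args=(0.1,)))
    print(run_with_status(0.5, time.sleep, args=(10,)))
    print(run_with_status(2, fail))
```

With such a status, the caller could write an empty output file for both the "timeout" and the "error" case.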
Your final code will look like this:
import converter as c
import os
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs={}):
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")
    elif not callable(func):
        print("{} is not callable!".format(type(func)))
    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        p.join(timeout)
        if p.is_alive():
            p.terminate()
            return False
        else:
            return True

# Runs in the child process: convert one PDF and write the text file
def convert(root, name, g, t):
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

# The guard is needed on Windows, where child processes re-import this module
if __name__ == "__main__":
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            try:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                finished = call_timeout(5, convert, args=(root, name, g, t))
                if finished:
                    print("yes")
                else:
                    # Timed out: overwrite the output with an empty file
                    print("no")
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")
            except KeyboardInterrupt:
                raise
            except:
                for name in files:
                    t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                    a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                    g=str(a.split("\\")[1])
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")
The code should be easy to understand; if not, feel free to ask.
I really hope this helps (as it took some time for us to get it right ;) )!
Check the following code and let me know if there are any issues. Also let me know whether you still want to use the force-termination feature (KeyboardInterrupt).
import os
import time
from multiprocessing import Process

# Assumption: convert_pdf_to_txt comes from the asker's own converter module
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs"  # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\"  # There is "\\" at the end of path!
TIMEOUT = 5  # seconds
TIME_TO_CHECK = 1  # seconds

# Save PDF content into text file or save empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(path_to + '\\' + my_pdf))
        except Exception:
            print("Error. %s file wasn't converted" % my_pdf)

# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    return path_to_text + (pdf_file.split('.')[0] + ".txt")

if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that text file created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if not run out of $TIMEOUT and file still empty: wait for $TIME_TO_CHECK,
                # else: close file and start new iteration
                if count < TIMEOUT and os.stat(text_file_name(my_file)).st_size == 0:
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break
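The polling loop above counts sleep intervals to approximate the timeout. As an alternative sketch of the same idea, a generic helper could poll a condition against a monotonic deadline. The name wait_until and its parameters are assumptions made for this illustration, not part of the answer's code.

```python
import time


def wait_until(predicate, timeout, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    start = time.monotonic()
    # Hypothetical condition: becomes true after roughly 0.2 seconds.
    print(wait_until(lambda: time.monotonic() - start > 0.2, timeout=1))
```

In the conversion loop, the predicate could be a check that the output file is no longer empty, followed by p.terminate() once wait_until returns.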