
Python threading.Timer: set a time limit when a program runs out of time

I have a question about setting the maximum running time of a function in Python. I would like to use pdfminer to convert .pdf files to .txt.

The problem is that some files often cannot be decoded and take an extremely long time. So I want to use threading.Timer() to limit the conversion time for each file to 5 seconds. In addition, I am running on Windows, so I cannot use the signal module for this.

I succeeded in running the conversion code with pdfminer.convert_pdf_to_txt() (in my code it is "c"), but I am not sure that threading.Timer() works in the following code. (I don't think it properly constrains the time for each conversion.)

In summary, I want to:

  1. Convert the pdf to txt

  2. Limit each conversion to 5 seconds; if it runs out of time, throw an exception and save an empty file

  3. Save all the txt files under the same folder

  4. If there are any exceptions/errors, still save the file, but with empty content

Here is the current code:

import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")

    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")


for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0,iftimesout)
            timer.start()
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])

            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                print("yes")

            timer.cancel()

        except KeyboardInterrupt:
            raise

        except:
            for name in files:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")

I finally figured it out!

First of all, define a function that calls another function with a limited timeout:

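import multiprocessing

def call_timeout(timeout, func, args=(), kwargs=None):
    # Run func(*args, **kwargs) in a separate process and wait at most `timeout` seconds.
    # Returns True if the process ended in time, False if it had to be terminated.
    # Invalid arguments are only reported; the function then returns None.
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")

    elif not callable(func):
        print("{} is not callable!".format(type(func)))

    else:
        # kwargs defaults to None to avoid sharing one mutable dict between calls
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
        p.start()
        p.join(timeout)  # blocks until the process ends or the timeout expires

        if p.is_alive():
            p.terminate()  # still running after the timeout: kill it
            return False
        else:
            return True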

What does the function do?

  • Checks that the timeout and the function are valid
  • Starts the given function in a new process, which has some advantages over threads (unlike a thread, a process can be terminated from outside)
  • Blocks the program for up to timeout seconds ( p.join(timeout) ) so the function can run during that time
  • After the timeout expires, checks whether the function is still running

    • Yes: terminates it and returns False
    • No: fine, no timeout! Returns True
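To see why a thread-based timer cannot enforce the limit, here is a minimal sketch (the function name run_with_timer and the timings are illustrative, not taken from the question). threading.Timer only schedules a callback on another thread; it does not interrupt code that is already running, so the slow call still runs to completion:

```python
import threading
import time


def run_with_timer(work_seconds, limit_seconds):
    """Run a slow task while a threading.Timer 'guards' it and record what happens."""
    events = []
    # The callback runs on a separate thread once limit_seconds have passed.
    timer = threading.Timer(limit_seconds, lambda: events.append("timer fired"))
    timer.start()

    start = time.monotonic()
    time.sleep(work_seconds)  # stand-in for a slow PDF conversion
    elapsed = time.monotonic() - start

    timer.cancel()  # too late to matter if the timer has already fired
    events.append("work finished")
    return events, elapsed


if __name__ == "__main__":
    events, elapsed = run_with_timer(work_seconds=0.3, limit_seconds=0.1)
    print(events, "elapsed: %.2f s" % elapsed)
```

The timer fires, but the work is not cut short: the elapsed time is the full duration of the task, not the limit.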

We can test call_timeout with time.sleep():

import time

if __name__ == "__main__":  # needed on Windows, where multiprocessing re-imports the main module
    finished = call_timeout(2, time.sleep, args=(1, ))
    if finished:
        print("No timeout")
    else:
        print("Timeout")

We run a function that needs one second to finish, with the timeout set to two seconds:

No timeout

If we run time.sleep(10) and set the timeout to two seconds:

finished = call_timeout(2, time.sleep, args=(10, ))

Result:

Timeout

Notice that the program stops waiting after two seconds, without the called function having finished.
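One caveat: call_timeout returns True whenever the child process has exited within the limit, including when it exited because of an uncaught exception. If that distinction matters (the question also wants an empty file on errors), Process.exitcode can be consulted. The sketch below is one possible variant; the names classify, run_with_status and fail are hypothetical and not part of the answer:

```python
import multiprocessing
import time


def classify(still_running, exitcode):
    """Map a child's state after join(timeout) to 'timeout', 'ok' or 'error'."""
    if still_running:
        return "timeout"
    # A child that ends normally has exit code 0; an uncaught exception gives non-zero.
    return "ok" if exitcode == 0 else "error"


def fail():
    """Hypothetical stand-in for a conversion that raises."""
    raise ValueError("simulated conversion failure")


def run_with_status(timeout, func, args=()):
    """Like call_timeout, but reports how the child ended instead of True/False."""
    p = multiprocessing.Process(target=func, args=args)
    p.start()
    p.join(timeout)
    still_running = p.is_alive()
    if still_running:
        p.terminate()
        p.join()  # wait for the terminated child to be cleaned up
    return classify(still_running, p.exitcode)


if __name__ == "__main__":  # guard needed on Windows
    print(run_with_status(2, time.sleep, (0.2,)))  # finishes in time
    print(run_with_status(2, fail))                # raises inside the child
    print(run_with_status(1, time.sleep, (10,)))   # exceeds the limit
```

With this variant the caller could write an empty file for both "timeout" and "error" and keep the output only for "ok".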

Your final code will look like this:

import converter as c
import os
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs=None):
    # Run func in a separate process; True if it ended in time, False if terminated
    if type(timeout) not in [int, float] or timeout <= 0.0:
        print("Invalid timeout!")

    elif not callable(func):
        print("{} is not callable!".format(type(func)))

    else:
        p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
        p.start()
        p.join(timeout)

        if p.is_alive():
            p.terminate()
            return False
        else:
            return True

def convert(root, name, g, t):
    # Runs in the child process: convert one PDF and write the resulting text
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

if __name__ == "__main__":  # needed on Windows, where multiprocessing re-imports this module
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            try:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                finished = call_timeout(5, convert, args=(root, name, g, t))

                if finished:
                    print("yes")
                else:
                    print("no")

                    # Timed out: replace whatever was written with an empty file
                    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                        newfile.write("")

            except KeyboardInterrupt:
                raise

            except Exception:
                # Any other error: save an empty file for the current PDF only
                # (looping over all files here would also blank finished conversions)
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")

The code should be easy to understand; if not, feel free to ask.
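The file-name logic is the densest part of the loop. As a hedged illustration, the sketch below reproduces it with ntpath (the Windows flavour of os.path, importable on any platform) so the intermediate values can be inspected. The helper name output_name and the sample path are assumptions; the sample mimics roughly what os.walk would yield on Windows for a folder three levels below yourpath:

```python
import ntpath  # Windows path rules, usable on any OS for illustration


def output_name(root, name, out_dir="D:/f/"):
    """Rebuild the output path the way the loop does (illustrative helper).

    For ordinary file names, os.path.dirname(os.path.join(root, name)) is just
    root, so splitting root directly gives the same t and a as the original code.
    """
    a, t = ntpath.split(root)  # t: the folder that directly contains the file
    g = a.split("\\")[1]       # g: second backslash-separated part of the parent path
    return out_dir + g + "&" + t + "&" + name + ".txt"


if __name__ == "__main__":
    print(output_name("D:/hh/sub\\x\\y", "a.pdf"))
```

Note that the original name keeps its .pdf extension, so the output file ends in .pdf.txt. For a shallower sample such as "D:/hh/sub\\x", the parent path contains no backslash and a.split("\\")[1] raises IndexError, so this naming scheme assumes the PDFs sit at least that deep.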

I really hope this helps (as it took some time for us to get it right ;) )!

Check the following code and let me know if there are any issues. Also let me know whether you still want to use the forced termination feature (KeyboardInterrupt).

import os
import time
from multiprocessing import Process

# Assumption: convert_pdf_to_txt comes from the question's converter module
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs" # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5  # seconds
TIME_TO_CHECK = 1  # seconds


# Save PDF content into text file or save empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(path_to + '\\' + my_pdf))
        except Exception:
            print("Error. %s file wasn't converted" % my_pdf)


# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    return path_to_text + (pdf_file.split('.')[0]+ ".txt")


if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that text file created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if not run out of $TIMEOUT and file still empty: wait for $TIME_TO_CHECK,
                # else: stop the process and start new iteration
                if count < TIMEOUT and os.stat(text_file_name(my_file)).st_size == 0:
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break
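The polling loop above counts sleep intervals to approximate the timeout. As a hedged sketch of the same idea, the wait can be factored into a small helper that compares against a monotonic clock instead; wait_until is a hypothetical name and is not part of the answer:

```python
import time


def wait_until(predicate, timeout, interval):
    """Poll predicate() every `interval` seconds until it is true or `timeout` elapses.

    Returns True if the predicate became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


if __name__ == "__main__":
    started = time.monotonic()
    # Illustrative condition: becomes true about 0.2 s after the start.
    print(wait_until(lambda: time.monotonic() - started > 0.2, timeout=1, interval=0.05))
```

In the loop above, the predicate could be a check that the text file is no longer empty, with p.terminate() called once wait_until returns.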
