Python threading.Timer: setting a time limit on how long a function may run

Question

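I have a question about setting a maximum running time for a function in Python. Specifically, I want to use pdfminer to convert .pdf files to .txt.

The problem is that some files often cannot be decoded and take a very long time. So I want to use threading.Timer() to limit the conversion of each file to 5 seconds. I am also running on Windows, so I cannot use the signal module for this.

I successfully ran the conversion code with pdfminer.convert_pdf_to_txt() ("c" in my code), but I am not sure that threading.Timer() works in the code below. (I don't think it properly limits the time spent on each file.)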

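In summary, I want to:

Convert each pdf to txt
Limit each conversion to 5 seconds; if time runs out, raise an exception and save an empty file
Save all txt files in the same folder
If any exception/error occurs, still save the file, but with empty content.

Here is the current code: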

import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")

    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")


for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0,iftimesout)
            timer.start()
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])

            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                print("yes")

            timer.cancel()

        except KeyboardInterrupt:
            raise

        except:
            for name in files:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")

Answer 1

I finally figured it out!

First, define a function that calls another function with a limited timeout:

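import multiprocessing

def call_timeout(timeout, func, args=(), kwargs=None):
    # Validate the arguments before starting anything
    if not isinstance(timeout, (int, float)) or timeout <= 0.0:
        raise ValueError("Invalid timeout!")

    if not callable(func):
        raise TypeError("{} is not callable!".format(type(func)))

    # Run func in its own process so it can be killed if it takes too long
    # (kwargs defaults to None to avoid a shared mutable default argument)
    p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
    p.start()
    p.join(timeout)  # wait at most `timeout` seconds

    if p.is_alive():
        # Still running after the timeout: kill it and report failure
        p.terminate()
        p.join()
        return False
    else:
        return True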

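What does this function do?

Checks that the timeout and the function are valid
Starts the given function in a new process, which has some advantages over a thread
Blocks the program for up to x seconds ( p.join(timeout) ) so the function can run in the meantime
Once the wait is over, checks whether the function is still running
- Yes: terminates it and returns False
- No: good, no timeout! Returns True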
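To illustrate why a process helps here, the following is a minimal sketch (not the asker's code) of what threading.Timer actually does. The names slow_task and run_with_timer are made up for the illustration, and time.sleep stands in for a slow conversion. The timer callback runs in its own thread, so it fires but cannot stop the work in the main thread:

```python
import threading
import time

def slow_task(duration, log):
    """Hypothetical stand-in for a slow conversion: sleep, then record completion."""
    time.sleep(duration)
    log.append("task finished")

def run_with_timer(limit, duration):
    """Start a Timer with the given limit, run the slow task, and return the event log."""
    log = []
    # The callback runs in the timer's own thread; it cannot interrupt slow_task
    timer = threading.Timer(limit, lambda: log.append("timer fired"))
    timer.start()
    slow_task(duration, log)  # the main thread keeps running past the limit
    timer.cancel()            # has no effect if the timer has already fired
    return log

if __name__ == "__main__":
    # The timer fires after 0.2 s, but the task still runs to completion
    print(run_with_timer(limit=0.2, duration=0.6))
```

In this sketch the task always runs to the end; the timer only adds a log entry. A child process, by contrast, can be ended from outside with terminate().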

We can test it with time.sleep():

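import time

# The guard is needed on Windows, where child processes re-import this module
if __name__ == "__main__":
    finished = call_timeout(2, time.sleep, args=(1, ))
    if finished:
        print("No timeout")
    else:
        print("Timeout")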

We run a function that needs one second to finish, with the timeout set to two seconds:

No timeout

If we run time.sleep(10) with the timeout set to two seconds:

finished = call_timeout(2, time.sleep, args=(10, ))

Result:

Timeout

Note that the program stops after two seconds, without the called function having completed.
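One way to check that claim is to time the call. This is only a sketch: timed_call is a hypothetical helper, call_timeout is repeated in simplified form (without the argument checks) so the snippet is self-contained, and the exact elapsed time will vary with process start-up cost:

```python
import multiprocessing
import time

def call_timeout(timeout, func, args=(), kwargs=None):
    """Run func in a child process; return True if it finished within timeout seconds."""
    p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()
        p.join()
        return False
    return True

def timed_call(timeout, func, args=()):
    """Hypothetical helper: return (finished, elapsed_seconds) for one call_timeout run."""
    start = time.monotonic()
    finished = call_timeout(timeout, func, args=args)
    return finished, time.monotonic() - start

if __name__ == "__main__":
    finished, elapsed = timed_call(2, time.sleep, args=(10,))
    # Expect finished to be False and elapsed to be close to 2 rather than 10
    print(finished, round(elapsed, 1))
```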

Your final code will then look like this:

import converter as c
import os
import multiprocessing

yourpath = 'D:/hh/'

def call_timeout(timeout, func, args=(), kwargs=None):
    # Validate the arguments before starting anything
    if not isinstance(timeout, (int, float)) or timeout <= 0.0:
        raise ValueError("Invalid timeout!")

    if not callable(func):
        raise TypeError("{} is not callable!".format(type(func)))

    # Run func in its own process so it can be killed if it takes too long
    p = multiprocessing.Process(target=func, args=args, kwargs=kwargs or {})
    p.start()
    p.join(timeout)  # wait at most `timeout` seconds

    if p.is_alive():
        p.terminate()
        p.join()
        return False
    else:
        return True

def convert(root, name, g, t):
    # Runs in the child process: convert one PDF and write the text file
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

# The guard is needed on Windows, where child processes re-import this module
if __name__ == "__main__":
    for root, dirs, files in os.walk(yourpath, topdown=False):
        for name in files:
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])

            try:
                finished = call_timeout(5, convert, args=(root, name, g, t))
            except Exception:
                # Ctrl+C (KeyboardInterrupt) is not an Exception, so it still propagates
                finished = False

            if finished:
                print("yes")
            else:
                print("no")

                # Timed out or failed: save an empty file for this PDF only
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")

The code should be easy to understand; if it isn't, feel free to ask.

I really hope this helps (it took us a while to get it right ;) )!

Answer 2

Check the following code and let me know if there are any problems. Also let me know if you still want a forced-termination feature ( KeyboardInterrupt ).

import os
import time
from multiprocessing import Process

# Assumption: convert_pdf_to_txt comes from the asker's own converter module
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs" # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5  # seconds
TIME_TO_CHECK = 1  # seconds


# Save PDF content into text file; the file stays empty if conversion fails or times out
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(os.path.join(path_to, my_pdf)))
        except Exception:
            print("Error. %s file wasn't converted" % my_pdf)


# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    # splitext keeps dots inside the base name, unlike split('.')[0]
    return path_to_text + os.path.splitext(pdf_file)[0] + ".txt"


if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that text file created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if the process is still running, $TIMEOUT has not run out and the file
                # is still empty: wait for $TIME_TO_CHECK,
                # else: stop the process and start new iteration
                if (p.is_alive() and count < TIMEOUT
                        and os.stat(text_file_name(my_file)).st_size == 0):
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break
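The wait loop above counts sleep intervals to approximate the timeout. As a sketch of the same idea with an explicit deadline, a small polling helper could look like the following. wait_until is a hypothetical name, not part of the answer's code, and the predicate is whatever condition the caller wants to poll:

```python
import time

def wait_until(predicate, timeout, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds have passed.

    Returns True if the predicate became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))
```

With the names from the code above, the inner loop could then be expressed as wait_until(lambda: not p.is_alive() or os.stat(text_file_name(my_file)).st_size > 0, TIMEOUT, TIME_TO_CHECK), followed by p.terminate().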

Answer 1: 5, 2016-11-22 18:48:05

Answer 2: 0, accepted, 2016-11-23 17:50:46
