Threading subprocess and get progress

Question

I'd like to automate HandBrake a little and wrote a small program in Python. Now I have a problem with the subprocess and threading modules. I want to dynamically change the number of HandBrake processes that I run, and I use the queue module to put and get the movies.

CompressThread calls the encode method in the handbrake class, and encode calls _execute. Now I want to store the progress, which I read in the handbrake class, centrally in the Compressor class, so that I can publish it to a socketserver and a web GUI. Right now I write it to a sqlite3 db, but this should be removed (because of threading issues), with the data saved only when the program exits.

The only way I can think of to store the data centrally is to create another thread and poll the data in the CompressThread class. My problem with this is that my program would then have 4 threads.

Is there a better solution? Maybe the db is not wrong, and I shouldn't remove it?

Compressor class:

class CompressThread(threading.Thread):
    """ Manage the queue of movies to be compressed
    """

    def __init__(self):
        threading.Thread.__init__(self)
        self._config = ConfigParser()
        self._config.process_config()
        self._handbrake = self._config.get_handbrake()
        self._lock = threading.Lock()

    def run(self):
        while True:
            movie_id = QUEUE.get()
            return_code = self._handbrake.encode(movie_id)
            print(return_code)
            QUEUE.task_done()


class Compressor(object):
    """ Compresses given mkv file

    Attributes:


    """

    __MAX_THREADS = 1

    def __init__(self):
        self._dest_audio_tracks = None
        self._log = None
        self.settings = None
        self.config = ConfigParser()
        self._database = db.DB()
        self._database.connect()
        self._running = True
        self._threads = []
        try:
            self.handbrake, self._log = self.config.process_config()
            self._log = logging.getLogger("Compressor")
        except ConfigError as error:
            raise Error("Config error: {0}".format(error))

    def process_file(self, input_file, output_file, title):
        if not os.path.exists(input_file):
            self._log.warning("Input file does not exist: {0}".format(input_file))
            print("Input file not found: {0}".format(input_file))
        else:
            media_info = mediainfo.Mediainfo.parse(input_file)
            movie_settings = settings.Settings(input_file, title, output_file)
            movie_settings.parse(media_info)
            self._log.info("Added file {0} to list".format(movie_settings.input_file))
            QUEUE.put(self._database.insert_movie(movie_settings))

            print("File added.")

    def start(self):
        self._threads = [CompressThread() for _ in range(self.__MAX_THREADS)]
        for thread in self._threads:
            thread.daemon = True  # setDaemon() is deprecated in favor of the attribute
            thread.start()
        while self._running:
            cmd = input("mCompress> ")
            if cmd == "quit":
                self._running = False
            elif cmd == "status":
                print("{0}".format(self._threads))
            elif cmd == "newfile":
                input_file = input("mCompress> newFile> Input filename> ")
                output_file = input("mCompress> newFile> Output filename> ")
                title = input("mCompress> newFile> Title> ")
                self.process_file(input_file, output_file, title)

    def _initialize_logging(self, log_file):
        try:
            self._log_file = open(log_file, "a+")
        except IOError as error:
            log_error = "Could not open log file {0}".format(error)
            self._log.error(log_error)
            raise IOError(log_error)
        self._log_file.seek(0)

if __name__ == "__main__":
    options_parser = OptionsParser()
    args = options_parser.parser.parse_args()
    if args.start:
        Compressor().start()

A piece of the handbrake class:

def _execute(self, options):
    command = ["{0}".format(self._location)]
    if self._validate_options(options):
        for option in options:
            command.extend(option.generate_command())
        print(" ".join(command))
        state = 1
        returncode = None
        process = None
        temp_file = tempfile.TemporaryFile()
        try:
            process = subprocess.Popen(command, stdout=temp_file, stderr=temp_file, shell=False)
            temp_file.seek(0)
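            while True:
                # poll() returns None while the process is still running; an exit code
                # of 0 is falsy, so test for None explicitly or the loop never ends
                # after a successful run
                returncode = process.poll()
                if returncode is None:
                    for line in temp_file.readlines():
                        # raw strings keep the backslashes intact; the lazy ".*?" lets
                        # the first group capture both digits of the percentage
                        p = re.search(r"Encoding:.*?([0-9]{1,2}\.[0-9]{1,2}) % \(([0-9]{1,2}\.[0-9]{1,2}) fps, avg "
                                      r"([0-9]{1,2}\.[0-9]{1,2}) fps, ETA ([0-9]{1,2}h[0-9]{1,2}m[0-9]{1,2})",
                                      line.decode("utf-8"))
                        if p is not None:
                            self._database.update_progress(p.group(1), p.group(2), p.group(3), p.group(4))
                else:
                    break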
            temp_file.seek(0)
            print(temp_file.readline())
            self._write_log(temp_file.readlines())
            if returncode == 0:
                state = 5
            else:
                state = 100
                raise ExecuteError("HandBrakeCLI stopped with an exit code not null: {0}".format(returncode))
        except OSError as error:
            state = 105
            raise ExecuteError("CLI command failed: {0}".format(error))
        except KeyboardInterrupt:
            state = 101
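        finally:
            # only kill a process that was actually started and is still running
            if process is not None and process.poll() is None:
                process.kill()
            temp_file.close()
            # note: returning from "finally" suppresses the ExecuteError raised above,
            # so callers only ever see the state code
            return state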
    else:
        raise ExecuteError("No option given")

Answer 1

Just do exactly what you were planning to do.

If this means you have 5 threads instead of 4, so what?
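A minimal sketch of that extra thread, assuming the workers report progress through a shared queue rather than writing to the db. The names `PROGRESS`, `ProgressCollector`, and `fake_encode` are hypothetical and stand in for the asker's real classes:

```python
import queue
import threading

PROGRESS = queue.Queue()  # hypothetical: workers put (movie_id, percent) tuples here
_STOP = object()          # sentinel that tells the collector to exit


class ProgressCollector(threading.Thread):
    """Drains PROGRESS and keeps the latest value per movie in one place."""

    def __init__(self):
        super().__init__(daemon=True)
        self._latest = {}
        self._lock = threading.Lock()

    def run(self):
        while True:
            item = PROGRESS.get()  # blocks without using CPU until data arrives
            if item is _STOP:
                break
            movie_id, percent = item
            with self._lock:
                self._latest[movie_id] = percent

    def snapshot(self):
        """Return a copy that a socket server or web GUI could read safely."""
        with self._lock:
            return dict(self._latest)


def fake_encode(movie_id):
    # Stand-in for the real encode loop: report progress instead of touching the db.
    for percent in (25.0, 50.0, 100.0):
        PROGRESS.put((movie_id, percent))


collector = ProgressCollector()
collector.start()
workers = [threading.Thread(target=fake_encode, args=(i,)) for i in (1, 2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
PROGRESS.put(_STOP)
collector.join()
print(collector.snapshot())
```

Because the collector blocks on `PROGRESS.get()`, nothing has to poll the worker threads; they push updates whenever they parse a new progress line.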

None of your threads are CPU-bound. That is, they're not crunching numbers or parsing strings or doing other computational work; they're just waiting on I/O, or an external process, or another thread. So there's no harm in creating more non-CPU-bound threads, unless you go hog-wild to the point where your OS can't handle them smoothly anymore, which typically takes hundreds of threads.
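A rough way to see this for yourself, assuming `time.sleep` is an acceptable stand-in for blocking on a pipe or a child process (the function name `wait_for_io` is made up for the illustration):

```python
import threading
import time


def wait_for_io(delay):
    # Stand-in for blocking on a pipe, socket, or child process.
    # A sleeping thread releases the GIL, so other threads keep running.
    time.sleep(delay)


start = time.monotonic()
threads = [threading.Thread(target=wait_for_io, args=(0.2,)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.monotonic() - start

# Run one after another, the five waits would add up to about 1 second.
print("5 waiting threads took {0:.2f}s".format(elapsed))
```

The five waits overlap, so the total stays close to a single wait instead of growing with the number of threads.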

If any of your threads were CPU-bound, then even 2 would be too many. In CPython,* threads have to acquire the Global Interpreter Lock to do any work,** so they end up not running in parallel, and spending more time fighting over the GIL than working. But even then, adding another non-CPU-bound thread that spends all its time waiting on a queue that the CPU-bound threads were filling wouldn't make things significantly worse than they already are.***

As for the db…

SQLite3 itself, as long as you have a new enough version, is fine with multithreading. But the Python sqlite3 module is not, for backward compatibility with very old versions of the SQLite3 engine. See Multithreading in the docs for details. If I remember correctly (the site seems to be temporarily down, so I can't check), you can build the third-party module pysqlite (which the stdlib module is based on) with threading support if you need to.
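To illustrate the restriction, here is a small sketch of what the stdlib module does by default when a connection created in one thread is used from another. The in-memory database and the function name are only for the demonstration:

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:")  # created in the main thread
errors = []


def use_from_other_thread():
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as error:
        # By default the module refuses cross-thread use of a connection.
        errors.append(error)


thread = threading.Thread(target=use_from_other_thread)
thread.start()
thread.join()
print(errors)
conn.close()
```

`sqlite3.connect` also accepts `check_same_thread=False`, which turns this check off; in that case serializing access to the connection is up to your own code.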

However, if you're not using the database very heavily, running a single thread to talk to the database, with a queue to listen to other threads, is a perfectly reasonable design.
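A minimal sketch of that design, assuming a made-up `progress` table and an in-memory database. `DB_QUEUE`, `db_writer`, and the schema are hypothetical, not the asker's `db.DB` class:

```python
import queue
import sqlite3
import threading

DB_QUEUE = queue.Queue()  # other threads put (movie_id, percent) tuples here
_STOP = object()          # sentinel that tells the writer to finish


def db_writer(path, final_rows):
    # The connection is created and used only inside this thread.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress "
        "(movie_id INTEGER PRIMARY KEY, percent REAL)"
    )
    while True:
        item = DB_QUEUE.get()
        if item is _STOP:
            break
        conn.execute("INSERT OR REPLACE INTO progress VALUES (?, ?)", item)
        conn.commit()
    # Hand the final table contents back so the caller can inspect them.
    final_rows.extend(
        conn.execute(
            "SELECT movie_id, percent FROM progress ORDER BY movie_id"
        ).fetchall()
    )
    conn.close()


rows = []
writer = threading.Thread(target=db_writer, args=(":memory:", rows))
writer.start()
for update in [(1, 10.0), (2, 5.0), (1, 42.5)]:
    DB_QUEUE.put(update)  # any worker thread can do this safely
DB_QUEUE.put(_STOP)
writer.join()
print(rows)
```

Only the writer thread ever touches the connection, so the same-thread restriction of the sqlite3 module never comes into play.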

* And PyPy, but not necessarily in other implementations.

** Extension modules can release the GIL to do work in C, as long as they don't touch any values visible from Python. Some well-known modules like NumPy take advantage of this.

*** The waiting thread itself might be hampered by the CPU-bound threads, especially in Python 3.1 and earlier, but it won't interfere with them.

Answer 1: 2, accepted, 2013-11-13 18:51:08
