I keep trying to parallelize my code, whose purpose is to extract text from videos using OCR with the OpenCV library. I have 9 videos, divided into 3 categories, and the videos within a category can all be analyzed in the same way. For that reason I wrote 3 principal functions, 1 per category, so each function can be reused for the 3 videos in its category. Each function takes between 90 and 200 seconds when run on its own. If I analyze 3 videos in a single execution, the total time is much longer, because the functions execute sequentially.
That is why I decided to use the multiprocessing module, and I finally got the functions to run in parallel; however, I did not get the expected performance. When I run 2 processes in parallel (1 video per process), the execution time only increases by roughly 10%-15%, which is acceptable. But when I run 3 processes in parallel (1 video per process), the execution time increases drastically; in fact, I noticed the processes had stopped executing because my CPU cooler went silent. I verified this with htop on my Linux system (Ubuntu 20.04.2 LTS): at a certain moment during the 3-process run, all 6 cores of the CPU hit their limit (100%), and the processes stalled.
[Screenshot: CPU usage in the htop monitoring system]
I found a way to partially work around it by staggering the start times of the executions; that way the processes did not all push the cores to 100% at the same time, and I got an acceptable execution time. But I still need to analyze more videos in parallel, and 3 is still too few. Is there any way to increase performance? I really did not expect this from Python, considering it is running on an i5-8600K with 16 GB of RAM at 3200 MHz.
Important to mention:
If you want to check the code, you can find it in this GitHub repository.