I'm using python to do some processing on text files and am having issues with MemoryError
s. Sometimes the file being processed is quite large which means that too much RAM is being used by a multiprocessing Process.
Here is a snippet of my code:
import multiprocessing as mp
import os
def preprocess_file(file_path):
with open(file_path, "r+") as f:
file_contents = f.read()
# modify the file_contents
# ...
# overwrite file
f.seek(0)
f.write(file_contents)
f.truncate()
if __name__ == "main":
with mp.Pool(mp.cpu_count()) as pool:
pool_processes = []
# for all files in dir
for root, dirs, files in os.walk(some_path):
for f in files:
pool_processes.append(os.path.join(root, f))
# start the processes
pool.map(preprocess_file, pool_processes)
I have tried to use the resource package to set a limit to how much RAM each process can use as shown below but this hasn't fixed the issue, and I still get MemoryError
s being raised which leads me to believe it's the pool.map
which is causing issues. I was hoping to have each process deal with the exception individually so that the file could be skipped rather than crashing the whole program.
import resource
def preprocess_file(file_path):
try:
hard = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") # total bytes of RAM in machine
soft = (hard - 512 * 1024 * 1024) // mp.cpu_count() # split between each cpu and save 512MB for the system
resource.setrlimit(resource.RLIMIT_AS, (soft, hard)) # apply limit
with open(file_path, "r+") as f:
# ...
except Exception as e: # bad practice - should be more specific but just a placeholder
# ...
How can I let an individual process run out of memory while letting the other processes continue unaffected? Ideally I want to catch the exception within the preprocess_file
file so that I can log exactly which file caused the error.
Edit: The preprocess_file
function does not share data with any other processes so there is no need for shared memory. The function also needs to read the entire file at once as the file is reformatted which cannot be done line by line.
Edit 2: The traceback from the program is below. As you can see, the error doesn't actually point to the file being run, and instead comes from the package's files.
Process ForkPoolWorker-2:
Traceback (most recent call last):
File "/usr/lib64/python3.6/multiprocessing/pool.py", line 125, in worker
File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 39, in __init__
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
File "/usr/lib64/python3.6/multiprocessing/pool.py", line 130, in worker
File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 39, in __init__
MemoryError
If MemoryError
is raised, the worker processmay or may not be able to recover from the situation. If it do, as @Thomas suggest, catch the MemoryError
somewhere.
import multiprocessing as mp
from time import sleep
def initializer():
# Probably set the memory limit here
pass
def worker(i):
sleep(1)
try:
if i % 2 == 0:
raise MemoryError
except MemoryError as ex:
return str(ex)
return i
if __name__ == '__main__':
with mp.Pool(2, initializer=initializer) as pool:
tasks = range(10)
results = pool.map(worker, tasks)
print(results)
If the worker cannot recover, the whole pool is unlikely working. For example, change worker
to do a force exit:
def worker(i):
sleep(1)
try:
if i % 2 == 0:
raise MemoryError
elif i == 5:
import sys
sys.exit()
except MemoryError as ex:
return str(ex)
return i
the Pool.map
never return and block forever.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.