Reading thousands of JSON files and processing them using Python multiprocessing
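I'm trying to read thousands of JSON files from a directory, process each file separately, and store the results in a dictionary. I've already written working code that does this sequentially. Now I want to use multiprocessing to speed up the whole process.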
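For comparison, the sequential version described here can be as simple as the following sketch. The helper names process_directory and do_some_process are assumptions. In this sketch, do_some_process is only a placeholder that returns the data unchanged. The directory argument plays the role of the question's ../data folder.

```python
import json
import os


def do_some_process(data):
    # Hypothetical placeholder for the real per-file calculation.
    return data


def process_directory(directory):
    """Read every .json file in `directory` one after another and map
    each file's base name (without the .json extension) to its result."""
    result = {}
    for entry in sorted(os.listdir(directory)):
        if not entry.endswith(".json"):
            continue  # skip non-JSON files
        with open(os.path.join(directory, entry)) as f:
            data = json.load(f)
        # splitext strips only the last extension, so "a.b.json" -> "a.b"
        result[os.path.splitext(entry)[0]] = do_some_process(data)
    return result
```

Usage: result = process_directory("../data"). Only one file is open at a time, so this version never runs into the open-file limit. It is simply slower.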
Here is what I have so far:
import json
import os
from multiprocessing import Process, Manager
def read_file(file_name):
    '''
    Read the given json file and return data
    '''
    with open(file_name) as file:
        data = json.load(file)
    return data
def do_some_process(data):
    '''
    Some calculation will be done here
    and return the result
    '''
    some_result = data  # placeholder for the real calculation
    return some_result
def process_each_file(file, result):
    # strip only the extension, so names containing dots are kept intact
    file_name = os.path.splitext(file)[0]
    # reading data from file
    data = read_file(os.path.join('../data', file))
    processed_result = do_some_process(data)
    result[file_name] = processed_result
if __name__ == '__main__':
    manager = Manager()
    result = manager.dict()
    file_list = os.listdir("../data")
    # One Process per .json file; all of them are started before any is joined
    all_process = [Process(target=process_each_file, args=(file, result))
                   for file in file_list if file.endswith(".json")]
    for p in all_process:
        p.start()
    for p in all_process:
        p.join()
    # Do some further work with the 'result' variable
When I run this code, it fails with OSError: [Errno 24] Too many open files.
How can I achieve my goal?
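Errno 24 is EMFILE, which means the process has reached its limit on simultaneously open file descriptors. On Unix-like systems (Linux, macOS), you can read this limit with the standard-library resource module. The resource module is not available on Windows. The following is a small sketch, and the helper name open_file_limits is only illustrative:

```python
import resource  # Unix-only standard-library module


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

You can raise the soft limit with resource.setrlimit or ulimit -n, but that only moves the threshold. With thousands of files, it is more robust to cap how many processes run at the same time.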
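The error comes from starting one Process per file. Thousands of processes run at once, and each one keeps file descriptors open (pipes to the parent, plus a connection to the Manager server) until it is joined. Together these quickly exceed the per-process open-file limit. To read and process many JSON files with Python's multiprocessing module, use a Pool instead. A Pool runs a fixed number of worker processes (by default based on the CPU count) and reuses them for every file: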
import os
import json
from multiprocessing import Pool
def process_data(data):
    # Placeholder: do the real processing here
    return data
def process_json_file(filename):
    with open(filename, 'r') as f:
        data = json.load(f)
    # Process the data here...
    processed_data = process_data(data)
    return processed_data
# The __main__ guard is required where workers are started with 'spawn'
# (the default on Windows and macOS); otherwise each worker re-runs this code.
if __name__ == '__main__':
    # List all the JSON files in the current directory
    json_files = [f for f in os.listdir('.') if f.endswith('.json')]
    # Create a pool of workers to process the files concurrently
    with Pool() as pool:
        # Apply the processing function to each JSON file concurrently;
        # pool.map returns results in the same order as json_files
        results = pool.map(process_json_file, json_files)
    # Do something with the results
    for result in results:
        print(result)
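The answer's version returns a plain list and reads from the current directory. The question asks for a dictionary keyed by file name. The following is a minimal sketch of that variant, assuming the same flat directory of .json files. The function names echo the question's code, but their signatures and the chunksize=16 default are illustrative choices, not part of the original code.

```python
import json
import os
from multiprocessing import Pool


def do_some_process(data):
    # Hypothetical placeholder for the real per-file calculation.
    return data


def process_each_file(path):
    """Worker: load one JSON file and return (base name, processed result)."""
    name = os.path.splitext(os.path.basename(path))[0]
    with open(path) as f:
        data = json.load(f)
    return name, do_some_process(data)


def process_directory(directory, processes=None, chunksize=16):
    """Process every .json file in `directory` with a fixed-size pool.

    At most `processes` workers exist at any time (None lets Pool choose a
    default based on the CPU count), and each worker opens one file at a
    time, so open file descriptors stay bounded however many files there are.
    """
    paths = [os.path.join(directory, f) for f in os.listdir(directory)
             if f.endswith(".json")]
    with Pool(processes=processes) as pool:
        # imap_unordered sends paths to workers in chunks and yields
        # (name, value) pairs as they finish; dict() collects them all
        # before the pool is shut down by the with-block.
        return dict(pool.imap_unordered(process_each_file, paths,
                                        chunksize=chunksize))
```

Call it from under an if __name__ == '__main__': guard, for example result = process_directory("../data"). Each worker returns its (name, value) pair to the parent, so no Manager dictionary is needed. The results are keyed by name, which makes the completion order of imap_unordered irrelevant.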