[英]Python multiprocessing - Modify JSON via multiple processes
I'm trying to modify a JSON file with multiprocessing
. 我正在尝试使用
multiprocessing
修改JSON文件。 I would be able to split the JSON into chunks, so that each process has only access to and modify a certain part of the JSON (so it's guaranteed that no two processes want to modify the same attribute). 我将能够将JSON拆分为多个块,以便每个进程只能访问和修改JSON的特定部分(因此可以确保没有两个进程想要修改同一属性)。 My question is, how can I share the JSON object between processes so that the changes are reflected on the original object?
我的问题是,如何在流程之间共享JSON对象,以便更改能够反映在原始对象上? I know, that
multiprocessing
passes the object as a copy, so I'd need to use a Manager()
, but how exactly can I do that? 我知道,
multiprocessing
将对象作为副本传递,所以我需要使用Manager()
,但是我到底该怎么做呢? Currently I have 目前我有
def parallelUpdateJSON(datachunk):
for feature in datachunk:
#modify chunk
def writeGeoJSON():
with open('geo.geojson') as f:
data = json.load(f)
pool = Pool()
for i in range(0, mp.cpu_count())):
#chunk data into a list, so I get listofchunks = [chunk1, chunk2, etc.,]
#where chunk1 = data[0:chunksize], chunk2 = data[chunksize:2*chunksize] etc.
pool.map(parallelUpdateJSON, listofchunks)
pool.close()
pool.join()
with open('test_parallel.geojson', 'w') as outfile:
json.dump(data, outfile)
But of course this passes the chunks as copies, so the original data
object doesn't get modified. 但是,当然,这会将块作为副本传递,因此原始
data
对象不会被修改。 How can I make it so that data
actually gets modified by the processes? 我怎样才能使
data
实际上被流程修改? Thank you! 谢谢!
It's probably a better idea to avoid synchronous access to your output file. 避免同步访问输出文件可能是一个更好的主意。 It will be a lot easier to just produce N partial outputs and join them together into a property of your json object.
仅产生N个部分输出并将它们连接在一起成为json对象的属性会容易得多。 Then you can dump that object to a file.
然后,您可以将该对象转储到文件中。
def process(work):
return str(work[::-1])
if __name__ == "__main__":
p = Pool()
structure = json.loads("""
{ "list":
[
"the quick brown fox jumped over the lazy dog",
"the quick brown dog jumped over the lazy fox"
]
}
""")
structure["results"] = p.map(process, structure["list"])
#print(json.dumps(structure))
with open("result.json", "w") as f:
json.dump(structure, f)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.