Use django ORM with multiprocessing?

I have a Django ORM database (mysql or sqlite), and would like to process each row with a fairly computationally intensive operation. What I have right now is something like:

entries = Entry.objects.filter(language='')
for e in entries:
    e.language = detect_language(e.text)
    e.save()

If the database were the bottleneck, I would use a transaction to speed it up. However, the detect_language function is what takes the most time. I could try to run the script multiple times in parallel, but that would introduce a race condition.

I think this could be done with multiprocessing using Pool.map(): the main process fetches the DB entries, and the child processes run detect_language. I am not sure how to do it in detail, e.g. whether to save the entries in the child process or in the main process.

Is there anything to take care of when passing ORM objects between processes? Can you give a short example of how to use the ORM with multiprocessing?

I've just tried it, and something like this seems to work fine. I am still wondering if there are any caveats here, or if this can be improved performance-wise (e.g. batching the DB update):

def detect_and_save(obj):
    obj.language = detect_language(obj.text)
    obj.save()

with multiprocessing.Pool(processes=3) as pool:
    pool.map(detect_and_save, entries)

  1. You don't need to pass the full ORM objects - you can pass just the parameter your function needs and save its result back to the ORM objects.
  2. You could use bulk_update (available from Django 2.2, or as a 3rd party library) to save in a single query:
texts = [e.text for e in entries]
with multiprocessing.Pool() as pool:
    languages = pool.map(detect_language, texts)
for e, l in zip(entries, languages):
    e.language = l
Entry.objects.bulk_update(entries, ['language'])
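The map-then-bulk_update pattern above can be sketched without a Django project, so the parallel part is easy to try in isolation. In this sketch, detect_language is a hypothetical stub (not a real detector) and Entry is a minimal stand-in for the model; multiprocessing.dummy.Pool (thread-based, same API as multiprocessing.Pool) is used so the snippet runs without pickling concerns, but for a genuinely CPU-bound detect_language you would swap in multiprocessing.Pool:

```python
# Sketch of the answer's pattern: ship only the text to the pool,
# collect the languages in the parent, then write back in one pass.
from multiprocessing.dummy import Pool  # thread pool; same API as multiprocessing.Pool


def detect_language(text):
    # Hypothetical stand-in for the real, CPU-heavy detector.
    return "de" if "der" in text.split() else "en"


class Entry:
    # Minimal stand-in for the ORM model: just the two fields used above.
    def __init__(self, text, language=""):
        self.text = text
        self.language = language


entries = [Entry("the quick brown fox"), Entry("der schnelle braune Fuchs")]

texts = [e.text for e in entries]
with Pool(processes=3) as pool:
    languages = pool.map(detect_language, texts)

for e, lang in zip(entries, languages):
    e.language = lang
# With Django, this is where Entry.objects.bulk_update(entries, ['language'])
# would run; its batch_size argument caps the size of each UPDATE query.
```

Keeping the workers free of ORM objects also sidesteps the question of saving in the child: only plain strings cross the process boundary, and all writes happen in the parent.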

