Update multiple objects at once in Django?

I am using Django 1.9. I have a Django model that represents the value of a particular measure, by organisation and by month, with raw values and percentiles:

class MeasureValue(models.Model):
    org = models.ForeignKey(Org, null=True, blank=True)
    month = models.DateField()
    calc_value = models.FloatField(null=True, blank=True)
    percentile = models.FloatField(null=True, blank=True)

There are typically 10,000 or so rows per month. My question is whether I can speed up the process of setting values on the models.

Currently, I calculate percentiles by retrieving all the MeasureValue rows for a month using a Django filter query, converting them to a pandas DataFrame, and then using scipy's rankdata to set ranks and percentiles. I do this because pandas and rankdata are efficient, able to ignore null values, and able to handle repeated values in the way that I want, so I'm happy with this method:

records = MeasureValue.objects.filter(month=month).values()
df = pd.DataFrame.from_records(records)
# use calc_value to set percentile on each row, using scipy's rankdata

However, I then need to retrieve each percentile value from the dataframe and set it back onto the model instances. Right now I do this by iterating over the dataframe's rows and updating each instance:

for i, row in df.iterrows():
    mv = MeasureValue.objects.get(org=row.org, month=month)
    if (row.percentile is None) or np.isnan(row.percentile):
        row.percentile = None
    mv.percentile = row.percentile
    mv.save()

This is unsurprisingly quite slow. Is there an efficient Django way to speed it up, by making a single database write rather than tens of thousands? I have checked the documentation, but can't see one.

Atomic transactions can reduce the time spent in the loop:

from django.db import transaction

with transaction.atomic():
    for i, row in df.iterrows():
        mv = MeasureValue.objects.get(org=row.org, month=month)

        if (row.percentile is None) or np.isnan(row.percentile): 
            # if it's already None, why set it to None?
            row.percentile = None

        mv.percentile = row.percentile
        mv.save()

Django's default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is active.

By using with transaction.atomic(), all the updates are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed update statements, so the time per statement is greatly reduced. Note that the loop still issues one SELECT and one UPDATE per row; only the per-statement commit overhead is saved.
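To make the autocommit versus single-transaction difference concrete outside Django, here is a minimal sketch using Python's standard-library sqlite3 module. The table name `measure` and the function names are made up for illustration. Both functions end with the same data; the second simply commits once instead of once per statement. The timing benefit mostly shows up on disk-backed databases, so this in-memory example only demonstrates the mechanics.

```python
import sqlite3


def make_db(n):
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # roughly analogous to Django's default behaviour.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE measure (id INTEGER PRIMARY KEY, percentile REAL)"
    )
    conn.executemany(
        "INSERT INTO measure (id) VALUES (?)", [(i,) for i in range(n)]
    )
    return conn


def update_autocommit(conn, values):
    # Each UPDATE is committed on its own.
    for pk, pct in values.items():
        conn.execute(
            "UPDATE measure SET percentile = ? WHERE id = ?", (pct, pk)
        )


def update_in_one_transaction(conn, values):
    # All UPDATEs share one transaction and one commit.
    conn.execute("BEGIN")
    try:
        for pk, pct in values.items():
            conn.execute(
                "UPDATE measure SET percentile = ? WHERE id = ?", (pct, pk)
            )
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")
```

A side effect of the second form is atomicity: if any update fails, none of them are applied.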

As of Django 2.2, you can use the bulk_update() queryset method to efficiently update the given fields on the provided model instances, generally with one query:

objs = [
    Entry.objects.create(headline='Entry 1'),
    Entry.objects.create(headline='Entry 2'),
]
objs[0].headline = 'This is entry 1'
objs[1].headline = 'This is entry 2'
Entry.objects.bulk_update(objs, ['headline'])
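Applied to the question's model, the same idea might look like the sketch below. The helper names `apply_percentiles` and `nan_to_none`, the `percentile_by_org_id` mapping, and the batch size of 1000 are all assumptions for illustration, not part of the original answer. The model class is passed in as an argument so that the snippet does not depend on any project imports. It assumes Django 2.2+ and that the mapping was built from the dataframe beforehand (for example from its `org_id` and `percentile` columns, since `.values()` exposes the foreign key as `org_id`).

```python
import math


def nan_to_none(value):
    """Return None for None or NaN, otherwise the value as a float."""
    if value is None or math.isnan(value):
        return None
    return float(value)


def apply_percentiles(model, month, percentile_by_org_id, batch_size=1000):
    """Set `percentile` on all rows of `model` for `month`.

    One SELECT fetches the instances; bulk_update() then writes them
    back in batches instead of calling save() once per row.
    """
    objs = list(model.objects.filter(month=month))
    for obj in objs:
        # Orgs missing from the mapping end up with a NULL percentile.
        obj.percentile = nan_to_none(percentile_by_org_id.get(obj.org_id))
    model.objects.bulk_update(objs, ["percentile"], batch_size=batch_size)
    return len(objs)
```

A call would then be something like `apply_percentiles(MeasureValue, month, mapping)`. Passing `batch_size` keeps each generated statement to a manageable size when there are around 10,000 rows.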

In older versions of Django you could use update() with Case / When, e.g.:

from django.db.models import Case, When

Entry.objects.filter(
    pk__in=headlines  # `headlines` is a pk -> headline mapping
).update(
    headline=Case(*[When(pk=entry_pk, then=headline)
                    for entry_pk, headline in headlines.items()]))
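For intuition, the Case / When update above boils down to a single UPDATE statement with a CASE expression. The sketch below reproduces that shape by hand with the standard-library sqlite3 module. The `entry` table and the `case_when_update` function are hypothetical, and the SQL is only an approximation of the general form, not the exact statement Django emits.

```python
import sqlite3


def case_when_update(conn, headlines):
    """Update many rows with one UPDATE ... CASE statement.

    `headlines` is a pk -> headline mapping, as in the answer above.
    Returns the number of rows the statement touched.
    """
    if not headlines:
        return 0
    # One "WHEN id = ? THEN ?" branch per row to update.
    whens = " ".join("WHEN id = ? THEN ?" for _ in headlines)
    placeholders = ", ".join("?" for _ in headlines)
    sql = (
        f"UPDATE entry SET headline = CASE {whens} END "
        f"WHERE id IN ({placeholders})"
    )
    # Parameters: (pk, headline) pairs for the CASE, then pks for IN.
    params = [p for pair in headlines.items() for p in pair]
    params += list(headlines)
    return conn.execute(sql, params).rowcount
```

The statement grows with the number of rows, so for tens of thousands of rows it is probably worth splitting the mapping into chunks; databases limit statement size or the number of bound parameters.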

Actually, when attempting @Eugene Yarmash's answer, I got this error:

FieldError: Joined field references are not permitted in this query

But I believe iterating update() is still quicker than multiple saves, and I expect that using a transaction should also speed things up.

So, for versions of Django that don't offer bulk_update, assuming the same data used in Eugene's answer, where headlines is a pk -> headline mapping:

from django.db import transaction

with transaction.atomic():
    for entry_pk, headline in headlines.items():
        Entry.objects.filter(pk=entry_pk).update(headline=headline)
