How to do bulk insert-or-increment operations with the Django ORM

Question

I have a model defined as follows:

class VectorSet(models.Model):
    word = models.CharField(max_length=255)
    weight = models.IntegerField()
    session = models.ForeignKey(ResearchSession)

I want to write a function that takes a list of words and a ResearchSession and, for each word in the list, creates a new row with a weight of 1 if the word doesn't already exist for that session, or otherwise increments that row's weight by 1.

So far I've gotten this:

def train(words, session):
    for i in words:
        result, created = VectorSet.objects.get_or_create(word=i, session=session,
                                                          defaults={'weight' : 1})
        if not created:
            result.weight = F('weight') + 1
            result.save()

I'm fairly confident that there is a way to do this with one query; however, I can't quite figure out what that might be, or whether it's possible with Django code rather than raw SQL.

Answer 1

I think there is currently no out-of-the-box solution for bulk inserts other than bulk_create. Another solution, depending on your database, is to perform get_or_create within a transaction by using atomic. For example:

from django.db import transaction
from django.db.models import F

@transaction.atomic
def train(words, session):
    for i in words:
        result, created = VectorSet.objects.get_or_create(word=i, session=session,
                                                          defaults={'weight': 1})
        if not created:
            # F() makes the database do the increment, avoiding a read-modify-write race
            result.weight = F('weight') + 1
            result.save()

Otherwise, you might be able to use the DB API executemany (the placeholder syntax depends on the database driver; Django's own cursor expects %s rather than ?):

cursor.executemany('INSERT INTO vectorset (field1, field2, field3) VALUES (?, ?, ?)', data)
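As a rough illustration of the executemany route, here is a minimal sketch that uses Python's standard-library sqlite3 module rather than Django. The table layout, the column names, and the UNIQUE constraint on (word, session_id) are assumptions made for this example, not something taken from the question's actual schema. It issues two batched statements: one that inserts any missing rows with a weight of 0, and one that adds 1 to every listed word.

```python
import sqlite3


def train(conn, words, session_id):
    """Insert missing (word, session) rows, then add 1 to the weight of every listed word."""
    rows = [(w, session_id) for w in words]
    with conn:  # commit on success, roll back on error
        # New rows start at weight 0 so the UPDATE below brings them to 1.
        # Rows that already exist are skipped thanks to the UNIQUE constraint.
        conn.executemany(
            "INSERT OR IGNORE INTO vectorset (word, session_id, weight) VALUES (?, ?, 0)",
            rows,
        )
        conn.executemany(
            "UPDATE vectorset SET weight = weight + 1 WHERE word = ? AND session_id = ?",
            rows,
        )


# Hypothetical in-memory table standing in for the VectorSet model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vectorset ("
    " word TEXT NOT NULL, session_id INTEGER NOT NULL, weight INTEGER NOT NULL,"
    " UNIQUE (word, session_id))"
)
train(conn, ["cat", "dog"], 1)
train(conn, ["cat", "bird"], 1)
weights = dict(conn.execute("SELECT word, weight FROM vectorset WHERE session_id = 1"))
```

This relies on the unique constraint to make INSERT OR IGNORE skip existing rows; without it, every call would insert duplicates. Other databases spell the "ignore on conflict" clause differently, so the SQL would need adapting.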

Answer 2

The logic is simple, but we need to hit the DB several times, which means several queries:

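qs = VectorSet.objects.filter(word__in=words, session=session)
# Fetch the existing words once, before updating, so the check below does not query per word
existing = set(qs.values_list('word', flat=True))
qs.update(weight=models.F('weight') + 1)
VectorSet.objects.bulk_create([VectorSet(session=session, word=w, weight=1)
                               for w in words if w not in existing])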
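One thing the snippet above does not address is a word that appears more than once in the list: the loop in the question would count it each time, while a single update adds 1 only once. A minimal sketch of how the counts could be split up front, in plain Python, is below. The helper name plan_changes and its return shape are hypothetical, introduced only for illustration.

```python
from collections import Counter


def plan_changes(words, existing):
    """Split word counts into increments for existing rows and starting weights for new rows."""
    counts = Counter(words)
    # Words already stored get their weight raised by the number of occurrences.
    increments = {w: n for w, n in counts.items() if w in existing}
    # Words not yet stored start at their number of occurrences.
    new_weights = {w: n for w, n in counts.items() if w not in existing}
    return increments, new_weights


increments, new_weights = plan_changes(["cat", "dog", "cat", "bird"], existing={"cat"})
```

The new_weights mapping could feed bulk_create directly, while increments would still need one update per distinct increment value.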

Django 1.7 also has update_or_create, but currently it does not distinguish the defaults for update from the defaults for create:

for w in words:
    VectorSet.objects.update_or_create(word=w, session=session,
                                    defaults={'weight': models.F('weight')+1})

Thus the above code will fail on the create path, at int(models.F('weight')+1), because the F expression cannot be converted to an integer for a row that does not exist yet. (We could override the __int__ method, but that is too hacky to make sense, IMO.)


Answer 1
0 2014-11-23 08:38:58

Answer 2
0 2014-11-23 09:48:50
