
How to do a bulk insert or increment type operation with the Django ORM

I have a model defined as follows:

class VectorSet(models.Model):
    word = models.CharField(max_length=255)
    weight = models.IntegerField()
    session = models.ForeignKey(ResearchSession)

I want to write a function that takes a list of words and a ResearchSession and, for each word in the list, creates a new row with a weight of 1 if the word doesn't already exist; otherwise it should take the existing row and increment its weight by 1.

So far I've gotten this:

def train(words, session):
    for i in words:
        result, created = VectorSet.objects.get_or_create(word=i, session=session,
                                                          defaults={'weight' : 1})
        if not created:
            result.weight = F('weight') + 1
            result.save()

I'm fairly confident that there is a way to do this with one query; however, I can't quite figure out what that might be, or whether it's possible with Django code rather than raw SQL.

I think there is currently no out-of-the-box solution for bulk inserts other than bulk_create. Another solution, depending on your database, is to perform get_or_create within a transaction by using atomic. For example:

from django.db import transaction
from django.db.models import F

@transaction.atomic
def train(words, session):
    for i in words:
        result, created = VectorSet.objects.get_or_create(word=i, session=session,
                                                          defaults={'weight': 1})
        if not created:
            # F() makes the database do the increment, avoiding a read-modify-write race
            result.weight = F('weight') + 1
            result.save()

Otherwise, you might be able to use the DB API executemany:

cursor.executemany('INSERT INTO vectorset (field1, field2, field3) VALUES (?, ?, ?)', data)
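
To make the executemany idea concrete, here is a rough sketch of the insert-or-increment logic using Python's built-in sqlite3 module rather than Django's cursor. The table name, the column names, and the unique constraint on (word, session_id) are assumptions made for illustration; they are not taken from the question's actual schema. The `?` placeholder is sqlite3's style; Django's own cursor wrapper expects `%s` instead.

```python
import sqlite3

def train(conn, words, session_id):
    """Insert each word with weight 1, or add 1 to its weight if it already exists."""
    rows = [(w, session_id) for w in words]
    with conn:  # commits on success, rolls back on error
        # Create missing rows with weight 0; existing rows are left untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO vectorset (word, session_id, weight) "
            "VALUES (?, ?, 0)", rows)
        # Every word in the list now has a row, so bump each one by 1.
        conn.executemany(
            "UPDATE vectorset SET weight = weight + 1 "
            "WHERE word = ? AND session_id = ?", rows)

# Hypothetical schema; the unique constraint is what makes INSERT OR IGNORE work.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vectorset ("
    "word TEXT NOT NULL, session_id INTEGER NOT NULL, "
    "weight INTEGER NOT NULL, UNIQUE (word, session_id))")

train(conn, ["a", "b"], 1)
train(conn, ["b", "c"], 1)
weights = dict(conn.execute(
    "SELECT word, weight FROM vectorset WHERE session_id = 1"))
```

This still issues one statement per word under the hood, but it avoids the per-word SELECT that get_or_create performs. Other databases have their own upsert syntax, so the SQL would need adapting.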

The logic is simple, but we need to hit the DB several times, which means several queries:

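qs = VectorSet.objects.filter(word__in=words, session=session)
# Fetch the existing words once, so the membership test below doesn't re-query per word
existing = set(qs.values_list('word', flat=True))
qs.update(weight=models.F('weight') + 1)
VectorSet.objects.bulk_create(
    [VectorSet(session=session, word=w, weight=1)
     for w in words if w not in existing])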
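
One caveat with this update-then-create approach: if words contains repeats, the filter/update pair increments each existing row only once, and the create step would build one object per repeat. A possible workaround is to count the words first and split the counts into "rows to increment" and "rows to create". The sketch below is plain Python; plan_weights and existing_words are hypothetical names for illustration, not part of Django.

```python
from collections import Counter

def plan_weights(words, existing_words):
    """Split word counts into increments for existing rows and weights for new rows."""
    counts = Counter(words)
    # Words already stored: add their occurrence count to the current weight.
    increments = {w: n for w, n in counts.items() if w in existing_words}
    # Words not stored yet: create them with their occurrence count as the weight.
    new_rows = {w: n for w, n in counts.items() if w not in existing_words}
    return increments, new_rows

increments, new_rows = plan_weights(["a", "b", "a", "c", "c", "c"], {"a"})
```

Here existing_words would be the set of words already stored for the session. The new_rows mapping could then feed a single bulk create, while the increments would still need one update per distinct count.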

There is also an update_or_create in Django 1.7, but currently it does not distinguish defaults for update from defaults for create:

for w in words:
    VectorSet.objects.update_or_create(word=w, session=session,
                                    defaults={'weight': models.F('weight')+1})

Thus the above code will fail when creating, at int(models.F('weight')+1). (We could override the __int__ method, but that is too hacky to make sense, IMO.)
