
Django - very slow query

I've written a module that parses an XML file and updates or creates data in a Django database (PostgreSQL).

Once the data import/update is done, I update some metadata on my objects.

I use django-mptt for tree structures, and my metadata updater builds those structures between my objects.

It's really, really slow: it takes about 1 second to populate parent with data from another foreign key.

How do I optimise this?

for index, place in enumerate(Place.objects.filter(type=Place.TOWN, town_id_equal=True)):
    place.parent = place.second_order_division
    place.save()

    print index
    if index % 5000 == 0:
        transaction.commit()
transaction.commit()

transaction.set_autocommit(False)
for index, place in enumerate(Place.objects.filter(type=Place.TOWN, town_id_equal=False,
                                                   parent__isnull=True)):

    place.parent = Place.objects.get(town_id=place.town_id_extra)
    place.save()

    print index
    if index % 5000 == 0:
        transaction.commit()
transaction.commit()


class Place(MPTTModel):
    first_order_division = models.ForeignKey("self", null=True, blank=True, verbose_name=u"Województwo",
                                             related_name="voivodeships")
    second_order_division = models.ForeignKey("self", null=True, blank=True, verbose_name=u"Powiat",
                                              related_name="counties")
    parent = TreeForeignKey('self', null=True, blank=True, related_name='children')

Edit:

I updated the first function like this:

transaction.set_autocommit(False)
for index, obj in enumerate(Place.objects.filter(type=Place.COUNTY)):
    data = Place.objects.filter(second_order_division=obj, type=Place.TOWN, town_id_equal=True)
    data.update(parent=obj)
    print index
    transaction.commit()

Instead of using a loop, you should do bulk updates.

For the first transaction, you can replace the loop with this single Django query:

from django.db.models import F
Place.objects.filter(type=Place.TOWN, town_id_equal=True).update(parent=F('second_order_division'))
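As a rough illustration of why this helps: an update() with F('second_order_division') lets the database copy one column into another in a single statement, instead of issuing one save() per row. The sketch below reproduces that idea with the standard-library sqlite3 module rather than Django. The place table, its column names and the sample rows are assumptions made up for the demonstration, not the real schema.

```python
import sqlite3

# Hypothetical stand-in for the Place table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE place (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL,
        town_id_equal INTEGER NOT NULL,
        second_order_division_id INTEGER,
        parent_id INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO place (id, type, town_id_equal, second_order_division_id) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "county", 0, None),
        (2, "town", 1, 1),
        (3, "town", 1, 1),
        (4, "town", 0, 1),  # town_id_equal is false, so this row stays untouched
    ],
)

# One statement updates every matching row; this is roughly what
# .update(parent=F('second_order_division')) asks the database to do.
cursor = conn.execute(
    "UPDATE place SET parent_id = second_order_division_id "
    "WHERE type = ? AND town_id_equal = 1",
    ("town",),
)
conn.commit()

updated_rows = cursor.rowcount  # number of rows changed by the UPDATE
parents = dict(conn.execute("SELECT id, parent_id FROM place"))
```

One caveat for the django-mptt case: QuerySet.update() bypasses save(), so the tree fields that django-mptt maintains are not updated along with parent. As far as I know, a call such as Place.objects.rebuild() after the bulk update is the usual way to bring them back in sync, but check this against the django-mptt version you use.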

For the second transaction a bulk update cannot be applied directly, because each row needs another query on the Place model. Instead, you should find a way to avoid hitting the 'Place.objects.get(town_id=place.town_id_extra)' query on every iteration of the loop.

Alternatively, this blog may help.
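One possible way to avoid the per-row get() in the second loop is to load the town_id to primary key mapping once and resolve parents in memory. The sketch below shows the idea with the standard-library sqlite3 module instead of Django. The place table, its columns, the assign_parents helper and the sample rows are all hypothetical, and it assumes town_id is unique, which the original get() call implies.

```python
import sqlite3


def assign_parents(conn):
    """Set parent_id for unparented towns using one lookup query, not one per row."""
    # Single query: map every town_id to the primary key of its row.
    # Assumes town_id is unique, as the original .get(town_id=...) implies.
    pk_by_town_id = dict(
        conn.execute("SELECT town_id, id FROM place WHERE town_id IS NOT NULL")
    )
    pending = conn.execute(
        "SELECT id, town_id_extra FROM place "
        "WHERE type = 'town' AND town_id_equal = 0 AND parent_id IS NULL"
    ).fetchall()
    # Resolve parents in memory; rows without a matching town_id are skipped.
    updates = [
        (pk_by_town_id[extra], pk)
        for pk, extra in pending
        if extra in pk_by_town_id
    ]
    conn.executemany("UPDATE place SET parent_id = ? WHERE id = ?", updates)
    conn.commit()
    return len(updates)


# Made-up sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place (id INTEGER PRIMARY KEY, type TEXT, town_id INTEGER, "
    "town_id_extra INTEGER, town_id_equal INTEGER, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO place (id, type, town_id, town_id_extra, town_id_equal) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "town", 100, None, 1),
        (2, "town", 101, 100, 0),
        (3, "town", 102, 100, 0),
        (4, "town", 103, 999, 0),  # no place has town_id 999
    ],
)
resolved = assign_parents(conn)
```

In ORM terms the mapping could probably be built with something like dict(Place.objects.values_list('town_id', 'pk')) and the result assigned to place.parent_id. Note that this sketch silently skips rows whose town_id_extra has no match, whereas the original get() would raise an exception.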

To answer a more general question: one tactic for improving the performance of almost any type of system is:

Minimize interaction between the dynamic parts of your system

That's it: minimize interaction through HTTP requests, database queries, etc. In your case, you are running many queries against your database that can easily be reduced to far fewer (perhaps one or two).
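To make the difference in round trips concrete, here is a small sketch that counts the statements sent to the database for a per-row loop versus one set-based statement. It uses the standard-library sqlite3 module, not Django. CountingConnection, make_db, per_row, set_based and the table layout are all hypothetical names invented for this example. The correlated subquery in set_based is my own substitution to show the idea at the SQL level; whether your ORM version can express it is something to check.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = 0

    def execute(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params)


def make_db(n_towns):
    """Create one parent row (town_id 100) and n_towns rows pointing at it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE place (id INTEGER PRIMARY KEY, town_id INTEGER UNIQUE, "
        "town_id_extra INTEGER, parent_id INTEGER)"
    )
    conn.execute("INSERT INTO place (id, town_id) VALUES (1, 100)")
    conn.executemany(
        "INSERT INTO place (id, town_id, town_id_extra) VALUES (?, ?, 100)",
        [(i, 100 + i) for i in range(2, n_towns + 2)],
    )
    return CountingConnection(conn)  # setup statements are not counted


def per_row(db):
    """One SELECT plus one UPDATE per row, like the loop in the question."""
    rows = db.execute(
        "SELECT id, town_id_extra FROM place WHERE town_id_extra IS NOT NULL"
    ).fetchall()
    for pk, extra in rows:
        (parent_pk,) = db.execute(
            "SELECT id FROM place WHERE town_id = ?", (extra,)
        ).fetchone()
        db.execute("UPDATE place SET parent_id = ? WHERE id = ?", (parent_pk, pk))


def set_based(db):
    """A single UPDATE that resolves every parent with a correlated subquery."""
    db.execute(
        "UPDATE place SET parent_id = "
        "(SELECT p.id FROM place AS p WHERE p.town_id = place.town_id_extra) "
        "WHERE town_id_extra IS NOT NULL"
    )


loop_db = make_db(50)
per_row(loop_db)
bulk_db = make_db(50)
set_based(bulk_db)

query = "SELECT id, parent_id FROM place ORDER BY id"
same_result = (
    loop_db.conn.execute(query).fetchall() == bulk_db.conn.execute(query).fetchall()
)
print(loop_db.statements, bulk_db.statements, same_result)
```

Both variants should leave the table in the same state; the loop just sends two statements per row on top of the initial SELECT, and that per-row overhead is what adds up over thousands of rows.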
