Aggregating/optimizing object.save()?

I'm working on an import feature that lets users create Django model instances (database records) from a selected CSV file.

The models are related to each other through foreign keys and many-to-many fields. My code makes a lot of object.save() and Object.objects.get(...) calls, which I suspect are what make it run so slowly.
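Each get() or save() is its own SQL statement (and round trip to the database server), so code that calls them once per line issues a number of queries proportional to the file length. The sketch below makes that concrete by counting statements. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL, with a hypothetical student table: looking up 500 rows one at a time executes 500 SELECTs, while a single IN (...) query fetches all of them. In Django the batched form would be along the lines of TMUser.objects.filter(uid__in=uids).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (uid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(1000)])
conn.commit()

# Record every SQL statement SQLite actually executes from here on.
statements = []
conn.set_trace_callback(statements.append)

wanted = list(range(0, 1000, 2))  # 500 uids to look up

# Per-row lookups: one SELECT per uid, like calling .get() inside the loop.
for uid in wanted:
    conn.execute("SELECT name FROM student WHERE uid = ?", (uid,)).fetchone()
per_row = len(statements)

# Batched lookup: a single SELECT ... WHERE uid IN (...) for all uids.
statements.clear()
placeholders = ", ".join("?" for _ in wanted)
rows = conn.execute(
    f"SELECT uid, name FROM student WHERE uid IN ({placeholders})", wanted
).fetchall()
batched = len(statements)

print(per_row, batched)  # 500 1
```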

When an error (for example, an IntegrityError) occurs, I need all changes to the database to be rolled back, so I'm using the transaction.atomic decorator on my view, and that works fine.
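transaction.atomic makes the whole import all-or-nothing: if an exception escapes the block, every write made inside it is rolled back. The same semantics can be shown without Django. This sketch uses the standard-library sqlite3 module (an assumption for illustration, not the question's PostgreSQL setup) and a made-up lecturer table whose UNIQUE constraint triggers the error.

```python
import sqlite3


def import_rows(conn, rows):
    """Insert every row in one transaction; roll back all of them on error."""
    # Used as a context manager, a sqlite3 connection commits when the block
    # succeeds and rolls back when an exception escapes it.
    with conn:
        for first_name, last_name in rows:
            conn.execute(
                "INSERT INTO lecturer (first_name, last_name) VALUES (?, ?)",
                (first_name, last_name),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lecturer (first_name TEXT, last_name TEXT,"
    " UNIQUE (first_name, last_name))"
)

try:
    # The second row duplicates the first and violates the UNIQUE constraint.
    import_rows(conn, [("Ada", "Lovelace"), ("Ada", "Lovelace")])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM lecturer").fetchone()[0]
print(count)  # 0: the first, valid row was rolled back as well
```

Running the import inside one transaction also avoids a commit per statement, so the atomic decorator itself is unlikely to be what makes the import slow.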

The problem is that my import is really slow: parsing a file of about 2000 lines (which could add roughly 1000 objects to the database) takes about 3 minutes, which is too long.

Is there a way to make it faster? I've read about the bulk_create function, but the documentation says "It does not work with many-to-many relationships."
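bulk_create cannot set many-to-many fields on the objects it creates, but the relation itself lives in an ordinary join table, and rows of that table can be bulk-inserted like any others (in Django the auto-created join model is reachable as TMUser.terms.through, so calling bulk_create on that model is one option). The sketch below shows the pattern at the SQL level, using the standard-library sqlite3 module as a stand-in; the student, term and student_terms tables are hypothetical simplifications of the question's models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, uid INTEGER UNIQUE);
    CREATE TABLE term (id INTEGER PRIMARY KEY);
    -- Join table standing in for the auto-created M2M "through" table.
    CREATE TABLE student_terms (
        student_id INTEGER REFERENCES student (id),
        term_id INTEGER REFERENCES term (id),
        UNIQUE (student_id, term_id)
    );
""")

# Bulk-insert the parent rows first, each table in a single executemany call.
conn.executemany("INSERT INTO student (id, uid) VALUES (?, ?)",
                 [(1, 1001), (2, 1002)])
conn.executemany("INSERT INTO term (id) VALUES (?)", [(10,), (11,)])

# Collect every (student, term) pair while parsing, then insert the links
# in one batch instead of calling add() once per line.
links = [(1, 10), (2, 10), (1, 11)]
conn.executemany(
    "INSERT INTO student_terms (student_id, term_id) VALUES (?, ?)", links
)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM student_terms").fetchone()[0])  # 3
```

The parent rows must exist (and have primary keys) before their links are inserted, which is why the join-table batch comes last.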

If it matters, I'm using PostgreSQL.

EDIT: The file structure looks like this:

subject_name
day [A/B] begins_at - ends_at;lecturer_info  

Then come multiple lines like:

student_uid;student_info  
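Separating parsing from the database writes is a first step toward batching: once the file is plain Python data, the number of queries no longer has to follow the number of lines. A minimal parser for this layout might look like the following. It applies the same rules as the question's code (no ';' marks a subject, a line matching [0-9]+;.+ is a student, anything else is a term, and names are split into first and last on the final space); the nested-dict output shape and the sample data are assumptions for illustration.

```python
import re

# Same rule as the question's code: a student line starts with digits and ';'.
STUDENT_LINE = re.compile(r"[0-9]+;.+")


def parse_schedule(lines):
    """Parse the import file into plain data: subjects -> terms -> students.

    Assumes the file starts with a subject line and that every student line
    follows a term line, as in the question's file layout.
    """
    subjects = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if ";" not in line:
            # Subject line: 'subject_name'
            subjects.append({"name": line, "terms": []})
        elif not STUDENT_LINE.match(line):
            # Term line: 'day [A/B] begins_at - ends_at;First Last'
            term_info, lecturer = line.split(";", 1)
            parts = term_info.replace(" - ", " ").split()
            first, last = lecturer.rsplit(" ", 1)
            subjects[-1]["terms"].append({
                "day": parts[0],
                "week": parts[1] if len(parts) == 4 else None,
                "begins_at": parts[-2],
                "ends_at": parts[-1],
                "lecturer": (first, last),
                "students": [],
            })
        else:
            # Student line: 'student_uid;First Last'
            uid, info = line.split(";", 1)
            first, last = info.rsplit(" ", 1)
            subjects[-1]["terms"][-1]["students"].append((int(uid), first, last))
    return subjects


# Made-up sample data in the described layout.
sample = [
    "Algebra",
    "monday A 10:00 - 11:30;John Smith",
    "1001;Jane Doe",
    "1002;Jan Kowalski",
]
result = parse_schedule(sample)
print(result[0]["terms"][0]["students"])
# [(1001, 'Jane', 'Doe'), (1002, 'Jan', 'Kowalski')]
```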

OK, here's the code:

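import re

from django.contrib.auth.hashers import make_password

from .models import Lecturer, Subject, Term, TMUser  # assumed location of the models


def csv_import(market, csv_file):
    # Decode the uploaded (bytes) lines and drop empty ones.
    lines = [line.strip().decode('utf-8') for line in csv_file.readlines()]
    lines = [line for line in lines if line]
    pattern = re.compile(r'[0-9]+;.+')  # student lines: 'student_uid;student_info'

    week_days = {
        'monday': 0,
        # ... (the remaining days are elided in the original)
    }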

    term, subject, lecturer, student = None, None, None, None

    for number, line in enumerate(lines):
        if ';' not in line:
            # Subject line: 'subject_name'
            subject = Subject(subject_id=number, name=line, market=market)
            subject.save()
        elif not pattern.match(line):
            # Term line: 'day [A/B] begins_at - ends_at;lecturer_info'
            term_info, lecturer_info = line.split(';')
            term_info = term_info.replace(' - ', ' ').split()
            term = Term(term_id=number, subject=subject, day=week_days[term_info[0]], begin_at=term_info[-2],
                        ends_at=term_info[-1])

            if len(term_info) == 4:
                term.week = term_info[1]

            lecturer_info = lecturer_info.rsplit(' ', 1)
            try:
                lecturer = Lecturer.objects.get(first_name=lecturer_info[0], last_name=lecturer_info[1])
            except Lecturer.DoesNotExist:
                lecturer = Lecturer(first_name=lecturer_info[0], last_name=lecturer_info[1])
                lecturer.save()

            term.lecturer = lecturer

            term.save()
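        else:
            # Student line: 'student_uid;student_info'
            gradebook_id, student_info = line.split(';')
            student_info = student_info.rsplit(' ', 1)
            try:
                student = TMUser.objects.get(uid=int(gradebook_id))
            except TMUser.DoesNotExist:
                # make_password() runs Django's deliberately slow password hasher on every call.
                student = TMUser(uid=int(gradebook_id), username='student'+gradebook_id, first_name=student_info[0],
                                 last_name=student_info[1], password=make_password('passwd'), user_group='user')
                student.save()
            student.terms.add(term)  # add() writes the many-to-many row itself
            student.save()  # redundant after add(): issues an extra UPDATE per student line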
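One way to cut the query count (a sketch, not the only approach) is to work in phases: parse the whole file, fetch the existing students with one query, create the missing ones with bulk_create, and then bulk-create the many-to-many rows through the join model. The pure-Python helper below works out what has to be created; the Django calls it would feed appear only in comments and rest on assumptions about the models (that TMUser.terms is a plain ManyToManyField to Term and that uid is unique). It may also be worth timing make_password separately: Django's default PBKDF2 hasher is deliberately slow, so hashing the same default password for every new student could account for a good share of the runtime, and hashing it once avoids that.

```python
def plan_student_import(term_students, existing_uids):
    """Work out which students to create and which M2M links to add.

    term_students: {term_number: [(uid, first_name, last_name), ...]},
        collected while parsing (hypothetical shape).
    existing_uids: uids already in the database, fetched with one query, e.g.
        set(TMUser.objects.filter(uid__in=all_uids).values_list("uid", flat=True))
    """
    to_create = {}  # uid -> (first_name, last_name), deduplicated
    links = []      # (uid, term_number) pairs for the join table
    for term_number, students in term_students.items():
        for uid, first, last in students:
            if uid not in existing_uids and uid not in to_create:
                to_create[uid] = (first, last)
            links.append((uid, term_number))
    return to_create, links

# Afterwards, roughly (Django; untested, assumes an auto-created through
# model and a unique uid field; 'terms' is a hypothetical dict mapping
# term number -> saved Term):
#   default_password = make_password("passwd")  # hash once, reuse for all
#   TMUser.objects.bulk_create([
#       TMUser(uid=uid, username=f"student{uid}", first_name=first,
#              last_name=last, password=default_password, user_group="user")
#       for uid, (first, last) in to_create.items()])
#   users = TMUser.objects.in_bulk(all_uids, field_name="uid")
#   Through = TMUser.terms.through
#   Through.objects.bulk_create([
#       Through(tmuser_id=users[uid].pk, term_id=terms[n].pk)
#       for uid, n in links])

to_create, links = plan_student_import(
    {0: [(1001, "Jane", "Doe"), (1002, "Jan", "Kowalski")],
     1: [(1001, "Jane", "Doe")]},
    existing_uids={1002},
)
print(to_create)  # {1001: ('Jane', 'Doe')}
print(links)      # [(1001, 0), (1002, 0), (1001, 1)]
```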

Here is some pseudocode showing the basic idea of what I meant by caching results:

cache = {}

for number, line in enumerate(lines):
    ...
    elif not pattern.match(line):
        ...
        term = Term(term_id=number, subject=subject, ...)

        lecturer_id = (lecturer_info[0], lecturer_info[1])  # (first name, last name)
        if lecturer_id in cache:
            # Retrieve from the cache; no query needed.
            lecturer = cache[lecturer_id]
        else:
            try:
                lecturer = Lecturer.objects.get(first_name=lecturer_id[0], last_name=lecturer_id[1])
            except Lecturer.DoesNotExist:
                lecturer = Lecturer(first_name=lecturer_id[0], last_name=lecturer_id[1])
                lecturer.save()
            # Add to the cache so later lines with this lecturer skip the query.
            cache[lecturer_id] = lecturer

        term.lecturer = lecturer
        term.save()

        # etc.
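The caching idea can also be packaged as a small reusable helper. In this sketch, load_lecturer is a hypothetical stand-in for the database lookup; in Django it could wrap Lecturer.objects.get_or_create(first_name=..., last_name=...), which performs the get-then-create step in one method call and returns an (object, created) tuple. Each distinct key is then loaded at most once per import.

```python
class GetOrCreateCache:
    """Memoize get-or-create lookups so each distinct key is loaded once."""

    def __init__(self, load):
        self._load = load  # callable(key) -> object, e.g. wrapping a DB lookup
        self._cache = {}

    def get(self, key):
        # Testing membership avoids the KeyError that cache[key] raises for unseen keys.
        if key not in self._cache:
            self._cache[key] = self._load(key)
        return self._cache[key]


calls = []


def load_lecturer(name):
    # Hypothetical stand-in for Lecturer.objects.get_or_create(...)[0].
    calls.append(name)
    return {"first_name": name[0], "last_name": name[1]}


lecturers = GetOrCreateCache(load_lecturer)
for name in [("John", "Smith"), ("Anna", "Nowak"), ("John", "Smith")]:
    lecturers.get(name)

print(len(calls))  # 2: the repeated lecturer came from the cache
```

The cache only lives for one import run, so it cannot go stale across requests; within a single atomic import, rows created earlier in the run are exactly the ones it holds.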
