Aggregating/optimizing object.save()?
I'm working on an import feature which allows the user to create Django database models from a selected CSV file.
The models are related to each other with foreign keys and many-to-many fields. There are a lot of object.save() and Object.objects.get(...) calls in my code which, I suppose, cause it to run so slowly.
When an error (for example an integrity error) occurs, I need all the changes in the database to be rolled back. So I'm using the transaction.atomic decorator on my view, and it works fine.
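For anyone unfamiliar with what an atomic block guarantees here, the commit-on-success / roll-back-on-error semantics can be sketched outside Django with the stdlib sqlite3 module (the lecturer table and names below are invented purely for illustration; sqlite3's connection context manager behaves like transaction.atomic in this respect):

```python
import sqlite3

# In-memory database standing in for PostgreSQL; the schema is made up
# just to demonstrate atomic-block semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lecturer (first TEXT, last TEXT, UNIQUE(first, last))")

try:
    with conn:  # like transaction.atomic: commit on success, roll back on error
        conn.execute("INSERT INTO lecturer VALUES ('Ada', 'Lovelace')")
        conn.execute("INSERT INTO lecturer VALUES ('Ada', 'Lovelace')")  # violates UNIQUE
except sqlite3.IntegrityError:
    pass

# The first insert was rolled back together with the failing one.
count = conn.execute("SELECT COUNT(*) FROM lecturer").fetchone()[0]
print(count)  # 0
```

The rollback itself is cheap; it is the per-row queries inside the block that dominate the runtime.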
The problem is, my import is really slow. Parsing a file containing ~2000 lines (which could add about 1000 objects to my database) takes about 3 minutes, which is too long.
Is there a way to make it faster? I've read about the bulk_create function, but "It does not work with many-to-many relationships."
In case it's important: I'm using PostgreSQL.
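The reason bulk_create helps is that it collapses N INSERT statements into far fewer round trips. The same idea can be sketched with stdlib sqlite3 (table and row contents invented; executemany plays the role of bulk_create here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (id INTEGER, name TEXT)")

rows = [(i, f"subject {i}") for i in range(1000)]

# Per-row inserts: one statement per object, like calling save() in a loop.
for row in rows:
    conn.execute("INSERT INTO subject VALUES (?, ?)", row)

# Batched insert: one call for all rows, the idea behind bulk_create.
conn.execute("DELETE FROM subject")
conn.executemany("INSERT INTO subject VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
print(count)  # 1000
```

As for the documented many-to-many limitation: it applies to passing m2m values to bulk_create directly, but the relation rows themselves can often be bulk-inserted through the relation's through model (e.g. TMUser.terms.through.objects.bulk_create(...)), once the students and terms have primary keys.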
EDIT: The file structure looks like this:
subject_name
day [A/B] begins_at - ends_at;lecturer_info
Then multiple lines like:
student_uid;student_info
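The three line shapes can be told apart with the same checks the import code uses; here is that dispatch in isolation, with the sample lines invented for illustration:

```python
import re

# Same pattern as in csv_import: a student line starts with a numeric uid.
pattern = re.compile(r'[0-9]+;.+')

def classify(line):
    """Mirror the branching in csv_import: no ';' means a subject header,
    a non-numeric prefix means a term/lecturer line, otherwise a student line."""
    if ';' not in line:
        return 'subject'
    if not pattern.match(line):
        return 'term'
    return 'student'

print(classify('Linear Algebra'))                     # subject
print(classify('monday A 10:00 - 11:30;John Smith'))  # term
print(classify('123456;Jane Doe'))                    # student
```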
OK, here's the code.
def csv_import(market, csv_file):
    lines = [line.strip().decode('utf-8') for line in csv_file.readlines()]
    lines = [line for line in lines if line]
    pattern = re.compile(r'[0-9]+;.+')
    week_days = {
        'monday': 0,
        # ... remaining days omitted
    }
    term, subject, lecturer, student = None, None, None, None
    for number, line in enumerate(lines):
        if ';' not in line:
            # subject header line
            subject = Subject(subject_id=number, name=line, market=market)
            subject.save()
        elif not pattern.match(line):
            # term line: 'day [A/B] begins_at - ends_at;lecturer_info'
            term_info, lecturer_info = line.split(';')
            term_info = term_info.replace(' - ', ' ').split()
            term = Term(term_id=number, subject=subject, day=week_days[term_info[0]],
                        begin_at=term_info[-2], ends_at=term_info[-1])
            if len(term_info) == 4:
                term.week = term_info[1]
            lecturer_info = lecturer_info.rsplit(' ', 1)
            try:
                lecturer = Lecturer.objects.get(first_name=lecturer_info[0], last_name=lecturer_info[1])
            except Lecturer.DoesNotExist:
                lecturer = Lecturer(first_name=lecturer_info[0], last_name=lecturer_info[1])
                lecturer.save()
            term.lecturer = lecturer
            term.save()
        else:
            # student line: 'student_uid;student_info'
            gradebook_id, student_info = line.split(';')
            student_info = student_info.rsplit(' ', 1)
            try:
                student = TMUser.objects.get(uid=int(gradebook_id))
            except TMUser.DoesNotExist:
                student = TMUser(uid=int(gradebook_id), username='student' + gradebook_id,
                                 first_name=student_info[0], last_name=student_info[1],
                                 password=make_password('passwd'), user_group='user')
                student.save()
            student.terms.add(term)
            student.save()
This is some pseudo-code to show you the basic idea of what I meant by caching results:
cache = {}
for number, line in enumerate(lines):
    ...
    elif not pattern.match(line):
        ...
        term = Term(term_id=number, subject=subject, ...)
        lecturer_id = (lecturer_info[0], lecturer_info[1])  # first name and last name
        if lecturer_id in cache:
            # retrieve from cache
            lecturer = cache[lecturer_id]
        else:
            try:
                lecturer = Lecturer.objects.get(first_name=lecturer_id[0], last_name=lecturer_id[1])
            except Lecturer.DoesNotExist:
                lecturer = Lecturer(first_name=lecturer_id[0], last_name=lecturer_id[1])
                lecturer.save()
            # add to cache
            cache[lecturer_id] = lecturer
        term.lecturer = lecturer
        term.save()
        # etc.
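The point of the cache is that the database is hit only once per distinct lecturer, however many times that lecturer appears in the file. A stdlib-only sketch, with a plain dict standing in for the Lecturer table and an invented counter to make the saved round trips visible:

```python
# Stand-in for Lecturer.objects.get/save; db_hits counts how often the
# "database" is actually touched, so the effect of the cache is measurable.
db = {}
db_hits = 0

def get_or_create_lecturer(first, last):
    global db_hits
    db_hits += 1  # every call here is one database round trip
    key = (first, last)
    if key not in db:
        db[key] = {'first_name': first, 'last_name': last}
    return db[key]

cache = {}

def cached_lecturer(first, last):
    key = (first, last)
    if key not in cache:  # membership test, not cache[key], to avoid KeyError
        cache[key] = get_or_create_lecturer(first, last)
    return cache[key]

# 6 rows in the file but only 2 distinct lecturers -> only 2 round trips.
for first, last in [('Ada', 'Lovelace'), ('Alan', 'Turing')] * 3:
    cached_lecturer(first, last)

print(db_hits)  # 2
```

In real Django code the try/except pair can also be collapsed with Lecturer.objects.get_or_create(...), and the same dict-cache pattern applies to the TMUser lookups.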