

What's the most efficient way to insert thousands of records into a table (MySQL, Python, Django)

I have a database table with a unique string field and a couple of integer fields. The string field is usually 10-100 characters long.

Once every minute or so I have the following scenario: I receive a list of 2-10 thousand tuples corresponding to the table's record structure, e.g.

[("hello", 3, 4), ("cat", 5, 3), ...]

I need to insert all these tuples into the table (assume I have verified that none of these strings appear in the database). For clarification, I'm using InnoDB, and I have an auto-increment primary key for this table; the string is not the PK.

My code currently iterates through this list; for each tuple it creates a Python model object with the appropriate values and calls ".save()", something like this:

from django.db import transaction

@transaction.commit_on_success
def save_data_elements(input_list):
    # One model instance and one INSERT (".save()") per tuple.
    for (s, i1, i2) in input_list:
        entry = DataElement(string=s, number1=i1, number2=i2)
        entry.save()

This code is currently one of the performance bottlenecks in my system, so I'm looking for ways to optimize it.

For example, I could generate SQL statements that each contain an INSERT command for 100 tuples ("hard-coded" into the SQL) and execute them, but I don't know if that would improve anything.

Do you have any suggestions for optimizing such a process?

Thanks

For MySQL specifically, the fastest way to load data is LOAD DATA INFILE, so if you can convert your data into the format it expects, that is probably the fastest way to get it into the table.

You can write the rows to a file in the format "field1","field2",... and then use LOAD DATA to load them:

# Write the rows out as quoted, comma-separated values, one row per line.
# Note: this simple formatting does not escape quotes or commas inside fields.
lines = '\n'.join(','.join('"%s"' % field for field in row) for row in data)
with open('data.txt', 'w') as f:
    f.write(lines)

Then execute this:

LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
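A minimal sketch of issuing that statement from Django, assuming the MySQL server can read data.txt from its own filesystem (otherwise use LOAD DATA LOCAL INFILE, if the server and driver allow it); the table and column names here are illustrative, matching the model in the question:

from django.db import connection

def load_from_file():
    cursor = connection.cursor()
    # The file is read by the database server, not the application host.
    cursor.execute(
        "LOAD DATA INFILE 'data.txt' INTO TABLE my_table "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "(string, number1, number2)")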

Reference

If you don't use LOAD DATA INFILE, as some of the other suggestions mention, two things you can do to speed up your inserts are:

  1. Use prepared statements - this cuts out the overhead of parsing the SQL for every insert
  2. Do all of your inserts in a single transaction - this would require using a DB engine that supports transactions (like InnoDB); a sketch combining both points follows this list
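A minimal sketch of both ideas through Django's raw cursor; the table name myapp_dataelement is an assumption (Django's default table name for a DataElement model in an app called myapp), and executemany lets the driver batch the parameterized statement instead of parsing one INSERT per tuple:

from django.db import connection, transaction

@transaction.commit_on_success
def save_data_elements(input_list):
    cursor = connection.cursor()
    # One parameterized INSERT executed for the whole batch, all inside a
    # single transaction.
    cursor.executemany(
        "INSERT INTO myapp_dataelement (string, number1, number2) "
        "VALUES (%s, %s, %s)", input_list)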

If you can do a hand-rolled INSERT statement, then that's the way I'd go. A single INSERT statement with multiple VALUES clauses is much, much faster than lots of individual INSERT statements.
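A minimal sketch of building such a multi-row INSERT in batches, again assuming the hypothetical table name myapp_dataelement; the values still go through parameters so they are escaped properly rather than literally hard-coded:

from django.db import connection

def bulk_insert(rows, batch_size=100):
    cursor = connection.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One "(%s, %s, %s)" group per row, all in a single INSERT.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(batch))
        params = [value for row in batch for value in row]
        cursor.execute(
            "INSERT INTO myapp_dataelement (string, number1, number2) "
            "VALUES " + placeholders, params)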

Regardless of the insert method, you will want to use the InnoDB engine for maximum read/write concurrency. MyISAM will lock the entire table for the duration of the insert, whereas InnoDB (under most circumstances) will only lock the affected rows, allowing SELECT statements to proceed.

What format do you receive the data in? If it is a file, you can do some sort of bulk load: http://www.classes.cs.uchicago.edu/archive/2005/fall/23500-1/mysql-load.html

This is unrelated to the actual loading of the data into the DB, but...

If providing a "The data is loading... The load will be done shortly" type of message to the user is an option, then you can run the INSERTs or LOAD DATA asynchronously in a different thread.
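A minimal sketch of that idea, assuming save_data_elements is the loading function from the question (already sped up by one of the methods above); error handling is omitted, and with CPython the gain is responsiveness for the user rather than parallel CPU work:

import threading

def save_data_elements_async(input_list):
    # Start the insert in the background and return immediately, so the
    # caller can show a "data is loading..." message to the user.
    worker = threading.Thread(target=save_data_elements, args=(input_list,))
    worker.start()
    return worker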

Just something else to consider.

I don't know the exact details, but you can use a JSON-style data representation and use it as fixtures or something. I saw something similar in the Django Video Workshop by Douglas Napoleone. See the videos at http://www.linux-magazine.com/online/news/django_video_workshop and http://www.linux-magazine.com/online/features/django_reloaded_workshop_part_1 . Hope this one helps.

Hope you can work it out. I just started learning django, so I can just point you to resources.
