

What's the most efficient way to insert thousands of records into a table (MySQL, Python, Django)

I have a database table with a unique string field and a couple of integer fields. The string field is usually 10-100 characters long.

Once every minute or so I have the following scenario: I receive a list of 2-10 thousand tuples corresponding to the table's record structure, e.g.

[("hello", 3, 4), ("cat", 5, 3), ...]

I need to insert all these tuples into the table (assume I have verified that none of these strings appear in the database). For clarification, I'm using InnoDB, and I have an auto-increment primary key for this table; the string is not the PK.

My code currently iterates through this list; for each tuple it creates a Python model object with the appropriate values and calls ".save()", something like this:

from django.db import transaction

@transaction.commit_on_success
def save_data_elements(input_list):
    # One model instance and one INSERT (".save()") per tuple.
    for (s, i1, i2) in input_list:
        entry = DataElement(string=s, number1=i1, number2=i2)
        entry.save()

This code is currently one of the performance bottlenecks in my system, so I'm looking for ways to optimize it.

For example, I could generate SQL statements that each contain an INSERT command for 100 tuples ("hard-coded" into the SQL) and execute them, but I don't know if that would improve anything.

Do you have any suggestions for optimizing such a process?

Thanks

For MySQL specifically, the fastest way to load data is LOAD DATA INFILE, so if you can convert your data into the format it expects, that is probably the fastest way to get it into the table.

You can write the rows to a file in the format "field1","field2",... and then use LOAD DATA to load them:

# Write the rows out as quoted, comma-separated values, one row per line.
# Note: this simple formatting does not escape quotes or commas inside fields.
lines = '\n'.join(','.join('"%s"' % field for field in row) for row in data)
with open('data.txt', 'w') as f:
    f.write(lines)

Then execute this:

LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;
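A minimal sketch of issuing that statement from Django, assuming the MySQL server can read data.txt from its own filesystem (otherwise use LOAD DATA LOCAL INFILE, if the server and driver allow it); the table and column names here are illustrative, matching the model in the question:

from django.db import connection

def load_from_file():
    cursor = connection.cursor()
    # The file is read by the database server, not the application host.
    cursor.execute(
        "LOAD DATA INFILE 'data.txt' INTO TABLE my_table "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "(string, number1, number2)")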

Reference

If you don't use LOAD DATA INFILE, as some of the other suggestions mention, two things you can do to speed up your inserts are:

  1. Use prepared statements - this cuts out the overhead of parsing the SQL for every insert
  2. Do all of your inserts in a single transaction - this would require using a DB engine that supports transactions (like InnoDB); a sketch combining both points follows this list
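A minimal sketch of both ideas through Django's raw cursor; the table name myapp_dataelement is an assumption (Django's default table name for a DataElement model in an app called myapp), and executemany lets the driver batch the parameterized statement instead of parsing one INSERT per tuple:

from django.db import connection, transaction

@transaction.commit_on_success
def save_data_elements(input_list):
    cursor = connection.cursor()
    # One parameterized INSERT executed for the whole batch, all inside a
    # single transaction.
    cursor.executemany(
        "INSERT INTO myapp_dataelement (string, number1, number2) "
        "VALUES (%s, %s, %s)", input_list)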

If you can do a hand-rolled INSERT statement, then that's the way I'd go. A single INSERT statement with multiple VALUES clauses is much, much faster than lots of individual INSERT statements.
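A minimal sketch of building such a multi-row INSERT in batches, again assuming the hypothetical table name myapp_dataelement; the values still go through parameters so they are escaped properly rather than literally hard-coded:

from django.db import connection

def bulk_insert(rows, batch_size=100):
    cursor = connection.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One "(%s, %s, %s)" group per row, all in a single INSERT.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(batch))
        params = [value for row in batch for value in row]
        cursor.execute(
            "INSERT INTO myapp_dataelement (string, number1, number2) "
            "VALUES " + placeholders, params)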

Regardless of the insert method, you will want to use the InnoDB engine for maximum read/write concurrency. MyISAM will lock the entire table for the duration of the insert, whereas InnoDB (under most circumstances) will only lock the affected rows, allowing SELECT statements to proceed.

What format do you receive the data in? If it is a file, you can do some sort of bulk load: http://www.classes.cs.uchicago.edu/archive/2005/fall/23500-1/mysql-load.html

This is unrelated to the actual loading of the data into the DB, but...

If providing a "The data is loading... The load will be done shortly" type of message to the user is an option, then you can run the INSERTs or LOAD DATA asynchronously in a different thread.
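A minimal sketch of that idea, assuming save_data_elements is the loading function from the question (already sped up by one of the methods above); error handling is omitted, and with CPython the gain is responsiveness for the user rather than parallel CPU work:

import threading

def save_data_elements_async(input_list):
    # Start the insert in the background and return immediately, so the
    # caller can show a "data is loading..." message to the user.
    worker = threading.Thread(target=save_data_elements, args=(input_list,))
    worker.start()
    return worker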

Just something else to consider.

I don't know the exact details, but you can use a JSON-style data representation and use it as fixtures or something. I saw something similar in the Django Video Workshop by Douglas Napoleone. See the videos at http://www.linux-magazine.com/online/news/django_video_workshop and http://www.linux-magazine.com/online/features/django_reloaded_workshop_part_1 . Hope this one helps.

Hope you can work it out. I just started learning django, so I can just point you to resources.
