Import huge data set csv in MySQL

I'm trying to import a huge data set from a csv file, ~400MB with 900,000 rows. This file contains the information of two relational tables. For example:

["primary_key","name","lastname","phone","work_id","work_name"]

For every row I have to check whether the primary key already exists, inserting or updating as needed. I also need to verify the work, because new works can appear in this dataset.

My person table has more columns than the csv file has, so I can't simply replace each line with mysqlimport.

Any ideas on how to handle this?

Please read the documentation for LOAD DATA INFILE; it is a good choice for loading data, even very big files. Quoting from the reference manual, Speed of INSERT Statements:

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

Assuming that your table has more columns than the .csv file, you'd have to write something like this:

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, ...) -- The matching column list goes here

See my own question on the subject and its answer.

Also, if you need faster inserts, you can:

  • Ignore foreign key constraints with SET foreign_key_checks = 0; before executing load data, and/or
  • Disable the indexes of the table with alter table yourTable disable keys; before executing load data, and rebuild them afterwards with alter table yourTable enable keys;
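Taken together, the load plus the two speed-ups above might be sequenced like this (a sketch; `yourTable`, the column list, and the file path are placeholders from the earlier example):

```sql
-- Relax constraint checking and index maintenance before the bulk load.
SET foreign_key_checks = 0;
ALTER TABLE yourTable DISABLE KEYS;

LOAD DATA LOCAL INFILE 'path/to/your/file.csv'
INTO TABLE yourTable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, col3);

-- Rebuild the indexes and restore foreign key checking afterwards.
ALTER TABLE yourTable ENABLE KEYS;
SET foreign_key_checks = 1;
```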

Untested: if your .csv file has more columns than your table, I think you can assign the "exceeding" columns in the file to user variables:

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, @dummyVar1, @dummyVar2, col4) -- The @dummyVarX variables
                                                 -- are simply place-holders for
                                                 -- columns in the .csv file that
                                                 -- don't match the columns in 
                                                 -- your table
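For the asker's insert-or-update requirement, one common pattern is to load the file into a staging table first and then upsert from it. This is a sketch, not the answer's method: the `person`, `work`, and `staging` table names and their column types are assumptions based on the sample row in the question.

```sql
-- Staging table mirrors the .csv layout exactly (assumed types).
CREATE TABLE staging (
  primary_key INT,
  name        VARCHAR(100),
  lastname    VARCHAR(100),
  phone       VARCHAR(30),
  work_id     INT,
  work_name   VARCHAR(100)
);

LOAD DATA LOCAL INFILE 'path/to/your/file.csv'
INTO TABLE staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

-- Add any new works first, so person rows can reference them.
INSERT IGNORE INTO work (work_id, work_name)
SELECT DISTINCT work_id, work_name FROM staging;

-- Insert new people and update existing ones in one statement,
-- keyed on the primary key.
INSERT INTO person (primary_key, name, lastname, phone, work_id)
SELECT primary_key, name, lastname, phone, work_id FROM staging
ON DUPLICATE KEY UPDATE
  name     = VALUES(name),
  lastname = VALUES(lastname),
  phone    = VALUES(phone),
  work_id  = VALUES(work_id);
```

This keeps the fast LOAD DATA path while still doing per-row existence checks in set-based SQL rather than one query per row.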
