Import huge data set csv in MySQL

I'm trying to import a huge data set from a csv file, ~400MB with 900,000 rows. This file contains the information of two relational tables. For example:

["primary_key","name","lastname","phone","work_id","work_name"]

For every row I have to check whether the primary key already exists, inserting or updating as needed. I also need to verify the work, because new works can appear in this dataset.

My person table has more columns than the csv file has, so I can't simply replace each line with mysqlimport.

Any ideas on how to handle this?

Please read the documentation for LOAD DATA INFILE; it is a good choice for loading data, even very big files. Quoting from the reference manual, Speed of INSERT Statements:

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

Assuming that your table has more columns than the .csv file, you'd have to write something like this:

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, ...) -- The matching column list goes here

See my own question on the subject and its answer.

Also, if you need faster inserts, you can:

  • Ignore foreign key constraints with SET foreign_key_checks = 0; before executing load data, and/or
  • Disable the indexes of the table with alter table yourTable disable keys; before executing load data, and rebuild them afterwards with alter table yourTable enable keys;
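Taken together, the load plus the two speed-ups above might be sequenced like this (a sketch; `yourTable`, the column list, and the file path are placeholders from the earlier example):

```sql
-- Relax constraint checking and index maintenance before the bulk load.
SET foreign_key_checks = 0;
ALTER TABLE yourTable DISABLE KEYS;

LOAD DATA LOCAL INFILE 'path/to/your/file.csv'
INTO TABLE yourTable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, col3);

-- Rebuild the indexes and restore foreign key checking afterwards.
ALTER TABLE yourTable ENABLE KEYS;
SET foreign_key_checks = 1;
```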

Untested: if your .csv file has more columns than your table, I think you can assign the "exceeding" columns in the file to user variables:

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, @dummyVar1, @dummyVar2, col4) -- The @dummyVarX variables
                                                 -- are simply place-holders for
                                                 -- columns in the .csv file that
                                                 -- don't match the columns in 
                                                 -- your table
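For the asker's insert-or-update requirement, one common pattern is to load the file into a staging table first and then upsert from it. This is a sketch, not the answer's method: the `person`, `work`, and `staging` table names and their column types are assumptions based on the sample row in the question.

```sql
-- Staging table mirrors the .csv layout exactly (assumed types).
CREATE TABLE staging (
  primary_key INT,
  name        VARCHAR(100),
  lastname    VARCHAR(100),
  phone       VARCHAR(30),
  work_id     INT,
  work_name   VARCHAR(100)
);

LOAD DATA LOCAL INFILE 'path/to/your/file.csv'
INTO TABLE staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

-- Add any new works first, so person rows can reference them.
INSERT IGNORE INTO work (work_id, work_name)
SELECT DISTINCT work_id, work_name FROM staging;

-- Insert new people and update existing ones in one statement,
-- keyed on the primary key.
INSERT INTO person (primary_key, name, lastname, phone, work_id)
SELECT primary_key, name, lastname, phone, work_id FROM staging
ON DUPLICATE KEY UPDATE
  name     = VALUES(name),
  lastname = VALUES(lastname),
  phone    = VALUES(phone),
  work_id  = VALUES(work_id);
```

This keeps the fast LOAD DATA path while still doing per-row existence checks in set-based SQL rather than one query per row.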
