How should I import a large amount of data to MySQL via Node.js?
I am building a custom import tool and am wondering what the best practice is for importing a large amount of data. I have the following JSON data structure, with a minimum of 500 products across 30 days per import.
"rows": [{
"product_uid": "k110",
"sale_date": "2018-06-06",
"amount": 15
}, {
"product_uid": "k111",
"sale_date": "2018-06-06",
"amount": 22
}, {
"product_uid": "k110",
"sale_date": "2018-06-07",
"amount": 30
}
]
The schema for the table is as follows:
daily_sales_id - product_uid - sale_date - amount
I am using the Node.js mysql package to execute multiple SQL statements over a single connection. This works well the first time the rows are inserted, but subsequent runs insert duplicate rows. I could truncate the table before inserting, but that would fail if the user decides to import a delta snapshot instead of the entire record set.
While I could use a for-loop to check whether each record exists and do an update instead of an insert, looping through 15,000+ records and issuing 15,000+ SELECT queries doesn't seem like a good idea.
Are there any alternatives that let me keep the data structure and perform an update/insert without looping through 15,000+ records? The import CSV file doesn't know the daily_sales_id.
One option here would be to add a unique index on the columns that determine whether a record in your table is a duplicate, something like this:
CREATE UNIQUE INDEX your_idx ON yourTable(product_uid, sale_date);
The net result is that an insert which attempts to add a new record with a product_uid / sale_date combination that already exists in the table will fail at the database level. You would of course need some Node.js code to handle this, but that should not be very difficult.
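As a rough sketch of what that Node.js handling might look like: one common approach (not something spelled out above) is to let MySQL resolve the collision itself with INSERT ... ON DUPLICATE KEY UPDATE, which relies on the unique index on (product_uid, sale_date). The helper name buildUpsert is made up for illustration, yourTable is the same placeholder used in the index statement, and the bulk `VALUES ?` placeholder assumes the mysql package's expansion of nested arrays into grouped value lists.

```javascript
// Hypothetical helper: turns the parsed "rows" array into one bulk upsert.
// Assumes the unique index on (product_uid, sale_date) already exists, so a
// colliding row updates `amount` instead of raising a duplicate-key error.
function buildUpsert(rows) {
  const sql =
    'INSERT INTO yourTable (product_uid, sale_date, amount) VALUES ? ' +
    'ON DUPLICATE KEY UPDATE amount = VALUES(amount)';

  // One inner array per row, in the same order as the column list above.
  const values = rows.map((r) => [r.product_uid, r.sale_date, r.amount]);

  return { sql, values };
}

// Usage with the mysql package (commented out so the sketch runs without a
// database; `connection` would come from mysql.createConnection(...)):
//
//   const { sql, values } = buildUpsert(payload.rows);
//   connection.query(sql, [values], (err, result) => {
//     if (err) throw err;
//     console.log('affected rows:', result.affectedRows);
//   });
```

This sends all 15,000+ rows in a single statement rather than one SELECT per record, and daily_sales_id never has to appear in the import file. If a very large import runs into the server's max_allowed_packet limit, the rows array can be split into chunks and the same statement executed once per chunk.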