How should I import a large amount of data into MySQL via Node.js?

I am building a custom import tool and wondering what the best practice is for importing a large amount of data. I have the following JSON data structure, with a minimum of 500 products across 30 days per import.

{
    "rows": [{
        "product_uid": "k110",
        "sale_date": "2018-06-06",
        "amount": 15
    }, {
        "product_uid": "k111",
        "sale_date": "2018-06-06",
        "amount": 22
    }, {
        "product_uid": "k110",
        "sale_date": "2018-06-07",
        "amount": 30
    }]
}
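Because every row has the same three fields, one way to avoid per-row queries is to flatten the rows into the nested-array form that the `mysql` package expands into a multi-row `VALUES` list (`[['a', 'b'], ['c', 'd']]` becomes `('a', 'b'), ('c', 'd')`). This is a minimal sketch, assuming that package and a table named `daily_sales` (the question does not name the table); `toValueRows` is a hypothetical helper, and the query itself is only shown as a comment because it needs a live connection.

```javascript
// Hypothetical helper: convert the import payload into an array of
// [product_uid, sale_date, amount] arrays, in the same column order
// as the INSERT column list below.
function toValueRows(payload) {
  const rows = payload.rows;
  if (!Array.isArray(rows)) {
    throw new TypeError('payload.rows must be an array');
  }
  return rows.map((r) => [r.product_uid, r.sale_date, r.amount]);
}

// Payload shaped like the example in the question.
const payload = JSON.parse(`{
  "rows": [
    { "product_uid": "k110", "sale_date": "2018-06-06", "amount": 15 },
    { "product_uid": "k111", "sale_date": "2018-06-06", "amount": 22 },
    { "product_uid": "k110", "sale_date": "2018-06-07", "amount": 30 }
  ]
}`);

const values = toValueRows(payload);
console.log(values.length); // 3

// With a live `mysql` connection (not run here), one statement inserts all rows;
// `daily_sales` is a hypothetical table name:
// connection.query(
//   'INSERT INTO daily_sales (product_uid, sale_date, amount) VALUES ?',
//   [values],
//   (err, result) => { /* handle err / result */ }
// );
```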

The schema for the table is as follows:

daily_sales_id - product_uid - sale_date - amount

I am using the Node.js mysql package to execute multiple SQL statements over a single connection. This works well for inserting the rows the first time, but on subsequent runs it inserts duplicate rows. I could truncate the table before inserting, but that fails if the user decides to import a delta snapshot instead of the entire set of records.

While I could use a for-loop to check whether each record exists and run an update instead of an insert, looping through 15,000+ records and issuing 15,000+ SELECT queries doesn't seem like a good idea.
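A per-record loop costs one round trip per row. A common alternative, sketched here under stated assumptions, is to split the 15,000 rows (500 products over 30 days) into fixed-size batches and send each batch as one multi-row statement. The batch size of 1,000 is an arbitrary assumption; the practical upper bound depends on the server's `max_allowed_packet` setting. `chunk` is a hypothetical helper.

```javascript
// Hypothetical helper: split an array into consecutive batches of `size`
// items; the last batch may be shorter.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in for 500 products over 30 days = 15,000 rows.
const rows = Array.from({ length: 500 * 30 }, (_, i) => i);

// Batch size 1000 is an assumption; each batch would become one
// multi-row INSERT, so 15 statements instead of 15,000.
const batches = chunk(rows, 1000);
console.log(batches.length); // 15
```

Each batch can then be passed as the nested `values` array of a single `INSERT ... VALUES ?` query, so the number of round trips grows with the number of batches rather than the number of records.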

Are there any alternatives that let me keep the data structure and perform an update/insert without looping through 15,000+ records? The imported CSV file doesn't know the daily_sales_id.

One option here would be to add a unique index on the columns that define a record in your table as a duplicate, something like this:

CREATE UNIQUE INDEX your_idx ON yourTable(product_uid, sale_date);
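The unique index makes (product_uid, sale_date) the identity of a row, which sidesteps the fact that the import does not know daily_sales_id. A single multi-row INSERT that contains the same key twice would itself collide with that index, so it may help to collapse such rows in Node.js before sending them. This is a minimal sketch; `dedupeByNaturalKey` is a hypothetical helper, and letting the last occurrence win is an assumption about how repeated records should be treated.

```javascript
// Hypothetical helper: keep one row per (product_uid, sale_date), the same
// columns the unique index covers. When a key repeats, the later row's values
// replace the earlier ones (an assumption), while the key keeps the position
// of its first occurrence (Map preserves insertion order).
function dedupeByNaturalKey(rows) {
  const byKey = new Map();
  for (const row of rows) {
    // JSON.stringify on a two-element array gives an unambiguous composite key,
    // even if a product_uid happens to contain a separator character.
    const key = JSON.stringify([row.product_uid, row.sale_date]);
    byKey.set(key, row);
  }
  return Array.from(byKey.values());
}

const unique = dedupeByNaturalKey([
  { product_uid: 'k110', sale_date: '2018-06-06', amount: 15 },
  { product_uid: 'k111', sale_date: '2018-06-06', amount: 22 },
  { product_uid: 'k110', sale_date: '2018-06-06', amount: 40 }, // re-sent record
]);
console.log(unique.length); // 2
console.log(unique[0].amount); // 40
```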

The net result is that an insert attempting to add a record with a product_uid / sale_date combination that already exists in the table would fail at the database level. You would of course need some Node.js code to handle this, but that should not be very difficult.
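A sketch of the Node.js side, assuming the `mysql` (or `mysql2`) package, which reports this failure with `err.code === 'ER_DUP_ENTRY'` (MySQL error 1062). It also shows a swapped-in alternative that the answer does not mention: MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`, which relies on the same unique index to turn each batch into an upsert, so existing rows are updated and new ones inserted in one statement. The table name `daily_sales` and both helper names are hypothetical. The `VALUES(amount)` form used here is deprecated as of MySQL 8.0.20 in favor of a row alias, though it is still accepted.

```javascript
// Hypothetical helper: true when a query error is a duplicate-key violation.
// The mysql/mysql2 packages set err.code to 'ER_DUP_ENTRY' and err.errno to 1062.
function isDuplicateKeyError(err) {
  return Boolean(err) && (err.code === 'ER_DUP_ENTRY' || err.errno === 1062);
}

// Hypothetical helper: build one upsert for a batch of rows. With the unique
// index on (product_uid, sale_date), rows whose key already exists get their
// amount overwritten; all other rows are inserted.
function buildUpsert(rows) {
  if (rows.length === 0) {
    throw new RangeError('rows must not be empty');
  }
  const placeholders = rows.map(() => '(?, ?, ?)').join(', ');
  const sql =
    'INSERT INTO daily_sales (product_uid, sale_date, amount) VALUES ' +
    placeholders +
    ' ON DUPLICATE KEY UPDATE amount = VALUES(amount)';
  // Flat parameter list in the same order as the placeholders.
  const params = rows.flatMap((r) => [r.product_uid, r.sale_date, r.amount]);
  return { sql, params };
}

const { sql, params } = buildUpsert([
  { product_uid: 'k110', sale_date: '2018-06-06', amount: 15 },
  { product_uid: 'k111', sale_date: '2018-06-06', amount: 22 },
]);
console.log(sql);
console.log(params);

// With a live connection (not run here):
// connection.query(sql, params, (err) => {
//   if (isDuplicateKeyError(err)) { /* only reachable for a plain INSERT */ }
// });
```

The error check is what a plain INSERT would need; with the upsert, duplicates no longer raise that error, so a re-import or a delta snapshot simply overwrites the matching rows.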

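Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.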

 
© 2020-2024 STACKOOM.COM