Continuously (re-)importing and refreshing large amounts of data in a MySQL pricing table

Question

I have a large dataset (~2.5 million rows) that needs to be (re-)imported continuously into a MySQL table "price_list". All tables are InnoDB. Currently I'm using "LOAD DATA LOCAL INFILE", because the datasets come from CSV files:

LOAD DATA LOCAL INFILE 'sample.csv'
INTO TABLE `price_list`
(...)
(...)
IGNORE 1 LINES

Example of the data in table "price_list":

hotel_id | room_category            | price 1 person | price 2nd person     | <other meta info>

1        | single room (w/o window) | 150€           | 200€                 | ...
2        | single room (w window)   | 170€           | 220€                 | ...
3        | single room (rooftop)    | 240€           | 250€                 | ...
4        | single room (whirlpool)  | 200€           | 280€                 | ...
5        | double room (w/o window) | 200€           | 220€                 | ...
6        | double room (w window)   | 240€           | 260€                 | ...
7        | double room (rooftop)    | 280€           | 300€                 | ...
8        | double room (whirlpool)  | 320€           | 340€                 | ...
(...)

Based on this data I need to update the table "offers" with the refreshed prices. That table lives in another database: the user of "price_list" has no access to "offers", and it is technically not possible to grant that access. Prices change a lot, so we need to reimport the data every 15 minutes.

id  |       offer_name           |   price_single_room    |    price_double_room
1   |  WHIRLPOOL OFFER SINGLES   |         200€           |          200€

In the above example, the price of the "best" single room (the one with a whirlpool) was chosen (200€). The 2nd price is not needed for this offer, but it is calculated on purpose (and can be deactivated if wanted).

My current solution is to fetch all offers marked as active from the "offers" table in PHP and loop through them (oof). Each offer has 6 columns for different prices that need to be looked up (e.g. a hotel has different rooms available: column 1 is the best price for a single room, column 2 the best price for a double room, ...).
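For reference, the loop described above can be sketched as follows. This is a minimal stand-in using Python's built-in sqlite3 rather than PHP and MySQL, under a hypothetical schema: the question does not show how an offer maps to "price_list" rows, so here each offer stores one `room_category`, `price_1p` stands for "price 1 person", only one of the six price columns is modelled, and "best" is assumed to mean the lowest matching price. The offer "BUDGET SINGLES" and hotel 9 are invented for illustration.

```python
import sqlite3

# Hypothetical schema (not from the question): each offer names one
# room_category, and only one of the six price columns is modelled.
SCHEMA = """
CREATE TABLE price_list (hotel_id INTEGER, room_category TEXT, price_1p INTEGER);
CREATE TABLE offers (id INTEGER PRIMARY KEY, offer_name TEXT, active INTEGER,
                     room_category TEXT, price_single_room INTEGER);
"""


def refresh_prices_row_by_row(conn: sqlite3.Connection) -> int:
    """Mimic the PHP loop: fetch the active offers, then issue one lookup and
    one UPDATE per offer. Returns how many statements were sent; against a
    remote database server, each one is a network round trip."""
    sent = 1
    offers = conn.execute(
        "SELECT id, room_category FROM offers WHERE active = 1").fetchall()
    for offer_id, category in offers:
        # "Best" price is assumed to be the lowest matching price_1p.
        (best,) = conn.execute(
            "SELECT MIN(price_1p) FROM price_list WHERE room_category = ?",
            (category,)).fetchone()
        conn.execute("UPDATE offers SET price_single_room = ? WHERE id = ?",
                     (best, offer_id))
        sent += 2
    conn.commit()
    return sent


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO price_list VALUES (?, ?, ?)", [
    (1, "single room (w/o window)", 150),
    (4, "single room (whirlpool)", 200),
    (9, "single room (whirlpool)", 210),  # hypothetical second hotel
])
conn.executemany(
    "INSERT INTO offers (id, offer_name, active, room_category) VALUES (?, ?, ?, ?)",
    [(1, "WHIRLPOOL OFFER SINGLES", 1, "single room (whirlpool)"),
     (2, "BUDGET SINGLES", 1, "single room (w/o window)")])
statements_sent = refresh_prices_row_by_row(conn)
print(statements_sent)  # 5
```

Scaled to 10,000 offers with six lookups each, this pattern sends more than 60,000 statements per refresh, and every one of them pays the network latency.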

We currently have around 10,000 active offers, which means I'm sending 10,000 offers × 6 queries = 60,000 price lookups to the database server.

When I run these queries on the same server (no network latency etc.), performance is acceptable: the whole job (importing ~2.5M rows, refreshing prices, ...) takes around 5 minutes. But since the data is growing, we want to split the database and web server onto separate machines. I have now realized that the price-refresh part generates a lot of network overhead and is very slow, since each request from the web server to the DB server takes around 0.025 s (25 minutes just for refreshing prices).
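The 25-minute figure follows from the query count above, assuming each lookup is a separate request costing one ~0.025 s round trip:

$$
10{,}000\ \text{offers} \times 6\ \text{lookups/offer} \times 0.025\ \text{s} = 1{,}500\ \text{s} = 25\ \text{min}
$$

Since the time is dominated by the number of round trips, cutting the number of requests matters more than making each query faster.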

I thought about the following solutions:

⚈  Move the table "offers" into the same database as "price_list": this works, but it is still very slow, because the database server is not on the same machine as the web server, so network latency remains the bottleneck.
⚈  Write a stored procedure that PHP triggers, so that the database server does the work itself.
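Both options pay off only once the per-offer round trips are gone, i.e. when one set-based statement lets the database compute every offer's price itself. A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for MySQL and a hypothetical schema (the question does not show how an offer maps to "price_list" rows; here each offer stores one `room_category`, `price_1p` stands for "price 1 person", only one of the six price columns is modelled, and "best" is assumed to mean the lowest price):

```python
import sqlite3

# Hypothetical schema (not from the question): each offer names one
# room_category, and only one of the six price columns is modelled.
SCHEMA = """
CREATE TABLE price_list (hotel_id INTEGER, room_category TEXT, price_1p INTEGER);
CREATE TABLE offers (id INTEGER PRIMARY KEY, offer_name TEXT, active INTEGER,
                     room_category TEXT, price_single_room INTEGER);
"""

# One statement replaces the whole loop: the database computes the price of
# every active offer itself. "Best" is assumed to mean the lowest price.
REFRESH_SQL = """
UPDATE offers
SET price_single_room = (
    SELECT MIN(p.price_1p)
    FROM price_list AS p
    WHERE p.room_category = offers.room_category
)
WHERE active = 1
"""


def refresh_prices_set_based(conn: sqlite3.Connection) -> int:
    """Refresh all active offers in one statement; returns rows updated."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(REFRESH_SQL)
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO price_list VALUES (?, ?, ?)", [
    (1, "single room (w/o window)", 150),
    (4, "single room (whirlpool)", 200),
    (9, "single room (whirlpool)", 210),  # hypothetical second hotel
])
conn.executemany(
    "INSERT INTO offers (id, offer_name, active, room_category) VALUES (?, ?, ?, ?)",
    [(1, "WHIRLPOOL OFFER SINGLES", 1, "single room (whirlpool)"),
     (2, "BUDGET SINGLES", 1, "single room (w/o window)"),
     (3, "OLD OFFER", 0, "single room (whirlpool)")])  # inactive: left alone
updated = refresh_prices_set_based(conn)
print(updated)  # 2
print(conn.execute("SELECT id, price_single_room FROM offers ORDER BY id").fetchall())
# [(1, 200), (2, 150), (3, None)]
```

Six price columns would become six such subqueries in the same SET clause, still one statement and one round trip. In MySQL the refresh could also be written as a multi-table `UPDATE offers AS o JOIN (SELECT room_category, MIN(price_1p) AS best FROM price_list GROUP BY room_category) AS b ON b.room_category = o.room_category SET o.price_single_room = b.best WHERE o.active = 1`, or wrapped in a stored procedure that PHP calls once. Either way, the user running it needs access to both tables, so it pairs with moving "offers" into the same database.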

Does anyone have experience with this kind of recurring data load, or maybe a solution to my problem? The goal is to reduce run time and to split the web and database servers.

Thanks!

Answer 1

You have a table called `real`; the following replaces it with a new table containing the fresh data.

CREATE TABLE t_new LIKE `real`;   -- REAL is a reserved word in MySQL, so quote it
LOAD DATA INFILE new ...;         -- load the fresh data into t_new
RENAME TABLE `real` TO t_old,
             t_new TO `real`;
DROP TABLE t_old;
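The same load-then-swap pattern can be tried locally with Python's built-in sqlite3. SQLite has no `CREATE TABLE ... LIKE` and no multi-table `RENAME TABLE`, so this sketch swaps in a different mechanism: it repeats the (hypothetical) column list explicitly and performs two `ALTER TABLE ... RENAME TO` statements inside one transaction, which SQLite applies atomically. The table names are illustrative; `price_list` plays the role of `real`.

```python
import sqlite3

# Hypothetical columns; SQLite has no CREATE TABLE ... LIKE, so the schema
# of the live table is repeated explicitly.
COLUMNS = "(hotel_id INTEGER, room_category TEXT, price_1p INTEGER)"


def reload_table(conn: sqlite3.Connection, rows) -> None:
    """Load rows into price_list_new, then swap it in for price_list."""
    # Clean up leftovers from an interrupted earlier run.
    conn.execute("DROP TABLE IF EXISTS price_list_new")
    conn.execute("DROP TABLE IF EXISTS price_list_old")
    conn.execute(f"CREATE TABLE price_list_new {COLUMNS}")

    # Slow step: other connections can keep reading price_list meanwhile.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO price_list_new VALUES (?, ?, ?)", rows)
    conn.execute("COMMIT")

    # Swap: both renames commit together, so price_list always exists.
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE price_list RENAME TO price_list_old")
    conn.execute("ALTER TABLE price_list_new RENAME TO price_list")
    conn.execute("COMMIT")

    # This DROP could be deferred to allow reverting to the old data.
    conn.execute("DROP TABLE price_list_old")


# isolation_level=None: the BEGIN/COMMIT statements above are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(f"CREATE TABLE price_list {COLUMNS}")
conn.execute("INSERT INTO price_list VALUES (1, 'single room (w/o window)', 150)")
reload_table(conn, [(1, "single room (w/o window)", 155),
                    (4, "single room (whirlpool)", 210)])
print(conn.execute("SELECT hotel_id, price_1p FROM price_list ORDER BY hotel_id").fetchall())
# [(1, 155), (4, 210)]
```

The prices 155 and 210 are made-up sample values; in practice the rows would come from the CSV import.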

Notes:

⚈  The LOAD DATA step can be replaced by whatever process you have for importing the data.
⚈  The load is the only slow step.
⚈  The RENAME is atomic, so `real` always exists.
⚈  You may choose to delay the DROP in case the new data turns out to be bad and you want to revert.
⚈  FOREIGN KEYs can be a hassle; it may be better not to have any on this table.

-- http://mysql.rjweb.org/doc.php/deletebig#optimal_reload_of_a_table

(Answer 1: score 0, posted 2020-02-04 22:44:33)