Huge continuous (re)import and refresh of data in a pricing table in MySQL

I have a large dataset (~2.5 million rows) that needs to be (re-)imported continuously into a MySQL table "price_list". All tables are InnoDB. Currently I'm using "LOAD DATA LOCAL INFILE", because the datasets come from CSV files:

LOAD DATA LOCAL INFILE 'sample.csv'
INTO TABLE `price_list`
(...)
(...)
IGNORE 1 LINES

Example of the table in database "price_list":

hotel_id | room_category            | price 1 person | price 2nd person     | <other meta info>

1        | single room (w/o window) | 150€           | 200€                 | ...
2        | single room (w window)   | 170€           | 220€                 | ...
3        | single room (rooftop)    | 240€           | 250€                 | ...
4        | single room (whirlpool)  | 200€           | 280€                 | ...
5        | double room (w/o window) | 200€           | 220€                 | ...
6        | double room (w window)   | 240€           | 260€                 | ...
7        | double room (rooftop)    | 280€           | 300€                 | ...
8        | double room (whirlpool)  | 320€           | 340€                 | ...
(...)

Based on this data I need to update the table "offers" with refreshed prices. (That table sits in another database; the user of "price_list" has no access to "offers", and it's technically not possible to grant that access.) Prices change a lot, and we need to reimport the data every 15 minutes.

id  |       offer_name           |   price_single_room    |    price_double_room
1   |  WHIRLPOOL OFFER SINGLES   |         200€           |          200€

In the above example the price for the "best" single room (with a whirlpool) was chosen (200€). The 2nd price is not needed in this offer, but is calculated on purpose (and can be deactivated if wanted).

My current solution is to fetch all offers marked as active from the "offers" table in PHP and loop through them (oof). Each offer has 6 columns for different prices that need to be looked up (e.g. a hotel has different rooms available; column 1 is the best price for a single room, column 2 is the best price for a double room, ...).

Currently we have around 10,000 active offers, which means I'm sending the following number of queries to the database server: 10,000 offers × 6 lookups for the best price per offer = 60,000 queries.

When I execute those queries on the same server (no network latency etc.), performance is acceptable: the whole job takes around 5 minutes (importing ~2.5m rows, refreshing prices, ...). But since the data is growing, we want to split database and webserver onto separate machines. I now realize that the price-refresh part produces a lot of network overhead and is very slow, since each request from webserver to DB server takes around 0.025 s (25 minutes just for refreshing prices).
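To make the 25-minute figure explicit, here is the arithmetic as a small Python sketch. The numbers come from the paragraphs above; the variable names are only illustrative.

```python
# Figures taken from the question; names are illustrative.
active_offers = 10_000
lookups_per_offer = 6          # one query per price column
round_trip_seconds = 0.025     # webserver -> DB server -> webserver

total_queries = active_offers * lookups_per_offer
total_minutes = total_queries * round_trip_seconds / 60

print(f"{total_queries} queries, about {total_minutes:.0f} minutes of round trips")
```

The cost is dominated by the number of round trips, not by the work each query does, so the job time scales with the query count.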

I thought about the following solution(s):

  • Move the table "offers" into the same database as "price_list". This works, but is still very slow: the database server is not on the same machine as the webserver, so network latency remains the bottleneck.

  • Write a stored procedure that is triggered by PHP, so the database server does the whole job.
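A rough sketch of what the second idea could boil down to: one set-based UPDATE instead of a loop of 60,000 lookups, so only a single round trip crosses the network. Everything about the schema here is an assumption, since the question does not show how offers map to price rows: the columns `hotel_id`, `room_type` and `active`, and the rule "best price = lowest price per room type", are hypothetical. It also assumes both tables are reachable from one connection. SQLite (via Python's `sqlite3`) stands in for MySQL so the example is self-contained; the UPDATE uses plain correlated subqueries and could serve as the body of a stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema; the real tables have more columns.
conn.executescript("""
CREATE TABLE price_list (hotel_id INTEGER, room_type TEXT, price INTEGER);
CREATE TABLE offers (
    id INTEGER PRIMARY KEY, hotel_id INTEGER, active INTEGER,
    price_single_room INTEGER, price_double_room INTEGER
);
INSERT INTO price_list VALUES
    (1, 'single', 150), (1, 'single', 170),
    (1, 'double', 200), (1, 'double', 240),
    (2, 'single', 90),  (2, 'double', 120);
INSERT INTO offers (id, hotel_id, active) VALUES (1, 1, 1), (2, 2, 1), (3, 2, 0);
""")

# One statement refreshes every active offer; no per-offer queries.
conn.execute("""
UPDATE offers
SET price_single_room = (SELECT MIN(p.price) FROM price_list AS p
                         WHERE p.hotel_id = offers.hotel_id
                           AND p.room_type = 'single'),
    price_double_room = (SELECT MIN(p.price) FROM price_list AS p
                         WHERE p.hotel_id = offers.hotel_id
                           AND p.room_type = 'double')
WHERE active = 1
""")
conn.commit()

rows = conn.execute(
    "SELECT id, price_single_room, price_double_room FROM offers ORDER BY id"
).fetchall()
print(rows)
```

The inactive offer (id 3) is left untouched. With 2.5 million price rows, an index covering the lookup columns (here `hotel_id, room_type, price`) would likely matter for the subqueries.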

Does anyone have experience with such recurring data loads, and maybe a solution for my problem? The goal is to reduce run time and to split web and database server.

Thanks!

You have a table called `real`; the following will replace it with a new table containing the fresh data.

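-- REAL is a reserved word in MySQL, so this table name needs backticks.
CREATE TABLE t_new LIKE `real`;                             -- empty copy, same schema
LOAD DATA LOCAL INFILE 'sample.csv' INTO TABLE t_new ...;   -- load the fresh data
RENAME TABLE `real` TO t_old,
             t_new TO `real`;                               -- swap both names in one statement
DROP TABLE t_old;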

Notes:

⚈  The LOAD DATA step can be replaced by whatever process you have for importing the data.
⚈  Loading is the only slow step.
⚈  The RENAME is atomic, so `real` always exists.
⚈  You may choose to delay the DROP in case the new data might be bad and you want to revert.
⚈  FOREIGN KEYs can be a hassle; it may be best not to have any.
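For experimenting with the pattern without a MySQL server, here is a minimal, self-contained sketch using Python's `sqlite3`. SQLite is only a stand-in: it has no multi-table RENAME TABLE, so the two renames are wrapped in one explicit transaction instead. The table names `price_list_new` and `price_list_old` and the sample rows are made up for illustration.

```python
import sqlite3

# isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE price_list (hotel_id INTEGER, price INTEGER)")
conn.execute("INSERT INTO price_list VALUES (1, 150)")

# 1. Build and load the replacement while price_list stays readable.
conn.execute("CREATE TABLE price_list_new (hotel_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO price_list_new VALUES (?, ?)", [(1, 160), (2, 220)]
)

# 2. Swap the names; both renames commit together or not at all.
conn.execute("BEGIN")
conn.execute("ALTER TABLE price_list RENAME TO price_list_old")
conn.execute("ALTER TABLE price_list_new RENAME TO price_list")
conn.execute("COMMIT")

# 3. Drop the old copy (could be delayed if a revert might be needed).
conn.execute("DROP TABLE price_list_old")

live_rows = conn.execute(
    "SELECT hotel_id, price FROM price_list ORDER BY hotel_id"
).fetchall()
print(live_rows)
```

Readers only ever see the complete old table or the complete new one, which is the property the RENAME gives you in MySQL.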

-- http://mysql.rjweb.org/doc.php/deletebig#optimal_reload_of_a_table
