MySQL insert new row on value change

For a personal project I'm working on right now I want to make a line graph of game prices on Steam, Impulse, EA Origins, and several other sites over time. At the moment I've modified a script used by SteamCalculator.com to record the current price (the sale price, if applicable) for every game in every possible country code for each of these sites. I also have a column for the date on which the price was stored. My current tables look something like so:

THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id |  us  |  at  |  au  |  de  |  no  |  uk  |    date    |
+----------+------+------+------+------+------+------+------------+
|  112233  |  999 |  899 |  999 | NULL |  899 |  699 |  2011-8-21 |
|  123456  | 1999 |  999 | 1999 |  999 |  999 |  999 |  2011-8-20 |
|    ...   |  ... |  ... |  ... |  ... |  ... |  ... |     ...    |
+----------+------+------+------+------+------+------+------------+

At the moment each country is updated separately (there's a for loop going through the countries), although if it would simplify things this could be modified to temporarily store the new prices in an array and then update an entire row at a time. I'll likely be doing this eventually anyway, for performance reasons.

Now my issue is determining how best to update this table if one of the prices changes. For instance, let's suppose that on 8/22/2011 the game 112233 goes on sale in America for $4.99 and in Austria for 3.99€, while the other prices remain the same. I would need the table to look like so:

THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id |  us  |  at  |  au  |  de  |  no  |  uk  |    date    |
+----------+------+------+------+------+------+------+------------+
|  112233  |  999 |  899 |  999 | NULL |  899 |  699 |  2011-8-21 |
|  123456  | 1999 |  999 | 1999 |  999 |  999 |  999 |  2011-8-20 |
|    ...   |  ... |  ... |  ... |  ... |  ... |  ... |     ...    |
|  112233  |  499 |  399 |  999 | NULL |  899 |  699 |  2011-8-22 |
+----------+------+------+------+------+------+------+------------+

I don't want to create a new row EVERY time the price is checked, otherwise I'll end up having millions of rows of repeated prices day after day. I also don't want to create a new row per changed price like so:

THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id |  us  |  at  |  au  |  de  |  no  |  uk  |    date    |
+----------+------+------+------+------+------+------+------------+
|  112233  |  999 |  899 |  999 | NULL |  899 |  699 |  2011-8-21 |
|  123456  | 1999 |  999 | 1999 |  999 |  999 |  999 |  2011-8-20 |
|    ...   |  ... |  ... |  ... |  ... |  ... |  ... |     ...    |
|  112233  |  499 |  899 |  999 | NULL |  899 |  699 |  2011-8-22 |
|  112233  |  499 |  399 |  999 | NULL |  899 |  699 |  2011-8-22 |
+----------+------+------+------+------+------+------+------------+

I can prevent the first problem but not the second by making each (steam_id, <country>) a unique index and then adding ON DUPLICATE KEY UPDATE to every database query. This will only add a row if the price is different; however, it will add a new row for each country whose price changes. It also does not allow the same price for a single game on two different days (for instance, suppose game 112233 goes off sale later and returns to $9.99), so this is clearly an awful option.

I can prevent the second problem but not the first by making (steam_id, date) a unique index and then adding ON DUPLICATE KEY UPDATE to every query. Every single day when the script is run the date has changed, so it will create a new row. This method ends up with hundreds of rows of identical prices from day to day.

How can I tell MySQL to create a new row if (and only if) any of the prices has changed since the latest date?

UPDATE -

At the recommendation of people in this thread I have changed the schema of my database to make it easier to add new country codes in the future and to avoid needing to update entire rows at a time. The new schema looks something like:

+----------+------+---------+------------+
| steam_id |  cc  |  price  |    date    |
+----------+------+---------+------------+
|  112233  |  us  |   999   |  2011-8-21 |
|  123456  |  uk  |   699   |  2011-8-20 |
|    ...   |  ... |   ...   |     ...    |
+----------+------+---------+------------+

On top of this new schema I have discovered that I can use the following SQL query to grab the price from the most recent update:

SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1

At this point my question boils down to this:

Is it possible to (using only SQL rather than application logic) insert a row only if a condition is true? For instance:

INSERT INTO `steam_prices` (...) VALUES (...) IF price<>(SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1)

I cannot find any way to do this in the MySQL manual. I have only found that you can ignore or update a row if a unique index value already exists. However, if I made the price a unique index (allowing me to update the date if the price was the same) then I would not be able to recognize when a game went on sale and then returned to its original price. For instance:

+----------+------+---------+------------+
| steam_id |  cc  |  price  |    date    |
+----------+------+---------+------------+
|  112233  |  us  |   999   |  2011-8-20 |
|  112233  |  us  |   499   |  2011-8-21 |
|  112233  |  us  |   999   |  2011-8-22 |
|    ...   |  ... |   ...   |     ...    |
+----------+------+---------+------------+

Also, after just finding and reading MySQL Conditional INSERT, I created and tried the following query:

INSERT INTO `steam_prices`(
    `steam_id`,
    `cc`,
    `update`,
    `price`
)
SELECT '7870', 'us', NOW(), 999
FROM `steam_prices`
WHERE
    `price`<>999
    AND `update` IN (
        SELECT `update`
        FROM `steam_prices`
        ORDER BY `update`
        DESC LIMIT 1
    )

The idea was to insert the row '7870', 'us', NOW(), 999 if (and only if) the price of the most recent update wasn't 999. When I ran this I got the following error:

1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

Any ideas?

You will probably find this easier if you simply change your schema to something like:

steam_id      integer
country       varchar(2)
date          date
price         float
primary key   (steam_id,country,date)

(with other appropriate indexes) and then only worry about each country in turn.

In other words, your for loop has a unique ID/country combo so it can simply query the latest-date record for that combo and add a new row if it's different.

That will make your selections a little more complicated but I believe it's a better solution, especially if there's any chance at all that more countries may be added in future (it won't break the schema in that case).
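As an illustration of what the slightly more complicated selection might look like, here is a minimal sketch that reads the current price for every country of one game from a table shaped like the schema above. It uses Python's built-in sqlite3 module with an in-memory database purely so that it runs standalone. The table name `steam_prices`, the helper `latest_prices`, the sample rows, and storing prices as integer cents are all assumptions made for the example, and MySQL syntax may differ in small ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE steam_prices (
        steam_id INTEGER NOT NULL,
        country  TEXT    NOT NULL,
        date     TEXT    NOT NULL,  -- ISO dates, so text order matches date order
        price    INTEGER NOT NULL,  -- assumption: price stored in cents
        PRIMARY KEY (steam_id, country, date)
    )
""")
conn.executemany(
    "INSERT INTO steam_prices VALUES (?, ?, ?, ?)",
    [
        (112233, "us", "2011-08-20", 999),
        (112233, "us", "2011-08-22", 499),
        (112233, "at", "2011-08-20", 899),
        (112233, "at", "2011-08-22", 399),
        (112233, "uk", "2011-08-21", 699),
        (123456, "us", "2011-08-20", 1999),
    ],
)


def latest_prices(conn, steam_id):
    """Return {country: price} using the newest row per country for one game."""
    rows = conn.execute(
        """
        SELECT p.country, p.price
        FROM steam_prices AS p
        WHERE p.steam_id = ?
          AND p.date = (SELECT MAX(date)
                        FROM steam_prices
                        WHERE steam_id = p.steam_id
                          AND country = p.country)
        """,
        (steam_id,),
    ).fetchall()
    return dict(rows)


current = latest_prices(conn, 112233)
print(current)
```

Because only changes are stored, the newest row per country is the price currently in effect, even when the countries last changed on different days.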

First, I suggest you store your data in a form that is less hard-coded per country:

+----------+--------------+------------+-------+
| steam_id | country_code | date       | price |
+----------+--------------+------------+-------+
|   112233 | us           | 2011-08-20 | 12.45 |
|   112233 | uk           | 2011-08-20 | 12.46 |
|   112233 | de           | 2011-08-20 | 12.47 |
|   112233 | at           | 2011-08-20 | 12.48 |
|   112233 | us           | 2011-08-21 | 12.49 |
|   ...... | ..           | .......... | ..... |
+----------+--------------+------------+-------+

From here, you place a primary key on the first three columns...

Now for your question about not creating extra rows... That is what a simple transaction + application logic is great at.

  1. Start a transaction
  2. Run a select to see if the record in question is there
  3. If not, insert one

Was there a problem with that approach?

Hope this helps.
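The three steps above can be sketched roughly as follows. This is only an illustration under some assumptions: it uses Python's sqlite3 module with an in-memory database so that it runs on its own, the table name `steam_prices` and the function `record_price` are made up for the example, prices are stored as integer cents, and the check in step 2 is interpreted as "is the latest stored price already this price?".

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# the transaction can be started and ended explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE steam_prices (
        steam_id     INTEGER NOT NULL,
        country_code TEXT    NOT NULL,
        date         TEXT    NOT NULL,
        price        INTEGER NOT NULL,  -- assumption: price in cents
        PRIMARY KEY (steam_id, country_code, date)
    )
""")


def record_price(conn, steam_id, country_code, date, price):
    """Insert a row only if the price differs from the latest stored one.

    Returns True when a row was inserted.
    """
    conn.execute("BEGIN IMMEDIATE")  # 1. start a transaction
    try:
        row = conn.execute(          # 2. look up the most recent price
            "SELECT price FROM steam_prices "
            "WHERE steam_id = ? AND country_code = ? "
            "ORDER BY date DESC LIMIT 1",
            (steam_id, country_code),
        ).fetchone()
        changed = row is None or row[0] != price
        if changed:                  # 3. insert only when something changed
            conn.execute(
                "INSERT INTO steam_prices (steam_id, country_code, date, price) "
                "VALUES (?, ?, ?, ?)",
                (steam_id, country_code, date, price),
            )
        conn.execute("COMMIT")
        return changed
    except Exception:
        conn.execute("ROLLBACK")
        raise


results = [
    record_price(conn, 112233, "us", "2011-08-20", 999),  # first price seen
    record_price(conn, 112233, "us", "2011-08-21", 999),  # unchanged
    record_price(conn, 112233, "us", "2011-08-22", 499),  # goes on sale
    record_price(conn, 112233, "us", "2011-08-23", 999),  # back to full price
]
row_count = conn.execute("SELECT COUNT(*) FROM steam_prices").fetchone()[0]
print(results, row_count)
```

Comparing against the latest row rather than against any earlier row is what lets a game return to an old price and still get a new entry.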

After experimentation, and with some help from MySQL Conditional INSERT and http://www.artfulsoftware.com/infotree/queries.php#101, I found a query that worked:

INSERT INTO `steam_prices`( 
    `steam_id`, 
    `cc`, 
    `price`,
    `update` 
) 
SELECT 7870, 'us', 999, NOW() 
FROM `steam_prices` AS p1
LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
WHERE 
    p2.`steam_id` IS NULL
    AND p1.`steam_id`=7870
    AND p1.`cc`='us'
    AND (
        p1.`price`<>999
    )

The answer is to first return all rows for which there is no later timestamp. This is done with a within-group aggregate. You join the table with itself, matching each row only to rows whose timestamp is later. If a row fails to join (no later timestamp exists) then you know that row contains the latest timestamp. These rows will have a NULL id in the joined table (failed to join).

After you have selected all rows with the latest timestamp, grab only those rows where the steam_id is the steam_id you're looking for and where the price is different from the new price that you're entering. If there are no rows with a different price for that game at this point then the price has not changed since the last update, so an empty set is returned. When an empty set is returned the SELECT yields no rows and nothing is inserted. If the SELECT does return a row (a different price was found) then it returns the row 7870, 'us', 999, NOW(), which is inserted into our table.
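To make the self-join idea concrete, here is a small sketch of just the "latest row" part, run against SQLite through Python's sqlite3 module so that it is runnable standalone. The sample rows and the helper `latest_rows` are invented for the example. The join matches on `cc` as well as `steam_id` so that each country is tracked separately, and the `match_cc=False` call shows what happens when that condition is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices ("
    "`steam_id` INTEGER, `cc` TEXT, `price` INTEGER, `update` TEXT)"
)
conn.executemany(
    "INSERT INTO steam_prices VALUES (?, ?, ?, ?)",
    [
        (7870, "us", 999, "2011-08-20 12:00:00"),
        (7870, "us", 499, "2011-08-21 12:00:00"),
        (7870, "uk", 699, "2011-08-22 12:00:00"),
        (1234, "us", 1999, "2011-08-19 12:00:00"),
    ],
)


def latest_rows(conn, match_cc=True):
    """Rows that have no later row in the same group (LEFT JOIN ... IS NULL)."""
    cc_clause = "AND p1.`cc` = p2.`cc`" if match_cc else ""
    sql = f"""
        SELECT p1.`steam_id`, p1.`cc`, p1.`price`
        FROM steam_prices AS p1
        LEFT JOIN steam_prices AS p2
               ON p1.`steam_id` = p2.`steam_id`
              {cc_clause}
              AND p1.`update` < p2.`update`
        WHERE p2.`steam_id` IS NULL      -- nothing later was found
        ORDER BY p1.`steam_id`, p1.`cc`
    """
    return conn.execute(sql).fetchall()


per_country = latest_rows(conn)                # grouped by (steam_id, cc)
per_game = latest_rows(conn, match_cc=False)   # grouped by steam_id only
print(per_country)
print(per_game)
```

With the country condition in place, game 7870 keeps one latest row for `us` and one for `uk`. Without it, the later `uk` row counts as "newer" than every `us` row, so the latest `us` price drops out of the result.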

EDIT - I actually found a mistake with the above query a little while later and I have since revised it. That first query will insert a new row if the price has changed since the last update, but it will not insert a row if there are currently no prices in the database for that item.

To resolve this I had to take advantage of the DUAL table (which always contains one row), then use an OR in the WHERE clause to test for a different price OR an empty set:

INSERT INTO `steam_prices`( 
    `steam_id`, 
    `cc`, 
    `price`,
    `update` 
) 
SELECT 12345, 'us', 999, NOW() 
FROM DUAL
WHERE
    NOT EXISTS (
        SELECT `steam_id`
        FROM `steam_prices`
        WHERE `steam_id`=12345 AND `cc`='us'
    )
    OR
    EXISTS (
        SELECT p1.`steam_id`
        FROM `steam_prices` AS p1 
        LEFT JOIN `steam_prices` AS p2 ON p1.`steam_id`=p2.`steam_id` AND p1.`cc`=p2.`cc` AND p1.`update` < p2.`update`
        WHERE 
            p2.`steam_id` IS NULL 
            AND p1.`steam_id`=12345 
            AND p1.`cc`='us' 
            AND ( 
                p1.`price`<>999
            )
    )

It's very long, it's very ugly, and it's very complicated. But it works exactly as advertised. If there is no price in the database for the game in question then it inserts a new row. If there is already a price then it compares the new price with the one from the most recent update and, if they differ, inserts a new row.
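For anyone who wants to try the behaviour without a MySQL server, here is a rough equivalent run against SQLite from Python. Treat it as a sketch under stated assumptions rather than a drop-in replacement: SQLite has no DUAL table (a SELECT with no FROM clause plays the same role), the function name `insert_if_changed` and the sample timestamps are made up, and both subqueries are scoped to one (`steam_id`, `cc`) pair so that countries do not interfere with one another.

```python
import sqlite3

INSERT_IF_CHANGED = """
INSERT INTO steam_prices (`steam_id`, `cc`, `price`, `update`)
SELECT :id, :cc, :price, :ts
WHERE
    NOT EXISTS (                      -- no price stored yet for this pair
        SELECT 1 FROM steam_prices
        WHERE `steam_id` = :id AND `cc` = :cc
    )
    OR EXISTS (                       -- latest stored price differs
        SELECT 1
        FROM steam_prices AS p1
        LEFT JOIN steam_prices AS p2
               ON p1.`steam_id` = p2.`steam_id`
              AND p1.`cc` = p2.`cc`
              AND p1.`update` < p2.`update`
        WHERE p2.`steam_id` IS NULL
          AND p1.`steam_id` = :id
          AND p1.`cc` = :cc
          AND p1.`price` <> :price
    )
"""


def insert_if_changed(conn, steam_id, cc, price, ts):
    """Run the conditional insert; return True if a row was added."""
    before = conn.total_changes
    conn.execute(
        INSERT_IF_CHANGED,
        {"id": steam_id, "cc": cc, "price": price, "ts": ts},
    )
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices ("
    "`steam_id` INTEGER, `cc` TEXT, `price` INTEGER, `update` TEXT)"
)

results = [
    insert_if_changed(conn, 7870, "us", 999, "2011-08-20 00:00:00"),  # empty table
    insert_if_changed(conn, 7870, "us", 999, "2011-08-21 00:00:00"),  # unchanged
    insert_if_changed(conn, 7870, "us", 499, "2011-08-22 00:00:00"),  # sale starts
    insert_if_changed(conn, 7870, "us", 999, "2011-08-23 00:00:00"),  # sale ends
    insert_if_changed(conn, 7870, "uk", 699, "2011-08-23 12:00:00"),  # new country
    insert_if_changed(conn, 7870, "us", 999, "2011-08-24 00:00:00"),  # unchanged
]
row_count = conn.execute("SELECT COUNT(*) FROM steam_prices").fetchone()[0]
print(results, row_count)
```

The sequence covers the cases discussed in the question: a first price, a repeated price, a sale, a return to the original price, and a second country being added for the same game.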
