SQL query based on subquery. Retrieve transactions with data > threshold
My db table is called transactions and looks like this:
Name    | Date (DateTime)     | Type    | Stock     | Volume | Price | Total
Tom     | 2014-05-24 12:00:00 | Sell    | Barclays  | 100    | 2.2   | 220.0
Bob     | 2014-04-13 15:00:00 | Buy     | Coca-Cola | 10     | 12.0  | 120.0
varchar | DateTime            | varchar | varchar   | int    | float | float
My initial problem was to remove from the table ALL the transactions that belong to a user whose first transaction is later than a certain threshold. My query was:
DELETE FROM transactions WHERE name NOT IN (SELECT name FROM transactions2 WHERE date < CAST('2014-01-01 12:00:00.000' as DateTime));
Query OK, 35850 rows affected (3 hours 5 min 28.88 sec)
I think this is a poor solution: I had to duplicate the table to avoid deleting from the same table I am reading from, and the execution took quite a long time (3 hours for a table containing ~170k rows).
Now I am trying to delete ALL the transactions that belong to a user whose latest transaction happened before a certain threshold date.
DELETE FROM transactions WHERE name IN (SELECT name FROM transactions HAVING max(date) < CAST('2015-01-01 12:00:00.000' as DateTime) );
Sadly, the subquery finds only one result:
SELECT name FROM transactions HAVING max(date) < CAST('2015-01-01 12:00:00.000' as DateTime);
+------------+
| name |
+------------+
| david |
+------------+
I guess I am getting only one result because of the max() function. I am not an SQL expert, but I understand quite well what I need in terms of sets and logic. I would be really happy to have suggestions on how to rewrite my query.
EDIT: Here is a sqlfiddle with the schema and some data: http://sqlfiddle.com/#!2/389ede/2
I need to remove ALL the entries for alex, because his last transaction happened before a certain threshold (let's say 1 Jan 2013). I don't need to delete tom's transactions, because his latest one is later than 1 Jan 2013.
Your first query can be formulated as: delete the transactions of every user for whom no transaction exists before ? (the threshold). This is easy to transform to SQL:
delete from transactions t1
where not exists (
select 1 from transactions t2
where t1.name = t2.name
and t2.date < ?
)
MySQL still does not support (AFAIK) deleting from a table that is referenced in a select, so we need to rewrite it as:
delete t1.*
from transactions t1
left join transactions t2
on t1.name = t2.name
and t2.date < ?
where t2.name is null
date is a keyword in MySQL (though not a strictly reserved one), so quoting it with backticks is the safe choice.
Your second query can be solved the same way: delete from transactions where no transaction exists for that user after a certain date. I'll leave it as an exercise.
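For readers who want to check their answer to that exercise, here is a minimal sketch of the same NOT EXISTS idea. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only to keep the example runnable. Only the name and date columns are modelled, and tom's rows are made up for illustration. SQLite accepts a correlated subquery on the table being deleted from, so the MySQL-specific join rewrite is not needed in this sketch.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT NOT NULL, date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO transactions (name, date) VALUES (?, ?)",
    [
        ("alex", "2011-01-01 12:00:00"),
        ("alex", "2012-06-01 12:00:00"),
        ("tom", "2012-03-01 12:00:00"),  # hypothetical row
        ("tom", "2013-05-01 12:00:00"),  # hypothetical row
    ],
)

# ISO-formatted timestamps compare correctly as plain strings.
threshold = "2013-01-01 00:00:00"

# Delete every transaction of a user who has no transaction
# on or after the threshold (i.e. whose latest one is before it).
conn.execute(
    """
    DELETE FROM transactions
    WHERE NOT EXISTS (
        SELECT 1 FROM transactions t2
        WHERE t2.name = transactions.name
          AND t2.date >= ?
    )
    """,
    (threshold,),
)

remaining = conn.execute(
    "SELECT name, date FROM transactions ORDER BY name, date"
).fetchall()
print(remaining)
```

With this sample data, both of alex's rows are removed and both of tom's rows are kept, including tom's 2012 row, because the test is per user rather than per row.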
Alvin, here is a simplified scenario from your fiddle with dates:
CREATE TABLE transactions
( id int(11) NOT NULL AUTO_INCREMENT
, name varchar(30) NOT NULL
, value datetime NOT NULL
, PRIMARY KEY (id) ) ENGINE=InnoDB;
INSERT INTO transactions (name, value) VALUES ('alex', '2011-01-01 12:00:00')
, ('alex', '2012-06-01 12:00:00');
Let's investigate what happens in:
SELECT t1.name as t1_name, t1.value as t1_value
, t2.name as t2_name, t2.value as t2_value
FROM transactions t1
LEFT JOIN transactions t2
ON t1.name = t2.name
T1_NAME T1_VALUE T2_NAME T2_VALUE
alex January, 01 2011 12:00:00+0000 alex January, 01 2011 12:00:00+0000
alex January, 01 2011 12:00:00+0000 alex June, 01 2012 12:00:00+0000
alex June, 01 2012 12:00:00+0000 alex January, 01 2011 12:00:00+0000
alex June, 01 2012 12:00:00+0000 alex June, 01 2012 12:00:00+0000
I.e., 4 rows. If we now add the join predicate:
SELECT t1.name as t1_name, t1.value as t1_value
, t2.name as t2_name, t2.value as t2_value
FROM transactions t1
LEFT JOIN transactions t2
ON t1.name = t2.name
AND t2.value > CAST('2011-06-01 12:00:00.000' as DateTime)
This leaves us with two rows. If we change the time to '2012-06-01 12:00:00.000' we still have two rows due to the left join, but the t2 columns will be null.
If we now add the WHERE clause:
SELECT t1.name as t1_name, t1.value as t1_value
, t2.name as t2_name, t2.value as t2_value
FROM transactions t1
LEFT JOIN transactions t2
ON t1.name = t2.name
AND t2.value > CAST('2012-06-01 12:00:00.000' as DateTime)
WHERE t2.name is null;
we still have two rows. With CAST('2011-06-01 12:00:00.000' as DateTime) there are no rows.
Remember that the construction is equivalent to:
SELECT t1.name as t1_name, t1.value as t1_value
FROM transactions t1
WHERE NOT EXISTS (
SELECT 1 FROM transactions t2
WHERE t1.name = t2.name
AND t2.value > CAST('2012-06-01 12:00:00.000' as DateTime)
);
So, if no row exists for the name with value > '2012-06-01 12:00:00.000', we have a match. Does that clarify?
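The equivalence between the LEFT JOIN ... IS NULL form and the NOT EXISTS form can be checked on the two alex rows with a small script. This is only a sketch: it uses Python's built-in sqlite3 module instead of MySQL, stores the datetimes as ISO-formatted text, and the helper name rows is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"
    " value TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO transactions (name, value) VALUES (?, ?)",
    [("alex", "2011-01-01 12:00:00"), ("alex", "2012-06-01 12:00:00")],
)

# Anti-join: keep t1 rows for which the join found no later t2 row.
ANTI_JOIN = """
    SELECT t1.name, t1.value
    FROM transactions t1
    LEFT JOIN transactions t2
      ON t1.name = t2.name AND t2.value > ?
    WHERE t2.name IS NULL
    ORDER BY t1.value
"""

# The same condition written as a correlated NOT EXISTS.
NOT_EXISTS = """
    SELECT t1.name, t1.value
    FROM transactions t1
    WHERE NOT EXISTS (
        SELECT 1 FROM transactions t2
        WHERE t1.name = t2.name AND t2.value > ?
    )
    ORDER BY t1.value
"""


def rows(sql, threshold):
    """Run one of the two queries for a given threshold (hypothetical helper)."""
    return conn.execute(sql, (threshold,)).fetchall()


late = "2012-06-01 12:00:00"
early = "2011-06-01 12:00:00"
for threshold in (late, early):
    print(threshold, rows(ANTI_JOIN, threshold), rows(NOT_EXISTS, threshold))
```

For the later threshold both queries return both alex rows, and for the earlier threshold both return nothing, which matches the walkthrough above.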
@Lennart, Alvin, consider the following...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,val INT NOT NULL);
INSERT INTO my_table (val) VALUES (1),(1),(2),(1),(3),(2),(3),(1),(4);
SELECT * FROM my_table;
+----+-----+
| id | val |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 3 |
| 6 | 2 |
| 7 | 3 |
| 8 | 1 |
| 9 | 4 |
+----+-----+
Let's delete the most recent result for each val, i.e. the result of...
SELECT x.*
FROM my_table x
JOIN
( SELECT val, max(id) max_id FROM my_table GROUP BY val ) y
ON y.val = x.val
AND y.max_id = x.id;
+----+-----+
| id | val |
+----+-----+
| 8 | 1 |
| 6 | 2 |
| 7 | 3 |
| 9 | 4 |
+----+-----+
So...
DELETE x
FROM my_table x
JOIN ( SELECT val, max(id) max_id FROM my_table GROUP BY val ) y
ON y.val = x.val
AND y.max_id = x.id;
SELECT * FROM my_table;
+----+-----+
| id | val |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 3 |
+----+-----+
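Applied to the original question, the same idea (aggregate per group in a subquery, then delete the matching rows) might look like the sketch below. The important detail is GROUP BY name: with it, MAX(date) is computed once per user, whereas an aggregate without GROUP BY treats the whole table as a single group and therefore yields at most one row. The sketch runs on Python's built-in sqlite3 module rather than MySQL. SQLite has no multi-table DELETE ... JOIN, so it uses IN with the grouped subquery instead. The rows for tom and david are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT NOT NULL, date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO transactions (name, date) VALUES (?, ?)",
    [
        ("alex", "2011-01-01 12:00:00"),
        ("alex", "2012-06-01 12:00:00"),
        ("david", "2010-02-01 12:00:00"),  # hypothetical row
        ("tom", "2012-03-01 12:00:00"),    # hypothetical row
        ("tom", "2013-05-01 12:00:00"),    # hypothetical row
    ],
)

threshold = "2013-01-01 00:00:00"

# One row per user whose latest transaction is before the threshold.
STALE_USERS = """
    SELECT name FROM transactions
    GROUP BY name
    HAVING MAX(date) < ?
"""

stale = [
    row[0]
    for row in conn.execute(STALE_USERS + " ORDER BY name", (threshold,))
]
print(stale)

# Delete all transactions of those users.
conn.execute(
    f"DELETE FROM transactions WHERE name IN ({STALE_USERS})",
    (threshold,),
)

remaining = conn.execute(
    "SELECT name, date FROM transactions ORDER BY name, date"
).fetchall()
print(remaining)
```

In MySQL itself the grouped subquery would play the role of the derived table y in the DELETE ... JOIN shown above.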