
SQL - Keep only the first and last record of each day

I have a table that stores simple log data:

CREATE TABLE chronicle (
    id INT auto_increment PRIMARY KEY, 
    data1 VARCHAR(256),
    data2 VARCHAR(256),
    time DATETIME
);

The table is approaching 1 million records, so I'd like to start consolidating data.

I want to be able to take the first and last record of each DISTINCT(data1, data2) each day and delete all the rest.

I know how to just pull in the data, process it in whatever language I want, and then delete the records with a huge IN (...) query, but it seems like a better alternative would be to use SQL directly (am I wrong?).

I have tried several queries, but I'm not very good with SQL beyond JOINs.

Here is what I have so far:

SELECT id, Max(time), Min(time)
FROM   (SELECT id, data1 ,data2, time, Cast(time AS DATE) AS day
        FROM chronicle) AS initial
GROUP BY day;

This gets me the first and last time for each day, but it's not separated out by the data (i.e. I get the last record of each day, not the last record for each distinct set of data for each day). Additionally, the id is just for the Min(time).

The information I've found on this particular problem is only for finding the last record of the day, not each last record for sets of data.

IMPORTANT: I want the first/last record for each DISTINCT(data1, data2) for each day, not just the first/last record for each day in the table. There will be more than 2 records for each day.

Solution: My solution, thanks to Jonathan Dahan and Gordon Linoff:

SELECT o.data1, o.data2, o.time FROM chronicle AS o JOIN (
    SELECT Min(id) as id FROM chronicle GROUP BY DATE(time), data1, data2
    UNION SELECT Max(id) as id FROM chronicle GROUP BY DATE(time), data1, data2
) AS n ON o.id = n.id;

From here it's a simple matter of referencing the same table to delete rows.
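As a rough sketch of that delete step, the snippet below runs the same MIN(id)/MAX(id) idea as a single DELETE. It is only an illustration: it uses Python's sqlite3 module with an in-memory SQLite database standing in for the real server, the sample rows are made up, and the alias `keep` is a name chosen here. MySQL generally rejects a subquery that reads the table being deleted from unless it is wrapped in a derived table, which is why the extra SELECT layer is there.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chronicle ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " data1 TEXT, data2 TEXT, time TEXT)"
)
# Made-up sample rows; ids are assigned 1..7 in insertion order.
rows = [
    ("a", "x", "2015-01-01 08:00:00"),
    ("a", "x", "2015-01-01 12:00:00"),
    ("a", "x", "2015-01-01 18:00:00"),
    ("b", "y", "2015-01-01 09:00:00"),
    ("b", "y", "2015-01-01 10:00:00"),
    ("b", "y", "2015-01-01 11:00:00"),
    ("a", "x", "2015-01-02 07:00:00"),
]
conn.executemany(
    "INSERT INTO chronicle (data1, data2, time) VALUES (?, ?, ?)", rows
)

# Keep the lowest and highest id per (day, data1, data2); delete the rest.
# The derived table "keep" is the layer MySQL needs; SQLite accepts it too.
conn.execute(
    """
    DELETE FROM chronicle
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT MIN(id) AS id FROM chronicle
            GROUP BY DATE(time), data1, data2
            UNION
            SELECT MAX(id) AS id FROM chronicle
            GROUP BY DATE(time), data1, data2
        ) AS keep
    )
    """
)
conn.commit()

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM chronicle ORDER BY id")]
print(remaining_ids)  # [1, 3, 4, 6, 7]
```

Note that using MIN(id) and MAX(id) as "first" and "last" assumes ids were assigned in the same order as time.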

You have the right idea. You just need to join back to get the original information.

SELECT c.*
FROM chronicle c JOIN
     (SELECT date(time) as day, min(time) as mint, max(time) as maxt
      FROM chronicle
      GROUP BY date(time)
     ) cc
     ON c.time IN (cc.mint, cc.maxt);

Note that the join condition doesn't need to include day explicitly because it is part of the time. Of course, you could add date(c.time) = cc.day if you wanted to.
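The query above groups by day only, while the question asks for the first and last row per (data1, data2) per day. One possible variant, sketched here under assumptions, adds data1 and data2 to both the GROUP BY and the join condition. It runs through Python's sqlite3 module against an in-memory SQLite database as a stand-in, with made-up rows chosen so that the two data sets interleave within one day.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chronicle ("
    " id INTEGER PRIMARY KEY, data1 TEXT, data2 TEXT, time TEXT)"
)
# Made-up rows: two data sets interleaved on the same day (ids 1..6).
conn.executemany(
    "INSERT INTO chronicle (data1, data2, time) VALUES (?, ?, ?)",
    [
        ("a", "x", "2015-01-01 08:00:00"),
        ("b", "y", "2015-01-01 09:00:00"),
        ("a", "x", "2015-01-01 12:00:00"),
        ("b", "y", "2015-01-01 13:00:00"),
        ("a", "x", "2015-01-01 18:00:00"),
        ("b", "y", "2015-01-01 17:00:00"),
    ],
)

# Group by day AND by (data1, data2), then join back on the same columns
# so each row is compared only with the min/max time of its own group.
query = """
SELECT c.id, c.data1, c.data2, c.time
FROM chronicle c
JOIN (SELECT DATE(time) AS day, data1, data2,
             MIN(time) AS mint, MAX(time) AS maxt
      FROM chronicle
      GROUP BY DATE(time), data1, data2) cc
  ON c.data1 = cc.data1
 AND c.data2 = cc.data2
 AND c.time IN (cc.mint, cc.maxt)
ORDER BY c.id
"""
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)  # [1, 2, 5, 6]
```

Rows where data1 or data2 is NULL would not match the equality conditions, so they would need separate handling.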

Instead of deleting rows in your original table, I would suggest that you make a new table. Something like this:

create table ChronicleByDay like chronicle;

insert into ChronicleByDay
    SELECT c.*
    FROM chronicle c JOIN
         (SELECT date(time) as day, min(time) as mint, max(time) as maxt
          FROM chronicle
          GROUP BY date(time)
         ) cc
         ON c.time IN (cc.mint, cc.maxt);

That way, you can have the more detailed information if you ever need it.

This will improve performance when searching on dates.

ALTER TABLE chronicle
ADD INDEX `ix_chronicle_time` (`time` ASC);
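An index on time typically helps most when the filter compares the column directly against a range; wrapping the column in a function such as DATE(time) usually prevents a plain index range scan. The sketch below checks this in SQLite (used here only as a stand-in, via Python's sqlite3 module) by asking the planner how it would run a one-day range query. The exact plan text differs between SQLite versions, and MySQL's EXPLAIN output looks different again.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chronicle ("
    " id INTEGER PRIMARY KEY, data1 TEXT, data2 TEXT, time TEXT)"
)
conn.execute("CREATE INDEX ix_chronicle_time ON chronicle (time)")

# Ask the planner how it would execute a half-open, one-day range filter.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, time FROM chronicle WHERE time >= ? AND time < ?",
    ("2015-01-01 00:00:00", "2015-01-02 00:00:00"),
).fetchall()

# The human-readable detail is the last column of each plan row.
plan = " ".join(str(row[-1]) for row in plan_rows)
print(plan)
```

The printed plan should mention a search using ix_chronicle_time rather than a full table scan.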

This will delete the records:

-- Ids of the rows to keep: first and last per (day, data1, data2).
CREATE TEMPORARY TABLE tmp_ids (
  `id` INT NOT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO tmp_ids (id)
SELECT
    MIN(id)
FROM
    chronicle
GROUP BY
    CAST(time AS DATE),
    data1,
    data2
UNION
SELECT
    MAX(id)
FROM
    chronicle
GROUP BY
    CAST(time AS DATE),
    data1,
    data2;

-- Delete everything that is not in the keep list.
DELETE FROM
    chronicle
WHERE
    id NOT IN (SELECT id FROM tmp_ids)
    AND time <= '2015-01-01'; -- if you want to consider all dates, then remove this condition
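One detail about the cutoff: when a DATETIME column is compared with a bare date such as '2015-01-01', the date is treated as midnight, so rows later on that day fall outside the condition. A half-open bound (strictly before the first instant you want to leave alone) makes that explicit. The following is a small sketch of the staged delete with such a bound, run in SQLite through Python's sqlite3 module as a stand-in, with made-up rows and an assumed cutoff value.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chronicle ("
    " id INTEGER PRIMARY KEY, data1 TEXT, data2 TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO chronicle (data1, data2, time) VALUES (?, ?, ?)",
    [
        ("a", "x", "2014-12-31 08:00:00"),  # id 1
        ("a", "x", "2014-12-31 12:00:00"),  # id 2
        ("a", "x", "2014-12-31 18:00:00"),  # id 3
        ("a", "x", "2015-01-01 08:00:00"),  # id 4
        ("a", "x", "2015-01-01 12:00:00"),  # id 5
        ("a", "x", "2015-01-01 18:00:00"),  # id 6
    ],
)

# Stage the ids to keep: first and last per (day, data1, data2).
conn.execute("CREATE TEMP TABLE tmp_ids (id INTEGER PRIMARY KEY)")
conn.execute(
    """
    INSERT INTO tmp_ids (id)
    SELECT MIN(id) FROM chronicle GROUP BY DATE(time), data1, data2
    UNION
    SELECT MAX(id) FROM chronicle GROUP BY DATE(time), data1, data2
    """
)

# Half-open cutoff (assumed value): only rows strictly before it are eligible.
cutoff = "2015-01-01 00:00:00"
conn.execute(
    "DELETE FROM chronicle "
    "WHERE id NOT IN (SELECT id FROM tmp_ids) AND time < ?",
    (cutoff,),
)
conn.commit()

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM chronicle ORDER BY id")]
print(remaining_ids)  # [1, 3, 4, 5, 6]
```

Only the middle row of 2014-12-31 is removed; all rows on 2015-01-01 survive because they are not before the cutoff.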
