
Select only unique records from multiple columns

I have a table that logs downloads by IP, version and platform. Looking at the table manually, I see a lot of duplicates where all three of those values are the same (the user is probably just impatient). I'd like a SELECT statement that filters out the duplicates and returns only one of the entries when all three values match.

Even better, if possible: I also have a date/time field that defaults to CURRENT_TIMESTAMP. It would be nice to keep duplicates that fall on different days while still collapsing those that differ only by time of day, so I can see whether the same user downloads again on another day.

I'm mainly trying to get statistics on how many unique people download each version each day. The structure of the DB table is simple:

Columns: key (AUTO_INCREMENT), date (DEFAULT CURRENT_TIMESTAMP), ip, user_agent, platform, version

The software has Windows and Mac versions (platform), and I offer both the current version and a few distinct past versions from before major changes.
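The statistic the question is ultimately after (unique downloaders per version per day) does not require removing duplicate rows first: `COUNT(DISTINCT ip)` grouped by day and version counts each IP once per group. The sketch below runs that query on an in-memory SQLite database from Python so it can be tried without a MySQL server. The table name `downloads`, the column name `download_date` (for the question's `date` column), the `id` column (standing in for `key`) and the sample rows are all assumptions. It treats one IP as one person, which undercounts users behind a shared address and overcounts users whose address changes.

```python
import sqlite3

# Hypothetical sample rows: (download_date, ip, platform, version).
ROWS = [
    ("2024-03-01 09:00:00", "10.0.0.1", "Windows", "2.0"),
    ("2024-03-01 09:00:05", "10.0.0.1", "Windows", "2.0"),  # impatient repeat
    ("2024-03-01 11:30:00", "10.0.0.2", "Mac", "2.0"),
    ("2024-03-01 12:00:00", "10.0.0.3", "Windows", "1.5"),
    ("2024-03-02 08:15:00", "10.0.0.1", "Windows", "2.0"),  # same user, next day
]


def make_db(rows=ROWS):
    """Build an in-memory copy of the (assumed) downloads table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE downloads (
               id INTEGER PRIMARY KEY,  -- stands in for the AUTO_INCREMENT `key`
               download_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               ip TEXT, user_agent TEXT, platform TEXT, version TEXT)"""
    )
    conn.executemany(
        "INSERT INTO downloads (download_date, ip, platform, version) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def unique_downloaders_per_day(conn):
    """One row per (day, version); each IP is counted at most once per group."""
    return conn.execute(
        """SELECT DATE(download_date) AS day, version, COUNT(DISTINCT ip) AS unique_ips
           FROM downloads
           GROUP BY DATE(download_date), version
           ORDER BY day, version"""
    ).fetchall()


for row in unique_downloaders_per_day(make_db()):
    print(row)
```

`DATE()` and `COUNT(DISTINCT ...)` also exist in MySQL, so the query itself should carry over. Adding `platform` to the GROUP BY would split each count into Windows and Mac.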

Just group by the fields that define a duplicate (here including the day), for example:

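-- download_date = the question's `date` column (assumed name)
SELECT ip, platform, version, DATE(download_date) AS download_day,
       COUNT(*) AS number_of_tries, MAX(download_date) AS last_download_date
FROM downloads
GROUP BY ip, platform, version, DATE(download_date)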

From there it is relatively easy to do more advanced filtering on the grouped result, by day and so on.
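As one example of filtering the grouped result further, the sketch below keeps the day in the select list and uses HAVING to return only the groups where the same IP fetched the same platform and version more than once on one day, i.e. the impatient repeats. It runs on an in-memory SQLite database from Python; the table name `downloads`, the column name `download_date` and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (download_date TEXT, ip TEXT, platform TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO downloads VALUES (?, ?, ?, ?)",
    [
        ("2024-03-01 09:00:00", "10.0.0.1", "Windows", "2.0"),
        ("2024-03-01 09:00:05", "10.0.0.1", "Windows", "2.0"),  # same-day retry
        ("2024-03-01 11:30:00", "10.0.0.2", "Mac", "2.0"),
        ("2024-03-02 08:15:00", "10.0.0.1", "Windows", "2.0"),  # next day: separate group
    ],
)

# The answer's grouping; HAVING then keeps only groups with more than one try.
repeats = conn.execute(
    """SELECT ip, platform, version, DATE(download_date) AS download_day,
              COUNT(*) AS number_of_tries, MAX(download_date) AS last_download_date
       FROM downloads
       GROUP BY ip, platform, version, DATE(download_date)
       HAVING COUNT(*) > 1"""
).fetchall()
print(repeats)
```

Dropping the HAVING clause gives one row per IP, platform, version and day, which is the de-duplicated view the question asks for.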

Is this what you want? It returns the first record on each date for each ip / platform / version combination:

select t.*
from <tablename> t
where t.datetime = (select min(t2.datetime)
                    from <tablename> t2
                    where t2.ip = t.ip and
                          t2.platform = t.platform and
                          t2.version = t.version and
                          date(t2.datetime) = date(t.datetime)
                   );
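One caveat with the MIN() comparison: if two rows in a group share the earliest timestamp (MySQL TIMESTAMP values have one-second resolution unless fractional seconds are configured), both match and both are returned. The sketch below reproduces that with Python's sqlite3, substituting `downloads` for `<tablename>` and `download_date` for `datetime`; the `id` column (standing in for the question's `key`) and the sample rows are hypothetical.

```python
import sqlite3

# In-memory stand-in for the downloads table; `id` plays the role of `key`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE downloads (id INTEGER PRIMARY KEY, download_date TEXT,"
    " ip TEXT, platform TEXT, version TEXT)"
)
conn.executemany(
    "INSERT INTO downloads (download_date, ip, platform, version) VALUES (?, ?, ?, ?)",
    [
        ("2024-03-01 09:00:00", "10.0.0.1", "Windows", "2.0"),  # id 1: first of the day
        ("2024-03-01 09:00:05", "10.0.0.1", "Windows", "2.0"),  # id 2: same-day repeat
        ("2024-03-02 08:15:00", "10.0.0.1", "Windows", "2.0"),  # id 3: next day
        ("2024-03-01 10:00:00", "10.0.0.2", "Mac", "2.0"),      # id 4: tie ...
        ("2024-03-01 10:00:00", "10.0.0.2", "Mac", "2.0"),      # id 5: ... same second
    ],
)

# The answer's correlated subquery: keep a row if its timestamp equals the
# earliest one for the same ip/platform/version on the same day.
first_ids = [
    row[0]
    for row in conn.execute(
        """SELECT t.id
           FROM downloads t
           WHERE t.download_date = (SELECT MIN(t2.download_date)
                                    FROM downloads t2
                                    WHERE t2.ip = t.ip
                                      AND t2.platform = t.platform
                                      AND t2.version = t.version
                                      AND DATE(t2.download_date) = DATE(t.download_date))
           ORDER BY t.id"""
    )
]
print(first_ids)  # [1, 3, 4, 5]: the repeat (id 2) is gone, but both tied rows survive
```

If exactly one row per group is needed, comparing on the AUTO_INCREMENT key instead (for example taking MIN(key) per group) is one way around it, assuming keys are assigned in download order.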

In MySQL 8.0+ you can use ROW_NUMBER():

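-- partition on all three duplicate-defining columns (version was missing) plus the day
select *
from (select t.*,
             row_number() over (partition by ip, platform, version, date(datetime)
                                order by datetime) rn
      from table_name t) a
where a.rn = 1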
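The sketch below runs the ROW_NUMBER() approach on an in-memory SQLite database from Python (SQLite has supported window functions since 3.25, so it needs a reasonably recent build). It compares the partition with and without `version`: without it, a same-day download of a different version by the same IP and platform is dropped. Ordering by `id` after the timestamp breaks ties deterministically. The table name `downloads`, the columns `download_date` and `id`, the helper `first_row_ids` and the rows are all hypothetical.

```python
import sqlite3

# Window functions need SQLite 3.25 or later.
assert sqlite3.sqlite_version_info >= (3, 25, 0), sqlite3.sqlite_version

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE downloads (id INTEGER PRIMARY KEY, download_date TEXT,"
    " ip TEXT, platform TEXT, version TEXT)"
)
conn.executemany(
    "INSERT INTO downloads (download_date, ip, platform, version) VALUES (?, ?, ?, ?)",
    [
        ("2024-03-01 09:00:00", "10.0.0.1", "Windows", "2.0"),  # id 1
        ("2024-03-01 09:00:05", "10.0.0.1", "Windows", "2.0"),  # id 2: same-day repeat
        ("2024-03-02 08:15:00", "10.0.0.1", "Windows", "2.0"),  # id 3: next day
        ("2024-03-01 10:00:00", "10.0.0.2", "Mac", "2.0"),      # id 4: tie ...
        ("2024-03-01 10:00:00", "10.0.0.2", "Mac", "2.0"),      # id 5: ... same second
        ("2024-03-01 09:30:00", "10.0.0.1", "Windows", "1.5"),  # id 6: other version
    ],
)


def first_row_ids(partition_by):
    """Ids of the rows numbered 1 in each partition (earliest, then lowest id)."""
    # partition_by is a fixed, trusted column list; never interpolate user input.
    sql = f"""SELECT id FROM (
                  SELECT id,
                         ROW_NUMBER() OVER (PARTITION BY {partition_by}
                                            ORDER BY download_date, id) AS rn
                  FROM downloads) AS a
              WHERE a.rn = 1
              ORDER BY id"""
    return [row[0] for row in conn.execute(sql)]


print(first_row_ids("ip, platform, version, DATE(download_date)"))  # [1, 3, 4, 6]
print(first_row_ids("ip, platform, DATE(download_date)"))           # [1, 3, 4]: id 6 lost
```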
