
Filter duplicates combining multiple columns

Example table

| name         | year | latitude | longitude |
|--------------|------|----------|-----------|
| Cleveland    | 1800 | 10       | 11        |
| Cleveland    | 1810 | 10       | 11        |
| Medina       | 1811 | 12       | 13        |
| Dayton       | 1812 | 14       | 15        |
| Sandusky     | 1105 | 50       | 50        |
| Mount Vernon | 1813 | 50       | 50        |

What I'm aiming to do

I want to select each unique combination of latitude and longitude, so I want to filter out any duplicate pairs. I also need to filter out any records whose year is less than 1500.

This is the subset I'm trying to achieve:

| name         | year | latitude | longitude |
|--------------|------|----------|-----------|
| Cleveland    | 1800 | 10       | 11        |
| Medina       | 1811 | 12       | 13        |
| Dayton       | 1812 | 14       | 15        |
| Mount Vernon | 1813 | 50       | 50        |

Each record's year is greater than 1500 and there aren't any duplicate lat,long pairs.

What I've tried

I've tried to find a way to use DISTINCT. Nothing I've found has worked.

I have also tried using GROUP BY:

SELECT *
FROM users
GROUP BY latitude, longitude
HAVING year > 1500;

The issue with the above query is that it eliminates both of the following records, which contain the lat,long pair of 50,50:

| name         | year | latitude | longitude |
|--------------|------|----------|-----------|
| Sandusky     | 1105 | 50       | 50        |
| Mount Vernon | 1813 | 50       | 50        |

The group is eliminated because Sandusky's year is less than 1500. I don't want Sandusky's record, but I do want Mount Vernon's.

I noticed that if the two records were switched like so:

| name         | year | latitude | longitude |
|--------------|------|----------|-----------|
| Mount Vernon | 1813 | 50       | 50        |
| Sandusky     | 1105 | 50       | 50        |

...then the group's year is set as 1813 and the group is not eliminated. I thought maybe sorting by year would fix it, but it didn't:

SELECT *
FROM users
GROUP BY latitude, longitude
HAVING year > 1500
ORDER BY year DESC;

Is what I'm attempting possible?

How about this?

SELECT `id`, `name`, MAX(users.year) as `year`, latitude, longitude
FROM users
WHERE year > 1500
GROUP BY latitude, longitude;

Results in:

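| id | name         | year | latitude | longitude |
|----|--------------|------|----------|-----------|
| 7  | Columbus     | 1978 | 7        | 8         |
| 1  | Cleveland    | 1800 | 10       | 11        |
| 3  | Medina       | 1811 | 12       | 13        |
| 4  | Dayton       | 1812 | 14       | 15        |
| 6  | Mount Vernon | 1813 | 50       | 50        |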

The only difference is where the filter goes. Because the WHERE clause comes before GROUP BY, the filtering is done before the grouping happens, whereas HAVING is applied to the groups afterwards. That is why you get the desired result.

The MAX(users.year) ensures that you always get the largest year in the set. If this doesn't matter to you, you can replace SELECT `id`, `name`, MAX(users.year) as `year`, latitude, longitude with SELECT *
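As a variation on the query above, here is a sketch that returns one complete row per latitude/longitude pair without selecting non-aggregated columns, which MySQL rejects when the ONLY_FULL_GROUP_BY SQL mode is enabled. It assumes MySQL 8.0 or later for ROW_NUMBER(). Keeping the row with the earliest qualifying year is an assumption, and the alias ranked and the column rn are illustrative names.

```sql
-- Sketch only: assumes MySQL 8.0+ and the users table from the question.
-- Keeps one complete row per (latitude, longitude) pair among rows with year > 1500.
SELECT name, year, latitude, longitude
FROM (
    SELECT name, year, latitude, longitude,
           ROW_NUMBER() OVER (
               PARTITION BY latitude, longitude
               ORDER BY year          -- assumption: the earliest qualifying year wins
           ) AS rn
    FROM users
    WHERE year > 1500                 -- filter rows before de-duplicating
) AS ranked
WHERE rn = 1                          -- first row of each coordinate pair
ORDER BY year;
```

Against the example table in the question this should return Cleveland (1800), Medina, Dayton and Mount Vernon. If two rows share both a pair and a year, which one is kept is arbitrary unless another column is added to the ORDER BY inside the window.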

Maybe I didn't understand the problem, but it would be this simple:

select * from users u where u.year > 1500;

I don't know what you want to do in case there is more than one row with the same pair of coordinates and a year greater than 1500.
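To see whether that case actually occurs in the data, a query along these lines lists every coordinate pair that still has more than one row after the year filter. It is only a sketch against the users table from the question; row_count is an illustrative alias.

```sql
-- Sketch only: lists coordinate pairs that appear more than once
-- among the rows with year > 1500.
SELECT latitude, longitude, COUNT(*) AS row_count
FROM users
WHERE year > 1500                 -- row-level filter, applied before grouping
GROUP BY latitude, longitude
HAVING COUNT(*) > 1;              -- group-level filter on an aggregate
```

For the example table in the question this should report the pair 10,11 with two rows (Cleveland in 1800 and 1810).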

How about this, unless I have misread the question. It assumes that you do not want to eliminate a row that has a different name but the same lat,long.

create table users
(   id int auto_increment primary key,
    name varchar(50) not null,
    year int not null,
    latitude int not null,
    longitude int not null
);
truncate table users;
insert users (name,year,latitude,longitude) values
('Cleveland',1810,10,11),
('Medina',1811,12,13),
('Dayton',1812,14,15),
('Mount Vernon',1813,50,50),
('Sandusky',1105,50,50);

SELECT distinct name,year,latitude,longitude 
FROM users 
where year > 1500 
ORDER BY year;
+--------------+------+----------+-----------+
| name         | year | latitude | longitude |
+--------------+------+----------+-----------+
| Cleveland    | 1810 |       10 |        11 |
| Medina       | 1811 |       12 |        13 |
| Dayton       | 1812 |       14 |        15 |
| Mount Vernon | 1813 |       50 |        50 |
+--------------+------+----------+-----------+
