简体   繁体   中英

SELECT DISTINCT + matching values from two columns = “unique”

I know the title is worded poorly, but I couldn't think of a better way to say it.

I'm learning Ruby and refreshing on MySQL. I'm using a historical list of completed flights as a practice data set, with about 100,000 rows to work with. Each flight record includes the origin and destination airports (fields 'origin' and 'dest'), and the total flight distance (field 'distance').

As an exercise I want to show the 10 longest routes sorted by distance descending. However, I want to consider each pair of endpoints as a single route, regardless of which is the origin and which is the destination. So for example JFK-LAX and LAX-JFK should be considered a single route.

When I run the query:

SELECT DISTINCT distance, origin, dest FROM flights ORDER BY distance DESC LIMIT 10;

of course I get this:

["2704", "BOS", "SFO"]
["2704", "SFO", "BOS"]
["2689", "BOS", "SJC"]
["2689", "SJC", "BOS"]
["2615", "LAX", "LIH"]
["2615", "LIH", "LAX"]
["2614", "HNL", "SAN"]
["2614", "SAN", "HNL"]
["2611", "BOS", "LAX"]
["2611", "LAX", "BOS"]

which is not what I want. I want to say, "Select the distance and endpoints of the 10 longest routes regardless of whether the airports are origins or destinations."

One thought I had was to sort each pair of endpoints alphabetically and join them together to create the unique route, eg, LAX and JFK = "JFKLAX". But I don't know how to do that and pass it to my original query, or even if that's the best way to go about this.

Can this be done purely in SQL / MySQL?

One simple way to approach this is to use GREATEST() and LEAST() return whichever value of those two columns sorts higher or lower according to the columns' collation. They then always return in the same position, and the DISTINCT will deduplicate them.

SELECT DISTINCT
  distance,
  LEAST(origin, dest) AS endpoint1,
  GREATEST(origin, dest) AS endpoint2
FROM flights f
ORDER BY distance DESC LIMIT 10

Here it is in action on sqlfiddle

For example, LEAST('BOS', 'SFO') will always return 'BOS' , while GREATEST('BOS', 'SFO') will always return 'SFO' . No matter the order, when the rows are juxtaposed the result is the same so the DISTINCT will apply correctly.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM