
SQL Query - Distinct on One Column for Distinct Value of Other (with INNER JOIN)

I appreciate that similar questions have been asked here before, but so far I have been unable to apply the answers to my code, for two reasons: I want to remove duplicates in one column only while the other stays the same, and my query contains an INNER JOIN. The INNER JOIN is a problem because most of the answers use PARTITION BY and, being a novice with SQL, I do not know how to combine the two. Advice just on using INNER JOIN with PARTITION BY would be useful.

Whilst I could do this post-export in Python (where I will be using the desired output), this code currently outputs ~2 million rows, making it time-consuming to work with and check. Here is the code:

SELECT client_ip_address, language_enum_code
FROM vw_user_session_log AS usl
INNER JOIN vw_user_topic_ownership AS uto
ON usl.user_id = uto.user_id

Using SELECT DISTINCT instead of SELECT gets me closer to the desired output, but rather than leaving one of the duplicate rows behind it removes all of them. Advice on using SELECT DISTINCT while preserving one of the duplicate rows would be preferred. I am on a read-only connection to the database, so the DELETE FROM approach seen here would only be viable if I could build a temporary, queryable table from the query output, which I don't think is possible and which seems clumsy anyway.

Raw data sample:

user_id:    client_ip_address:   language_enum_code:          (other stuff...)
    4          194:4:62:18              107
    2          101:9:23:34              14
    3          180:4:87:99              15
    3          194:4:62:18              15
    4          166:1:19:27              107
    2          166:1:19:27              14

Desired result:

user_id:    client_ip_address:   language_enum_code:          (other stuff...)
    4          194:4:62:18              107
    2          101:9:23:34              14
    3          180:4:87:99              15

As you can see, each user_id / language_enum_code combination should occur only once. I am not deduplicating on the client_ip_address / language_enum_code combination because multiple users can connect through the same IP address.

Do you simply want aggregation?

SELECT client_ip_address, GROUP_CONCAT(DISTINCT language_enum_code)
FROM vw_user_session_log usl INNER JOIN
     vw_user_topic_ownership uto
     ON usl.user_id = uto.user_id
GROUP BY client_ip_address;

This will return one row per client_ip_address, with the distinct language codes in a comma-delimited list.

You can also use MIN() or MAX() to get an arbitrary value of language_enum_code for each client_ip_address.
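To see what the aggregation above returns on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The two tables are hypothetical stand-ins for the views: the question does not say which view holds which column, so placing client_ip_address in vw_user_session_log and language_enum_code in vw_user_topic_ownership is an assumption.

```python
import sqlite3

# In-memory stand-ins for the two views (column placement is assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vw_user_session_log (user_id INTEGER, client_ip_address TEXT);
    CREATE TABLE vw_user_topic_ownership (user_id INTEGER, language_enum_code INTEGER);
    INSERT INTO vw_user_session_log VALUES
        (4, '194:4:62:18'), (2, '101:9:23:34'), (3, '180:4:87:99'),
        (3, '194:4:62:18'), (4, '166:1:19:27'), (2, '166:1:19:27');
    INSERT INTO vw_user_topic_ownership VALUES (4, 107), (2, 14), (3, 15);
""")

# One row per IP address: all distinct codes as a list, plus MIN() as a
# single arbitrary representative.
per_ip = conn.execute("""
    SELECT usl.client_ip_address,
           GROUP_CONCAT(DISTINCT uto.language_enum_code) AS codes,
           MIN(uto.language_enum_code) AS lowest_code
    FROM vw_user_session_log AS usl
    INNER JOIN vw_user_topic_ownership AS uto
        ON usl.user_id = uto.user_id
    GROUP BY usl.client_ip_address
""").fetchall()

for row in per_ip:
    print(row)
```

Note that this groups by IP address, so an address shared by two users yields a single row listing both users' codes.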

If you don't care which IP address you keep for each user_id / enum combo, then something like this should do:

SELECT usl.user_id, MIN(client_ip_address), language_enum_code
FROM vw_user_session_log AS usl
INNER JOIN vw_user_topic_ownership AS uto
ON usl.user_id = uto.user_id
WHERE client_ip_address IS NOT NULL
GROUP BY usl.user_id, language_enum_code
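Since the question asks specifically how to combine INNER JOIN with PARTITION BY, here is a sketch of the window-function version, run through Python's sqlite3 module on the sample data. It assumes a database with window-function support (SQLite 3.25 or later here), and the tables are hypothetical stand-ins for the views, with the column placement assumed. The join goes in a subquery, ROW_NUMBER() numbers the rows within each user_id / language_enum_code group, and the outer query keeps row 1 of each group.

```python
import sqlite3

# In-memory stand-ins for the two views (column placement is assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vw_user_session_log (user_id INTEGER, client_ip_address TEXT);
    CREATE TABLE vw_user_topic_ownership (user_id INTEGER, language_enum_code INTEGER);
    INSERT INTO vw_user_session_log VALUES
        (4, '194:4:62:18'), (2, '101:9:23:34'), (3, '180:4:87:99'),
        (3, '194:4:62:18'), (4, '166:1:19:27'), (2, '166:1:19:27');
    INSERT INTO vw_user_topic_ownership VALUES (4, 107), (2, 14), (3, 15);
""")

dedup_sql = """
    SELECT user_id, client_ip_address, language_enum_code
    FROM (
        SELECT usl.user_id,
               usl.client_ip_address,
               uto.language_enum_code,
               -- Number the rows inside each (user_id, code) group.
               ROW_NUMBER() OVER (
                   PARTITION BY usl.user_id, uto.language_enum_code
                   ORDER BY usl.client_ip_address
               ) AS rn
        FROM vw_user_session_log AS usl
        INNER JOIN vw_user_topic_ownership AS uto
            ON usl.user_id = uto.user_id
        WHERE usl.client_ip_address IS NOT NULL
    ) AS ranked
    WHERE rn = 1          -- keep exactly one row per group
    ORDER BY user_id
"""

rows = conn.execute(dedup_sql).fetchall()
for row in rows:
    print(row)
```

The ORDER BY inside OVER decides which row of each group survives. Ordering by client_ip_address keeps the same row as MIN() would; if the session log has a timestamp column (not shown in the question), ordering by that instead would keep the earliest or latest session. Unlike the GROUP BY version, this form lets you carry along any other columns from the surviving row.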
