简体   繁体   中英

SQL: Logic problems in psql

I am trying to develop a query to identify clients with multiple client IDs. Client ID is column 1, columns 19 and 20 hold unique person identifiers, you can think of them as some kind of Social Security number(lets call them SSN.19 and SSN.20)

My first thought was to look for every row with matching SSNs but different client ID, like so:

SELECT  
    a."5", a."3"||' '||a."4" as "3+4", a."19", a."20", a."21", a."1", 
    b."1", a."8"
FROM
    "clients_1" AS a,
    "clients_1" AS b 
WHERE a."19"=b."19" and a."20"=b."20" and a."1"<b."1" and a."1"='Value';

However, it returned 0 rows. To check if the table had indeed no duplicates I performed the following queries:

select distinct "19" as hk, count("19") as dl from "clients_1" group by "19" order by dl desc;

select distinct "20" as hk, count("20") as dl from "clients_1" group by "20" order by dl desc;

It turned out that on this specific table no client had an SSN19 associated with them, but there were several repeated SSN20 in the table. So I performed the following query to find clients with multiple IDs:

SELECT  
    a."5", a."3"||' '||a."4" as "3+4", a."20", a."21", a."1", 
    b."1", a."8"
FROM
    "clients_1" AS a,
    "clients_1" AS b 
WHERE a."20"=b."20" and a."1"<b."1" and a."7"='Value';

This one returned a table with several clients that had different IDs but the same SSN20. Afterwards I began to think of a way I could generalize this query for cases where clients had both SSN19 and SSN20 or just one of them, so I thought of the following:

SELECT                
    a."5", a."3"||' '||a."4" as "3+4", a."19", a."20", a."21", a."1", 
    b."1", a."8"
FROM
    "clients_1" AS a,
    "clients_1" AS b 
WHERE ((a."19"=b."19" and a."19" is not null) or (a."20"=b."20" and a."20" is not null)) and a."1"<b."1" and a."7"='Value';

However, this query takes forever, I had the query run for around 20min and nothing came back, whereas the previous attempt took around 2min at the most. What am I doing wrong?

I believe something like this will be better performing and give you more flexibility:

SELECT
    *
FROM
    (
        SELECT
            COUNT(*) OVER (PARTITION BY "19") as 19_matches,
            COUNT(*) OVER (PARTITION BY "20") as 20_matches,
            COUNT(*) OVER (PARTITION BY "19","20") as both_matches,
            clients_1.*
        FROM 
            clients_1
        WHERE "7" = 'value'
    )
WHERE 19_matches > 1 OR 20_matches > 1 or both_matches > 1
ORDER BY "19","20"

Except for the ugly column names, this is just the WHERE EXISTS(a similar record) solution:


SELECT *
FROM clients_1 AS a
WHERE EXISTS(
SELECT* FROM clients_1 AS b 
        WHERE (a."19" = b."19" OR a."20" = b."20" )
        AND a."1" <> b."1" 
        );

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM