简体   繁体   中英

How to group by one column and limit to rows where another column has the same value for all rows in group?

I have a table like this

CREATE TABLE userinteractions
(
userid  bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);

My problem is that some of the data is polluted with multiple dobyr values for the same user.

The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.

I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions .

For example I start with data like this:

userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989

And I want to select from this to get a table like this:

userid,dobyr
1,1995
2,1999
4,1989

Is there an elegant, efficient way to get this in a single sql query?

I am using postgres.

EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.

Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:

  1. If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn't matter which one), rest gets discarded.
  2. All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as 
select distinct on (userid,dobyr) * 
from userinteractions
where userid in ( 
    select   userid
    from     userinteractions
    group by userid
    having   count(distinct dobyr)=1 )
order by userid,dobyr;

This could also be done with an not in , not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by . Updated demo with tests and more rows.

If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userid 's from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:

create table userinteractions_whitelist as
select   userid
from     userinteractions
group by userid
having   count(distinct dobyr)=1

Just use a HAVING clause to assert that all rows in a group must have the same dobyr.

SELECT
  userid,
  MAX(dobyr) AS dobyr
FROM
  userinteractions
GROUP BY
  userid
HAVING
  COUNT(DISTINCT dobyr) = 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM