简体   繁体   中英

How do I count occurrences with conditions in PostgreSQL?

I'm working in PostgreSQL. Suppose I have this Person table:

| id |          time         | name  |    type     |
----------------------------------------------------
|  1 | 2022-04-25 07:49:58.0 | Brian | Rejection 1 |
|  2 | 2022-04-25 07:49:58.0 | Brian | Rejection 2 |
|  3 | 2022-04-27 13:05:51.0 | Fredd | Rejection 1 |
|  4 | 2022-05-01 02:13:44.0 | Janet | Rejection 1 |
|  5 | 2022-05-01 03:45:06.0 | Janet | Rejection 2 |
|  6 | 2022-05-01 08:01:34.0 | Peter |   Approval  |
|  7 | 2022-05-01 12:12:53.0 | Frank | Rejection 2 |
|  8 | 2022-05-02 01:26:38.0 | Frank |   Approval  |

Note: We have 2 rejections types Rejection 1 and Rejection 2 .

I would like to make a query that counts the number of Rejections and the number of approvals for each name. However if there are 2 rejections at the same time, for the same name, like the two first rows in the example, it should only count as one.

Let me just add that it's possible for there to be one of each rejection types at the same time for the same name, but it's impossible for there to be two rejections of the same type at the same time for the same name.

So this is what I'm expecting it to return:

| name  | approvals | rejections |
----------------------------------
| Brian |     0     |     1      |
| Fredd |     0     |     1      |
| Janet |     0     |     2      |
| Peter |     1     |     0      |
| Frank |     1     |     1      |

The closest I could get to this is the following:

SELECT 
    name,
    COALESCE(SUM(CASE WHEN log_type = 'Approval' THEN 1 ELSE 0 END), 0) approvals,
    COALESCE(SUM(CASE WHEN log_type = 'Rejection 1' OR log_type = 'Rejection 2' THEN 1 ELSE 0 END), 0) rejections
FROM 
    person
GROUP BY 
    name

The problem with this is that it counts two rejections with same time and name as 2 instead of 1.

Use ROW_NUMBER to remove duplicates, then use a simple count query to find the counts:

SELECT
    name,
    COUNT(*) FILTER (WHERE log_type = 'Approval') approvals,
    COUNT(*) FILTER (WHERE log_type LIKE 'Rejection%') rejections
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY time, name, SUBSTRING(log_type FROM '\w+')) rn
    FROM person
) t
WHERE rn = 1
GROUP BY name;

You can use DISTINCT inside COUNT() to count the distinct time s if log_type is the 'Rejection X' :

SELECT name,
       COUNT(CASE WHEN log_type = 'Approval' THEN 1 END) approvals,
       COUNT(DISTINCT CASE WHEN log_type IN ('Rejection 1', 'Rejection 2') THEN time END) rejections
FROM person
GROUP BY name;

See the demo .

We can fetch the date in the CASE and then use DISTINCT COUNT , which ignores nulls.
I have given a first query to show the intermediate results and the counts with and without DISTINCT to show what it is doing. I have used the test LEFT(log_type,6) = 'Reject' to group the 2 rejection types.
I suggest that it would be a good idea to round the time so that 2 rejections close together will be treated as repetitions. We the current queries event 1 second different will be treated as a different rejection.

 create table person( id int, time date, name varchar(20), log_type varchar(20)); insert into person values ( 1,'2022-04-25 07:49:58.0','Brian','Rejection 1'), ( 2,'2022-04-25 07:49:58.0','Brian','Rejection 2'), ( 3,'2022-04-27 13:05:51.0','Fredd','Rejection 1'), ( 4,'2022-05-01 02:13:44.0','J.net','Rejection 1'), ( 5,'2022-05-01 03:45:06.0','J.net','Rejection 2'), ( 6,'2022-05-01 08:01:34.0','Peter','Approval'), ( 7,'2022-05-01 12:12:53.0','Frank','Rejection 2'), ( 8,'2022-05-02 01:26:38.0','Frank','Approval');
  

8 rows affected

SELECT name, CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END R, CASE WHEN log_type = 'Approval' THEN time END A FROM person;
 name | r |  a:---- |:--------- |:--------- Brian |  2022-04-25 |  null Brian |  2022-04-25 |  null Fredd |  2022-04-27 |  null J.net |  2022-05-01 |  null J.net |  2022-05-01 |  null Peter |  null |  2022-05-01 Frank |  2022-05-01 |  null Frank |  null |  2022-05-02 
 SELECT name, COUNT(CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END) all_rejections, COUNT(CASE WHEN log_type = 'Approval' THEN time END) all_approvals, COUNT(DISTINCT CASE WHEN LEFT(log_type,6) = 'Reject' THEN time END) distinct_rejections, COUNT(DISTINCT CASE WHEN log_type = 'Approval' THEN time END) distinct_approvals FROM person GROUP BY name;
 name | all_rejections | all_approvals | distinct_rejections |  distinct_approvals:---- |  -------------: |  ------------: |  ------------------: |  -----------------: Brian |  2 |  0 |  1 |  0 Frank |  1 |  1 |  1 |  1 Fredd |  1 |  0 |  1 |  0 J.net |  2 |  0 |  1 |  0 Peter |  0 |  1 |  0 |  1 

db<>fiddle here

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM