
Postgres: Simplify SQL query to get rid of subselects

I have an events table that contains various creation, completion and failure events. Each event has an ID (the table's primary key), but also an "entity_id", which links multiple events together.

For example, when a request is created and then completed, we will have two events:

  • request #42 has been created
  • request #42 has been completed

In the above example, 42 is the entity_id of the request.

CREATE TABLE IF NOT EXISTS events (
    id SERIAL PRIMARY KEY,
    entity_id INTEGER NOT NULL,
    type VARCHAR(255) NOT NULL,
    occurred_at TIMESTAMP NOT NULL
);

INSERT INTO events (entity_id, type, occurred_at) VALUES
(1, 'created', '2019-08-08 11:20:04.791592+00'),
(1, 'completed', '2019-08-08 11:20:05.791592+00'),
(2, 'created', '2019-08-08 11:20:06.791592+00'),
(2, 'failed', '2019-08-08 11:20:07.791592+00'),
(3, 'created', '2019-08-08 11:20:08.791592+00'),
(3, 'completed', '2019-08-08 11:20:09.791592+00');

I want to create a view of that table, so that each entity_id is associated with creation and completion/failure time.

A query on that view should return the following result:

 entity_id |         created_at         |        completed_at        |         failed_at          
-----------+----------------------------+----------------------------+----------------------------
         1 | 2019-08-08 11:20:04.791592 | 2019-08-08 11:20:05.791592 | 
         2 | 2019-08-08 11:20:06.791592 |                            | 2019-08-08 11:20:07.791592
         3 | 2019-08-08 11:20:08.791592 | 2019-08-08 11:20:09.791592 |

I tried with a left join, but couldn't get a good result. So far, my best attempt is this:

SELECT
    e.entity_id,
    e.occurred_at as created_at,
    (SELECT occurred_at FROM events WHERE type = 'completed' AND entity_id = e.entity_id) AS completed_at,
    (SELECT occurred_at FROM events WHERE type = 'failed' AND entity_id = e.entity_id) AS failed_at
FROM events e
WHERE e.type = 'created';

That seems pretty inelegant to me, and probably inefficient as well.

Can you suggest a better alternative? I'm using Postgres, and I'm glad to use features that are Postgres-specific.

You are looking for a pivot query:

SELECT
    entity_id,
    MAX(CASE WHEN type = 'created'   THEN occurred_at END) AS created_at,
    MAX(CASE WHEN type = 'completed' THEN occurred_at END) AS completed_at,
    MAX(CASE WHEN type = 'failed'    THEN occurred_at END) AS failed_at
FROM events
GROUP BY
    entity_id
ORDER BY
    entity_id;
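
Since the question asks for a view, the pivot query above could be wrapped in one. This is only a sketch: the view name entity_timeline is a hypothetical choice, not something from the question, and the ORDER BY is left to the query that reads the view.

```sql
-- Hypothetical view name; pick whatever fits your schema.
CREATE OR REPLACE VIEW entity_timeline AS
SELECT
    entity_id,
    -- Each CASE yields occurred_at only for the matching event type (NULL otherwise),
    -- and MAX ignores NULLs, collapsing the rows of one entity into one row.
    MAX(CASE WHEN type = 'created'   THEN occurred_at END) AS created_at,
    MAX(CASE WHEN type = 'completed' THEN occurred_at END) AS completed_at,
    MAX(CASE WHEN type = 'failed'    THEN occurred_at END) AS failed_at
FROM events
GROUP BY entity_id;

-- Querying the view, sorted by entity:
SELECT * FROM entity_timeline ORDER BY entity_id;
```

If an entity can have several events of the same type, MAX keeps the latest one; MIN would keep the earliest instead.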


You can use window functions:

SELECT e.entity_id, e.created_at, e.completed_at, e.failed_at
FROM (SELECT e.entity_id,
             e.type,  -- needed by the outer WHERE clause
             e.occurred_at AS created_at,
             MAX(e.occurred_at) FILTER (WHERE e.type = 'completed') OVER (PARTITION BY e.entity_id) AS completed_at,
             MAX(e.occurred_at) FILTER (WHERE e.type = 'failed') OVER (PARTITION BY e.entity_id) AS failed_at
      FROM events e
     ) e
WHERE e.type = 'created';

But aggregation is probably more appropriate:

SELECT e.entity_id,
       MAX(e.occurred_at) FILTER (WHERE type = 'created') as created_at,
       MAX(e.occurred_at) FILTER (WHERE type = 'completed') AS completed_at,
       MAX(e.occurred_at) FILTER (WHERE type = 'failed') AS failed_at
FROM events e
GROUP BY e.entity_id;
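
For comparison, the left join approach mentioned in the question can also be made to work. The following is a sketch, assuming each entity has at most one event of each type (otherwise rows would be duplicated). The aliases c, d and f are arbitrary. The important detail is that the type conditions for the joined tables sit in the ON clauses; putting them in WHERE would discard entities that lack a completed or failed event.

```sql
SELECT c.entity_id,
       c.occurred_at AS created_at,
       d.occurred_at AS completed_at,
       f.occurred_at AS failed_at
FROM events c
-- Type filters go in ON so that a missing match yields NULL instead of dropping the row.
LEFT JOIN events d ON d.entity_id = c.entity_id AND d.type = 'completed'
LEFT JOIN events f ON f.entity_id = c.entity_id AND f.type = 'failed'
WHERE c.type = 'created'
ORDER BY c.entity_id;
```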

You could try using CASE and a (fake) aggregation to reduce the rows:

SELECT
    entity_id,
    MAX(CASE WHEN type = 'created'   THEN occurred_at END) AS created_at,
    MAX(CASE WHEN type = 'completed' THEN occurred_at END) AS completed_at,
    MAX(CASE WHEN type = 'failed'    THEN occurred_at END) AS failed_at
FROM events
GROUP BY entity_id;
