
SQL/Postgres - collapse every N rows into 1 based on row position in group

I have a set of ordered results from a Postgres table, where every group of 4 rows represents a set of related data. I want to process these results further so that every group of 4 rows is collapsed into 1 row with aliased column names, where the value of each column is determined by the corresponding row's position in the group. I'm close, but I can't quite get the query right (nor am I confident that I'm approaching this in the optimal manner). Here's the scenario:

I am collecting survey results. Each survey has 4 questions, but each answer is stored in a separate row in the database. The answers are associated with each other by a submission event_id, and the results are guaranteed to be returned in a fixed order. A set of survey_results will look something like:

  event_id   |    answer
----------------------------
     a       |     10
     a       |     foo
     a       |     9
     a       |     bar
     b       |     2
     b       |     baz
     b       |     4
     b       |     zip

What I would like to be able to do is query this result so that the final output comes out with each set of 4 results on their own line, with aliased column names.

event_id  |  score_1  |  reason_1  |  score_2  |  reason_2
----------------------------------------------------------
    a     |   10      |    foo     |     9     |    bar
    b     |   2       |    baz     |     4     |    zip

The closest that I've been able to get is

SELECT survey_answers.event_id,
    (SELECT survey_answers.answer FROM survey_answers FETCH NEXT 1 ROWS ONLY) AS score_1,
    (SELECT survey_answers.answer FROM survey_answers OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_1,
    (SELECT survey_answers.answer FROM survey_answers OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS score_2,
    (SELECT survey_answers.answer FROM survey_answers OFFSET 3 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_2
FROM survey_answers
GROUP BY survey_answers.event_id

Understandably, this returns the correct number of rows, but every row carries the same values (other than event_id):

event_id  |  score_1  |  reason_1  |  score_2  |  reason_2
----------------------------------------------------------
    a     |   10      |    foo     |     9     |    bar
    b     |   10      |    foo     |     9     |    bar

How can I structure my query so that it applies the OFFSET / FETCH behavior to every batch of 4 rows or, perhaps more accurately, within each distinct event_id?

demo: db<>fiddle

First of all, this looks like a very bad design:

  1. There is no guaranteed order! A table has no inherent row order, and without an ORDER BY the database may return rows in any order. You really need an order column. With this small data set it might work by accident.

  2. You should use two columns, one for the score and one for the reason. Mixing types in a single column is not a good idea.
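The two points above could be sketched as a table definition along these lines. This is only an illustration under assumptions: the column names survey_result_id, question_no, score and reason are hypothetical and not taken from the asker's actual schema.

```sql
-- Hypothetical schema sketch; every column name except event_id is an assumption.
CREATE TABLE survey_results (
    survey_result_id bigserial PRIMARY KEY,  -- explicit ordering column, assigned on insert
    event_id         text      NOT NULL,     -- groups the answers of one submission
    question_no      smallint  NOT NULL,     -- position of the question within the survey (1, 2, ...)
    score            integer   NOT NULL,     -- numeric answer, stored as a number
    reason           text,                   -- free-text answer, stored as text
    UNIQUE (event_id, question_no)           -- at most one row per question and submission
);

-- One row now holds a score together with its reason.
INSERT INTO survey_results (event_id, question_no, score, reason)
VALUES ('a', 1, 10, 'foo'),
       ('a', 2, 9,  'bar');
```

With a layout like this, the position of an answer no longer depends on the order in which rows happen to be returned.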

Nevertheless, for this short example the following could be a solution (remember, this is not recommended for production tables):

WITH data AS (
    SELECT 
        *,
        row_number() OVER (PARTITION BY event_id)    -- 1
    FROM 
        survey_results
)
SELECT
    event_id,
    MAX(CASE WHEN row_number = 1 THEN answer END) AS score_1,    -- 2
    MAX(CASE WHEN row_number = 2 THEN answer END) AS reason_1,
    MAX(CASE WHEN row_number = 3 THEN answer END) AS score_2,
    MAX(CASE WHEN row_number = 4 THEN answer END) AS reason_2
FROM
    data
GROUP BY event_id

  1. The row_number() window function numbers the rows within each event_id, in this case from 1 to 4. That number identifies the type of each answer (see the intermediate step in the fiddle). In production code you should use an order column to make the numbering deterministic. The window definition would then be PARTITION BY event_id ORDER BY order_column.
  2. This is a simple pivot on event_id and the type id (row_number), which does exactly what you expect: each CASE expression keeps the answer at one position and yields NULL otherwise, and MAX collapses the group to that single non-NULL value.
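To make the intermediate step concrete, here is a minimal, self-contained sketch of what the numbering looks like before the pivot. It assumes an explicit ordering column, here called order_column as in the note above. That column and the inline VALUES data are illustrative assumptions, not part of the asker's table.

```sql
-- Inline sample data standing in for the real table; order_column is an assumed column.
WITH survey_results (order_column, event_id, answer) AS (
    VALUES (1, 'a', '10'), (2, 'a', 'foo'), (3, 'a', '9'), (4, 'a', 'bar'),
           (5, 'b', '2'),  (6, 'b', 'baz'), (7, 'b', '4'), (8, 'b', 'zip')
)
SELECT event_id,
       answer,
       -- numbering restarts at 1 for every event_id and follows order_column
       row_number() OVER (PARTITION BY event_id ORDER BY order_column) AS rn
FROM survey_results
ORDER BY event_id, rn;

-- Result:
--  event_id | answer | rn
--  a        | 10     | 1
--  a        | foo    | 2
--  a        | 9      | 3
--  a        | bar    | 4
--  b        | 2      | 1
--  b        | baz    | 2
--  b        | 4      | 3
--  b        | zip    | 4
```

Each rn value then maps to one output column in the pivot (1 to score_1, 2 to reason_1, and so on).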

You need a column that specifies the ordering. In your case, that should probably be a serial column, which is guaranteed to be increasing for each insert. I would call such a column survey_result_id .

With such a column, you can do:

select event_id,
       max(case when seqnum = 1 then answer end) as score_1,
       max(case when seqnum = 2 then answer end) as reason_1,
       max(case when seqnum = 3 then answer end) as score_2,
       max(case when seqnum = 4 then answer end) as reason_2
from (select sr.*,
             row_number() over (partition by event_id order by survey_result_id) as seqnum
      from survey_results sr
     ) sr
group by event_id;

Without such a column, you cannot reliably do what you want, because SQL tables represent unordered sets.
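As a variation on the same idea, the conditional aggregation can also be written with the aggregate FILTER clause, which PostgreSQL supports from version 9.4. The sketch below is not the query from the answers above but an equivalent formulation. It assumes the survey_result_id ordering column suggested earlier and uses inline VALUES data so that it runs on its own.

```sql
-- Inline sample data; survey_result_id is the assumed ordering column.
WITH survey_results (survey_result_id, event_id, answer) AS (
    VALUES (1, 'a', '10'), (2, 'a', 'foo'), (3, 'a', '9'), (4, 'a', 'bar'),
           (5, 'b', '2'),  (6, 'b', 'baz'), (7, 'b', '4'), (8, 'b', 'zip')
),
numbered AS (
    SELECT event_id,
           answer,
           row_number() OVER (PARTITION BY event_id ORDER BY survey_result_id) AS seqnum
    FROM survey_results
)
SELECT event_id,
       -- FILTER restricts each aggregate to the row at one position in the group
       max(answer) FILTER (WHERE seqnum = 1) AS score_1,
       max(answer) FILTER (WHERE seqnum = 2) AS reason_1,
       max(answer) FILTER (WHERE seqnum = 3) AS score_2,
       max(answer) FILTER (WHERE seqnum = 4) AS reason_2
FROM numbered
GROUP BY event_id
ORDER BY event_id;
```

On the sample data this gives one row per event_id (a: 10, foo, 9, bar and b: 2, baz, 4, zip). The FILTER form is PostgreSQL-specific syntax sugar here; the CASE form is more portable across databases.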
