
SQL/Postgres - collapse every N rows into 1 based on row position in group

I have a set of ordered results from a Postgres table, where every group of 4 rows represents a set of related data. I want to process these results further so that every group of 4 rows is collapsed into 1 row with aliased column names, where the value for each column is based on that row's position in the group. I'm close, but I can't quite get the query right (nor am I confident that I'm approaching this in the optimal manner). Here's the scenario:

I am collecting survey results. Each survey has 4 questions, but each answer is stored in a separate row in the database. However, the answers are associated with each other by a submission event_id, and the results are guaranteed to be returned in a fixed order. A set of survey_results will look something like:

  event_id   |    answer
----------------------------
     a       |     10
     a       |     foo
     a       |     9
     a       |     bar
     b       |     2
     b       |     baz
     b       |     4
     b       |     zip

What I would like to do is query this result so that each set of 4 results comes out on its own line, with aliased column names:

event_id  |  score_1  |  reason_1  |  score_2  |  reason_2
----------------------------------------------------------
    a     |   10      |    foo     |     9     |    bar
    b     |   2       |    baz     |     4     |    zip

The closest that I've been able to get is:

SELECT survey_answers.event_id,
    (SELECT survey_answers.answer FROM survey_answers FETCH NEXT 1 ROWS ONLY) AS score_1,
    (SELECT survey_answers.answer FROM survey_answers OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_1,
    (SELECT survey_answers.answer FROM survey_answers OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS score_2,
    (SELECT survey_answers.answer FROM survey_answers OFFSET 3 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_2
FROM survey_answers
GROUP BY survey_answers.event_id

This returns the correct number of rows but, understandably, with the same values in every row (other than event_id):

event_id  |  score_1  |  reason_1  |  score_2  |  reason_2
----------------------------------------------------------
    a     |   10      |    foo     |     9     |    bar
    b     |   10      |    foo     |     9     |    bar

How can I structure my query so that it applies the OFFSET / FETCH behavior to every batch of 4 rows, or, maybe more accurately, within each distinct event_id?

demo: db<>fiddle

First of all, this looks like a very bad design:

  1. There is no guaranteed order! Without an ORDER BY, a database may return rows in any order, regardless of the order in which they were inserted. You really need an order column. In this small case it might work by accident.

  2. You should store two columns, one for the score and one for the reason. Mixing types in one column is not a good idea.
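The two points above could be combined into a table layout along the following lines. This is only a sketch: the table name survey_answers_v2 and the columns question_no, score and reason are hypothetical and do not appear in the question.

```sql
-- Hypothetical redesign: an explicit position column plus typed columns.
CREATE TABLE survey_answers_v2 (
    event_id    text    NOT NULL,
    question_no integer NOT NULL,   -- explicit order within one submission
    score       integer,            -- numeric part of the answer
    reason      text,               -- free-text part of the answer
    PRIMARY KEY (event_id, question_no)
);

INSERT INTO survey_answers_v2 (event_id, question_no, score, reason) VALUES
    ('a', 1, 10, 'foo'),
    ('a', 2,  9, 'bar'),
    ('b', 1,  2, 'baz'),
    ('b', 2,  4, 'zip');

-- One output row per event_id; FILTER restricts each aggregate
-- to the row with the matching question_no.
SELECT
    event_id,
    MAX(score)  FILTER (WHERE question_no = 1) AS score_1,
    MAX(reason) FILTER (WHERE question_no = 1) AS reason_1,
    MAX(score)  FILTER (WHERE question_no = 2) AS score_2,
    MAX(reason) FILTER (WHERE question_no = 2) AS reason_2
FROM survey_answers_v2
GROUP BY event_id
ORDER BY event_id;
```

With this layout the scores keep their integer type, and the position of each answer no longer depends on the order in which rows happen to be read.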

Nevertheless, for this simple and short example this could be a solution (remember that this is not recommended for production tables):

WITH data AS (
    SELECT 
        *,
        row_number() OVER (PARTITION BY event_id)    -- 1
    FROM 
        survey_results
)
SELECT
    event_id,
    MAX(CASE WHEN row_number = 1 THEN answer END) AS score_1,    -- 2
    MAX(CASE WHEN row_number = 2 THEN answer END) AS reason_1,
    MAX(CASE WHEN row_number = 3 THEN answer END) AS score_2,
    MAX(CASE WHEN row_number = 4 THEN answer END) AS reason_2
FROM
    data
GROUP BY event_id

  1. The row_number() window function numbers the rows within each event_id, in this case from 1 to 4. This number can be used to identify the type of each answer (see the intermediate step in the fiddle). In production code you should use an order column to guarantee the order. The window function would then look like PARTITION BY event_id ORDER BY order_column.
  2. This is a simple pivot on event_id and the type id (row_number): each CASE expression keeps only the answer at one position, and MAX picks that single non-NULL value per group.
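To see what step 1 produces before the pivot, the CTE body can be run on its own. This is a sketch against the survey_results table from the question. Each event_id group is numbered from 1 to 4, but because the window has no ORDER BY, which answer receives which number is not guaranteed.

```sql
-- Intermediate step: number the rows inside each event_id group.
-- Without ORDER BY in the window, the numbering may differ between runs.
SELECT
    event_id,
    answer,
    row_number() OVER (PARTITION BY event_id) AS row_number
FROM
    survey_results;
```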

You need a column that specifies the ordering. In your case, that should probably be a serial column, which is guaranteed to be increasing for each insert. I would call such a column survey_result_id.
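A minimal sketch of such a table follows. The question does not show the actual definition, so the text type for event_id and answer is an assumption.

```sql
-- Assumed definition; the real column types are not shown in the question.
CREATE TABLE survey_results (
    survey_result_id serial PRIMARY KEY,  -- filled from a sequence on insert
    event_id         text NOT NULL,
    answer           text NOT NULL
);

-- Answers are inserted in question order, so survey_result_id
-- is intended to record that order.
INSERT INTO survey_results (event_id, answer) VALUES
    ('a', '10'), ('a', 'foo'), ('a', '9'), ('a', 'bar'),
    ('b', '2'),  ('b', 'baz'), ('b', '4'), ('b', 'zip');
```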

With such a column, you can do:

select event_id,
       max(case when seqnum = 1 then answer end) as score_1,
       max(case when seqnum = 2 then answer end) as reason_1,
       max(case when seqnum = 3 then answer end) as score_2,
       max(case when seqnum = 4 then answer end) as reason_2
from (select sr.*,
             row_number() over (partition by event_id order by survey_result_id) as seqnum
      from survey_results sr
     ) sr
group by event_id;

Without such a column, you cannot reliably do what you want, because SQL tables represent unordered sets.
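As an alternative way to write the same pivot, the answers can be collected into one ordered array per event_id and then picked out by position. This is a sketch, not part of the original answer, and it still assumes the survey_result_id ordering column, which the question's table does not have.

```sql
-- Collect each event's answers in insertion order, then index into the array.
-- Postgres arrays are 1-based; a missing position yields NULL.
SELECT event_id,
       answers[1] AS score_1,
       answers[2] AS reason_1,
       answers[3] AS score_2,
       answers[4] AS reason_2
FROM (SELECT event_id,
             array_agg(answer ORDER BY survey_result_id) AS answers
      FROM survey_results
      GROUP BY event_id
     ) sr
ORDER BY event_id;
```

The ORDER BY inside array_agg plays the same role as the ORDER BY in the row_number() window: it is what makes the positions meaningful.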
