Combining multiple rows into one

I have a database structure in PostgreSQL that looks something like this:

DROP TABLE IF EXISTS  medium  CASCADE;
DROP TABLE IF EXISTS  works   CASCADE;
DROP DOMAIN IF EXISTS nameVal CASCADE;
DROP DOMAIN IF EXISTS numID   CASCADE;
DROP DOMAIN IF EXISTS alphaID CASCADE;

CREATE DOMAIN alphaID   AS VARCHAR(10);
CREATE DOMAIN numID     AS INT;
CREATE DOMAIN nameVal   AS VARCHAR(40);

CREATE TABLE works (
   w_alphaID    alphaID     NOT NULL,
   w_numID      numID       NOT NULL,
   w_title      nameVal     NOT NULL,
   PRIMARY KEY(w_alphaID,w_numID));


CREATE TABLE medium (
   m_alphaID    alphaID     NOT NULL,
   m_numID      numID       NOT NULL,
   m_title      nameVal     NOT NULL,
   FOREIGN KEY(m_alphaID,m_numID) REFERENCES 
      works ON UPDATE CASCADE ON DELETE CASCADE);

INSERT INTO works VALUES('AB',1,'Sunset'),
                        ('CD',2,'Beach'),
                        ('EF',3,'Flower');

INSERT INTO medium VALUES('AB',1,'Wood'),
                         ('AB',1,'Oil'),
                         ('CD',2,'Canvas'),
                         ('CD',2,'Oil'),
                         ('CD',2,'Bronze'),
                         ('EF',3,'Paper'),
                         ('EF',3,'Pencil');
SELECT * FROM works;
SELECT * FROM medium;

SELECT w_alphaID AS alphaID, w_numID AS numID, w_title AS
       Name_of_work, m_title AS Material_used 
     FROM works, medium WHERE 
       works.w_alphaID = medium.m_alphaID 
       AND works.w_numID = medium.m_numID;

The output looks something like this:

 w_alphaid | w_numid | w_title 
-----------+---------+---------
 AB        |       1 | Sunset
 CD        |       2 | Beach
 EF        |       3 | Flower
(3 rows)

 m_alphaid | m_numid | m_title 
-----------+---------+---------
 AB        |       1 | Wood
 AB        |       1 | Oil
 CD        |       2 | Canvas
 CD        |       2 | Oil
 CD        |       2 | Bronze
 EF        |       3 | Paper
 EF        |       3 | Pencil
(7 rows)

 alphaid | numid | name_of_work | material_used 
---------+-------+--------------+---------------
 AB      |     1 | Sunset       | Wood
 AB      |     1 | Sunset       | Oil
 CD      |     2 | Beach        | Canvas
 CD      |     2 | Beach        | Oil
 CD      |     2 | Beach        | Bronze
 EF      |     3 | Flower       | Paper
 EF      |     3 | Flower       | Pencil
(7 rows)

Now my question is: what query should I use so that the output of the last SELECT statement looks like this:

 alphaid | numid | name_of_work | material_used_1 | material_used_2 | material_used_3 
---------+-------+--------------+-----------------+-----------------+-----------------
 AB      |     1 | Sunset       | Wood            | Oil             |
 CD      |     2 | Beach        | Canvas          | Oil             | Bronze
 EF      |     3 | Flower       | Paper           | Pencil          |
(3 rows)

I looked into using string_agg(), but that puts all the values into one cell, whereas I want a separate cell for each value. I also tried using joins to get this output, but without success so far. I appreciate you taking the time to look at this question.

You can use string_agg() in a subquery and then split the string into separate columns. See also this question on how to split a string into columns.

SELECT alphaID, numID, Name_of_Work
      ,split_part(Material_used, ',', 1) AS Material_used_1
      ,split_part(Material_used, ',', 2) AS Material_used_2
      ,split_part(Material_used, ',', 3) AS Material_used_3
      ,split_part(Material_used, ',', 4) AS Material_used_4  -- '' (empty string) if there is no 4th value
FROM (
    SELECT w_alphaID AS alphaID, w_numID AS numID, w_title AS Name_of_work,
           -- order of values is undefined; use string_agg(m_title, ',' ORDER BY m_title) for a defined order
           string_agg(m_title, ',') AS Material_used
    FROM works
    JOIN medium ON works.w_alphaID = medium.m_alphaID
               AND works.w_numID   = medium.m_numID
    GROUP BY w_alphaID, w_numID, w_title ) t;
-- Caveat: this breaks if a material name itself contains a comma.

This would be easier with a simpler schema:

  • No domain types (what is their purpose?)
  • Add an actual PK to table medium.
  • Rather, use a surrogate PK (a serial column) instead of the multicolumn PK and FK over two domain types.
    Or at least use the same (simpler) column name for columns with the same content: just alpha_id instead of m_alphaID and w_alphaID, etc.
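The following is a minimal sketch of what such a simplified schema could look like. It is only an illustration under stated assumptions. The names artwork, material, artwork_id and material_id are hypothetical and do not come from the question. The natural key (alpha_id, num_id) is kept as a UNIQUE constraint. With plain types and a single-column FK, the "poor man's" array query needs no cast and joins on one column. array_agg(... ORDER BY ...) also gives the values a defined order; here the order is alphabetical, which is just one possible choice.

```sql
-- Hypothetical simplified schema (names are illustrative, not from the question)
DROP TABLE IF EXISTS material CASCADE;
DROP TABLE IF EXISTS artwork  CASCADE;

CREATE TABLE artwork (
   artwork_id serial PRIMARY KEY,        -- surrogate PK
   alpha_id   varchar(10) NOT NULL,      -- plain types instead of domains
   num_id     int         NOT NULL,
   title      varchar(40) NOT NULL,
   UNIQUE (alpha_id, num_id)             -- natural key is still enforced
);

CREATE TABLE material (
   material_id serial PRIMARY KEY,       -- an actual PK for the detail table
   artwork_id  int NOT NULL REFERENCES artwork ON UPDATE CASCADE ON DELETE CASCADE,
   title       varchar(40) NOT NULL
);

INSERT INTO artwork (alpha_id, num_id, title)
VALUES ('AB', 1, 'Sunset'), ('CD', 2, 'Beach'), ('EF', 3, 'Flower');

-- Look up the surrogate key via the natural key instead of hard-coding IDs
INSERT INTO material (artwork_id, title)
SELECT a.artwork_id, v.title
FROM  (VALUES ('AB', 1, 'Wood'), ('AB', 1, 'Oil')
            , ('CD', 2, 'Canvas'), ('CD', 2, 'Oil'), ('CD', 2, 'Bronze')
            , ('EF', 3, 'Paper'), ('EF', 3, 'Pencil')) v(alpha_id, num_id, title)
JOIN   artwork a ON a.alpha_id = v.alpha_id
                AND a.num_id   = v.num_id;

-- Array-based pivot on the simplified schema: single-column join, no cast needed
SELECT a.alpha_id, a.num_id, a.title AS name_of_work
     , m.arr[1] AS material_used_1
     , m.arr[2] AS material_used_2
     , m.arr[3] AS material_used_3
FROM   artwork a
LEFT   JOIN (
   SELECT artwork_id, array_agg(title ORDER BY title) AS arr  -- defined (alphabetical) order
   FROM   material
   GROUP  BY artwork_id
   ) m ON m.artwork_id = a.artwork_id
ORDER  BY a.alpha_id, a.num_id;
```

With the alphabetical order, the CD row lists Bronze, Canvas, Oil. Ordering by material_id instead would follow insertion order.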

That aside, here are solutions for your setup as is:

True crosstab() solution

There are several specific difficulties for your crosstab() query:

  • No single column that can serve as row_name.
  • Multiple extra columns.
  • No category column.
  • No defined order for values (so I use an arbitrary order instead).

Basics (read this first!):

For your special case:

Solution:

-- crosstab() is provided by the additional module tablefunc (install once per database):
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT alphaid, numid, name_of_work, material_1, material_2, material_3
FROM   crosstab(
  'SELECT rn, w.alphaid, w.numid, w.name_of_work
        , row_number() OVER (PARTITION BY rn) AS mat_nr  -- order undefined!
        , m_title AS Material_used 
   FROM  (
      SELECT w_alphaID AS alphaid, w_numID AS numid, w_title AS name_of_work
           , row_number() OVER (ORDER BY w_alphaID, w_numID) AS rn
       FROM  works
      ) w
   JOIN   medium m ON w.alphaid = m.m_alphaID 
                  AND w.numid   = m.m_numID
   ORDER  BY rn, mat_nr'
 , 'VALUES (1), (2), (3)'  -- add more ...
)
 AS ct (
    rn bigint, alphaid text, numid int, name_of_work text
  , material_1 text, material_2 text, material_3 text  -- add more ...
   );

Poor man's crosstab with standard SQL

If the additional module tablefunc cannot be installed, or if top performance is not important, this simpler query does the same thing, only slower:

SELECT w_alphaid AS alphaid, w_numid AS numid, w_title AS name_of_work
     , arr[1] AS material_used_1
     , arr[2] AS material_used_2
     , arr[3] AS material_used_3 -- add more?
FROM   works w
LEFT  JOIN (
   SELECT m_alphaid, m_numid, array_agg(m_title::text) AS arr
   FROM   medium
   GROUP  BY m_alphaid, m_numid
   ) m ON w.w_alphaid = m.m_alphaid 
      AND w.w_numid   = m.m_numid;
  • The cast to text (or varchar ...) is necessary because there is no predefined array type for your custom domain. Alternatively, you could define the missing array type. (This applies to older PostgreSQL versions. Since PostgreSQL 11, arrays over domain types are supported, so array_agg() also works on a domain column directly.)

  • One subtle difference from the query above: this one uses LEFT JOIN instead of plain JOIN, to preserve rows from works that have no related materials in medium at all.

  • Since you return the whole table, it is cheaper to aggregate rows in medium before you join. For a small selection, it might be cheaper to join first and then aggregate. Related:
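The last point can be made concrete with a rough sketch of the reverse order for a small selection: filter, join, then aggregate. So that it runs on its own, it creates temporary copies of the question's tables. These use plain varchar columns instead of the custom domains, so array_agg() needs no cast. Temporary tables shadow permanent tables of the same name for the current session only. The filter on 'CD' / 2 is just a hypothetical example selection.

```sql
-- Temporary copies of the question's tables (plain varchar instead of the domains)
CREATE TEMP TABLE works (
   w_alphaid varchar(10) NOT NULL,
   w_numid   int         NOT NULL,
   w_title   varchar(40) NOT NULL,
   PRIMARY KEY (w_alphaid, w_numid));
CREATE TEMP TABLE medium (
   m_alphaid varchar(10) NOT NULL,
   m_numid   int         NOT NULL,
   m_title   varchar(40) NOT NULL);

INSERT INTO works  VALUES ('AB', 1, 'Sunset'), ('CD', 2, 'Beach'), ('EF', 3, 'Flower');
INSERT INTO medium VALUES ('AB', 1, 'Wood'), ('AB', 1, 'Oil')
                        , ('CD', 2, 'Canvas'), ('CD', 2, 'Oil'), ('CD', 2, 'Bronze')
                        , ('EF', 3, 'Paper'), ('EF', 3, 'Pencil');

-- Small selection: filter works first, join, then aggregate only the matching
-- medium rows (array element order is still undefined here)
SELECT alphaid, numid, name_of_work
     , arr[1] AS material_used_1
     , arr[2] AS material_used_2
     , arr[3] AS material_used_3
FROM  (
   SELECT w.w_alphaid AS alphaid, w.w_numid AS numid, w.w_title AS name_of_work
        , array_agg(m.m_title) AS arr
   FROM   works w
   LEFT   JOIN medium m ON m.m_alphaid = w.w_alphaid
                       AND m.m_numid   = w.w_numid
   WHERE  w.w_alphaid = 'CD' AND w.w_numid = 2   -- hypothetical example filter
   GROUP  BY w.w_alphaid, w.w_numid, w.w_title
   ) sub;
```

Which variant is actually cheaper depends on table sizes and indexes. Checking with EXPLAIN ANALYZE on real data is the reliable way to decide.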
