Copy datasets with n:m-relation

I would like to use a single SQL statement of the form

insert into T (...) select ... from T where ...

to copy a lot of datasets. My problem is that there are N:M relationships from table T to other tables, and these have to be copied too. How can I do this if I do not know which original dataset belongs to which copied dataset? Let me demonstrate by example.

Content of the database before:

Table T:

ID  | COL1 | COL2    
-----------------
1   | A    | B
2   | C    | D

N:M-table NM, which links table T to table U (table U is not shown):

T   | U              
---------
1   | 100
1   | 101
2   | 100
2   | 102

My copy operation, where [???] is the part I do not know:

insert into T (COL1, COL2) select COL1, COL2 from T
insert into NM (T, U) select [???]

Content of the database after:

Table T:

ID  | COL1 | COL2
-----------------
1   | A    | B
2   | C    | D
3   | A    | B
4   | C    | D

N:M-table:

T   | U
---------
1   | 100
1   | 101
2   | 100
2   | 102
3   | 100
3   | 101
4   | 100
4   | 102

Notice:

  • I have thousands of datasets (not just two).
  • I want to use 'insert ... select' to get better performance.

If you are lucky enough to run PostgreSQL 9.1 or later, there is an elegant and fast solution with a single command using data-modifying CTEs.

No such luck with MySQL, which did not support Common Table Expressions (CTEs) at the time of writing (plain CTEs arrived in MySQL 8.0), not to mention data-modifying CTEs.

Assuming (col1, col2) is initially unique:

Query 1

  • You can easily pick arbitrary slices from the table in this case.
  • No sequence numbers for t.id will be wasted.

WITH s AS (
    SELECT id, col1, col2
    FROM   t
--  WHERE  some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2   -- I gather from comments that id is a serial column
    FROM   s
    RETURNING id, col1, col2
    )
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM   tu
JOIN   s ON tu.t = s.id
JOIN   i USING (col1, col2);
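
The matching idea behind Query 1 can be tried without a PostgreSQL server. The following is a minimal sketch, not the query above: it assumes SQLite (via Python's sqlite3 module) as a stand-in, and because SQLite has no data-modifying CTEs it uses two statements and tells old rows from new ones by the highest id seen before the copy. The helper names build_demo and copy_unique_rows are hypothetical.

```python
import sqlite3


def build_demo():
    """In-memory copy of the question's sample data (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t  (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
        CREATE TABLE tu (t INTEGER, u INTEGER);
        INSERT INTO t (col1, col2) VALUES ('A', 'B'), ('C', 'D');
        INSERT INTO tu VALUES (1, 100), (1, 101), (2, 100), (2, 102);
        """
    )
    return conn


def copy_unique_rows(conn):
    """Copy all rows of t plus their tu links; assumes (col1, col2) is unique."""
    with conn:  # one transaction, committed on success
        # Rows with id <= max_id are originals, rows above it are copies.
        (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM t").fetchone()
        conn.execute(
            "INSERT INTO t (col1, col2) "
            "SELECT col1, col2 FROM t WHERE id <= ? ORDER BY id",
            (max_id,),
        )
        # Pair each original (s) with its copy (i) on the unique column pair,
        # then duplicate the links of the original for the copy.
        conn.execute(
            """
            INSERT INTO tu (t, u)
            SELECT i.id, tu.u
            FROM   tu
            JOIN   t AS s ON tu.t = s.id
            JOIN   t AS i ON i.col1 = s.col1 AND i.col2 = s.col2
            WHERE  s.id <= ? AND i.id > ?
            """,
            (max_id, max_id),
        )
```

Calling copy_unique_rows(build_demo()) on the sample data should leave tu with the eight rows shown in the question's "after" table. The sketch is only safe when no other session inserts into t between the statements, which is the gap the single-statement CTE version closes.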

If (col1, col2) is not unique, I see two other ways:

Query 2

  • Use the window function row_number() to make non-unique rows unique.
  • INSERT rows without holes in the t.id space, just like in the query above.

WITH s AS (
    SELECT id, col1, col2
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   t
--  WHERE some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2
    FROM   s
    RETURNING id, col1, col2
    )
    ,r AS (
    SELECT *
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   i
    )
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM   r
JOIN   s USING (col1, col2, rn)    -- match exactly one id per row
JOIN   tu ON tu.t = s.id;
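
To see how row_number() pairs duplicates one-to-one, here is a small sketch under stated assumptions: SQLite 3.25 or later (for window functions) stands in for PostgreSQL, the copy runs as two statements instead of one CTE chain, and ORDER BY id is added inside the window purely to make the pairing reproducible (rows within a partition are identical, so any one-to-one pairing would do). The names build_dup_demo and copy_duplicate_rows are hypothetical.

```python
import sqlite3


def build_dup_demo():
    """Sample data where (col1, col2) is NOT unique (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t  (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
        CREATE TABLE tu (t INTEGER, u INTEGER);
        INSERT INTO t (col1, col2) VALUES ('A', 'B'), ('A', 'B'), ('C', 'D');
        INSERT INTO tu VALUES (1, 100), (2, 101), (3, 102);
        """
    )
    return conn


def copy_duplicate_rows(conn):
    """Copy t and tu, numbering duplicates so each copy maps to one original."""
    with conn:
        (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM t").fetchone()
        conn.execute(
            "INSERT INTO t (col1, col2) "
            "SELECT col1, col2 FROM t WHERE id <= ? ORDER BY id",
            (max_id,),
        )
        conn.execute(
            """
            WITH s AS (  -- originals, numbered within each duplicate group
                SELECT id, col1, col2,
                       row_number() OVER (PARTITION BY col1, col2 ORDER BY id) AS rn
                FROM   t
                WHERE  id <= :max_id
            ), r AS (    -- copies, numbered the same way
                SELECT id, col1, col2,
                       row_number() OVER (PARTITION BY col1, col2 ORDER BY id) AS rn
                FROM   t
                WHERE  id > :max_id
            )
            INSERT INTO tu (t, u)
            SELECT r.id, tu.u
            FROM   r
            JOIN   s USING (col1, col2, rn)  -- exactly one original per copy
            JOIN   tu ON tu.t = s.id
            """,
            {"max_id": max_id},
        )
```

Without the rn column, the two ('A', 'B') originals would each join to both copies and every link would be duplicated; with it, each copy inherits the links of exactly one original.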

Query 3

  • This is based on the same idea that @ypercube already supplied, but all in one query.
  • If there are holes in the number space of the current t.id, sequence numbers will be burnt for the new rows accordingly.
  • Don't forget to reset your sequence beyond the new maximum, or new inserts in t that draw the default for id from the sequence will raise duplicate key errors. I integrated this as the final step in the command. Fastest & safest this way.

WITH s AS (
    SELECT max(id) AS max_id
    FROM   t
    )
    ,i AS (
    INSERT INTO t (id, col1, col2)
    SELECT id + s.max_id, col1, col2
    FROM   t, s
    )
    ,j AS (
    INSERT INTO tu (t, u)
    SELECT tu.t + s.max_id, tu.u
    FROM   tu, s
    )
SELECT setval('t_id_seq', s.max_id + s.max_id)
FROM   s;

Details about setval() in the manual.

Test setup

For a quick test.

CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES 
 ('A', 'B')
,('C', 'D');

CREATE TEMP TABLE tu (t int, u int);
INSERT INTO tu VALUES
 (1, 100)
,(1, 101)
,(2, 100)
,(2, 102);

SELECT * FROM t;
SELECT * FROM tu;

There was a somewhat similar question recently, where I provided a somewhat similar answer, plus alternatives for version 8.3 without CTEs and window functions.

Step 1. Lock (both) tables or make sure that only this script is running. Disable FK checks.

Step 2. Use these two INSERT statements, in this order (NM first, because inserting into T changes MAX(ID)):

INSERT INTO NM 
    (T, U) 
  SELECT 
      T + maxID, U
  FROM 
      NM
    CROSS JOIN
      ( SELECT MAX(ID) AS maxID 
        FROM T
      ) AS m;

INSERT INTO T 
    (ID, COL1, COL2) 
  SELECT 
      ID+maxID, COL1, COL2 
  FROM 
      T
    CROSS JOIN
      ( SELECT MAX(ID) AS maxID 
        FROM T
      ) AS m;

Step 3. Re-enable FKs.
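
The three steps can be exercised end to end with a small sketch. It assumes SQLite (through Python's sqlite3 module) rather than any particular server, declares no foreign key between NM and T so there is nothing to disable or re-enable, and uses a single connection in one transaction in place of explicit table locks. The function names build_nm_demo and copy_with_offset are hypothetical.

```python
import sqlite3


def build_nm_demo():
    """The question's tables T and NM with the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE T  (ID INTEGER PRIMARY KEY, COL1 TEXT, COL2 TEXT);
        CREATE TABLE NM (T INTEGER, U INTEGER);
        INSERT INTO T (COL1, COL2) VALUES ('A', 'B'), ('C', 'D');
        INSERT INTO NM VALUES (1, 100), (1, 101), (2, 100), (2, 102);
        """
    )
    return conn


def copy_with_offset(conn):
    """Copy T and NM by shifting every id by the current MAX(ID) of T."""
    with conn:  # both inserts commit together or not at all
        # NM goes first: this statement must still see the old MAX(ID).
        conn.execute(
            """
            INSERT INTO NM (T, U)
            SELECT NM.T + m.maxID, NM.U
            FROM   NM
            CROSS  JOIN (SELECT MAX(ID) AS maxID FROM T) AS m
            """
        )
        # Only now may T grow; its new ids line up with the links above.
        conn.execute(
            """
            INSERT INTO T (ID, COL1, COL2)
            SELECT T.ID + m.maxID, T.COL1, T.COL2
            FROM   T
            CROSS  JOIN (SELECT MAX(ID) AS maxID FROM T) AS m
            """
        )
```

Running copy_with_offset(build_nm_demo()) should reproduce the question's "after" tables. Swapping the two statements would make the NM insert read the already doubled MAX(ID) and point the copied links at ids that do not exist, which is why the order in Step 2 matters.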
