
SQL UPDATE query - value depends on other rows

There is a SQL Server temporary table; call it #TableA. Its structure is as follows:

CREATE TABLE #TableA 
( 
  ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
  MapVal1 BIGINT NOT NULL,
  MapVal2 BIGINT NOT NULL,
  IsActual BIT NULL
)

The table is already filled with some mappings of MapVal1 to MapVal2. The problem is that not all of the mappings should be flagged as actual, and the IsActual column is meant to record this. Currently IsActual is NULL for every row. The task is to write a query that sets the IsActual column. The UPDATE query should follow these conditions:

  1. If MapVal1 is unique and MapVal2 is unique (one-to-one mapping), the mapping should be flagged as actual, so IsActual = 1;
  2. If MapVal1 is not unique, the actual mapping should be the one from the current MapVal1 to the smallest MapVal2, provided this MapVal2 is not mapped to any other MapVal1 that is smaller than the current MapVal1;
  3. If MapVal2 is not unique, the actual mapping should be the one from the current MapVal2 to the smallest MapVal1, provided this MapVal1 is not mapped to any other MapVal2 that is smaller than the current MapVal2;
  4. All rows that do not fulfil any of conditions 1), 2) or 3) should be flagged as not actual, so IsActual = 0.

I believe conditions 2) and 3) are related: for every row, either both are fulfilled or neither is.

To make it clear, here is an example of the result I want to obtain:

[image: example of the expected result]

The result should be that every MapVal1 is mapped to just one MapVal2 and, vice versa, every MapVal2 is mapped to just one MapVal1.
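This requirement can be checked mechanically once IsActual has been populated. The following is a minimal sketch, assuming the #TableA definition above and that the UPDATE has already run; the aliases CheckedColumn, DuplicatedValue and ActualRows are illustrative names, not part of the original post. It lists every MapVal1 or MapVal2 that ends up with more than one actual mapping, so an empty result means no value is mapped twice.

```sql
-- Sketch only: assumes #TableA exists and IsActual has already been set.
-- CheckedColumn, DuplicatedValue and ActualRows are illustrative aliases.
SELECT 'MapVal1' AS CheckedColumn, MapVal1 AS DuplicatedValue, COUNT(*) AS ActualRows
FROM #TableA
WHERE IsActual = 1
GROUP BY MapVal1
HAVING COUNT(*) > 1      -- a MapVal1 with more than one actual mapping

UNION ALL

SELECT 'MapVal2', MapVal2, COUNT(*)
FROM #TableA
WHERE IsActual = 1
GROUP BY MapVal2
HAVING COUNT(*) > 1;     -- a MapVal2 with more than one actual mapping
```

Note that this only detects values mapped more than once; a value whose mappings were all flagged as not actual does not show up here.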

I have created the following SQL query to solve my task:

IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
BEGIN  
  DROP TABLE #TableA
END

CREATE TABLE #TableA 
( 
  ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
  MapVal1 BIGINT NOT NULL,
  MapVal2 BIGINT NOT NULL,
  IsActual BIT NULL
)

-- insert input data
INSERT INTO #TableA (MapVal1, MapVal2) 
SELECT 1, 1
UNION ALL SELECT 1, 3
UNION ALL SELECT 1, 4
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 2, 4
UNION ALL SELECT 3, 3
UNION ALL SELECT 3, 4
UNION ALL SELECT 4, 3
UNION ALL SELECT 4, 4
UNION ALL SELECT 6, 7
UNION ALL SELECT 7, 8
UNION ALL SELECT 7, 9
UNION ALL SELECT 8, 8
UNION ALL SELECT 8, 9
UNION ALL SELECT 9, 8
UNION ALL SELECT 9, 9


CREATE NONCLUSTERED INDEX IX_Mapping_MapVal1 ON #TableA (MapVal1); 
CREATE NONCLUSTERED INDEX IX_Mapping_MapVal2 ON #TableA (MapVal2); 


-- UPDATE of #TableA is starting here 

-- every one-to-one mapping should be actual
UPDATE m1 SET
  m1.IsActual = 1
FROM #TableA m1
LEFT JOIN #TableA m2
  ON m1.MapVal1 = m2.MapVal1 AND m1.ID <> m2.ID
LEFT JOIN #TableA m3
  ON m1.MapVal2 = m3.MapVal2 AND m1.ID <> m3.ID
WHERE m2.ID IS NULL AND m3.ID IS NULL


-- update for every one-to-many or many-to-many mapping is more complicated
-- it would be great to rewrite this part of the query without any LOOP
DECLARE @MapVal1 BIGINT
DECLARE @MapVal2 BIGINT

DECLARE @i BIGINT
DECLARE @iMax BIGINT
DECLARE @LoopCount INT = 0 
SELECT 
  @iMax = MAX (m.ID)
FROM #TableA m

SELECT 
  @i = MIN (m.ID)
FROM #TableA m
WHERE m.IsActual IS NULL

WHILE @i <= @iMax
BEGIN  
  
  SELECT @LoopCount = @LoopCount + 1

  SELECT
    @MapVal1 = m.MapVal1,
    @MapVal2 = m.MapVal2
  FROM #TableA m
  WHERE m.ID = @i

  IF EXISTS 
  (
    SELECT * 
    FROM #TableA m 
    WHERE 
      m.ID < @i 
      AND 
        (m.MapVal1 = @MapVal1 
        OR m.MapVal2 = @MapVal2)
      AND m.IsActual IS NULL     
  ) 
  BEGIN
    UPDATE m SET 
      m.IsActual = 0 
    FROM #TableA m 
    WHERE m.ID = @i
  END

  SELECT @i = MIN (m.ID)
  FROM #TableA m
  WHERE 
    m.ID > @i 
    AND m.IsActual IS NULL 
  
END

UPDATE m SET  
  m.IsActual = 1
FROM #TableA m
WHERE m.IsActual IS NULL


SELECT * FROM #TableA

But, as expected, the performance of the query with the loop is very bad, especially when the input table holds millions of rows. I spent a lot of time trying to write a query without a loop to reduce the execution time, but without success.

Could anybody advise me how to improve the performance of my query? It would be great to get a query without a loop.

Using a loop does not imply that you need to update the table one record at a time. It may help if each individual UPDATE statement updates multiple records.

Consider all possible combinations of MapVal1 and MapVal2 as a matrix. Every time you flag a cell as 'actual', you can flag the rest of its row and the rest of its column as 'not actual'.

The simplest way to do this is by following these steps:

  1. Of all mappings with IsActual = NULL, take the first one (the smallest MapVal1, together with the smallest MapVal2 it is mapped to).
  2. Flag this mapping as actual (IsActual = 1).
  3. Flag all other mappings with the same MapVal1 as non-actual (IsActual = 0).
  4. Flag all other mappings with the same MapVal2 as non-actual (IsActual = 0).
  5. Repeat from step 1 until no more records with IsActual = NULL exist.

Here's an implementation:

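DECLARE @Rows INT = 1;    -- non-zero so the loop body runs at least once

WHILE @Rows > 0
BEGIN
    -- step 1: first unflagged mapping (smallest MapVal1, then smallest MapVal2)
    WITH MakeActual AS (
        SELECT TOP (1) MapVal1, MapVal2
        FROM #TableA
        WHERE IsActual IS NULL
        ORDER BY MapVal1, MapVal2
    )
    -- steps 2-4: that mapping becomes actual; rows sharing its MapVal1 or MapVal2 become non-actual
    UPDATE a
    SET IsActual = CASE WHEN a.MapVal1 = m.MapVal1 AND a.MapVal2 = m.MapVal2 THEN 1 ELSE 0 END
    FROM #TableA a
    INNER JOIN MakeActual m ON a.MapVal1 = m.MapVal1 OR a.MapVal2 = m.MapVal2;

    -- step 5: stop once no unflagged mapping is left (the UPDATE then touches 0 rows)
    SET @Rows = @@ROWCOUNT;
END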

The number of loop iterations equals the number of 'actual' mappings (plus one final pass that finds nothing left to update). The actual performance gain depends a lot on the data. If the majority of mappings are one-to-one (i.e. there are hardly any non-actual mappings), then my algorithm will make little difference. Therefore, it may be wise to keep the initial UPDATE statement from your own code sample (the one with the comment "every one-to-one mapping should be actual").
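As an aside, that initial one-to-one pass could probably also be written with window functions instead of two self-joins. This is only a sketch under the assumption that #TableA exists with IsActual still NULL; the CTE name Counted and the aliases Cnt1 and Cnt2 are made up for illustration, and whether it beats the LEFT JOIN version would have to be measured on real data.

```sql
-- Sketch only: flags the one-to-one mappings in a single statement.
-- Counted, Cnt1 and Cnt2 are hypothetical names, not from the original post.
WITH Counted AS (
    SELECT
        IsActual,
        COUNT(*) OVER (PARTITION BY MapVal1) AS Cnt1,   -- rows sharing this MapVal1
        COUNT(*) OVER (PARTITION BY MapVal2) AS Cnt2    -- rows sharing this MapVal2
    FROM #TableA
)
UPDATE Counted
SET IsActual = 1
WHERE Cnt1 = 1 AND Cnt2 = 1;   -- both values occur exactly once: one-to-one
```

With the sample data from the question, only the mapping (6, 7) qualifies, because 6 is the only MapVal1 and 7 the only MapVal2 that occurs exactly once.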

It may also help to experiment with the indexes. This one should help to further optimize the ORDER BY MapVal1, MapVal2 clause:

CREATE NONCLUSTERED INDEX IX_MapVals ON #TableA (MapVal1, MapVal2)
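To see how the pieces fit together, here is a self-contained sketch that rebuilds the sample table from the question, adds the composite index, and runs the loop. The sample rows are copied from the question; the helper variable @Rows and the VALUES-based insert are choices made for this sketch and assume SQL Server 2008 or later.

```sql
-- Sketch only: sample rows copied from the question; @Rows is an illustrative helper variable.
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
    DROP TABLE #TableA;

CREATE TABLE #TableA
(
    ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
    MapVal1 BIGINT NOT NULL,
    MapVal2 BIGINT NOT NULL,
    IsActual BIT NULL
);

INSERT INTO #TableA (MapVal1, MapVal2)
VALUES (1, 1), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4),
       (3, 3), (3, 4), (4, 3), (4, 4), (6, 7),
       (7, 8), (7, 9), (8, 8), (8, 9), (9, 8), (9, 9);

-- supports the ORDER BY MapVal1, MapVal2 lookup inside the loop
CREATE NONCLUSTERED INDEX IX_MapVals ON #TableA (MapVal1, MapVal2);

DECLARE @Rows INT = 1;   -- non-zero so the loop body runs at least once

WHILE @Rows > 0
BEGIN
    WITH MakeActual AS (
        SELECT TOP (1) MapVal1, MapVal2
        FROM #TableA
        WHERE IsActual IS NULL
        ORDER BY MapVal1, MapVal2
    )
    UPDATE a
    SET IsActual = CASE WHEN a.MapVal1 = m.MapVal1 AND a.MapVal2 = m.MapVal2 THEN 1 ELSE 0 END
    FROM #TableA a
    INNER JOIN MakeActual m ON a.MapVal1 = m.MapVal1 OR a.MapVal2 = m.MapVal2;

    SET @Rows = @@ROWCOUNT;   -- 0 once no unflagged mapping is left
END

SELECT MapVal1, MapVal2
FROM #TableA
WHERE IsActual = 1
ORDER BY MapVal1, MapVal2;
-- Actual mappings for this sample: (1,1) (2,3) (3,4) (6,7) (7,8) (8,9)
```

For this sample the loop body runs seven times: six passes that each flag one actual mapping, and a last pass that updates nothing and ends the loop. MapVal1 values 4 and 9 end up without an actual mapping, which matches what the loop in the question produces for the same data.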
