How to choose the previous non-null value in a column for null values resulting from an outer join

Say I have a table A that has null values in a particular column, say Value, as a result of an outer join. If that column represents something like a cumulative sum (cumsum), it doesn't make sense for it to drop to null in the middle:

| ID  | Value |
| --- | ----- |
| 1   | null  |
| 2   | 576   |
| 3   | null  |
| 4   | 695   |
| 5   | null  |

So is it possible to transform table A into a table B that looks like this?

| ID  | Value |
| --- | ----- |
| 1   | 0     |
| 2   | 576   |
| 3   | 576   |
| 4   | 695   |
| 5   | 695   |

In other words, is it possible to replace each null value in a column with the previous non-null value or, if there is none, with a default value of 0? This transformation also has to be applied to every column of a table with 10 or so columns.

This should be compatible with general SQL and definitely works on PostgreSQL:

SELECT a1.id, COALESCE(a2.value, 0) AS value
FROM (
  -- for each row, find the id of the latest row at or before it with a non-null value
  SELECT a1.id AS id, MAX(a2.id) AS idref
  FROM a a1
  LEFT JOIN a a2 ON a2.id <= a1.id AND a2.value IS NOT NULL
  GROUP BY a1.id
) AS a1
LEFT JOIN a a2 ON a2.id = a1.idref;

This requires a double join, which doesn't look optimal to me, but at least it works.

Concerning your comment: the pattern can be extended, but you'd need to add another two joins for each further column if the respective existing values might come from different rows.
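To illustrate that extension, here is a minimal sketch that runs the double-join pattern for two columns. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; the second column `value2` and its sample data are assumptions that do not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `value2` is a hypothetical second column added for illustration.
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, value INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO a (id, value, value2) VALUES (?, ?, ?)",
    [(1, None, 10), (2, 576, None), (3, None, None), (4, 695, 30), (5, None, None)],
)

# One (idref subquery + lookup join) pair per column to be filled.
TWO_COLUMN_QUERY = """
SELECT r1.id,
       COALESCE(s1.value, 0)  AS value,
       COALESCE(s2.value2, 0) AS value2
FROM (
  SELECT a1.id AS id, MAX(a2.id) AS idref
  FROM a a1
  LEFT JOIN a a2 ON a2.id <= a1.id AND a2.value IS NOT NULL
  GROUP BY a1.id
) AS r1
LEFT JOIN a s1 ON s1.id = r1.idref
LEFT JOIN (
  SELECT a1.id AS id, MAX(a2.id) AS idref
  FROM a a1
  LEFT JOIN a a2 ON a2.id <= a1.id AND a2.value2 IS NOT NULL
  GROUP BY a1.id
) AS r2 ON r2.id = r1.id
LEFT JOIN a s2 ON s2.id = r2.idref
ORDER BY r1.id
"""

rows = conn.execute(TWO_COLUMN_QUERY).fetchall()
for row in rows:
    print(row)
```

Each column gets its own `idref` because the last non-null `value` and the last non-null `value2` can sit in different rows.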

Edit:

While still requiring two sub-queries (instead of joins), this variant seems to perform slightly better on PostgreSQL:

SELECT
  a1.id,
  COALESCE
  (
    (
      -- value of the latest row at or before a1 that has a non-null value
      SELECT a2.value FROM a a2 WHERE a2.id =
        (SELECT MAX(a3.id) FROM a a3 WHERE a3.id <= a1.id AND a3.value IS NOT NULL)
    ),
    0
  ) AS value
FROM a a1;

You'll still need a set of two sub-queries for every additional column (unless all the data can be retrieved from the same row), though the performance gain will likely add up in that case.
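Since the question mentions 10 or so columns, writing those sub-query pairs by hand gets repetitive. One possible way around that, sketched below, is to generate the query text from a list of column names. The helper names `fill_expr` and `fill_query`, the column `value2`, and the use of Python with an in-memory SQLite database are all assumptions made for illustration; the column names are interpolated directly into the SQL, so they must come from a trusted list.

```python
import sqlite3

def fill_expr(column, table="a", key="id"):
    """Build the two-sub-query COALESCE expression for one column."""
    return (
        f"COALESCE((SELECT a2.{column} FROM {table} a2 WHERE a2.{key} = "
        f"(SELECT MAX(a3.{key}) FROM {table} a3 "
        f"WHERE a3.{key} <= a1.{key} AND a3.{column} IS NOT NULL)), 0) AS {column}"
    )

def fill_query(columns, table="a", key="id"):
    """Build a SELECT that forward-fills every listed column, defaulting to 0."""
    exprs = ",\n  ".join(fill_expr(c, table, key) for c in columns)
    return f"SELECT a1.{key},\n  {exprs}\nFROM {table} a1\nORDER BY a1.{key}"

conn = sqlite3.connect(":memory:")
# `value2` is a hypothetical second column added for illustration.
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, value INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO a (id, value, value2) VALUES (?, ?, ?)",
    [(1, None, 10), (2, 576, None), (3, None, None), (4, 695, 30), (5, None, None)],
)

rows = conn.execute(fill_query(["value", "value2"])).fetchall()
for row in rows:
    print(row)
```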

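WITH cte AS (
    SELECT
        id,
        value,
        -- running count of non-null values: each non-null row starts a new group
        count(value) OVER (ORDER BY id) AS grp
    FROM (
        VALUES (1, NULL),
               (2, 576),
               (3, NULL),
               (4, 695),
               (5, NULL),
               (6, NULL)) g (id, value)
)
SELECT
    id,
    value,
    -- the first row of each group holds its only non-null value (group 0 has none)
    coalesce(first_value(value) OVER (PARTITION BY grp ORDER BY id), 0) AS filled
FROM
    cte
ORDER BY
    id;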
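The window-function query above works because `count(value)` ignores nulls, so the running count only increases on rows that carry a value, and every null row inherits the group number of the last non-null row before it. The sketch below prints that intermediate group number next to the filled result. It is an illustration only: it runs against an in-memory SQLite database through Python's `sqlite3` module (assuming SQLite 3.25 or later for window-function support), and the column names `grp` and `filled` are chosen here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO a (id, value) VALUES (?, ?)",
    [(1, None), (2, 576), (3, None), (4, 695), (5, None), (6, None)],
)

WINDOW_QUERY = """
WITH cte AS (
  SELECT id, value,
         COUNT(value) OVER (ORDER BY id) AS grp  -- running count of non-null values
  FROM a
)
SELECT id, grp,
       COALESCE(FIRST_VALUE(value) OVER (PARTITION BY grp ORDER BY id), 0) AS filled
FROM cte
ORDER BY id
"""

rows = conn.execute(WINDOW_QUERY).fetchall()
for row in rows:
    print(row)  # (id, grp, filled)
```

Rows before the first non-null value fall into group 0, which contains only nulls, so `COALESCE` supplies the default of 0 there. For several columns, each column would need its own running count and its own `FIRST_VALUE` expression.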
