Snowflake Precision Problems in Query Result
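Question: How do I fix this? I have tried casting and various other tricks, but nothing seems to fix it while leaving the table unchanged.
Bonus question: Why does this happen, and does it happen in other RDBMSs? Should I regret choosing Snowflake?
I have a problem with a query where something that should equal 0 shows up as a very, very small number. I can reproduce the problem on a sample dataset:
Setup to try it yourself: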
drop table if exists "TESTBUG";
create table "TESTBUG" (id number, val float);
insert into "TESTBUG" values(1,0.000);
insert into "TESTBUG" values(2,0.000);
insert into "TESTBUG" values(3,0.001);
insert into "TESTBUG" values(4,0.000);
insert into "TESTBUG" values(5,0.000);
insert into "TESTBUG" values(6,0.000);
insert into "TESTBUG" values(7,0.000);
insert into "TESTBUG" values(8,0.000);
insert into "TESTBUG" values(9,0.000);
insert into "TESTBUG" values(10,0.000);
insert into "TESTBUG" values(11,0.000);
insert into "TESTBUG" values(12,0.000);
insert into "TESTBUG" values(13,0.000);
insert into "TESTBUG" values(14,0.000);
insert into "TESTBUG" values(15,0.000);
insert into "TESTBUG" values(16,0.000);
insert into "TESTBUG" values(17,0.000);
insert into "TESTBUG" values(18,0.000);
insert into "TESTBUG" values(19,0.000);
insert into "TESTBUG" values(20,0.000);
We are dealing with 20 rows of dummy data here:
ID | VAL |
---|---|
1 | 0 |
2 | 0 |
3 | 0.001 |
4 | 0 |
5 | 0 |
6 | 0 |
7 | 0 |
8 | 0 |
9 | 0 |
10 | 0 |
11 | 0 |
12 | 0 |
13 | 0 |
14 | 0 |
15 | 0 |
16 | 0 |
17 | 0 |
18 | 0 |
19 | 0 |
20 | 0 |
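Here is the SQL that produces the non-zero result. It looks like MOVING_AVG is the culprit? I don't know.
What is also odd is that when you look at the results, only ID=18 has a non-zero result. Row 19 should really be the same, since the window is only 14 periods long.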
WITH LAG AS (
SELECT *,
LAG(val,1) OVER(ORDER BY id) AS lag_val
FROM "RASGO"."PUBLIC"."TESTBUG"
),
DIFF AS (
SELECT *,
val - lag_val as diff_lag_val
from LAG
),
MOVING_AVG AS (
SELECT *,
avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 14 PRECEDING AND CURRENT ROW) AS moving_avg_diff
FROM DIFF
)
SELECT * FROM MOVING_AVG WHERE id > 14 AND moving_avg_diff < 0
ID | VAL | LAG_VAL | DIFF_LAG_VAL | MOVING_AVG_DIFF |
---|---|---|---|---|
18 | 0 | 0 | 0 | -6.666666667e-05 |
So, packing it all into one block:
with testbug(id, val) as (
select * from values
(1, 0.000::float),
(2, 0.000::float),
(3, 0.001::float),
(4, 0.000::float),
(5, 0.000::float),
(6, 0.000::float),
(7, 0.000::float),
(8, 0.000::float),
(9, 0.000::float),
(10, 0.000::float),
(11, 0.000::float),
(12, 0.000::float),
(13, 0.000::float),
(14, 0.000::float),
(15, 0.000::float),
(16, 0.000::float),
(17, 0.000::float),
(18, 0.000::float),
(19, 0.000::float),
(20, 0.000::float)
), diff AS (
SELECT
*,
LAG(val,1) OVER(ORDER BY id) AS lag_val,
val - lag_val as diff_lag_val
from TESTBUG
)
SELECT
*,
avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 14 PRECEDING AND CURRENT ROW) AS moving_avg_diff
FROM DIFF
--QUALIFY id > 14 AND moving_avg_diff < 0
order by id;
gives:
ID | VAL | LAG_VAL | DIFF_LAG_VAL | MOVING_AVG_DIFF |
---|---|---|---|---|
1 | 0 | | | |
2 | 0 | 0 | 0 | 0 |
3 | 0.001 | 0 | 0.001 | 0.0005 |
4 | 0 | 0.001 | -0.001 | 0 |
5 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 |
11 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 |
13 | 0 | 0 | 0 | 0 |
14 | 0 | 0 | 0 | 0 |
15 | 0 | 0 | 0 | 0 |
16 | 0 | 0 | 0 | 0 |
17 | 0 | 0 | 0 | 0 |
18 | 0 | 0 | 0 | -0.000066 |
19 | 0 | 0 | 0 | 0 |
20 | 0 | 0 | 0 | 0 |
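So the issue starts at row 4, where you get a negative value. For every row before 18, the moving window contains both the +0.001 from row 3 and the -0.001 from row 4, so they cancel. At row 18 the frame (14 preceding plus the current row, so 15 rows) covers rows 4 to 18: the +0.001 has dropped out but the -0.001 is still in, giving -0.001 / 15. At row 19 the -0.001 has dropped out as well, so the average is 0 again.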
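One way to see which rows each frame covers is to run MIN, MAX and COUNT over the same window. This is only a sketch: it assumes the "TESTBUG" table from the setup above, and the column aliases (frame_first_id, frame_last_id, frame_rows) are made up for illustration.

```sql
-- Sketch: show the first id, last id and row count of each row's frame,
-- using the same frame definition as the moving average.
select
    id,
    min(id)  over (order by id rows between 14 preceding and current row) as frame_first_id,
    max(id)  over (order by id rows between 14 preceding and current row) as frame_last_id,
    count(*) over (order by id rows between 14 preceding and current row) as frame_rows
from "TESTBUG"
order by id;
```

Since the ids are contiguous, the frame for id 17 should span ids 3 to 17, the frame for id 18 ids 4 to 18, and the frame for id 19 ids 5 to 19, which is why only row 18 sees the -0.001 without the matching +0.001.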
This can be shown with a smaller toy problem:
with testbug(id, val) as (
select * from values
(1, 0.000::float),
(2, 0.000::float),
(3, 0.001::float),
(4, 0.000::float),
(5, 0.000::float),
(6, 0.000::float)
), diff AS (
SELECT
*,
LAG(val,1) OVER(ORDER BY id) AS lag_val,
val - lag_val as diff_lag_val
from TESTBUG
)
SELECT
*,
avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_diff
FROM DIFF
order by id;
ID | VAL | LAG_VAL | DIFF_LAG_VAL | MOVING_AVG_DIFF |
---|---|---|---|---|
1 | 0 | | | |
2 | 0 | 0 | 0 | 0 |
3 | 0.001 | 0 | 0.001 | 0.0005 |
4 | 0 | 0.001 | -0.001 | 0 |
5个 | 0 | 0 | 0 | 0 |
6个 | 0 | 0 | 0 | -0.0003333333333 |
Now we add the manual steps to compute the average by hand (same testbug and diff CTEs as in the toy problem above, only the final SELECT changes):
SELECT
*,
sum(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_sum_diff,
count(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_countdiff,
avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_diff,
div0(moving_sum_diff, moving_countdiff) as man_avg
FROM DIFF
order by id;
and we get:
ID | VAL | LAG_VAL | DIFF_LAG_VAL | MOVING_SUM_DIFF | MOVING_COUNTDIFF | MOVING_AVG_DIFF | MAN_AVG |
---|---|---|---|---|---|---|---|
1 | 0 | | | | 0 | | |
2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
3 | 0.001 | 0 | 0.001 | 0.001 | 2 | 0.0005 | 0.0005 |
4 | 0 | 0.001 | -0.001 | 0 | 3 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
6 | 0 | 0 | 0 | -0.001 | 3 | -0.0003333333333 | -0.0003333333333 |
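So the math is correct.
So this is not a floating-point representation problem. The arithmetic is simply being done on different numbers than the ones you expected. What happens makes sense, even though it may not be what you intended to do.
In other words, this is what is happening: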
select column1, column2, div0(column1,column2)
from values
( 0, 18 ),
( 0.001, 2 ),
( -0.001, 15 );
COLUMN1 | COLUMN2 | DIV0(COLUMN1,COLUMN2) |
---|---|---|
0 | 18 | 0 |
0.001 | 2 | 0.0005 |
-0.001 | 15 | -0.000066666 |
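To check the claim that floating point is not the cause, the 15-row window can be rerun with an exact decimal type in place of FLOAT. The following is a sketch under stated assumptions: it reads the "TESTBUG" table from the setup, the CTE name exact and the column val_exact are invented for illustration, and NUMBER(10,3) is an arbitrary choice that is wide enough for this data.

```sql
-- Sketch: same frame as the original query, but on an exact NUMBER column.
with exact as (
    select id, val::number(10,3) as val_exact   -- cast away from FLOAT
    from "TESTBUG"
), diff as (
    select
        id,
        val_exact,
        val_exact - lag(val_exact, 1) over (order by id) as diff_lag_val
    from exact
)
select
    id,
    sum(diff_lag_val)   over (order by id rows between 14 preceding and current row) as moving_sum_diff,
    count(diff_lag_val) over (order by id rows between 14 preceding and current row) as moving_countdiff,
    div0(moving_sum_diff, moving_countdiff) as man_avg   -- average computed by hand
from diff
order by id;
```

For id 18 the sum is still -0.001 over 15 rows, so man_avg is still a small negative number, as in the DIV0 example above. Casting does not make the value zero because the value is not a rounding artifact.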