简体   繁体   中英

Snowflake Precision Problems in Query Result

Question: How do I fix this? Tried casting and all sorts of tricks but nothing seems to fix it while keeping the table the same.

Bonus Question: Why is this happening, and does it happen in other RDBMS? Should I be regretting Snowflake?

My query has a problem where something that should be equal to 0, is showing up as a very, very, small number. I can reproduce the issue on a sample dataset:

Setup to try it yourself:

drop table if exists "TESTBUG";
create table "TESTBUG" (id number, val float);

insert into "TESTBUG" values(1,0.000);
insert into "TESTBUG" values(2,0.000);
insert into "TESTBUG" values(3,0.001);
insert into "TESTBUG" values(4,0.000);
insert into "TESTBUG" values(5,0.000);
insert into "TESTBUG" values(6,0.000);
insert into "TESTBUG" values(7,0.000);
insert into "TESTBUG" values(8,0.000);
insert into "TESTBUG" values(9,0.000);
insert into "TESTBUG" values(10,0.000);
insert into "TESTBUG" values(11,0.000);
insert into "TESTBUG" values(12,0.000);
insert into "TESTBUG" values(13,0.000);
insert into "TESTBUG" values(14,0.000);
insert into "TESTBUG" values(15,0.000);
insert into "TESTBUG" values(16,0.000);
insert into "TESTBUG" values(17,0.000);
insert into "TESTBUG" values(18,0.000);
insert into "TESTBUG" values(19,0.000);
insert into "TESTBUG" values(20,0.000);

What we're dealing with here is 20 fake rows of data:

ID VAL
1 0
2 0
3 0.001
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0

Here is the SQL which produces a non-zero result. It looks like the MOVING_AVG is the culprit? I am not sure.

Also it's strange that when you look at the results, it is only ID=18 that has a non-zero result. Row 19 should effectively be the same since the window is only 14 periods long.

WITH LAG AS (
  SELECT *, 
  LAG(val,1) OVER(ORDER BY id) AS lag_val
  FROM "RASGO.PUBLIC.TESTBUG"
),
DIFF AS (
  SELECT *,
  val - lag_val as diff_lag_val
  from LAG
),
MOVING_AVG AS (
  SELECT *,
  avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 14 PRECEDING AND CURRENT ROW) AS moving_avg_diff
  FROM DIFF
)
SELECT * FROM MOVING_AVG WHERE id > 14 AND moving_avg_diff < 0
ID VAL LAG_VAL DIFF_LAG_VAL MOVING_AVG_DIFF
18 0 0 0 -6.666666667e-05

So packing it into one block:


with testbug(id, val) as (
    select * from values
    (1,  0.000::float),
    (2,  0.000::float),
    (3,  0.001::float),
    (4,  0.000::float),
    (5,  0.000::float),
    (6,  0.000::float),
    (7,  0.000::float),
    (8,  0.000::float),
    (9,  0.000::float),
    (10, 0.000::float),
    (11, 0.000::float),
    (12, 0.000::float),
    (13, 0.000::float),
    (14, 0.000::float),
    (15, 0.000::float),
    (16, 0.000::float),
    (17, 0.000::float),
    (18, 0.000::float),
    (19, 0.000::float),
    (20, 0.000::float)
), diff AS (
  SELECT 
        *,
        LAG(val,1) OVER(ORDER BY id) AS lag_val,
        val - lag_val as diff_lag_val
  from TESTBUG
)
SELECT 
    *,
    avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 14 PRECEDING AND CURRENT ROW) AS moving_avg_diff
FROM DIFF
--QUALIFY  id > 14 AND moving_avg_diff < 0
order by id;

gives:

ID VAL LAG_VAL DIFF_LAG_VAL MOVING_AVG_DIFF
1 0
2 0 0 0 0
3 0.001 0 0.001 0.0005
4 0 0.001 -0.001 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 0 0 0 0
12 0 0 0 0
13 0 0 0 0
14 0 0 0 0
15 0 0 0 0
16 0 0 0 0
17 0 0 0 0
18 0 0 0 -0.000066
19 0 0 0 0
20 0 0 0 0

so the problem is on line 4 where you get a negative value, and then the moving average for all lines prior to 18 the positive and negative value cancel out..

which can be shown with a smaller toy problem:


with testbug(id, val) as (
    select * from values
    (1,  0.000::float),
    (2,  0.000::float),
    (3,  0.001::float),
    (4,  0.000::float),
    (5,  0.000::float),
    (6,  0.000::float)
), diff AS (
  SELECT 
        *,
        LAG(val,1) OVER(ORDER BY id) AS lag_val,
        val - lag_val as diff_lag_val
  from TESTBUG
)
SELECT 
    *,
    avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_diff
FROM DIFF
order by id;
ID VAL LAG_VAL DIFF_LAG_VAL MOVING_AVG_DIFF
1 0
2 0 0 0 0
3 0.001 0 0.001 0.0005
4 0 0.001 -0.001 0
5 0 0 0 0
6 0 0 0 -0.0003333333333

Check that math:

SELECT 
    *,
    sum(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_sum_diff,
    count(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_countdiff,
    avg(diff_lag_val) OVER(ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_diff,
    div0(moving_sum_diff, moving_countdiff) as man_avg 
FROM DIFF
order by id;

we now add the manual steps to do the average by hand, and we get:

ID VAL LAG_VAL DIFF_LAG_VAL MOVING_SUM_DIFF MOVING_COUNTDIFF MOVING_AVG_DIFF MAN_AVG
1 0 0
2 0 0 0 0 1 0 0
3 0.001 0 0.001 0.001 2 0.0005 0.0005
4 0 0.001 -0.001 0 3 0 0
5 0 0 0 0 3 0 0
6 0 0 0 -0.001 3 -0.0003333333333 -0.0003333333333

so the math is correct.

Final words:

So this is not a floating point number representation problem, it is a you are doing math over numbers that are not how you expected them to be, but what is happening makes sense, albeit, it might not be what you are wanting to do.

aka here is what is happening:

select column1, column2, div0(column1,column2) 
from values
    ( 0,  18 ),
    ( 0.001,  2 ),
    ( -0.001,  15 );
COLUMN1 COLUMN2 DIV0(COLUMN1,COLUMN2)
0 18 0
0.001 2 0.0005
-0.001 15 -0.000066666

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM