Snowflake Java UDF using BigInteger returns non-deterministic results
I created a Java UDF in Snowflake that takes a SQL bigint, receives it in Java as a BigInteger, and returns it as a string.
create or replace function pass_through_print(divisor bigint)
returns string
language java
handler='TestClass.pass_through_print'
as $$
import java.math.*;
class TestClass {
    public static String pass_through_print(BigInteger divisor) {
        return divisor.toString();
    }
}
$$;
I then ran the following test:
select pass_through_val, count(*)
from (
    select
        pass_through_print(9) as pass_through_val
    from (
        select seq8() as val
        from table(generator(rowcount => 100000))
    )
)
group by pass_through_val
Result (works as expected):
PASS_THROUGH_VAL COUNT(*)
9 100,000
select pass_through_val, count(*)
from (
    select
        case
            when val % 2 = 0 then null
            else pass_through_print(9)
        end as pass_through_val
    from (
        select seq8() as val
        from table(generator(rowcount => 100000))
    )
)
group by pass_through_val
PASS_THROUGH_VAL COUNT(*)
null 50,000
9 50,000
The actual results vary randomly from run to run:
PASS_THROUGH_VAL COUNT(*)
null 50,000
-25 50,000
PASS_THROUGH_VAL COUNT(*)
null 50,000
-1 50,000
PASS_THROUGH_VAL COUNT(*)
null 50,000
95 50,000
Changing BigInteger to int in the Java UDF works as expected for all tests:
create or replace function pass_through_int_print(val int)
returns string
language java
handler='TestClass.pass_through_int_print'
as $$
import java.math.*;
class TestClass {
    public static String pass_through_int_print(int val) {
        return String.valueOf(val);
    }
}
$$;
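It seems that the case statement combined with the BigInteger conversion is what breaks things. Does anyone know what is going on?
Update: an even stranger behavior is that if I also call pass_through_print outside the case statement, that call works as expected, and it makes the call inside the case statement work as well.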
select pass_through_val_correct, pass_through_val, count(*)
from (
    select
        pass_through_print(123) as pass_through_val_correct, -- added this line
        case
            when val % 2 = 0 then null
            else pass_through_print(9)
        end as pass_through_val
    from (
        select seq8() as val
        from table(generator(rowcount => 1000000))
    )
)
group by pass_through_val_correct, pass_through_val
Result:
PASS_THROUGH_VAL_CORRECT PASS_THROUGH_VAL COUNT(*)
123 null 500,000
123 9 500,000
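When you change the query (and call pass_through_print outside of the case statement), the execution plan changes, and the function is executed in a stand-alone operation (ExtensionFunction). In your original query, the function is defined as a volatile function and is executed as part of the projection operation (which is a sub-step of the aggregation).
Somehow, calling the function as part of the projection step prevents BigInteger from being handled correctly (other libraries may be affected too). Please file a case with Snowflake Support. This looks like a bug.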
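Before filing the case, it may help to confirm that the handler logic itself is deterministic when run outside Snowflake. The following is only a local sketch: the class name PassThroughHandlers, the camel-case method names, and the distinctOutputs helper are hypothetical and do not come from the question. It says nothing about how Snowflake invokes the handler; it only shows that the same Java logic yields a single distinct output on a plain JVM.

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the UDF handler class; names here are hypothetical.
class PassThroughHandlers {

    // Same logic as the BigInteger handler in the question.
    public static String passThroughBigInteger(BigInteger divisor) {
        return divisor.toString();
    }

    // Same logic as the int handler that worked in every test in the question.
    public static String passThroughInt(int val) {
        return String.valueOf(val);
    }

    // Hypothetical helper: call the BigInteger handler repeatedly with the
    // same value and count how many distinct strings come back.
    public static int distinctOutputs(long value, int calls) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < calls; i++) {
            seen.add(passThroughBigInteger(BigInteger.valueOf(value)));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Mirrors the 100000-row generator used in the question.
        System.out.println("distinct outputs: " + distinctOutputs(9L, 100000));
        System.out.println("int variant: " + passThroughInt(9));
    }
}
```

If the local run reports one distinct output while the Snowflake query still returns values such as -25 or 95, that is consistent with the problem lying in how the argument reaches the handler during the projection step rather than in the handler code. Until the issue is resolved, the int signature shown in the question is the variant the asker reported as working.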