
Snowflake Java UDF using BigInteger returning nondeterministic results

I created a Java UDF in Snowflake that receives a SQL bigint as a Java BigInteger and returns it as a string.

create or replace function pass_through_print(divisor bigint) 
returns string 
language java 
handler='TestClass.pass_through_print'
as $$
import java.math.*;
class TestClass {
  public static String pass_through_print(BigInteger divisor) {
      return divisor.toString();
  }
}
$$;
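To confirm that the handler logic itself is deterministic, it can be run outside Snowflake in a plain JVM. This is a minimal sketch: `TestClass` is copied from the UDF body, while `HandlerCheck` is a hypothetical harness that is not part of the question. It also tries a 38-digit value, since Snowflake's BIGINT is a synonym for NUMBER(38,0) rather than a 64-bit type.

```java
import java.math.BigInteger;

// Copy of the handler from the UDF body above.
class TestClass {
    public static String pass_through_print(BigInteger divisor) {
        return divisor.toString();
    }
}

// Hypothetical local harness; not part of the Snowflake UDF.
class HandlerCheck {
    public static void main(String[] args) {
        BigInteger[] inputs = {
            BigInteger.valueOf(9),   // literal used in Tests 1 and 2
            BigInteger.valueOf(123), // literal used in the later test
            // 10^38 - 1: the largest value a NUMBER(38,0) can hold
            BigInteger.TEN.pow(38).subtract(BigInteger.ONE)
        };
        for (BigInteger in : inputs) {
            // In a plain JVM, toString() always yields the decimal digits of the input.
            System.out.println(in + " -> " + TestClass.pass_through_print(in));
        }
    }
}
```

Run locally, every input maps back to its own decimal string, so any wrong value seen in Snowflake has to come from what the engine passes into the handler.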

I then ran the following tests.

Test 1 (works as expected)

select pass_through_val, count(*)
from (
    select
    pass_through_print(9) as pass_through_val
    from (
        select seq8() as val
        from table(generator(rowcount => 100000))
      )
  )
group by pass_through_val

Result (as expected)

PASS_THROUGH_VAL    COUNT(*)
9           100,000

Test 2 (nondeterministic)

select pass_through_val, count(*)
from (
    select
    case
        when val % 2 = 0 then null
        else pass_through_print(9)
    end as pass_through_val
    from (
        select seq8() as val
        from table(generator(rowcount => 100000))
      )
  )
group by pass_through_val

Expected result

PASS_THROUGH_VAL    COUNT(*)
null    50,000
9       50,000
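The expected distribution can be modeled in plain Java to make the reasoning explicit. This is a hypothetical sketch (the `ExpectedCounts` class is not from the question) that assumes `seq8()` yields the values 0 to 99,999 without gaps; since all values are non-negative, Java's `%` behaves like SQL's here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java model of Test 2; no Snowflake involved.
class ExpectedCounts {
    // Mirrors: CASE WHEN val % 2 = 0 THEN NULL ELSE pass_through_print(9) END
    static String passThroughVal(long val) {
        return (val % 2 == 0) ? null : String.valueOf(9);
    }

    // Mirrors: GROUP BY pass_through_val with COUNT(*),
    // assuming seq8() produces 0 .. rowCount - 1 with no gaps.
    static Map<String, Long> countGroups(long rowCount) {
        // HashMap accepts a null key, which plays the role of the SQL NULL group.
        Map<String, Long> counts = new HashMap<>();
        for (long val = 0; val < rowCount; val++) {
            counts.merge(passThroughVal(val), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each group with its row count.
        System.out.println(countGroups(100_000));
    }
}
```

Only two groups can ever appear, each with half the rows, which is why values such as -25 or 95 are impossible unless the BigInteger argument itself is corrupted.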

Actual results over multiple runs

The pass-through value changes randomly from run to run:

PASS_THROUGH_VAL    COUNT(*)
null            50,000
-25             50,000

PASS_THROUGH_VAL    COUNT(*)
null            50,000
-1              50,000

PASS_THROUGH_VAL    COUNT(*)
null            50,000
95              50,000

Changing BigInteger to int in the Java UDF makes all tests work as expected:

create or replace function pass_through_int_print(val int) 
returns string 
language java 
handler='TestClass.pass_through_int_print'
as $$
import java.math.*;
class TestClass {
  public static String pass_through_int_print(int val) {
      return String.valueOf(val);
  }
}
$$;
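In a plain JVM, the two handler bodies format the same values identically. The sketch below copies both bodies into one hypothetical class, `HandlerEquivalence` (renamed because both UDFs use `TestClass`), and compares them over a small range of ints. That they agree locally suggests the difference seen in Snowflake comes from how the BigInteger argument is supplied, not from the Java formatting.

```java
import java.math.BigInteger;

// Hypothetical side-by-side copy of the two handler bodies.
class HandlerEquivalence {
    // Body of TestClass.pass_through_print (BigInteger version)
    static String viaBigInteger(BigInteger divisor) {
        return divisor.toString();
    }

    // Body of TestClass.pass_through_int_print (int version)
    static String viaInt(int val) {
        return String.valueOf(val);
    }

    // Returns true if both handlers produce the same string for every value in [from, to].
    // Keep 'to' below Integer.MAX_VALUE so the loop counter cannot overflow.
    static boolean agreeOnRange(int from, int to) {
        for (int v = from; v <= to; v++) {
            if (!viaBigInteger(BigInteger.valueOf(v)).equals(viaInt(v))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnRange(-1000, 1000));
    }
}
```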

It seems that the CASE statement combined with the BigInteger conversion is breaking things. Does anyone know what is going on?

Update: Even stranger, if I also call pass_through_print outside the CASE statement, that call works as expected, and it makes the call inside the CASE statement work too.

New test

select pass_through_val_correct, pass_through_val, count(*)
from (
    select
    pass_through_print(123) as pass_through_val_correct, -- added this line
    case
        when val % 2 = 0 then null
        else pass_through_print(9)
    end as pass_through_val
    from (
        select seq8() as val
        from table(generator(rowcount => 1000000))
      )
  )
group by pass_through_val_correct, pass_through_val

Result

PASS_THROUGH_VAL_CORRECT    PASS_THROUGH_VAL    COUNT(*)
123                         null                500,000
123                         9                   500,000

When you change the query (and call pass_through_print outside of the CASE statement), the execution plan changes, and the function is executed in a stand-alone operation (ExtensionFunction). In your original query, the function is defined as a volatile function and is executed as part of a projection operation (which is a sub-step of the aggregation).

[Execution plan screenshot]

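Somehow, calling the function as part of the projection step breaks the BigInteger handling (other libraries may be affected too). Please open a case with Snowflake Support; this looks like a bug.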
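Until Snowflake confirms a fix, one possible workaround is to let the handler receive a primitive, as the working int version does, and build the BigInteger inside the JVM. This is an untested sketch: `LongTestClass` and `pass_through_long_print` are hypothetical names, and it assumes Snowflake accepts a Java `long` for a NUMBER argument in the same way it accepted `int`. Note that a `long` cannot hold every NUMBER(38,0) value, only those in the signed 64-bit range.

```java
import java.math.BigInteger;

// Hypothetical workaround handler (untested against the reported bug):
// take a primitive long and construct the BigInteger locally,
// instead of relying on the engine to pass a BigInteger in.
class LongTestClass {
    public static String pass_through_long_print(long divisor) {
        // Only values within the signed 64-bit range can arrive here.
        BigInteger value = BigInteger.valueOf(divisor);
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(pass_through_long_print(9L));
    }
}
```

The SQL definition would mirror pass_through_print, with the handler pointing at LongTestClass.pass_through_long_print. Whether this avoids the projection-step problem would need to be verified by rerunning Test 2.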


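Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM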