Snowflake: Is it possible to pass several columns (a DataFrame) into a Snowpark UDTF (Python)?

Question

I wrote a UDTF in Snowpark/Python that receives one Column as its argument, and it works fine. Is it possible (I found no documentation on this) to pass several columns (i.e. a DataFrame) into a UDTF?

My code below doesn't work; the exception is "TypeError: 'TABLE FUNCTION' expected Column or str, got: <class 'snowflake.snowpark.dataframe.DataFrame'>".

Can anybody suggest how to do this, other than concatenating several columns into one and passing that single column into the UDTF?

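import uuid
from typing import Iterable, Tuple

from snowflake.snowpark.functions import lit, udtf
from snowflake.snowpark.types import IntegerType, StringType

# `ses` is an existing snowflake.snowpark.Session
@udtf(output_schema=["c1", "c2", "x"],
      input_types=[StringType(), StringType(), IntegerType()],
      name="udft_two_col_test",
      replace=True,
      session=ses)
class udft_two_col_test:
    def process(self, c1: str, c2: str, n: int) -> Iterable[Tuple[str, str, str]]:
        # Emit n output rows for every input row.
        for i in range(n):
            yield (c1, c2, f'{n}-{c1}-{c2}')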

            
df = ses.create_dataframe([str(uuid.uuid4()).split('-') for i in range(1,10,1)], schema=['c1','c2','c3','c4','c5'])
df.sort('c1','c2').show()

------------------------------------------------
|"C1"      |"C2"  |"C3"  |"C4"  |"C5"          |
------------------------------------------------
|125a9845  |f7e2  |48dd  |b51c  |42ba82531fe7  |
|136da5dc  |62cb  |47c0  |98f9  |4182421e6d2b  |
|300380e2  |b365  |4d6a  |8d6b  |1092e4c24ec8  |
|3d9d9882  |0fb2  |4209  |bf11  |4341b0336946  |
|43c4147d  |1603  |4548  |ad8e  |4df50cddd682  |
|9e1024ca  |61d5  |404d  |88f8  |79393083eb30  |
|bf25e899  |5697  |4c36  |8533  |e3009c68ce9b  |
|d6dd677f  |035b  |49e7  |9236  |316741579f3c  |
|f4b83587  |26e1  |48cf  |8563  |0586ccb6602e  |
------------------------------------------------

df.join_table_function("udft_two_col_test", df["c1","c2"], lit(3)).sort('c1','c2').show(100)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
---> 17 df.join_table_function("udft_two_col_test", df["c1","c2"], lit(3)).sort('c1','c2').show(100)
...
TypeError: 'TABLE FUNCTION' expected Column or str, got: <class 'snowflake.snowpark.dataframe.DataFrame'>

Answer 1

Try passing the columns one by one (udft_two_col_test_dec here presumably stands for the UDTF object returned by the @udtf decorator):

df.join_table_function(udft_two_col_test_dec("c1", "c2", lit(3))).show()
# or
df.join_table_function(udft_two_col_test_dec.name, "c1", "c2", lit(3)).show()

In the documentation of join_table_function you can see an example like this:

df.join_table_function(split_to_table(df["addresses"], lit(" "))).show()

where df["addresses"] is a single column of the DataFrame, and lit(" ") is another column.
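Since every positional argument has to be a Column or a column name, a list of column names can be unpacked into the call instead of passing a DataFrame. A minimal sketch, assuming a hypothetical helper named join_udtf_on_columns (it is not part of Snowpark); it relies only on join_table_function accepting the function name followed by its arguments, as in the second call above:

```python
from typing import Any, Sequence


def join_udtf_on_columns(df: Any, udtf_name: str,
                         column_names: Sequence[str], *extra_args: Any) -> Any:
    """Hypothetical helper: call a registered UDTF with one argument per column.

    `df` is expected to be a Snowpark DataFrame. Each name in `column_names`
    is passed as a separate positional argument, followed by `extra_args`
    (for example `lit(3)`).
    """
    if not column_names:
        raise ValueError("at least one column name is required")
    # Unpack the list so the UDTF receives N separate columns, not one DataFrame.
    return df.join_table_function(udtf_name, *column_names, *extra_args)


# Usage with the question's data (needs a live Snowpark session):
# from snowflake.snowpark.functions import lit
# join_udtf_on_columns(df, "udft_two_col_test", ["c1", "c2"], lit(3)).show()
```

The helper keeps the column list as data, which may be convenient when the set of columns is only known at run time.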

Cheers!

Answer 2

It is possible with UDTFs (user-defined table functions), which come with v0.7.0.

Here is an example:

from collections import Counter
from typing import Iterable, Tuple
from snowflake.snowpark.functions import lit
class MyWordCount:
    def __init__(self):
        self._total_per_partition = 0

    def process(self, s1: str) -> Iterable[Tuple[str, int]]:
        words = s1.split()
        self._total_per_partition = len(words)
        counter = Counter(words)
        yield from counter.items()

    def end_partition(self):
        yield ("partition_total", self._total_per_partition)

udtf_name = "word_count_udtf"
word_count_udtf = session.udtf.register(
    MyWordCount, ["word", "count"], name=udtf_name, is_permanent=False, replace=True)


# Call it by its name
df1 = session.table_function(udtf_name, lit("w1 w2 w2 w3 w3 w3"))
df1.show()
-----------------------------
|"WORD"           |"COUNT"  |
-----------------------------
|w1               |1        |
|w2               |2        |
|w3               |3        |
|partition_total  |6        |
-----------------------------

# Call it by the returned callable instance
df2 = session.table_function(word_count_udtf(lit("w1 w2 w2 w3 w3 w3")))
df2.show()
-----------------------------
|"WORD"           |"COUNT"  |
-----------------------------
|w1               |1        |
|w2               |2        |
|w3               |3        |
|partition_total  |6        |
-----------------------------
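
The handler class is plain Python, so its output can be checked locally before it is registered. A rough sketch, assuming process is called once per input row with one argument per column and end_partition once after the last row of a partition; the run_partition driver is a hypothetical stand-in for that behaviour, not a Snowpark API:

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple


class MyWordCount:
    # Same handler as above, repeated so the sketch is self-contained.
    def __init__(self):
        self._total_per_partition = 0

    def process(self, s1: str) -> Iterable[Tuple[str, int]]:
        words = s1.split()
        self._total_per_partition = len(words)
        counter = Counter(words)
        yield from counter.items()

    def end_partition(self):
        yield ("partition_total", self._total_per_partition)


def run_partition(handler_cls, rows: Iterable[Tuple[Any, ...]]) -> List[Tuple[Any, ...]]:
    """Hypothetical local driver: one handler instance per partition."""
    handler = handler_cls()
    out: List[Tuple[Any, ...]] = []
    for row in rows:
        # Each element of the row is passed as a separate argument,
        # mirroring one UDTF argument per column.
        out.extend(handler.process(*row))
    out.extend(handler.end_partition())
    return out


result = run_partition(MyWordCount, [("w1 w2 w2 w3 w3 w3",)])
print(result)
```

For this single-row input the result matches the table shown above. With several rows in one partition, _total_per_partition as written holds only the word count of the last row; accumulating with += would give a partition-wide total.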
