Snowflake: Is it possible to pass several columns (a DataFrame) into a Snowpark UDTF (Python)?
I wrote a UDTF in Snowpark/Python that receives one Column as its argument, and it works fine. Is it possible (I found no documentation on this) to pass several columns (i.e. a DataFrame) into a UDTF?
My code below doesn't work; the exception is "TypeError: 'TABLE FUNCTION' expected Column or str, got: <class 'snowflake.snowpark.dataframe.DataFrame'>".
Can anybody suggest how to do this (other than concatenating several columns into one and passing that single column into the UDTF)?
import uuid
from typing import Iterable, Tuple
from snowflake.snowpark.functions import lit, udtf
from snowflake.snowpark.types import IntegerType, StringType

# `ses` is an existing Snowpark Session
@udtf(output_schema=["c1", "c2", "x"],
      input_types=[StringType(), StringType(), IntegerType()],
      name="udft_two_col_test",
      replace=True,
      session=ses)
class udft_two_col_test:
    def process(self, c1: str, c2: str, n: int) -> Iterable[Tuple[str, str, str]]:
        for i in range(n):
            yield (c1, c2, f'{n}-{c1}-{c2}')

df = ses.create_dataframe([str(uuid.uuid4()).split('-') for i in range(1, 10, 1)], schema=['c1', 'c2', 'c3', 'c4', 'c5'])
df.sort('c1', 'c2').show()
------------------------------------------------
|"C1" |"C2" |"C3" |"C4" |"C5" |
------------------------------------------------
|125a9845 |f7e2 |48dd |b51c |42ba82531fe7 |
|136da5dc |62cb |47c0 |98f9 |4182421e6d2b |
|300380e2 |b365 |4d6a |8d6b |1092e4c24ec8 |
|3d9d9882 |0fb2 |4209 |bf11 |4341b0336946 |
|43c4147d |1603 |4548 |ad8e |4df50cddd682 |
|9e1024ca |61d5 |404d |88f8 |79393083eb30 |
|bf25e899 |5697 |4c36 |8533 |e3009c68ce9b |
|d6dd677f |035b |49e7 |9236 |316741579f3c |
|f4b83587 |26e1 |48cf |8563 |0586ccb6602e |
------------------------------------------------
df.join_table_function("udft_two_col_test", df["c1","c2"], lit(3)).sort('c1','c2').show(100)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
---> 17 df.join_table_function("udft_two_col_test", df["c1","c2"], lit(3)).sort('c1','c2').show(100)
...
TypeError: 'TABLE FUNCTION' expected Column or str, got: <class 'snowflake.snowpark.dataframe.DataFrame'>
Try passing the columns one by one:
# udft_two_col_test is the object produced by the @udtf decorator in the question
df.join_table_function(udft_two_col_test("c1", "c2", lit(3))).show()
# or
df.join_table_function(udft_two_col_test.name, "c1", "c2", lit(3)).show()
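To see what "one by one" means for the handler, here is a minimal local sketch that needs no Snowflake session. It reuses the process method from the question without the @udtf decorator; the class name TwoColHandler and the sample rows are made up for illustration, and the loop is only a rough stand-in for how rows reach the handler.

```python
from typing import Iterable, Tuple


class TwoColHandler:
    """Same process() body as in the question, without the @udtf decorator."""

    def process(self, c1: str, c2: str, n: int) -> Iterable[Tuple[str, str, str]]:
        for _ in range(n):
            yield (c1, c2, f"{n}-{c1}-{c2}")


# Hypothetical sample rows: each tuple holds the values of columns c1 and c2.
rows = [("125a9845", "f7e2"), ("136da5dc", "62cb")]

handler = TwoColHandler()
out = []
for c1, c2 in rows:
    # Every column value arrives as its own positional argument.
    out.extend(handler.process(c1, c2, 3))

print(len(out))  # 6
print(out[0])    # ('125a9845', 'f7e2', '3-125a9845-f7e2')
```

Each input column maps to one parameter of process, which is why the call needs one Column (or column name) per argument rather than a DataFrame.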
In the documentation of join_table_function you see an example like this:
df.join_table_function(split_to_table(df["addresses"], lit(" "))).show()
where df["addresses"] is a single column of the dataframe, and lit(" ") is another column.
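The error in the question fits this picture: in Python, df["c1","c2"] hands the tuple ("c1", "c2") to __getitem__, and the traceback shows that the result was a DataFrame rather than a Column. The toy classes below (ToyFrame and ToyColumn are hypothetical stand-ins, not Snowpark code) sketch that indexing behaviour under this assumption.

```python
class ToyColumn:
    """Hypothetical stand-in for a single column."""

    def __init__(self, name):
        self.name = name


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not Snowpark's implementation."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, item):
        # frame["c1"] -> one column
        if isinstance(item, str):
            return ToyColumn(item)
        # frame["c1", "c2"] arrives as the tuple ("c1", "c2") -> a narrower frame
        if isinstance(item, (list, tuple)):
            return ToyFrame(item)
        raise TypeError(f"unsupported index: {item!r}")


frame = ToyFrame(["c1", "c2", "c3"])
single = frame["c1"]
multi = frame["c1", "c2"]
print(type(single).__name__)  # ToyColumn
print(type(multi).__name__)   # ToyFrame

# One column object per argument, which is the shape a table function call expects.
args = [frame[name] for name in ("c1", "c2")]
print([a.name for a in args])  # ['c1', 'c2']
```

So the fix is to index each column separately (or pass the column names as strings) instead of indexing with several names at once.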
Cheers!
This is possible with UDTFs (User Defined Table Functions), which come with v0.7.0.
Here is an example:
from collections import Counter
from typing import Iterable, Tuple
from snowflake.snowpark.functions import lit

class MyWordCount:
    def __init__(self):
        self._total_per_partition = 0

    def process(self, s1: str) -> Iterable[Tuple[str, int]]:
        words = s1.split()
        self._total_per_partition = len(words)
        counter = Counter(words)
        yield from counter.items()

    def end_partition(self):
        yield ("partition_total", self._total_per_partition)

udtf_name = "word_count_udtf"
word_count_udtf = session.udtf.register(
    MyWordCount, ["word", "count"], name=udtf_name, is_permanent=False, replace=True)
# Call it by its name
df1 = session.table_function(udtf_name, lit("w1 w2 w2 w3 w3 w3"))
df1.show()
-----------------------------
|"WORD" |"COUNT" |
-----------------------------
|w1 |1 |
|w2 |2 |
|w3 |3 |
|partition_total |6 |
-----------------------------
# Call it by the returned callable instance
df2 = session.table_function(word_count_udtf(lit("w1 w2 w2 w3 w3 w3")))
df2.show()
-----------------------------
|"WORD" |"COUNT" |
-----------------------------
|w1 |1 |
|w2 |2 |
|w3 |3 |
|partition_total |6 |
-----------------------------
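Because the handler is an ordinary Python class, its logic can also be checked locally before registering it. The sketch below repeats MyWordCount from above and drives it with run_partition, a hypothetical helper written for this illustration (not part of Snowpark) that only approximates how one partition is fed to the handler.

```python
from collections import Counter
from typing import Iterable, Tuple


class MyWordCount:
    def __init__(self):
        self._total_per_partition = 0

    def process(self, s1: str) -> Iterable[Tuple[str, int]]:
        words = s1.split()
        self._total_per_partition = len(words)
        counter = Counter(words)
        yield from counter.items()

    def end_partition(self):
        yield ("partition_total", self._total_per_partition)


def run_partition(handler, rows):
    """Hypothetical helper: call process() per row, then end_partition() once."""
    out = []
    for row in rows:
        out.extend(handler.process(*row))
    if hasattr(handler, "end_partition"):
        out.extend(handler.end_partition())
    return out


result = run_partition(MyWordCount(), [("w1 w2 w2 w3 w3 w3",)])
print(result)
# [('w1', 1), ('w2', 2), ('w3', 3), ('partition_total', 6)]
```

The rows match the tables shown above. Note that process assigns to _total_per_partition rather than adding to it, so with several rows in one partition the reported total would reflect only the last row.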