
Snowpark-Python Dynamic Join

I have searched through a large amount of documentation for an example of what I'm trying to do. I admit that the bigger issue may be my lack of Python expertise, so I'm reaching out here in hopes that someone can point me in the right direction. I am trying to create a Python function that dynamically queries tables based on the function's parameters. Here is an example of what I'm trying to do:

def validateData(_ses, table_name,sel_col,join_col, data_state, validation_state):
 
    sdf_t1 = _ses.table(table_name).select(sel_col).filter(col('state') == data_state)
    sdf_t2 = _ses.table(table_name).select(sel_col).filter(col('state') == validation_state)
    df_join = sdf_t1.join(sdf_t2, [sdf_t1[i] == sdf_t2[i] for i in join_col],'full')
    return df_join.to_pandas()

It would be called like this:

df = validateData(ses,'table_name',[col('c1'),col('c2')],[col('c2'),col('c3')],'AZ','TX')

The issue I'm having is with line 5 of the function:

df_join = sdf_t1.join(sdf_t2, [col(sdf_t1[i]) == col(sdf_t2[i]) for i in join_col],'full')

I know that code is incorrect, but I'm hoping it explains what I'm trying to do. If anyone has any advice on whether this is possible, and how, I would greatly appreciate it.

Instead of joining DataFrames, I think it's easier to run the SQL directly, pull the result into a Snowpark DataFrame, and convert that to a pandas DataFrame.

from snowflake.snowpark import Session

# `session` is assumed to be an already-created snowflake.snowpark.Session

# Create a Snowpark DataFrame from a SQL statement
data = session.sql("select t1.col1, t2.col2, t2.col2 from mytable t1 full outer join mytable2 t2 on t1.id=t2.id where t1.col3='something'")

# Convert the Snowpark DataFrame to a pandas DataFrame (keeps the column names)
data = data.to_pandas()
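To make the SQL approach dynamic, as the question asks, the statement can be assembled from the function's parameters. The following is only a sketch under assumptions: `build_validation_sql` and `quote_literal` are hypothetical helper names, column names are passed as plain strings, the table has a `state` column as in the question, and table and column names come from a trusted source (they are interpolated as-is). Each state is filtered in its own subquery so that the full outer join still keeps unmatched rows from both sides.

```python
def quote_literal(value):
    """Render a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_validation_sql(table_name, sel_cols, join_cols, data_state, validation_state):
    """Build a full outer self-join of two `state` slices of one table.

    Hypothetical helper: identifiers are assumed to be trusted and are not quoted.
    """
    # Prefix each selected column with its side so the output names stay unique.
    select_list = ", ".join(
        f"{alias}.{c} as {alias}_{c}" for alias in ("t1", "t2") for c in sel_cols
    )
    # One equality per join column, combined with AND.
    on_clause = " and ".join(f"t1.{c} = t2.{c}" for c in join_cols)
    return (
        f"select {select_list} "
        f"from (select * from {table_name} where state = {quote_literal(data_state)}) t1 "
        f"full outer join "
        f"(select * from {table_name} where state = {quote_literal(validation_state)}) t2 "
        f"on {on_clause}"
    )


query = build_validation_sql("my_table", ["c1", "c2"], ["c2", "c3"], "AZ", "TX")
print(query)
```

The resulting string could then be passed to `session.sql(query).to_pandas()`, assuming `session` is an existing Snowpark session.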

Essentially, what you need is to create a Python expression from two lists of variables. I don't have a better idea than using eval.

Maybe try eval(" & ".join([f"(sdf_t1['{i}'] == sdf_t2['{i}'])" for i in join_col])), assuming join_col holds the column names as strings. Be mindful that I have not fully tested this; it is just to toss out an idea.
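An alternative that avoids eval is to fold the per-column comparisons together with `functools.reduce` and the `&` operator. The sketch below is self-contained and does not import Snowpark: `Expr` and `Frame` are hypothetical stand-ins that only mimic how a column expression responds to `==` and `&`, and `join_condition` is an illustrative helper name. With real Snowpark DataFrames the same helper should apply as long as `join_col` contains column names as strings, since `df["name"]` returns a Column and Columns can be combined with `&`; that part is untested here.

```python
from functools import reduce
from operator import and_


class Expr:
    """Stand-in for a column expression: records the expression as text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Like a Snowpark Column, `==` builds a new expression instead of a bool.
        return Expr(f"({self.text} = {other.text})")

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")


class Frame:
    """Stand-in for a DataFrame: frame["c"] yields a column expression."""

    def __init__(self, alias):
        self.alias = alias

    def __getitem__(self, name):
        return Expr(f"{self.alias}.{name}")


def join_condition(left, right, join_cols):
    """AND together left[c] == right[c] for every column name in join_cols.

    join_cols must be non-empty; reduce raises TypeError on an empty sequence.
    """
    return reduce(and_, (left[c] == right[c] for c in join_cols))


cond = join_condition(Frame("t1"), Frame("t2"), ["c2", "c3"])
print(cond.text)
```

In the question's function this would correspond to something like `sdf_t1.join(sdf_t2, join_condition(sdf_t1, sdf_t2, join_col), 'full')`, with `join_col` passed as `['c2', 'c3']` rather than as `col(...)` objects.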

