Store a Spark dataframe's name as a variable

Question

I want to print a statement from inside a function that names the dataframes it used, for example:

def some_function(df1,df2):
    new_df = df1.union(df2)
    print (f'dataframe {df1} merged with {df2}')

So far, when the dataframe is referenced in the f-string, it prints the entire dataframe rather than its name.

Desired output:

some_function(product_data1,product_data2)

Resulting output:

 'dataframe product_data1 merged with product_data2'

How can I store the dataframe name as a variable in Spark? In Python it is done like this:

dataframe_name = df.name

Answer 1

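Based on your comment, I believe the requirement is for the print() in some_function to print the names of the dataframes passed to the function. You cannot print a dataframe's name, because the object does not know which variable refers to it, but you can adapt the function to accept the dataframe names as strings. Here is an example.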

def some_function(df1, df2):
    # df1 and df2 are the dataframe names as strings, not the dataframes themselves
    assert isinstance(df1, str) and isinstance(df2, str), 'Provide the dataframe names in string only - e.g., "df1"'

    # eval() resolves the names to objects; they must be visible from this function (e.g. as globals)
    new_df = eval(f'{df1}.union({df2})')
    print(f'dataframe {df1} merged with {df2}')

    return new_df

# union_sdf is the new appended dataframe
union_sdf = some_function('data1_sdf', 'data2_sdf')  # passed as strings
# dataframe data1_sdf merged with data2_sdf
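
If you would rather avoid eval(), one possible variation is to look the names up in a dictionary that you maintain yourself. This is only a sketch: the `registry` parameter and the `dataframes` dict are hypothetical names introduced for illustration, and plain Python sets stand in for Spark dataframes so that the snippet runs without a Spark session (they are used only because they also have a .union() method). It also raises TypeError instead of using assert.

```python
def some_function(name1, name2, registry):
    """Union two dataframes looked up by name in `registry` (a hypothetical dict)."""
    for name in (name1, name2):
        if not isinstance(name, str):
            raise TypeError('Provide the dataframe names in string only - e.g., "df1"')

    # A plain dict lookup replaces eval(); an unknown name raises KeyError.
    new_df = registry[name1].union(registry[name2])
    print(f'dataframe {name1} merged with {name2}')

    return new_df


# Stand-ins for Spark dataframes: sets also expose .union().
dataframes = {'data1_sdf': {1, 2}, 'data2_sdf': {2, 3}}
union_sdf = some_function('data1_sdf', 'data2_sdf', dataframes)
# dataframe data1_sdf merged with data2_sdf
```

With real Spark dataframes the dictionary would map each name to the dataframe object, and the lookup no longer depends on which variables happen to be in scope.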

Suppose someone passes the actual variables (not strings) to the function. The assertion fails and the function raises an error.

union_sdf = some_function(data1_sdf, data2_sdf)  # not strings
# AssertionError: Provide the dataframe names in string only - e.g., "df1"
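
If callers would prefer to pass the dataframe objects themselves, another option is to take them as keyword arguments, so the keyword doubles as the printed name. This is a rough sketch rather than a definitive implementation: the `**named_dfs` signature is my own assumption, and Python sets again stand in for Spark dataframes so the example runs without Spark.

```python
def some_function(**named_dfs):
    """Union exactly two dataframes passed as keyword arguments and print their names."""
    if len(named_dfs) != 2:
        raise ValueError('Pass exactly two dataframes as keyword arguments')

    # Keyword arguments keep the order in which the caller wrote them.
    (name1, df1), (name2, df2) = named_dfs.items()
    new_df = df1.union(df2)
    print(f'dataframe {name1} merged with {name2}')

    return new_df


# Stand-ins for Spark dataframes: sets also expose .union().
product_data1 = {1, 2}
product_data2 = {2, 3}
merged = some_function(product_data1=product_data1, product_data2=product_data2)
# dataframe product_data1 merged with product_data2
```

The caller still has to type each name once as a keyword, but no eval() or global lookup is involved.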

Solution 1
0 2022-08-08 07:52:46
