Function does not return a pyspark DataFrame

Question

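I have defined a function that returns a DataFrame containing the intersection of all the DataFrames given as input. However, when I store the function's output in a variable, nothing is stored: the variable shows up as a NoneType object.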

def intersection(list1, intersection_df,i):
    if (i == 1):
        intersection_df = list1[0]
        print(type(intersection_df))
        intersection(list1, intersection_df, i+1)
    elif (i>len(list1)):
        print(type(intersection_df))
        a = spark.createDataFrame(intersection_df.rdd)
        a.show()
        return a
    else:
        intersection_df = intersection_df.alias('intersection_df')
        tb = list1[i-1]
        tb = tb.alias('tb')
        intersection_df = intersection_df.join(tb, intersection_df['value'] == tb['value']).where(col('tb.value').isNotNull()).select(['intersection_df.value'])
        print(type(intersection_df))
        intersection(list1, intersection_df, i+1)

For example, if I give the following input,

list1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
list2 = [3,4,5,6,7,8,9,10,11,12,13,14,15,16]
list3 = [6,7,8,9,10,11,12,13,4,16,343]
df1 = spark.createDataFrame(list1, StringType())
df2 = spark.createDataFrame(list2, StringType())
df3 = spark.createDataFrame(list3, StringType())
list4 = [df1,df2,df3]
empty_df = []
intersection_df = intersection(list4, empty_df, 1)

I expect the following output to be stored in intersection_df:

 +-----+
 |value|
 +-----+
 | 7   |
 | 11  |
 | 8   |
 | 6   |
 | 9   |
 | 10  |
 | 4   |
 | 12  |
 | 13  |
 +-----+

Answer 1

I think you have been hit by the curse of recursion.

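Problem:
You call intersection recursively, but only one of the branches contains a return. So when the innermost call returns your df, the value has nowhere to go (recall: every function call creates its own stack frame). The calls above it discard the result and fall off the end of the function, which in Python means they return None.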
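To see the effect in isolation, here is a minimal sketch in plain Python with no Spark involved. The function names countdown_broken and countdown_fixed are made up for illustration; the only difference between them is the return in front of the recursive call.

```python
def countdown_broken(n):
    # Base case returns a value...
    if n == 0:
        return "done"
    # ...but the recursive result is discarded here, so this frame returns None.
    countdown_broken(n - 1)


def countdown_fixed(n):
    if n == 0:
        return "done"
    # Passing the result back up the call stack at every level.
    return countdown_fixed(n - 1)


print(countdown_broken(3))  # None
print(countdown_fixed(3))   # done
```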

Solution:
Return the result of every recursive call to intersection, in both the if branch and the else branch. For example, in the if branch: return intersection(list1, intersection_df, i+1).
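Put together, a corrected version might look like the sketch below. It is not the asker's exact code. It assumes every DataFrame has a column named value (as in the question), it drops the debugging prints and the spark.createDataFrame round trip, and it joins with on="value" instead of the aliased column comparison. The default arguments are also an addition, so the function can be called with just the list.

```python
def intersection(dfs, intersection_df=None, i=1):
    """Recursively inner-join every DataFrame in `dfs` on the 'value' column."""
    if i > len(dfs):
        # Base case: all inputs consumed, hand the accumulated result back.
        return intersection_df
    if i == 1:
        # First step: start from the first DataFrame and return the recursive result.
        return intersection(dfs, dfs[0], i + 1)
    # General step: keep only the rows whose 'value' also appears in the next DataFrame.
    joined = intersection_df.join(dfs[i - 1], on="value", how="inner").select("value")
    return intersection(dfs, joined, i + 1)
```

With the DataFrames from the question, intersection_df = intersection(list4) should then hold a DataFrame rather than None. Note that an inner join repeats rows when the inputs contain duplicate values, so this matches a set intersection only when each input's values are unique.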
