
Inner defined functions in PySpark

Initially I put date_compare_(date1, date2) as a method of the whole class, but it kept raising an error. Does that mean we cannot call a function defined outside the function itself inside map or filter, like a method of a class? Specifically, I first made date_compare_(date1, date2) a class method and it did not work. It seems it only works when I put everything into one function:

def extract_neighbors_from_users_network(self):
    end_date = self.arguments_dict['end_day']
    print("===================================")
    print(end_date)
    print("===================================")
    print("===================================")

    def date_compare_(date1, date2):
        date1_arr = date1.split("-")
        date2_arr = date2.split("-")
        for i in range(len(date1_arr)):
            if date1_arr[i] < date2_arr[i]:
                return True
            elif date1_arr[i] > date2_arr[i]:
                return False
        return True

    def date_filter_(date, start_date, end_date):
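        # NB: the stray self. on the next line pulls the whole instance into
        # the closure (see the UPDATE in the answer below)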
        return date_compare_(start_date, date) and self.date_compare_(date, end_date)

    def date_filter1(x):
        return date_filter_(x[0], "0000-00-00", end_date)

    rdd = self.sc.textFile(action_file).map(lambda x: x.split(',')).filter(lambda x: date_filter1(x)).filter(lambda x: x[4] == 'F')
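
As an aside, since both dates here are zero-padded "YYYY-MM-DD" strings (the filter starts from "0000-00-00"), the per-field loop in date_compare_ is equivalent to a plain string comparison, so it could be reduced to:

def date_compare_(date1, date2):
    # zero-padded ISO-style dates order correctly as plain strings
    return date1 <= date2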

From rdd = self.sc.textFile I guess your original class is something like:

class YourClass():
    def __init__(self):
        self.sc = SparkContext()

    def date_compare_(self, date1, date2):
        pass

    def extract_neighbors_from_users_network(self):
        rdd = self.sc.textFile().map(self.date_compare_)

If so, you should change date_compare_ to:

@staticmethod
def date_compare_(date1, date2):
    pass

And:

def extract_neighbors_from_users_network(self):
    rdd = self.sc.textFile().map(YourClass.date_compare_)
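
Put together, a minimal runnable sketch of that fix might look like the following; the input path, the column layout, and the end date are placeholders, not part of the original code:

from pyspark import SparkContext

class YourClass:
    def __init__(self):
        # local[*] master so the sketch runs standalone
        self.sc = SparkContext("local[*]", "date-filter-sketch")

    @staticmethod
    def date_compare_(date1, date2):
        # no self parameter, so nothing here drags the instance along
        return date1 <= date2

    def extract_neighbors_from_users_network(self):
        end_date = "2024-12-31"  # placeholder end date
        # referencing the function through the class, not through self,
        # keeps SparkContext out of the serialized closure
        return (self.sc.textFile("actions.csv")  # placeholder path
                    .map(lambda line: line.split(','))
                    .filter(lambda x: YourClass.date_compare_(x[0], end_date))
                    .count())

Moving date_compare_ to module level achieves the same thing, since the closure then holds no reference to the instance at all.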

UPDATE:

If you reference self.date_compare_ inside rdd.map(), Spark will serialize the whole instance of YourClass and send it to the executors with the tasks. That by itself is OK.

But! The instance of YourClass contains a SparkContext(), which cannot be referenced on executors. That is why the error comes up.
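
A related workaround, sketched here assuming the @staticmethod version of date_compare_ from above, is to copy whatever you need out of self into local variables before building the closure, so the lambdas capture only plain values:

def extract_neighbors_from_users_network(self):
    # copy instance state into locals first; the lambdas then close over
    # plain strings and a class-level function instead of over self
    end_date = self.arguments_dict['end_day']
    compare = YourClass.date_compare_  # assumes the @staticmethod version above
    return (self.sc.textFile(action_file)  # action_file as in the question
                .map(lambda x: x.split(','))
                .filter(lambda x: compare(x[0], end_date))
                .filter(lambda x: x[4] == 'F'))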
