Inner defined functions in PySpark
Initially I put date_compare_(date1, date2) as a method of the whole class, but it kept reporting an error. Does that mean we cannot call a function defined outside the function itself, such as a class method, inside map or filter? Specifically, I first made date_compare_(date1, date2) a class method, but it did not work. It only works when I put everything into one function:
def extract_neighbors_from_users_network(self):
    end_date = self.arguments_dict['end_day']
    print("===================================")
    print(end_date)
    print("===================================")
    print("===================================")

    def date_compare_(date1, date2):
        date1_arr = date1.split("-")
        date2_arr = date2.split("-")
        for i in range(len(date1_arr)):
            if date1_arr[i] < date2_arr[i]:
                return True
            elif date1_arr[i] > date2_arr[i]:
                return False
        return True

    def date_filter_(date, start_date, end_date):
        return date_compare_(start_date, date) and self.date_compare_(date, end_date)

    def date_filter1(x):
        return date_filter_(x[0], "0000-00-00", end_date)

    rdd = self.sc.textFile(action_file).map(lambda x: x.split(',')).filter(lambda x: date_filter1(x)).filter(lambda x: x[4] == 'F')
From rdd = self.sc.textFile, I guess your class initially looked something like this:

class YourClass():
    def __init__(self):
        self.sc = SparkContext()

    def date_compare_(self, date1, date2):
        pass

    def extract_neighbors_from_users_network(self):
        rdd = self.sc.textFile().map(self.date_compare_)
If so, you should change date_compare_ to:

@staticmethod
def date_compare_(date1, date2):
    pass
And:

def extract_neighbors_from_users_network(self):
    rdd = self.sc.textFile().map(YourClass.date_compare_)
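To see the corrected shape end to end without a cluster, here is a minimal runnable sketch. The RDD pipeline is replaced by a plain list comprehension over made-up rows (so it runs anywhere), but date_compare_ keeps the question's component-wise comparison and the filter logic is the same:

```python
class YourClass:
    @staticmethod
    def date_compare_(date1, date2):
        # Return True when date1 <= date2, comparing "-"-separated
        # components left to right, as in the original question.
        date1_arr = date1.split("-")
        date2_arr = date2.split("-")
        for a, b in zip(date1_arr, date2_arr):
            if a < b:
                return True
            if a > b:
                return False
        return True

    @staticmethod
    def date_filter_(date, start_date, end_date):
        # No self anywhere, so nothing extra gets shipped to executors.
        return (YourClass.date_compare_(start_date, date)
                and YourClass.date_compare_(date, end_date))


# Stand-in for the rows an RDD would produce after .map(lambda x: x.split(','))
rows = [["2020-01-05"], ["2021-06-01"]]
kept = [r for r in rows if YourClass.date_filter_(r[0], "0000-00-00", "2020-12-31")]
print(kept)  # [['2020-01-05']]
```

Because zero-padded "YYYY-MM-DD" components compare correctly as strings, the same staticmethod can be passed straight into filter on a real RDD without dragging the instance along.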
UPDATE:

If you reference self.date_compare_ inside rdd.map(), Spark will serialize the whole instance of YourClass and send it to the executors with the task. That by itself is fine. But the instance of YourClass contains a SparkContext(), which cannot be referenced on executors. That is why the error comes up.
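This serialization failure can be reproduced without Spark. The sketch below is an illustration, not Spark's actual machinery: it uses the stdlib pickle module (Spark really uses cloudpickle, but the instance-capture problem is the same), and a threading.Lock stands in for the unpicklable SparkContext. All class and method names here are hypothetical:

```python
import pickle
import threading

class Job:
    def __init__(self):
        # A lock, like a SparkContext, cannot be pickled.
        self.sc = threading.Lock()
        self.end_date = "2020-12-31"

    def date_before_end(self, date):
        # Bound method: serializing it drags the whole instance (self) along.
        return date <= self.end_date

    @staticmethod
    def date_compare_(date1, date2):
        # No reference to self, so nothing extra gets serialized.
        return date1 <= date2


job = Job()

# Shipping the bound method tries to pickle job, which hits the lock.
try:
    pickle.dumps(job.date_before_end)
    bound_ok = True
except (TypeError, pickle.PicklingError, AttributeError):
    bound_ok = False
print("bound method picklable:", bound_ok)  # False

# The staticmethod is pickled by reference and carries no instance state.
static_blob = pickle.dumps(Job.date_compare_)
print("staticmethod picklable:", bool(static_blob))  # True
```

This is exactly why moving date_compare_ to a @staticmethod (or copying the needed fields, such as end_date, into local variables before the lambda) fixes the error.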