PySpark - DataFrame not updated when applying functions in a loop
I'm trying to apply different functions to various columns of a DataFrame depending on a condition. When I do this in a loop, `fn1` is applied successfully on the first iteration, but `df` becomes `None` on the second iteration. I guess the problem is the way I'm initializing `df` in the scope of the loop.
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(10,4,2,3),(20,1,3,4),(30,7,4,5),(40,2,1,9)], schema=['id','metric_1','metric_2', 'metric_3'])
cols_info = [
    {'name': 'metric_1', 'apply_func': 'True', 'method': 'fn1'},
    {'name': 'metric_2', 'apply_func': 'True', 'method': 'fn2'},
    {'name': 'metric_3', 'apply_func': 'True', 'method': 'fn3'},
]

def fn1(df, col):
    # square the values of `col`
    return df.withColumn(col, F.pow(df[col], 2))

def fn2(df, col):
    # replace `col` with its hash
    return df.withColumn(col, F.hash(df[col]))

def fn3(df, col):
    # base-2 logarithm of `col`
    return df.withColumn(col, F.log2(df[col]))

def process_data(df, columns):
    for col in columns:
        if col["apply_func"] == "True":
            if column["method"] == "fn1":
                df = fn1(df, col["name"])
            if column["method"] == "fn2":
                df = fn2(df, col["name"])
            if column["method"] == "fn3":
                df = fn3(df, col["name"])
    return df
```
What is the correct way to apply such transformations with the PySpark DataFrame API?
Can you try writing the functions this way? This worked for me:
```python
from pyspark.sql import functions as F

def fn1(df, col):
    # withColumn returns a new DataFrame; return it to the caller
    df = df.withColumn(col, F.pow(df[col], 2))
    return df

def fn2(df, col):
    df = df.withColumn(col, F.hash(df[col]))
    return df

def fn3(df, col):
    df = df.withColumn(col, F.log2(df[col]))
    return df

def process_data(df, columns):
    for col in columns:
        if col["apply_func"] == "True":
            # the loop variable is `col`, so read the method from it
            if col["method"] == "fn1":
                df = fn1(df, col["name"])
            elif col["method"] == "fn2":
                df = fn2(df, col["name"])
            elif col["method"] == "fn3":
                df = fn3(df, col["name"])
    return df
```
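I think the assignment is necessary, but I'm not very sure. Note also that `process_data` here checks `col["method"]`, whereas the version in the question checks `column["method"]`; `column` is never defined inside that loop (the loop variable is `col`). Someone could improve on my answer.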
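A more compact variant of the same loop, sketched under the assumption that `cols_info` keeps the shape shown in the question: look each `method` string up in a dict instead of chaining `if` checks, and fold over the enabled entries with `functools.reduce`, so every step receives the DataFrame returned by the previous one. The `transforms` parameter and `make_spark_transforms` are hypothetical names introduced here, not part of PySpark.

```python
from functools import reduce


def process_data(df, columns, transforms):
    """Apply transforms[c["method"]] to column c["name"] for each enabled entry.

    Each transform takes (df, column_name) and returns a new DataFrame;
    reduce feeds that result into the next step, which is equivalent to
    reassigning `df` on every loop iteration.
    """
    enabled = [c for c in columns if c["apply_func"] == "True"]
    return reduce(lambda acc, c: transforms[c["method"]](acc, c["name"]), enabled, df)


def make_spark_transforms():
    """Hypothetical registry mapping the question's method names to PySpark transforms.

    pyspark is imported lazily, so process_data itself does not depend on it.
    """
    from pyspark.sql import functions as F

    return {
        "fn1": lambda df, col: df.withColumn(col, F.pow(F.col(col), 2)),
        "fn2": lambda df, col: df.withColumn(col, F.hash(F.col(col))),
        "fn3": lambda df, col: df.withColumn(col, F.log2(F.col(col))),
    }
```

With `df` and `cols_info` from the question, this would be called as `result = process_data(df, cols_info, make_spark_transforms())`. One behavioral difference from the `if` chain: an enabled entry whose `method` is missing from the dict raises `KeyError` instead of being silently skipped.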
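Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.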