How do I pass a parameter to a user-defined function?

Question

I have a user-defined function:

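import math
from pyspark.sql.functions import udf, col
from pyspark.sql.types import FloatType

param1 = "A"

def calculate(type, pos):
    # The coefficients depend on param1, which is currently read from the enclosing scope
    if param1 == "A":
        a, b = 0.05, -0.06
    else:
        a, b = 0.15, -0.16
    return a * math.pow(type, b) * max(pos, 1)

calc = udf(calculate, FloatType())

result = df.withColumn('col1', calc(col('type'), col('pos'))).groupBy('pk').sum('events')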

I need to pass the parameter param1 to this udf as an argument. How can I do that?

Answer 1

You can wrap the value in lit (or typedLit, which is available in Scala) and pass it to your udf as an additional column argument, like this:

In Python:

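from pyspark.sql.functions import udf, col, lit
from pyspark.sql.types import LongType
# Declare the return type; without it the UDF result defaults to a string column
mult = udf(lambda value, multiplier: value * multiplier, LongType())
df = spark.sparkContext.parallelize([(1,), (2,), (3,)]).toDF()
# lit(3) turns the constant into a Column so it can be passed as a UDF argument
df.select(mult(col("_1"), lit(3))).show()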

In Scala:

import org.apache.spark.sql.functions.{udf, col, lit}
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark
val mult = udf((value: Double, multiplier: Double) => value * multiplier)
val df = spark.sparkContext.parallelize(1 to 10).toDF("value")
df.select(mult(col("value"), lit(3)))
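If the parameter is an ordinary Python value known on the driver, another option besides lit is to fix it in a closure before wrapping the function as a udf. The following is only a sketch under that assumption: make_calculate and make_calc_udf are hypothetical helper names, not part of PySpark, and the coefficients are copied from the question.

```python
import math


def make_calculate(param1):
    """Return a two-argument function with param1 already fixed (hypothetical helper)."""
    # Choose the coefficients once, when the function is built
    a, b = (0.05, -0.06) if param1 == "A" else (0.15, -0.16)

    def calculate(type_, pos):
        return a * math.pow(type_, b) * max(pos, 1)

    return calculate


def make_calc_udf(param1):
    """Wrap the closure as a Spark UDF (hypothetical helper)."""
    # Imported here so the pure-Python part above can be used without Spark installed
    from pyspark.sql.functions import udf
    from pyspark.sql.types import FloatType

    return udf(make_calculate(param1), FloatType())


# Usage sketch, assuming df has numeric columns 'type' and 'pos':
#   from pyspark.sql.functions import col
#   calc = make_calc_udf("A")
#   result = df.withColumn('col1', calc(col('type'), col('pos')))
```

With this variant the udf keeps its two-column signature, and a different parameter value means building a new udf with make_calc_udf.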

Answer 2

If you insist, you can register the function under a name and pass the argument when you call it from SQL:

def calculate(param1):
    return param1 * param1

sqlContext.udf.register("square", calculate)
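To connect the registration above to the question, a registered function can receive the parameter as a string literal inside the SQL text. This is a rough sketch, assuming a SparkSession named spark and a DataFrame df with numeric columns type and pos; the view name events_table, the UDF name calc, and the helper run_with_sql are made up for illustration.

```python
import math


def calculate(type_, pos, param1):
    # Same formula as in the question, with param1 as an explicit third argument
    a, b = (0.05, -0.06) if param1 == "A" else (0.15, -0.16)
    return a * math.pow(type_, b) * max(pos, 1)


def run_with_sql(spark, df, param1="A"):
    """Register calculate as a SQL function and call it with param1 as a literal."""
    # Imported here so calculate itself can be used without Spark installed
    from pyspark.sql.types import FloatType

    spark.udf.register("calc", calculate, FloatType())
    df.createOrReplaceTempView("events_table")  # hypothetical view name
    # param1 is interpolated into the SQL text, so only use trusted values here
    query = "SELECT *, calc(`type`, pos, '{}') AS col1 FROM events_table".format(param1)
    return spark.sql(query)
```

Calling run_with_sql(spark, df, "A") should return df with an extra col1 column; the string interpolation is kept deliberately simple and is not safe for untrusted input.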

Answer 3

I am not sure whether this will work, but can you try the following?

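import math
from pyspark.sql.functions import udf, col, lit
from pyspark.sql.types import FloatType

def calculate(type, pos, param1):
    if param1 == "A":
        a, b = 0.05, -0.06
    else:
        a, b = 0.15, -0.16
    return a * math.pow(type, b) * max(pos, 1)

calc = udf(calculate, FloatType())

param1 = "A"
# Use any value other than "A" to select the second pair of coefficients

# Wrap param1 in lit(); a bare Python string would be read as a column name
result = df.withColumn('col1', calc(col('type'), col('pos'), lit(param1))).groupBy('pk').sum('events')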


Answer 1: score 6, accepted, 2017-11-13 09:53:45

Answer 2: score 0, 2017-11-13 09:38:25

Answer 3: score -1, 2017-11-13 09:50:16
