

How to pass a DataFrame to a Spark UDF?

I want to define a UDF. In the function body, it should look up data in an external DataFrame. How can I do that? I tried passing the DataFrame to the UDF, but it does not work.

Sample code:

val countryDF = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("Country.csv")

val geo = (originString: String, dataFrame: DataFrame) => {
  // Search data from countryDF
  val row = dataFrame.where(col("CountryName") === originString)
  if (row != Nil){
    // set data to row index 2
    row.getAs[String](2)
  }
  else{
    "0"
  }
}
val udfGeo = udf(geo)

val cLatitudeAndLongitude = udfGeo(countryTestDF.col("CountryName"), lit(countryDF))

countryTestDF = countryTestDF.withColumn("Latitude", cLatitudeAndLongitude)

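If you want to use a UDF, you have to work on columns, not on a DataFrame object. A UDF is applied row by row and receives the values of the columns you pass to it, so a DataFrame cannot be one of its arguments (lit(countryDF) is not a valid literal), and it cannot be queried from inside the UDF body, which runs on the executors. Instead, create a new column that holds the output of the UDF.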
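To look values up in another DataFrame without passing it to a UDF, a join is the usual tool: it brings the matching row's values into columns of the same row, where column expressions (or a UDF) can use them. A minimal sketch, not part of the original answer, assuming countryDF has string columns CountryName and Latitude with at most one row per country; the Latitude column and the CountryJoin object are hypothetical names:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{broadcast, coalesce, col, lit}

object CountryJoin {
  // Adds a "Latitude" column to countryTestDF by joining on CountryName.
  // Assumption (hypothetical schema): countryDF has string columns
  // "CountryName" and "Latitude", with at most one row per country.
  def withLatitude(countryTestDF: DataFrame, countryDF: DataFrame): DataFrame = {
    val lookup = countryDF.select(col("CountryName"), col("Latitude"))
    countryTestDF
      // broadcast() hints Spark to ship the small lookup table to every executor;
      // a left join keeps every test row, even when no country matches.
      .join(broadcast(lookup), Seq("CountryName"), "left")
      // Unmatched rows get "0", mirroring the fallback in the question.
      .withColumn("Latitude", coalesce(col("Latitude"), lit("0")))
  }
}
```

Because the join runs inside Spark's engine, it also works when the lookup table is too large to fit on a single machine.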

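import org.apache.spark.sql.functions.{col, udf}

// Returns 1 when the two values match, 0 otherwise.
def geo(originString: String, countryName: String): Int =
  if (countryName == originString) 1 else 0

val geoUDF = udf(geo _)

// Assumes countryDF has both an "originString" and a "CountryName" column.
val newData = countryDF.withColumn("isOriginOrNot", geoUDF(col("originString"), col("CountryName")))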
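If the lookup really has to happen inside a UDF, one option (a different technique from the answer above, not the original author's) is to collect the small lookup DataFrame into a plain Scala Map on the driver, broadcast it, and let the UDF read from that Map. The UDF then still receives only a column value. A sketch assuming countryDF is small enough to collect and has string columns CountryName and Latitude; CountryLookup is a hypothetical name:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object CountryLookup {
  // Adds a "Latitude" column by looking up each CountryName in countryDF.
  // Assumption (hypothetical schema): countryDF is small and has string
  // columns "CountryName" and "Latitude".
  def withLatitude(spark: SparkSession, countryTestDF: DataFrame, countryDF: DataFrame): DataFrame = {
    // Collect the lookup table into a plain Scala Map on the driver.
    val latByCountry: Map[String, String] = countryDF
      .select("CountryName", "Latitude")
      .collect()
      .map(r => r.getString(0) -> r.getString(1))
      .toMap

    // Broadcast the Map so each executor gets one read-only copy.
    val latBc = spark.sparkContext.broadcast(latByCountry)

    // The UDF receives a column value (a String), never a DataFrame;
    // unknown countries fall back to "0", as in the question.
    val lookupLat = udf((name: String) => latBc.value.getOrElse(name, "0"))

    countryTestDF.withColumn("Latitude", lookupLat(col("CountryName")))
  }
}
```

This only works while the lookup table fits in driver memory; for larger tables a join is the safer choice.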
