
getting Schema for type org.apache.spark.sql.Column is not supported while running UDF in spark dataframe

I am trying to concatenate an array of columns in a Spark DataFrame; I receive the array of columns from a Spark Scala UDF.

Here's my code:

val aaa = Map(("00", "DELHI") -> (List("key1", "key2", "key3"), List("a")))

val sampleDf = sparksession.createDataFrame(
  List(("00", "DELHI", "111", "222", "333"), ("00", "SP", "123123123", "231231231", "312312312"))
).toDF("RecordType", "CITY", "key1", "key2", "key3")

val test2 = sampleDf.withColumn("primayKEY", concat(getprimakey(aaa)(col("RecordType"), col("CITY")))).show()

def getprimakey(mapconfig: Map[(String, String), (List[String], List[String])]) = udf((rec: String, layout: String) => {
  println(rec + "" + layout)
  val s = mapconfig(rec, layout)._1.map(x => col(x)).toArray
  s
})

Below is the error I am getting:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Column is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:733)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:693)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
    at org.apache.spark.sql.functions$.udf(functions.scala:3088)
    at com.rupesh.TEST_UDF$.getprimakey(TEST_UDF.scala:29)
    at com.rupesh.TEST_UDF$.main(TEST_UDF.scala:19)
    at com.rupesh.TEST_UDF.main(TEST_UDF.scala)

The exception occurs because a UDF's return type must map to a Spark SQL data type, and org.apache.spark.sql.Column is a query-plan expression rather than data, so Spark cannot derive a schema for the Array[Column] your UDF returns. A UDF can also only read the values that are actually passed to it. Since your logic needs arbitrary fields of the row, pass the entire row in with struct("*") and look the values up there:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, struct, udf}

def getprimakey(mapconfig: Map[(String, String), (List[String], List[String])]) =
  udf((rec: String, layout: String, entireRow: Row) => {
    mapconfig.get((rec, layout))               // Option: a missing key becomes None, i.e. null in the result
      .map(_._1)                               // the configured list of key column names
      .map(_.map(entireRow.getAs[String](_)))  // look each name up in the full row
      .map(_.mkString)                         // concatenate the values
  })

sampleDf.withColumn("primayKEY", getprimakey(aaa)(col("RecordType"), col("CITY"), struct("*"))).show() 

+----------+-----+---------+---------+---------+---------+
|RecordType| CITY|     key1|     key2|     key3|primayKEY|
+----------+-----+---------+---------+---------+---------+
|        00|DELHI|      111|      222|      333|111222333|
|        00|   SP|123123123|231231231|312312312|     null|
+----------+-----+---------+---------+---------+---------+
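Note the null in the second row: ("00", "SP") is not a key in aaa, so mapconfig.get returns None, which Spark writes out as null. As a side note, since aaa is an ordinary driver-side Map, the key columns for each (RecordType, CITY) pair are already known while the query is being built, so the concatenation can also be expressed with plain Column operations and no UDF at all. Below is a minimal sketch of that idea (not from the original answer; it assumes the same aaa and sampleDf as above, and the name primaryKeyExpr is made up for illustration):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, concat, lit, when}

// Build one when/otherwise branch per map entry; rows matching no entry fall
// through to the initial null literal, matching the UDF's behaviour.
val primaryKeyExpr: Column = aaa.foldLeft(lit(null).cast("string")) {
  case (acc, ((recordType, city), (keys, _))) =>
    when(col("RecordType") === recordType && col("CITY") === city,
      concat(keys.map(col): _*))
      .otherwise(acc)
}

sampleDf.withColumn("primayKEY", primaryKeyExpr).show()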
