Flatten a Seq of Maps to Map using Type polymorphism in Scala, Spark UDF

I have the following function that flattens a sequence of maps of String to Double. How can I make the String and Double types generic?

val flattenSeqOfMaps = udf { values: Seq[Map[String, Double]] => values.flatten.toMap }
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))

I need something like:

val flattenSeqOfMaps[S,D] = udf { values: Seq[Map[S, D]] => values.flatten.toMap }

Thanks.

Edit 1: I'm using Spark 2.3. I am aware of the higher-order functions in Spark 2.4.

Edit 2: I got a bit closer. What do I need in place of f _ in val flattenSeqOfMaps = udf { f _ }? Please compare the joinMap and flattenSeqOfMaps type signatures below:

scala> val joinMap = udf { values: Seq[Map[String, Double]] => values.flatten.toMap }
joinMap: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))

scala> def f[S,D](values: Seq[Map[S, D]]): Map[S,D] = { values.flatten.toMap}
f: [S, D](values: Seq[Map[S,D]])Map[S,D]

scala> val flattenSeqOfMaps = udf { f _}
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(NullType,NullType,true),Some(List(ArrayType(MapType(NullType,NullType,true),true))))

Edit 3: The following code worked for me:

scala> val flattenSeqOfMaps = udf { f[String,Double] _}
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))

While you could define your function as

import org.apache.spark.sql.functions.udf
import scala.reflect.runtime.universe.TypeTag

def flattenSeqOfMaps[S : TypeTag, D: TypeTag] = udf { 
  values: Seq[Map[S, D]] => values.flatten.toMap
}

and then use specific instances:

val df = Seq(Seq(Map("a" -> 1), Map("b" -> 1))).toDF("val")

val flattenSeqOfMapsStringInt = flattenSeqOfMaps[String, Int]

df.select($"val", flattenSeqOfMapsStringInt($"val") as "val").show
+--------------------+----------------+
|                 val|             val|
+--------------------+----------------+
|[[a -> 1], [b -> 1]]|[a -> 1, b -> 1]|
+--------------------+----------------+

It is also possible to use built-in functions, without any need for explicit generics:

import org.apache.spark.sql.functions.{expr, flatten, map_from_arrays}

def flattenSeqOfMaps_(col: String) = {
  val keys = flatten(expr(s"transform(`$col`, x -> map_keys(x))"))
  val values = flatten(expr(s"transform(`$col`, x -> map_values(x))"))
  map_from_arrays(keys, values)
}

df.select($"val", flattenSeqOfMaps_("val") as "val").show
+--------------------+----------------+
|                 val|             val|
+--------------------+----------------+
|[[a -> 1], [b -> 1]]|[a -> 1, b -> 1]|
+--------------------+----------------+

The following code worked for me:

scala> def f[S,D](values: Seq[Map[S, D]]): Map[S,D] = { values.flatten.toMap}
f: [S, D](values: Seq[Map[S,D]])Map[S,D]

scala> val flattenSeqOfMaps = udf { f[String,Double] _}
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))
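For the record, the NullType signature in Edit 2 comes from eta-expansion: writing udf { f _ } makes the compiler infer the type parameters S and D as Nothing, and Spark's reflection maps Nothing to NullType. Supplying the type arguments explicitly gives udf concrete types to derive the schema from. A sketch of the same fix, outside the REPL:

```scala
import org.apache.spark.sql.functions.udf

def f[S, D](values: Seq[Map[S, D]]): Map[S, D] = values.flatten.toMap

// udf { f _ } eta-expands to f[Nothing, Nothing] _, so Spark's schema
// reflection produces MapType(NullType, NullType, true).

// Explicit type arguments give Spark concrete types to reflect on:
val flattenSeqOfMaps = udf { f[String, Double] _ }
```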
