Spark Scala | Create DataFrame Dynamically

I would like to create DataFrame names dynamically from a collection.

Please see below:

val set1 = Set("category1","category2","category3")

The following function takes a string x from the set as input and generates the corresponding DataFrame:

def catDfgen(x: String): DataFrame = {
    spark.sql(s"select * from table where col1 = '$x'")
}

This is where I need help: I want not only to create each DataFrame but also to generate its name dynamically, so that the result is equivalent to

val category1DF = catDfgen("category1")
val category2DF = catDfgen("category2")

...etc. Would it be possible to do it using the code below?

set1.map( x =>  val $x+"DF" = catDfgen($x))

If not, please suggest an effective method.

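Suman, I believe the following might help with your use case. Scala variable names are fixed at compile time, so they cannot be generated from runtime strings; a Map keyed by the desired name gives the same effect: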

import org.apache.spark.sql.{DataFrame, SparkSession}

object Test extends App {

  val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()

  val set1 = Set("category1","category2","category3")

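  // Key each DataFrame by its intended name ("category1DF", ...) instead of a variable.
  val dfs: Map[String, DataFrame] = set1.map(x =>
    (s"${x}DF", spark.sql(s"select * from table where col1 = '$x'").alias(s"${x}DF").toDF())
  ).toMap

  // Look up a DataFrame by its generated name.
  dfs("category1DF").show()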

  spark.stop()
}
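
The same name-to-value pattern can be pulled out into a small helper. This is only a sketch: `NamedValues` and `keyedBy` are hypothetical names, not part of Spark or the question, and the builder here is a stub that returns the query text so the example runs without a Spark session. With Spark, you would presumably pass `catDfgen` as the builder and get a `Map[String, DataFrame]` back.

```scala
object NamedValues {
  // Build a Map from "<name><suffix>" to the value produced for that name.
  // `build` stands in for a function such as catDfgen: String => DataFrame.
  def keyedBy[A](names: Set[String], suffix: String)(build: String => A): Map[String, A] =
    names.map(n => s"$n$suffix" -> build(n)).toMap

  def main(args: Array[String]): Unit = {
    val set1 = Set("category1", "category2", "category3")

    // Stub builder (an assumption for illustration): returns the query text
    // instead of running it, so no Spark session is needed.
    val queries = keyedBy(set1, "DF")(x => s"select * from table where col1 = '$x'")

    // Map.get returns an Option, so an unknown name does not throw,
    // unlike queries("category9DF").
    println(queries.get("category1DF"))
    println(queries.get("category9DF"))
  }
}
```

Looking names up with `get` rather than `dfs("...")` avoids a `NoSuchElementException` when a category is missing from the set.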
