
How to convert rdd object to dataframe in Scala

I read data from Elasticsearch and save it into an RDD:

val es_rdd = sc.esRDD("indexname/typename",query="?q=*")

The RDD contains entries like the following:

(uniqueId,Map(field -> value))
(uniqueId2,Map(field2 -> value2))

How can I convert this RDD[(String, Map[String, String])] to a DataFrame of (String, String, String)?

You can use explode to achieve this:

  import spark.implicits._
  import org.apache.spark.sql.functions._

  val rdd = sc.range(1, 10).map(s => (s, Map(s -> s)))
  val ds = spark.createDataset(rdd)
  val df = ds.toDF()
  df.printSchema()
  df.show()

  df.select('_1,explode('_2)).show()

output:

root
 |-- _1: long (nullable = false)
 |-- _2: map (nullable = true)
 |    |-- key: long
 |    |-- value: long (valueContainsNull = false)

+---+--------+
| _1|      _2|
+---+--------+
|  1|[1 -> 1]|
|  2|[2 -> 2]|
|  3|[3 -> 3]|
|  4|[4 -> 4]|
|  5|[5 -> 5]|
|  6|[6 -> 6]|
|  7|[7 -> 7]|
|  8|[8 -> 8]|
|  9|[9 -> 9]|
+---+--------+

+---+---+-----+
| _1|key|value|
+---+---+-----+
|  1|  1|    1|
|  2|  2|    2|
|  3|  3|    3|
|  4|  4|    4|
|  5|  5|    5|
|  6|  6|    6|
|  7|  7|    7|
|  8|  8|    8|
|  9|  9|    9|
+---+---+-----+
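The same pattern applies to the question's RDD[(String, Map[String, String])]. A minimal sketch, assuming the variable names (es_rdd, spark, sc) from the question and hypothetical column names id/fields:

```scala
import spark.implicits._
import org.apache.spark.sql.functions._

// es_rdd: RDD[(String, Map[String, String])] from the question
val ds = spark.createDataset(es_rdd)
val df = ds.toDF("id", "fields")

// explode turns each map entry into its own row,
// yielding three string columns: id, key, value
df.select($"id", explode($"fields")).show()
```

If you want the exploded columns renamed, add `.toDF("id", "field", "value")` after the select.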

I ended up reading the data directly as a Spark SQL DataFrame using the following Elasticsearch call:

val df = spark.read.format("org.elasticsearch.spark.sql")
      .option("query", "?q=*")
      .option("pushdown", "true")
      .load("indexname/typename")
