
Spark dataframe map root key with elements of array of another column of string type

I'm stuck on a problem: I have a DataFrame with two columns and the following schema:

    scala> df1.printSchema
    root
     |-- actions: string (nullable = true)
     |-- event_id: string (nullable = true)

The actions column actually contains an array of objects, but its type is string, so I can't use explode on it directly.

Sample data:

------------------------------------------------------------------------------------------------------------------
| event_id |                                        actions                                                       |
------------------------------------------------------------------------------------------------------------------
|    1     | [{"name": "Vijay", "score": 843},{"name": "Manish", "score": 840}, {"name": "Mayur", "score": 930}]  |
------------------------------------------------------------------------------------------------------------------

Each action object has some other keys as well, but for simplicity I've shown only two here.

I want to convert it to the following format:

OUTPUT:-

---------------------------------------
| event_id | name      |    score      |
---------------------------------------
|   1      | Vijay     |    843        |
---------------------------------------
|   2      | Manish    |    840        |
---------------------------------------
|   3      | Mayur     |    930        |
---------------------------------------

How can I do this with a Spark DataFrame?

I tried reading the actions column like this:

val df2= spark.read.option("multiline",true).json(df1.rdd.map(row => row.getAs[String]("actions")))

But this way I can't map the event_id onto each resulting row.

You can use the from_json function to do this. It takes two inputs:

  • the column to read the JSON string from
  • the schema used to parse the JSON string

It looks like this:

import spark.implicits._
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Reading in your data
val df = spark.read.option("sep", ";").option("header", "true").csv("./csvWithJson.csv")

df.show(false)
+--------+---------------------------------------------------------------------------------------------------+
|event_id|actions                                                                                            |
+--------+---------------------------------------------------------------------------------------------------+
|1       |[{"name": "Vijay", "score": 843},{"name": "Manish", "score": 840}, {"name": "Mayur", "score": 930}]|
+--------+---------------------------------------------------------------------------------------------------+


// Creating the necessary schema for the from_json function
val actionsSchema = ArrayType(
  new StructType()
    .add("name", StringType)
    .add("score", IntegerType)
  )

// Parsing the json string into our schema, exploding the column to make one row
// per json object in the array and then selecting the wanted columns,
// unwrapping the parsedActions column into separate columns
val parsedDf = df
  .withColumn("parsedActions",explode(from_json(col("actions"), actionsSchema)))
  .drop("actions")
  .select("event_id", "parsedActions.*")

parsedDf.show(false)
+--------+------+-----+
|event_id|  name|score|
+--------+------+-----+
|       1| Vijay|  843|
|       1|Manish|  840|
|       1| Mayur|  930|
+--------+------+-----+
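
If you don't have a CSV file at hand, the same approach can be tried with the sample row built in memory. This is only a minimal sketch under a few assumptions: the local SparkSession settings are placeholders, and the "level" key is a hypothetical stand-in for the extra keys the question mentions but does not show. It illustrates that the schema only needs to list the fields you want, since from_json ignores keys that are not in the schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructType}

// Assumption: a throwaway local session; in spark-shell, getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[1]").appName("actions-sketch").getOrCreate()
import spark.implicits._

// The sample row from the question, built in memory.
// "level" is a hypothetical extra key, not taken from the original data.
val actionsJson =
  """[{"name": "Vijay", "score": 843, "level": 3},{"name": "Manish", "score": 840}, {"name": "Mayur", "score": 930}]"""
val df1 = Seq(("1", actionsJson)).toDF("event_id", "actions")

// Only the wanted fields; keys missing from the schema are ignored by from_json
val actionsSchema = ArrayType(
  new StructType()
    .add("name", StringType)
    .add("score", IntegerType)
)

// One output row per array element, each carrying its event_id along
val parsed = df1
  .select(col("event_id"), explode(from_json(col("actions"), actionsSchema)).as("action"))
  .select("event_id", "action.*")

parsed.show(false)
```

One thing to keep in mind: if actions is null for some row, from_json returns null and explode produces no row for that event_id. Swapping explode for explode_outer (also in org.apache.spark.sql.functions) should keep such rows, with null name and score.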

Hope this helps!

