Getting movie genres from a JSON structure using Spark Scala

Question

I have a DataFrame df_movies with a genres column whose values look like JSON.

|genres|
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]

How can I extract the value of the first 'name' field?

Approach #1

df_movies.withColumn(
    "genres_extract",
    regexp_extract(col("genres"), """ 'name': (\w+)""", 1)
).show(false)

Approach #2

df_movies.withColumn(
    "genres_extract",
    regexp_extract(col("genres"), """[{'id':\s\d,\s 'name':\s(\w+)""", 1)
)

Expected: Action

Answer 1

You can use the get_json_object function:

  Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
    .toDF("genres")
    .withColumn("genres_extract", get_json_object(col("genres"), "$[0].name" ))
    .show()


+--------------------+--------------+
|              genres|genres_extract|
+--------------------+--------------+
|[{"id": 28, "name...|        Action|
+--------------------+--------------+
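If you would rather stay with regexp_extract as in the question, note that the sample values wrap each name in single quotes, so a capture group of `\w+` placed directly after `'name': ` has nothing to match. The plain Scala sketch below (no Spark required) compares the question's first pattern with an adjusted one. The object name `GenreRegexSketch` and the adjusted pattern are illustrative assumptions, not part of the original post.

```scala
import scala.util.matching.Regex

object GenreRegexSketch {
  // Sample value copied from the question (single-quoted, Python-style).
  val genres: String =
    "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]"

  // Pattern from the question's first attempt: \w+ cannot match the opening quote.
  val original: Regex = """ 'name': (\w+)""".r

  // Adjusted pattern (an assumption about the intent): step over the quotes
  // and capture everything between them, so names with spaces also work.
  val adjusted: Regex = """'name':\s*'([^']+)'""".r

  // Returns the first captured group, or None when the pattern does not match.
  def firstName(pattern: Regex): Option[String] =
    pattern.findFirstMatchIn(genres).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(firstName(original)) // None
    println(firstName(adjusted)) // Some(Action)
  }
}
```

regexp_extract uses Java regular expressions as well, so the adjusted pattern should carry over as its second argument with group index 1. When nothing matches, regexp_extract returns an empty string rather than null.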

Answer 2

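Another possibility is to use the from_json function with a custom schema. This lets you "unpack" the JSON structure into a DataFrame that contains all of the data, so you can work with it however you like.

Something like the following: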

import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Sample input: one row holding a JSON array of genre objects
val df = Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
  .toDF("genres")

// Creating the necessary schema for the from_json function
val moviesSchema = ArrayType(
  new StructType()
    .add("id", StringType)
    .add("name", StringType)
)

// Parsing the json string into our schema, exploding the column to make one row
// per json object in the array and then selecting the wanted columns,
// unwrapping the parsedMovies column into separate columns
val parsedDf = df
  .withColumn("parsedMovies", explode(from_json(col("genres"), moviesSchema)))
  .select("parsedMovies.*")

parsedDf.show(false)
+---+---------+
| id|     name|
+---+---------+
| 28|   Action|
| 12|Adventure|
| 37|  Western|
+---+---------+
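Since the question only asks for the first name, the parsed array can also be indexed directly instead of exploded. The following is a minimal sketch under a few assumptions: it runs against a local SparkSession, and the names `FirstGenreSketch`, `firstGenre`, and `genreSchema` are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object FirstGenreSketch {
  // Schema for the JSON array: each element is an object with id and name.
  val genreSchema: ArrayType = ArrayType(
    new StructType()
      .add("id", StringType)
      .add("name", StringType)
  )

  // Parses the genres string and keeps only the name of the first array element.
  def firstGenre(df: DataFrame): DataFrame =
    df.withColumn("parsed", from_json(col("genres"), genreSchema))
      .withColumn("genres_extract", col("parsed").getItem(0).getField("name"))
      .select("genres", "genres_extract")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; reuse your existing session in practice.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("first-genre-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      """[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]"""
    ).toDF("genres")

    firstGenre(df).show(false)
    spark.stop()
  }
}
```

This keeps one output row per input row, which is convenient when the extracted name should sit next to the other columns of df_movies.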

2 solutions

Solution 1
1 Accepted 2022-12-12 07:21:50

Solution 2
0 2022-12-12 07:53:48
