Scala Spark: Dataset with a JSON column

Question

Hello from a Spark beginner!

I have a DataFrame that includes several columns, let's say ID, name, and properties. All of them are of type string. The last column, properties, holds a JSON representation of some properties of the object.

I am looking for a way to iterate over the DataFrame, parse the JSON, extract a specific JSON field from each item, and append that field to the row as a new column.

So far I'm a bit lost. I know that Spark can import JSON datasets (that's not what I have) and that there's a net.liftweb.json library, but unfortunately I haven't found a way to make it work. This attempt:

val users = sqlContext.table("user")
  .withColumn("parsedProperties", parse($"properties"))

returns a type mismatch: the parse() function expects a String, and I'm passing it a Column ($"properties").

Note that I do NOT have a fixed schema for this JSON column.

Thank you in advance!

Answer 1

You need to create a UDF from the function parse, and then apply that UDF to the column.

import org.apache.spark.sql.functions.udf
val parse_udf = udf( parse _ )

val users = sqlContext.table("user")
  .withColumn("parsedProperties", parse_udf($"properties"))

Answer 2

Working now! Thank you!

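import net.liftweb.json._
import org.apache.spark.sql.functions.udf

// Parse the JSON string and return its "email" field, or "" if the field is absent.
val getEmail: String => String = parse(_).asInstanceOf[JObject].values.getOrElse("email", "").toString
val getEmailUDF = udf(getEmail)
val users = sqlContext.table("user")
  .withColumn("email", getEmailUDF($"properties"))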
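For anyone landing here later: since only one field is needed and the JSON has no fixed schema, Spark's built-in get_json_object function may be enough, with no UDF or external JSON library. The following is a minimal sketch, not a tested drop-in replacement. It assumes Spark 2.x or later (SparkSession), and the sample rows, the local master, and the object name JsonColumnSketch are made up for illustration. Unlike the UDF above, it yields null rather than "" when the field is missing.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

// Hypothetical stand-alone example; the names and data are illustrative only.
object JsonColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local run for demonstration
      .appName("json-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for sqlContext.table("user"): three string columns,
    // the last one holding a JSON document with no fixed schema.
    val users = Seq(
      ("1", "alice", """{"email":"alice@example.com","age":30}"""),
      ("2", "bob",   """{"age":41}""")
    ).toDF("ID", "name", "properties")

    // Extract the top-level "email" field via a JSON path expression.
    // Rows whose JSON lacks the field get null in the new column.
    val withEmail = users.withColumn("email", get_json_object($"properties", "$.email"))

    withEmail.show(false)
    spark.stop()
  }
}
```

If the empty-string default matters, the result can be wrapped with coalesce and lit("") from org.apache.spark.sql.functions.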
