
Extract a JSON from an array inside a JSON in Spark

I have a complicated JSON column whose structure is:

story: { cards: [ { story-elements: [ {...}, {...}, {...} ] } ] }

The length of the story-elements array is variable. I need to extract a particular JSON block from the story-elements array, so I first need to extract story-elements itself.

Here is the code I have tried, but it gives an error:

import org.json4s.{DefaultFormats, MappingException}
import org.json4s.jackson.JsonMethods._
import org.apache.spark.sql.functions._

def getJsonContent(jsonstring: String): String = {
  implicit val formats = DefaultFormats
  val parsedJson = parse(jsonstring)
  val value1 = (parsedJson \ "cards" \ "story-elements").extract[String]
  value1
}

val getJsonContentUDF = udf((jsonstring: String) => getJsonContent(jsonstring))

input.withColumn("cards", getJsonContentUDF(input("storyDataFrame")))

According to the JSON you provided, story-elements is an array of JSON objects, but you are trying to extract that array as a string ((parsedJson \ "cards" \ "story-elements").extract[String]).

You can create a case class representing one story element (e.g. case class Story(description: String, pageUrl: String, ...)) and then, instead of extract[String], try extract[List[Story]] or extract[Array[Story]]. If you need just one piece of data from each story element (e.g. the description), you can use the xpath-like syntax to navigate to it and then extract a List[String]; see the sketch below.
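A minimal sketch of both approaches, assuming the story-elements objects have fields named description and pageUrl (hypothetical names, replace them with whatever your JSON actually contains) and that cards holds a single card whose story-elements is the array you want:

import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods._
import org.apache.spark.sql.functions.udf

// Hypothetical field names -- adjust to your actual story-element structure
case class Story(description: String, pageUrl: String)

def getStoryElements(jsonstring: String): List[Story] = {
  implicit val formats = DefaultFormats
  val parsedJson = parse(jsonstring)
  // story-elements is a JSON array, so extract it as a List of case classes
  (parsedJson \ "cards" \ "story-elements").extract[List[Story]]
}

// If only one field is needed (e.g. description), navigate to it with the
// xpath-like operators and extract a List[String] instead of a case class
def getDescriptions(jsonstring: String): List[String] = {
  implicit val formats = DefaultFormats
  (parse(jsonstring) \ "cards" \ "story-elements" \ "description").extract[List[String]]
}

// Wrapped in a UDF, the single-field variant yields an array<string> column
val getDescriptionsUDF = udf((jsonstring: String) => getDescriptions(jsonstring))
input.withColumn("descriptions", getDescriptionsUDF(input("storyDataFrame")))

If there are several cards, the same navigation returns one array per card, so you would flatten the result (or model Card and Story as nested case classes) before extracting.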
