I have a complicated JSON column with the following structure:
story { cards: [ { story-elements: [ {...}, {...}, {...} ] } ] }
The length of the story-elements is variable. I need to extract a particular JSON block from the story-elements array. For this, I first need to extract the story-elements.
Here is the code I have tried, but it gives an error:
import org.json4s.{DefaultFormats, MappingException}
import org.json4s.jackson.JsonMethods._
import org.apache.spark.sql.functions._
def getJsonContent(jsonstring: String): String = {
  implicit val formats = DefaultFormats
  val parsedJson = parse(jsonstring)
  val value1 = (parsedJson\"cards"\"story-elements").extract[String]
  value1
}
val getJsonContentUDF = udf((jsonstring: String) => getJsonContent(jsonstring))
input.withColumn("cards", getJsonContentUDF(input("storyDataFrame")))
According to the JSON you provided, story-elements is an array of JSON objects, but you are trying to extract that array as a string: (parsedJson\"cards"\"story-elements").extract[String].
You can create a case class representing one story (like case class Story(description: String, pageUrl: String, ...)) and then, instead of extract[String], try extract[List[Story]] or extract[Array[Story]].
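A minimal sketch of that suggestion, assuming the column holds a JSON object with a top-level cards array and that every story element carries description and pageUrl string fields. The Story fields and the StoryElements object name are illustrative assumptions, not taken from the real data. The sketch walks the cards explicitly and collects the elements of every card into one flat list, rather than relying on how \ behaves on nested arrays:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Hypothetical shape of one story element; adjust the fields to the real data.
case class Story(description: String, pageUrl: String)

object StoryElements {
  implicit val formats: Formats = DefaultFormats

  // Collects the story-elements of every card into a single list of Story.
  def extractStories(jsonString: String): List[Story] = {
    val parsed = parse(jsonString)
    // "cards" is assumed to be an array; children yields its card objects.
    val cards = (parsed \ "cards").children
    // Each card's "story-elements" array is flattened into one list.
    val elements = cards.flatMap(card => (card \ "story-elements").children)
    elements.map(_.extract[Story])
  }
}
```

If some elements lack a field, declaring it as Option[String] in the case class lets extraction yield None instead of throwing a MappingException.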
If you need just one piece of data from each story (e.g. description), you can use the XPath-like syntax to select it and then extract a List[String].
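A rough sketch of pulling out a single field, again assuming a top-level cards array and a description string field on the story elements (both assumptions about the data, as is the StoryDescriptions name). It uses extractOpt so that elements without a description are skipped rather than causing an exception:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object StoryDescriptions {
  implicit val formats: Formats = DefaultFormats

  // Returns the description of every story element that has one.
  def descriptions(jsonString: String): List[String] = {
    val parsed = parse(jsonString)
    // Flatten the story-elements of all cards into one list of JSON values.
    val elements = (parsed \ "cards").children
      .flatMap(card => (card \ "story-elements").children)
    // extractOpt gives None when the field is missing or is not a string.
    elements.flatMap(element => (element \ "description").extractOpt[String])
  }
}
```

Because the result is a plain List[String], a function like this can be wrapped in a Spark udf in the same way as getJsonContent in the question.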