I was trying to get the data from json data which I got it from wiki api
I was able to print the schema of that exactly
scala> data.printSchema
root
|-- batchcomplete: string (nullable = true)
|-- query: struct (nullable = true)
| |-- pages: struct (nullable = true)
| | |-- 28597189: struct (nullable = true)
| | | |-- ns: long (nullable = true)
| | | |-- pageid: long (nullable = true)
| | | |-- revisions: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- *: string (nullable = true)
| | | | | |-- contentformat: string (nullable = true)
| | | | | |-- contentmodel: string (nullable = true)
| | | |-- title: string (nullable = true)
I want to extract the data of the key "*" |-- *: string (nullable = true)
Please suggest me a solution.
One problem is
pages: struct (nullable = true)
| | |-- 28597189: struct (nullable = true)
the number 28597189 is unique to every title.
First we need to parse the json to get the key (28597189) dynamically then use this to extract the data from spark dataframe like below
val keyName = dataFrame.selectExpr("query.pages.*").schema.fieldNames(0)
println(s"Key Name : $keyName")
this will give you the key dynamically:
Key Name : 28597189
Then use this to extract the data
var revDf = dataFrame.select(explode(dataFrame(s"query.pages.$keyName.revisions")).as("revision")).select("revision.*")
revDf.printSchema()
Output:
root
|-- *: string (nullable = true)
|-- contentformat: string (nullable = true)
|-- contentmodel: string (nullable = true)
and we will be renaming the column *
with some key name like star_column
revDf = revDf.withColumnRenamed("*", "star_column")
revDf.printSchema()
Output:
root
|-- star_column: string (nullable = true)
|-- contentformat: string (nullable = true)
|-- contentmodel: string (nullable = true)
and once we have our final dataframe we will call show
revDf.show()
Output:
+--------------------+-------------+------------+
| star_column|contentformat|contentmodel|
+--------------------+-------------+------------+
|{{EngvarB|date=Se...| text/x-wiki| wikitext|
+--------------------+-------------+------------+
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.