I am new to Apache Spark and I am trying to compare two JSON files. I need to find out which key/value pairs were added, removed, or modified, and the path of each one.
To explain my problem, here is the code I have tried, along with a small JSON sample.
Sample JSON 1 is:
{
  "employee": {
    "name": "sonoo",
    "salary": 57000,
    "married": true
  }
}
Sample JSON 2 is:
{
  "employee": {
    "name": "sonoo",
    "salary": 58000,
    "married": true
  }
}
My code is:
// Compare two multiline JSON files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Load the first JSON file (wholeTextFiles reads each file as a single record)
val jsonData_1 = sqlContext.read.json(sc.wholeTextFiles("D:\\File_1.json").values)
// Load the second JSON file
val jsonData_2 = sqlContext.read.json(sc.wholeTextFiles("D:\\File_2.json").values)
// Show the rows of the second file that are not present in the first
jsonData_2.except(jsonData_1).show(false)
Executing this code gives the following output:
+--------------------+
|employee |
+--------------------+
|{true, sonoo, 58000}|
+--------------------+
But only one field, salary, was modified here, so the output should contain only the updated field together with its path.
The expected output is:
[
{
"op" : "replace",
"path" : "/employee/salary",
"value" : 58000
}
]
Can anyone point me in the right direction?
Assuming each JSON has an identifier, and that you have two groups of JSON files (e.g. two folders), you need to compare the JSONs between the two groups:
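Once two documents have been paired up, the comparison for a single pair could look something like the following. This is only a minimal sketch in plain Scala, not taken from the original answer and not a Spark API. It assumes each JSON has already been parsed into nested Map[String, Any] values, and the names Op, JsonDiff and diff are made up for illustration. It walks both objects recursively and reports add, remove and replace operations with a slash-separated path, in the shape the question asks for. Arrays are compared as whole values, and keys containing "/" or "~" are not escaped.

```scala
// Hypothetical helper types: Op, JsonDiff and diff are illustrative names,
// not part of Spark or of any JSON library.
final case class Op(op: String, path: String, value: Option[Any])

object JsonDiff {
  // Recursively compare two parsed JSON objects and collect the differences.
  // Assumption: objects are nested Map[String, Any]; everything else
  // (strings, numbers, booleans, arrays) is compared with ==.
  def diff(before: Map[String, Any], after: Map[String, Any], prefix: String = ""): List[Op] = {
    val keys = (before.keySet ++ after.keySet).toList.sorted
    keys.flatMap { key =>
      val path = s"$prefix/$key"
      (before.get(key), after.get(key)) match {
        // Key exists only in the second document
        case (None, Some(v)) => List(Op("add", path, Some(v)))
        // Key exists only in the first document
        case (Some(_), None) => List(Op("remove", path, None))
        // Both sides are objects: descend and extend the path
        case (Some(a: Map[_, _]), Some(b: Map[_, _])) =>
          diff(a.asInstanceOf[Map[String, Any]], b.asInstanceOf[Map[String, Any]], path)
        // Same key, different value
        case (Some(a), Some(b)) if a != b => List(Op("replace", path, Some(b)))
        // Same key, same value
        case _ => Nil
      }
    }
  }
}

object JsonDiffExample {
  def main(args: Array[String]): Unit = {
    // The two sample documents from the question, written as nested Maps
    val json1: Map[String, Any] =
      Map("employee" -> Map[String, Any]("name" -> "sonoo", "salary" -> 57000, "married" -> true))
    val json2: Map[String, Any] =
      Map("employee" -> Map[String, Any]("name" -> "sonoo", "salary" -> 58000, "married" -> true))
    JsonDiff.diff(json1, json2).foreach(println)
  }
}
```

For the two samples in the question this yields a single replace operation at /employee/salary with the value 58000. In a Spark job, a function like this could be applied to each pair of documents after matching them on their identifier, but how the raw JSON is parsed into Maps is left open here.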