My JSON:
{"apps": {"app": [{"id": "id1","user": "hdfs"}, {"id": "id2","user": "yarn"}]}}
Schema:
root
 |-- apps: struct (nullable = true)
 |    |-- app: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- user: string (nullable = true)
My code:
StructType schema = new StructType()
    .add("apps", new StructType()
        .add("app", new StructType())
        .add("element", new StructType()
            .add("id", new StringType())
            .add("user", new StringType())));
Dataset<Row> df = sparkSession.read().schema(schema).json("<path_to_json>");
It gives me this error:
Exception in thread "main" scala.MatchError: org.apache.spark.sql.types.StringType@1fca53a7 (of class org.apache.spark.sql.types.StringType)
df.show() should display:
+---+----+
| id|user|
+---+----+
|id1|hdfs|
|id2|yarn|
+---+----+
You do not need to provide a schema when reading the data; Spark can infer it automatically. However, some manipulation is needed to get the desired output.
First, read the data:
Dataset<Row> df = sparkSession.read().json("<path_to_json>");
Use explode to put each array element on its own row, then use select to unpack the data into separate columns.
import static org.apache.spark.sql.functions.*;

Dataset<Row> result = df.withColumn("app", explode(col("apps.app")))
        .select("app.*");
This should give you a dataframe in the expected format.
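To make the explode-then-select step concrete without a Spark runtime, here is a plain-Java model of what it does to the sample document. It is an illustration only, not Spark's implementation: rows are modeled as maps, and ExplodeSketch, sample and explodeApps are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Spark) of df.withColumn("app", explode(col("apps.app"))).select("app.*")
public class ExplodeSketch {

    // One input row holding {"apps": {"app": [{"id": "id1", "user": "hdfs"}, {"id": "id2", "user": "yarn"}]}}
    static Map<String, Object> sample() {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", "id1");
        first.put("user", "hdfs");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", "id2");
        second.put("user", "yarn");
        Map<String, Object> apps = new LinkedHashMap<>();
        apps.put("app", List.of(first, second));
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("apps", apps);
        return row;
    }

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> explodeApps(List<Map<String, Object>> inputRows) {
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> row : inputRows) {
            Map<String, Object> apps = (Map<String, Object>) row.get("apps");
            List<Map<String, Object>> array =
                    apps == null ? null : (List<Map<String, Object>>) apps.get("app");
            if (array == null) {
                continue; // explode emits no rows for a null or missing array
            }
            // explode: one output row per array element
            for (Map<String, Object> element : array) {
                // select("app.*"): the struct's fields become top-level columns
                output.add(new LinkedHashMap<>(element));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // prints "id1 hdfs" then "id2 yarn"
        for (Map<String, Object> r : explodeApps(List.of(sample()))) {
            System.out.println(r.get("id") + " " + r.get("user"));
        }
    }
}
```

One input row with a two-element array becomes two output rows, which is exactly the shape of the expected df.show() output.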
@saidu's answer is correct. Although Spark will infer the schema automatically, it is advisable to provide the schema explicitly. In this scenario inference works because both fields are strings, but take an example where id holds integer values: schema inference will then type id as long, not int.
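A simplified, hypothetical model of the default JSON typing rule described above: integral numbers become long even when they would fit in an int. It ignores nulls, numbers too large for a long, timestamps and nested values, and inferType is an illustrative name, not a Spark API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of how Spark's JSON reader types scalar values by default (not Spark code).
public class InferSketch {

    static String inferType(Object jsonValue) {
        if (jsonValue instanceof String) return "string";
        if (jsonValue instanceof Boolean) return "boolean";
        // JSON integral numbers are widened to long, even if they would fit in an int
        if (jsonValue instanceof Integer || jsonValue instanceof Long) return "long";
        if (jsonValue instanceof Double || jsonValue instanceof Float) return "double";
        throw new IllegalArgumentException("not modeled: " + jsonValue);
    }

    public static void main(String[] args) {
        Map<String, Object> app = new LinkedHashMap<>();
        app.put("id", 1);        // an integer id instead of "id1"
        app.put("user", "hdfs");
        // prints "id: long" then "user: string"
        app.forEach((field, value) -> System.out.println(field + ": " + inferType(value)));
    }
}
```

If downstream code expects an int column, an explicit schema avoids this widening.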
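I had a similar issue, and relying on the auto-inferred schema was not an option (inference needs an extra pass over the input, which hurt performance). Apparently, the error happens because you are using new StringType() to construct the primitive types: Spark seems to match against the singleton StringType object, so a separately constructed instance matches no case and raises scala.MatchError. Instead, use the singleton instances exposed as public static fields of DataTypes: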
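import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// DataTypes.StringType is the shared singleton; do not call new StringType()
StructType schema = new StructType()
    .add("apps", new StructType()
        // ArrayType has no one-argument constructor in Java; createArrayType sets containsNull = true
        .add("app", DataTypes.createArrayType(new StructType()
            .add("id", DataTypes.StringType)
            .add("user", DataTypes.StringType))));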
Dataset<Row> df = sparkSession
.read()
.schema(schema)
.json("<path_to_json>");
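Why a fresh instance fails can be modeled in plain Java. This sketch assumes, as the MatchError suggests, that the match effectively compares against the singleton by reference; StringTypeModel and convert are hypothetical stand-ins, and IllegalStateException plays the role of scala.MatchError.

```java
// Hypothetical plain-Java model of the failure: a type check written against a
// singleton instance rejects a second, freshly constructed instance.
public class SingletonMatchSketch {

    static class StringTypeModel {
        // plays the role of DataTypes.StringType
        static final StringTypeModel INSTANCE = new StringTypeModel();
    }

    // Mimics a Scala `case StringType => ...` branch, assuming reference comparison
    static String convert(Object dataType) {
        if (dataType == StringTypeModel.INSTANCE) {
            return "string converter";
        }
        // Scala would throw scala.MatchError here
        throw new IllegalStateException("MatchError: " + dataType);
    }

    public static void main(String[] args) {
        System.out.println(convert(StringTypeModel.INSTANCE)); // the singleton matches
        try {
            convert(new StringTypeModel()); // analogous to new StringType() in Java
        } catch (IllegalStateException e) {
            System.out.println("fresh instance rejected: " + e.getMessage());
        }
    }
}
```

Passing the shared instance, as DataTypes.StringType does, is what lets the match succeed.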