Scala - read JSON file as a single String with Spark
I have JSON files describing a table structure. I want to read each file from S3 as a single String so that I can then apply the fromJson() method of org.apache.spark.sql.types.DataType:
DataType.fromJson(jsonString).asInstanceOf[StructType]
But so far I have only managed to read the files into a DataFrame:
val testJsonData = sqlContext.read.option("multiline", "true").json("/s3Bucket/metrics/metric1.json")
But I don't need df.schema; instead, I need to parse the contents of a JSON string into a StructType.
The contents of a JSON file:
{
"type" : "struct",
"fields" : [ {
"name" : "metric_name",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "metric_time",
"type" : "long",
"nullable" : true,
"metadata" : { }
}, {
"name" : "metric_value",
"type" : "string",
"nullable" : true,
"metadata" : { }
}]
}
It looks like what you want to use is sc.wholeTextFiles (sc is a SparkContext in this case). This results in an RDD[(String, String)] where ._1 is the file path and ._2 is the entire file content. Maybe you can try:
val files = sc.wholeTextFiles("/s3Bucket/metrics/", 16)
val schemas = files.map { case (_, content) => DataType.fromJson(content).asInstanceOf[StructType] }
Which, in theory, would give you an RDD[StructType]. I'm keeping this as an RDD rather than calling toDS(), since as far as I know Spark has no built-in Encoder for StructType. Unfortunately, I'm not finding a similar function in the pure Spark SQL API, but this may work.
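To make the idea concrete, here is a minimal, self-contained sketch of the wholeTextFiles approach. It is an illustration under some assumptions, not a tested S3 setup: a local temporary directory stands in for the S3 bucket, the session runs with a local[*] master, and the names SchemaFromJsonFiles and loadSchemas are hypothetical helpers that do not come from Spark. The schema files are small, so the sketch collects the raw strings to the driver and parses them there.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, LongType, StructType}

// Hypothetical helper object; the name is only for illustration.
object SchemaFromJsonFiles {

  // The same schema JSON as in the question.
  val schemaJson: String =
    """{
      |  "type" : "struct",
      |  "fields" : [ {
      |    "name" : "metric_name",
      |    "type" : "string",
      |    "nullable" : true,
      |    "metadata" : { }
      |  }, {
      |    "name" : "metric_time",
      |    "type" : "long",
      |    "nullable" : true,
      |    "metadata" : { }
      |  }, {
      |    "name" : "metric_value",
      |    "type" : "string",
      |    "nullable" : true,
      |    "metadata" : { }
      |  } ]
      |}""".stripMargin

  // Reads every file under `dir` as one String each and parses it into a
  // StructType, keyed by the file path that wholeTextFiles reports.
  def loadSchemas(spark: SparkSession, dir: String): Map[String, StructType] =
    spark.sparkContext
      .wholeTextFiles(dir)   // RDD[(path, whole file content)]
      .collect()             // schema files are tiny, so bring them to the driver
      .map { case (path, content) =>
        path -> DataType.fromJson(content).asInstanceOf[StructType]
      }
      .toMap

  def main(args: Array[String]): Unit = {
    // Assumption: local mode; on a cluster the master comes from spark-submit.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-from-json")
      .getOrCreate()

    // Assumption: a temp directory stands in for the S3 location.
    val dir = Files.createTempDirectory("metrics")
    Files.write(dir.resolve("metric1.json"), schemaJson.getBytes(StandardCharsets.UTF_8))

    val schemas = loadSchemas(spark, dir.toUri.toString)
    schemas.foreach { case (path, schema) =>
      println(path)
      println(schema.treeString)
    }

    spark.stop()
  }
}
```

A StructType obtained this way can then be passed to spark.read.schema(...) when loading the actual metric data, which is presumably the point of storing the schema as JSON in the first place.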