Add JSON data to a multiline string in Scala to process using Spark
I am trying to use some parameters that are stored as a single multiline JSON object in a JSON file on S3. However, because I ran into several issues reading and parsing JSON in Spark (honestly, it's a pain...), I tried using Jackson to convert a hardcoded multiline JSON string to a map.
Following is my JSON, hardcoded as a multiline string:
val jsonString =
"""
{
myJSON
}
"""
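I used Jackson data binding to decode it:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
val mapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)
mapper.readValue(jsonString, classOf[Map[String, String]])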
Now I can use a map very easily. Unfortunately, the entire code base uses a map, so this method seems preferable to me.
So I wanted to know: is there a way to create a multiline string from a JSON file in Spark with Scala? I will be fetching my JSON file from S3.
If you are not bound to Jackson, you can do it more easily and faster with jsoniter-scala. Add the dependencies to your build script, then import and use them like this:
// import required packages
import java.io._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._
// create JSON codec for your map
val codec = JsonCodecMaker.make[Map[String, String]](CodecMakerConfig())
// then read JSON file using it
val map = {
val in: InputStream = // <- here can be any input stream implementation, no buffering required
new FileInputStream("/tmp/input.json")
try JsonReader.read(codec, in)
finally in.close()
}
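If you do want the file content as a single multiline string, as the question asks, one option is to read the whole file into a String before handing it to a parser. The following is a minimal sketch using only the JDK, assuming the file is available on the local file system; the object name `ReadJsonAsString`, the helper `readWholeFile`, and the sample file content are hypothetical and only serve the illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object ReadJsonAsString {
  // Reads the whole file into one String, keeping line breaks intact.
  // (Hypothetical helper, not part of Spark, Jackson, or jsoniter-scala.)
  def readWholeFile(path: String): String =
    new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample file, written here only so the sketch is runnable.
    val tmp = Files.createTempFile("input", ".json")
    Files.write(tmp, "{\n  \"key\": \"value\"\n}".getBytes(StandardCharsets.UTF_8))

    // jsonString now plays the same role as the hardcoded string in the question.
    val jsonString = readWholeFile(tmp.toString)
    println(jsonString)

    Files.delete(tmp)
  }
}
```

For a file on S3 read through Spark, `SparkContext.wholeTextFiles` may be a closer fit: it returns an RDD of (path, content) pairs, so the content of each file arrives as one string. Whether that works for you depends on your S3 connector setup, which is not shown in the question.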