Apache Spark to_json options parameter

I either don't know what I'm looking for or the documentation is lacking. The latter seems to be the case, given this:

http://spark.apache.org/docs/2.2.2/api/java/org/apache/spark/sql/functions.html#to_json-org.apache.spark.sql.Column-java.util.Map-

"options - options to control how the struct column is converted into a json string. accepts the same options and the json data source."

Great! So, what are my options?

I'm doing something like this:

Dataset<Row> formattedReader = reader
    .withColumn("id", lit(id))
    .withColumn("timestamp", lit(timestamp))
    .withColumn("data", to_json(struct("record_count")));

...and I get this result:

{
  "id": "ABC123",
  "timestamp": "2018-11-16 20:40:26.108",
  "data": "{\"record_count\": 989}"
}

I'd like this instead (no backslashes or quotes around the value of "data"):

{
  "id": "ABC123",
  "timestamp": "2018-11-16 20:40:26.108",
  "data": {"record_count": 989}
}

Is this one of the options, by chance? Is there a better guide out there for Spark? The most frustrating part about Spark hasn't been getting it to do what I want; it's been the lack of good information on what it can do.

You are JSON-encoding the record_count field twice: once with to_json, which turns the struct into a string, and again when the row itself is written out as JSON. Remove to_json; struct alone should be sufficient.

That is, change your code to something like this:

Dataset<Row> formattedReader = reader
    .withColumn("id", lit(id))
    .withColumn("timestamp", lit(timestamp))
    .withColumn("data", struct("record_count"));
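To see where the backslashes come from, here is a minimal plain-Java sketch of the double encoding. It does not use Spark and is not how Spark serializes rows; the class DoubleEncodingDemo and its helpers quote, encodedTwice and encodedOnce are hypothetical names made up for this illustration, and quote only escapes double quotes and backslashes.

```java
// Plain-Java illustration (no Spark): why encoding twice produces backslashes.
// All names here are hypothetical and exist only for this example.
final class DoubleEncodingDemo {

    // Wraps a string in double quotes, escaping any '"' and '\' inside it.
    // (A real JSON writer also escapes control characters; omitted here.)
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // "data" holds an already-serialized JSON string, so it gets escaped again.
    static String encodedTwice(String innerJson) {
        return "{\"data\":" + quote(innerJson) + "}";
    }

    // "data" holds the nested object itself, so it is serialized only once.
    static String encodedOnce(String innerJson) {
        return "{\"data\":" + innerJson + "}";
    }

    public static void main(String[] args) {
        String inner = "{\"record_count\":989}";
        System.out.println(encodedTwice(inner)); // {"data":"{\"record_count\":989}"}
        System.out.println(encodedOnce(inner));  // {"data":{"record_count":989}}
    }
}
```

The first line printed corresponds to the to_json(struct(...)) version, where "data" is a string column; the second corresponds to the struct(...) version, where "data" is a nested column.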
