
Reading pretty print json files in Apache Spark

Original 2016-09-12 15:23:23 9 1 python/ json/ apache-spark/ amazon-s3

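I have a lot of JSON files in my S3 bucket, and I want to be able to read them and query them. The problem is that they are pretty-printed. Each JSON file contains just one massive dictionary, but it is not on a single line. According to this thread, a dictionary in a JSON file should be on one line, which is a limitation of Apache Spark. My files are not structured that way.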

My JSON schema looks like this -

{
    "dataset": [
        {
            "key1": [
                {
                    "range": "range1", 
                    "value": 0.0
                }, 
                {
                    "range": "range2", 
                    "value": 0.23
                }
             ]
        }, {..}, {..}
    ],
    "last_refreshed_time": "2016/09/08 15:05:31"
}

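Here are my questions -

Can I avoid converting these files to match the schema Apache Spark requires (one dictionary per line in a file) and still be able to read them?
If not, what is the best way to do the conversion in Python? I get a bunch of these files every day. The bucket is partitioned by day.
Is there any tool other than Apache Spark that is better suited to querying these files? I am on the AWS stack, so I can try out any other suggested tool with a Zeppelin notebook.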

1 Answer

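You could use sc.wholeTextFiles(), which reads each file as a single (path, content) pair, so a JSON document that spans many lines stays in one piece. Here is a related post.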
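As a rough sketch of that approach, the snippet below parses each whole file and flattens it into one record per range entry. It assumes every element of "dataset" maps a key to a list of {"range", "value"} objects, as in the sample above. The names records_from_file and load_pretty_json are made up for illustration, and sc is assumed to be an existing SparkContext.

```python
import json


def records_from_file(path_and_text):
    """Flatten one (path, content) pair, as produced by wholeTextFiles."""
    path, text = path_and_text
    doc = json.loads(text)  # the whole pretty-printed file is one JSON object
    refreshed = doc.get("last_refreshed_time")
    rows = []
    # Assumption: each dataset element looks like {"key1": [{"range":..., "value":...}, ...]}
    for item in doc.get("dataset", []):
        for key, ranges in item.items():
            for entry in ranges:
                rows.append({
                    "file": path,
                    "key": key,
                    "range": entry["range"],
                    "value": entry["value"],
                    "last_refreshed_time": refreshed,
                })
    return rows


def load_pretty_json(sc, path):
    """Hypothetical helper: sc is an existing SparkContext, path a directory or glob."""
    # wholeTextFiles yields (path, content) pairs, one per file.
    return sc.wholeTextFiles(path).flatMap(records_from_file)
```

Because every file is loaded whole into memory on one executor, this is probably only comfortable when individual files are of moderate size.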

Alternatively, you could reformat your JSON with a simple function and load the generated file.

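import json

def reformat_json(input_path, output_path):
    # Parse the whole pretty-printed file into memory.
    with open(input_path, 'r') as handle:
        data = json.load(handle)

    # A top-level array is written one element per line; a single top-level
    # object (as in the question) is written as one compact line.
    entries = data if isinstance(data, list) else [data]
    with open(output_path, 'w') as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")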
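For the schema in the question, the top level is a dictionary rather than an array, so a variant that writes one line per element of "dataset" may be more useful for querying. The following is only a sketch under that assumption. The function names are hypothetical, and it works on local directories, so files in S3 would need to be downloaded first or the I/O swapped for an S3 client.

```python
import json
import os


def dataset_to_json_lines(text):
    """Return JSON Lines text: one compact object per element of "dataset"."""
    doc = json.loads(text)
    refreshed = doc.get("last_refreshed_time")
    lines = []
    for item in doc.get("dataset", []):
        record = dict(item)
        # Carry the file-level timestamp along with every record.
        record["last_refreshed_time"] = refreshed
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines) + ("\n" if lines else "")


def convert_directory(src_dir, dst_dir):
    """Convert every *.json file in src_dir into a same-named file in dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(src_dir, name), "r") as src:
            out_text = dataset_to_json_lines(src.read())
        out_path = os.path.join(dst_dir, name)
        with open(out_path, "w") as dst:
            dst.write(out_text)
        written.append(out_path)
    return written
```

The converted files have one JSON object per line, which is the layout Spark's JSON reader (sqlContext.read.json, or spark.read.json in Spark 2.x) expects by default.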
