Reading pretty print json files in Apache Spark

Original post, 2016-09-12 15:23:23. Tags: python, json, apache-spark, amazon-s3

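I have a lot of JSON files in my S3 bucket, and I want to be able to read and query them. The problem is that they are pretty-printed: each file holds a single huge dictionary, but it is spread over many lines. According to this thread, each dictionary in a JSON file has to be on a single line (Spark's JSON reader expects one complete JSON object per line), which is a limitation of Apache Spark. My files are not structured that way.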
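To make the limitation concrete, the following pure-Python sketch mimics a line-delimited JSON reader by parsing each line on its own. It is an illustrative emulation, not Spark's actual parser, and the names `RECORD`, `PRETTY`, `JSON_LINES`, and `parse_line_by_line` are hypothetical. Every line of the pretty-printed version fails to parse, while the JSON Lines version yields one complete record.

```python
import json

# One record shaped like the question's files (dataset left empty for brevity).
RECORD = {"dataset": [], "last_refreshed_time": "2016/09/08 15:05:31"}

# Pretty-printed: the single object is spread over several lines.
PRETTY = json.dumps(RECORD, indent=4)

# JSON Lines: the same object, compact, on one line.
JSON_LINES = json.dumps(RECORD) + "\n"


def parse_line_by_line(text):
    """Parse every non-empty line independently, as a line-delimited JSON
    reader does; return the parsed records and the number of failed lines."""
    parsed, failed = [], 0
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            failed += 1
    return parsed, failed


print(parse_line_by_line(PRETTY))  # -> ([], 4)
print(parse_line_by_line(JSON_LINES))
```

In Spark's default permissive mode, lines like these typically end up in a `_corrupt_record` column instead of raising an error, which is why the pretty-printed files appear to load but cannot be queried.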

My JSON schema looks like this:

{
    "dataset": [
        {
            "key1": [
                {
                    "range": "range1", 
                    "value": 0.0
                }, 
                {
                    "range": "range2", 
                    "value": 0.23
                }
            ]
        }, {..}, {..}
    ],
    "last_refreshed_time": "2016/09/08 15:05:31"
}
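Because each element of `dataset` nests a list of range/value pairs under a key name, flattening the document into one row per pair makes it much easier to query, whichever engine ends up doing the querying. This is a minimal sketch, assuming every element (including the elided `{..}` ones) has the same shape as `key1`; `flatten_dataset` is a hypothetical helper name.

```python
import json


def flatten_dataset(doc):
    """Turn one document with the schema above into flat rows of
    (key, range, value, last_refreshed_time)."""
    refreshed = doc.get("last_refreshed_time")
    rows = []
    for entry in doc.get("dataset", []):
        # Each entry maps a key name (e.g. "key1") to a list of range/value pairs.
        for key, points in entry.items():
            for point in points:
                rows.append((key, point["range"], point["value"], refreshed))
    return rows


doc = json.loads("""
{
    "dataset": [
        {"key1": [{"range": "range1", "value": 0.0},
                  {"range": "range2", "value": 0.23}]}
    ],
    "last_refreshed_time": "2016/09/08 15:05:31"
}
""")

for row in flatten_dataset(doc):
    print(row)
```

Rows in this shape map directly onto a table with four columns, so they can be loaded into a DataFrame or a SQL engine without further nesting.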

Here are my questions:

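Can I avoid converting these files to match the layout Apache Spark requires (one dictionary per line in a file) and still be able to read them?
If not, what is the best way to do the conversion in Python? I get a bunch of these files every day, and the bucket is partitioned by day.
Is there any tool other than Apache Spark that is better suited for querying these files? I'm on the AWS stack, so I can try any other suggested tool with a Zeppelin notebook.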

1 answer

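You can use sc.wholeTextFiles(), which reads each file as a single (path, content) record, so a pretty-printed file can be parsed as a whole. Here is a related post.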
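A minimal sketch of the `sc.wholeTextFiles()` route, assuming an existing SparkContext `sc`: each (path, content) pair is parsed whole with `json.loads`, so line breaks inside a file no longer matter. The helper names are hypothetical. The last function shows an alternative that postdates this question: from Spark 2.2 onward, the DataFrame reader accepts `multiLine=True` and parses JSON documents that span several lines directly, assuming an existing SparkSession `spark`.

```python
import json


def parse_whole_file(pair):
    """Parse one (path, content) pair from wholeTextFiles as a single
    JSON document; the path is ignored here."""
    _path, content = pair
    return json.loads(content)


def load_with_whole_text_files(sc, path):
    """Return an RDD holding one parsed dict per file.

    `sc` is an existing SparkContext; `path` may be a directory or a glob.
    """
    return sc.wholeTextFiles(path).map(parse_whole_file)


def load_with_multiline(spark, path):
    """Spark 2.2+ alternative: let the DataFrame reader parse files whose
    JSON spans multiple lines. `spark` is an existing SparkSession."""
    return spark.read.json(path, multiLine=True)
```

Usage might look like `load_with_multiline(spark, "s3://my-bucket/2016-09-08/")`, where the bucket path is a placeholder. Keep in mind that `wholeTextFiles` loads each file fully into memory as one string, so it suits many moderately sized files better than a few very large ones.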

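Alternatively, you can reformat the JSON with a simple function that writes one compact object per line, and then load the resulting files.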

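import json

def reformat_json(input_path, output_path):
    # Load the whole pretty-printed document into memory.
    with open(input_path, 'r') as handle:
        data = json.load(handle)

    # A top-level dict (as in the question) becomes a single record;
    # a top-level list becomes one record per element.
    records = data if isinstance(data, list) else [data]

    # Write one compact JSON object per line (JSON Lines), the layout
    # Spark's default JSON reader expects.
    with open(output_path, 'w') as f:
        for entry in records:
            f.write(json.dumps(entry) + "\n")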
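Since new files arrive every day in day partitions, the reformatting idea can be applied one partition at a time. The sketch below assumes a day's objects have already been copied to a local directory (for example with `aws s3 cp s3://<bucket>/<day>/ <local-dir> --recursive`, where the bucket and prefix are placeholders) and that each file holds one top-level dictionary as in the question. It combines them into a single JSON Lines file; `convert_day` and the paths are hypothetical.

```python
import glob
import json
import os


def convert_day(input_dir, output_path):
    """Combine every pretty-printed *.json file in input_dir into a single
    JSON Lines file (one compact object per line). Returns the record count."""
    count = 0
    with open(output_path, "w") as out:
        # Sorted so the output order is deterministic across runs.
        for path in sorted(glob.glob(os.path.join(input_dir, "*.json"))):
            with open(path) as handle:
                doc = json.load(handle)
            # Each input file holds one top-level dict, so it becomes one line.
            out.write(json.dumps(doc) + "\n")
            count += 1
    return count
```

Writing the output with a `.jsonl` extension (or outside the input directory) keeps it from being matched by the `*.json` pattern, so the function can be rerun safely on the same partition.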


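Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.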

Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM