Python: how to read from compressed json.gz files and write them out as json files

Question

I want to read compressed .json.gz files and write their decoded contents to .json files.

.json.gz files:

data/sample1.gz
data/sample2.gz

Write to .json files:

data/sample1.json
data/sample2.json

Answer 1

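I had a requirement: I had a list of compressed json.gz files, and I needed to decompress them and convert them back into json files with the same file names. The code below works.

Put this script in the folder that contains the .gz files and run it with python3.

File: script.py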

import gzip
import os

def get_file_names_by_extension(path = ".", file_extension = ".gz"):
    file_names = []
    for x in os.listdir(path):
        if x.endswith(file_extension):
            file_names.append(x)
    return file_names

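def write_file(data, destination_path, file_name, encoding = "utf-8"):
    # Build the output path in a platform-independent way.
    output_file_name = os.path.join(destination_path, file_name)
    print(output_file_name)
    # data is already a str, so open in text mode with an explicit
    # encoding instead of writing encoded bytes to a text-mode file.
    with open(output_file_name, "w", encoding=encoding) as outfile:
        outfile.write(data)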

def decompress_files(files, destination_path, output_format = ".json", encoding = "utf-8"):
    for file in files:
        # Use a context manager so the gzip handle is always closed.
        with gzip.open(file, "rb") as _file:
            content = _file.read().decode(encoding)
        # Keep the part before the first dot: "sample1.gz" -> "sample1.json".
        output_file_name = "".join([file.split(".")[0], output_format])
        write_file(content, destination_path, output_file_name, encoding)

        
files = get_file_names_by_extension(path=".", file_extension=".gz")
decompress_files(files, ".", ".json")
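
The script above reads each archive fully into memory with `_file.read()`, which may be wasteful for large files. Below is a minimal sketch of a streaming variant. The helper name `gunzip_to_json` is hypothetical and not part of the original answer. It copies raw bytes in chunks, so no decode/encode step is needed and the original encoding is preserved as-is.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to_json(src, dest_dir="."):
    """Decompress one .gz file into dest_dir as <first name part>.json.

    Hypothetical helper for illustration. Bytes are copied in chunks,
    so the file is never loaded into memory all at once.
    """
    src = Path(src)
    # "sample1.gz" and "sample1.json.gz" both map to "sample1.json".
    dest = Path(dest_dir) / (src.name.split(".")[0] + ".json")
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest
```

To convert every archive in the current folder, call it in a loop, for example `for p in Path(".").glob("*.gz"): gunzip_to_json(p)`.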

Answer 2

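PySpark can infer from the file name that a json file is gzip-compressed. You can read the data and then write it back without any compression to get the result you want. The advantage of doing this in Spark is that it can read and write the data in parallel across multiple workers, especially when the data is in S3.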

df = spark.read.json("data/")
df.write.json("data/", mode="append", compression="none")
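
When Spark is not available, some of the same parallelism can be approximated on a single machine with the standard library. The following is a rough sketch of that swapped-in technique (a thread pool, not Spark). The helper names `gunzip_one` and `gunzip_all` and the `max_workers` default are assumptions for illustration. Unlike the Spark write, which names its own part files, this keeps one output file per input file.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gunzip_one(src, dest_dir):
    """Stream-decompress a single .gz file to <first name part>.json."""
    src = Path(src)
    dest = Path(dest_dir) / (src.name.split(".")[0] + ".json")
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


def gunzip_all(src_dir, dest_dir, max_workers=4):
    """Decompress every .gz file in src_dir using a pool of threads."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    sources = sorted(Path(src_dir).glob("*.gz"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises any worker exception.
        return list(pool.map(lambda s: gunzip_one(s, dest_dir), sources))
```

Writing to a separate output folder, as `gunzip_all("data", "out")` would, avoids mixing decompressed files with the archives being read.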


Solution 1
0 2021-12-15 17:13:54

Solution 2
0 2021-12-15 18:08:54
