Convert CSV to JSON using Python with new headers

I'm new to Python and want to convert the following CSV file

1.0,100.0,303.0,619.0,figure  
338.0,162.0,143.0,423.0,text  
85.0,768.0,554.0,39.0,text  
504.0,164.0,24.0,238.0,subtitle  
120.0,727.0,182.0,13.0,caption  
540.0,165.0,62.0,428.0,title  
614.0,163.0,23.0,133.0,tagline  
317.0,629.0,113.0,113.0,figure  
443.0,629.0,112.0,113.0,figure  
568.0,628.0,121.0,114.0,figure  

into this format:

{
    "record_01": {
        "x": "1.0", 
        "y": "100.0", 
        "width": "303.0", 
        "height": "619.0", 
        "tag": "figure"
    }, 
    "record_02": {
        "x": "338.0", 
        "y": "162.0",
        "width": "143.0", 
        "height": "423.0", 
        "tag": "text"
    }, 
    "record_03": {
        "x": "85.0", 
        "y": "768.0", 
        "width": "554.0", 
        "height": "39.0", 
        "tag": "text"
    }, .... and so on }

Here is my current code:

import csv
import json

def convert_json(csvPath, jsonPath):
    fieldnames = ["x", "y", "width", "height", "tag"]
    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        data = []
        for rows in csvReader:
            data.append(rows)
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        jsonFile.write(json.dumps(data, indent=4))

The output looks like this:

[
    {
        "x": "1.0",
        "y": "100.0",
        "width": "303.0",
        "height": "619.0",
        "tag": "figure"
    },
    {
        "x": "338.0",
        "y": "162.0",
        "width": "143.0",
        "height": "423.0",
        "tag": "text"
    },
    {
        "x": "85.0",
        "y": "768.0",
        "width": "554.0",
        "height": "39.0",
        "tag": "text"
    }, ..... ]

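How can I make sure the JSON file is wrapped in curly braces instead of '[ ]', and add a numbered record key for each entry? I tried using data = {}, but that doesn't work with data.append(rows).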

Edit: Thanks to Antonio for the solution and explanation. I changed the code and it now produces the expected output.

import csv
import json
fieldnames = ["x", "y", "width", "height", "tag"]
def convert_json(csvPath, jsonPath):
    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        data = {}
        for record, rows in enumerate(csvReader, start=1):
            data.update({"record_{:02d}".format(record): rows})
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        json.dump(data, jsonFile, indent=4)


csvPath = "data.csv"
jsonPath = "data.json"
convert_json(csvPath, jsonPath)

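You are creating a list when you need a dictionary. You must declare a dictionary before adding elements, or you can use a dict comprehension to build the dictionary in a more Pythonic way. To create the record number, you can format the integer with zero padding. To get the current record number, you can use enumerate(item).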
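The two building blocks mentioned above can be tried on their own before touching any files. This is a minimal sketch: the `rows` list is a made-up stand-in for what `csv.DictReader` yields, and `start=1` is used on the assumption that numbering should begin at `record_01`, as in the question's desired output.

```python
# Hypothetical sample rows standing in for what csv.DictReader yields.
rows = [
    {"x": "1.0", "tag": "figure"},
    {"x": "338.0", "tag": "text"},
]

data = {}  # a dict has no append(); entries are stored under keys instead
for record, row in enumerate(rows, start=1):  # start=1 so counting begins at 1
    key = "record_{:02d}".format(record)      # zero-pad the number to two digits
    data[key] = row

print(list(data))  # ['record_01', 'record_02']
```

Assigning with `data[key] = row` has the same effect as `data.update({key: row})`.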

import csv
import json

def convert_json(csvPath, jsonPath, fieldnames):

    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        data = {}
        # start=1 so the first key is record_01, matching the requested output
        for record, rows in enumerate(csvReader, start=1):
            data.update({"record_{:02d}".format(record): rows})
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        json.dump(data, jsonFile, indent=4)


convert_json("./data.csv", "json_file.json", ["x", "y", "width", "height", "tag"])

Edit: a version with a dict comprehension:

import csv
import json

def convert_json(csvPath, jsonPath, fieldnames):

    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
    
        # start=1 so the first key is record_01, matching the requested output
        data = {"record_{:02d}".format(record): rows for record, rows in enumerate(csvReader, start=1)}
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        json.dump(data, jsonFile, indent=4)

convert_json("./data.csv", "json_file.json", ["x", "y", "width", "height", "tag"])
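To check the comprehension without creating files, the same logic can be run against an in-memory CSV. This is only a sketch: the `sample` string holds the first two rows from the question, and `io.StringIO` stands in for the opened file.

```python
import csv
import io
import json

# First two rows from the question's CSV, held in memory instead of a file.
sample = "1.0,100.0,303.0,619.0,figure\n338.0,162.0,143.0,423.0,text\n"
fieldnames = ["x", "y", "width", "height", "tag"]

reader = csv.DictReader(io.StringIO(sample), fieldnames)
# Number records from 1 and zero-pad the number to two digits.
data = {"record_{:02d}".format(n): row for n, row in enumerate(reader, start=1)}

print(json.dumps(data, indent=4))
```

The printed JSON is a single object keyed by `record_01` and `record_02`, with every value kept as a string because the `csv` module does not convert types.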

If your dataset is small enough to fit in memory, you can do this more easily with pandas:

from pathlib import Path
import pandas as pd

DATA_PATH = Path("data").joinpath("data.csv")
WRITE_PATH = Path("data").joinpath("data.json")
COL_SCHEMA = ["x", "y", "width", "height", "tag"]

df = pd.read_csv(DATA_PATH, header=None)
df.columns = COL_SCHEMA
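# Number records from 1 and zero-pad to two digits (record_01, record_02, ...)
df["id"] = ["record_{:02d}".format(i) for i in range(1, len(df) + 1)]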
df = df.set_index("id")
df.to_json(WRITE_PATH, orient="index", indent=2)

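Here I have the same data in a CSV file in the data directory. To keep the code platform-independent I use pathlib, so it runs on Windows/Unix/Linux without changes. I then load the data into a DataFrame and add a new ID column, which I set as the DataFrame's index. Finally, I write the data back to the same directory with the appropriate orient, and for the JSON I use indent=2 just for nicer formatting.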
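One difference from the csv-module versions is worth checking: by default `pd.read_csv` parses the numeric columns as numbers, so the JSON values come out as numbers rather than the quoted strings shown in the question. If the strings are wanted, passing `dtype=str` is one option. The sketch below assumes that goal, uses an in-memory sample instead of the `data` directory, and numbers records from 1 with two-digit padding.

```python
import io
import json

import pandas as pd

# First two rows from the question's CSV, held in memory instead of a file.
sample = "1.0,100.0,303.0,619.0,figure\n338.0,162.0,143.0,423.0,text\n"
columns = ["x", "y", "width", "height", "tag"]

# dtype=str keeps every cell as text, e.g. "1.0" instead of the float 1.0.
df = pd.read_csv(io.StringIO(sample), header=None, names=columns, dtype=str)
df.index = ["record_{:02d}".format(i) for i in range(1, len(df) + 1)]

# With no path argument, to_json returns the JSON text instead of writing a file.
result = json.loads(df.to_json(orient="index"))
print(result["record_01"])
```

With `orient="index"`, each index label becomes a top-level key, which is what produces the `record_NN` object layout.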

You do have to install pandas first, though, with the pip install pandas command.
