Convert CSV to JSON using python with new headers
I am new to Python and want to convert the following CSV file
1.0,100.0,303.0,619.0,figure
338.0,162.0,143.0,423.0,text
85.0,768.0,554.0,39.0,text
504.0,164.0,24.0,238.0,subtitle
120.0,727.0,182.0,13.0,caption
540.0,165.0,62.0,428.0,title
614.0,163.0,23.0,133.0,tagline
317.0,629.0,113.0,113.0,figure
443.0,629.0,112.0,113.0,figure
568.0,628.0,121.0,114.0,figure
into this format
{
"record_01": {
"x": "1.0",
"y": "100.0",
"width": "303.0",
"height": "619.0",
"tag": "figure"
},
"record_02": {
"x": "338.0",
"y": "162.0",
"width": "143.0",
"height": "423.0",
"tag": "text"
},
"record_03": {
"x": "85.0",
"y": "768.0",
"width": "554.0",
"height": "39.0",
"tag": "text"
}, .... and so on }
This is my current code
import csv
import json

def convert_json(csvPath, jsonPath):
    fieldnames = ["x", "y", "width", "height", "tag"]
    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        data = []
        for rows in csvReader:
            data.append(rows)
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        jsonFile.write(json.dumps(data, indent=4))
The output looks like this
[
{
"x": "1.0",
"y": "100.0",
"width": "303.0",
"height": "619.0",
"tag": "figure"
},
{
"x": "338.0",
"y": "162.0",
"width": "143.0",
"height": "423.0",
"tag": "text"
},
{
"x": "85.0",
"y": "768.0",
"width": "554.0",
"height": "39.0",
"tag": "text"
}, ..... ]
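How can I make sure the JSON file is wrapped in curly braces instead of '[ ]', and add a numbered record key for each entry? I tried using data = {}, but it does not work with data.append(rows).
Edit: Thanks to Antonio for the solution and explanation. I changed the code and it now produces the expected result.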
import csv
import json

fieldnames = ["x", "y", "width", "height", "tag"]

def convert_json(csvPath, jsonPath):
    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        data = {}
        for record, rows in enumerate(csvReader, start=1):
            data.update({"record_{:02d}".format(record): rows})
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        json.dump(data, jsonFile, indent=4)

csvPath = "data.csv"
jsonPath = "data.json"
convert_json(csvPath, jsonPath)
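You are creating a list when you need a dictionary. You must declare a dictionary before adding elements, or you can use a dict comprehension to build it in a more Pythonic way. To create the record number, format the integer with zero padding. To get the current record number, use enumerate(); pass start=1 so that numbering begins at record_01, as in your expected output.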
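To see those two building blocks in isolation, here is a minimal sketch. The `rows` list is made-up sample data standing in for what the CSV reader would yield, and only three of the five fields are shown.

```python
# Hypothetical sample rows standing in for what csv.DictReader would yield.
rows = [
    {"x": "1.0", "y": "100.0", "tag": "figure"},
    {"x": "338.0", "y": "162.0", "tag": "text"},
]

data = {}  # a dict has no append(); entries are assigned by key instead
for record, row in enumerate(rows, start=1):  # start=1 so numbering begins at 1
    key = "record_{:02d}".format(record)      # zero-padded to two digits
    data[key] = row

print(list(data))  # ['record_01', 'record_02']
```

The full function below applies the same idea to the CSV file.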
import csv
import json

def convert_json(csvPath, jsonPath, fieldnames):
    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        data = {}
        # start=1 so the first key is record_01 rather than record_00
        for record, rows in enumerate(csvReader, start=1):
            data.update({"record_{:02d}".format(record): rows})
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        json.dump(data, jsonFile, indent=4)

convert_json("./data.csv", "json_file.json", ["x", "y", "width", "height", "tag"])
Edit: version with a dict comprehension:
import csv
import json

def convert_json(csvPath, jsonPath, fieldnames):
    with open(csvPath, "r", encoding="utf-8") as csvFile:
        csvReader = csv.DictReader(csvFile, fieldnames)
        # start=1 so the first key is record_01 rather than record_00
        data = {"record_{:02d}".format(record): rows for record, rows in enumerate(csvReader, start=1)}
    with open(jsonPath, "w", encoding="utf-8") as jsonFile:
        json.dump(data, jsonFile, indent=4)

convert_json("./data.csv", "json_file.json", ["x", "y", "width", "height", "tag"])
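One way to check the key format without touching the disk is to run the same comprehension over an in-memory file. This is only a sketch: `build_records` is a hypothetical helper name, the sample text is the first three rows from the question, and `start=1` is used so the keys match the `record_01` numbering asked for.

```python
import csv
import io
import json

FIELDNAMES = ["x", "y", "width", "height", "tag"]

def build_records(csv_file, fieldnames=FIELDNAMES):
    """Return {"record_01": {...}, ...} from an open, headerless CSV file."""
    reader = csv.DictReader(csv_file, fieldnames)
    return {
        "record_{:02d}".format(n): row
        for n, row in enumerate(reader, start=1)
    }

# First three rows from the question, held in memory instead of a file.
sample = io.StringIO(
    "1.0,100.0,303.0,619.0,figure\n"
    "338.0,162.0,143.0,423.0,text\n"
    "85.0,768.0,554.0,39.0,text\n"
)
records = build_records(sample)
print(json.dumps(records, indent=4))
```

Note that `{:02d}` only pads to two digits. With more than 99 rows the keys simply grow wider (`record_100`), so a wider pad such as `{:03d}` may be preferable if the keys need to sort consistently.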
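If your dataset is small, or at least fits in memory, you can do this more easily with pandas:
from pathlib import Path
import pandas as pd

DATA_PATH = Path("data").joinpath("data.csv")
WRITE_PATH = Path("data").joinpath("data.json")
COL_SCHEMA = ["x", "y", "width", "height", "tag"]

# The file has no header row; numeric columns are parsed as floats,
# so they are written as JSON numbers rather than strings.
df = pd.read_csv(DATA_PATH, header=None)
df.columns = COL_SCHEMA
# Zero-padded keys starting at record_01, matching the requested format.
df["id"] = ["record_{:02d}".format(i) for i in range(1, len(df) + 1)]
df = df.set_index("id")
df.to_json(WRITE_PATH, orient="index", indent=2)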
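Here, I have the same data in a CSV file inside the data directory. To keep the code platform-independent I use pathlib, so it runs on Windows/Unix/Linux without changes. I then load the data into a DataFrame, add a new ID column, and set that ID column as the DataFrame's index. Finally I write the data back to the same directory with the right orientation (orient="index"); for the JSON I use indent=2 just to make the output prettier.
You will need to install pandas with the pip install pandas command, though.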