
Add character and remove the last comma in a JSON file

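I am trying to create a JSON file from a CSV. The code below creates the data, but not quite in the form I want. I have some experience in Python. As I understand it, a JSON file should be written like this: [{},{},...,{}].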

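How can I:

  1. Remove the last ','? I can insert the ',' after each row, but not drop the final one.

  2. Insert '[' at the very beginning and ']' at the end? I tried putting it in outfile.write('['... etc), but it shows up in too many places.

  3. Leave the header (the first line of the CSV) out of the JSON file?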

Names.csv:

id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith 
345,Statistics ,"Matt P, Albert Shaw"
456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
678,Physics,"Joe Doe, Jane Smith, Ali Smith "

Code:

import csv
import json
import os

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         outfile.write("," + "\n" )

Output so far:

{"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
{"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
{"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
{"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
{"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
{"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},

First, how do you skip the header? That's easy:

next(infile) # skip the first line
for line in infile:

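However, you may want to consider using csv.DictReader for your input. It reads the header row, uses the names there to build a dict for each row, and splits the rows for you. It also handles cases you may not have thought of, such as the quoted or escaped text that can appear in CSV files: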

for row in csv.DictReader(infile):
    json.dump(row, outfile)
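To see why the csv module is worth using here, the following minimal sketch compares a naive line.split(',') with csv.DictReader on one of the sample rows. The data is inlined through io.StringIO so the snippet runs on its own; the variable names are illustrative only.

```python
import csv
import io

# Header line plus one data row from names.csv, inlined for the demo.
sample = 'id,team_name,team_members\n123,Biology,"Ali Smith, Jon Doe"\n'

# Naive splitting cuts the quoted field in two and keeps the quotes and newline.
naive = sample.splitlines(keepends=True)[1].split(',')
print(naive)  # ['123', 'Biology', '"Ali Smith', ' Jon Doe"\n']

# DictReader consumes the header and parses the quoted field as a single value.
parsed = list(csv.DictReader(io.StringIO(sample, newline='')))
print(parsed)
```

Note that DictReader keeps team_members as one string; splitting it into individual names is a separate step.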

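Now for the harder problems.

A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this: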

def rows(infile):
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         yield row

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # genjson is a placeholder name for such an iterative JSON library
    genjson.dump(rows(infile), outfile)

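The docs for the stdlib json.JSONEncoder include an example that does exactly this, although not very efficiently, because it first consumes the entire iterator to build a list and then dumps that list: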

class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
       try:
           iterable = iter(o)
       except TypeError:
           pass
       else:
           return list(iterable)
       # Let the base class default method raise the TypeError
       return json.JSONEncoder.default(self, o)

j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))

In fact, if you're willing to build the entire list rather than encode row by row, it may be simpler to do the listifying explicitly:

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)

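You can also do better by overriding the iterencode method, but that is a lot less trivial, and you would probably want to look on PyPI for an efficient, well-tested streaming iterative JSON library rather than build one yourself on top of the json module.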
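As a rough illustration of what such a streaming writer does, here is a minimal sketch that writes any iterable of JSON-serializable rows as an array, one element at a time, without building the full list. The helper name dump_array is hypothetical; it is not part of the json module or of any particular PyPI package.

```python
import io
import json

def dump_array(rows, outfile):
    """Write an iterable of rows to outfile as a JSON array, one row at a time."""
    outfile.write('[')
    for i, row in enumerate(rows):
        if i:  # the separator goes before every element except the first
            outfile.write(',')
        outfile.write('\n')
        json.dump(row, outfile)
    outfile.write('\n]')

# Demo with an in-memory file and a generator, so no list is ever built.
buf = io.StringIO()
dump_array(({"id": str(n)} for n in range(3)), buf)
print(buf.getvalue())
```

An empty iterable still produces a valid (empty) JSON array, since the brackets are written outside the loop.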


In the meantime, here is a direct solution to your question, with as few changes as possible to your existing code:

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish line 0 from the rest
    for i, line in enumerate(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         # add the ,\n _before_ each row except the first
         if i:
             outfile.write(',\n')
         json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')

This trick, treating the first element specially rather than the last, simplifies a lot of problems of this type.


Another way to simplify things is to iterate over adjacent pairs of lines, using a minor variation on the pairwise example in the itertools docs:

import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         # add the , if there is a next line
         if nextline is not None:
             outfile.write(',')
         outfile.write('\n')
    # write the final ]
    outfile.write(']')

This is just as efficient as the previous version, and conceptually simpler, but a lot more abstract.
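To make the behaviour of this pairwise variant concrete, the short sketch below runs it on a three-item list. The last item is paired with None, which is what the `nextline is not None` check relies on. The sample list is made up for illustration.

```python
import itertools

def pairwise(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second copy by one position
    return itertools.zip_longest(a, b, fillvalue=None)

pairs = list(pairwise(['row1', 'row2', 'row3']))
print(pairs)  # [('row1', 'row2'), ('row2', 'row3'), ('row3', None)]
```

One caveat of this sketch: an element that is itself None could not be told apart from the end marker, which is not a concern for lines read from a file.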

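With minimal edits to your code, you can build a list of dictionaries in Python and dump it to the file as JSON all at once (assuming your dataset is small enough to fit in memory):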

import csv
import json
import os

rows = []  # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         rows.append(row)  # Append row to list

    json.dump(rows[1:], outfile)  # Write entire list to file (except first row)

By the way, you shouldn't use id as a variable name in Python, because it is a built-in function.
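A small sketch of what goes wrong when the name is reused: inside the function below, id refers to the string, so the built-in can no longer be called. The function name shadowed and the variable team_id are chosen only for this illustration.

```python
def shadowed():
    id = "123"             # this local name hides the built-in id()
    return id(object())    # fails: the str "123" is not callable

try:
    shadowed()
except TypeError as exc:
    message = str(exc)
    print(message)

# A more specific name avoids the clash and leaves the built-in usable.
team_id = "123"
print(isinstance(id(team_id), int))
```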

pandas can handle this easily:

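import json
import pandas as pd

df = pd.read_csv('names.csv', dtype=str)
# split each team_members string on commas and strip surrounding whitespace
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda l: [x.strip() for x in l]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)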

It seems like it would be a lot easier to use csv.DictReader than to reinvent the wheel:

import csv
import json

data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)

with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)

Contents of the names1.json file after running it (I used indent=4 only to make it easier to read):

[
    {
        "id": "123",
        "team_name": "Biology",
        "team_members": "Ali Smith, Jon Doe"
    },
    {
        "id": "234",
        "team_name": "Math",
        "team_members": "Jane Smith"
    },
    {
        "id": "345",
        "team_name": "Statistics ",
        "team_members": "Matt P, Albert Shaw"
    },
    {
        "id": "456",
        "team_name": "Chemistry",
        "team_members": "Andrew M, Matt Shaw, Ali Smith"
    },
    {
        "id": "678",
        "team_name": "Physics",
        "team_members": "Joe Doe, Jane Smith, Ali Smith"
    }
]
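In this output team_members is still a single string. If a list of names is wanted, as in the question's own attempt, one possible post-processing step is sketched below. The helper name split_members is an assumption made for illustration, and the CSV rows are inlined via io.StringIO so the snippet runs without the file.

```python
import csv
import io
import json

def split_members(row):
    """Return a copy of row with team_members split into a list of stripped names."""
    members = [name.strip() for name in row["team_members"].split(',')]
    return {**row, "team_members": members}

# Two rows from names.csv, inlined for the demo.
sample = (
    'id,team_name,team_members\n'
    '123,Biology,"Ali Smith, Jon Doe"\n'
    '234,Math,Jane Smith \n'
)
reader = csv.DictReader(io.StringIO(sample, newline=''))
data = [split_members(row) for row in reader]
print(json.dumps(data, indent=4))
```

With a real file, the same comprehension can be applied to csv.DictReader(infile) before calling json.dump.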
