Add character and remove the last comma in a JSON file
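I am trying to create a JSON file from a CSV. The code below creates the data, but not quite in the form I want. I have some experience in Python. As I understand it, a JSON file should be written like this: [{},{},...,{}].
How do I:
I can insert the ',', but how do I remove the last ','?
How do I insert '[' at the very beginning and ']' at the very end? I tried putting it into outputfile.write('['... etc), but it showed up in too many places.
Exclude the header on the first line from the JSON file.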
Names.csv:
id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith
345,Statistics ,"Matt P, Albert Shaw"
456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
678,Physics,"Joe Doe, Jane Smith, Ali Smith "
Code:
import csv
import json
import os
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        json.dump(row,outfile)
        outfile.write("," + "\n" )
Output so far:
{"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
{"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
{"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
{"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
{"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
{"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},
First, how do you skip the header? That's easy:
next(infile) # skip the first line
for line in infile:
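However, you may want to consider using csv.DictReader for your input. It reads the header row, uses the information there to create a dict for each row, and splits the rows for you (as well as handling cases you may not have thought of, such as quoted or escaped text that can appear in CSV files):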
for row in csv.DictReader(infile):
    json.dump(row, outfile)
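To make the point about quoting concrete, here is a small sketch that compares a plain line.split(',') with csv.DictReader on one row of the sample data. The in-memory io.StringIO stand-in for names.csv and the variable names naive and parsed are assumptions made for illustration only.

```python
import csv
import io

# In-memory stand-in for names.csv (assumption: same layout as the sample file)
sample = 'id,team_name,team_members\n123,Biology,"Ali Smith, Jon Doe"\n'

# Naive splitting cuts the quoted field in two and keeps the quote characters
data_line = sample.splitlines()[1]
naive = data_line.split(',')

# csv.DictReader consumes the header row and keeps the quoted field in one piece
parsed = list(csv.DictReader(io.StringIO(sample)))

print(naive)                       # ['123', 'Biology', '"Ali Smith', ' Jon Doe"']
print(parsed[0]['team_members'])   # Ali Smith, Jon Doe
```

Note that DictReader leaves team_members as one string; splitting it into individual names is a separate step.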
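Now on to the harder problem.
A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this: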
def rows(infile):
    for line in infile:
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        yield row
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # genjson is a placeholder name for such a library, not an actual module
    genjson.dump(rows(infile), outfile)
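The stdlib json.JSONEncoder has an example in the docs that does exactly this, although not very efficiently, because it first consumes the entire iterator to build a list and then dumps that list: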
class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)
j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))
Really, if you're willing to build the whole list rather than encode row by row, it may be simpler to just do the listifying explicitly:
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)
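You can go further by overriding the iterencode method as well, but that is a lot less trivial, and you would probably want to look on PyPI for an efficient, well-tested streaming iterative JSON library instead of building one yourself on top of the json module.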
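Short of overriding iterencode, a small hand-written helper can stream a generator into a JSON array without building a list first. The following is only a minimal sketch of that idea, not an iterencode override and not a replacement for a tested library; dump_array and numbers are hypothetical names introduced here for illustration.

```python
import io
import json

def dump_array(items, outfile):
    """Write each item of an iterable to outfile as one element of a JSON array."""
    outfile.write('[')
    for i, item in enumerate(items):
        # Put the separator before every element except the first
        if i:
            outfile.write(',\n')
        json.dump(item, outfile)
    outfile.write(']')

def numbers():
    # Hypothetical generator standing in for rows(infile)
    for n in range(3):
        yield {"n": n}

buf = io.StringIO()
dump_array(numbers(), buf)
result = json.loads(buf.getvalue())
print(result)
```

Only one element is held in memory at a time, and an empty iterable still produces the valid document [].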
But, meanwhile, here's a direct solution to your question, with as little change as possible to your existing code:
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish line 0 from the rest
    for i, line in enumerate(infile):
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        # add the ,\n _before_ each row except the first
        if i:
            outfile.write(',\n')
        json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')
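This trick of treating the first element specially, rather than the last, simplifies many problems of this type.
Another way to simplify things is to actually iterate over adjacent pairs of lines, using a minor variation on the pairwise example in the itertools docs: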
import itertools
import json

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        json.dump(row,outfile)
        # add the , if there is a next line
        if nextline is not None:
            outfile.write(',')
        outfile.write('\n')
    # write the final ]
    outfile.write(']')
This is just as efficient as the previous version, and conceptually simpler, but more abstract.
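To see what this pairwise variant yields, it helps to run it on a short list. The sketch below repeats the function so that it is self-contained; the sample list and the names pairs and joined are assumptions for illustration.

```python
import itertools

def pairwise(iterable):
    # Two independent iterators over the same data; b runs one step ahead of a
    a, b = itertools.tee(iterable)
    next(b, None)
    # zip_longest pads the final pair with None, which marks the last element
    return itertools.zip_longest(a, b, fillvalue=None)

pairs = list(pairwise(['a', 'b', 'c']))
print(pairs)   # [('a', 'b'), ('b', 'c'), ('c', None)]

# Emit a separator only when a next element exists
joined = ''.join(item + (',' if nxt is not None else '') for item, nxt in pairs)
print(joined)  # a,b,c
```

Because None is used as the end marker, this only works when None cannot appear as a real element, which holds for lines read from a file.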
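With minimal edits to your code, you can build a list of dictionaries in Python and dump it to a file as JSON all at once (assuming the dataset is small enough to fit in memory):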
import csv
import json
import os
rows = [] # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
        row = dict()
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        rows.append(row) # Append row to list
    json.dump(rows[1:], outfile) # Write entire list to file (except first row)
pandas can solve this easily:
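import json
import pandas as pd

df = pd.read_csv('names.csv', dtype=str)
# split the members string on commas, then strip whitespace from each name
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda l: [x.strip() for x in l]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)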
It seems like it would be a lot easier to use csv.DictReader instead of reinventing the wheel:
import csv
import json
data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)
with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)
Here are the contents of the names1.json file after execution (I used indent=4 just to make it more human readable):
[
    {
        "id": "123",
        "team_name": "Biology",
        "team_members": "Ali Smith, Jon Doe"
    },
    {
        "id": "234",
        "team_name": "Math",
        "team_members": "Jane Smith"
    },
    {
        "id": "345",
        "team_name": "Statistics ",
        "team_members": "Matt P, Albert Shaw"
    },
    {
        "id": "456",
        "team_name": "Chemistry",
        "team_members": "Andrew M, Matt Shaw, Ali Smith"
    },
    {
        "id": "678",
        "team_name": "Physics",
        "team_members": "Joe Doe, Jane Smith, Ali Smith"
    }
]
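In this output team_members is a single string, whereas the code in the question was building a list of names. If a list is what is wanted, one possible follow-up is to split and strip the field after DictReader has parsed it. This is a sketch under that assumption; the in-memory sample stands in for names.csv and is not part of the original answer.

```python
import csv
import io
import json

# In-memory stand-in for names.csv (assumption: two rows copied from the sample data)
sample = (
    'id,team_name,team_members\n'
    '123,Biology,"Ali Smith, Jon Doe"\n'
    '234,Math,Jane Smith\n'
)

data = []
for row in csv.DictReader(io.StringIO(sample)):
    # Split the members string on commas and strip whitespace around each name
    row['team_members'] = [name.strip() for name in row['team_members'].split(',')]
    data.append(row)

text = json.dumps(data, indent=4)
print(text)
```

Splitting after parsing keeps the CSV quoting rules in the csv module's hands and limits the manual string handling to the one field that needs it.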