簡體   English   中英

Python CSV到JSON解析器將輸出添加到引號中

[英]Python CSV to JSON parser add quotes to output

我有一個CSV到JSON Python腳本,感謝用戶Petri讓我將Geonames CSV轉換轉換為MongoImport友好的JSON。

問題是,Geonames有一個名為alternatenames的字段,當前被引用並視為一個長字符串。 因此無法在MongoDB中正確查詢。 我想將字段更改為字符串數組,例如: "alternatenames":["name1", "name2"]

Python腳本如下所示:

import csv, simplejson, decimal, codecs

data = open("cities.txt")
reader = csv.DictReader(data, delimiter=",", quotechar='"')

with codecs.open("cities.json", "w", encoding="utf-8") as out:
   for r in reader:
      for k, v in r.items():
         # make sure nulls are generated
         if not v:
            r[k] = None
         # parse and generate decimal arrays
         elif k == "loc":
            r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
         # generate a number
         elif k == "geonameid":
            r[k] = int(v)
      out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True)+"\n")

我的CSV包含以下字段:

"geonameid","name","asciiname","alternatenames","loc","feature_class","feature_code","country_code","cc2","admin1_code","admin2_code","admin3_code","admin4_code"
3,"Zamīn Sūkhteh","Zamin Sukhteh","Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]","P","PPL","IR",,"15",,,
5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]","P","PPL","IR",,"15",,,
7,"Tarvīḩ ‘Adāī","Tarvih `Adai","Tarvih `Adai,Tarvīḩ ‘Adāī","[48.2,32.1]","P","PPL","IR",,"15",,,

我當前的JSON輸出如下所示:

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Zamin Sukhteh,Zamīn Sūkhteh", "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Yekahi,Yekāhī", "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Tarvih `Adai,Tarvīḩ ‘Adāī", "asciiname": "Tarvih `Adai", "admin4_code": null}

我想更改JSON輸出以添加字符串數組,如下所示(向右滾動到alternatenames ):

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Zamin Sukhteh", "Zamīn Sūkhteh"], "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Yekahi,Yekāhī"], "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Tarvih `Adai", "Tarvīḩ ‘Adāī"], "asciiname": "Tarvih `Adai", "admin4_code": null}

另外,我應該將Access 2010導出的CSV中的quotechar更改為^而不是"以避免雙引號?

謝謝你的幫助。

在現有的“elif”中添加另一個“elif”來處理“alternatenames”:

     elif k == "alternatenames":
        r[k] = [name.strip() for name in v.split(",")]

所以首先在逗號上拆分字符串,然后在開始/結束處去除空白。

我認為你的quotechar不是問題所在。 您必須手動指定希望將該字段轉換為字符串列表。

警告:未經測試的代碼如下

elif k == "alternatenames":
    r[k] = unicode.split(v, ',')

我假設v是基於字符的unicode,但是如果它是ascii,請調整。

嘗試包括這個:

elif k == "alternatenames":
   r[k] = [v.split(",")]

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM