
How to convert specific CSV format to JSON using Python

I have downloaded a CSV file from Google Trends which presents data in this format:

Top cities for golden globes
City,golden globes
New York (United States),100
Los Angeles (United States),91
Toronto (Canada),69

Top regions for golden globes
Region,golden globes
United States,100
Canada,91
Ireland,72
Australia,72

There are three or four of these groups, separated by blank lines. The first line of each group contains text I want to use as a key, and the rows that follow should become a dictionary associated with that key. Does anyone have advice on Python tools I could use to make this happen? I'm not having much luck with Python's csv module.

My desired output from the above CSV would look like this:

{
"Top cities for golden globes" :
   {
      "New York (United States)" : 100,
      "Los Angeles (United States)" : 91,
      "Toronto (Canada)" : 69
   },
"Top regions for golden globes" :
   {
      "United States" : 100,
      "Canada" : 91,
      "Ireland" : 72,
      "Australia" : 72
   }
}

Your input format is so predictable that I would parse it by hand, without a CSV library.

import json
from collections import defaultdict

result = defaultdict(dict)  # maps each group title to its {name: value} dict
current_key = ""            # title of the current group; "" means between groups
ignore_next = False         # flag to skip the column-header line

with open("yourfile.csv") as fh:
    for line in fh:
        line = line.strip()  # throw away the newline
        if line == "":  # a blank line ends the current group
            current_key = ""
            continue
        if current_key == "":  # the first line of a group is its title
            current_key = line
            ignore_next = True
            continue
        if ignore_next:  # column-header line such as "City,golden globes"
            ignore_next = False
            continue
        a, b = line.rsplit(",", 1)  # split on the last comma, in case the name contains one
        result[current_key][a] = int(b)  # numeric value, as in the desired output

# pretty-print data
print(json.dumps(result, sort_keys=True, indent=4))
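
The loop above reads from a file. As a minimal sketch, the same idea can be wrapped in a function that accepts any iterable of lines, which makes it easy to try on the sample from the question. The names `parse_trends` and `sample` are made up here for illustration, and the sketch assumes every value is an integer and that only the name part of a row may contain commas.

```python
import json
from collections import defaultdict


def parse_trends(lines):
    """Group 'name,value' rows under the title line that starts each block."""
    result = defaultdict(dict)
    current_key = ""     # "" means we are between blocks
    skip_header = False  # True right after a title line has been read
    for line in lines:
        line = line.strip()
        if not line:          # a blank line ends the current block
            current_key = ""
            continue
        if not current_key:   # the first line of a block is its title
            current_key = line
            skip_header = True
            continue
        if skip_header:       # column-header line, e.g. "City,golden globes"
            skip_header = False
            continue
        name, value = line.rsplit(",", 1)       # split on the last comma only
        result[current_key][name] = int(value)  # assumption: integer values
    return dict(result)


# Sample data copied from the question
sample = """\
Top cities for golden globes
City,golden globes
New York (United States),100
Los Angeles (United States),91
Toronto (Canada),69

Top regions for golden globes
Region,golden globes
United States,100
Canada,91
Ireland,72
Australia,72
"""

print(json.dumps(parse_trends(sample.splitlines()), indent=3))
```

Because a file object is itself an iterable of lines, `parse_trends(open("yourfile.csv"))` should behave the same way as the loop above.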

I'd try something like this:

import csv

row = []
dd = {}
with open('the.csv', newline='') as f:
    r = csv.reader(f)
    while True:
        if row:  # normal case, non-empty row
            d[row[0]] = row[1]  # values stay strings, as csv.reader returns them
            row = next(r, None)
            if row is None: break
        else:  # row is empty at start and after blank line
            category = next(f, None)  # read the group title as a raw line
            if category is None: break
            category = category.strip()
            next(r, None)  # skip headers row
            d = dd[category] = {}
            row = next(r, None)
            if row is None: break

Now, dd should be the dict-of-dicts you want, and you can json.dump it as you wish.
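
As a rough sketch of that last step: `csv.reader` yields every field as a string, so the values can be converted to integers before serializing to match the desired output. The helper name `to_json_text` is hypothetical, the small `dd` below is only a stand-in for the dictionary built by the loop, and the conversion assumes every value is a whole number.

```python
import json


def to_json_text(dd):
    """Return dd as pretty-printed JSON with every value converted to int."""
    numeric = {
        category: {name: int(value) for name, value in rows.items()}
        for category, rows in dd.items()
    }
    return json.dumps(numeric, indent=3)


# Stand-in for the dd produced by the loop (values are strings)
dd = {
    "Top cities for golden globes": {
        "New York (United States)": "100",
        "Los Angeles (United States)": "91",
        "Toronto (Canada)": "69",
    },
}

print(to_json_text(dd))
```

To write to a file instead, the same `numeric` dictionary could be passed to `json.dump` together with an open file object.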
