
argparse - group unknown fields in csv

I'm converting a CSV file to JSON. Everything works fine, except that my last field sometimes contains comma-separated values, and the parser treats each of them as a new column.

For example:

key1 key2 key3 key4
val1 val2 val3 val4,val4.1,val4.2,val4.3

I get this kind of json:

{key1: val1, key2: val2, key3:val3, key4:val4} 

And val4.1,val4.2,val4.3 aren't present. The appropriate result would be:

{key1: "val1", key2: "val2", key3: "val3", key4: "val4,val4.1,val4.2,val4.3"} 

My code so far:

#!/usr/bin/env python
"""Convert csv to json"""
import json
import argparse

def parse(filename):
    with open(filename) as f:
        csv = f.read().split('\r\n\r\n')[1]

    keys = ['val1', 'val2', 'val3', 'val4']
    for line in csv.split('\r\n')[1:]:
        yield dict(zip(keys, [cell.strip() for cell in line.split(',')]))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('-i', default='dump.csv', help='csv input file')
    parser.add_argument('-o', default='state/dump.json', help='json output file')
    opt = parser.parse_args()  # argparse returns a single Namespace, not an (options, args) tuple

    with open(opt.o, 'w+') as f:
        rows = []
        for row in parse(opt.i):
            rows.append(row)
        json.dump(rows, f, ensure_ascii=False)

Solution:

The answer was actually pretty simple; it had just been a long day, so I didn't think of it right away. For anyone looking for a solution, here it is:

  1. Calculate how many more items than keys there are in the current row.
  2. Join the last expected value and all the extras into a single string.
  3. Replace the single "val4" with the new string.

Code:

# This loop replaces the body of parse() from the question.
keys = ['val1', 'val2', 'val3', 'val4']
for line in csv.split('\r\n')[1:]:
    cells = line.split(',')
    # Index of the last expected field; everything from here on belongs to it.
    last = len(keys) - 1
    # Join the last value and any extras back into one comma-separated string.
    merged = ','.join(cells[last:])
    # Replace the single last value (and drop the extras) with the merged string.
    cells = cells[:last] + [merged]
    yield dict(zip(keys, [cell.strip() for cell in cells]))
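
A shorter variant of the same idea, as a minimal sketch: `str.split` accepts a `maxsplit` argument, so the line can be split at most `len(keys) - 1` times and any further commas stay inside the last cell. The helper name `split_row` and the sample line are hypothetical and only here for illustration; this assumes the extra commas only ever appear in the last field.

```python
def split_row(line, keys):
    """Split a line into at most len(keys) cells; extra commas stay in the last cell."""
    # maxsplit = len(keys) - 1 yields at most len(keys) pieces.
    cells = line.split(',', len(keys) - 1)
    return dict(zip(keys, [cell.strip() for cell in cells]))


keys = ['key1', 'key2', 'key3', 'key4']
row = split_row('val1,val2,val3,val4,val4.1,val4.2,val4.3', keys)
print(row['key4'])
```

Rows without extra commas are unaffected, since `maxsplit` is only an upper bound on the number of splits.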

