Convert CSV file to JSON file

I am trying to convert my CSV file to JSON. When I do, the JSON file starts with an extra entry that contains only the field names.

I have tried using pandas and plain dictionaries, but I can't get the result I want; each attempt produces some other unwanted output.

I want to remove the extra field-names-only entry at the start of the JSON. Also, how can I use ConnectionId as the key and keep the same format for a second output?

import csv, json

csvfile = open('/home/Desktop/PD/GEOSubscriberLocations_LTE_sample.csv', 'r')
jsonfile = open('/home/Desktop/PD/script5.json', 'w')

fieldnames = ("Confidence", "ConnectionId", "Imei", "Imsi", "IsData", "IsSignalling", "IsVoice", "Latitude", "Longitude",
              "Mcc", "Mnc", "SegmentDuration", "SegmentStartTime", "ServingCellLabel", "Sv", 
              "TrackingAreaCode", "Uncertainity")

reader = csv.DictReader(csvfile , fieldnames)

code = ''
for row in reader:
    for key in row:
        row[key] = row[key].decode('utf-8', 'ignore').encode('utf-8')
    json.dump(row, jsonfile, indent=4, sort_keys=False)
    jsonfile.write('\n')

The actual result is:

{
    "Confidence": "Confidence", 
    "IsData": "IsData", 
    "Latitude": "Latitude", 
    "ConnectionId": "ConnectionId", 
    "Mcc": "Mcc", 
    "Sv": "Sv", 
    "Longitude": "Longitude", 
    "Uncertainity": "Uncertainty", 
    "IsVoice": "IsVoice", 
    "IsSignalling": "IsSignalling", 
    "SegmentStartTime": "SegmentStartTime", 
    "Imei": "Imei", 
    "SegmentDuration": "SegmentDuration", 
    "Mnc": "Mnc", 
    "ServingCellLabel": "ServingCellLabel", 
    "Imsi": "Imsi", 
    "TrackingAreaCode": "TrackingAreaCode"
}
{
    "Confidence": "1.994667E-07", 
    "IsData": "FALSE", 
    "Latitude": "1.694202", 
    "ConnectionId": "330708186825281", 
    "Mcc": "999", 
    "Sv": "01", 
    "Longitude": "0.434623", 
    "Uncertainity": "178", 
    "IsVoice": "FALSE", 
    "IsSignalling": "TRUE", 
    "SegmentStartTime": "16/02/2017 09:56:59.912", 
    "Imei": "99999006686069", 
    "SegmentDuration": "00:00:00.0350000", 
    "Mnc": "99", 
    "ServingCellLabel": "Cell18", 
    "Imsi": "999992223223602", 
    "TrackingAreaCode": "1234"
}
{
    "Confidence": "1.504506E-12", 
    "IsData": "FALSE", 
    "Latitude": "1.633704", 
    "ConnectionId": "260339442647675", 
    "Mcc": "999", 
    "Sv": "02", 
    "Longitude": "0.668554", 
    "Uncertainity": "314", 
    "IsVoice": "FALSE", 
    "IsSignalling": "TRUE", 
    "SegmentStartTime": "16/02/2017 09:57:01.377", 
    "Imei": "99999207564306", 
    "SegmentDuration": "00:00:00.0280000", 
    "Mnc": "99", 
    "ServingCellLabel": "Cell19", 
    "Imsi": "999993793410366", 
    "TrackingAreaCode": "1235"
}
{
    "Confidence": "0.3303348", 
    "IsData": "FALSE", 
    "Latitude": "1.847635", 
    "ConnectionId": "260339442647676", 
    "Mcc": "999", 
    "Sv": "14", 
    "Longitude": "1.356349", 
    "Uncertainity": "129", 
    "IsVoice": "FALSE", 
    "IsSignalling": "TRUE", 
    "SegmentStartTime": "16/02/2017 09:57:01.555", 
    "Imei": "99999605176135", 
    "SegmentDuration": "00:00:00.0290000", 
    "Mnc": "99", 
    "ServingCellLabel": "Cell13", 
    "Imsi": "999992216631694", 
    "TrackingAreaCode": "1236"
}
{
    "Confidence": "0.01800376", 
    "IsData": "FALSE", 
    "Latitude": "1.914598", 
    "ConnectionId": "330708186825331", 
    "Mcc": "999", 
    "Sv": "74", 
    "Longitude": "1.222736", 
    "Uncertainity": "463", 
    "IsVoice": "FALSE", 
    "IsSignalling": "TRUE", 
    "SegmentStartTime": "16/02/2017 09:57:02.689", 
    "Imei": "99999007880884", 
    "SegmentDuration": "00:00:00.0260000", 
    "Mnc": "99", 
    "ServingCellLabel": "Cell7", 
    "Imsi": "999992226681236", 
    "TrackingAreaCode": "1237"
}
{
    "Confidence": "0.2068138", 
    "IsData": "FALSE", 
    "Latitude": "1.850279", 
    "ConnectionId": "330708186825354", 
    "Mcc": "999", 
    "Sv": "13", 
    "Longitude": "1.349263", 
    "Uncertainity": "167", 
    "IsVoice": "FALSE", 
    "IsSignalling": "TRUE", 
    "SegmentStartTime": "16/02/2017 09:57:04.351", 
    "Imei": "99999002855874", 
    "SegmentDuration": "00:00:00.0300000", 
    "Mnc": "99", 
    "ServingCellLabel": "Cell15", 
    "Imsi": "999995430231562", 
    "TrackingAreaCode": "1238"
}

If using ConnectionId as key, I want my output like:

{
    "ConnectionId": "189970698469977",
        {
            "Confidence": "0.01428183",
            "Imei": "99999507405260",
            "Imsi": "999992226504812",
            "IsData": "FALSE",
            "IsSignalling": "TRUE",
            "IsVoice": "FALSE",
            "Latitude": "1.848613",
            "Longitude": "1.354355",
            "Mcc": "999",
            "Mnc": "99",
            "SegmentDuration": "00:00:00.0860000",
            "SegmentStartTime": "16/02/2017 09:57:00.053",
            "ServingCellLabel": "Cell14",
            "Sv": "06",
            "TrackingAreaCode": "1256",
            "Uncertainty": 662
        }

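Try replacing your reader and for loop with the following code, which collects the rows in a list and writes them out as a single JSON array:

import csv
import json

# Paths taken from the question.
csvFilePath = '/home/Desktop/PD/GEOSubscriberLocations_LTE_sample.csv'
jsonFilePath = '/home/Desktop/PD/script5.json'

arr = []

# No fieldnames argument, so DictReader takes the keys from the header row.
with open(csvFilePath, newline='') as f:
    csvReader = csv.DictReader(f)
    for csvRow in csvReader:
        arr.append(csvRow)

print(arr)

# Write the data to a JSON file.
with open(jsonFilePath, 'w') as jsonFile:
    json.dump(arr, jsonFile, indent=4)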

Please refer to this link as well.

The extra field-names-only entry

If you provide fieldnames explicitly, csv assumes that the first row of the .csv file is data. If you leave out the fieldnames parameter, it assumes the first row is a header row containing the field names. From the documentation:

"The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames."

It looks like your .csv file has a header row, but you also provided fieldnames explicitly, so csv read the header row in as data. To use the field names from the header row, change your call to DictReader to:

csv.DictReader(csvfile)  # notice no fieldnames parameter
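
The difference can be seen on a small in-memory sample. This is only a minimal sketch: the two-column CSV text and the variable names are made up for illustration, and io.StringIO stands in for the real file.

```python
import csv
import io

# Hypothetical two-column sample standing in for the real CSV file.
sample = "ConnectionId,Mcc\n330708186825281,999\n260339442647675,999\n"

# Explicit fieldnames: the header line is treated as an ordinary data row.
with_names = list(
    csv.DictReader(io.StringIO(sample), fieldnames=("ConnectionId", "Mcc"))
)

# No fieldnames: the header line is consumed and used for the keys.
from_header = list(csv.DictReader(io.StringIO(sample)))

print(len(with_names), with_names[0]["ConnectionId"])    # 3 ConnectionId
print(len(from_header), from_header[0]["ConnectionId"])  # 2 330708186825281
```

With explicit fieldnames the reader yields three rows, the first being the header echoed back as data, which is the extra entry seen in the question.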

Using a certain field as a key

First, consider how best to represent this in JSON and what you are trying to gain from indexing by this field. The example you gave isn't quite valid JSON:

{
    "ConnectionId": "189970698469977",
        {
            "Confidence": "0.01428183",
            "Imei": "99999507405260",
            ...
        }

It's not valid because:

  • We open a {, indicating this is an 'object'.
  • Objects contain keys and the values associated with those keys, and nothing else.
  • We provide a key 'ConnectionId' and a value for it. This is fine.
  • Then we provide another object but no key. This is invalid.

Assuming you want to be able to quickly look up objects by ConnectionId, how about creating a JSON object that looks like this:
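
One way to check this is to hand the text to a JSON parser. The sketch below uses a shortened, one-line copy of the structure from the question (with a closing brace added so that the missing key is the only problem); the variable names are illustrative.

```python
import json

# Shortened copy of the structure from the question: a key/value pair
# followed by an object that has no key of its own.
invalid_text = '{"ConnectionId": "189970698469977", {"Confidence": "0.01428183"}}'

try:
    json.loads(invalid_text)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("rejected:", err.msg)

# Giving the nested object a key of its own makes the document parse.
valid_text = '{"189970698469977": {"Confidence": "0.01428183"}}'
parsed = json.loads(valid_text)
print(parsed["189970698469977"]["Confidence"])
```

The parser rejects the first string and accepts the second, where the nested object is the value of a key.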

{
    "189970698469977": {
        "Confidence": "0.01428183",
        "Imei": "99999507405260",
        ...
    },
    "260339442647676": {
        "Confidence": ...
    },
    ...
}

A useful property of this layout is that each ConnectionId appears only once, as a key, so it only makes sense if the ConnectionId values in your data are unique.

To do this, we need to create a dictionary in Python that we will then dump as JSON.

We can create Python dictionaries from a sequence of (key, value) tuples. Example from the docs:

>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'guido': 4127, 'jack': 4098}

We will use this constructor to create our indexed dictionary:

dictionaryEntries = [(row['ConnectionId'], row) for row in csvReader]
dictionaryToDump = dict(dictionaryEntries)
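
It is worth knowing what dict() does when two rows share a ConnectionId. The following is a minimal sketch on made-up sample data (the CSV text is hypothetical and io.StringIO stands in for the real file); it reuses the variable names from the snippet above.

```python
import csv
import io

# Hypothetical sample; the second and third rows share a ConnectionId.
sample = (
    "ConnectionId,ServingCellLabel\n"
    "330708186825281,Cell18\n"
    "260339442647675,Cell19\n"
    "260339442647675,Cell13\n"
)

csvReader = csv.DictReader(io.StringIO(sample))
dictionaryEntries = [(row["ConnectionId"], row) for row in csvReader]
dictionaryToDump = dict(dictionaryEntries)

# Three rows were read, but only two keys survive: the later row replaced
# the earlier one with the same ConnectionId.
print(len(dictionaryEntries), len(dictionaryToDump))            # 3 2
print(dictionaryToDump["260339442647675"]["ServingCellLabel"])  # Cell13
```

If losing rows silently is not acceptable, the duplicates have to be detected before the dictionary is built.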

Putting it together

Your code now might look like this:

import csv
import json

# Read the rows and pair each one with its ConnectionId.
with open('mycsv.csv', newline='') as csvFile:
    csvReader = csv.DictReader(csvFile)
    dictionaryEntries = [(row['ConnectionId'], row) for row in csvReader]

dictionaryToDump = dict(dictionaryEntries)

# Write the indexed dictionary out as a single JSON object.
with open('myjson.json', 'w') as jsonFile:
    json.dump(dictionaryToDump, jsonFile, indent=4)
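
A possible variation, closer to the output sketched in the question, takes ConnectionId out of each nested object and refuses repeated keys instead of overwriting them. The helper name index_rows, its parameters, and the sample data are assumptions made for this sketch, not part of the csv or json modules.

```python
import csv
import io
import json


def index_rows(rows, key_field):
    """Return {row[key_field]: remaining columns}, raising on a repeated key."""
    indexed = {}
    for row in rows:
        row = dict(row)           # copy so the caller's row is not modified
        key = row.pop(key_field)  # take the key out of the nested object
        if key in indexed:
            raise ValueError(f"duplicate {key_field}: {key}")
        indexed[key] = row
    return indexed


# Hypothetical sample standing in for the real CSV file.
sample = "ConnectionId,Mcc,Mnc\n330708186825281,999,99\n260339442647675,999,99\n"
result = index_rows(csv.DictReader(io.StringIO(sample)), "ConnectionId")
print(json.dumps(result, indent=4))
```

Raising on a duplicate is one design choice among several; collecting the rows for each ConnectionId in a list would be another, at the cost of a different JSON shape.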
