I am trying to convert my CSV file to JSON. When I do, there is an extra entry in the JSON output that contains only the field names.
I have tried pandas and plain dictionaries but can't get the result I want; something always comes out wrong.
I want to remove the extra field-names-only entry at the start of the JSON. Also, how can I make ConnectionId the key and keep the same format for a different output?
import csv, json

csvfile = open('/home/Desktop/PD/GEOSubscriberLocations_LTE_sample.csv', 'r')
jsonfile = open('/home/Desktop/PD/script5.json', 'w')
fieldnames = ("Confidence", "ConnectionId", "Imei", "Imsi", "IsData", "IsSignalling", "IsVoice", "Latitude", "Longitude",
              "Mcc", "Mnc", "SegmentDuration", "SegmentStartTime", "ServingCellLabel", "Sv",
              "TrackingAreaCode", "Uncertainity")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    for key in row:
        row[key] = row[key].decode('utf-8', 'ignore').encode('utf-8')
    json.dump(row, jsonfile, indent=4, sort_keys=False)
    jsonfile.write('\n')
The actual result is:
{
"Confidence": "Confidence",
"IsData": "IsData",
"Latitude": "Latitude",
"ConnectionId": "ConnectionId",
"Mcc": "Mcc",
"Sv": "Sv",
"Longitude": "Longitude",
"Uncertainity": "Uncertainity",
"IsVoice": "IsVoice",
"IsSignalling": "IsSignalling",
"SegmentStartTime": "SegmentStartTime",
"Imei": "Imei",
"SegmentDuration": "SegmentDuration",
"Mnc": "Mnc",
"ServingCellLabel": "ServingCellLabel",
"Imsi": "Imsi",
"TrackingAreaCode": "TrackingAreaCode"
}
{
"Confidence": "1.994667E-07",
"IsData": "FALSE",
"Latitude": "1.694202",
"ConnectionId": "330708186825281",
"Mcc": "999",
"Sv": "01",
"Longitude": "0.434623",
"Uncertainity": "178",
"IsVoice": "FALSE",
"IsSignalling": "TRUE",
"SegmentStartTime": "16/02/2017 09:56:59.912",
"Imei": "99999006686069",
"SegmentDuration": "00:00:00.0350000",
"Mnc": "99",
"ServingCellLabel": "Cell18",
"Imsi": "999992223223602",
"TrackingAreaCode": "1234"
}
{
"Confidence": "1.504506E-12",
"IsData": "FALSE",
"Latitude": "1.633704",
"ConnectionId": "260339442647675",
"Mcc": "999",
"Sv": "02",
"Longitude": "0.668554",
"Uncertainity": "314",
"IsVoice": "FALSE",
"IsSignalling": "TRUE",
"SegmentStartTime": "16/02/2017 09:57:01.377",
"Imei": "99999207564306",
"SegmentDuration": "00:00:00.0280000",
"Mnc": "99",
"ServingCellLabel": "Cell19",
"Imsi": "999993793410366",
"TrackingAreaCode": "1235"
}
{
"Confidence": "0.3303348",
"IsData": "FALSE",
"Latitude": "1.847635",
"ConnectionId": "260339442647676",
"Mcc": "999",
"Sv": "14",
"Longitude": "1.356349",
"Uncertainity": "129",
"IsVoice": "FALSE",
"IsSignalling": "TRUE",
"SegmentStartTime": "16/02/2017 09:57:01.555",
"Imei": "99999605176135",
"SegmentDuration": "00:00:00.0290000",
"Mnc": "99",
"ServingCellLabel": "Cell13",
"Imsi": "999992216631694",
"TrackingAreaCode": "1236"
}
{
"Confidence": "0.01800376",
"IsData": "FALSE",
"Latitude": "1.914598",
"ConnectionId": "330708186825331",
"Mcc": "999",
"Sv": "74",
"Longitude": "1.222736",
"Uncertainity": "463",
"IsVoice": "FALSE",
"IsSignalling": "TRUE",
"SegmentStartTime": "16/02/2017 09:57:02.689",
"Imei": "99999007880884",
"SegmentDuration": "00:00:00.0260000",
"Mnc": "99",
"ServingCellLabel": "Cell7",
"Imsi": "999992226681236",
"TrackingAreaCode": "1237"
}
{
"Confidence": "0.2068138",
"IsData": "FALSE",
"Latitude": "1.850279",
"ConnectionId": "330708186825354",
"Mcc": "999",
"Sv": "13",
"Longitude": "1.349263",
"Uncertainity": "167",
"IsVoice": "FALSE",
"IsSignalling": "TRUE",
"SegmentStartTime": "16/02/2017 09:57:04.351",
"Imei": "99999002855874",
"SegmentDuration": "00:00:00.0300000",
"Mnc": "99",
"ServingCellLabel": "Cell15",
"Imsi": "999995430231562",
"TrackingAreaCode": "1238"
}
If I use ConnectionId as the key, I want my output to look like:
{
"ConnectionId": "189970698469977",
{
"Confidence": "0.01428183",
"Imei": "99999507405260",
"Imsi": "999992226504812",
"IsData": "FALSE",
"IsSignalling": "TRUE",
"IsVoice": "FALSE",
"Latitude": "1.848613",
"Longitude": "1.354355",
"Mcc": "999",
"Mnc": "99",
"SegmentDuration": "00:00:00.0860000",
"SegmentStartTime": "16/02/2017 09:57:00.053",
"ServingCellLabel": "Cell14",
"Sv": "06",
"TrackingAreaCode": "1256",
"Uncertainty": 662
}
Try replacing your for loop with the following code:
arr = []
with open(csvFile) as f:
    csvReader = csv.DictReader(f)
    for csvRow in csvReader:
        arr.append(csvRow)
print(arr)

# write the data to a json file
with open(jsonFile, "w") as f:
    f.write(json.dumps(arr, indent=4))
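The same idea as a fully self-contained sketch (it writes a tiny hypothetical CSV first, since `csvFile` and `jsonFile` above are placeholder paths), using `json.dump` to write straight to the file object:

```python
import csv
import json

# Tiny hypothetical CSV so the sketch runs on its own.
with open("mycsv.csv", "w") as f:
    f.write("ConnectionId,Mcc\n330708186825281,999\n")

with open("mycsv.csv") as f:
    arr = list(csv.DictReader(f))  # no fieldnames arg: first row is the header

# json.dump writes directly to the file object,
# equivalent to f.write(json.dumps(arr, indent=4)).
with open("myjson.json", "w") as f:
    json.dump(arr, f, indent=4)
```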
Please refer to this link as well.
If you provide fieldnames explicitly, csv will assume that the first row of the .csv file is data. If you leave out the fieldnames parameter, it will assume the first row of the .csv file is a header row with the field names. From the docs:
The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames.
It looks like your .csv file has a header row, but you have also provided fieldnames explicitly, so csv has read the header row in as data. To use the field names from the header row, change your call to DictReader to:
csv.DictReader(csvfile) # notice no fieldnames parameter
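A minimal sketch of the difference, using a hypothetical in-memory CSV:

```python
import csv
import io

# Hypothetical one-row CSV with a header line.
data = "ConnectionId,Mcc\n330708186825281,999\n"

# Without fieldnames: the first row becomes the header.
rows = list(csv.DictReader(io.StringIO(data)))
# rows == [{"ConnectionId": "330708186825281", "Mcc": "999"}]

# With fieldnames: the header row is read as an ordinary data row.
rows_with_header = list(csv.DictReader(io.StringIO(data),
                                       fieldnames=("ConnectionId", "Mcc")))
# rows_with_header[0] is the field-names-only entry from the question.
```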
First consider how best to represent this in JSON and what you are trying to gain from indexing by this field; the example you gave isn't quite valid JSON.
{
"ConnectionId": "189970698469977",
{
"Confidence": "0.01428183",
"Imei": "99999507405260",
...
}
It's not valid because the inner `{` that follows the "ConnectionId" entry has no key: inside a JSON object, every value must be paired with a key.
Assuming you want to be able to quickly look up objects based on the ConnectionId, how about we create an object in JSON that looks like this:
{
"189970698469977": {
"Confidence": "0.01428183",
"Imei": "99999507405260",
...
},
"260339442647676": {
"Confidence": ...
},
...
}
This gives us the satisfying property that the JSON will only be valid if the keys are unique.
To do this, we need to create a Python dictionary that we will dump as JSON.
We can create Python dictionaries from a sequence of (key, value) tuples. Example from the docs:
>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'guido': 4127, 'jack': 4098}
We will use this constructor to create our indexed dictionary:
dictionaryEntries = [(row['ConnectionId'], row) for row in csvReader]
dictionaryToDump = dict(dictionaryEntries)
Your code now might look like this:
import csv
import json

with open('mycsv.csv') as csvFile:
    csvReader = csv.DictReader(csvFile)
    dictionaryEntries = [(row['ConnectionId'], row) for row in csvReader]
    dictionaryToDump = dict(dictionaryEntries)

with open('myjson.json', 'w') as jsonFile:
    jsonFile.write(json.dumps(dictionaryToDump))
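Run end to end on a tiny hypothetical CSV (standing in for your real file), the indexed structure then supports direct lookup by ConnectionId, which is the point of this representation:

```python
import csv
import io
import json

# Hypothetical sample data standing in for the real CSV file.
data = (
    "ConnectionId,Mcc,Mnc\n"
    "330708186825281,999,99\n"
    "260339442647675,999,99\n"
)

csvReader = csv.DictReader(io.StringIO(data))
dictionaryToDump = dict((row["ConnectionId"], row) for row in csvReader)
dumped = json.dumps(dictionaryToDump, indent=4)

# The payoff of indexing: direct lookup by ConnectionId after loading.
record = json.loads(dumped)["330708186825281"]
```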