Empty dictionary when using Pandas to_dict()

Question

I'm trying to create a JSON file from a CSV using Pandas. I have the following function but I'm running into a problem where the Incidents dictionary contains nothing.

import pandas as pd
import json

data = pd.read_csv('ufo-sightings.csv',sep = ',', delimiter = None,encoding='latin-1', dtype=str)

data_new = data.rename(columns = {
    'duration (seconds)' : 'Seconds',
    'duration (hours/min)' : 'Hours',
    'date posted' : 'DatePosted', 
    'city' : 'City', 
    'state' : 'State', 
    'country' : 'Country',
    'shape' : 'Shape', 
    'comments' : 'Comments',
    'latitude' : 'Latitude',
    'longitude ' : 'Longitude'
})

df = data_new[['City', 'State', 'Country', 'Shape', 'Seconds', 'Hours',
       'Comments', 'DatePosted', 'Latitude', 'Longitude']]

sightings = df[['Country']].drop_duplicates().sort_values(['Country'], ascending = [True])

def writeEfile(filename):

    file = open(filename,'w')
    rec = 'use UFO\n'
    file.write(rec)
    
    for r in thisfile[['Country']].itertuples(index = False):
        theserows = (df[(df['Country']==r)])
        print(type(r))
        print(type(theserows))
        
        agginfo = theserows[['State', 'City', 'Shape', 'Seconds', 'Hours', 'Comments', 'DatePosted', 'Latitude', 'Longitude']]

        entries = json.dumps({"Country" : r,
                              "Incidents": agginfo.to_dict('records')})
        
        rec = 'db.ufo_sightings.insert(' + entries + ')\n'
        file.write(rec)
    file.close()
    return()

filename = 'ufo_sightings.js'
thisfile = sightings
b = writeEfile(filename)

Apologies for the poor variable names and excessive use.

My goal is to create a JSON file with the following structure - db.ufo_sightings.insert({"Country": "us", "Incidents": [{"City": "New York City", "State": "New York"}, {"City": "LA", "State": "California"}... ]}) where if the City matches a City in the sightings dataframe, you put that incident in the correct country.

Answer 1

In the function, you iterate over thisfile, which is the sightings object. sightings contains only the Country column because of this line:

sightings = df[['Country']].drop_duplicates().sort_values(['Country'], ascending = [True])

In the loop (which could be simplified to for idx, row in thisfile.iterrows()), you are accessing other columns that do not exist. Hence the empty dict from the line agginfo.to_dict('records').
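To check what the loop variable actually holds, here is a minimal sketch on a tiny made-up frame (the sample rows are hypothetical, not taken from the UFO CSV). It shows that the helper frame has only the Country column, and that itertuples yields a namedtuple per row rather than a bare string, so it may also be worth comparing against r.Country instead of r:

```python
import pandas as pd

# Hypothetical sample data standing in for the UFO CSV.
df = pd.DataFrame({
    "Country": ["us", "us", "ca"],
    "City": ["New York City", "LA", "Toronto"],
})

sightings = df[["Country"]].drop_duplicates().sort_values("Country")

# The helper frame keeps only the Country column.
print(list(sightings.columns))

# itertuples yields one namedtuple per row, so the value sits in .Country.
rows = list(sightings.itertuples(index=False))
first = rows[0]
print(type(first).__name__, first.Country)

# Filtering on the plain string value selects the matching rows of df.
matches = df[df["Country"] == first.Country]
print(len(matches))
```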

If your goal is to drop duplicates in df and sort it by country, you can simply do

sightings = df.drop_duplicates(subset=['Country']).sort_values('Country', ascending=True)
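As a rough illustration of what that line does (again on hypothetical sample rows), drop_duplicates with subset keeps every column but only the first row seen for each country:

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the CSV.
df = pd.DataFrame({
    "Country": ["us", "us", "ca"],
    "City": ["New York City", "LA", "Toronto"],
    "State": ["ny", "ca", "on"],
})

# All columns survive, but each country is reduced to its first row.
one_per_country = (
    df.drop_duplicates(subset=["Country"])
      .sort_values("Country", ascending=True)
)
print(one_per_country)
```

Note that the LA row is gone afterwards, which matters if you need every incident per country.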

Edit, since you asked for more help:

For starters, dropping duplicates is a bad idea here, because you need all the rows that share a country name, along with their other column values.

Here is a function I would define:

import json

def create_json(df):
    for country in df["Country"].unique():
        # All rows that belong to this country
        allrows = df.loc[df["Country"] == country]
        incidents = []

        for _, row in allrows.iterrows():
            incidents.append({
                "City": str(row["City"]),
                "State": str(row["State"]),
                # Similarly, add all the other required fields.
            })

        print(json.dumps({
            "Country": str(country),
            "Incidents": incidents,
        }))

Then call the function on the data without removing duplicates:

create_json(df)

This prints one JSON document per country. To use the output further, collect the strings (for example in a list) instead of printing them.
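If you would rather collect the output than print it, one possible variant is sketched below. It is not the function above but a swapped-in approach using groupby and to_dict('records'); the names build_insert_statements and write_statements and the sample rows are hypothetical:

```python
import json
import pandas as pd

def build_insert_statements(df):
    """Return one db.ufo_sightings.insert(...) line per country."""
    statements = []
    # groupby sorts the keys and, by default, skips rows whose Country is missing.
    for country, rows in df.groupby("Country"):
        # Every column except Country becomes part of each incident record.
        incidents = rows.drop(columns="Country").to_dict("records")
        doc = json.dumps({"Country": country, "Incidents": incidents})
        statements.append("db.ufo_sightings.insert(" + doc + ")")
    return statements

def write_statements(path, statements):
    """Write the 'use UFO' header followed by one statement per line."""
    with open(path, "w") as out:
        out.write("use UFO\n")
        for line in statements:
            out.write(line + "\n")

# Hypothetical sample rows; the real frame comes from the CSV with dtype=str.
sample = pd.DataFrame({
    "Country": ["us", "us", "ca"],
    "City": ["New York City", "LA", "Toronto"],
    "State": ["ny", "ca", "on"],
})

statements = build_insert_statements(sample)
for line in statements:
    print(line)

# To write the file from the question:
# write_statements("ufo_sightings.js", statements)
```

Missing values in the other columns are serialized by json.dumps as NaN, which is not strict JSON, so you may want to call fillna('') on the frame first.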
