Empty dictionary when using Pandas to_dict()

I'm trying to create a JSON file from a CSV using Pandas. I have the following function but I'm running into a problem where the Incidents dictionary contains nothing.

import pandas as pd
import json

data = pd.read_csv('ufo-sightings.csv',sep = ',', delimiter = None,encoding='latin-1', dtype=str)

data_new = data.rename(columns = {
    'duration (seconds)' : 'Seconds',
    'duration (hours/min)' : 'Hours',
    'date posted' : 'DatePosted', 
    'city' : 'City', 
    'state' : 'State', 
    'country' : 'Country',
    'shape' : 'Shape', 
    'comments' : 'Comments',
    'latitude' : 'Latitude',
    'longitude ' : 'Longitude'
})

df = data_new[['City', 'State', 'Country', 'Shape', 'Seconds', 'Hours',
       'Comments', 'DatePosted', 'Latitude', 'Longitude']]

sightings = df[['Country']].drop_duplicates().sort_values(['Country'], ascending = [True])

def writeEfile(filename):

    file = open(filename,'w')
    rec = 'use UFO\n'
    file.write(rec)
    
    for r in thisfile[['Country']].itertuples(index = False):
        theserows = (df[(df['Country']==r)])
        print(type(r))
        print(type(theserows))
        
        agginfo = theserows[['State', 'City', 'Shape', 'Seconds', 'Hours', 'Comments', 'DatePosted', 'Latitude', 'Longitude']]

        entries = json.dumps({"Country" : r,
                              "Incidents": agginfo.to_dict('records')})
        
        rec = 'db.ufo_sightings.insert(' + entries + ')\n'
        file.write(rec)
    file.close()
    return()

filename = 'ufo_sightings.js'
thisfile = sightings
b = writeEfile(filename)

Apologies for the poor variable names and excessive use.

My goal is to create a JSON file with the following structure: db.ufo_sightings.insert({"Country": "us", "Incidents": [{"City": "New York City", "State": "New York"}, {"City": "LA", "State": "California"}... ]}), where each incident whose Country matches a Country in the sightings dataframe is placed under that country.

In the function, you are iterating over thisfile, which is the sightings object. sightings has only the Country column because of this line:

sightings = df[['Country']].drop_duplicates().sort_values(['Country'], ascending = [True])

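In the loop (which could be simplified to for idx, row in thisfile.iterrows()), each r produced by itertuples() is a namedtuple such as Pandas(Country='us'), not the country string itself. The comparison df['Country'] == r therefore matches no rows, theserows is empty, and agginfo.to_dict('records') returns an empty list. Compare against r.Country (or r[0]) instead.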
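To see what the loop variable actually holds, here is a minimal sketch on a made-up three-row frame. The name demo and its contents are assumptions for illustration only, not the real CSV data.

```python
import pandas as pd

# Hypothetical stand-in for the real UFO data.
demo = pd.DataFrame({
    "Country": ["us", "us", "gb"],
    "City": ["New York City", "LA", "London"],
})
countries = demo[["Country"]].drop_duplicates().sort_values("Country")

row_types = []
matches = {}
for r in countries.itertuples(index=False):
    # r is a namedtuple (class name "Pandas"), not a plain string.
    row_types.append(type(r).__name__)
    # Filter on the field value r.Country rather than on the tuple r.
    matches[r.Country] = demo[demo["Country"] == r.Country].to_dict("records")

print(row_types)
print(matches)
```

Filtering on r.Country returns the rows for each country, so to_dict('records') has something to convert.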

If your goal is to drop duplicates in df and sort it by country, you can simply do

sightings = df.drop_duplicates(subset=['Country']).sort_values('Country', ascending=True)
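As a rough illustration of what that call keeps, the sketch below runs it on a small made-up frame (demo and its values are hypothetical). Only the first row seen for each country survives, with all of its columns.

```python
import pandas as pd

# Hypothetical sample; the real data comes from ufo-sightings.csv.
demo = pd.DataFrame({
    "Country": ["us", "gb", "us"],
    "City": ["New York City", "London", "LA"],
})

# One row per country (the first occurrence), sorted by country.
one_per_country = (
    demo.drop_duplicates(subset=["Country"])
        .sort_values("Country", ascending=True)
)
print(one_per_country)
```

Note that the second "us" row (LA) is gone after this step, which matters if every incident is needed later.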

Further edit, as you need more help.

For starters, dropping duplicates is a bad idea here, because you need every row that shares a country name, not just one row per country.

Here is a function I would define:

import json

def create_json(df):
    # Build and print one JSON document per distinct country.
    for country in df["Country"].unique():
        # All rows for this country (duplicates are kept on purpose).
        allrows = df.loc[df["Country"] == country]
        incidents = []

        for _, row in allrows.iterrows():
            incidents.append({
                "City": str(row["City"]),
                "State": str(row["State"]),
                # Similarly add all the other required fields.
            })

        print(json.dumps({
            "Country": str(country),
            "Incidents": incidents
        }))

Then call the function on the data without removing duplicates:

create_json(df)

This prints one JSON document per country. To process them further, collect the strings in a list (or write them to your file) instead of printing them.
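One possible way to collect the output rather than print it is sketched below, using groupby and to_dict('records') in place of the explicit row loop. The helper name build_insert_lines and the sample frame are assumptions for illustration, and the sketch assumes there are no missing values in the data.

```python
import json
import pandas as pd

def build_insert_lines(df):
    """Return one db.ufo_sightings.insert(...) line per country."""
    lines = []
    # groupby sorts the group keys, so countries come out in ascending order.
    for country, rows in df.groupby("Country"):
        # Every column except Country becomes part of each incident record.
        incidents = rows.drop(columns="Country").to_dict("records")
        doc = json.dumps({"Country": country, "Incidents": incidents})
        lines.append("db.ufo_sightings.insert(" + doc + ")")
    return lines

# Hypothetical sample data standing in for the real CSV.
demo = pd.DataFrame({
    "Country": ["us", "gb", "us"],
    "City": ["New York City", "London", "LA"],
    "State": ["New York", "England", "California"],
})

lines = build_insert_lines(demo)
for line in lines:
    print(line)
```

Each returned line could then be written to ufo_sightings.js after the 'use UFO' header, one per row, as in the original writeEfile function.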
