
pandas.DataFrame adds an extra row when parsing a dictionary

Pandas version: 1.0.3; Python versions: 2.7.17 and 3.7.3; Chromebook (Debian Buster)

I'm new to Python, and I could not find any existing question about this behavior. I receive an address as JSON from a Google API, parse it into a dictionary, create a pandas DataFrame from it, and then write that to a CSV file. (I am not including the code that converts the JSON to a dict, but this is how the dict would be built if there were no conversion:)

add = {'street': 'Farm to Market 369', 'state': 'Texas', 'city': 'Iowa Park', 'county': 'Wichita County', 'country': 'United States', 'postal_code': '76367', 'neighborhood': None, 'sublocality': None, 'housenumber': None, 'postal_town': None, 'subpremise': None, 'latitude': 33.9738616, 'longitude': -98.5964961, 'location_type': 'ROOFTOP', 'postal_code_suffix': None, 'street_number': '2101'}

There are sixteen entries in the dict, but creating the DataFrame appears to add an empty key with a null value, so the DataFrame contains 17 rows rather than the 16 I expect.

I am including a test script that just populates a dict with data and then passes its keys and values to pandas.DataFrame. Check out the table output:


#!/usr/bin/env python3
import pandas as pd
import dumper

def writeAddressCsv(unitName,add):
    #csv_file_path = dataDir+unitName+"_address.csv"

    print (dumper.dump(add))
    df=pd.DataFrame(add.values(),add.keys())
    print(df)
    exit(0)
    #try:
    #    export_csv = df.to_csv(csv_file_path)
    #except:
    #    print("failed to save  address to " + csv_file_path)


add = {"street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park", "county": "Wichita County", "country": "United States", "postal_code": "76367", "neighborhood": None, "sublocality": None, "housenumber": None, "postal_town": None, "subpremise": None, "latitude": 33.9738616, "longitude": -98.5964961, "location_type": "ROOFTOP", "postal_code_suffix": None, "street_number": "2101"}

writeAddressCsv("foo",add)

                                     0 <-----------(null key and 'None' (null) value???)
street              Farm to Market 369
state                            Texas
city                         Iowa Park
county                  Wichita County
country                  United States
postal_code                      76367
neighborhood                      None
sublocality                       None
housenumber                       None
postal_town                       None
subpremise                        None
latitude                       33.9739
longitude                     -98.5965
location_type                  ROOFTOP
postal_code_suffix                None
street_number                     2101

That null key is not in the dict... or is it?

I thought I was doing something wrong when creating the dictionary, so I wrote a test that initializes two dict objects using both accepted methods: one empty, and one to which I add data. Both show this strange 'None' in the dumper output. I would normally assume it is some sort of default indicator (a default for an empty column value or something), but if my sleuthing has uncovered anything important, pandas apparently sees it as a real column.

#!/usr/bin/env python3
import dumper


finaldict = dict()
finaldict2 = {"test": "foo","test2":"foo2"}


print ('finaldict is a: '  + str(type(finaldict)))
print ('finaldict2 is a: ' + str(type(finaldict2)))

print (dumper.dump(finaldict))
print (dumper.dump(finaldict2))

Here's the output (I print the object type because the dumper output looked to me like it was reporting the objects as strings: 'str at xxxx'):


finaldict is a: <class 'dict'>
finaldict2 is a: <class 'dict'>
<str at 0x79ce5dcb58>: '{}'None <------- wtf mate?
<str at 0x79ce4acce8>: "{'test': 'foo', 'test2': 'foo2'}"None <-------- wtf mate?
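The trailing `None` on each dumper line is most likely not stored in the dict at all. Assuming `dumper.dump()` writes its dump to stdout itself and returns `None` (as `print()` does), wrapping it in `print()` prints that return value directly after the dump, with no newline in between. The sketch below uses a hypothetical stand-in, `dump_like`, rather than the real `dumper` package:

```python
def dump_like(obj):
    # Hypothetical stand-in for a helper that prints a dump itself and
    # returns nothing (we assume dumper.dump behaves this way).
    print(repr(obj), end="")


finaldict = dict()

result = dump_like(finaldict)  # prints "{}" with no trailing newline
print(result)                  # then prints the return value, giving "{}None"
print(result is None)          # True: the "None" came from the return value
print(len(finaldict))          # 0: the dict holds no hidden None entry
```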

Apparently this 'thing' is inherent to the dict object, and pandas is just doing what it can with it. Does anyone know how I can prevent it, without going back and removing the spurious line (,0) from my CSV after the DataFrame contents have been written?

It behaves the same way in Python 2.7.17 as in 3.7.3, so this doesn't seem to be an issue with Python but with pandas.

PS: I thought pandas might be picking up an extra row, so to verify that the dict has only 16 entries, I added calls to dict.keys() and dict.values() to see whether I was adding something to the dict that one of these calls would return. But no, the dict returns its keys and values correctly. Pandas is creating 17!

Number of Keys: 16
dict_keys(['street', 'state', 'city', 'county', 'country', 'postal_code', 'neighborhood', 'sublocality', 'housenumber', 'postal_town', 'subpremise', 'latitude', 'longitude', 'location_type', 'postal_code_suffix', 'street_number'])
Number of values: 16
dict_values(['Farm to Market 369', 'Texas', 'Iowa Park', 'Wichita County', 'United States', '76367', None, None, None, None, None, 33.9738616, -98.5964961, 'ROOFTOP', None, '2101'])

PPS:

This may be related, but it has no answer:

Pandas adding extra row to DataFrame when assigning index

Is this a pandas bug or am I doing something wrong?

TL;DR: It is not a bug. What you see is a column label, which is also the name of that column's pd.Series. Every Series has a name attribute, and since you didn't provide one, pandas assigned a default integer label, starting from 0.
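To confirm that the `0` is a column label rather than a dict key, you can inspect a DataFrame built the same way as in the question. This sketch assumes pandas 1.x or later and wraps `keys()` and `values()` in `list()` for clarity (the question's output shows pandas 1.0.3 also accepting the dict views directly):

```python
import pandas as pd

add = {
    "street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park",
    "county": "Wichita County", "country": "United States", "postal_code": "76367",
    "neighborhood": None, "sublocality": None, "housenumber": None,
    "postal_town": None, "subpremise": None, "latitude": 33.9738616,
    "longitude": -98.5964961, "location_type": "ROOFTOP",
    "postal_code_suffix": None, "street_number": "2101",
}

# Same call as in the question: data and index are given, column labels are not.
df = pd.DataFrame(list(add.values()), index=list(add.keys()))

print(df.columns)                           # a default RangeIndex holding the single label 0
print(df[0].name)                           # 0: the name of the single column Series
print(list(df.index) == list(add.keys()))   # True: no extra key was added to the index
print(df.shape)                             # (16, 1)
```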

Both the columns and the rows of a pd.DataFrame can be accessed as pd.Series objects. You passed the values and the index to the constructor but not columns, so pandas used the default column labels (an auto-incrementing RangeIndex: 0, 1, 2, ...) to name the column Series. You can specify column names manually, e.g.:

df = pd.DataFrame(add.values(), add.keys(), columns=['Address'])
# keys() and values() iterate in matching order as long as the dict isn't modified in between
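The stray `,0` line in the CSV is most likely the header row that `to_csv` writes by default: an empty index label followed by the column label `0`. A minimal sketch, assuming pandas 1.x or later and writing to an in-memory `io.StringIO` buffer instead of a real file, shows that naming the column turns that header into `,Address`, and that `header=False` omits the header row entirely:

```python
import io

import pandas as pd

add = {
    "street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park",
    "county": "Wichita County", "country": "United States", "postal_code": "76367",
    "neighborhood": None, "sublocality": None, "housenumber": None,
    "postal_town": None, "subpremise": None, "latitude": 33.9738616,
    "longitude": -98.5964961, "location_type": "ROOFTOP",
    "postal_code_suffix": None, "street_number": "2101",
}

# Index and data line up because keys() and values() iterate in matching order.
df = pd.DataFrame(list(add.values()), index=list(add.keys()), columns=["Address"])

buf = io.StringIO()
df.to_csv(buf)                 # default: header row first, then one line per row
with_header = buf.getvalue().splitlines()

buf = io.StringIO()
df.to_csv(buf, header=False)   # no header row at all
without_header = buf.getvalue().splitlines()

print(with_header[0])          # ,Address
print(len(with_header))        # 17: one header line plus 16 data lines
print(len(without_header))     # 16
```

In the question's `writeAddressCsv`, the same `header=False` argument could be passed to the commented-out `df.to_csv(csv_file_path)` call if a headerless file is what's wanted.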

Or, if you always parse a single dict of scalar values, just make a Series:

s = pd.Series(add, name='Address')
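A sketch of the Series route, assuming Python 3.7+ and pandas 0.23 or later, where a Series built from a dict keeps the dict's insertion order (older combinations, such as Python 2.7, may sort the keys instead). `Series.to_frame()` is used at the end to get a one-column DataFrame whose column is labeled with the Series name:

```python
import pandas as pd

add = {
    "street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park",
    "county": "Wichita County", "country": "United States", "postal_code": "76367",
    "neighborhood": None, "sublocality": None, "housenumber": None,
    "postal_town": None, "subpremise": None, "latitude": 33.9738616,
    "longitude": -98.5964961, "location_type": "ROOFTOP",
    "postal_code_suffix": None, "street_number": "2101",
}

# Keys become the index, values become the data, and name labels the Series.
s = pd.Series(add, name="Address")

print(s.name)                              # Address
print(len(s) == len(add))                  # True
print(list(s.index) == list(add.keys()))   # True: insertion order is kept

# Converting to a DataFrame reuses the Series name as the column label.
df = s.to_frame()
print(list(df.columns))                    # ['Address']
```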

If you check the length of the DataFrame, you will see that it equals the length of the dict; the extra line in the printout is the column header, not a row.
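The "17 rows" count comes from the printed table, which includes one header line for the column label. A quick check, assuming pandas 1.x or later and the question's dict:

```python
import pandas as pd

add = {
    "street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park",
    "county": "Wichita County", "country": "United States", "postal_code": "76367",
    "neighborhood": None, "sublocality": None, "housenumber": None,
    "postal_town": None, "subpremise": None, "latitude": 33.9738616,
    "longitude": -98.5964961, "location_type": "ROOFTOP",
    "postal_code_suffix": None, "street_number": "2101",
}

df = pd.DataFrame(list(add.values()), index=list(add.keys()))

print(len(add))    # 16 dict entries
print(len(df))     # 16 rows
print(df.shape)    # (16, 1): 16 rows, 1 column

# The text rendering has one more line than there are rows: the header with label 0.
print(len(df.to_string().splitlines()))   # 17
```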
