Pandas version: 1.03; Python versions: 2.7.17, 3.7.3; Chromebook (Debian Buster)
New to Python, and I could not find even a question about this behavior. I have an address that I receive as JSON from a Google API, parse into a dictionary object, and then write to a CSV file after creating a pandas DataFrame. (I am not including the code that translates from JSON to dict, but this is how it would be done if there were no conversion.)
add = {'street': 'Farm to Market 369', 'state': 'Texas', 'city': 'Iowa Park', 'county': 'Wichita County', 'country': 'United States', 'postal_code': '76367', 'neighborhood': None, 'sublocality': None, 'housenumber': None, 'postal_town': None, 'subpremise': None, 'latitude': 33.9738616, 'longitude': -98.5964961, 'location_type': 'ROOFTOP', 'postal_code_suffix': None, 'street_number': '2101'}
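For reference, the omitted JSON-to-dict step is just a standard `json.loads` call; the payload below is a trimmed-down, hypothetical stand-in for the real API response:

```python
import json

# hypothetical response body; the real one comes from the Google API
payload = '{"street": "Farm to Market 369", "state": "Texas", "subpremise": null}'
add = json.loads(payload)  # JSON null becomes Python None
print(add["subpremise"])   # None
```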
There are sixteen rows of data, but creating the DataFrame appears to add an empty key and a null value, so the DataFrame contains 17 rows rather than the 16 I am expecting.
I am including a test file that just populates a dict with data and then passes the keys and values into pd.DataFrame. Check out the table output.
#!/usr/bin/env python3
import pandas as pd
import dumper

def writeAddressCsv(unitName, add):
    # csv_file_path = dataDir + unitName + "_address.csv"
    print(dumper.dump(add))
    df = pd.DataFrame(add.values(), add.keys())
    print(df)
    exit(0)
    # try:
    #     export_csv = df.to_csv(csv_file_path)
    # except:
    #     print("failed to save address to " + csv_file_path)
add = {"street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park", "county": "Wichita County", "country": "United States", "postal_code": "76367", "neighborhood": None, "sublocality": None, "housenumber": None, "postal_town": None, "subpremise": None, "latitude": 33.9738616, "longitude": -98.5964961, "location_type": "ROOFTOP", "postal_code_suffix": None, "street_number": "2101"}
writeAddressCsv("foo",add)
0 <-----------(null key and 'None' (null) value???)
street Farm to Market 369
state Texas
city Iowa Park
county Wichita County
country United States
postal_code 76367
neighborhood None
sublocality None
housenumber None
postal_town None
subpremise None
latitude 33.9739
longitude -98.5965
location_type ROOFTOP
postal_code_suffix None
street_number 2101
That null key is not in the dict....or is it?
I thought I was doing something wrong when creating the dictionary, so I made a test that initializes two dict objects using both accepted methods, one empty and one in which I add data. Both report this strange 'None' in the dumper output. I would normally assume that was some sort of default-behavior indicator (a default for an empty column value or something), but pandas apparently sees it as a real column, if my sleuthing has uncovered anything important.
#!/usr/bin/env python3
import dumper
finaldict = dict()
finaldict2 = {"test": "foo","test2":"foo2"}
print ('finaldict is a: ' + str(type(finaldict)))
print ('finaldict2 is a: ' + str(type(finaldict2)))
print (dumper.dump(finaldict))
print (dumper.dump(finaldict2))
Here's the output. (I am printing the object type because the dumper output looked to me like it was reporting the objects as strings: 'str at xxxx'.)
finaldict is a: <class 'dict'>
finaldict2 is a: <class 'dict'>
<str at 0x79ce5dcb58>: '{}'None <------- wtf mate?
<str at 0x79ce4acce8>: "{'test': 'foo', 'test2': 'foo2'}"None <-------- wtf mate?
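A side note on that trailing None (my guess, not verified against dumper's source): if `dumper.dump` prints its report itself and returns None, then wrapping it in `print()` prints that None return value right after the report. Any print-and-return-nothing function reproduces the symptom; `dump_like` below is a hypothetical stand-in:

```python
def dump_like(obj):
    """Hypothetical stand-in for dumper.dump: prints a report, returns None."""
    print('<str at 0x0>: ' + repr(str(obj)), end='')

# the report prints first, then print() emits the None return value
print(dump_like({}))  # <str at 0x0>: '{}'None
```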
Apparently this 'thing' is inherent to the dict object, and pandas is just doing with it what it can. Does anyone know how I can prevent it, short of going back and removing the spurious line (,0) from my csv after the DataFrame contents have been written out?
This acts the same way in Python 2.7.17 as it does in 3.7.3, so this doesn't seem to be an issue with Python but with pandas.
PS: I thought maybe pandas was picking up an extra row, so to verify that the dict only has 16 entries, I added calls to dict.keys() and dict.values() to see if I was adding something to the dict that it was returning in one of those calls, but no, the dict properly returns 16 keys and 16 values. Pandas is creating 17!
Number of Keys: 16
dict_keys(['street', 'state', 'city', 'county', 'country', 'postal_code', 'neighborhood', 'sublocality', 'housenumber', 'postal_town', 'subpremise', 'latitude', 'longitude', 'location_type', 'postal_code_suffix', 'street_number'])
Number of values: 16
dict_values(['Farm to Market 369', 'Texas', 'Iowa Park', 'Wichita County', 'United States', '76367', None, None, None, None, None, 33.9738616, -98.5964961, 'ROOFTOP', None, '2101'])
PPS: This may be related, but there was no answer:
Pandas adding extra row to DataFrame when assigning index
Is this a pandas bug or am I doing something wrong?
TLDR: It is not a bug; what you see is a pd.Series name. All Series have one, and since you didn't provide one, pandas assigned it automatically by autoincrement.
Both columns and rows in a pd.DataFrame are pd.Series. You passed values and an index to the constructor, but did not pass columns, so the default name was used for the column Series (i.e. autoincrement, starting at 0). You can specify column names manually, e.g.:
df=pd.DataFrame(add.values(), add.keys(), columns=['Address'])
# dict.keys() and dict.values() are guaranteed to correspond as long as the
# dict is not modified in between; since Python 3.7, dicts also preserve insertion order
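A minimal check of that fix (a sketch with a trimmed-down three-entry dict, not your full sixteen):

```python
import pandas as pd

add = {'street': 'Farm to Market 369', 'state': 'Texas', 'subpremise': None}
df = pd.DataFrame(add.values(), add.keys(), columns=['Address'])

print(len(df))              # matches the dict length: no extra row
print(df.columns.tolist())  # ['Address'] instead of the default label 0
```

The "17th row" you saw in the CSV was this default column label 0 being written as the header line `,0`.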
Or, if you always parse one dict of single values, just make a Series:
s = pd.Series(add, name='Address')
If you check the length of the DataFrame, it will be the same as the dict's length.
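Sketched out with the same trimmed-down dict as above:

```python
import pandas as pd

add = {'street': 'Farm to Market 369', 'state': 'Texas', 'subpremise': None}
s = pd.Series(add, name='Address')

print(len(s))  # same as len(add): keys become the index, values the data
print(s.name)  # 'Address' — used as the column header when written out
```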