
Pandas.dataframe adds an extra row when parsing dictionary

Pandas Version: 1.03 Python Version(s): 2.7.17, 3.7.3 Chromebook - Debian Buster

New to Python, but I could not find even a question about this behavior. I have an address that I receive as JSON from a Google API, parse into a dictionary object, and then write to a CSV file after creating a pandas DataFrame. (I am not including the code that translates from JSON to dict, but this is how it would look if there were no conversion.)

add = {'street': 'Farm to Market 369', 'state': 'Texas', 'city': 'Iowa Park', 'county': 'Wichita County', 'country': 'United States', 'postal_code': '76367', 'neighborhood': None, 'sublocality': None, 'housenumber': None, 'postal_town': None, 'subpremise': None, 'latitude': 33.9738616, 'longitude': -98.5964961, 'location_type': 'ROOFTOP', 'postal_code_suffix': None, 'street_number': '2101'}

There are sixteen rows of data, but creating the DataFrame appears to add an empty key and a null value, so the DataFrame contains 17 rows rather than the 16 I am expecting.

I am including a test file which just populates a dict with data and then passes the keys and values into pd.DataFrame. Check out the table output.


#!/usr/bin/env python3
import pandas as pd
import dumper

def writeAddressCsv(unitName,add):
    #csv_file_path = dataDir+unitName+"_address.csv"

    print (dumper.dump(add))
    df=pd.DataFrame(add.values(),add.keys())
    print(df)
    exit(0)
    #try:
    #    export_csv = df.to_csv(csv_file_path)
    #except:
    #    print("failed to save  address to " + csv_file_path)


add = {"street": "Farm to Market 369", "state": "Texas", "city": "Iowa Park", "county": "Wichita County", "country": "United States", "postal_code": "76367", "neighborhood": None, "sublocality": None, "housenumber": None, "postal_town": None, "subpremise": None, "latitude": 33.9738616, "longitude": -98.5964961, "location_type": "ROOFTOP", "postal_code_suffix": None, "street_number": "2101"}

writeAddressCsv("foo",add)

                                     0 <-----------(null key and 'None' (null) value???)
street              Farm to Market 369
state                            Texas
city                         Iowa Park
county                  Wichita County
country                  United States
postal_code                      76367
neighborhood                      None
sublocality                       None
housenumber                       None
postal_town                       None
subpremise                        None
latitude                       33.9739
longitude                     -98.5965
location_type                  ROOFTOP
postal_code_suffix                None
street_number                     2101

That null key is not in the dict... or is it?

I thought I was doing something wrong when creating the dictionary, so I made a test that initializes two dict objects using both accepted methods, one empty and one to which I add data. Both report this strange 'None' in the dumper output. Normally I would assume it was some sort of default behavior indicator (a default for an empty column value or something), but pandas apparently sees it as a real column, if my sleuthing has uncovered anything important at all.

#!/usr/bin/env python3
import dumper


finaldict = dict()
finaldict2 = {"test": "foo","test2":"foo2"}


print ('finaldict is a: '  + str(type(finaldict)))
print ('finaldict2 is a: ' + str(type(finaldict2)))

print (dumper.dump(finaldict))
print (dumper.dump(finaldict2))

Here's the output (I am printing the object type because the dumper output looked to me like it was reporting the objects as strings - 'str at xxxx'):


finaldict is a: <class 'dict'>
finaldict2 is a: <class 'dict'>
<str at 0x79ce5dcb58>: '{}'None <------- wtf mate?
<str at 0x79ce4acce8>: "{'test': 'foo', 'test2': 'foo2'}"None <-------- wtf mate?

Apparently this 'thing' is inherent to the dict object and pandas is just trying to do what it can with it. Does anyone know how I can prevent it without going back and removing the spurious line (,0) from my csv after the DataFrame contents have been output?

This acts the same way in Python 2.7.17 as it does in 3.7.3, so this doesn't seem to be an issue with Python but with pandas.

PS: I thought maybe pandas was picking up an extra row, so to verify that the dict only has 16 entries I added calls to dict.keys() and dict.values() to see whether I was adding something to the dict that it returned in one of these calls. But no, the dict seems to return its keys and values properly. Pandas is creating 17!

Number of Keys: 16
dict_keys(['street', 'state', 'city', 'county', 'country', 'postal_code', 'neighborhood', 'sublocality', 'housenumber', 'postal_town', 'subpremise', 'latitude', 'longitude', 'location_type', 'postal_code_suffix', 'street_number'])
Number of values: 16
dict_values(['Farm to Market 369', 'Texas', 'Iowa Park', 'Wichita County', 'United States', '76367', None, None, None, None, None, 33.9738616, -98.5964961, 'ROOFTOP', None, '2101'])

PPS:

This may be related, but there was no answer.

Pandas adding extra row to DataFrame when assigning index

Is this a pandas bug or am I doing something wrong?

TLDR: It is not a bug; what you see is a pd.Series name. All Series have one, and since you didn't provide one, pandas assigned it automatically using an auto-incremented integer.

Both columns and rows in a pd.DataFrame are pd.Series. You passed values and an index to the constructor but did not pass columns, so the default name was used for the column Series (i.e. an auto-incremented integer, starting at 0). The `0` in your output is therefore the header of the single column, not a 17th row. You can specify column names manually, e.g.:

df=pd.DataFrame(add.values(), add.keys(), columns=['Address'])
# keys() and values() iterate in corresponding order as long as the dict is not modified in between
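
To see where the `0` comes from, here is a minimal sketch using a shortened, illustrative version of the dict from the question (the names `default_df` and `named_df` are mine, not from the original code). It suggests the `0` is the label of the single column rather than an extra row:

```python
import pandas as pd

# Shortened, illustrative stand-in for the address dict in the question
add = {
    "street": "Farm to Market 369",
    "state": "Texas",
    "neighborhood": None,
    "latitude": 33.9738616,
}

# No columns given: pandas labels the only column with the default integer 0
default_df = pd.DataFrame(list(add.values()), index=list(add.keys()))

# Columns given: the same data, but the column carries an explicit label
named_df = pd.DataFrame(
    list(add.values()), index=list(add.keys()), columns=["Address"]
)

print(list(default_df.columns))  # [0]
print(list(named_df.columns))    # ['Address']
print(default_df.shape)          # (4, 1): four rows, one column
```

Both frames have exactly as many rows as the dict has keys; only the column label differs.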

Or, if you always parse one dict of single values, just make a Series:

s = pd.Series(add, name='Address')

If you check the length of the DataFrame, it will be the same as the dict length.
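
A quick way to check this, and to see where the `,0` line in the CSV comes from, is sketched below. It writes to an in-memory buffer instead of a file and uses a shortened, illustrative dict (both are my assumptions, not part of the original code). Passing `header=False` to `to_csv` is one way to leave out the header line, assuming you don't need column names in the file:

```python
import io

import pandas as pd

# Shortened, illustrative stand-in for the address dict in the question
add = {"street": "Farm to Market 369", "state": "Texas", "neighborhood": None}

df = pd.DataFrame(list(add.values()), index=list(add.keys()))
print(len(df) == len(add))  # the row count matches the number of keys

# Default to_csv: the first line is the header (empty index label, column 0)
buf = io.StringIO()
df.to_csv(buf)
with_header = buf.getvalue().splitlines()

# header=False: only the data rows are written
buf = io.StringIO()
df.to_csv(buf, header=False)
without_header = buf.getvalue().splitlines()

print(with_header[0])                         # ,0
print(len(with_header), len(without_header))  # 4 3
```

Alternatively, giving the column a real name (as in the `columns=['Address']` example) keeps the header line but makes it meaningful.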
