
Why does a large pandas DataFrame show only NaN values after I drop selected rows?

Using pandas v0.17.1, I am trying to remove the rows where parName == 'rt' from a large (882,504-row) DataFrame named productDataNat, but afterwards all the remaining rows contain only NaN values:

import numpy as np
import pandas as pd

productDataNat = pd.read_csv('https://lobianco.org/temp/productData_P0-Mi-Ei.csv', sep=';', dtype={'value': np.float64})
# Discard the empty, unnamed 9th column that read_csv picks up ('Unnamed: 8').
productDataNat = productDataNat.drop(['Unnamed: 8'], axis=1)
productDataNat.set_index(['scen','country','region','prod','freeDim','year','parName'], inplace=True)
productDataNat.head()


productDataNat.drop('rt', level='parName', axis=0)


When I try the same thing on a small example DataFrame, however, it works as expected:

import pandas as pd

# pandas >= 0.24 calls this argument codes=; pandas 0.17 used labels=.
midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], codes=[[1,1,1,0],[1,0,1,0]])
dfmix = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
dfmix


dfmix.drop('x',level=1,axis=0)
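For reference, here is a self-contained version of the small example that runs on current pandas (it uses codes= instead of the older labels= argument) and checks the result of the level-based drop explicitly. Only the two rows whose second index label is 'y' should survive, namely (two, y) with A=1, B=5 and (two, y) with A=3, B=7:

```python
import pandas as pd

# Same toy frame as in the question; codes= replaces the pre-0.24 labels= argument.
midx = pd.MultiIndex(levels=[['one', 'two'], ['x', 'y']],
                     codes=[[1, 1, 1, 0], [1, 0, 1, 0]])
dfmix = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)

# Drop every row whose label on index level 1 is 'x'.
dropped = dfmix.drop('x', level=1, axis=0)
print(dropped)
```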


Is this a bug in pandas, or is something wrong with my DataFrame (and if so, what)?

Correction: I get exactly the same result (I'm using pandas v0.18.0):

In [4]: df.drop('rt', level='parName', axis=0)
Out[4]:
                                                          value
scen     country region prod        freeDim year parName
P0-Mi-Ei 11000   11042  hardWRoundW NaN     2005 dl         NaN
                        softWRoundW NaN     2005 dl         NaN
                        pulpWFuelW  NaN     2005 dl         NaN
                        ashRoundW   NaN     2005 dl         NaN
                        fuelW       NaN     2005 dl         NaN
                        hardWSawnW  NaN     2005 dl         NaN
                        softWSawnW  NaN     2005 dl         NaN
                        plyW        NaN     2005 dl         NaN
                        pulpW       NaN     2005 dl         NaN
                        pannels     NaN     2005 dl         NaN
                        ashSawnW    NaN     2005 dl         NaN
                        ashPlyW     NaN     2005 dl         NaN
                 11061  hardWRoundW NaN     2005 dl         NaN
                        softWRoundW NaN     2005 dl         NaN
                        pulpWFuelW  NaN     2005 dl         NaN
                        ashRoundW   NaN     2005 dl         NaN
                        fuelW       NaN     2005 dl         NaN
                        hardWSawnW  NaN     2005 dl         NaN
                        softWSawnW  NaN     2005 dl         NaN
                        plyW        NaN     2005 dl         NaN
                        pulpW       NaN     2005 dl         NaN
                        pannels     NaN     2005 dl         NaN
                        ashSawnW    NaN     2005 dl         NaN
                        ashPlyW     NaN     2005 dl         NaN
                 11072  hardWRoundW NaN     2005 dl         NaN
                        softWRoundW NaN     2005 dl         NaN
                        pulpWFuelW  NaN     2005 dl         NaN
                        ashRoundW   NaN     2005 dl         NaN
                        fuelW       NaN     2005 dl         NaN
                        hardWSawnW  NaN     2005 dl         NaN
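An alternative that avoids drop altogether is to build a boolean mask from the parName level with Index.get_level_values and select rows with it. Because the mask only looks at parName, it never has to match labels on the other index levels. That may be relevant here, since freeDim is NaN in every row shown above, but the connection is a guess and is not established in this thread. The sketch below uses a hypothetical four-row stand-in for productDataNat (same index level names, made-up values):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for productDataNat: same index level names,
# made-up values, and freeDim left as NaN as in the output above.
df = pd.DataFrame({
    "scen": ["P0-Mi-Ei"] * 4,
    "country": [11000] * 4,
    "region": [11042] * 4,
    "prod": ["hardWRoundW", "hardWRoundW", "softWRoundW", "softWRoundW"],
    "freeDim": [np.nan] * 4,
    "year": [2005] * 4,
    "parName": ["dl", "rt", "dl", "rt"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
df = df.set_index(["scen", "country", "region", "prod", "freeDim", "year", "parName"])

# Boolean mask computed from the 'parName' level only.
mask = df.index.get_level_values("parName") != "rt"
result = df[mask]
print(result)
```

On the real data the same two lines (mask, then productDataNat[mask]) would replace the drop call.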

As a workaround, you can filter out the 'rt' rows before setting the MultiIndex:

import numpy as np
import pandas as pd

cols = 'scen;parName;country;region;prod;freeDim;year;value'.split(';')
url = 'https://lobianco.org/temp/productData_P0-Mi-Ei.csv'
productDataNat = pd.read_csv(url, sep=';', dtype={'value': np.float64}, usecols=cols)

# .ix was removed in pandas 1.0; .loc does the same boolean row selection.
df = productDataNat.loc[productDataNat.parName != 'rt']

df.set_index(['scen','country','region','prod','freeDim','year','parName'], inplace=True)
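To try the workaround without downloading the file, the same steps can be run on a small in-memory CSV. The CSV text below is hypothetical (column names taken from the answer, values made up). Each line ends with ';' so that read_csv sees an extra unnamed column, which usecols then skips:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for productData_P0-Mi-Ei.csv; freeDim is left empty (NaN).
csv_text = (
    "scen;parName;country;region;prod;freeDim;year;value;\n"
    "P0-Mi-Ei;dl;11000;11042;hardWRoundW;;2005;1.5;\n"
    "P0-Mi-Ei;rt;11000;11042;hardWRoundW;;2005;2.5;\n"
    "P0-Mi-Ei;dl;11000;11042;softWRoundW;;2005;3.5;\n"
)
cols = "scen;parName;country;region;prod;freeDim;year;value".split(";")
raw = pd.read_csv(io.StringIO(csv_text), sep=";",
                  dtype={"value": np.float64}, usecols=cols)

# Filter on the plain parName column first, then build the MultiIndex from what remains.
df = raw.loc[raw["parName"] != "rt"].set_index(
    ["scen", "country", "region", "prod", "freeDim", "year", "parName"]
)
print(df)
```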
