Why does a large pandas dataframe show only NaN values after I drop selected rows?
Using pandas v0.17.1, I am trying to remove the rows where parName == 'rt' from a large (882504 rows) dataframe named productDataNat, but afterwards all the remaining rows show only NaN values:
import numpy as np
import pandas as pd

productDataNat = pd.read_csv('https://lobianco.org/temp/productData_P0-Mi-Ei.csv', sep=';', dtype={'value': np.float64})
productDataNat = productDataNat.drop(['Unnamed: 8'], axis=1)
productDataNat.set_index(['scen','country','region','prod','freeDim','year','parName'], inplace=True)
productDataNat.head()
# this is the call after which the remaining rows show NaN
productDataNat.drop('rt', level='parName', axis=0)
When I try the same thing on an example dataframe instead, it works as expected:
# note: in pandas 0.24 and later the `labels` argument is named `codes`
midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,1,0],[1,0,1,0]])
dfmix = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
dfmix
dfmix.drop('x',level=1,axis=0)
Is this a bug in pandas, or is something wrong with my dataframe (and if so, what)?
Correction: it behaves exactly the same for me (I'm using pandas v0.18.0):
In [4]: df.drop('rt', level='parName', axis=0)
Out[4]:
                                                         value
scen     country region prod        freeDim year parName
P0-Mi-Ei 11000   11042  hardWRoundW NaN     2005 dl        NaN
                        softWRoundW NaN     2005 dl        NaN
                        pulpWFuelW  NaN     2005 dl        NaN
                        ashRoundW   NaN     2005 dl        NaN
                        fuelW       NaN     2005 dl        NaN
                        hardWSawnW  NaN     2005 dl        NaN
                        softWSawnW  NaN     2005 dl        NaN
                        plyW        NaN     2005 dl        NaN
                        pulpW       NaN     2005 dl        NaN
                        pannels     NaN     2005 dl        NaN
                        ashSawnW    NaN     2005 dl        NaN
                        ashPlyW     NaN     2005 dl        NaN
                 11061  hardWRoundW NaN     2005 dl        NaN
                        softWRoundW NaN     2005 dl        NaN
                        pulpWFuelW  NaN     2005 dl        NaN
                        ashRoundW   NaN     2005 dl        NaN
                        fuelW       NaN     2005 dl        NaN
                        hardWSawnW  NaN     2005 dl        NaN
                        softWSawnW  NaN     2005 dl        NaN
                        plyW        NaN     2005 dl        NaN
                        pulpW       NaN     2005 dl        NaN
                        pannels     NaN     2005 dl        NaN
                        ashSawnW    NaN     2005 dl        NaN
                        ashPlyW     NaN     2005 dl        NaN
                 11072  hardWRoundW NaN     2005 dl        NaN
                        softWRoundW NaN     2005 dl        NaN
                        pulpWFuelW  NaN     2005 dl        NaN
                        ashRoundW   NaN     2005 dl        NaN
                        fuelW       NaN     2005 dl        NaN
                        hardWSawnW  NaN     2005 dl        NaN
As a workaround, you can get rid of the rt rows before setting the MultiIndex:
import numpy as np
import pandas as pd

cols = 'scen;parName;country;region;prod;freeDim;year;value'.split(';')
url = 'https://lobianco.org/temp/productData_P0-Mi-Ei.csv'
productDataNat = pd.read_csv(url, sep=';', dtype={'value': np.float64}, usecols=cols)
# .loc replaces .ix, which was deprecated in pandas 0.20 and removed in 1.0
df = productDataNat.loc[productDataNat.parName != 'rt']
df = df.set_index(['scen','country','region','prod','freeDim','year','parName'])
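If the MultiIndex is already set, another option that may help is to filter with a boolean mask built from the index level instead of calling drop. The following is only a minimal sketch on a small made-up frame: the reduced set of columns and the values are illustrative assumptions, not the real data, and it has not been run against the original CSV or against pandas 0.17/0.18.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the real data (values are made up).
raw = pd.DataFrame({
    "scen": ["P0-Mi-Ei"] * 4,
    "prod": ["hardWRoundW", "hardWRoundW", "plyW", "plyW"],
    "freeDim": [np.nan] * 4,  # NaN inside an index level, as in the output above
    "parName": ["dl", "rt", "dl", "rt"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
indexed = raw.set_index(["scen", "prod", "freeDim", "parName"])

# Boolean mask: True for every row whose 'parName' level is not 'rt'.
keep = indexed.index.get_level_values("parName") != "rt"

# Boolean selection keeps the matching rows and their values as they are.
filtered = indexed[keep]
print(filtered)
```

Because the mask only selects existing rows, the values of the rows that are kept are carried over unchanged.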