Delete rows where a level of a MultiIndex is NaN (Pandas)
I have the following table with a MultiIndex:
                       value
userid     date
NaN        2014-06-12  42799
           2014-06-13  47673
           2014-06-14  47042
           2014-06-15  48079
           2014-06-16  44873
           2014-06-17  46586
           2014-06-18  44575
1000000021 2014-06-17      0
1000000024 2014-06-22     20
1000000043 2014-06-12     14
           2014-06-14     22
...
I would like to drop the rows where the userid is NaN. If I wanted to drop another row I could do
data = data.drop(1000000021)
but
data = data.drop('NaN')
data = data.drop(np.nan)
and other attempts all return errors of various kinds. Is there a way to drop these rows without having to reindex?
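You could identify the rows whose index is NaN using df.index.labels[0] == -1, since a missing value in a MultiIndex level is stored with the code -1, and select the other rows using df.loc. (In pandas 0.24 and later the attribute is named codes; labels was removed in pandas 1.0.)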
In [48]: df.loc[~(df.index.labels[0] == -1)]
Out[48]:
                       value
userid     date
1000000021 2014-06-17      0
1000000024 2014-06-22     20
1000000043 2014-06-12     14
           2014-06-14     22
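For readers on a recent pandas release, here is a minimal sketch of the same idea. It assumes pandas 0.24 or later, where the attribute is codes, and it rebuilds a small stand-in for the question's table (the six rows below are a shortened sample, not the full data). It also shows an equivalent mask built from the level values with get_level_values, which avoids relying on the -1 convention.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the table in the question (sample rows only).
index = pd.MultiIndex.from_tuples(
    [
        (np.nan, "2014-06-12"),
        (np.nan, "2014-06-13"),
        (1000000021, "2014-06-17"),
        (1000000024, "2014-06-22"),
        (1000000043, "2014-06-12"),
        (1000000043, "2014-06-14"),
    ],
    names=["userid", "date"],
)
df = pd.DataFrame({"value": [42799, 47673, 0, 20, 14, 22]}, index=index)

# Missing entries in a MultiIndex level are stored with code -1.
mask_from_codes = df.index.codes[0] != -1

# The same mask, built from the values of the "userid" level.
mask_from_values = df.index.get_level_values("userid").notna()

# Keep only the rows whose userid is present.
cleaned = df.loc[mask_from_codes]
print(cleaned)
```

Either mask can be passed to df.loc; the get_level_values form also lets you refer to the level by name rather than by position.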
When given a boolean array as the indexer, df[...], df.loc[...], and df.iloc[...] all behave the same. df[...] is commonly used to select columns, however, so you might want to avoid also using it to select rows. That leaves df.loc and df.iloc as viable choices. Since df.iloc was created mainly for selecting by integer position, you might want to use df.loc[...] for selecting by label and by boolean mask. But this is just my convention; pandas allows all three.
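The equivalence described above can be checked with a small sketch. The frame and mask here are made up for illustration. The mask is deliberately a plain NumPy array: as far as I know, df.iloc does not accept a boolean Series that carries its own index, so the three forms are only interchangeable for array masks.

```python
import numpy as np
import pandas as pd

# Illustrative frame and mask (not from the question).
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=list("abcd"))
mask = np.array([True, False, True, False])

by_brackets = df[mask]    # row selection via plain brackets
by_loc = df.loc[mask]     # row selection via .loc
by_iloc = df.iloc[mask]   # row selection via .iloc

# All three select the rows labelled "a" and "c".
print(by_brackets.equals(by_loc) and by_loc.equals(by_iloc))
```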
It is easier to reset the index, drop the NaN rows from the frame, and then set the index again.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.random.randint(0, 10, size=16).reshape(-1, 1),
   ...:                   columns=['value'],
   ...:                   index=pd.MultiIndex.from_product(
   ...:                       [[np.nan, 1, 2, 3], pd.date_range('20130101', periods=4)],
   ...:                       names=['first', 'second']))
In [4]: df
Out[4]:
                  value
first second
NaN   2013-01-01      0
      2013-01-02      2
      2013-01-03      9
      2013-01-04      3
1     2013-01-01      8
      2013-01-02      8
      2013-01-03      5
      2013-01-04      3
2     2013-01-01      4
      2013-01-02      1
      2013-01-03      2
      2013-01-04      7
3     2013-01-01      3
      2013-01-02      9
      2013-01-03      3
      2013-01-04      4
In [5]: df.reset_index().dropna(subset=['first']).set_index(['first', 'second'])
Out[5]:
                  value
first second
1     2013-01-01      8
      2013-01-02      8
      2013-01-03      5
      2013-01-04      3
2     2013-01-01      4
      2013-01-02      1
      2013-01-03      2
      2013-01-04      7
3     2013-01-01      3
      2013-01-02      9
      2013-01-03      3
      2013-01-04      4
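If this comes up often, the reset/dropna/set_index chain can be wrapped in a small function. The sketch below is one possible way to do that: drop_nan_level is a hypothetical helper name, not part of pandas, and the data is fixed rather than random so the result is reproducible.

```python
import numpy as np
import pandas as pd


def drop_nan_level(frame, level):
    """Drop rows whose index level `level` is NaN (hypothetical helper)."""
    names = list(frame.index.names)  # remember the levels to restore
    return frame.reset_index().dropna(subset=[level]).set_index(names)


# Small deterministic example in the shape of the frame above.
index = pd.MultiIndex.from_product(
    [[np.nan, 1, 2], pd.date_range("20130101", periods=2)],
    names=["first", "second"],
)
df = pd.DataFrame({"value": range(6)}, index=index)

# Rows under first == NaN are removed; the MultiIndex is rebuilt afterwards.
result = drop_nan_level(df, "first")
print(result)
```

This assumes every index level has a name, because reset_index turns the levels into columns that dropna and set_index then refer to by name.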