Delete rows where a level of a MultiIndex is NaN (Pandas)

I have the following table with a MultiIndex:

                       value
userid     date

NaN        2014-06-12   42799
           2014-06-13   47673
           2014-06-14   47042
           2014-06-15   48079
           2014-06-16   44873
           2014-06-17   46586
           2014-06-18   44575
1000000021 2014-06-17   0
1000000024 2014-06-22   20
1000000043 2014-06-12   14
           2014-06-14   22
          .
          .
          .
          .

I would like to drop the rows where the userid is NaN. If I wanted to drop a different row, I could do

data = data.drop(1000000021)

but

data = data.drop('NaN')
data = data.drop(np.nan)

and other attempts all return errors of one kind or another. Is there a way to drop the NaN rows without having to reindex?

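You could identify the rows whose first index level is NaN using df.index.labels[0] == -1 (a missing value in a level is stored with the integer code -1), and select the other rows using df.loc. Note that in pandas 0.24 and later labels is named codes, and labels was removed in pandas 1.0: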

In [48]: df.loc[~(df.index.labels[0] == -1)]
Out[48]: 
                       value
userid     date             
1000000021 2014-06-17      0
1000000024 2014-06-22     20
1000000043 2014-06-12     14
           2014-06-14     22
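For readers on a current pandas release, the same idea can be sketched with the codes attribute. This is a minimal example, assuming pandas 0.24 or later; the small frame and the names nan_mask and result are made up for illustration and only mimic the shape of the table in the question.

```python
import numpy as np
import pandas as pd

# Illustrative data: two rows with a NaN userid, two rows with real ids.
index = pd.MultiIndex.from_arrays(
    [
        [np.nan, np.nan, 1000000021, 1000000024],
        pd.to_datetime(["2014-06-12", "2014-06-13", "2014-06-17", "2014-06-22"]),
    ],
    names=["userid", "date"],
)
df = pd.DataFrame({"value": [42799, 47673, 0, 20]}, index=index)

# A missing value in an index level is stored with the integer code -1.
# ``codes`` is the current name of the attribute formerly called ``labels``.
nan_mask = df.index.codes[0] == -1

# Keep every row whose first level is not NaN.
result = df.loc[~nan_mask]
print(result)
```

Because the mask is built from the integer codes, no NaN comparison is needed and the index is never rebuilt.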

When the mask is a boolean NumPy array, as it is here, df[...], df.loc[...], and df.iloc[...] all behave the same. (A boolean Series is a different matter: df.iloc does not accept one as a mask.) df[...] is commonly used to select columns, however, so you might want to avoid also using it to select rows. That leaves df.loc and df.iloc as viable choices. Since df.iloc was created mainly for selecting by integer position, you might want to use df.loc[...] for selecting by label and by boolean mask. But this is just my convention; Pandas allows all three.
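To make the comparison concrete, here is a small sketch that applies one boolean NumPy mask through all three indexers. The frame and the names keep, via_getitem, via_loc, and via_iloc are assumptions for illustration; the mask is built with get_level_values(...).notna(), which is another way to find the non-NaN rows without touching the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.MultiIndex.from_arrays(
        [[np.nan, 1, 2], ["a", "b", "c"]], names=["first", "second"]
    ),
)

# Plain NumPy boolean array: True for the rows to keep.
keep = np.asarray(df.index.get_level_values("first").notna())

via_getitem = df[keep]    # works, but df[...] usually means "select columns"
via_loc = df.loc[keep]    # label-based indexer, also accepts boolean masks
via_iloc = df.iloc[keep]  # position-based indexer, accepts a boolean array

# All three selections contain the same rows.
same = via_getitem.equals(via_loc) and via_loc.equals(via_iloc)
print(same)
```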

It is easier to reset the index, drop the NaN rows from the frame, and then set the index again.

In [3]: df = pd.DataFrame(np.random.randint(0, 10, size=16).reshape(-1, 1), columns=['value'], index=pd.MultiIndex.from_product([[np.nan, 1, 2, 3], pd.date_range('20130101', periods=4)], names=['first', 'second']))

In [4]: df
Out[4]: 
                  value
first second           
NaN   2013-01-01      0
      2013-01-02      2
      2013-01-03      9
      2013-01-04      3
1     2013-01-01      8
      2013-01-02      8
      2013-01-03      5
      2013-01-04      3
2     2013-01-01      4
      2013-01-02      1
      2013-01-03      2
      2013-01-04      7
3     2013-01-01      3
      2013-01-02      9
      2013-01-03      3
      2013-01-04      4

In [5]: df.reset_index().dropna(subset=['first']).set_index(['first','second'])
Out[5]: 
                  value
first second           
1     2013-01-01      8
      2013-01-02      8
      2013-01-03      5
      2013-01-04      3
2     2013-01-01      4
      2013-01-02      1
      2013-01-03      2
      2013-01-04      7
3     2013-01-01      3
      2013-01-02      9
      2013-01-03      3
      2013-01-04      4
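The reset/drop/set pattern above could be wrapped in a small function. The sketch below is one way to do it, not part of pandas: drop_nan_level is a hypothetical helper name, it assumes every index level has a name, and the deterministic data stands in for the random values used in the session.

```python
import numpy as np
import pandas as pd


def drop_nan_level(frame, level):
    """Hypothetical helper: drop rows whose index level ``level`` is NaN.

    Assumes all index levels are named, so they can be restored afterwards.
    """
    names = list(frame.index.names)
    # Move the index levels into columns, drop rows that are NaN in the
    # chosen column, then rebuild the index from the same columns.
    return frame.reset_index().dropna(subset=[level]).set_index(names)


index = pd.MultiIndex.from_product(
    [[np.nan, 1, 2], pd.date_range("20130101", periods=2)],
    names=["first", "second"],
)
df = pd.DataFrame({"value": list(range(6))}, index=index)

cleaned = drop_nan_level(df, "first")
print(cleaned)
```

Note that this approach rebuilds the index, so it suits cases where that is acceptable; the boolean-mask selection leaves the existing index in place.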
