Pandas dropna with multilevel column index

So I have this Pandas DataFrame with a multilevel index for the columns:

   group1    group2    group3
   1    2    1    2    1    2
0  ...  ...  NaN  ...  ...  ...
1  NaN  ...  ...  ...  ...  ...
2  ...  ...  ...  ...  NaN  ...

Now I want to drop the rows where the columns under group2 and group3 have NaN values, which equates to rows 0 and 2 in this instance.

According to my understanding of the documentation, this should work:

df.dropna(axis = 'rows', subset = ['group2', 'group3'])

But it does not. Instead I get the error:

KeyError: ['group2', 'group3']

Could someone please point out to me how to properly specify the subset?

Kind regards, Rasmus


Update

So it seems like .dropna() cannot take top-level group names as the subset when the columns have a multilevel index. In the end I went with the less elegant but workable method suggested, slightly rewritten:

mask_nan = df[['group2', 'group3']].isna().any(axis = 'columns')
df[~mask_nan]    # ~ to negate / flip the boolean values
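
As a side note, dropna does appear to accept a subset on multilevel columns when the labels are full column keys, i.e. tuples such as ('group2', 1), rather than top-level names alone. A minimal sketch, assuming made-up sample values (the numbers and the name subset_cols are only for illustration and are not from the original post):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.iloc[0, 2] = np.nan  # ('group2', 1) in row 0
df.iloc[1, 0] = np.nan  # ('group1', 1) in row 1
df.iloc[2, 4] = np.nan  # ('group3', 1) in row 2

# Expand the top-level names into full (level 0, level 1) column tuples.
subset_cols = [col for col in df.columns if col[0] in ("group2", "group3")]

# Rows 0 and 2 have NaN under group2/group3; row 1 only has NaN under group1.
result = df.dropna(axis="index", subset=subset_cols)
print(result.index.tolist())
```

Row 1 survives here because its only NaN sits under group1, which is not part of the subset.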

It seems we cannot pass an index level to dropna, so we could build a boolean mask instead:

df.loc[:, ['group2', 'group3']].isna().any(axis=1)

Then negate it, so that only the rows without NaN in those columns are kept:

df = df[~df.loc[:, ['group2', 'group3']].isna().any(axis=1)]

I think this is a similar question to yours.

import numpy as np

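# Select both groups with a list, then require every value in the row to be finite.
df = df[np.isfinite(df[['group2', 'group3']]).all(axis=1)]

Only the rows whose values in those columns are all finite are kept here. Note that this also drops rows containing inf, not just NaN.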
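
To make the difference between a finiteness check and a NaN check concrete, here is a small sketch. The frame sample and its values are hypothetical, chosen so that one row contains inf but no NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has a NaN, row 2 has an inf but no NaN.
columns = pd.MultiIndex.from_product([["group2", "group3"], [1, 2]])
sample = pd.DataFrame(
    [[np.nan, 3.0, 4.0, 5.0],
     [8.0, 9.0, 10.0, 11.0],
     [14.0, np.inf, 16.0, 17.0]],
    columns=columns,
)

selected = sample[["group2", "group3"]]

# np.isfinite is False for NaN and for +/-inf.
finite_rows = np.isfinite(selected).all(axis=1)
# notna is False only for missing values; inf counts as present.
notna_rows = selected.notna().all(axis=1)

print(finite_rows.tolist())  # [False, True, False]
print(notna_rows.tolist())   # [False, True, True]
```

If the data can contain infinities that should be kept, the notna variant is probably the safer choice.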

Start with the details. When you run:

idx = pd.IndexSlice
df.loc[:, idx['group2':'group3']]

You will get the columns for group2 and group3:

  group2     group3    
       1   2      1   2
0    NaN   3    4.0   5
1    8.0   9   10.0  11
2   14.0  15    NaN  17

Now a more complicated expression:

df.loc[:, idx['group2':'group3']].notnull().all(axis=1)

will display a boolean Series that is True for the rows where none of those columns is null:

0    False
1     True
2    False
dtype: bool

So the code that you need is the above expression used in boolean indexing:

df[df.loc[:, idx['group2':'group3']].notnull().all(axis=1)]

(with idx = pd.IndexSlice defined beforehand).
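
Put together, the steps above can be sketched as one runnable script. The group1 values and the names keep and filtered are assumptions for illustration; the group2 and group3 values follow the table shown earlier. Label slices on a MultiIndex need lexsorted columns, so the sketch sorts them first as a precaution:

```python
import numpy as np
import pandas as pd

# Sample frame; group1 values are assumed, group2/group3 follow the table above.
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.iloc[0, 2] = np.nan  # ('group2', 1) in row 0
df.iloc[1, 0] = np.nan  # ('group1', 1) in row 1
df.iloc[2, 4] = np.nan  # ('group3', 1) in row 2

# Slicing by label requires lexsorted columns; a no-op for this frame.
df = df.sort_index(axis=1)

idx = pd.IndexSlice
# True for rows with no null anywhere from group2 through group3.
keep = df.loc[:, idx["group2":"group3"]].notnull().all(axis=1)
filtered = df[keep]
print(filtered)
```

Keep in mind that the slice selects every top-level group from group2 through group3 in sorted order, so any group sorting between them would be included as well.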
