
Filter pandas DataFrame columns with null data

I have a pandas DataFrame with more than 200 columns, and I'm trying to inspect all the columns that contain null data. How can I filter or display only those columns? df.isnull().sum() lists the count for every column, but with so many columns I only want to see the ones with a non-zero null count.

Newer pandas versions provide the methods DataFrame.isna() and DataFrame.notna().
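As a quick sanity check, the sketch below (using a small made-up frame, not the one from this answer) suggests that isna() and isnull() produce the same mask, and that notna() is its complement, so the two numbered approaches that follow should be interchangeable.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, used only for this comparison.
sample = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", None]})

# isnull() is an alias of isna(), so both masks should match exactly.
same_mask = sample.isna().equals(sample.isnull())

# notna() should be the element-wise negation of isna().
complement = sample.notna().equals(~sample.isna())

print(same_mask, complement)
```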

1) Using the DataFrame.isna() method

>>> df
    A     B     C     D  E      F
0   0   1.0   2.0     3  4    one
1   3   5.0   NaN   NaT  5    two
2   8   NaN  10.0  None  6  three
3  11  12.0  13.0   NaT  7   four

To get just the list of columns that contain null values:

>>> df.columns[df.isna().any()].tolist()
['B', 'C', 'D']

To select all the columns that contain NaN values:

>>> df.loc[:, df.isna().any()]
      B     C     D
0   1.0   2.0     3
1   5.0   NaN   NaT
2   NaN  10.0  None
3  12.0  13.0   NaT
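For anyone who wants to run the steps above end to end, here is a minimal sketch. The frame is a reconstruction of the example shown in this answer; forcing column D to object dtype is an assumption made so that the mix of 3, NaT and None is stored as displayed.

```python
import numpy as np
import pandas as pd

# Reconstruction of the example frame (column D's object dtype is an assumption).
df = pd.DataFrame({
    "A": [0, 3, 8, 11],
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "D": pd.Series([3, pd.NaT, None, pd.NaT], dtype=object),
    "E": [4, 5, 6, 7],
    "F": ["one", "two", "three", "four"],
})

has_null = df.isna().any()                    # boolean Series, one entry per column
null_columns = df.columns[has_null].tolist()  # names of columns with at least one null
null_subset = df.loc[:, has_null]             # the same columns, with their data

print(null_columns)
```

Computing the boolean mask once and reusing it for both the name list and the subset avoids scanning a wide frame twice.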

2) Using the DataFrame.isnull() method

To flag which columns contain null values (the result is a boolean Series):

>>> df.isnull().any()
A    False
B     True
C     True
D     True
E    False
F    False
dtype: bool

To get just the list of columns that contain null values:

>>> df.columns[df.isnull().any()].tolist()
['B', 'C', 'D']

To select a subset of all columns containing at least one NaN value:

>>> df.loc[:, df.isnull().any()]
      B     C     D
0   1.0   2.0     3
1   5.0   NaN   NaT
2   NaN  10.0  None
3  12.0  13.0   NaT

If you want to count the missing values in each column:

>>> df.isnull().sum()
A    0
B    1
C    1
D    3
E    0
F    0
dtype: int64

Or:

>>> df.isnull().sum(axis=0)  # axis=0 (the default): one count per column
A    0
B    1
C    1
D    3
E    0
F    0
dtype: int64

# >>> df.isnull().sum(axis=1)  # axis=1: one count per row
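The difference between the two axes can be sketched on a smaller, made-up frame (the column names below are illustrative only): axis=0 collapses the rows and yields one count per column, while axis=1 collapses the columns and yields one count per row.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame for illustration.
small = pd.DataFrame({
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "E": [4, 5, 6, 7],
})

per_column = small.isnull().sum(axis=0)  # indexed by column name
per_row = small.isnull().sum(axis=1)     # indexed by row label

print(per_column)
print(per_row)
```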

Finally, to get the total number of NaN and non-NaN values in the DataFrame:

NaN value count:

>>> df.isnull().sum().sum()

Non-NaN value count:

>>> df.notnull().sum().sum()
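A small sketch of the two totals, again on a made-up frame: the first sum() gives per-column counts and the second adds those up, so the null and non-null totals together should equal the number of cells (df.size).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 4 rows x 3 columns = 12 cells, 2 of them missing.
small = pd.DataFrame({
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "E": [4, 5, 6, 7],
})

total_null = int(small.isnull().sum().sum())       # all missing cells
total_non_null = int(small.notnull().sum().sum())  # all populated cells

# Every cell is either null or non-null, so the totals cover the whole frame.
print(total_null, total_non_null, small.size)
```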

Once you've got the counts, just filter on the entries greater than zero:

null_counts = df.isnull().sum()
null_counts[null_counts > 0]
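With 200+ columns it may also help to sort the filtered counts and look at the share of missing rows per column. The following is only a sketch on a made-up frame; sorting and the `null_share` name are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column A has no missing values and should be filtered out.
df = pd.DataFrame({
    "A": [0, 3, 8, 11],
    "B": [1.0, np.nan, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
})

null_counts = df.isnull().sum()

# Keep only columns with at least one null, worst offenders first.
nonzero = null_counts[null_counts > 0].sort_values(ascending=False)

# Mean of a boolean mask is the fraction of rows that are null in each column.
null_share = df.isnull().mean()[nonzero.index]

print(nonzero)
print(null_share)
```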

