Filter pandas DataFrame columns with null data
I have a pandas DataFrame with 200+ columns, and I'm trying to inspect all the columns that contain null data. How can I filter/display only the columns which have null data?
df.isnull().sum() lists the null count for every column, but because there are so many columns I want to see only those with a non-zero null count.
Newer pandas versions (0.21.0 and later) also provide DataFrame.isna() and DataFrame.notna(), which are aliases of DataFrame.isnull() and DataFrame.notnull(). The examples below use both spellings.
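A quick way to convince yourself that the two spellings are interchangeable is to compare their results on the same frame. The small frame below is hypothetical and is not the one used in the examples that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two kinds of missing-value markers (NaN and None).
frame = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", None, "c"]})

# isna()/notna() and isnull()/notnull() return identical boolean frames.
same_null = frame.isna().equals(frame.isnull())
same_notnull = frame.notna().equals(frame.notnull())
print(same_null, same_notnull)  # True True
```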
1) Using the DataFrame.isna() method
>>> df
    A     B     C     D  E      F
0   0   1.0   2.0     3  4    one
1   3   5.0   NaN   NaT  5    two
2   8   NaN  10.0  None  6  three
3  11  12.0  13.0   NaT  7   four
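The answer does not show how this DataFrame was created. One way to build an equivalent frame, assuming np.nan for the gaps in the float columns B and C and pd.NaT/None for the gaps in column D, is sketched below; the constructor is an assumption, not the author's code.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame (constructor assumed): B and C are float columns
# with NaN gaps; D mixes an integer with NaT and None.
df = pd.DataFrame({
    "A": [0, 3, 8, 11],
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "D": [3, pd.NaT, None, pd.NaT],
    "E": [4, 5, 6, 7],
    "F": ["one", "two", "three", "four"],
})

print(df)
# isna() treats NaN, NaT and None alike as missing.
print(df.isna().sum())
```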
To get just the list of columns that contain null values:
>>> df.columns[df.isna().any()].tolist()
['B', 'C', 'D']
To display all the columns that contain NaN values:
>>> df.loc[:, df.isna().any()]
      B     C     D
0   1.0   2.0     3
1   5.0   NaN   NaT
2   NaN  10.0  None
3  12.0  13.0   NaT
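If you need the opposite selection, the columns with no missing values at all, DataFrame.dropna(axis=1) drops every column that contains at least one null. This is a swapped-in alternative that is not part of the original answer; the sketch rebuilds a frame like the sample above with an assumed constructor.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the one above (constructor assumed).
df = pd.DataFrame({
    "A": [0, 3, 8, 11],
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "D": [3, pd.NaT, None, pd.NaT],
    "E": [4, 5, 6, 7],
    "F": ["one", "two", "three", "four"],
})

# Columns with at least one null (the selection shown above).
with_nulls = df.loc[:, df.isna().any()]

# Columns with no nulls: dropna(axis=1) removes any column containing a null.
without_nulls = df.dropna(axis=1)

print(with_nulls.columns.tolist())     # ['B', 'C', 'D']
print(without_nulls.columns.tolist())  # ['A', 'E', 'F']
```

Together the two selections partition the columns, which can be handy as a sanity check on a very wide frame.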
2) Using the DataFrame.isnull() method
To see which columns contain null values (the result is a boolean Series, True for every column with at least one null):
>>> df.isnull().any()
A    False
B     True
C     True
D     True
E    False
F    False
dtype: bool
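Because df.isnull().any() is a boolean Series indexed by column name, you can also mask that Series with itself and read off its index. A minimal sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only "b" and "c" contain missing values.
frame = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1.0, np.nan, 3.0],
    "c": [None, "x", "y"],
})

null_flags = frame.isnull().any()  # boolean Series, one entry per column
null_columns = null_flags[null_flags].index.tolist()  # keep only the True entries
print(null_columns)  # ['b', 'c']
```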
To get just the list of column names that contain null values:
>>> df.columns[df.isnull().any()].tolist()
['B', 'C', 'D']
To select a subset containing all columns with at least one NaN value:
>>> df.loc[:, df.isnull().any()]
      B     C     D
0   1.0   2.0     3
1   5.0   NaN   NaT
2   NaN  10.0  None
3  12.0  13.0   NaT
To count the missing values in each column:
>>> df.isnull().sum()
A    0
B    1
C    1
D    3
E    0
F    0
dtype: int64
Or, equivalently:
>>> df.isnull().sum(axis=0)  # axis=0 (the default): one count per column
A    0
B    1
C    1
D    3
E    0
F    0
dtype: int64
# >>> df.isnull().sum(axis=1)  # axis=1: one count per row instead
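The axis=1 variant counts nulls per row rather than per column, which helps to find incomplete records. The sketch rebuilds a frame like the sample above (constructor assumed); filtering rows with df.isnull().any(axis=1) is an extra step that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the one above (constructor assumed).
df = pd.DataFrame({
    "A": [0, 3, 8, 11],
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "D": [3, pd.NaT, None, pd.NaT],
    "E": [4, 5, 6, 7],
    "F": ["one", "two", "three", "four"],
})

# One null count per row.
row_counts = df.isnull().sum(axis=1)
print(row_counts.tolist())  # [0, 2, 2, 1]

# Keep only the rows that have at least one missing cell.
incomplete_rows = df[df.isnull().any(axis=1)]
print(incomplete_rows.index.tolist())  # [1, 2, 3]
```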
Finally, to get the total number of NaN and non-NaN values in the whole DataFrame:
Total NaN count:
>>> df.isnull().sum().sum()
Total non-NaN count:
>>> df.notnull().sum().sum()
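The two totals are complementary: every cell is either null or not, so they always add up to df.size (rows times columns). A small sketch that checks this on a frame rebuilt like the sample above (constructor assumed):

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the one above (constructor assumed).
df = pd.DataFrame({
    "A": [0, 3, 8, 11],
    "B": [1.0, 5.0, np.nan, 12.0],
    "C": [2.0, np.nan, 10.0, 13.0],
    "D": [3, pd.NaT, None, pd.NaT],
    "E": [4, 5, 6, 7],
    "F": ["one", "two", "three", "four"],
})

# First sum() gives per-column counts, second sum() adds them up.
nan_total = int(df.isnull().sum().sum())
non_nan_total = int(df.notnull().sum().sum())
print(nan_total, non_nan_total, df.size)  # 5 19 24
```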
Once you have the counts, just filter on the entries greater than zero:
null_counts = df.isnull().sum()
null_counts[null_counts > 0]
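For a frame as wide as the one in the question, it can help to wrap this filter in a small function and sort the result so the columns with the most nulls come first. The helper name columns_with_nulls and the 200-column test frame are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd


def columns_with_nulls(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: null counts per column, keeping only columns
    with at least one null, sorted from most to fewest nulls."""
    null_counts = frame.isnull().sum()
    return null_counts[null_counts > 0].sort_values(ascending=False)


# Hypothetical wide frame: 5 rows x 200 columns, initially no missing values.
wide = pd.DataFrame(np.ones((5, 200)), columns=[f"col{i}" for i in range(200)])

# Introduce nulls into three columns only.
wide.iloc[0, 7] = np.nan     # col7: 1 null
wide.iloc[1:3, 42] = np.nan  # col42: 2 nulls
wide.iloc[:, 199] = np.nan   # col199: 5 nulls

result = columns_with_nulls(wide)
print(result)  # lists col199 (5), col42 (2), col7 (1)
```

Only three of the 200 columns are printed, which is the short view the question asks for.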