Python Pandas: Check if all columns in a row are NaN

Apologies if this question has already been answered. I tried to find a solution, but all I could find was using dropna to remove every NaN in a DataFrame. I have a DataFrame with 6 columns and 500 rows, and I need to check whether all the values in any particular row are NaN so that I can drop those rows from my dataset. In the example below, data rows 2, 6 and 7 are NaN in every column from Col1 to Col6:

    Col1    Col2    Col3    Col4    Col5    Col6
    12      25      02      78      88      90
    NaN     NaN     NaN     NaN     NaN     NaN
    NaN     35      03      11      65      53
    NaN     NaN     NaN     NaN     22      21
    NaN     15      93      111     165     153
    NaN     NaN     NaN     NaN     NaN     NaN
    NaN     NaN     NaN     NaN     NaN     NaN
    141     121     NaN     NaN     NaN     NaN

Please note that the top row contains only the column headings; my data starts from the second row. I would be grateful if anyone could point me in the right direction.

My second question: after dropping the rows where every column is NaN, what is the best way to drop the rows where 4 or 5 of the columns are missing?

My last question: after dropping the rows that are mostly NaN, how can I create a box plot from the remaining rows (for example, 450 of them)?

Any response will be highly appreciated.

Regards,

For those who landed here by searching for the question in the title:

Check if all columns in a row are NaN

A simple approach would be:

df[list_of_cols_to_check].isnull().apply(lambda x: all(x), axis=1)

import pandas as pd
import numpy as np


df = pd.DataFrame({'movie': [np.nan, 'thg', 'mol', 'mol', 'lob', 'lob'],
                  'rating': [np.nan, 4., 5., np.nan, np.nan, np.nan],
                  'name':   ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]}) 
df.head()


To check if all columns are NaN:

cols_to_check = df.columns
df['is_na'] = df[cols_to_check].isnull().apply(lambda x: all(x), axis=1) 
df.head() 


To check if the columns 'name' and 'rating' are both NaN:

cols_to_check = ['name', 'rating']
df['is_na'] = df[cols_to_check].isnull().apply(lambda x: all(x), axis=1) 
df.head()  
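
As a sketch of the same check without apply, assuming the df built above: DataFrame.all(axis=1) reduces the boolean frame row by row, and the resulting mask can be used to filter the rows out. The names all_na, apply_based and cleaned are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': [np.nan, 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [np.nan, 4., 5., np.nan, np.nan, np.nan],
                   'name':   ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

cols_to_check = ['name', 'rating']

# Row-wise reduction: True where every checked column is NaN in that row.
all_na = df[cols_to_check].isna().all(axis=1)

# The apply-based version from the answer, kept for comparison.
apply_based = df[cols_to_check].isnull().apply(lambda x: all(x), axis=1)

# Keep only the rows that are NOT entirely NaN in the checked columns.
cleaned = df[~all_na]
```

Note that the string 'N/A' in the name column is an ordinary value, not a missing one, so row 2 is not flagged.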

I need to check if in any particular row all the values are NaN so that I can drop them from my dataset.

That's exactly what pd.DataFrame.dropna(how='all') does:

In [3]: df = pd.DataFrame({'a': [None, 1, None], 'b': [None, 1, 2]})

In [4]: df
Out[4]: 
     a    b
0  NaN  NaN
1  1.0  1.0
2  NaN  2.0

In [5]: df.dropna(how='all')
Out[5]: 
     a    b
1  1.0  1.0
2  NaN  2.0
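
For the follow-up about rows that are missing 4 or 5 of the 6 columns, dropna also takes a thresh argument: the minimum number of non-NaN values a row must have to be kept. A minimal sketch, assuming a 6-column frame with the values from the question (min_non_na and kept are illustrative names):

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[12, 25, 2, 78, 88, 90],
     [nan, nan, nan, nan, nan, nan],
     [nan, 35, 3, 11, 65, 53],
     [nan, nan, nan, nan, 22, 21],
     [nan, 15, 93, 111, 165, 153],
     [nan, nan, nan, nan, nan, nan],
     [nan, nan, nan, nan, nan, nan],
     [141, 121, nan, nan, nan, nan]],
    columns=['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6'])

# With 6 columns, "missing 4 or more" means "fewer than 3 values present",
# so require at least 3 non-NaN values for a row to survive.
min_non_na = 3
kept = df.dropna(thresh=min_non_na)
```

Rows that are entirely NaN have zero values present, so they are dropped by the same call.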

Regarding the box plot question, pd.DataFrame.boxplot will do that. You can specify the columns you want (if needed) with the column parameter. See the example in the docs as well.
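
A rough sketch of drawing the box plot after dropping the all-NaN rows, using made-up random data rather than the asker's dataset. It assumes matplotlib is installed; the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Made-up data: 20 rows, 3 columns, with two rows set entirely to NaN.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=['Col1', 'Col2', 'Col3'])
df.iloc[[3, 7]] = np.nan

# Drop rows where every column is NaN, then plot only the chosen columns.
cleaned = df.dropna(how='all')
ax = cleaned.boxplot(column=['Col1', 'Col2'])
```

In an interactive session, calling matplotlib.pyplot.show() afterwards displays the figure.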

Check if all columns in a row are NaN

    # True if the DataFrame contains at least one row whose values are all NaN
    df.isnull().all(axis=1).any()

The answer given by @Ami still holds. This check is useful when dealing with derived values: before dropping the rows, you might need to re-evaluate your feature extraction logic, if any.
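
A short sketch on a small made-up frame, to show which question each expression answers. df.isnull().values.all() is True only when every cell in the whole frame is NaN, whereas reducing row-wise first and then calling .any() reports whether at least one row is entirely NaN. Selecting those rows lets you inspect them before dropping. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 1.0, np.nan], 'b': [np.nan, 1.0, 2.0]})

# Is every single cell in the frame NaN? (False for this frame.)
every_cell_na = df.isnull().values.all()

# Per-row flag, then "is there at least one fully-NaN row?"
row_all_na = df.isnull().all(axis=1)
has_all_na_row = row_all_na.any()

# The offending rows themselves, for inspection before calling dropna.
suspect_rows = df[row_all_na]
```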
