
Pandas dropna() not working (it definitely isn't the common reasons why!)

I have this dataframe:

[image]

There are many NaNs that were somehow produced when transforming the data:

[image]

So I try to drop them using:

df = df.dropna(how='all')

I still just get this (I know I'm only showing 3 columns, but all the columns are filled with NaNs):

[image]

I've also tried assuming they're strings and using:

df = df[~df.isin(['NaN']).any(axis=1)]

This also didn't work. Any other thoughts or ideas?

When you slice with a Boolean DataFrame, the logic used is where. That is, where the mask is True it returns the value, and where the mask is False it chooses np.nan by default.

Thus, if you slice with df.isna(), by definition you NaN everything. This is because where df.isna() is True it passes the value (which is NaN), and where the df was not null, where passes NaN.

import pandas as pd
import numpy as np

df = pd.DataFrame({'foo': np.nan, 'bar': np.nan, 'baz': np.nan, 'boo': 1}, index=['A'])
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

df.isnull()
#    foo   bar   baz    boo
#A  True  True  True  False

df[df.isnull()]
#   foo  bar  baz  boo
#A  NaN  NaN  NaN  NaN

df.where(df.isnull())
#   foo  bar  baz  boo
#A  NaN  NaN  NaN  NaN
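
As a quick sanity check of the point above, the sketch below (reusing the same one-row frame; the variable names masked and kept are only illustrative) suggests that the NaNs come from the mask rather than from the data, and that dropna(how='all') on the original frame has nothing to drop:

```python
import numpy as np
import pandas as pd

# Same one-row frame as in the example above.
df = pd.DataFrame({'foo': np.nan, 'bar': np.nan, 'baz': np.nan, 'boo': 1}, index=['A'])

# Slicing with a Boolean DataFrame returns a new frame; df itself is untouched.
masked = df[df.isnull()]
all_nan_in_masked = bool(masked.isnull().all().all())  # every cell of the masked view is NaN
boo_still_there = bool(df.loc['A', 'boo'] == 1)        # the original value is still present

# dropna(how='all') only drops rows where every column is NaN.
# Row 'A' has boo == 1, so it is kept.
kept = df.dropna(how='all')

print(all_nan_in_masked, boo_still_there, len(kept))
# True True 1
```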

So you don't have rows full of NaN; your mask just guarantees that every cell becomes NaN. If you want to inspect the rows that contain NaN without modifying the values, you can display the rows with at least one NaN:

df[df.isnull().any(axis=1)]
#   foo  bar  baz  boo
#A  NaN  NaN  NaN    1

Or, to see the distribution of NaN across the rows, take the value counts of the row-wise sum of the null mask. This shows we have 1 row with 3 null values.

df.isnull().sum(axis=1).value_counts()
#3    1
#dtype: int64
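
For completeness, here is a rough sketch of how the two attempts from the question behave once the frame is not masked first. The three-row frame data is made up for illustration and is not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, made up for illustration.
data = pd.DataFrame(
    {'foo': [1.0, np.nan, np.nan], 'bar': [2.0, np.nan, 5.0]},
    index=['A', 'B', 'C'],
)

# how='all' drops only rows in which every column is NaN (row B).
no_empty_rows = data.dropna(how='all')

# how='any' (the default) drops rows with at least one NaN (rows B and C).
complete_rows = data.dropna(how='any')

# A row-wise Boolean Series filters rows without altering any values.
rows_with_nan = data[data.isnull().any(axis=1)]

# The string 'NaN' is not the float NaN, so isin(['NaN']) matches nothing here.
string_matches = data.isin(['NaN'])

print(list(no_empty_rows.index))         # ['A', 'C']
print(list(complete_rows.index))         # ['A']
print(list(rows_with_nan.index))         # ['B', 'C']
print(bool(string_matches.any().any()))  # False
```

The difference from the question is the shape of the mask: a Boolean Series selects whole rows, whereas a Boolean DataFrame goes through where and blanks out individual cells.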
