Pandas - Delete rows with only NaN values

Question

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically, 7 or more.

I tried using the dropna function several ways, but it seems to greedily delete any column or row that contains a NaN value.

This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple call:

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.

Here's some pseudo-code that I think is on the right track:

### LOOP FOR ADDRESSING EACH row:
    m = total - row.count()
    if (m > 7):
        df.drop(row)

I am still new to Pandas, so I'm very open to other ways of solving this problem, whether they're simpler or more complex.

Answer 1

Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values, and drop the rows that don't meet this criterion:

df.dropna(thresh=len(df.columns) - 7)

See the docs.
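
As a minimal sketch of how `thresh` behaves, the example below builds a small hypothetical DataFrame (the column names and the `max_nan` variable are made up for illustration) and keeps only rows with at most `max_nan` missing values. Note that `thresh` is a count of non-NaN values per row, so it is measured against the number of columns, not the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 rows, 4 columns, with 0, 2 and 4 NaN values per row.
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0],
        [1.0, np.nan, np.nan, 4.0],
        [np.nan, np.nan, np.nan, np.nan],
    ],
    columns=["a", "b", "c", "d"],
)

max_nan = 2           # illustrative limit: the most NaN values a row may keep
n_cols = df.shape[1]  # number of columns (len(df) would give the row count)

# thresh is the minimum number of non-NaN values a row needs to survive.
kept = df.dropna(thresh=n_cols - max_nan)
print(kept.index.tolist())
```

With this convention a row holding exactly `max_nan` NaN values is kept. To drop rows with 7 or more NaN values, as the question asks, the threshold would be `n_cols - 6`.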

Answer 2

The optional thresh argument of df.dropna lets you give it the minimum number of non-NA values a row needs in order to be kept.

df.dropna(thresh=df.shape[1]-7)
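
The question also asks whether NaN values can be counted per row directly. One way, sketched here on a small made-up DataFrame, is `df.isna().sum(axis=1)`; the resulting counts can drive a boolean mask that should give the same rows as `dropna(thresh=...)`. The names `nan_per_row` and `max_nan` are illustrative, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows contain 0, 2 and 3 NaN values respectively.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, 5.0, np.nan],
        "c": [3.0, np.nan, np.nan],
    }
)

# Count NaN values in each row directly, with no explicit loop.
nan_per_row = df.isna().sum(axis=1)

max_nan = 1  # illustrative limit on NaN values per row

# Keep rows whose NaN count does not exceed the limit.
filtered = df[nan_per_row <= max_nan]

# The equivalent dropna call: require at least (columns - max_nan) non-NaN values.
same = df.dropna(thresh=df.shape[1] - max_nan)
print(filtered.equals(same))
```

The mask form is handy when the NaN counts are needed for something else as well, such as inspecting which rows are about to be removed.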

Solution 1: 14, accepted, 2014-08-05 19:15:53

Solution 2: 4, 2014-08-05 19:14:58
