Pandas - Delete Rows with only NaN values

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically, 7 or more.

I tried using the dropna function several ways, but it seems clear that it greedily deletes columns or rows that contain any NaN values.

This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (is there a direct way to count the NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.

Here's some pseudo-code that I think is on the right track:

### LOOP FOR ADDRESSING EACH row:
    m = total - row.count()
    if (m > 7):
        df.drop(row)

I am still new to Pandas, so I'm very open to other ways of solving this problem, whether they're simpler or more complex.

Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values, and drop the rows that don't meet this criterion:

df.dropna(thresh=(len(df.columns) - 7))
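
As a minimal sketch of how thresh behaves: it is the minimum number of non-NaN values a row needs in order to be kept, so it has to be measured against the number of columns (df.shape[1] or len(df.columns)), not the number of rows (len(df)). The small DataFrame below is made-up example data, not the asker's. Note that subtracting 7 drops rows with more than 7 NaN values; to drop rows with 7 or more, as the question words it, subtract 6 instead.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 4 rows, 10 columns, no NaN yet.
df = pd.DataFrame(np.arange(40, dtype=float).reshape(4, 10))
df.iloc[1, :6] = np.nan   # row 1 has 6 NaN values
df.iloc[2, :7] = np.nan   # row 2 has 7 NaN values
df.iloc[3, :8] = np.nan   # row 3 has 8 NaN values

n_cols = df.shape[1]      # number of columns, not len(df) (number of rows)

# Keep rows with at least (n_cols - 7) non-NaN values:
# this drops rows with MORE than 7 NaN values.
more_than_7 = df.dropna(thresh=n_cols - 7)

# Keep rows with at least (n_cols - 6) non-NaN values:
# this drops rows with 7 OR MORE NaN values.
seven_or_more = df.dropna(thresh=n_cols - 6)

print(list(more_than_7.index))
print(list(seven_or_more.index))
```

dropna returns a new DataFrame here; df itself is unchanged unless the result is assigned back.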

See the docs.

The optional thresh argument of df.dropna lets you give it the minimum number of non-NA values a row must have in order to be kept.

df.dropna(thresh=df.shape[1]-7)
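
On the side question of counting NaN values per row directly: one possible approach, sketched here on made-up data, is df.isna().sum(axis=1), which avoids both the subtraction from the column count and an explicit row loop. The resulting counts can either build the list of row labels for df.drop(rows) or serve as a boolean mask. The names nan_counts, rows, dropped and filtered are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 3 labelled rows, 8 columns.
df = pd.DataFrame(np.ones((3, 8)), index=["a", "b", "c"])
df.iloc[1, :7] = np.nan   # row "b" has 7 NaN values
df.iloc[2, :2] = np.nan   # row "c" has 2 NaN values

# Number of NaN values in each row (axis=1 sums across columns).
nan_counts = df.isna().sum(axis=1)

# Option 1: collect the labels of rows with 7 or more NaN values, then drop them.
rows = nan_counts[nan_counts >= 7].index
dropped = df.drop(rows)

# Option 2: keep only rows with fewer than 7 NaN values via a boolean mask.
filtered = df[nan_counts < 7]

print(nan_counts)
print(dropped.index.tolist())
```

Both options give the same result here; the mask form makes the ">= 7" versus "> 7" choice explicit, which is easy to get wrong when expressing it through thresh.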
