
Dropping rows with pandas data frame when multiple Null values exist

I'm trying to go through each row in a data frame, check whether the row has more than 3 null values (this part works), and then delete the entire row. However, when I try to drop those rows from the data frame, I get an error:

AttributeError: 'NoneType' object has no attribute 'index'

Forgive me if this code is inefficient; I only need it to work.

import pandas as pd

df = pd.read_csv('data/mycsv.csv')


i = 0

while i < len(df.index):
    if df.iloc[i].isnull().sum() > 3:    
        df = df.drop(df.index[i], inplace = True)
    i += 1
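
As for the error itself: drop(..., inplace=True) modifies the frame and returns None, so assigning its result back to df replaces the frame with None, and the next loop iteration fails on df.index. The loop also skips a row after every drop, because the remaining rows shift up while i still increases. A minimal sketch of a loop-style fix, assuming a small made-up frame in place of the CSV (the column names and values here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/mycsv.csv from the question.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 3.0],
    "c": [1.0, np.nan, 3.0],
    "d": [1.0, np.nan, np.nan],
    "e": [1.0, 5.0, 3.0],
})

# drop(..., inplace=True) returns None; it does not return the new frame.
result = df.copy().drop(df.index[0], inplace=True)
print(result)

# Collect the labels of rows with more than 3 nulls, then drop them in one call.
# Dropping once at the end avoids mutating the frame while iterating over it.
to_drop = [label for label, row in df.iterrows() if row.isnull().sum() > 3]
cleaned = df.drop(index=to_drop)
print(cleaned)
```

Either keep inplace=True without the assignment, or keep the assignment without inplace=True, but not both.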

Use DataFrame.dropna with thresh. Because thresh is the minimum number of non-NaN values a row needs in order to be kept, subtract the maximum number of NaNs you allow from the number of columns:

import numpy as np
import pandas as pd

np.random.seed(2021)

df = pd.DataFrame(np.random.choice([np.nan, 1], size=(5,6)))
print (df)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
2  1.0  1.0  NaN  NaN  NaN  NaN
3  NaN  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  NaN  1.0  NaN  NaN

N = 3
df1 = df.dropna(thresh=len(df.columns) - N)
print(df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0


N = 2
df2 = df.dropna(thresh=len(df.columns) - N)
print(df2)
    0   1    2    3    4    5
3 NaN NaN  1.0  1.0  1.0  1.0

Alternatively, use boolean indexing to keep only the rows with at most N NaNs:

N = 3
df1 = df[df.isnull().sum(axis=1) <= N]
print (df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0
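
As a quick sanity check, the two approaches can be compared directly. This is only a sketch: the frame below is typed in by hand from the printed output above rather than regenerated from the random seed.

```python
import numpy as np
import pandas as pd

# Same values as the 5x6 frame printed above, entered by hand.
df = pd.DataFrame([
    [np.nan, 1.0, 1.0, np.nan, 1.0, np.nan],
    [np.nan, np.nan, 1.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, 1.0, 1.0, 1.0, 1.0],
    [np.nan, 1.0, np.nan, 1.0, np.nan, np.nan],
])

N = 3  # maximum number of NaNs allowed per row

# Keep rows with at least (number of columns - N) non-NaN values.
by_thresh = df.dropna(thresh=len(df.columns) - N)

# Keep rows whose NaN count is at most N.
by_mask = df[df.isnull().sum(axis=1) <= N]

# DataFrame.equals treats NaNs in the same positions as equal.
same = by_thresh.equals(by_mask)
print(same)
```

The boolean mask is easier to adapt when the condition is not a plain count, for example when only some columns should be considered.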

Use thresh=X as a parameter of dropna, where X is the number of columns ( df.shape[1] ) minus the maximum number of NaNs you allow per row ( 3 ).

Suppose this dataframe:

>>> df
     0    1    2    3    4    5
0  NaN  NaN  NaN  NaN  NaN  NaN  # Drop (Nan = 6)
1  NaN  NaN  NaN  NaN  NaN  1.0  # Drop (Nan = 5)
2  NaN  NaN  NaN  NaN  1.0  1.0  # Drop (Nan = 4)
3  NaN  NaN  NaN  1.0  1.0  1.0  # Keep (Nan = 3)
4  NaN  NaN  1.0  1.0  1.0  1.0  # Keep (Nan = 2)
5  NaN  1.0  1.0  1.0  1.0  1.0  # Keep (Nan = 1)
6  1.0  1.0  1.0  1.0  1.0  1.0  # Keep (Nan = 0)

>>> df = df.dropna(thresh=df.shape[1] - 3)
>>> print(df)
     0    1    2    3    4    5
3  NaN  NaN  NaN  1.0  1.0  1.0
4  NaN  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0  1.0
6  1.0  1.0  1.0  1.0  1.0  1.0
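
The answer does not show how that dataframe was built. One possible way to reproduce it, wrapped in a small helper, is sketched below. The helper name drop_rows_with_many_nans and its max_nans parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Row i has (6 - i) leading NaNs followed by i ones, matching the table above.
df = pd.DataFrame([[np.nan] * (6 - i) + [1.0] * i for i in range(7)])


def drop_rows_with_many_nans(frame, max_nans):
    """Return a copy of frame without the rows that have more than max_nans NaNs.

    Hypothetical helper: it only converts "maximum NaNs allowed" into the
    "minimum non-NaN values required" that dropna's thresh expects.
    """
    return frame.dropna(thresh=frame.shape[1] - max_nans)


kept = drop_rows_with_many_nans(df, 3)
print(kept)
```

The original df is left unchanged, since dropna returns a new frame unless inplace=True is passed.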

