
Python: Delete rows from Pandas Dataframe if selected columns are empty

Say I have a large DataFrame but only care about a subset of its columns, for example 3 columns out of 4. I want to remove an entire row if at least 2 of the values in those 3 selected columns are empty.

For example, this is my DataFrame, and the selected columns are ['B','C','D']:

 A   B   C   D
     1       1
 2           2
 3   3   3   3
 4         

How can I drop the rows in which at least two of the values in the selected columns are empty? Here those are the second and fourth rows.

The expected result is:

 A   B   C   D
     1       1
 3   3   3   3

Use dropna if the empty values are NaNs. With thresh=2, a row is kept only if it has at least 2 non-NaN values in the subset columns:

cols = ['B','C','D']

df = df.dropna(subset=cols, thresh=2)
# same as
# df = df[df[cols].isnull().sum(axis=1) < 2]
print(df)
     A    B    C    D
0  NaN  1.0  NaN  1.0
2  3.0  3.0  3.0  3.0
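
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the empty cells in the question are NaN and rebuilds the example frame by hand; the names kept and kept_by_mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question, using NaN for the empty cells.
df = pd.DataFrame({
    "A": [np.nan, 2, 3, 4],
    "B": [1, np.nan, 3, np.nan],
    "C": [np.nan, np.nan, 3, np.nan],
    "D": [1, 2, 3, np.nan],
})
cols = ["B", "C", "D"]

# thresh=2 keeps rows that have at least 2 non-NaN values among cols.
kept = df.dropna(subset=cols, thresh=2)

# Equivalent here (3 selected columns): fewer than 2 NaNs among cols.
kept_by_mask = df[df[cols].isna().sum(axis=1) < 2]

print(kept)
```

To drop rows with at least k empty values among n selected columns, the matching setting would be thresh = n - k + 1, which gives 2 for n = 3 and k = 2.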

Or, if the empty values are empty strings, compare the NumPy array returned by values against '' and filter with boolean indexing:

df = df[(df[cols].values == '').sum(axis=1) < 2]
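
A small runnable sketch of the empty-string case, assuming every cell is stored as a string and the blanks are exactly '' (no whitespace). The sample data and the name empty_count are made up for illustration.

```python
import pandas as pd

# Same layout as the question, but with '' instead of NaN for empty cells.
df = pd.DataFrame({
    "A": ["", "2", "3", "4"],
    "B": ["1", "", "3", ""],
    "C": ["", "", "3", ""],
    "D": ["1", "2", "3", ""],
})
cols = ["B", "C", "D"]

# Count the empty strings per row across the selected columns.
empty_count = (df[cols].to_numpy() == "").sum(axis=1)

# Keep rows with fewer than 2 empty strings among cols.
result = df[empty_count < 2]
print(result)
```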

Use subset with thresh on dropna

In [2720]: df.dropna(subset=['B','C','D'], thresh=2)
Out[2720]:
     A    B    C    D
0  NaN  1.0  NaN  1.0
2  3.0  3.0  3.0  3.0

Or, use notnull

In [2723]: df[df[['B', 'C', 'D']].notnull().sum(1).ge(2)]
Out[2723]:
     A    B    C    D
0  NaN  1.0  NaN  1.0
2  3.0  3.0  3.0  3.0

Details

In [2722]: df
Out[2722]:
     A    B    C    D
0  NaN  1.0  NaN  1.0
1  2.0  NaN  NaN  2.0
2  3.0  3.0  3.0  3.0
3  4.0  NaN  NaN  NaN

If the empty values are empty strings rather than nulls, use df[df[['B', 'C', 'D']].eq('').sum(1).lt(2)] or df[df[['B', 'C', 'D']].ne('').sum(1).ge(2)]
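
If a frame mixes both kinds of empties, the two checks can be combined. The sketch below is not from the answers above: drop_mostly_empty and its max_empty parameter are hypothetical names, and it assumes a cell counts as empty when it is either NaN or exactly ''.

```python
import numpy as np
import pandas as pd

def drop_mostly_empty(df, cols, max_empty=1):
    """Hypothetical helper: keep rows with at most max_empty empty cells in cols."""
    sub = df[cols]
    # A cell is treated as empty if it is NaN or an empty string.
    is_empty = sub.isna() | sub.eq("")
    return df[is_empty.sum(axis=1) <= max_empty]

# Made-up sample mixing NaN and '' in the selected columns.
mixed = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": ["1", "", "3", np.nan],
    "C": [np.nan, "", "3", ""],
    "D": ["1", "2", "3", np.nan],
})

out = drop_mostly_empty(mixed, ["B", "C", "D"])
print(out)
```

With max_empty=1 this matches the rule in the question: a row is removed once 2 or more of the selected columns are empty.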


 