
Extract column names from Excel where row values are blank or NaN using Python

My Excel sheet has multiple columns. For each row, I need to find the names of the columns whose values are NaN or blank and write those names into another column (Result). If none of the columns in a row is blank or NaN, the Result should be No Gaps.

Input Data:

col1    col2   col3   col4   col5   col6   Result
AB       BC     CD     EF     GH     IJ
AN       AP            AR     AS     AT
BP              BQ     BR            BT
BZ       BY                   BX     BW
CP       CQ     CR     CS           NaN
CZ       NaN    CR     CS           NaN
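To reproduce the problem without an Excel file, here is a minimal sketch that builds the sample data directly. It assumes the blank cells arrive as empty strings ('') and the NaN cells as real missing values (np.nan). Note that pandas.read_excel normally reads truly empty cells as NaN, so treating the blanks as strings here is an assumption about how the sheet was loaded.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: '' = blank cell, np.nan = NaN cell).
df = pd.DataFrame({
    'col1': ['AB', 'AN', 'BP', 'BZ', 'CP', 'CZ'],
    'col2': ['BC', 'AP', '', 'BY', 'CQ', np.nan],
    'col3': ['CD', '', 'BQ', '', 'CR', 'CR'],
    'col4': ['EF', 'AR', 'BR', '', 'CS', 'CS'],
    'col5': ['GH', 'AS', '', 'BX', '', ''],
    'col6': ['IJ', 'AT', 'BT', 'BW', np.nan, np.nan],
})
print(df)
```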

Expected output:

Result

No Gaps  
col3 not available
col2, col5 not available
col3, col4 not available
col5, col6 not available
col2, col5, col6 not available

The script below gives the correct output for rows with NaN values in the DataFrame, but it does not detect blank cells.

The script I have been using:

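import numpy as np

p = df[['col1','col2','col3','col4','col5']]
# isna() flags NaN cells; dot() concatenates the names of the flagged columns per row
z = p.isna().dot(p.columns+",").str.rstrip(",")

df['Results'] = np.where(z.ne(''),z.add(" not available"),"No Gaps")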
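To see why the blank cells are missed, here is a small sketch on a hypothetical two-row frame. It shows that isna() flags a NaN cell but treats an empty string as an ordinary value, so the column-name concatenation ignores it.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: one blank cell ('') and one NaN cell.
p = pd.DataFrame({'col1': ['AB', np.nan], 'col2': ['', 'CD']})

# isna() only flags missing values (NaN/None); '' is an ordinary string.
mask = p.isna()
print(mask.loc[0, 'col2'])  # False: the blank cell is not flagged
print(mask.loc[1, 'col1'])  # True: the NaN cell is flagged

# Same column-name concatenation as in the question.
z = p.isna().dot(p.columns + ",").str.rstrip(",")
print(z.tolist())  # ['', 'col1']
```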

I also tried the following, but it only catches empty strings, not NaN:

z = p.eq('').dot(p.columns+",").str.rstrip(",")
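Because eq('') finds blank strings but not NaN, and isna() does the opposite, one alternative (not the approach used in the answer below) is to OR the two masks before concatenating the names. A minimal sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has a blank string, row 1 has NaN, both in col2.
p = pd.DataFrame({
    'col1': ['AB', 'CZ'],
    'col2': ['', np.nan],
    'col3': ['CD', 'CR'],
})

blank = p.eq('')    # True only for ''
missing = p.isna()  # True only for NaN/None
gaps = blank | missing

# Concatenate the names of the flagged columns for each row.
z = gaps.dot(p.columns + ",").str.rstrip(",")
print(z.tolist())  # ['col2', 'col2']
```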

The idea is to replace empty strings with missing values (np.nan) before testing them:

import numpy as np

p = df[['col1','col2','col3','col4','col5']]
# Turn empty strings into NaN so that isna() also flags blank cells
z = p.replace('', np.nan).isna().dot(p.columns+",").str.rstrip(",")

df['Results'] = np.where(z.ne(''),z.add(" not available"),"No Gaps")
print(df)
  col1 col2 col3 col4 col5 col6                  Results
0   AB   BC   CD   EF   GH   IJ                  No Gaps
1   AN   AP        AR   AS   AT       col3 not available
2   BP        BQ   BR        BT  col2,col5 not available
3   BZ   BY             BX   BW  col3,col4 not available
4   CP   CQ   CR   CS       NaN       col5 not available
5   CZ  NaN   CR   CS       NaN  col2,col5 not available
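The output above reports col5 but not col6 for rows 4 and 5 because p only selects col1 to col5. A sketch that checks all six columns and joins the names with ", " (matching the expected output in the question), assuming the sample data is rebuilt with '' for blank cells:

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: '' = blank cell).
df = pd.DataFrame({
    'col1': ['AB', 'AN', 'BP', 'BZ', 'CP', 'CZ'],
    'col2': ['BC', 'AP', '', 'BY', 'CQ', np.nan],
    'col3': ['CD', '', 'BQ', '', 'CR', 'CR'],
    'col4': ['EF', 'AR', 'BR', '', 'CS', 'CS'],
    'col5': ['GH', 'AS', '', 'BX', '', ''],
    'col6': ['IJ', 'AT', 'BT', 'BW', np.nan, np.nan],
})

# Select every data column, col1 through col6 inclusive.
p = df.loc[:, 'col1':'col6']

# Blank -> NaN, flag NaN, then concatenate "name, " for each flagged column.
# rstrip(", ") removes the trailing comma and space (column names never end in them).
z = p.replace('', np.nan).isna().dot(p.columns + ", ").str.rstrip(", ")

df['Results'] = np.where(z.ne(''), z + " not available", "No Gaps")
print(df['Results'].tolist())
```

Selecting by label range (col1 to col6) rather than df.columns keeps a previously added Results column out of the check if the snippet is run twice.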

If the blank cells may also contain only spaces, replace empty or whitespace-only strings using a regex:

import numpy as np

p = df[['col1','col2','col3','col4','col5']]
# Regex ^\s*$ matches empty or whitespace-only strings and turns them into NaN
z = p.replace(r'^\s*$', np.nan, regex=True).isna().dot(p.columns+",").str.rstrip(",")

df['Results'] = np.where(z.ne(''),z.add(" not available"),"No Gaps")
print(df)
  col1 col2 col3 col4 col5 col6                  Results
0   AB   BC   CD   EF   GH   IJ                  No Gaps
1   AN   AP        AR   AS   AT       col3 not available
2   BP        BQ   BR        BT  col2,col5 not available
3   BZ   BY             BX   BW  col3,col4 not available
4   CP   CQ   CR   CS       NaN       col5 not available
5   CZ  NaN   CR   CS       NaN  col2,col5 not available
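To see the difference between the exact-match replace and the regex replace, here is a small sketch on a hypothetical frame with one empty string and one whitespace-only string:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: '' in col2 row 0, three spaces in col1 row 1.
p = pd.DataFrame({'col1': ['AB', '   '], 'col2': ['', 'CD']})

# Exact match: only the truly empty string becomes NaN; '   ' is kept.
exact = p.replace('', np.nan).isna()
# Regex ^\s*$: empty and whitespace-only strings both become NaN.
regex = p.replace(r'^\s*$', np.nan, regex=True).isna()

print(exact.sum().sum())  # 1
print(regex.sum().sum())  # 2
```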

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM