简体   繁体   中英

Create a new row in a dataframe based only for non NaN values from a column

Lets say i have i dataframe like this one:

      col1       col2         col3
0     data1   Completed       Fail
1     data2   Completed       NaN
2     data3   Completed    Completed
3     data4   Completed       NaN
4     data5      NaN          NaN

How can i add an extra row for each time the value in col3 is not NaN and have a dataframe like this:

      col1     status           
0     data1   Completed 
1     data1      Fail
2     data2   Completed     
3     data3   Completed    
4     data3   Completed
5     data4   Completed      
6     data5      NaN        

I tried this but im not getting the desirable output:

df  = df.melt(id_vars=['col1'],  
        value_name="status")

IIUC, you can first start by using pd.melt() as you already did but also drop all the null values by chaining dropna() . This will get you close, but not exactly where you want to be:

new = df.melt(id_vars='col1',value_name='status').sort_values(by='col1').dropna().drop('variable',axis=1)

>>> print(new)

    col1     status
0  data1  Completed
5  data1       Fail
1  data2  Completed
2  data3  Completed
7  data3  Completed
3  data4  Completed

At this point, you will need to bring over the rows from your original df that were nan in col2. You can do that usingisnull() andpd.concat() respectively:

col2_nan = df.loc[df.col2.isnull()].drop('col3',axis=1).rename(columns = {'col2':'status'})

>>> print(pd.concat([new,col2_nan]).reset_index(drop=True))


    col1     status
0  data1  Completed
1  data1       Fail
2  data2  Completed
3  data3  Completed
4  data3  Completed
5  data4  Completed
6  data5        NaN

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM