Fill NaN values of a DataFrame under a condition

I have a resampled df:

          Timestamp         Loading      Power      Energy      ID      status
2020-04-09 06:45:00             1.0       1000        5000       1          on
2020-04-09 06:46:00             1.0       1000        5500       1          on
2020-04-09 06:47:00             NaN        NaN         NaN     NaN         NaN
2020-04-09 06:48:00             NaN        NaN         NaN     NaN         NaN
2020-04-09 06:49:00             1.0          5           0       1         off
2020-04-09 06:50:00             1.0       3000         200       2          on
...

The first thing: df['Loading'] was originally of type boolean and now it is a number (1 or 0). How can I change this back?

The NaN values of the column df['status'] should simply be carried forward (if the last entry was on, the rows should be filled with on until an off comes).

The NaN values in the other columns should be filled differently, depending on whether the status is on or off:

status == on: loading = 'true'; energy = last existing entry; power = last existing entry; ID = last existing entry

status == off: loading = 'false'; energy = 0; power = 0; ID = 'no ID'.

I tried something like this:

cond = (df2['Status'] != df2['Status'].shift(-1)) | (df2['Status'].notna())
df2.loc[cond] = df2.loc[cond].ffill()

without the desired success...

Expected outcome:

          Timestamp         Loading      Power      Energy      ID      status
2020-04-09 06:45:00            True       1000        5000       1          on
2020-04-09 06:46:00            True       1000        5500       1          on
2020-04-09 06:47:00            True       1000        5500       1          on
2020-04-09 06:48:00            True       1000        5500       1          on
2020-04-09 06:49:00           False          5           0   no ID         off
2020-04-09 06:50:00            True       3000         200       2          on
...

EDIT: The condition for filling the NaN values is more complicated than expected. I have different cycles, which are marked by different IDs. Within a cycle (the same ID appears both before and after the NaN values), the power of the two "surrounding" rows should be averaged, and the energy column should get the last existing energy value. Outside a cycle (ID before != next ID), both power and energy should be set to 0.
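The cycle rule from the EDIT could be sketched roughly as below. This is only one reading of the description, not a confirmed solution: it assumes a gap row is one where ID is NaN, that Power and Energy are missing in exactly those rows, and that gaps at the very start or end of the frame count as "outside a cycle". The function name fill_by_cycle and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_by_cycle(df):
    """Fill Power/Energy in gap rows according to the surrounding IDs."""
    out = df.copy()
    gap = out["ID"].isna()

    # Nearest known ID before and after each row.
    id_before = out["ID"].ffill()
    id_after = out["ID"].bfill()

    # Inside a cycle: the same ID surrounds the gap (NaN == NaN is False,
    # so leading/trailing gaps fall into the "outside" case).
    same_cycle = gap & (id_before == id_after)
    other_gap = gap & ~same_cycle

    # Average of the nearest known Power values on both sides of the gap.
    power_mean = (out["Power"].ffill() + out["Power"].bfill()) / 2
    energy_last = out["Energy"].ffill()

    out.loc[same_cycle, "Power"] = power_mean[same_cycle]
    out.loc[same_cycle, "Energy"] = energy_last[same_cycle]
    out.loc[other_gap, ["Power", "Energy"]] = 0
    return out


# Illustrative data only (not the asker's real frame).
idx = pd.date_range("2020-04-09 06:45", periods=7, freq="min")
sample = pd.DataFrame(
    {
        "Power": [1000, 1000, np.nan, np.nan, 500, np.nan, 3000],
        "Energy": [5000, 5500, np.nan, np.nan, 6000, np.nan, 200],
        "ID": [1, 1, np.nan, np.nan, 1, np.nan, 2],
    },
    index=idx,
)
cycle_result = fill_by_cycle(sample)
print(cycle_result)
```

The ID column itself is left untouched here, since the EDIT does not say how it should be filled.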

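Use a loop like this (it takes the value from the row directly above, so it fills a single missing row but not a run of consecutive NaN rows):

df["status"] = [df["status"].values[i-1] if pd.isna(x) else x for i, x in enumerate(df["status"].values)]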
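If a loop is preferred, a variant that remembers the last seen value also handles several missing rows in a row. This is a minimal sketch; the helper name fill_forward is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_forward(values):
    """Carry the last non-missing value forward through a sequence."""
    filled = []
    last = np.nan  # nothing seen yet, so leading gaps stay missing
    for x in values:
        if not pd.isna(x):
            last = x
        filled.append(last)
    return filled


status = ["on", np.nan, np.nan, "off", np.nan, "on"]
print(fill_forward(status))
# ['on', 'on', 'on', 'off', 'off', 'on']
```

On a real frame this would be used as df["status"] = fill_forward(df["status"]), which should give the same result as df["status"].ffill().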

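First, for the boolean column you can use the following (with numpy imported as np). Note that it also turns the NaN rows into False:

df["Loading"] = df["Loading"].map({1: True, 0: False, np.nan: False})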

For filling the NAs:

df["status"] = df["status"].ffill()

Finally, for the condition, I do not fully understand your description. Should it be "no ID" whenever one of the cases holds? Maybe this can work:

df.loc[df["status"] == "off", "ID"] = "no ID"
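Putting the pieces together, the on/off rules from the question might look something like this. It is a sketch under assumptions, not a verified answer: it assumes Loading simply mirrors the status (as the expected outcome suggests), that existing Power/Energy values are kept and only missing ones are filled, and that the index is the timestamp. The function name fill_by_status and the extra NaN row after the "off" row are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_by_status(df):
    """Fill gaps depending on the forward-filled status column."""
    out = df.copy()
    out["status"] = out["status"].ffill()
    on = out["status"] == "on"

    for col in ["Power", "Energy"]:
        # on: carry the last existing entry forward; otherwise: missing -> 0.
        out[col] = np.where(on, out[col].ffill(), out[col].fillna(0))

    # ID: last existing entry while on, the text "no ID" otherwise.
    out["ID"] = out["ID"].ffill().astype(object).where(on, "no ID")

    # Loading as a real boolean derived from the status.
    out["Loading"] = on
    return out


# Illustrative data based on the question, plus one gap after the "off" row.
idx = pd.date_range("2020-04-09 06:45", periods=7, freq="min")
df = pd.DataFrame(
    {
        "Loading": [1.0, 1.0, np.nan, np.nan, 1.0, np.nan, 1.0],
        "Power": [1000, 1000, np.nan, np.nan, 5, np.nan, 3000],
        "Energy": [5000, 5500, np.nan, np.nan, 0, np.nan, 200],
        "ID": [1, 1, np.nan, np.nan, 1, np.nan, 2],
        "status": ["on", "on", np.nan, np.nan, "off", np.nan, "on"],
    },
    index=idx,
)
result = fill_by_status(df)
print(result)
```

Because the ID column ends up holding both numbers and the string "no ID", it becomes an object column; a separate flag column may be cleaner if the IDs are needed for arithmetic or grouping later.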

