I want to replace the values in the last X columns of my dataframe, replacing NaN with 0 and any integers in those columns with 1. The number of columns X is given per row by the value in column M.
For example, suppose I had a df with 2 users, A and B, who have been active for only the last 1 and 2 periods respectively (the value of M).
A has been active for the last 1 period only and B for the last 2 periods, so within those periods I want to replace the NaNs with 0s and any integers with 1 to show they were active.
The current structure looks like this, but extended to 1 million+ users and 24 periods, and M can take a value between 0 and 23.
ID | M | P1 | P2 | P3
A | 1 | NaN| NaN| NaN
B | 2 | NaN| 4 | NaN
In only the last M columns, I would like to replace NaN with 0 and any integer value with 1.
So the data should look like this:
ID | M | P1 | P2 | P3
A | 1 | NaN| NaN| 0
B | 2 | NaN| 1 | 0
Thank you
Try using the df.apply() method as follows:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'ID': ['A', 'B'],
        'M': [1, 2],
        'P1': [np.nan, np.nan],
        'P2': [np.nan, 4],
        'P3': [np.nan, np.nan]
    }
)
print(df)
Returns:
ID M P1 P2 P3
0 A 1 NaN NaN NaN
1 B 2 NaN 4.0 NaN
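Then we use the apply function over the last n_cols columns, where n_cols is the number of columns to update. Note that this version ignores 'M' and updates the same columns for every row: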
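n_cols = 3
for i in range(n_cols):
    # Negative index counted from the right: -1 is the last column
    idx = 0 - (i+1)
    # NaN becomes 0.0, any other value becomes 1.0
    df.iloc[:, idx] = df.iloc[:, idx].apply(lambda x: 0.0 if np.isnan(x) else 1.0)
print(df)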
Which returns:
ID M P1 P2 P3
0 A 1 0.0 0.0 0.0
1 B 2 0.0 1.0 0.0
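The same column-wise replacement can probably be written without apply. The following is a minimal sketch, assuming the same example frame as above; the name last_cols is introduced here only for illustration. It relies on notna(), which marks non-missing cells as True, and then casts the booleans to floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['A', 'B'],
        'M': [1, 2],
        'P1': [np.nan, np.nan],
        'P2': [np.nan, 4],
        'P3': [np.nan, np.nan],
    }
)

n_cols = 3
# Labels of the last n_cols columns (hypothetical helper name)
last_cols = df.columns[-n_cols:]
# notna() gives True for values and False for NaN; as floats that is 1.0 / 0.0
df[last_cols] = df[last_cols].notna().astype(float)
print(df)
```

Like the loop above, this ignores 'M' and updates every row in the selected columns.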
To use the column 'M' as the number of columns for each row, do the following. Note that this will be slower since there are two nested loops:
for i, n_cols in enumerate(df['M'].values):
    # Only touch the last n_cols columns of row i
    for j in range(n_cols):
        idx = 0 - (j+1)
        df.iloc[i, idx] = 0.0 if np.isnan(df.iloc[i, idx]) else 1.0
print(df)
Which returns:
ID M P1 P2 P3
0 A 1 NaN NaN 0.0
1 B 2 NaN 1.0 0.0
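With 1 million+ rows and 24 periods, the nested loops may be too slow. One possible alternative is to build a boolean mask with NumPy broadcasting and apply it in a single step. This is a sketch under assumptions, not a tested solution for the full dataset: it assumes the period columns are the ones listed in period_cols (a name introduced here for illustration) and that they are ordered from oldest to newest.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['A', 'B'],
        'M': [1, 2],
        'P1': [np.nan, np.nan],
        'P2': [np.nan, 4],
        'P3': [np.nan, np.nan],
    }
)

# Assumed list of period columns, ordered oldest to newest
period_cols = ['P1', 'P2', 'P3']
values = df[period_cols].to_numpy(dtype=float)
n_periods = values.shape[1]

# Position of each column counted from the right: the last column is 1
pos_from_right = n_periods - np.arange(n_periods)

# True where a cell lies within the last M columns of its row
in_window = pos_from_right[None, :] <= df['M'].to_numpy()[:, None]

# 0.0 for NaN, 1.0 for any other value
flags = np.where(np.isnan(values), 0.0, 1.0)

# Inside the window take the flag, outside keep the original value
df[period_cols] = np.where(in_window, flags, values)
print(df)
```

For the two example rows this should leave P1 untouched and change only the last M cells of each row, the same as the nested-loop version. Rows with M equal to 0 are left unchanged because no column position satisfies the comparison.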