Python, replace all integers in the last N columns based on the value in the first column for each row

Question

I want to replace the values in the last X columns of my dataframe, replacing NaN with 0 and any integers in those columns with 1. X is given, for each row, by the value in column M.

For example, suppose I have a df with 2 users, A and B, who have been active for the last (M) 1 and 2 periods respectively.

A has been active for the last 1 period only and B for the last 2 periods, so I want to replace the NaNs with 0s in these periods and any integers with 1 to show they were active.

The current structure is like this, but extended to 1 million+ users and 24 periods, and M can take any value between 0 and 23.

ID | M | P1 | P2 | P3  
A  | 1 | NaN| NaN| NaN    
B  | 2 | NaN| 4  | NaN

I would like to replace NaN with 0 in only the last M columns, and replace any integer value in those same columns with 1.

So the data should look like this:

ID | M | P1 | P2 | P3  
A  | 1 | NaN| NaN| 0    
B  | 2 | NaN| 1  | 0

Thank you

Answer 1

Try using the df.apply() method as follows:

import pandas as pd
import numpy as np 

df = pd.DataFrame(
    {
        'ID' : ['A', 'B'],
        'M' : [1, 2],
        'P1' : [np.nan, np.nan],
        'P2' : [np.nan, 4],
        'P3' : [np.nan, np.nan]
    }
)
print(df)

Returns:

  ID  M  P1   P2  P3
0  A  1 NaN  NaN NaN
1  B  2 NaN  4.0 NaN

Then we loop over the last n_cols columns and use apply on each one to replace NaN with 0 and any other value with 1:

n_cols = 3
for i in range(n_cols):
    idx = 0 - (i+1)
    df.iloc[:, idx] = df.iloc[:, idx].apply(lambda x: 0.0 if np.isnan(x) else 1.0)
print(df)

Which returns:

  ID  M  P1   P2   P3
0  A  1 0.0  0.0  0.0
1  B  2 0.0  1.0  0.0
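
As an aside, the same whole-column replacement can probably be done without apply. The following is only a sketch on the example data; period_cols is a name introduced here for illustration and is assumed to be the last 3 columns of df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['A', 'B'],
        'M': [1, 2],
        'P1': [np.nan, np.nan],
        'P2': [np.nan, 4],
        'P3': [np.nan, np.nan],
    }
)

n_cols = 3
# Hypothetical helper name: the labels of the last n_cols columns.
period_cols = df.columns[-n_cols:]

# notna() is True where a value exists; casting to float gives 1.0 / 0.0.
df[period_cols] = df[period_cols].notna().astype(float)
print(df)
```

This replaces every value in those columns, so it matches the loop above and ignores 'M'.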

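To use the column 'M' as the number of columns for each row, start again from the original df (recreate it first, since the loop above has already overwritten every NaN) and do the following. Note that this will be slower since there are two loops: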

for i, n_cols in enumerate(df['M'].values):
    for j in range(n_cols):
        idx = 0 - (j+1)
        df.iloc[i, idx] = 0.0 if np.isnan(df.iloc[i, idx]) else 1.0

which returns:

  ID  M  P1   P2   P3
0  A  1 NaN  NaN  0.0
1  B  2 NaN  1.0  0.0
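
With 1 million+ rows the double loop may be too slow. One possible alternative is to build a boolean mask with NumPy broadcasting and apply it in a single step. This is a sketch rather than a tested solution for the full data: it assumes the period columns are listed oldest first in period_cols (a name introduced here) and that 'M' never exceeds the number of period columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['A', 'B'],
        'M': [1, 2],
        'P1': [np.nan, np.nan],
        'P2': [np.nan, 4],
        'P3': [np.nan, np.nan],
    }
)

# Assumption: these are the period columns, ordered oldest to newest.
period_cols = ['P1', 'P2', 'P3']
values = df[period_cols].to_numpy(dtype=float)
n_periods = values.shape[1]

# Column position j is in a row's window when j >= n_periods - M.
# The [:, None] turns the per-row threshold into a column so it
# broadcasts against the row of column positions.
positions = np.arange(n_periods)
in_window = positions >= (n_periods - df['M'].to_numpy())[:, None]

# 1.0 where a value exists, 0.0 where it is NaN.
flags = (~np.isnan(values)).astype(float)

# Inside the window use the flag, outside keep the original value.
df[period_cols] = np.where(in_window, flags, values)
print(df)
```

Rows with M equal to 0 get an empty window, so they are left unchanged.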

1 answer

solution1
0 ACCEPTED 2020-07-02 13:00:06
