
Pandas - Replace multiple column values with previous column value when condition is met

I have a large dataframe that looks like this:

Start       End        Alm_No1 Val_No1  Alm_No2 Val_No2 Alm_No3 Val_No3
1/1/19 0:00 1/2/19 0:00    1       0       2       1       3       0
1/2/19 0:00 1/3/19 0:00    1       0       2       0       3       1
1/3/19 0:00 1/4/19 0:00    1       1       2       0       3       0
1/4/19 0:00 1/5/19 0:00    1       0       2       0       3       1
1/5/19 0:00 1/6/19 0:00    1       1       2       0       3       0
1/6/19 0:00 1/7/19 0:00    1       0       2       1       3       1
1/7/19 0:00 1/8/19 0:00    4       0       5       1       6       0
1/8/19 0:00 1/9/19 0:00    4       0       5       1       6       1
1/9/19 0:00 1/10/19 0:00   4       1       5       1       6       0

I want to replace every 1 in the "Val" columns with the number from the associated "Alm" column, so that I can then get rid of the "Alm" columns.

The outcome would look like this:

Start       End        Alm_No1 Val_No1  Alm_No2 Val_No2 Alm_No3 Val_No3
1/1/19 0:00 1/2/19 0:00    1       0       2       2       3       0
1/2/19 0:00 1/3/19 0:00    1       0       2       0       3       3
1/3/19 0:00 1/4/19 0:00    1       1       2       0       3       0
1/4/19 0:00 1/5/19 0:00    1       0       2       0       3       3
1/5/19 0:00 1/6/19 0:00    1       1       2       0       3       0
1/6/19 0:00 1/7/19 0:00    1       0       2       2       3       3
1/7/19 0:00 1/8/19 0:00    4       0       5       5       6       0
1/8/19 0:00 1/9/19 0:00    4       0       5       5       6       6
1/9/19 0:00 1/10/19 0:00   4       4       5       5       6       0

I have created a list with the positions of the columns whose values should be changed:

val_col = df.columns.tolist()
val_list=[]
for i in range(0, len(val_col)) : 
    if val_col[i].startswith('Val'): 
        val_list.append(i)

Then I tried a while loop to iterate over those columns:

for x in val_list:
    i = 0
    while i < len(df):
        if df.iloc[i, x] == 1:
            df.iloc[i, x] = df.iloc[i, x-1]
        i += 1  # advance on every row, not only when the value is 1

It takes forever to run, and I have a hard time finding something that works with lambda or apply. Any hints? Thanks in advance!

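Avoid looping over the rows of a dataframe. You should set each column in one operation. Since the "Val" columns only hold 0 or 1, multiplying each one by its "Alm" column gives the result you want: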

for i in range(1,4): 
    df[f'Val_No{i}'] *= df[f'Alm_No{i}'] 
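
If the number of column pairs is not known in advance, or the "Val" columns might hold values other than 0 and 1, the same idea can be sketched by pairing columns by name and using Series.mask. This is only an illustration: the small sample frame and the names val_cols and result are made up here, and it assumes every Val_NoN column has a matching Alm_NoN column.

```python
import pandas as pd

# Small sample frame mirroring the question's layout (illustrative values only).
df = pd.DataFrame({
    "Alm_No1": [1, 1, 4],
    "Val_No1": [0, 1, 1],
    "Alm_No2": [2, 2, 5],
    "Val_No2": [1, 0, 1],
})

# Pair each Val_* column with the Alm_* column that shares its suffix.
val_cols = [c for c in df.columns if c.startswith("Val_")]
for val in val_cols:
    alm = val.replace("Val_", "Alm_", 1)
    # Where the flag equals 1, take the alarm number; otherwise keep the value.
    df[val] = df[val].mask(df[val] == 1, df[alm])

# Once the numbers are copied over, the Alm_* columns can be dropped.
result = df.drop(columns=[c for c in df.columns if c.startswith("Alm_")])
print(result)
```

Unlike the multiplication, mask only touches cells that are exactly 1, so any other value in a "Val" column is left unchanged.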

I feel silly answering my own question just a few minutes later, but I think I found something that works:

for x in val_list:
    df.loc[df.iloc[:,x]==1,df.columns[x]] = df.iloc[:, x-1]

Worked like a charm!

234 ms ± 15.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
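
For comparison, the per-column loop can also be collapsed into a single numpy.where call over all "Val" columns at once. This is a rough sketch rather than a tested replacement: the sample frame is invented, and it assumes each Val_NoN column has an Alm_NoN partner with the same suffix.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same column naming as in the question.
df = pd.DataFrame({
    "Start": ["1/1/19 0:00", "1/2/19 0:00", "1/9/19 0:00"],
    "Alm_No1": [1, 1, 4],
    "Val_No1": [0, 1, 1],
    "Alm_No2": [2, 2, 5],
    "Val_No2": [1, 0, 1],
})

# Build two column lists in matching order: Val_NoN and its Alm_NoN partner.
val_cols = [c for c in df.columns if c.startswith("Val_")]
alm_cols = [c.replace("Val_", "Alm_", 1) for c in val_cols]

vals = df[val_cols].to_numpy()
alms = df[alm_cols].to_numpy()

# Element-wise choice: alarm number where the flag is 1, original value elsewhere.
df[val_cols] = np.where(vals == 1, alms, vals)
print(df)
```

Because the columns are matched by name instead of by position (x-1), this does not depend on each "Alm" column sitting directly to the left of its "Val" column.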

I came up with a solution that works for an arbitrary number of Alm_No... / Val_No... columns.

Let's start with a function to be applied to each row:

def fn(row):
    for i in range(2, row.size, 2):
        j = i + 1
        if row.iloc[j]:
            row.iloc[j] = row.iloc[i]
    return row

Note the construction of the for loop. It starts at 2 (the position of the Alm_No1 column) and advances in steps of 2 (the distance to the Alm_No2 column).

j holds the position of the next column (Val_No...).

If the "current" Val_No is non-zero, it is replaced with the value from the "current" Alm_No.

After the loop completes the changed row is returned.

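So the only thing left to do is to apply this function to each row. Note that apply returns a new dataframe rather than changing df in place, so the result has to be assigned back:

df = df.apply(fn, axis=1)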

My timeit measurements indicated that my solution runs a little (7%) quicker than yours and about 35 times quicker than the one proposed by BallpointBen.

Apparently, the use of f-strings accounts for part of this (quite significant) difference.
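
Timings like these depend heavily on the number of rows and on the pandas version, so they are worth re-measuring on your own data. A minimal harness might look like the sketch below. Everything in it is an assumption made for illustration: the synthetic frame from make_frame, the function names, and the row.copy() inside fn, which is added here because pandas does not support functions that mutate the object apply passes in.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows=2_000, seed=0):
    """Build a synthetic frame shaped like the question's (hypothetical data)."""
    rng = np.random.default_rng(seed)
    data = {"Start": ["1/1/19 0:00"] * n_rows, "End": ["1/2/19 0:00"] * n_rows}
    for k in range(1, 4):
        data[f"Alm_No{k}"] = np.full(n_rows, k)
        data[f"Val_No{k}"] = rng.integers(0, 2, size=n_rows)
    return pd.DataFrame(data)


def by_multiplication(df):
    out = df.copy()
    for k in range(1, 4):
        out[f"Val_No{k}"] *= out[f"Alm_No{k}"]
    return out


def by_boolean_mask(df):
    out = df.copy()
    for x, name in enumerate(out.columns):
        if name.startswith("Val"):
            out.loc[out.iloc[:, x] == 1, name] = out.iloc[:, x - 1]
    return out


def by_row_apply(df):
    def fn(row):
        row = row.copy()  # work on a copy instead of the object apply passes in
        for i in range(2, row.size, 2):
            if row.iloc[i + 1]:
                row.iloc[i + 1] = row.iloc[i]
        return row

    return df.apply(fn, axis=1)


frame = make_frame()
val_cols = [c for c in frame.columns if c.startswith("Val")]
reference = by_multiplication(frame)[val_cols].to_numpy()

timings = {}
for func in (by_multiplication, by_boolean_mask, by_row_apply):
    # All three approaches should agree before their timings are compared.
    result = func(frame)[val_cols].to_numpy().astype(int)
    assert (result == reference).all()
    # Best of three single runs, in seconds.
    timings[func.__name__] = min(timeit.repeat(lambda: func(frame), number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Each function works on its own copy of the frame, so the three runs do not influence each other, and the equality check guards against comparing the speed of approaches that produce different results.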
