
Fastest way to fill multiple columns by a given condition on other columns pandas

I'm working with a very long dataframe, so I'm looking for the fastest way to fill several columns at once given certain conditions. So let's say you have this dataframe:

import pandas as pd

data = {
    'col_A1': [1, '', '', ''],
    'col_A2': ['', '', '', ''],
    'col_A3': ['', '', '', ''],
    'col_B1': ['', '', 1, ''],
    'col_B2': ['', '', '', ''],
    'col_B3': ['', '', '', ''],
    'col_C1': [1, 1, '', ''],
    'col_C2': ['', '', '', ''],
    'col_C3': ['', '', '', ''],
}
df = pd.DataFrame(data)
df

Input:

  col_A1 col_A2 col_A3 col_B1 col_B2 col_B3 col_C1 col_C2 col_C3
0      1                                         1
1                                                1
2                           1
3

We want to find every 1 in columns A1, B1 and C1 and, in the matching rows, fill the sibling columns (A2 and A3, B2 and B3, C2 and C3) with 2 and 3:

Output:

  col_A1 col_A2 col_A3 col_B1 col_B2 col_B3 col_C1 col_C2 col_C3
0      1      2      3                           1      2      3
1                                                1      2      3
2                           1      2      3
3

I am currently iterating over the groups: for group A I find the rows where A1 == 1 and set A2 and A3 in those rows, then I do the same for B, C, and so on. Speed matters here, so I'm wondering whether this can be done for all columns at once, or at least in a more vectorized way.
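For reference, the group-by-group loop described above might look roughly like the sketch below. It is a reconstruction for comparison only, not the asker's actual code; the hard-coded group list and the fill values 2 and 3 are assumptions taken from the sample data and the expected output.

```python
import pandas as pd

data = {
    'col_A1': [1, '', '', ''], 'col_A2': ['', '', '', ''], 'col_A3': ['', '', '', ''],
    'col_B1': ['', '', 1, ''], 'col_B2': ['', '', '', ''], 'col_B3': ['', '', '', ''],
    'col_C1': [1, 1, '', ''], 'col_C2': ['', '', '', ''], 'col_C3': ['', '', '', ''],
}
df = pd.DataFrame(data)

# One pass per group: find the rows where the "1" column equals 1,
# then fill the "2" and "3" columns of that group in those rows.
for letter in ['A', 'B', 'C']:  # assumed group names
    rows = df[f'col_{letter}1'].eq(1)
    df.loc[rows, f'col_{letter}2'] = 2
    df.loc[rows, f'col_{letter}3'] = 3
```

Each iteration is itself vectorized over rows, so the cost of this approach grows mainly with the number of column groups.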

You can use:

# extract letters/numbers from column names
nums = df.columns.str.extract(r'(\d+)$', expand=False)
# ['1', '2', '3', '1', '2', '3', '1', '2', '3']
letters = df.columns.str.extract(r'_(\D)', expand=False)
# ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']

# or in a single line
# letters, nums = df.columns.str.extract(r'(\D)(\d+)$').T.to_numpy()

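# compute a mask of values to fill: within each letter group, every column
# from the first non-empty one onwards becomes True
# NB. groupby(..., axis=1) is deprecated in recent pandas versions (2.1+)
mask = df.ne('').groupby(letters, axis=1).cummax(axis=1)
# NB. alternatively use df.eq(1)... (the sample data holds the integer 1)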

# set the values: each True cell takes its column's number (as a string),
# each False cell becomes ''
df2 = mask.mul(nums)

output:

  col_A1 col_A2 col_A3 col_B1 col_B2 col_B3 col_C1 col_C2 col_C3
0      1      2      3                           1      2      3
1                                                1      2      3
2                           1      2      3                     
3                                                               
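A possible alternative, offered as a minimal sketch rather than as the method above: look up the trigger column of each group directly and let NumPy broadcast the column numbers across the rows. The names parts, trigger_cols, hit and out are illustrative only. The sketch assumes every column is named col_<letter><number> and that the trigger column of each group ends in 1. It does not use groupby along the columns, and it leaves the filled values as integers rather than strings.

```python
import numpy as np
import pandas as pd

data = {
    'col_A1': [1, '', '', ''], 'col_A2': ['', '', '', ''], 'col_A3': ['', '', '', ''],
    'col_B1': ['', '', 1, ''], 'col_B2': ['', '', '', ''], 'col_B3': ['', '', '', ''],
    'col_C1': [1, 1, '', ''], 'col_C2': ['', '', '', ''], 'col_C3': ['', '', '', ''],
}
df = pd.DataFrame(data)

# Split each column name into its group letter and its number.
parts = df.columns.str.extract(r'_(\D)(\d+)$')
letters = parts[0].to_numpy()
nums = parts[1].astype(int).to_numpy()

# For every column, the name of the trigger column of its group (assumed "...1").
trigger_cols = [f'col_{letter}1' for letter in letters]

# Boolean array with the same shape as df: True where the group's trigger is 1.
hit = df[trigger_cols].eq(1).to_numpy()

# Where hit is True take the column's number, otherwise keep an empty string.
# Casting nums to object keeps integers and '' side by side in one array.
out = pd.DataFrame(
    np.where(hit, nums.astype(object), ''),
    index=df.index,
    columns=df.columns,
)
```

Whether this is faster than the groupby version depends on the shape of the data, so it is worth timing both on the real dataframe.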

