Pandas: Optimal subtract every nth row

Question

I'm writing a function for a special case of row-wise subtraction in pandas.

First, the user should be able to specify rows either by regex (e.g. "_BL[0-9]+") or by regular index (e.g. every 6th row).
Then each matching row must be subtracted from the rows preceding it, but not past another match.
[Optionally] Drop the matched rows.
The column to match on should be user-defined by either index or label.

For example, given:

Samples	var1	var1
something	10	20
something	20	30
something	40	30
some_BL20_thing	100	100
something	50	70
something	90	100
some_BL10_thing	100	10

Expected output should be:

Samples	var1	var1
something	-90	-80
something	-80	-70
something	-60	-70
something	-50	60
something	-10	90

My current (incomplete) implementation relies heavily on looping:

    def subtract_blanks(data: pd.DataFrame, num_samples: int) -> pd.DataFrame:
        '''
        Accepts a data dataframe and a mod int and subtracts each blank
        from all mod preceding samples
        '''
        expr = compile(r'(_BL[0-9]{1})')
        output = data.copy(deep=True)
        for idx, row in output.iterrows():
            if search(expr, row['Sample']):
                for i in range(1, num_samples + 1):
                    output.iloc[idx-i, data_start:] = output.iloc[idx-i, 6:] - row.iloc[6:]
        return output

Is there a better way of doing this? This implementation seems pretty ugly. I've also considered splitting the DataFrame into chunks and operating on them instead.

Answer 1

Code

    # Create boolean mask for matching rows
    # m = np.arange(len(df)) % 6 == 5  # for index match
    m = df['Samples'].str.contains(r'_BL\d+')  # for regex match

    # mask the values and backfill to propagate the row
    # values corresponding to match in backward direction
    df['var1'] = df['var1'] - df['var1'].mask(~m).bfill()

    # Delete the matching rows
    df = df[~m].copy()

         Samples  var1  var1
    0  something -90.0 -80.0
    1  something -80.0 -70.0
    2  something -60.0 -70.0
    4  something -50.0  60.0
    5  something -10.0  90.0

Note: The core logic is shown in the code above, so I'll leave the function implementation up to the OP.
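
For anyone who wants a starting point, here is one way the mask-and-backfill idea above could be wrapped into a function. This is only a sketch under a few assumptions: the name `subtract_blanks` is borrowed from the question, the parameters `match_col`, `pattern`, `every` and `drop` are hypothetical names chosen for illustration, the second data column is renamed to `var2` so that column labels are unique, and the DataFrame has a unique index. It selects the blank rows with `.loc` and `reindex` instead of `mask`, which is equivalent here. Rows with no blank after them come out as NaN.

```python
import numpy as np
import pandas as pd


def subtract_blanks(df, match_col="Samples", pattern=None, every=None, drop=True):
    """Subtract each blank row from the rows preceding it, up to the previous blank."""
    # Pick the column to match on, by position (int) or by label.
    # Assumption: integer values are treated as positions, never as labels.
    samples = df.iloc[:, match_col] if isinstance(match_col, int) else df[match_col]

    # Boolean mask of blank rows: regex match, or every `every`-th row.
    if pattern is not None:
        m = samples.astype(str).str.contains(pattern, regex=True)
    elif every is not None:
        m = pd.Series(np.arange(len(df)) % every == every - 1, index=df.index)
    else:
        raise ValueError("pass either `pattern` or `every`")

    # Assumption: every numeric column is a data column.
    value_cols = list(df.select_dtypes(include="number").columns)
    out = df.copy()

    # Keep values only on blank rows (NaN elsewhere), then backfill so each
    # row sees the values of the next blank at or below it.
    blanks = out.loc[m, value_cols].reindex(out.index).bfill()
    out[value_cols] = out[value_cols] - blanks

    # Optionally drop the blank rows themselves.
    if drop:
        out = out[~m].copy()
    return out


# Sample data from the question (second column renamed to var2, an assumption).
df = pd.DataFrame({
    "Samples": ["something", "something", "something", "some_BL20_thing",
                "something", "something", "some_BL10_thing"],
    "var1": [10, 20, 40, 100, 50, 90, 100],
    "var2": [20, 30, 30, 100, 70, 100, 10],
})

result = subtract_blanks(df, pattern=r"_BL\d+")
print(result)
```

Passing `every=6` instead of `pattern` treats every 6th row as a blank, and `match_col=0` selects the match column by position rather than by label.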

solution1 1 ACCEPTED 2022-07-04 10:18:06
