Pandas: best way to subtract every nth row

Question

I'm writing a function for a special case of row-wise subtraction in pandas.

First, the user should be able to specify rows either by regex (e.g. "_BL[0-9]+") or by regular index (e.g. every 6th row).
Then we must subtract every matching row from the rows preceding it, but not past another match.
[Optionally] Drop the selected rows.
The column to match on should be user-defined, by either index or label.
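The row-selection part of these requirements can be sketched as a boolean mask. This is only an illustration: `build_mask` and its parameters `column`, `pattern`, and `every` are hypothetical names, not part of the original post, and the sketch assumes that an integer `column` means a position rather than a label.

```python
import numpy as np
import pandas as pd


def build_mask(df, column=0, pattern=None, every=None):
    """Return a boolean Series marking the rows to subtract (hypothetical helper)."""
    if (pattern is None) == (every is None):
        raise ValueError("pass exactly one of pattern or every")
    if every is not None:
        # Every `every`-th row by position: rows every-1, 2*every-1, ...
        return pd.Series(np.arange(len(df)) % every == every - 1, index=df.index)
    # Assumption: an int selects the column by position, anything else by label.
    col = df.iloc[:, column] if isinstance(column, int) else df[column]
    return col.astype(str).str.contains(pattern, regex=True)
```

For example, `build_mask(df, "Samples", pattern=r"_BL[0-9]+")` and `build_mask(df, every=6)` would cover the two selection modes described above.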

For example, given:

Samples	var1	var1
something	10	20
something	20	30
something	40	30
some_BL20_thing	100	100
something	50	70
something	90	100
some_BL10_thing	100	10

The expected output is:

Samples	var1	var1
something	-90	-80
something	-80	-70
something	-60	-70
something	-50	60
something	-10	90

My current (incomplete) implementation relies heavily on looping:

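    from re import compile, search

    import pandas as pd

    # First data column. Undefined in the original post; 6 is assumed
    # from the hard-coded slices that appeared next to it.
    data_start = 6

    def subtract_blanks(data: pd.DataFrame, num_samples: int) -> pd.DataFrame:
        '''
        Accepts a data dataframe and a mod int and subtracts each blank
        from all mod preceding samples
        '''
        expr = compile(r'(_BL[0-9]{1})')
        output = data.copy(deep=True)
        for idx, row in output.iterrows():
            if search(expr, row['Sample']):
                # Subtract the blank row from the num_samples rows above it
                for i in range(1, num_samples + 1):
                    output.iloc[idx-i, data_start:] = (
                        output.iloc[idx-i, data_start:] - row.iloc[data_start:]
                    )
        return output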

Is there a better way of doing this? This implementation seems pretty ugly. I've also considered splitting the DataFrame into chunks and operating on those instead.
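The chunking idea can be tried without physically splitting the frame, by giving each row a chunk label and using `groupby`. The following is a rough sketch on the example data, not a tested general solution: the second value column is renamed `var2` here for clarity (an assumption, since the post labels both columns `var1`), and it assumes every chunk ends with a blank row.

```python
import pandas as pd

df = pd.DataFrame({
    "Samples": ["something", "something", "something", "some_BL20_thing",
                "something", "something", "some_BL10_thing"],
    "var1": [10, 20, 40, 100, 50, 90, 100],
    "var2": [20, 30, 30, 100, 70, 100, 10],  # renamed second column (assumption)
})

is_blank = df["Samples"].str.contains(r"_BL\d+")

# A blank row closes its chunk, so shift the running count of blanks
# down by one row to keep each blank in the chunk it terminates.
chunk_id = is_blank.cumsum().shift(fill_value=0)

values = df[["var1", "var2"]]
# Within each chunk, subtract the last row (the blank) from every row.
corrected = values - values.groupby(chunk_id).transform("last")

# Reattach the sample names and drop the blank rows themselves.
result = df[["Samples"]].join(corrected)[~is_blank]
print(result)
```

Rows that follow the final blank would form a chunk with no blank of their own, so they would need separate handling before relying on this approach.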

Answer 1

Code

    # Create boolean mask for matching rows
    # m = np.arange(len(df)) % 6 == 5  # for index match
    m = df['Samples'].str.contains(r'_BL\d+')  # for regex match

    # mask the values and backfill to propagate the row
    # values corresponding to match in backward direction
    df['var1'] = df['var1'] - df['var1'].mask(~m).bfill()

    # Delete the matching rows
    df = df[~m].copy()

         Samples  var1  var1
    0  something -90.0 -80.0
    1  something -80.0 -70.0
    2  something -60.0 -70.0
    4  something -50.0  60.0
    5  something -10.0  90.0

Note: The core logic is specified in the code, so I'll leave the function implementation up to the OP.
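For readers who want a starting point, one possible way to wrap the mask-and-backfill logic in a function is sketched below. The name `subtract_matching_rows` and its parameters are hypothetical, the mask is expected to be a boolean Series built beforehand (by regex or by position), and the value columns are assumed to have unique labels.

```python
import pandas as pd


def subtract_matching_rows(df, mask, value_cols, drop=True):
    """Subtract each masked row from the rows preceding it (illustrative sketch)."""
    out = df.copy()
    for col in value_cols:
        # Keep values only on matching rows, then backfill them upwards so
        # every row sees the next matching row at or below it.
        out[col] = out[col] - out[col].mask(~mask).bfill()
    # Optionally drop the matching rows themselves.
    return out[~mask].copy() if drop else out


df = pd.DataFrame({
    "Samples": ["something", "something", "something", "some_BL20_thing",
                "something", "something", "some_BL10_thing"],
    "var1": [10, 20, 40, 100, 50, 90, 100],
    "var2": [20, 30, 30, 100, 70, 100, 10],  # renamed second column (assumption)
})
m = df["Samples"].str.contains(r"_BL\d+")
result = subtract_matching_rows(df, m, ["var1", "var2"])
print(result)
```

Rows after the last matching row have nothing to backfill from, so they come out as NaN in this sketch; whether to keep or drop them is a choice left to the caller.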

Answer 1: accepted, 2022-07-04 10:18:06
