How to subtract columns in a pandas DataFrame based on a condition
I have a dataset which looks like the table below. In my new dataset, I want to subtract the principal and remainder columns from the amount columns.
For instance, if there are 4 amount columns, 2 principal columns and 3 remainder columns, then the 1st principal column and the 1st remainder column must be subtracted from the 1st amount column, the 2nd principal and 2nd remainder columns from the 2nd amount column, and only the 3rd remainder column from the 3rd amount column (since there are no more principal columns). The last column, amount4, must stay as it is and become newamount4.
amount1 amount2 amount3 amount4 principal1 principal2 remainder1 remainder2 remainder3
100 250 150 100 250 100 80 100 100
200 200 350 25 450 100 120 100 50
300 150 450 30 200 100 150 100 100
250 550 550 100 100 200 50 500 200
550 200 650 200 250 200 500 100 500
My new dataset must look like this. Please note that am stands for amount, pr stands for principal and rem stands for remainder.
newamount1 newamount2 newamount3 newamount4
-230(am1-pr1-rem1) 50(am2-pr2-rem2) 50(am3-rem3) amount4
-370 0 300 amount4
-50 -50 350 amount4
100 -150 350 amount4
-200 -100 150 amount4
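To make the problem reproducible, here is a minimal sketch that builds the sample table as a DataFrame named `df` and spells out the desired rule by hand. It assumes that `newamount4` should simply carry over the values of `amount4`; the name `expected` is only illustrative and not part of the original data.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "amount1": [100, 200, 300, 250, 550],
        "amount2": [250, 200, 150, 550, 200],
        "amount3": [150, 350, 450, 550, 650],
        "amount4": [100, 25, 30, 100, 200],
        "principal1": [250, 450, 200, 100, 250],
        "principal2": [100, 100, 100, 200, 200],
        "remainder1": [80, 120, 150, 50, 500],
        "remainder2": [100, 100, 100, 500, 100],
        "remainder3": [100, 50, 100, 200, 500],
    }
)

# Hard-coded version of the rule: amount minus every principal/remainder
# column that shares the same trailing number.
expected = pd.DataFrame(
    {
        "newamount1": df["amount1"] - df["principal1"] - df["remainder1"],
        "newamount2": df["amount2"] - df["principal2"] - df["remainder2"],
        "newamount3": df["amount3"] - df["remainder3"],
        "newamount4": df["amount4"],  # assumption: kept unchanged
    }
)
print(expected)
```

The hard-coded form is only practical for a handful of columns, which is why a solution that groups columns by suffix is needed.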
You can use defaultdict to group the columns that share a suffix, then apply a reducing function (np.subtract.reduce) to get your output:
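from collections import defaultdict

import numpy as np
import pandas as pd

# Collect the columns under a key built from their trailing digit.
mapping = defaultdict(list)
for column in df:
    if column[-1] != "4":  # compare with the string "4", not the integer 4
        mapping[f"newamount{column[-1]}"].append(df[column])
    else:
        mapping[f"newamount{column[-1]}"].append(column)

# Subtract left to right within each group; newamount4 gets the label "amount4".
mapping = {
    key: np.subtract.reduce(value) if "4" not in key else "amount4"
    for key, value in mapping.items()
}
pd.DataFrame(mapping)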
newamount1 newamount2 newamount3 newamount4
0 -230 50 50 amount4
1 -370 0 300 amount4
2 -50 -50 350 amount4
3 100 -150 350 amount4
4 -200 -100 150 amount4
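As a small illustration of why `np.subtract.reduce` fits here: given a sequence of equal-length arrays, it subtracts them from left to right along the first axis, so the first array acts as the amount and the rest are taken away from it. The arrays below are the first three rows of amount1, principal1 and remainder1; the variable names are only for illustration.

```python
import numpy as np

amount = np.array([100, 200, 300])
principal = np.array([250, 450, 200])
remainder = np.array([80, 120, 150])

# Equivalent to (amount - principal) - remainder, element by element.
result = np.subtract.reduce([amount, principal, remainder])
print(result)

# With a single array in the list there is nothing to subtract,
# so the values come back unchanged.
single = np.subtract.reduce([amount])
print(single)
```

The order of the list matters: the amount column has to come first, which holds in the answer above only because the amount columns precede the others in `df`.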
You could also iterate through a groupby:
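# Group the columns by their last character, then subtract across each row.
# Note: groupby(..., axis=1) is deprecated in recent pandas versions.
mapping = {
    f"newamount{key}": frame.agg(np.subtract.reduce, axis=1)
    for key, frame in df.groupby(df.columns.str[-1], axis=1)
}
pd.DataFrame(mapping).assign(newamount4="amount4")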
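Since grouping along `axis=1` is deprecated in recent pandas versions, a variant that groups the transposed frame may be easier to keep working. The following is a sketch under a few assumptions, not the answer's own code: it swaps `np.subtract.reduce` for "first row minus the sum of the others" (arithmetically the same), it relies on the amount column appearing first within each suffix group, and it keeps the values of `amount4` rather than the label. Only the first two rows of the sample data are used to keep it short.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "amount1": [100, 200], "amount2": [250, 200],
        "amount3": [150, 350], "amount4": [100, 25],
        "principal1": [250, 450], "principal2": [100, 100],
        "remainder1": [80, 120], "remainder2": [100, 100],
        "remainder3": [100, 50],
    }
)

# After transposing, each original column is a row, so an ordinary
# row-wise groupby on the last character of the name does the grouping.
result = pd.DataFrame(
    {
        # first row of the group (the amount) minus the sum of the rest
        f"newamount{suffix}": rows.iloc[0] - rows.iloc[1:].sum()
        for suffix, rows in df.T.groupby(lambda name: name[-1])
    }
)
print(result)
```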
You may use the code below and adapt it if your data goes beyond 4:
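The code this answer refers to is not included here. As a rough sketch of one way to handle any number of suffixes (not the original answer's code), the helper below groups columns by their trailing number with a regular expression and subtracts every non-amount column from the matching amount column. The function name `subtract_by_suffix`, its parameters, and the columns `amount10`, `remainder10` and `amount11` are hypothetical; it assumes every suffix has a matching amount column.

```python
import re

import pandas as pd


def subtract_by_suffix(df, base="amount", prefix="new"):
    """For each numeric suffix, return base column minus the other columns."""
    groups = {}
    for col in df.columns:
        match = re.search(r"(\d+)$", col)  # trailing number, any length
        if match:
            groups.setdefault(match.group(1), []).append(col)

    out = {}
    for suffix in sorted(groups, key=int):
        base_col = f"{base}{suffix}"  # raises KeyError if it is missing
        others = [c for c in groups[suffix] if c != base_col]
        if others:
            out[f"{prefix}{base_col}"] = df[base_col] - df[others].sum(axis=1)
        else:
            # nothing to subtract: keep the amount values as they are
            out[f"{prefix}{base_col}"] = df[base_col].copy()
    return pd.DataFrame(out)


df = pd.DataFrame(
    {
        "amount1": [100, 200], "principal1": [250, 450],
        "remainder1": [80, 120], "amount10": [5, 6],
        "remainder10": [1, 2], "amount11": [7, 8],
    }
)
result = subtract_by_suffix(df)
print(result)
```

Matching on `(\d+)$` rather than on the last character keeps suffixes such as 10 and 11 separate from 0 and 1.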
You can use the pivot_longer function from pyjanitor to reshape the data before grouping and aggregating; at the moment you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
import numpy as np

(
    df.pivot_longer(names_to=".value",
                    names_pattern=r".+(\d)$",  # raw string: capture the trailing digit
                    ignore_index=False)
    .fillna(0)
    .add_prefix("newamount")
    .groupby(level=0)
    .agg(np.subtract.reduce)
    .assign(newamount4="amount4")  # edit your preferred column
)
Sticking to functions within pandas only, we can reshape the data by stacking, before grouping and aggregating:
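# Split each name into (prefix, digit) to get two column levels.
df.columns = df.columns.str.split(r"(\d)", expand=True).droplevel(-1)
(
    df.stack(0)            # move the prefix level into the row index
    .fillna(0)             # missing principal/remainder columns count as 0
    .droplevel(-1)
    .groupby(level=0)
    .agg(np.subtract.reduce)
    .add_prefix("newamount")
    .assign(newamount4="amount4")
)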