Python dataframe sum on specific columns
My task is to calculate the cumulative principal. numpy.ppmt only gives the principal paid in one specific month, so I want to add columns that contain the monthly principal for each record and then sum them to get the cumulative principal.
For example, I have a dataframe that looks like the following:
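import numpy as np
import pandas as pd

frame = pd.DataFrame({'rate':[0.1, 0.1], 'per':[2, 4], 'nper':[360, 360], 'pv':[100000, 200000]})
max_per = frame['per'].max()
when = 'end'  # assumption: not defined in the original snippet; 'end' is the default
columns = ['principal%s'%i for i in range(1, max_per + 1)]
df = pd.DataFrame(index=frame.index, columns=columns, dtype='float').fillna(0)
# one column per month: principal paid in month index + 1, for every record
# note: np.ppmt was removed in NumPy 1.20; numpy_financial.ppmt replaces it
for index, column in enumerate(columns):
    df[column] = -np.ppmt(rate=frame['rate'] / 100 / 12, per=index + 1, nper=frame['nper'],
                          pv=frame['pv'], when=when)
frame.join(df)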
The dataframe will look like the following:
   nper  per      pv  rate          epp  principal1  principal2  principal3  principal4
0   360    2  100000   0.1   547.309838  273.643517  273.666321  273.689126  273.711934
1   360    4  200000   0.1  2189.421796  547.287034  547.332642  547.378253  547.423868
The problem is that for the first record, principal3 and principal4 should be 0. One workaround is to sum principal1 through principal4 based on the 'per' column: if frame.per == 2, I sum only principal1 and principal2, and if frame.per == 4, I sum principal1 through principal4. Any help with that would be appreciated.
I can calculate the cumulative principal by calling apply, but I do not want to do that because it is slow.
Thanks.
One possible solution is to set the unwanted values to 0 with mask before the join. The boolean mask comes from comparing np.arange over the number of columns with the per column, reshaped so that the comparison broadcasts to a 2d numpy array:
#subtract 1 because python counts from 0
mask = np.arange(len(df.columns)) > frame['per'].values[:, None] - 1
df = frame.join(df.mask(mask, 0))
print(df)
rate per nper pv principal1 principal2 principal3 principal4
0 0.1 2 360 100000 273.643517 273.666321 0.000000 0.000000
1 0.1 4 360 200000 547.287034 547.332642 547.378253 547.423868
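To make the broadcasting step concrete, here is a minimal sketch that builds the same kind of mask on hard-coded numbers and then sums each row to get the cumulative principal. The principal values are copied from the output in the question rather than recomputed, and the names positions, per and cum_principal are only illustrative:

```python
import numpy as np
import pandas as pd

# Toy inputs copied from the question's output (not recomputed here).
frame = pd.DataFrame({'per': [2, 4]})
principal = pd.DataFrame(
    [[273.643517, 273.666321, 273.689126, 273.711934],
     [547.287034, 547.332642, 547.378253, 547.423868]],
    columns=['principal%s' % i for i in range(1, 5)],
)

# Column positions 0..3 form a row vector and 'per' a column vector,
# so the comparison broadcasts to one boolean per (record, month) cell.
positions = np.arange(principal.shape[1])    # shape (4,)
per = frame['per'].to_numpy()[:, None]       # shape (2, 1)
mask = positions >= per                      # shape (2, 4); True means "after 'per'"

# Zero the masked cells, then sum across columns for the cumulative principal.
cum_principal = principal.mask(mask, 0).sum(axis=1)
print(mask)
print(cum_principal)
```

Writing positions >= per is the same comparison as positions > per - 1 in the answer. The row sums should agree with the epp column shown in the question.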
Another solution with numpy.where:
mask = np.arange(len(df.columns)) > frame['per'].values[:, None] - 1
df[:] = np.where(mask, 0, df)
df = frame.join(df)
print(df)
rate per nper pv principal1 principal2 principal3 principal4
0 0.1 2 360 100000 273.643517 273.666321 0.000000 0.000000
1 0.1 4 360 200000 547.287034 547.332642 547.378253 547.423868
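If only the cumulative figure is needed, the per-month columns may not be necessary at all. The sketch below is not part of the answer above. It assumes a standard level-payment loan with payments at the end of each period, no final balloon and a non-zero rate, in which case the principal portions form a geometric series that can be summed in closed form. cumulative_principal is a hypothetical helper, not a NumPy or pandas function:

```python
import numpy as np

def cumulative_principal(rate, per, nper, pv):
    """Principal repaid over the first `per` periods of a level-payment loan.

    Assumes end-of-period payments, fv = 0 and rate != 0.
    Hypothetical helper written for illustration only.
    """
    rate, per, nper, pv = (np.asarray(x, dtype=float) for x in (rate, per, nper, pv))
    growth = (1 + rate) ** nper
    payment = pv * rate * growth / (growth - 1)   # level payment per period
    first_principal = payment - pv * rate         # principal part of payment 1
    # Each principal part is (1 + rate) times the previous one,
    # so the first `per` parts sum as a geometric series.
    return first_principal * ((1 + rate) ** per - 1) / rate

# Same inputs as the question: the rate is a yearly percentage, converted to monthly.
monthly_rate = np.array([0.1, 0.1]) / 100 / 12
result = cumulative_principal(monthly_rate, per=[2, 4], nper=[360, 360],
                              pv=[100000, 200000])
print(result)
```

Everything here is vectorized, so it avoids both apply and the intermediate principal columns. For other payment timings or a zero rate the formula would need adjusting.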