
Python dataframe sum on specific columns

My task is to calculate the cumulative principal. numpy.ppmt only gives the principal paid in one specific month, so I want to add columns containing the monthly principal for each record and then sum them to get the cumulative principal.
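For context, here is a minimal pure-NumPy sketch of the per-period principal that ppmt returns. It assumes end-of-period payments, a zero future value and a positive periodic rate, and the helper name principal_paid is hypothetical (it is not part of NumPy). Under those assumptions the principal in period 1 is pv·r/((1 + r)^nper − 1), and each later period's principal is the previous one multiplied by (1 + r).

```python
import numpy as np

def principal_paid(rate, per, nper, pv):
    """Hypothetical helper: principal repaid in period `per` (1-based).

    Assumes a level-payment loan, end-of-period payments, zero future
    value and rate > 0. Returns a positive amount, i.e. the value of
    -ppmt(rate, per, nper, pv) under those assumptions.
    """
    rate, per, nper, pv = map(np.asarray, (rate, per, nper, pv))
    growth = (1 + rate) ** nper
    # period-1 principal is pv * r / ((1 + r)^n - 1); it grows by (1 + r) each period
    return pv * rate / (growth - 1) * (1 + rate) ** (per - 1)

monthly_rate = 0.1 / 100 / 12  # same rate conversion as in the question
print(principal_paid(monthly_rate, [1, 2], 360, 100000))
```

The two printed values should agree with principal1 and principal2 for the first record in the output shown in the question.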

For example, given a DataFrame like the following, I build one principal column per period:

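import numpy as np
import pandas as pd
import numpy_financial as npf  # np.ppmt was removed in NumPy 1.20; ppmt now lives in numpy_financial

frame = pd.DataFrame({'rate': [0.1, 0.1], 'per': [2, 4], 'nper': [360, 360], 'pv': [100000, 200000]})

max_per = frame['per'].max()
columns = ['principal%s' % i for i in range(1, max_per + 1)]
df = pd.DataFrame(0.0, index=frame.index, columns=columns)

# one column per period: principal paid in that period (ppmt is negative for a positive pv)
# when='end' (payments at period end, the default) replaces the undefined `when` variable
for index, column in enumerate(columns):
    df[column] = -npf.ppmt(rate=frame['rate'] / 100 / 12, per=index + 1, nper=frame['nper'],
                           pv=frame['pv'], when='end')
frame.join(df)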

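The resulting DataFrame looks like this. (The epp column is not created by the code above; its values are the expected cumulative principals, i.e. the sum of the first per principal columns of each row.)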

   nper  per      pv  rate          epp  principal1  principal2  principal3  principal4
0   360    2  100000   0.1   547.309838  273.643517  273.666321  273.689126  273.711934
1   360    4  200000   0.1  2189.421796  547.287034  547.332642  547.378253  547.423868

The problem is that for the first record (per == 2), principal3 and principal4 should be 0. One workaround is to sum principal1 through principal4 conditionally on the per column: if frame.per == 2, sum only principal1 and principal2; if frame.per == 4, sum principal1 through principal4. How can I do that?

I can calculate the cumulative principal with apply, but I don't want to, because apply calls a Python function once per row and is slow.

Thanks.
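As an alternative that needs neither apply nor per-period columns, the cumulative principal can be computed in closed form. This is a different technique from the masking approach in the answer. Under the assumptions of end-of-period payments, zero future value and a positive rate, the principal paid grows geometrically by a factor of (1 + r) per period, so the sum over the first per periods collapses to pv·((1 + r)^per − 1)/((1 + r)^nper − 1). A minimal vectorized sketch; the column name cum_principal is hypothetical:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'rate': [0.1, 0.1], 'per': [2, 4], 'nper': [360, 360],
                      'pv': [100000, 200000]})

# monthly rate, converted as in the question (percent per year -> fraction per month)
r = frame['rate'].to_numpy() / 100 / 12
per = frame['per'].to_numpy()
nper = frame['nper'].to_numpy()
pv = frame['pv'].to_numpy()

# sum over k = 1..per of pv*r*(1+r)^(k-1) / ((1+r)^nper - 1), as a geometric series
frame['cum_principal'] = pv * ((1 + r) ** per - 1) / ((1 + r) ** nper - 1)
print(frame)
```

For the two sample rows this reproduces the epp values in the question's output up to rounding. It only holds for the stated payment assumptions; a zero rate would need a separate branch.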

One possible solution is to set the unwanted values to 0 with DataFrame.mask before the join. The boolean mask is a 2D NumPy array obtained by comparing np.arange over the number of columns with the per column, reshaped to a column vector so that the comparison broadcasts:

# subtract 1 because Python counts from 0: True marks columns beyond each row's 'per'
mask = np.arange(len(df.columns)) > frame['per'].values[:, None] - 1
df = frame.join(df.mask(mask, 0))
print(df)
   rate  per  nper      pv  principal1  principal2  principal3  principal4
0   0.1    2   360  100000  273.643517  273.666321    0.000000    0.000000
1   0.1    4   360  200000  547.287034  547.332642  547.378253  547.423868
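To make the broadcasting step concrete: the column positions form a row vector of shape (4,) and the per values a column vector of shape (2, 1), so the comparison produces a (2, 4) boolean array. A minimal standalone sketch with the sample per values hard-coded:

```python
import numpy as np

per = np.array([2, 4])  # the 'per' column from the question
n_cols = 4              # principal1 .. principal4

# row vector of 0-based column positions vs. column vector of per values -> 2D mask
mask = np.arange(n_cols) > per[:, None] - 1
print(mask)
# [[False False  True  True]
#  [False False False False]]
```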

Another solution with numpy.where:

# df here is the original frame of principal columns, not the joined result above
mask = np.arange(len(df.columns)) > frame['per'].values[:, None] - 1
df[:] = np.where(mask, 0, df)
df = frame.join(df)
print(df)
   rate  per  nper      pv  principal1  principal2  principal3  principal4
0   0.1    2   360  100000  273.643517  273.666321    0.000000    0.000000
1   0.1    4   360  200000  547.287034  547.332642  547.378253  547.423868
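Both snippets stop after zeroing the extra periods; the cumulative principal the question asks for is then a row-wise sum. A minimal sketch of that last step, assuming the result goes into a hypothetical cum_principal column; the principal values are copied from the question's printed output rather than recomputed with ppmt:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'rate': [0.1, 0.1], 'per': [2, 4], 'nper': [360, 360],
                      'pv': [100000, 200000]})
# monthly principal values as printed in the question (before masking)
df = pd.DataFrame({'principal1': [273.643517, 547.287034],
                   'principal2': [273.666321, 547.332642],
                   'principal3': [273.689126, 547.378253],
                   'principal4': [273.711934, 547.423868]})

# True where a column lies beyond the row's 'per' value
mask = np.arange(len(df.columns)) > frame['per'].values[:, None] - 1

# zero the extra periods, then add across each row
frame['cum_principal'] = df.mask(mask, 0).sum(axis=1)
print(frame)
```

The whole computation stays vectorized, so it avoids the per-row apply call the question wants to avoid.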
