I picked up Python last week, and although I am still learning the basics, I've been tasked with building a small program in Python at work and would appreciate some help. I would like to create a SUMIFS function similar to the Excel version. My data contains a cash flow date (CFDATE), portfolio name (PORTFOLIO) and cash flow amount (CF). I want to sum the CF based on which portfolio it belongs to and the date on which it falls. I have managed to achieve this using the code below; however, I am struggling to output my results as an array/table where the header row comprises all the portfolios, the first column is a list of the dates (duplicates removed), and the CF values are grouped according to each combination of (CFDATE, PORTFOLIO).
e.g. of desired output:
PORTFOLIO->    'A'  'B'  'C'
CFDATE
'30/09/2017'   300  600  300
'31/10/2017'   300    0  600
code used so far:
import numpy as np
import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape
df = pd.read_csv(r"...\Test.csv")
portfolioMapping = sorted(set(df.PORTFOLIO))
cfDateMapping = list(set(df.CFDATE))
for i in range(len(portfolioMapping)):
    # CF where the row belongs to this portfolio, otherwise 0
    dfVar = df.CF * np.where(df.PORTFOLIO == portfolioMapping[i], 1, 0)
    for j in range(len(cfDateMapping)):
        # 1 where the row falls on this date, otherwise 0
        # (the original df.CF/df.CF factor gave NaN for any CF of 0)
        dfVar1 = np.where(df.CFDATE == cfDateMapping[j], 1, 0)
        print([portfolioMapping[i], [cfDateMapping[j]], sum(dfVar * dfVar1)])
The data is basically in this form:
PORTFOLIO CFDATE CF
A 30/09/2017 300
A 31/10/2017 300
C 31/10/2017 300
B 30/09/2017 300
B 30/09/2017 300
C 30/09/2017 300
C 31/10/2017 300
C 31/10/2017 300
I would really appreciate some help on the matter.
You need groupby + sum + unstack:
df = df.groupby(['CFDATE', 'PORTFOLIO'])['CF'].sum().unstack(fill_value=0)
print(df)
PORTFOLIO     A    B    C
CFDATE
30/09/2017  300  600  300
31/10/2017  300    0  900
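For anyone who wants to try this without the CSV file, here is a minimal self-contained sketch. It assumes the eight sample rows from the question are typed in by hand; the names `sample` and `wide` are only illustrative and do not come from the original code.

```python
import pandas as pd

# Sample rows copied from the question; in practice they would come from read_csv.
sample = pd.DataFrame({
    "PORTFOLIO": ["A", "A", "C", "B", "B", "C", "C", "C"],
    "CFDATE": ["30/09/2017", "31/10/2017", "31/10/2017", "30/09/2017",
               "30/09/2017", "30/09/2017", "31/10/2017", "31/10/2017"],
    "CF": [300, 300, 300, 300, 300, 300, 300, 300],
})

# Sum CF per (CFDATE, PORTFOLIO) pair, then move PORTFOLIO from the row
# index into the columns; missing combinations are filled with 0.
wide = sample.groupby(["CFDATE", "PORTFOLIO"])["CF"].sum().unstack(fill_value=0)
print(wide)
```

Individual cells can then be read with `wide.loc['30/09/2017', 'B']`, which is roughly what a single SUMIFS call would return in Excel.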
Or pivot_table:
df = df.pivot_table(index='CFDATE',
                    columns='PORTFOLIO',
                    values='CF',
                    aggfunc='sum',
                    fill_value=0)
print(df)
PORTFOLIO     A    B    C
CFDATE
30/09/2017  300  600  300
31/10/2017  300    0  900
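One caveat that may matter with more data: if CFDATE is read as text, the rows are sorted as strings, so a date such as 01/11/2017 would be listed before 30/09/2017. The sketch below assumes the dates are always in day/month/year form and converts them with `pd.to_datetime` before pivoting. The rows and the names `cash_flows` and `by_date` are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical rows, chosen so that string order and date order differ.
cash_flows = pd.DataFrame({
    "PORTFOLIO": ["A", "B", "A"],
    "CFDATE": ["31/10/2017", "30/09/2017", "01/11/2017"],
    "CF": [300, 600, 100],
})

# Assumption: every date is written as dd/mm/yyyy.
cash_flows["CFDATE"] = pd.to_datetime(cash_flows["CFDATE"], format="%d/%m/%Y")

# Same pivot as above; the index is now sorted chronologically.
by_date = cash_flows.pivot_table(index="CFDATE",
                                 columns="PORTFOLIO",
                                 values="CF",
                                 aggfunc="sum",
                                 fill_value=0)
print(by_date)
```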
You can simply do that with pandas's pivot_table():
df.pivot_table(index='CFDATE', columns='PORTFOLIO', values='CF', aggfunc='sum', fill_value=0)
The result is the following:
PORTFOLIO     A    B    C
CFDATE
30/09/2017  300  600  300
31/10/2017  300    0  900
I think the best approach in your case would be to use the groupby method, like the following:
df.groupby(['PORTFOLIO', 'CFDATE']).sum()
                       CF
PORTFOLIO CFDATE
A         30/09/2017  300
          31/10/2017  300
B         30/09/2017  600
C         30/09/2017  300
          31/10/2017  900
Basically, once you have grouped your dataframe df, you can then call various methods on it (like sum(), mean(), min(), max(), etc.).
Also, you can store your grouped dataframe in an object like the following:
grouped = df.groupby(['PORTFOLIO', 'CFDATE'])
It makes it more flexible to perform different calculations afterward:
grouped.sum()
grouped.mean()
grouped.count()
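If several statistics are needed at once, the grouped object can also compute them in a single call with `agg`. This is a small sketch that assumes the sample rows from the question; `sample` and `summary` are illustrative names only.

```python
import pandas as pd

# Sample rows copied from the question.
sample = pd.DataFrame({
    "PORTFOLIO": ["A", "A", "C", "B", "B", "C", "C", "C"],
    "CFDATE": ["30/09/2017", "31/10/2017", "31/10/2017", "30/09/2017",
               "30/09/2017", "30/09/2017", "31/10/2017", "31/10/2017"],
    "CF": [300, 300, 300, 300, 300, 300, 300, 300],
})

grouped = sample.groupby(["PORTFOLIO", "CFDATE"])

# One column per statistic, one row per (PORTFOLIO, CFDATE) pair.
summary = grouped["CF"].agg(["sum", "mean", "count"])
print(summary)
```

Calling `summary['sum'].unstack(fill_value=0)` would turn the totals into a table with one column per date, if a wide layout is preferred.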