
Can someone help me create a SUMIFS function equivalent in Python?

I picked up Python only last week, and although I am still learning the basics, I have been tasked with building a small program in Python at work, so I would appreciate some help. I would like to create a SUMIFS function similar to the Excel version. My data contains a cash flow date (CFDATE), a portfolio name (PORTFOLIO) and a cash flow amount (CF). I want to sum CF by the portfolio it belongs to and the date on which it falls. I have managed to do this using the code below, but I am struggling to output the results as an array/table in which the header row consists of all the portfolios, the first column is the list of dates (duplicates removed), and the CF values are summed for each (CFDATE, PORTFOLIO) combination.

e.g. of desired output:

PORTFOLIO->   'A'  'B'  'C'
CFDATE
'30/09/2017'  300  600  300
'31/10/2017'  300    0  600

code used so far:

from pandas import Series,DataFrame
from numpy import matrix
import numpy as np
import pandas as pd

df = pd.read_csv(r"...\Test.csv")  # read_csv already returns a DataFrame; raw string keeps the backslash literal
portfolioMapping = sorted(list(set(df.PORTFOLIO)))
cfDateMapping = list(set(df.CFDATE))


for i in range(0,len(portfolioMapping)):
    dfVar = df.CF * np.where(df.PORTFOLIO == portfolioMapping[i] , 1, 0)
    for j in range(0,len(cfDateMapping)):
        dfVar1 = df.CF/df.CF * np.where(df.CFDATE == cfDateMapping[j] , 1, 0)
        print([portfolioMapping[i],[cfDateMapping[j]],sum(dfVar*dfVar1)])

The data is basically in this form:

PORTFOLIO   CFDATE  CF
A   30/09/2017  300
A   31/10/2017  300
C   31/10/2017  300
B   30/09/2017  300
B   30/09/2017  300
C   30/09/2017  300
C   31/10/2017  300
C   31/10/2017  300

I would really appreciate some help on the matter.

You need groupby + sum + unstack :

df = df.groupby(['CFDATE', 'PORTFOLIO'])['CF'].sum().unstack(fill_value=0)
print (df)
PORTFOLIO     A    B    C
CFDATE                   
30/09/2017  300  600  300
31/10/2017  300    0  900
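For anyone who wants to try this without the CSV file, here is a minimal self-contained sketch. It assumes the eight sample rows from the question, typed in by hand; the variable name `table` is only for illustration.

```python
import pandas as pd

# Sample rows copied from the question (stand-in for Test.csv)
df = pd.DataFrame({
    "PORTFOLIO": ["A", "A", "C", "B", "B", "C", "C", "C"],
    "CFDATE": ["30/09/2017", "31/10/2017", "31/10/2017", "30/09/2017",
               "30/09/2017", "30/09/2017", "31/10/2017", "31/10/2017"],
    "CF": [300] * 8,
})

# Sum CF for every (CFDATE, PORTFOLIO) pair, then pivot PORTFOLIO
# into the columns; missing combinations become 0
table = df.groupby(["CFDATE", "PORTFOLIO"])["CF"].sum().unstack(fill_value=0)
print(table)

# Reading one cell is the equivalent of a single SUMIFS call
print(table.loc["31/10/2017", "C"])  # 900
```

Keeping the result in a separate variable rather than overwriting `df` leaves the original rows available for further checks.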

Or pivot_table :

df = df.pivot_table(index='CFDATE', 
                    columns='PORTFOLIO', 
                    values='CF', 
                    aggfunc='sum', 
                    fill_value=0)
print (df)
PORTFOLIO     A    B    C
CFDATE                   
30/09/2017  300  600  300
31/10/2017  300    0  900
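One thing to watch with either approach: if CFDATE is read from the CSV as plain text, the rows are ordered alphabetically, not chronologically. The sketch below assumes the dates are always in day/month/year form and parses them before pivoting. The data is a shortened, partly made-up sample (the 01/11/2017 row is hypothetical and is there only to show the ordering).

```python
import pandas as pd

# Small illustrative sample; the 01/11/2017 row is invented for this example
df = pd.DataFrame({
    "PORTFOLIO": ["A", "B", "C", "A"],
    "CFDATE": ["30/09/2017", "30/09/2017", "31/10/2017", "01/11/2017"],
    "CF": [300, 600, 900, 100],
})

# Parse day/month/year strings into real dates so they sort chronologically
df["CFDATE"] = pd.to_datetime(df["CFDATE"], format="%d/%m/%Y")

table = df.pivot_table(index="CFDATE",
                       columns="PORTFOLIO",
                       values="CF",
                       aggfunc="sum",
                       fill_value=0)
print(table)
```

As text, '01/11/2017' would sort before '30/09/2017'; as a parsed date it comes last, which is usually what a cash flow schedule needs.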

You can do that simply with pandas's pivot_table():

df.pivot_table(index='CFDATE', columns='PORTFOLIO', values='CF', aggfunc='sum', fill_value=0)

The result is the following:

PORTFOLIO     A    B    C
CFDATE                   
30/09/2017  300  600  300
31/10/2017  300    0  900

I think the best option in your case would be to use groupby, as follows:

df.groupby(['PORTFOLIO', 'CFDATE']).sum()

                       CF
PORTFOLIO CFDATE         
A         30/09/2017  300
          31/10/2017  300
B         30/09/2017  600
C         30/09/2017  300
          31/10/2017  900

Once you have grouped your dataframe df, you can apply various aggregation methods to it (sum(), mean(), min(), max(), etc.).

You can also store the grouped dataframe in an object, like this:

grouped = df.groupby(['PORTFOLIO', 'CFDATE'])

This makes it easier to run different calculations afterwards:

grouped.sum()
grouped.mean()
grouped.count()
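If several of these statistics are needed at once, they can be computed in one pass with agg(). This is a minimal sketch using the sample rows from the question typed in by hand; the name `summary` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "PORTFOLIO": ["A", "A", "C", "B", "B", "C", "C", "C"],
    "CFDATE": ["30/09/2017", "31/10/2017", "31/10/2017", "30/09/2017",
               "30/09/2017", "30/09/2017", "31/10/2017", "31/10/2017"],
    "CF": [300] * 8,
})

grouped = df.groupby(["PORTFOLIO", "CFDATE"])

# One row per (PORTFOLIO, CFDATE) pair, one column per statistic
summary = grouped["CF"].agg(["sum", "mean", "count"])
print(summary)
```

Note that combinations with no rows (here, portfolio B on 31/10/2017) do not appear in this long-form result; unstack(fill_value=0) is what turns them into explicit zeros in the wide table.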

