Conditional Subtraction of rows in Python Pandas Dataframe

I am trying to solve the problem described below. I have a DataFrame like this:

Date     Item    Type  Qty  Price
1/1/18   Orange  Add   100  25
5/1/18   Orange  Add   20   40
8/1/18   Orange  Add   40   20
18/1/18  Orange  Add   10   35
27/2/18  Orange  Sub   100  55
15/4/18  Orange  Sub   30   45

and I want to get an intermediate DataFrame like this:

Date     Item    Type  Qty  Price  Diff
1/1/18   Orange  Add   0    25     30
5/1/18   Orange  Add   0    40     5
8/1/18   Orange  Add   30   20     25
18/1/18  Orange  Add   10   35

and then a final DataFrame like this:

Date     Item    Type  Qty  Price
8/1/18   Orange  Add   30   20
18/1/18  Orange  Add   10   35

NOTE: Diff is the Sub price minus the Add price. Qty is updated by subtracting the Sub quantities from the Add quantities, starting from the earliest Add row.
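
The rule in the note can be sketched in plain Python as a first-in, first-out match. This is only an illustration of how the intermediate table above seems to be derived, not code from the post. The name fifo_match and the (qty, price) tuple layout are made up for the example, and it assumes that when several Sub rows touch the same Add row, the last one decides Diff.

```python
def fifo_match(adds, subs):
    """Consume Sub quantities from Add rows in order (hypothetical helper).

    adds, subs: lists of (qty, price) tuples in date order.
    Returns one [remaining_qty, price, diff] list per Add row;
    diff stays None for Add rows that no Sub row touched.
    """
    rows = [[qty, price, None] for qty, price in adds]
    i = 0  # index of the earliest Add row that still has stock
    for sub_qty, sub_price in subs:
        while sub_qty > 0 and i < len(rows):
            take = min(sub_qty, rows[i][0])
            rows[i][0] -= take
            rows[i][2] = sub_price - rows[i][1]  # Diff = Sub price - Add price
            sub_qty -= take
            if rows[i][0] == 0:
                i += 1  # this Add row is used up, move to the next one
    return rows


# The Orange rows from the question
orange_adds = [(100, 25), (20, 40), (40, 20), (10, 35)]
orange_subs = [(100, 55), (30, 45)]
for row in fifo_match(orange_adds, orange_subs):
    print(row)
```

For the Orange rows this gives remaining quantities 0, 0, 30, 10 and Diff values 30, 5, 25 and none, which matches the intermediate table.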

Could anyone help with how this can be achieved? I have tried groupby, apply and transform, but I have not got it working so far.

Below is my code, which is still in development and incomplete:

import pandas as pd


def FruitSummary():
    df = pd.DataFrame([
               ['01/1/18',   'Orange',   'Add',  100,    25],
               ['05/1/18',   'Orange',   'Add',   20,    40],
               ['08/1/18',   'Orange',   'Add',   40,    20],
               ['18/1/18',   'Orange',   'Add',   10,    35],
               ['27/2/18',   'Orange',   'Sub',  100,    55],
               ['15/4/18',   'Orange',   'Sub',   30,    45],
               ['02/1/18',   'Banana',   'Add',  110,     7],
               ['04/1/18',   'Banana',   'Add',   20,     9],
               ['11/1/18',   'Banana',   'Add',   40,     4],
               ['10/2/18',   'Banana',   'Add',   10,     3],
               ['15/3/18',   'Banana',   'Sub',  100,     9],
               ['15/4/18',   'Banana',   'Sub',   50,     8],
               ['10/3/18',   'Kiwi',     'Add',   80,    29],
               ['12/3/18',   'Berry',    'Add',   25,     5],
               ['18/4/18',   'Berry',    'Add',   15,     8]],
       columns=['Date',      'Item',     'Type', 'Qty',  'Price'])

    print(df)

    def fruit_stat(dfIN):
        # Debug output only: show each per-item group and which Type flags occur in it
        print(dfIN)
        print((dfIN['Type'] == 'Sub').unique(), (dfIN['Type'] == 'ODD').unique())

        if len(dfIN) > 1 and (True in (dfIN['Type'] == 'Sub').unique()):
            print(dfIN['Item'].iloc[1], "'len > 1'", "'Sub True'")

    # fruit_stat does not return anything yet (work in progress)
    dfFS = df.groupby(['Item']).apply(fruit_stat)
    print(dfFS)


FruitSummary()

I was able to find a solution, but I am not sure whether it is optimal or whether there is a better one.

import numpy as np
import pandas as pd

df = pd.DataFrame([['01/1/18',   'Orange',   'Add',  100,    25],
                   ['05/1/18',   'Orange',   'Add',   20,    40],
                   ['08/1/18',   'Orange',   'Add',   40,    20],
                   ['18/1/18',   'Orange',   'Add',   10,    35],
                   ['27/2/18',   'Orange',   'Sub',  100,    55],
                   ['15/4/18',   'Orange',   'Sub',   30,    45],
                   ['02/1/18',   'Banana',   'Add',  110,     7],
                   ['04/1/18',   'Banana',   'Add',   20,     9],
                   ['11/1/18',   'Banana',   'Add',   40,     4],
                   ['10/2/18',   'Banana',   'Add',   10,     3],
                   ['15/3/18',   'Banana',   'Sub',  100,     9],
                   ['15/4/18',   'Banana',   'Sub',   50,     8],
                   ['10/3/18',   'Kiwi',     'Add',   80,    29],
                   ['12/3/18',   'Berry',    'Add',   25,     5],
                   ['18/4/18',   'Berry',    'Add',   15,     8],
                   ['16/3/18',   'Cherry',   'Add',   25,     5],
                   ['21/4/18',   'Cherry',   'Sub',   25,     8],
                   ['19/3/18',   'Grapes',   'Add',   25,     5],
                   ['23/4/18',   'Grapes',   'Sub',   15,     8]],
          columns=['Date',      'Item',     'Type', 'Qty',  'Price'])


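def FruitSummary(df):
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    # Running total of Qty within each (Item, Type) pair
    df['CumSum'] = df.groupby(['Item', 'Type'])['Qty'].cumsum()
    print(df)

    def fruit_stat(dfg):
        dfg = dfg.copy()
        is_sub = dfg['Type'] == 'Sub'
        if is_sub.any():
            # Total quantity subtracted for this item (last running total of its Sub rows)
            subT = dfg.loc[is_sub, 'CumSum'].iloc[-1]
            # Zero out the Sub rows and the Add rows that are fully consumed
            dfg['Qty'] = np.where((dfg['CumSum'] - subT) <= 0, 0, dfg['Qty'])
            dfg = dfg[dfg['Qty'] > 0].copy()
            if len(dfg) > 0:
                # The first surviving Add row is only partially consumed;
                # a single positional assignment avoids chained assignment
                dfg.iloc[0, dfg.columns.get_loc('Qty')] = dfg['CumSum'].iloc[0] - subT
        return dfg

    # Process each item separately (groupby iterates the items in sorted order)
    # and concatenate the pieces, which avoids relying on how apply handles the grouping column
    parts = [fruit_stat(dfg) for _, dfg in df.groupby('Item')]
    dfFS = pd.concat(parts).drop(columns='CumSum').reset_index(drop=True)
    print(dfFS)
    return dfFS


FruitSummary(df)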

The above code produces the following output:

      Date    Item Type  Qty  Price
0  11/1/18  Banana  Add   20      4
1  10/2/18  Banana  Add   10      3
2  12/3/18   Berry  Add   25      5
3  18/4/18   Berry  Add   15      8
4  19/3/18  Grapes  Add   10      5
5  10/3/18    Kiwi  Add   80     29
6  08/1/18  Orange  Add   30     20
7  18/1/18  Orange  Add   10     35
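
As a possible alternative, the same remaining quantities can be computed without a per-group function. This is a sketch under assumptions, not a tested replacement: the name remaining_stock is made up, and like the code above it applies each item's total Sub quantity to its Add rows in row order, ignoring the dates. The quantity left on an Add row is the item's running Add total minus the total Sub quantity, limited to the range from 0 to the row's own Qty.

```python
import numpy as np
import pandas as pd


def remaining_stock(df):
    """Return the Add rows with Qty reduced by the item's total Sub quantity."""
    adds = df[df["Type"] == "Add"].copy()
    # Total quantity subtracted per item; items without Sub rows get 0
    sub_total = df[df["Type"] == "Sub"].groupby("Item")["Qty"].sum()
    consumed = adds["Item"].map(sub_total).fillna(0)
    # Running total of added quantity within each item, in row order
    cum_added = adds.groupby("Item")["Qty"].cumsum()
    # What is left of each Add row: at least 0, at most the row's own Qty
    left = (cum_added - consumed).clip(lower=0)
    adds["Qty"] = np.minimum(left, adds["Qty"]).astype(int)
    return adds[adds["Qty"] > 0].reset_index(drop=True)


# A subset of the rows from the question
sample = pd.DataFrame(
    [["01/1/18", "Orange", "Add", 100, 25],
     ["05/1/18", "Orange", "Add", 20, 40],
     ["08/1/18", "Orange", "Add", 40, 20],
     ["18/1/18", "Orange", "Add", 10, 35],
     ["27/2/18", "Orange", "Sub", 100, 55],
     ["15/4/18", "Orange", "Sub", 30, 45],
     ["16/3/18", "Cherry", "Add", 25, 5],
     ["21/4/18", "Cherry", "Sub", 25, 8],
     ["19/3/18", "Grapes", "Add", 25, 5],
     ["23/4/18", "Grapes", "Sub", 15, 8],
     ["10/3/18", "Kiwi", "Add", 80, 29]],
    columns=["Date", "Item", "Type", "Qty", "Price"])

print(remaining_stock(sample))
```

Unlike the groupby version, this keeps the rows in their original order instead of sorting them by Item, so a sort_values on Item would be needed to get the same ordering as the output above.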
