
Create new column with various conditional logic between other columns

I have the following dataset:

test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                    'account':['a','a','a','b','b','b','c','c','c','d','e'],
                    'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
                    'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
                    'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})

to which I would like to append a new column called denied_true, based on the following rules:

  1. while denied_sum is less than tot_chg, return denied
  2. once denied_sum exceeds tot_chg, return the remainder: tot_chg less the sum of all prior denied_true values for the account
  3. if denied equals tot_chg on the first row for an account, just return denied and set the remaining rows for that account to 0

The output should effectively look like the following dataframe:

output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                    'account':['a','a','a','b','b','b','c','c','c','d','e'],
                    'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
                    'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
                    'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
                    'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})

So far, I have tried the following code using where, but it is missing the step that subtracts the prior denied_true values from tot_chg:

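# keep denied_sum where it does not exceed tot_chg, otherwise 0
# (assigning the result avoids an in-place update on a chained attribute access)
test['denied_true'] = test['denied_sum'].where(test['denied_sum'].le(test['tot_chg']), other=0)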
test


However, I'm not really sure how to append multiple conditions to this where function. Maybe I need if/elif loops, or a boolean mask. Any help would be greatly appreciated!

You can convert the DataFrame into an OrderedDict and handle it in a straightforward way:

import pandas as pd
from collections import OrderedDict

test = pd.DataFrame({'date':      ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
                    'account':    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
                    'tot_chg':    [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
                    'denied':     [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
                    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
        
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)

# functions (samples); the parameter is named d to avoid shadowing the built-in dict
def zero(d, row):
    # if denied == denied_sum
    # change the dict...
    return d['denied'][row]

def ex(d, row):
    # if exceeds
    # change the dict...
    return 'exceed()'

def eq(d, row):
    # if equals
    # change the dict...
    return 'equal()'

def get_value(d, row):
    # conditions, checked in order
    if d['denied'][row]     == d['denied_sum'][row]: return zero(d, row)
    if d['denied_sum'][row] <  d['tot_chg'][row]:    return d['denied'][row]
    if d['denied_sum'][row] >  d['tot_chg'][row]:    return ex(d, row)
    if d['denied_sum'][row] == d['tot_chg'][row]:    return eq(d, row)


# MAIN

# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]

# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}

# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))

# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)

Input:

          date account  tot_chg  denied  denied_sum
0   2018-08-01       a     2072    1878        1878
1   2018-08-02       a     2072    1036        2914
2   2018-08-03       a     2072    1036        3950
3   2019-09-01       b      322     322         322
4   2019-09-02       b      322     161         483
5   2019-09-03       b      322     161         644
6   2020-01-02       c      483     150         150
7   2020-01-03       c      483     322         472
8   2020-01-04       c      483     322         794
9   2020-10-04       d      140     105         105
10  2020-10-05       e      570     570         570

Output:

          date account  tot_chg  denied  denied_sum denied_true
0   2018-08-01       a     2072    1878        1878        1878
1   2018-08-02       a     2072    1036        2914    exceed()
2   2018-08-03       a     2072    1036        3950    exceed()
3   2019-09-01       b      322     322         322         322
4   2019-09-02       b      322     161         483    exceed()
5   2019-09-03       b      322     161         644    exceed()
6   2020-01-02       c      483     150         150         150
7   2020-01-03       c      483     322         472         322
8   2020-01-04       c      483     322         794    exceed()
9   2020-10-04       d      140     105         105         105
10  2020-10-05       e      570     570         570         570

I haven't fully implemented your logic in these functions, since they are only a sample.

Much the same thing (probably a bit more easily) can be done via DataFrame > JSON > DataFrame.
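As a rough sketch of that JSON round trip: the snippet below turns the frame into a list of plain row dicts, fills in denied_true with an ordinary loop, and builds a new DataFrame. It assumes the rules amount to capping each row's denied at whatever is left of tot_chg after the earlier rows of the same account, and that rows are already in date order. The names records, used, seen and result are made up for illustration, and the date column is left out for brevity.

```python
import json
import pandas as pd

test = pd.DataFrame({
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied':  [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
})

# DataFrame -> JSON string -> list of plain dicts, one per row
records = json.loads(test.to_json(orient='records'))

used = {}  # hypothetical helper: denied amount seen so far, per account
for rec in records:
    seen = used.get(rec['account'], 0)
    # what is left of tot_chg before this row, floored at zero
    remaining = max(rec['tot_chg'] - seen, 0)
    # the row keeps its denied value, capped at what remains
    rec['denied_true'] = min(rec['denied'], remaining)
    used[rec['account']] = seen + rec['denied']

# list of dicts -> DataFrame
result = pd.DataFrame(records)
print(result)
```

Working on plain dicts keeps the per-row logic easy to read, at the cost of a Python-level loop over every row.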


Update: I've tried to implement the function ex(). Here is how it might look:

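def ex(d, row):
    # if exceeds
    # denied_true values computed so far, for all rows before this one
    denied_true_slice = denied_true_list[0:row] # <-- global list
    # tot_chg for the same prior rows
    tot_chg_slice     = [d['tot_chg'][r] for r in range(row)]
    # sum the prior denied_true values that are below their row's tot_chg
    denied_true_sum   = sum(v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v)
    value = tot_chg_slice[-1] - denied_true_sum
    # never return a negative remainder
    return value if value > 0 else 0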

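I'm not sure it works as intended, since I don't fully understand the conditions: row 8, for example, comes out as 0 where your expected output has 11, most likely because the sum runs over all prior rows rather than only those of the current account. It also looks rather ugly and cryptic, and probably isn't in line with the best Stack Overflow examples.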

Because ex() now reads a global list, the MAIN section looks like this:

# MAIN

# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
    denied_true_list.append(get_value(od,row))

denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]

# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}

# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))

# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)

Output:

          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914          194
2   2018-08-03       a     2072    1036        3950            0
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483            0
5   2019-09-03       b      322     161         644            0
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794            0
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570

I believe this could be done much more elegantly with native pandas tools.
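One possible pandas-native version, offered as a sketch rather than a definitive solution: it assumes the three rules reduce to capping each row's denied at whatever remains of tot_chg after the earlier rows of the same account, and that rows are already sorted by date within each account. The intermediate names prior and remaining are illustrative, and the date column is omitted for brevity.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied':  [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
})

# denied amount accumulated on earlier rows of the same account
# (running total per account, minus the current row's own value)
prior = test.groupby('account')['denied'].cumsum() - test['denied']

# what is left of tot_chg before this row, floored at zero
remaining = (test['tot_chg'] - prior).clip(lower=0)

# each row keeps its denied value, capped at what remains
test['denied_true'] = np.minimum(test['denied'], remaining)
print(test)
```

On the sample data this gives 11 for the last row of account c, since 483 - (150 + 322) = 11, and it needs no explicit loop or global state.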
