I have the following dataset
test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})
in which I would like to append a new column called denied_true
based on the following parameters:
denied_sum
is less than tot_chgs
, return denied
denied_sum
exceeds tot_chgs
, then compute the remaining difference between the sum of all prior denied_true
less the tot_chgs
denied
ever equals tot_chgs
at the first instance, just return denied
and make remaining rows for the account 0The output should effectively look like this:
The dataframe for the output is:
output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})
So far, I have tried the following code using where, but it's missing the condition of subtract the previous denied_true value from the tot_chgs
test['denied_true'] = test.denied_sum.to_numpy()
test.denied_true.where(test.denied_sum.le(test.tot_chg),other=0,inplace=True)
test
However, I'm not really sure how to append multiple conditions to this where function. Maybe I need if/elif loops, or a boolean mask. Any help would be greatly appreciated!
You can convert DataFrame into OrderedDict and to handle it this straightforward way:
import pandas as pd
from collections import OrderedDict
test = pd.DataFrame({'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)
# functions (samples)
def zero(dict, row):
# if denied == denied_sum
# change the dict...
return dict['denied'][row]
def ex(dict, row):
# if exceeds
# change the dict...
return 'exceed()'
def eq(dict, row):
# if equals
# change the dict...
return 'equal()'
def get_value(dict, row):
# conditions
if dict['denied'][row] == dict['denied_sum'][row]: return zero(dict, row)
if dict['denied_sum'][row] < dict['tot_chg'][row]: return dict['denied'][row]
if dict['denied_sum'][row] > dict['tot_chg'][row]: return ex(dict, row)
if dict['denied_sum'][row] == dict['tot_chg'][row]: return eq(dict, row)
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Input:
date account tot_chg denied denied_sum
0 2018-08-01 a 2072 1878 1878
1 2018-08-02 a 2072 1036 2914
2 2018-08-03 a 2072 1036 3950
3 2019-09-01 b 322 322 322
4 2019-09-02 b 322 161 483
5 2019-09-03 b 322 161 644
6 2020-01-02 c 483 150 150
7 2020-01-03 c 483 322 472
8 2020-01-04 c 483 322 794
9 2020-10-04 d 140 105 105
10 2020-10-05 e 570 570 570
Output:
date account tot_chg denied denied_sum denied_true
0 2018-08-01 a 2072 1878 1878 1878
1 2018-08-02 a 2072 1036 2914 exceed()
2 2018-08-03 a 2072 1036 3950 exceed()
3 2019-09-01 b 322 322 322 322
4 2019-09-02 b 322 161 483 exceed()
5 2019-09-03 b 322 161 644 exceed()
6 2020-01-02 c 483 150 150 150
7 2020-01-03 c 483 322 472 322
8 2020-01-04 c 483 322 794 exceed()
9 2020-10-04 d 140 105 105 105
10 2020-10-05 e 570 570 570 570
I didn't make a full implementation of your logic in the functions since it's just a sample.
About the same (probably it would be a bit easer) can be done via DataFrame > JSON > DataFrame.
Update . I've tried to implement the function ex()
. Here is how it might look like.
def ex(dict, row):
# if exceeds
denied_true_slice = denied_true_list[0:row] # <-- global list
tot_chg_slice = [dict['tot_chg'][r] for r in range(row)]
denied_true_sum = sum ([v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v])
value = tot_chg_slice[-1] - denied_true_sum
return value if value > 0 else 0
I'm not quite sure if it works as supposed. Since I'm not fully understand the quirky conditions. But I'm sure it looks rather ugly and cryptic and probably isn't in line with best Stackoverflow's examples.
Now there is the global list, so, MAIN section now looks like this:
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
denied_true_list.append(get_value(od,row))
denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Output:
date account tot_chg denied denied_sum denied_true
0 2018-08-01 a 2072 1878 1878 1878
1 2018-08-02 a 2072 1036 2914 194
2 2018-08-03 a 2072 1036 3950 0
3 2019-09-01 b 322 322 322 322
4 2019-09-02 b 322 161 483 0
5 2019-09-03 b 322 161 644 0
6 2020-01-02 c 483 150 150 150
7 2020-01-03 c 483 322 472 322
8 2020-01-04 c 483 322 794 0
9 2020-10-04 d 140 105 105 105
10 2020-10-05 e 570 570 570 570
I believe it could be done much more pretty via native Pandas tools.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.