Create new column with various conditional logic between other columns
I have the following dataset:
test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})
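I want to append a new column named denied_true, based on the following rules:
- When denied_sum is less than tot_chg, return denied.
- When denied_sum exceeds tot_chg, return the remaining difference between tot_chg and the sum of all previous denied_true values for that account.
- If denied equals tot_chg in the first instance, return denied and set the remaining rows for that account to 0.
The expected output DataFrame is: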
output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})
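To make the arithmetic behind these rules concrete, here is a minimal plain-Python sketch for a single account. It assumes the rules amount to capping the running total of denied at tot_chg, that denied is never negative, and that the values are given in date order. allocate_denied is a hypothetical helper name used only for illustration.

```python
def allocate_denied(tot_chg, denied_values):
    """Hypothetical helper: count each denied amount only until tot_chg is used up."""
    remaining = tot_chg  # part of tot_chg not yet covered by earlier rows
    result = []
    for denied in denied_values:
        counted = min(denied, remaining)  # never count more than what is left
        result.append(counted)
        remaining -= counted
    return result


# Account 'a': tot_chg = 2072
print(allocate_denied(2072, [1878, 1036, 1036]))  # [1878, 194, 0]
# Account 'c': tot_chg = 483
print(allocate_denied(483, [150, 322, 322]))      # [150, 322, 11]
```

Under this reading, each row contributes whatever is still left of tot_chg, which is why account 'b' gives 322, 0, 0.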
So far I have tried the following code using where, but it is missing the condition that subtracts the previous denied_true values from tot_chg:
test['denied_true'] = test.denied_sum.to_numpy()
test.denied_true.where(test.denied_sum.le(test.tot_chg),other=0,inplace=True)
test
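However, I am not sure how to combine multiple conditions into one function. Maybe I need an if/elif chain or a boolean mask. Any help would be greatly appreciated!
You can convert the DataFrame into an OrderedDict and process it in this straightforward way: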
import pandas as pd
from collections import OrderedDict
test = pd.DataFrame({'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)
# functions (samples)
# `data` is the OrderedDict built above (renamed from `dict` to avoid shadowing the builtin)
def zero(data, row):
    # if denied == denied_sum
    # change the dict...
    return data['denied'][row]

def ex(data, row):
    # if exceeds
    # change the dict...
    return 'exceed()'

def eq(data, row):
    # if equals
    # change the dict...
    return 'equal()'

def get_value(data, row):
    # conditions, checked in order; the first match wins
    if data['denied'][row] == data['denied_sum'][row]: return zero(data, row)
    if data['denied_sum'][row] < data['tot_chg'][row]: return data['denied'][row]
    if data['denied_sum'][row] > data['tot_chg'][row]: return ex(data, row)
    if data['denied_sum'][row] == data['tot_chg'][row]: return eq(data, row)
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Input:
          date account  tot_chg  denied  denied_sum
0   2018-08-01       a     2072    1878        1878
1   2018-08-02       a     2072    1036        2914
2   2018-08-03       a     2072    1036        3950
3   2019-09-01       b      322     322         322
4   2019-09-02       b      322     161         483
5   2019-09-03       b      322     161         644
6   2020-01-02       c      483     150         150
7   2020-01-03       c      483     322         472
8   2020-01-04       c      483     322         794
9   2020-10-04       d      140     105         105
10  2020-10-05       e      570     570         570
Output:
          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914     exceed()
2   2018-08-03       a     2072    1036        3950     exceed()
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483     exceed()
5   2019-09-03       b      322     161         644     exceed()
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794     exceed()
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570
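I have not fully implemented your logic in the functions, since this is only a sample.
Roughly the same thing (perhaps a bit more easily) can be done via DataFrame > JSON > DataFrame.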
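A rough sketch of that JSON round trip might look like the following. It is an illustration under assumptions, not a definitive implementation: it assumes rows are already ordered by date within each account and that denied is never negative, and the per-account remaining dict is a hypothetical bookkeeping device that does not appear in the code above.

```python
import json

import pandas as pd

test = pd.DataFrame({
    'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03',
             '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})

# DataFrame -> JSON string -> list of plain dicts, one per row
records = json.loads(test.to_json(orient='records'))

remaining = {}  # hypothetical: account -> part of tot_chg not yet used up
for rec in records:
    left = remaining.get(rec['account'], rec['tot_chg'])
    rec['denied_true'] = min(rec['denied'], left)
    remaining[rec['account']] = left - rec['denied_true']

# list of dicts -> DataFrame
result = pd.DataFrame(records)
print(result)
```

Working on a list of row dicts keeps each row's values together, so the conditions can be written without indexing every column by row number.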
Update. I tried to implement the function ex(). Here is how it looks:
def ex(data, row):
    # if exceeds
    denied_true_slice = denied_true_list[0:row]  # <-- global list of values computed so far
    tot_chg_slice = [data['tot_chg'][r] for r in range(row)]
    # sum the earlier denied_true values that are below their row's tot_chg
    denied_true_sum = sum(v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v)
    value = tot_chg_slice[-1] - denied_true_sum
    return value if value > 0 else 0
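I am not quite sure it works as intended, since I do not fully understand the quirky conditions. But I am sure it looks rather ugly and cryptic, and probably falls short of the best Stack Overflow examples.
There is now a global list, so the MAIN part looks like this: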
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
    denied_true_list.append(get_value(od, row))
denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Output:
          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914          194
2   2018-08-03       a     2072    1036        3950            0
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483            0
5   2019-09-03       b      322     161         644            0
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794            0
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570
I believe this can be done more elegantly with native Pandas tools.
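One possible native pandas version is sketched below. It rests on an interpretation, not on anything confirmed by the asker: denied_true is taken to be the row-to-row increase of the per-account running total of denied after capping that total at tot_chg. It also assumes rows are sorted by date within each account and that tot_chg is constant per account. Note that the loop-based output above gives 0 for row 8, while the expected output in the question has 11 there; this sketch yields 11.

```python
import pandas as pd

test = pd.DataFrame({
    'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03',
             '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570]})

# Running total of denied within each account (same values as denied_sum in the question)
running = test.groupby('account')['denied'].cumsum()

# Cap the running total at tot_chg, row by row
capped = running.clip(upper=test['tot_chg'])

# The increase of the capped total from the previous row of the same account
# is the amount that still counts; the first row of each account has no
# previous row (NaN after diff), so it takes the capped value itself.
test['denied_true'] = capped.groupby(test['account']).diff().fillna(capped).astype(int)

print(test)
```

Because everything is expressed as grouped column operations, no explicit Python loop or global list is needed.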