Create new column with various conditional logic between other columns
I have the following dataset:
test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})
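I want to create a new column called denied_true based on the following rules:
- When denied_sum is less than tot_chg, return denied.
- When denied_sum exceeds tot_chg, return whatever is left of tot_chg after subtracting the sum of all prior denied_true values for the account.
- If denied equals tot_chg on the account's first row, return denied and set the account's remaining rows to 0.
The result should look like this output DataFrame: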
output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})
So far I have tried the following code using where, but it is missing the condition that subtracts the prior denied_true values from tot_chg:
test['denied_true'] = test.denied_sum.to_numpy()
test.denied_true.where(test.denied_sum.le(test.tot_chg),other=0,inplace=True)
test
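However, I am not sure how to combine multiple conditions into a single function. Maybe I need an if/elif loop, or a boolean mask. Any help would be much appreciated!
You can convert the DataFrame to an OrderedDict and process it in this straightforward way: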
import pandas as pd
from collections import OrderedDict
test = pd.DataFrame({'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)
# functions (samples)
def zero(data, row):
    # if denied == denied_sum
    # change the dict...
    return data['denied'][row]

def ex(data, row):
    # if exceeds
    # change the dict...
    return 'exceed()'

def eq(data, row):
    # if equals
    # change the dict...
    return 'equal()'

def get_value(data, row):
    # conditions, checked in order; the first match wins
    if data['denied'][row] == data['denied_sum'][row]: return zero(data, row)
    if data['denied_sum'][row] < data['tot_chg'][row]: return data['denied'][row]
    if data['denied_sum'][row] > data['tot_chg'][row]: return ex(data, row)
    if data['denied_sum'][row] == data['tot_chg'][row]: return eq(data, row)
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Input:
date account tot_chg denied denied_sum
0 2018-08-01 a 2072 1878 1878
1 2018-08-02 a 2072 1036 2914
2 2018-08-03 a 2072 1036 3950
3 2019-09-01 b 322 322 322
4 2019-09-02 b 322 161 483
5 2019-09-03 b 322 161 644
6 2020-01-02 c 483 150 150
7 2020-01-03 c 483 322 472
8 2020-01-04 c 483 322 794
9 2020-10-04 d 140 105 105
10 2020-10-05 e 570 570 570
Output:
date account tot_chg denied denied_sum denied_true
0 2018-08-01 a 2072 1878 1878 1878
1 2018-08-02 a 2072 1036 2914 exceed()
2 2018-08-03 a 2072 1036 3950 exceed()
3 2019-09-01 b 322 322 322 322
4 2019-09-02 b 322 161 483 exceed()
5 2019-09-03 b 322 161 644 exceed()
6 2020-01-02 c 483 150 150 150
7 2020-01-03 c 483 322 472 322
8 2020-01-04 c 483 322 794 exceed()
9 2020-10-04 d 140 105 105 105
10 2020-10-05 e 570 570 570 570
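I did not fully implement your logic in the functions, since this is only an example.
Much the same thing (perhaps a little more easily) can be done via DataFrame > JSON > DataFrame.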
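To illustrate the DataFrame > JSON > DataFrame route, here is a minimal sketch. The three-row frame, the `records` orientation, and the placeholder rule (simply copying denied) are my own assumptions for brevity, not part of the original posts.

```python
import json
import pandas as pd

# Small made-up frame with the same column names as the question
test = pd.DataFrame({'account': ['a', 'a', 'b'],
                     'tot_chg': [2072, 2072, 322],
                     'denied': [1878, 1036, 322]})

# DataFrame -> JSON string -> list of plain dicts, one per row
records = json.loads(test.to_json(orient='records'))

# Process the rows with ordinary Python.
# Placeholder rule (assumption): copy 'denied' unchanged.
for rec in records:
    rec['denied_true'] = rec['denied']

# list of dicts -> DataFrame
result = pd.DataFrame(records)
```

Each row becomes an ordinary dict, so the conditional logic can be written as plain if statements before the rows are turned back into a DataFrame.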
Update. I tried to implement the function ex(). Here is what it looks like:
def ex(data, row):
    # if exceeds
    denied_true_slice = denied_true_list[:row]  # <-- global list
    tot_chg_slice = [data['tot_chg'][r] for r in range(row)]
    denied_true_sum = sum(v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v)
    value = tot_chg_slice[-1] - denied_true_sum
    return value if value > 0 else 0
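I am not sure it works as intended, since I do not fully understand the rather quirky conditions. I am sure, though, that it looks quite ugly and cryptic, and probably falls short of the best Stack Overflow examples.
There is a global list now, so the MAIN part looks like this: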
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
    denied_true_list.append(get_value(od, row))
denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Output:
date account tot_chg denied denied_sum denied_true
0 2018-08-01 a 2072 1878 1878 1878
1 2018-08-02 a 2072 1036 2914 194
2 2018-08-03 a 2072 1036 3950 0
3 2019-09-01 b 322 322 322 322
4 2019-09-02 b 322 161 483 0
5 2019-09-03 b 322 161 644 0
6 2020-01-02 c 483 150 150 150
7 2020-01-03 c 483 322 472 322
8 2020-01-04 c 483 322 794 0
9 2020-10-04 d 140 105 105 105
10 2020-10-05 e 570 570 570 570
I believe this can be done more elegantly with native pandas tools.
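One possible native-pandas sketch follows. It assumes the three rules boil down to "cap each account's cumulative denied amount at tot_chg", which is my reading of the question rather than something either post states. The names `prior` and `remaining` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01',
             '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03',
             '2020-01-04', '2020-10-04', '2020-10-05'],
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
})

# Amount already denied on earlier rows of the same account
# (the per-account running total, minus the current row)
prior = test.groupby('account')['denied'].cumsum() - test['denied']

# What is still left of tot_chg before this row, never below zero
remaining = (test['tot_chg'] - prior).clip(lower=0)

# Assumed rule: a row may claim its own denied amount,
# but no more than what is left of tot_chg
test['denied_true'] = np.minimum(test['denied'], remaining)
```

Under that assumption the column matches the expected output from the question, including the 11 on row 8, where the loop-based version above yields 0 because its running sum is not reset per account.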