
Create new column with various conditional logic between other columns

I have the following dataset:

test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                    'account':['a','a','a','b','b','b','c','c','c','d','e'],
                    'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
                    'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
                    'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})

I would like to add a new column called denied_true, based on the following rules:

  1. While denied_sum is less than tot_chg, return denied.
  2. Once denied_sum exceeds tot_chg, return the remaining difference between tot_chg and the sum of all prior denied_true values for the account.
  3. If denied equals tot_chg at the first instance, return denied and set the remaining rows for the account to 0.

The output should effectively look like this, as a dataframe:

output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                    'account':['a','a','a','b','b','b','c','c','c','d','e'],
                    'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
                    'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
                    'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
                    'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})

So far, I have tried the following code using where, but it is missing the step that subtracts the previous denied_true values from tot_chg:

# keep denied_sum where it does not exceed tot_chg, otherwise 0
test['denied_true'] = test['denied_sum'].where(test['denied_sum'].le(test['tot_chg']), other=0)
test


However, I'm not really sure how to add multiple conditions to this where call. Maybe I need if/elif logic in a loop, or a boolean mask. Any help would be greatly appreciated!

You can convert the DataFrame into an OrderedDict and handle it in this straightforward way:

import pandas as pd
from collections import OrderedDict

test = pd.DataFrame({'date':      ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
                    'account':    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
                    'tot_chg':    [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
                    'denied':     [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
                    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
        
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)

# functions (samples)
def zero(d, row):
    # denied == denied_sum
    # change the dict...
    return d['denied'][row]

def ex(d, row):
    # denied_sum exceeds tot_chg
    # change the dict...
    return 'exceed()'

def eq(d, row):
    # denied_sum equals tot_chg
    # change the dict...
    return 'equal()'

def get_value(d, row):
    # conditions, checked in order; the first match wins
    if d['denied'][row] == d['denied_sum'][row]:
        return zero(d, row)
    if d['denied_sum'][row] < d['tot_chg'][row]:
        return d['denied'][row]
    if d['denied_sum'][row] > d['tot_chg'][row]:
        return ex(d, row)
    if d['denied_sum'][row] == d['tot_chg'][row]:
        return eq(d, row)


# MAIN

# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]

# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}

# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))

# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)

Input:

          date account  tot_chg  denied  denied_sum
0   2018-08-01       a     2072    1878        1878
1   2018-08-02       a     2072    1036        2914
2   2018-08-03       a     2072    1036        3950
3   2019-09-01       b      322     322         322
4   2019-09-02       b      322     161         483
5   2019-09-03       b      322     161         644
6   2020-01-02       c      483     150         150
7   2020-01-03       c      483     322         472
8   2020-01-04       c      483     322         794
9   2020-10-04       d      140     105         105
10  2020-10-05       e      570     570         570

Output:

          date account  tot_chg  denied  denied_sum denied_true
0   2018-08-01       a     2072    1878        1878        1878
1   2018-08-02       a     2072    1036        2914    exceed()
2   2018-08-03       a     2072    1036        3950    exceed()
3   2019-09-01       b      322     322         322         322
4   2019-09-02       b      322     161         483    exceed()
5   2019-09-03       b      322     161         644    exceed()
6   2020-01-02       c      483     150         150         150
7   2020-01-03       c      483     322         472         322
8   2020-01-04       c      483     322         794    exceed()
9   2020-10-04       d      140     105         105         105
10  2020-10-05       e      570     570         570         570

I didn't fully implement your logic in the functions, since this is just a sample.

Much the same (probably a bit more easily) can be done via a DataFrame > JSON > DataFrame round trip.
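As a rough illustration of that round trip, here is a minimal sketch. It assumes the rule boils down to "each account can absorb at most tot_chg in total", and the names records and used are hypothetical helpers introduced only for this example. It recomputes the running total itself, so it does not rely on the denied_sum column.

```python
import json
from collections import defaultdict

import pandas as pd

test = pd.DataFrame({
    'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03',
             '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})

# DataFrame -> JSON -> list of plain dicts, one per row
records = json.loads(test.to_json(orient='records'))

# running total of denied_true already assigned to each account
used = defaultdict(int)

for rec in records:
    remaining = rec['tot_chg'] - used[rec['account']]
    # take the denied amount, but never more than what is left of tot_chg
    rec['denied_true'] = max(min(rec['denied'], remaining), 0)
    used[rec['account']] += rec['denied_true']

# list of dicts -> DataFrame
result = pd.DataFrame(records)
print(result)
```

Because the running total is tracked per account, rows from one account never affect another, and the rows must already be in date order within each account.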


Update. I've tried to implement the function ex(). Here is how it might look:

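def ex(d, row):
    # if exceeds
    # denied_true values computed so far (all earlier rows, any account)
    denied_true_slice = denied_true_list[0:row]  # <-- global list
    tot_chg_slice = [d['tot_chg'][r] for r in range(row)]
    # sum the earlier values that are below their own row's tot_chg
    denied_true_sum = sum(v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v)
    # previous row's tot_chg minus that sum, floored at 0
    value = tot_chg_slice[-1] - denied_true_sum
    return max(value, 0)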

I'm not quite sure it works as intended, since I don't fully understand the quirky conditions. It also looks rather ugly and cryptic, and probably isn't in line with the best Stack Overflow examples.

Since there is now a global list, the MAIN section looks like this:

# MAIN

# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
    denied_true_list.append(get_value(od,row))

denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]

# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}

# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))

# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)

Output:

          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914          194
2   2018-08-03       a     2072    1036        3950            0
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483            0
5   2019-09-03       b      322     161         644            0
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794            0
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570

I believe this could be done much more cleanly with native pandas tools.
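One possible vectorized version is sketched below. It is not taken from the question or the answer above; it assumes the three rules reduce to "denied, capped by whatever is left of tot_chg after the earlier denied amounts of the same account", and that tot_chg is constant within an account. The names prior and remaining are introduced only for this example.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03',
             '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})

# denied amounts on earlier rows of the same account (0 on each account's first row)
prior = test.groupby('account')['denied'].cumsum() - test['denied']

# what is left of tot_chg before this row, never below 0
remaining = (test['tot_chg'] - prior).clip(lower=0)

# take denied, but no more than what remains
test['denied_true'] = np.minimum(test['denied'], remaining)
print(test)
```

On the sample data this yields 11 for the last row of account c (483 - 150 - 322), which is the value the question's expected output shows, whereas the loop-based output above gives 0 for that row.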
