
MemoryError in pandas when generating a new column based on another column's value

I have a data frame with the following structure:

OMT                              object
ZIPCODE                          object
PRODUCT_CAT                       int64
SERVICE_CATEGORY                 object
CURRENT_STANDARD_EDD            float64
TOTAL                             int64
DESTINATION_DISTRIBUTION_CTR     object
OPS_EDD                         float64
OPS_EDD_achieve                   int64
suggest_edd_1                    object
suggest_edd_2                     int64
suggest_edd_value_1               int64
suggest_edd_value_2               int64
final_edd_group                  object
final_edd                       float64
final_edd_value                   int64

I want to perform the following operation: when TOTAL is < 5, return the label among D1/D2/D3/D4/D5/D6 of the first value that is over -1 when compared with D6 (if none, D6).

If TOTAL is >= 5, return the label among D1/D2/D3/D4/D5/D6 of the first value whose ratio (value / D5) is over 0.95 when compared with D6 (if none, D6).

I wrote the following code, but it raises a MemoryError:

training_group['suggest_edd_1'] = np.where(
    training_group['TOTAL'] > 5,
    training_group[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']]
        .sub(training_group['D6'], axis=0).ge(-1)
        .assign(D6=True).idxmax(1).str.extract(r'(\d+)'),
    training_group[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']]
        .div(training_group['TOTAL'], axis=0).ge(0.95)
        .assign(D6=True).idxmax(1).str.extract(r'(\d+)'),
)

<ipython-input-72-61626eae2be9> in <module>
      4                  training_group[['D1','D2',
      5                 'D3','D4','D5',
----> 6                 'D6']].div(training_group['TOTAL'],axis =0).ge(OD_pari_target).assign(D6=True).idxmax(1).str.extract('(\d+)')) 

MemoryError: 

(Each expression works on its own, but as soon as I apply the condition on TOTAL it does not work.)
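One possible explanation, offered as an assumption rather than something the traceback confirms: with a single capture group, `str.extract` returns a one-column DataFrame by default (`expand=True` in recent pandas versions). `np.where` then broadcasts the length-n condition against operands of shape (n, 1) and tries to build an n x n array, which could exhaust memory on a large frame. The toy labels below are hypothetical and only illustrate the shapes:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the idxmax(1) result and the TOTAL column.
labels = pd.Series(["D1", "D6", "D3"])
total = pd.Series([3, 10, 7])
cond = total > 5                                          # shape (3,)

as_frame = labels.str.extract(r"(\d+)")                   # DataFrame, shape (3, 1)
as_series = labels.str.extract(r"(\d+)", expand=False)    # Series, shape (3,)

wide = np.where(cond, as_frame, as_frame)    # (3,) against (3, 1) broadcasts
flat = np.where(cond, as_series, as_series)  # (3,) against (3,) stays 1-D

print(wide.shape)  # (3, 3)
print(flat.shape)  # (3,)
```

If this is what happens in the real code, passing `expand=False` (or selecting column `0` of the extracted frame) would keep both branches one-dimensional.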

I tried using a lambda function applied to each row, but I could not find the appropriate code to replace assign(D6=True) and the extract function:

    if x['TOTAL'] < piece_threthold:
        return x[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']].sub(
            x['D6'], axis=0).ge(OD_pari_piece).ge(-1).idxmax(1)
    else:
        return x[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']].div(
            x['TOTAL'], axis=0).ge(OD_pari_target).ge(-1).idxmax(1)
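For reference, a row-wise version could look roughly like the sketch below. It is only a sketch under stated assumptions: `pick_label` and `COLS` are hypothetical names, the threshold values (5, -1, 0.95) are guesses taken from the prose and the other snippets, and the branch direction follows the prose (TOTAL below the threshold uses the difference rule). On a row, `assign(D6=True)` can be replaced by setting the `D6` entry of the boolean Series, and `str.extract` by plain string slicing.

```python
import pandas as pd

COLS = ["D1", "D2", "D3", "D4", "D5", "D6"]

# Names as in the question; the values are assumptions.
piece_threthold = 5
OD_pari_piece = -1
OD_pari_target = 0.95

def pick_label(row):
    """Return the digit of the first D-column that passes the rule, else '6'."""
    values = row[COLS].astype(float)
    if row["TOTAL"] < piece_threthold:
        hit = (values - row["D6"]) >= OD_pari_piece
    else:
        hit = (values / row["TOTAL"]) >= OD_pari_target
    hit["D6"] = True            # fallback, plays the role of assign(D6=True)
    return hit.idxmax()[1:]     # label of the first True, without the leading 'D'

# Hypothetical toy data.
toy = pd.DataFrame({
    "TOTAL": [3, 10],
    "D1": [0, 2], "D2": [1, 5], "D3": [2, 9],
    "D4": [2, 10], "D5": [3, 10], "D6": [3, 10],
})
row_result = toy.apply(pick_label, axis=1)
print(row_result.tolist())  # ['3', '4']
```

Row-wise `apply` is usually much slower than the column-wise expressions, so this is more useful for checking the logic than for a large frame.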

I can get the result I need by doing the following. However, I feel it is very inefficient, and it creates extra columns that I do not need. (I will drop suggest_edd_1 and suggest_edd_2 later, since I only need final_suggest.)

training_group['suggest_edd_1'] = (
    training_group[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']]
    .sub(training_group['D6'], axis=0).ge(OD_pari_piece)
    .assign(D6=True).idxmax(1).str.extract(r'(\d+)')
)

training_group['suggest_edd_2'] = (
    training_group[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']]
    .div(training_group['TOTAL'], axis=0).ge(OD_pari_target)
    .assign(D6=True).idxmax(1).str.extract(r'(\d+)')
)

training_group['final_suggest'] = np.where(
    training_group['TOTAL'] > 5,
    training_group['suggest_edd_1'],
    training_group['suggest_edd_2'],
)

As you said, each of them works fine on its own, so pre-calculate the two values and then assign:

# Pre-compute each branch once, then pick between them.
s1 = (
    training_group[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']]
    .sub(training_group['D6'], axis=0).ge(-1)
    .assign(D6=True).idxmax(1).str.extract(r'(\d+)')
)
s2 = (
    training_group[['D1', 'D2', 'D3', 'D4', 'D5', 'D6']]
    .div(training_group['TOTAL'], axis=0).ge(0.95)
    .assign(D6=True).idxmax(1).str.extract(r'(\d+)')
)

training_group['suggest_edd_1'] = np.where(training_group['TOTAL'] > 5, s1, s2)
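A variant of the same idea that avoids the temporary columns is sketched below. The helper names (`first_label`, `final_suggest`) and the default thresholds are hypothetical, and two assumptions are baked in: `expand=False` is passed so that `str.extract` returns a Series, and the branch direction follows the prose of the question (TOTAL below 5 uses the difference rule), which is the opposite of the `TOTAL > 5` condition in the snippets above.

```python
import pandas as pd

COLS = ["D1", "D2", "D3", "D4", "D5", "D6"]

def first_label(mask):
    """Digit of the first True column per row; D6 is forced True as fallback."""
    return (mask.assign(D6=True)
                .idxmax(axis=1)
                .str.extract(r"(\d+)", expand=False))  # Series, not DataFrame

def final_suggest(df, piece_threshold=5, diff_min=-1, ratio_min=0.95):
    d = df[COLS]
    by_diff = first_label(d.sub(df["D6"], axis=0).ge(diff_min))
    by_ratio = first_label(d.div(df["TOTAL"], axis=0).ge(ratio_min))
    # Keep by_diff where TOTAL is small, otherwise take by_ratio.
    return by_diff.where(df["TOTAL"] < piece_threshold, by_ratio)

# Hypothetical toy data.
toy = pd.DataFrame({
    "TOTAL": [3, 10],
    "D1": [0, 2], "D2": [1, 5], "D3": [2, 9],
    "D4": [2, 10], "D5": [3, 10], "D6": [3, 10],
})
toy["final_suggest"] = final_suggest(toy)
print(toy["final_suggest"].tolist())  # ['3', '4']
```

`Series.where` stays aligned on the index and never leaves one dimension, so no intermediate columns need to be created and dropped.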
