
Pythonic way to split column values by multiple column conditions

We have online order data that has the total shipping charge at the order level, but our accountants need to split that total across several vendors, which are broken out at the line level.

Not every product has shipping, and some products have promotional free shipping. Both cases need to be accounted for when splitting shipping charges: don't allocate shipping to items that aren't shipped or that ship free.
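The rule above comes down to one boolean condition per line: a line shares in its order's shipping only if it ships and is not on free shipping. Here is a minimal sketch on the three rows of order 11 from the sample data further down. The name eligible is illustrative and does not come from the original code.

```python
import pandas as pd

# The three lines of order 11 from the question's sample data.
df = pd.DataFrame({'id': [11, 11, 11],
                   'shipping': [5, 5, 5],
                   'shipstatus': [True, True, False],
                   'freeship': [False, True, False]})

# Illustrative name: a line is eligible only if it ships AND is not free-shipping.
eligible = df['shipstatus'] & ~df['freeship']
print(eligible.tolist())  # [True, False, False]
```

Only the first line qualifies, so all of order 11's shipping should land on that line.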

I've created this test and it produces the expected outcome (big props to another coworker for getting something working), but I want to know whether there is a more efficient (Pythonic!) way to do this.

This was previously done via an ODBC connection to a SQL database, with the split handled by an Excel formula.

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                   True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                   False, False, False, True, False, False]})

df['a'] = df.groupby(['id','shipstatus','freeship'])['shipping'].transform('count')
# the final step of the excel code is counting (grouping) by id and shipstatus, 
# so we group those here. we also group by freeship so that the count of id/shipstatus
# won't be included when freeship is true (which we zero out later)

df['b'] = df['a'] * (df['freeship']==False) 
# if freeship is true, second piece evaluates to false, whole thing evaluates to zero

df['c'] = df['shipping']/df['b'] 
# this gives inf where 'b' was zeroed out above (shipping / 0),
# and NaN where 'shipping' is also zero (0 / 0)

df['LineShipping'] = df['shipstatus'] * (df['freeship']==False) * df['c']
# sets the whole thing to zero if freeship is true or shipstatus is false,
# otherwise multiplies our previous result by 1, so there is no change

df = df.fillna(0) 
# sets all the NaN to zero

df = df.drop(columns=['a','b','c']) 
# saves the dataframe but with the temp columns dropped

print(df)
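Because the goal is to split each order's shipping across its lines, you can check any version of this calculation by confirming that each order's LineShipping values add back up to its shipping. Below is a minimal sketch that reruns the question's steps, keeping the temp columns as local Series, and then reconciles per order. It assumes, as in the sample data, that 'shipping' repeats the order-level total on every line. check_allocation is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# The question's steps, with temp columns kept as local Series instead of df columns.
a = df.groupby(['id', 'shipstatus', 'freeship'])['shipping'].transform('count')
b = a * (df['freeship'] == False)
c = df['shipping'] / b
df['LineShipping'] = (df['shipstatus'] * (df['freeship'] == False) * c).fillna(0)


def check_allocation(frame):
    """Hypothetical helper: return the orders whose line allocations do not sum
    to the order's shipping. Assumes 'shipping' repeats the order total per line."""
    totals = frame.groupby('id').agg(shipping=('shipping', 'first'),
                                     allocated=('LineShipping', 'sum'))
    return totals[~np.isclose(totals['shipping'], totals['allocated'])]


print(check_allocation(df))  # an empty DataFrame means every order reconciles
```

An order that charges shipping but has no eligible lines (every line free or unshipped) would show up in this check, because nothing gets allocated to it.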

Expected output

So basically you want to distribute the shipping within an id evenly among the rows with shipstatus==True and freeship==False. When shipstatus==False or freeship==True, LineShipping is always 0.

Therefore, you can count and divide only where your condition holds. That way, you never divide by zero:

counts = (df[df['shipstatus'] & ~df['freeship']]      # only count when shipstatus == True and freeship == False
             .groupby(['id'])                         # no need to groupby shipstatus
             ['shipping'].transform('size')           # size or count
          )

# only divide where `shipstatus==True` and `freeship==False`, else is `NaN`
# then fillna with 0
df['LineShipping'] = df['shipping'].div(counts).fillna(0)

Output:

   id  shipping  shipstatus  freeship  LineShipping
0  10         5        True     False           5.0
1  11         5        True      True           0.0
2  11         5        True     False           5.0
3  11         5       False     False           0.0
4  12         5        True     False           2.5
5  12         5        True     False           2.5
6  13         0       False     False           0.0
7  14         0        True      True           0.0
8  15         5        True     False           2.5
9  15         5        True     False           2.5
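The answer filters to the eligible rows and relies on index alignment in div to leave NaN on every other row. A variation that stays on the full frame counts eligible lines per order with a grouped sum of the boolean mask, then divides only on eligible rows. This is a sketch of that swapped-in variant, not the answer's code. The names eligible and n_eligible are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# Illustrative names: eligible lines ship and are not on free shipping.
eligible = df['shipstatus'] & ~df['freeship']

# Count eligible lines per order and broadcast the count back to every row.
n_eligible = eligible.astype(int).groupby(df['id']).transform('sum')

# Mask the divisor to NaN on ineligible rows, so no division by zero happens;
# those rows come out NaN and are then set to 0.
df['LineShipping'] = df['shipping'].div(n_eligible.where(eligible)).fillna(0)
print(df)
```

This should produce the same LineShipping column as the output above. Whether it reads better than filtering first is a matter of taste. Keeping the count on the full frame avoids depending on how div aligns a shorter Series.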
