
Pythonic way to split column values by multiple column conditions

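We have online order data that includes the total shipping cost at the order level, but our accountant needs that total split across multiple vendors and broken down to the line level.

Not every product carries a shipping charge: some ship free under a promotion, and this has to be taken into account when splitting the shipping (when an item did not ship or ships free, it should get no share of the shipping).

I created this test, which produces the expected result (big props to another coworker for doing the work), but I would like to know whether there is a more efficient (pythonic!) way to achieve it.

This was previously done through an ODBC connection to a SQL database and processed with Excel formulas.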

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                   True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                   False, False, False, True, False, False]})

df['a'] = df.groupby(['id','shipstatus','freeship'])['shipping'].transform('count')
# the final step of the excel code is counting (grouping) by id and shipstatus, 
# so we group those here. we also group by freeship so that the count of id/shipstatus
# won't be included when freeship is true (which we zero out later)

df['b'] = df['a'] * (df['freeship']==False) 
# if freeship is true, second piece evaluates to false, whole thing evaluates to zero

df['c'] = df['shipping']/df['b'] 
# this gives inf where 'b' was set to zero above and 'shipping' is non-zero,
# and NaN where both 'shipping' and 'b' are zero (0/0)

df['LineShipping'] = df['shipstatus'] * (df['freeship']==False) * df['c']
# evaluates to zero if shipstatus is false; where freeship is true, 'c' is inf or NaN,
# so the product is NaN (filled below). otherwise multiplies 'c' by 1, so no change

df = df.fillna(0) 
# sets all the NaN to zero

df = df.drop(columns=['a','b','c']) 
# saves the dataframe but with the temp columns dropped

print(df)

Expected output

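So basically you want to divide the shipping within an id evenly among the rows with shipstatus==True and freeship==False. For rows with shipstatus==False or freeship==True, LineShipping==0 always.

So you can count and divide only where your conditions hold. That way, you don't get the divide-by-zero warning: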

counts = (df[df['shipstatus'] & ~df['freeship']]      # only count when shipstatus == True and freeship == False
             .groupby(['id'])                         # no need to groupby shipstatus
             ['shipping'].transform('size')           # size or count
          )

# only divide where `shipstatus==True` and `freeship==False`, else is `NaN`
# then fillna with 0
df['LineShipping'] = df['shipping'].div(counts).fillna(0)

Output:

   id  shipping  shipstatus  freeship  LineShipping
0  10         5        True     False           5.0
1  11         5        True      True           0.0
2  11         5        True     False           5.0
3  11         5       False     False           0.0
4  12         5        True     False           2.5
5  12         5        True     False           2.5
6  13         0       False     False           0.0
7  14         0        True      True           0.0
8  15         5        True     False           2.5
9  15         5        True     False           2.5
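
The same idea can also be packaged as a small helper, which may be convenient if the split has to be run on more than one extract. This is only a sketch under the assumptions of the sample data (the `shipping` column repeats the order-level total on every line of the order); the names `split_shipping`, `eligible` and `n_eligible` are made up for illustration and do not come from the original posts. Instead of filtering first, it counts eligible lines per id by summing a boolean mask, then blanks out ineligible rows before dividing:

```python
import pandas as pd


def split_shipping(df: pd.DataFrame) -> pd.Series:
    """Spread each order's shipping evenly over its shipped, non-free lines.

    Hypothetical helper: assumes 'shipping' holds the order-level total
    repeated on every line of the order, as in the sample data.
    """
    # A line is eligible only if it shipped and is not on free shipping.
    eligible = df['shipstatus'] & ~df['freeship']

    # Summing a boolean Series counts its True values, so this is the number
    # of eligible lines per order, aligned to every row of that order.
    n_eligible = eligible.groupby(df['id']).transform('sum')

    # where() turns ineligible rows into NaN before the division, so no inf
    # is produced; fillna then sets those rows to zero.
    return df['shipping'].where(eligible).div(n_eligible).fillna(0.0)


df = pd.DataFrame({
    'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
    'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
    'shipstatus': [True, True, True, False, True, True, False, True, True, True],
    'freeship': [False, True, False, False, False, False, False, True, False, False],
})

df['LineShipping'] = split_shipping(df)

# Sanity check: the line-level amounts of each order add up to its total
# (or to zero when the order has no eligible line).
order_totals = df.groupby('id')['LineShipping'].sum()
print(df)
print(order_totals)
```

On the sample frame this should give the same LineShipping column as the output above. Because the mask is kept on the full index, the result is already aligned with df and no reindexing step is involved.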
