Pythonic way to split column values by multiple column conditions
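We have online order data with the total shipping cost stored at the order level. Our accountant needs that total split across multiple vendors and broken down per line item.
Not every product ships, and some products have promotional free shipping. Both cases must be handled when splitting the shipping cost: an item that does not ship, or that ships free, should not receive any share of the shipping.
I put together this test, which produces the expected result (big props to a colleague who did the work). I'd like to know whether there is a more efficient (pythonic!) way to do it.
This used to be done through an ODBC connection to a SQL database and then processed with Excel formulas.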
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# The final step of the Excel code counts (groups) by id and shipstatus, so we
# group by those here. We also group by freeship so that the id/shipstatus count
# does not include rows where freeship is True (those are zeroed out later).
df['a'] = df.groupby(['id', 'shipstatus', 'freeship'])['shipping'].transform('count')

# If freeship is True, the second factor is False, so the whole product is zero.
df['b'] = df['a'] * (df['freeship'] == False)

# This gives inf where b was set to zero above, and NaN where shipping is
# also zero (0/0).
df['c'] = df['shipping'] / df['b']

# Zero everything where freeship is True or shipstatus is False; otherwise the
# previous result is multiplied by 1 and stays unchanged.
df['LineShipping'] = df['shipstatus'] * (df['freeship'] == False) * df['c']

# Set all NaN to zero (0 * inf in the previous step also produces NaN)
df = df.fillna(0)

# Keep the DataFrame but drop the temporary columns
df = df.drop(columns=['a', 'b', 'c'])
print(df)
```
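To see where the inf and NaN values described in the comments come from, here is a minimal sketch that recomputes the temporary columns `a`, `b` and `c` on the same sample data and keeps them instead of dropping them. It uses `~df['freeship']` in place of `df['freeship'] == False`, which is equivalent for a boolean column.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# Recompute the question's temporary columns as standalone Series
a = df.groupby(['id', 'shipstatus', 'freeship'])['shipping'].transform('count')
b = a * ~df['freeship']     # zero on free-shipping rows
c = df['shipping'] / b      # 5 / 0 -> inf, 0 / 0 -> NaN

print(pd.DataFrame({'a': a, 'b': b, 'c': c}))
```

Row 1 (free shipping) ends up with `b == 0`, so `c` is inf; row 7 has both `shipping == 0` and `b == 0`, so `c` is NaN. Row 6 has `shipping == 0` but `b == 1`, so `c` is simply 0.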
So basically you want to split `shipping` evenly within an `id` among the rows that have `shipstatus==True` and `freeship==False`. When `shipstatus==False` or `freeship==True`, `LineShipping==0` always.
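The same rule can also be expressed without filtering the DataFrame first: build the eligibility mask once, count eligible rows per `id` with a grouped `transform`, and choose between the share and 0 with `np.where`. This is a sketch of an alternative formulation, not the answer's code; the names `eligible` and `n_eligible` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# Rows that should receive a share of the order's shipping
eligible = df['shipstatus'] & ~df['freeship']

# Number of eligible rows per id, broadcast back to every row of that id
n_eligible = eligible.astype(int).groupby(df['id']).transform('sum')

# np.where evaluates both branches, so mask ids with no eligible rows (NaN)
# to avoid computing x / 0; those rows take the 0.0 branch anyway.
df['LineShipping'] = np.where(
    eligible,
    df['shipping'] / n_eligible.where(n_eligible > 0),
    0.0,
)
print(df)
```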
So you can count and divide only on the rows that meet your conditions. This way you never divide by zero:
```python
counts = (df[df['shipstatus'] & ~df['freeship']]  # only count when shipstatus == True and freeship == False
            .groupby(['id'])                         # no need to groupby shipstatus
            ['shipping'].transform('size')           # size or count
         )

# only divide where `shipstatus==True` and `freeship==False`, else is `NaN`
# then fillna with 0
df['LineShipping'] = df['shipping'].div(counts).fillna(0)
```
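As a sanity check, the answer's approach can be run end to end on the question's sample data and compared with the question's multi-step result. This minimal sketch uses `transform('count')`, which the answer's comment lists as an alternative to `'size'`; the two agree here because `shipping` has no missing values. The variable names `answer` and `question` are just labels for the two results.

```python
import pandas as pd

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# Answer's approach: count eligible rows per id, divide, and zero the rest.
# Division aligns on the index, so rows missing from `counts` become NaN.
mask = df['shipstatus'] & ~df['freeship']
counts = df[mask].groupby('id')['shipping'].transform('count')
answer = df['shipping'].div(counts).fillna(0)

# Question's approach, condensed (temporary columns kept as local Series)
a = df.groupby(['id', 'shipstatus', 'freeship'])['shipping'].transform('count')
c = df['shipping'] / (a * ~df['freeship'])
question = ((df['shipstatus'] & ~df['freeship']) * c).fillna(0)

# Both methods should agree row by row on this sample data
pd.testing.assert_series_equal(answer, question, check_names=False)
print(answer.tolist())
```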
Output:

```
   id  shipping  shipstatus  freeship  LineShipping
0  10         5        True     False           5.0
1  11         5        True      True           0.0
2  11         5        True     False           5.0
3  11         5       False     False           0.0
4  12         5        True     False           2.5
5  12         5        True     False           2.5
6  13         0       False     False           0.0
7  14         0        True      True           0.0
8  15         5        True     False           2.5
9  15         5        True     False           2.5
```
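Because this feeds into accounting, it may be worth reconciling the allocation afterwards: for every `id` with at least one eligible row, the `LineShipping` values should add back up to the order-level `shipping`. The sketch below assumes, as in the sample data, that `shipping` repeats the same order total on every row of an `id`. It also lists orders that carry shipping but have no eligible row, because their shipping would not be allocated to any line. The names `bad_orders` and `unallocated` are illustrative.

```python
import pandas as pd

# The output above, rebuilt so the check is self-contained
df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False],
                   'LineShipping': [5.0, 0.0, 5.0, 0.0, 2.5, 2.5,
                                    0.0, 0.0, 2.5, 2.5]})

eligible = df['shipstatus'] & ~df['freeship']
has_eligible = eligible.groupby(df['id']).any()          # per id
allocated = df.groupby('id')['LineShipping'].sum()       # per id
order_total = df.groupby('id')['shipping'].first()       # assumes a repeated order total

# ids whose line shares do not add back up to the order total
mismatch = (allocated - order_total).abs() > 1e-9
bad_orders = mismatch[has_eligible]
bad_orders = bad_orders[bad_orders]

# ids with shipping to allocate but no eligible line to receive it
unallocated = order_total[~has_eligible & (order_total > 0)]

print(bad_orders.index.tolist(), unallocated.index.tolist())
```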