A Pythonic way to split a column value based on conditions in multiple columns

Question

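We have online order data that includes a total shipping charge at the order level, but our accountants need that total split across multiple vendors and broken down at the line level.

Not every product incurs shipping, and some products ship free under a promotion. Both cases have to be accounted for when splitting the charge: a line that does not ship, or that ships free, should get no share of the shipping.

I created this test, and it produces the expected result (big props to a coworker for working through it), but I would like to know whether there is a more efficient (Pythonic!) way to do this.

This was previously done by connecting to a SQL database over ODBC and processing the data with Excel formulas.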

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                   True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                   False, False, False, True, False, False]})

df['a'] = df.groupby(['id','shipstatus','freeship'])['shipping'].transform('count')
# the final step of the excel code is counting (grouping) by id and shipstatus, 
# so we group those here. we also group by freeship so that the count of id/shipstatus
# won't be included when freeship is true (which we zero out later)

df['b'] = df['a'] * (df['freeship']==False) 
# if freeship is true, second piece evaluates to false, whole thing evaluates to zero

df['c'] = df['shipping']/df['b'] 
# this gives inf where 'b' was set to zero above and 'shipping' is nonzero,
# and NaN where both 'shipping' and 'b' are zero (0/0)

df['LineShipping'] = df['shipstatus'] * (df['freeship']==False) * df['c']
# sets the whole thing to zero if freeship is true or shipstatus is false
# (0 * inf comes out as NaN, which fillna handles below);
# otherwise multiplies our previous result by 1, so no change

df = df.fillna(0) 
# sets all the NaN to zero

df = df.drop(columns=['a','b','c']) 
# saves the dataframe but with the temp columns dropped

print(df)
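
The comments above rely on how pandas treats division by zero and multiplication by zero. A minimal sketch of that behaviour, using made-up values rather than the order data (the names shipping, divisor and keep are only for illustration):

```python
import numpy as np
import pandas as pd

shipping = pd.Series([5.0, 5.0, 0.0])
divisor = pd.Series([2, 0, 0])            # 0 stands in for a zeroed-out count
keep = pd.Series([True, False, False])    # stands in for shipstatus & not freeship

ratio = shipping / divisor    # 5/2 -> 2.5, 5/0 -> inf, 0/0 -> NaN
line = keep * ratio           # False * inf is NaN, not 0
result = line.fillna(0)       # so fillna(0) is what actually zeroes those rows

print(result.tolist())        # [2.5, 0.0, 0.0]
```

This is why the final fillna(0) step is needed: multiplying by False alone does not turn inf into zero.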

Answer 1

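So basically you want to split the shipping within an id evenly across the rows that have shipstatus==True and freeship==False. Whenever shipstatus==False or freeship==True, LineShipping is always 0.

So you can count and divide only on the rows that meet that condition. That way you never get a division-by-zero warning: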
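
As a rough illustration of that rule, the sketch below builds the eligibility mask on the question's sample data and counts the eligible rows per id. The names eligible and eligible_per_id are chosen here for illustration and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

# a row takes a share of the shipping only if it ships and is not free
eligible = df['shipstatus'] & ~df['freeship']

# number of eligible rows in each order: the divisor for that order's shipping
eligible_per_id = eligible.groupby(df['id']).sum()

print(eligible_per_id.to_dict())
# {10: 1, 11: 1, 12: 2, 13: 0, 14: 0, 15: 2}
```

Orders 13 and 14 have no eligible rows, so nothing is divided for them.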

counts = (df[df['shipstatus'] & ~df['freeship']]      # only count when shipstatus == True and freeship == False
             .groupby(['id'])                         # no need to groupby shipstatus
             ['shipping'].transform('size')           # size or count
          )

# only divide where `shipstatus==True` and `freeship==False`, else is `NaN`
# then fillna with 0
df['LineShipping'] = df['shipping'].div(counts).fillna(0)

Output:

   id  shipping  shipstatus  freeship  LineShipping
0  10         5        True     False           5.0
1  11         5        True      True           0.0
2  11         5        True     False           5.0
3  11         5       False     False           0.0
4  12         5        True     False           2.5
5  12         5        True     False           2.5
6  13         0       False     False           0.0
7  14         0        True      True           0.0
8  15         5        True     False           2.5
9  15         5        True     False           2.5
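
The result depends on index alignment: counts only carries the index labels of the eligible rows, so div leaves NaN everywhere else. A small sketch that makes this visible and checks the per-order totals (the names mask, missing and per_order are introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'id': [10, 11, 11, 11, 12, 12, 13, 14, 15, 15],
                   'shipping': [5, 5, 5, 5, 5, 5, 0, 0, 5, 5],
                   'shipstatus': [True, True, True, False,
                                  True, True, False, True, True, True],
                   'freeship': [False, True, False, False,
                                False, False, False, True, False, False]})

mask = df['shipstatus'] & ~df['freeship']
counts = df[mask].groupby('id')['shipping'].transform('size')

# rows that were filtered out have no entry in counts
missing = df.index.difference(counts.index)
print(list(missing))            # [1, 3, 6, 7]

# div aligns on the index: missing labels become NaN, then 0
df['LineShipping'] = df['shipping'].div(counts).fillna(0)

# sanity check: each order's line amounts add back up to its shipping charge
per_order = df.groupby('id')['LineShipping'].sum()
print(per_order.to_dict())
# {10: 5.0, 11: 5.0, 12: 5.0, 13: 0.0, 14: 0.0, 15: 5.0}
```

Orders 13 and 14 total zero because their order-level shipping is already 0 in the sample data.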

Solution 1
3 2021-11-05 02:23:03
