How to tell a lambda function in Python to return None instead of raising a KeyError

Question

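I have three pandas DataFrames that describe different aspects of a transaction. I am merging them to "complete the transaction." Essentially, I need to add the correct SKU pricing and customer/SKU discount to the matching transactions in df_sales_volume so that each transaction carries its financial information.

When a combination of Customer and SKU exists in df_customer_discounts but not in df_sales_volume, how do I tell my script to return None instead of raising a KeyError?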

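The DataFrames:

df_sales_volume contains 3 columns: Customer, SKU, and Units Purchased. It details how many units of a given SKU a given customer purchased.

df_sku_prices contains 3 columns: SKU, List Price, and Markedup Price. It holds that month's SKU pricing, which I need to merge into df_sales_volume.

df_customer_discounts contains 3 columns: Customer, SKU, and Discount. This dataset contains every discount the business offers each customer, although not every customer/SKU combination will appear in df_sales_volume.

Here is the code to create the sample datasets: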

import pandas as pd

df_sales_volume = pd.DataFrame({
    "Customer": ["John's Fruit Shop"]*2 + ["Adam's Grocery's"]*3 + ["Lucy's Fresh Food"]*2,
    "SKU": ["Bannanas"] + ["Apples"] + ["Avocados"] + ["Purple Grapes"] + ["Dragon Fruit"]*2 + ["Mangos"],
    "Units Purchased": [4] + [2] + [13] + [5] + [70] + [34] + [8],
})

df_sku_prices = pd.DataFrame({
    "SKU": ["Avocados"] + ["Dragon Fruit"] + ["Grapes"] + ["Bannanas"] + ["Apples"],
    "List Price": [103.21] + [4.55] + [42.01] + [7.00] + [3.35], 
    "Markedup Price": [109.34] + [7.20] + [59.00] + [13.78] + [4.10]
}).set_index(["SKU"])


df_customer_discounts = pd.DataFrame({
    "Customer": ["John's Fruit Shop"]*4 + ["Adam's Grocery's"]*3 + ["Lucy's Fresh Food"]*3, 
    "SKU": ["Apples"] + ["Bannanas"] + ["Purple Grapes"] + ["Mandarins"] + ["Avocados"] + ["Purple Grapes"] + ["Dragon Fruit"] + ["Avocados"] + ["Dragon Fruit"] + ["Mangos"],
    "Discount": [0.05] + [0.35] + [0.22] + [0.15] + [0.50] + [0.40] + [0.10] + [0.75] + [0.01] + [0.24]
}).set_index(["SKU", "Customer"])

Here is what I tried:

# Create a copy of the original volume file to work with
df_monthly_sales_report = df_sales_volume.copy()

# Look up list price by SKU
df_monthly_sales_report["SKU List Price"] = df_monthly_sales_report.apply(
    lambda row: df_sku_prices.loc[row["SKU"], "List Price"],
    axis=1,
)

# Look up marked-up price by SKU
df_monthly_sales_report["SKU Markedup Price"] = df_monthly_sales_report.apply(
    lambda row: df_sku_prices.loc[row["SKU"], "Markedup Price"],
    axis=1,
)

# Look up discounts by customer and SKU
df_monthly_sales_report["Customer Discount"] = df_monthly_sales_report.apply(
    lambda row: df_customer_discounts.loc[(row["SKU"], row["Customer"]), "Discount"],
    axis=1,
)

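However, this raises KeyError: 'Purple Grapes'.

The desired result is a DataFrame in which:

a new SKU list price column assigns the correct SKU list price to each transaction
a new SKU marked-up price column assigns the correct SKU marked-up price to each transaction
a new discount column assigns the correct discount to each customer/SKU combination

Notes on the dataset:

The real-life dataset is much larger.
df_customer_discounts usually contains combinations that do not exist in the df_sales_volume dataset. In other words, some customer discounts are never activated because the customer did not buy that product. I believe this is what causes the KeyError.

I am open to approaches that do not involve a lambda, but I am new to Python, so my knowledge is limited. This is a script that I will eventually share with colleagues, and it will be rerun many times.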

Answer 1

I think you are looking for pandas.merge.

Instead of applying a lookup function to every row, you can simply perform SQL-style left joins:

merged = (
    df_sales_volume.merge(
        df_sku_prices, 
        left_on="SKU", 
        right_index=True, 
        how="left"
    ).merge(
        df_customer_discounts, 
        left_on=["SKU", "Customer"], 
        right_index=True, 
        how="left"
    )
)

Result:

            Customer            SKU  Units Purchased  List Price  Markedup Price  Discount
0  John's Fruit Shop       Bannanas                4        7.00           13.78      0.35   
1  John's Fruit Shop         Apples                2        3.35            4.10      0.05   
2   Adam's Grocery's       Avocados               13      103.21          109.34      0.50   
3   Adam's Grocery's  Purple Grapes                5         NaN             NaN      0.40   
4   Adam's Grocery's   Dragon Fruit               70        4.55            7.20      0.10   
5  Lucy's Fresh Food   Dragon Fruit               34        4.55            7.20      0.01   
6  Lucy's Fresh Food         Mangos                8         NaN             NaN      0.24
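Since this script will be rerun many times, it may help to make the joins slightly more defensive. The sketch below is not part of the answer above; it assumes that each SKU should appear only once in the price table and uses the `validate` argument of `merge` to fail loudly if that is not the case. The name `unpriced` is only illustrative.

```python
import pandas as pd

df_sales_volume = pd.DataFrame({
    "Customer": ["John's Fruit Shop"] * 2 + ["Adam's Grocery's"] * 3 + ["Lucy's Fresh Food"] * 2,
    "SKU": ["Bannanas", "Apples", "Avocados", "Purple Grapes",
            "Dragon Fruit", "Dragon Fruit", "Mangos"],
    "Units Purchased": [4, 2, 13, 5, 70, 34, 8],
})

df_sku_prices = pd.DataFrame({
    "SKU": ["Avocados", "Dragon Fruit", "Grapes", "Bannanas", "Apples"],
    "List Price": [103.21, 4.55, 42.01, 7.00, 3.35],
    "Markedup Price": [109.34, 7.20, 59.00, 13.78, 4.10],
}).set_index("SKU")

# validate="many_to_one" raises pandas.errors.MergeError if a SKU
# appears more than once in the price table (assumption: it should not).
merged = df_sales_volume.merge(
    df_sku_prices,
    left_on="SKU",
    right_index=True,
    how="left",
    validate="many_to_one",
)

# Rows that found no price are left as NaN by the left join;
# collect them so they can be reviewed instead of silently ignored.
unpriced = merged[merged["List Price"].isna()]
print(unpriced[["Customer", "SKU"]])
```

On the sample data this lists the "Purple Grapes" and "Mangos" transactions, which match rows 3 and 6 of the result table.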

Answer 2

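It is better to handle the error explicitly in your code, for example in a small helper function (the name lookup_discount is only illustrative):

def lookup_discount(row):
    try:
        # The code where you think the error occurs.
        return df_customer_discounts.loc[(row["SKU"], row["Customer"]), "Discount"]
    except KeyError as e:
        print(e)
        return None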
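If the lookup has to stay inside a lambda, where a try/except statement cannot be written, one option is to convert the lookup table to a plain dictionary and use `dict.get`, which returns None for a missing key. This is a minimal sketch on the question's sample data, not the answerer's code; the names `list_prices` and `report` are only illustrative.

```python
import pandas as pd

df_sales_volume = pd.DataFrame({
    "Customer": ["John's Fruit Shop"] * 2 + ["Adam's Grocery's"] * 3 + ["Lucy's Fresh Food"] * 2,
    "SKU": ["Bannanas", "Apples", "Avocados", "Purple Grapes",
            "Dragon Fruit", "Dragon Fruit", "Mangos"],
    "Units Purchased": [4, 2, 13, 5, 70, 34, 8],
})

df_sku_prices = pd.DataFrame({
    "SKU": ["Avocados", "Dragon Fruit", "Grapes", "Bannanas", "Apples"],
    "List Price": [103.21, 4.55, 42.01, 7.00, 3.35],
    "Markedup Price": [109.34, 7.20, 59.00, 13.78, 4.10],
}).set_index("SKU")

# Plain dict keyed by SKU; dict.get returns None when the key is absent.
list_prices = df_sku_prices["List Price"].to_dict()

report = df_sales_volume.copy()
report["SKU List Price"] = report.apply(
    lambda row: list_prices.get(row["SKU"]),
    axis=1,
)
print(report)
```

The lambda itself returns None for "Purple Grapes" and "Mangos", although pandas may display those missing values as NaN once they are stored in the column. Either way, `isna()` identifies them.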

Answer 3

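When a combination of Customer and SKU exists in df_customer_discounts but not in df_sales_volume, how do I tell my script to return None instead of raising a KeyError?

Focusing on just this question for now: it may not cover your full implementation, but we may need to understand the problem better before we can give a better answer.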
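One way to understand the problem better is to check which keys are actually missing before doing any lookups. The following is a rough diagnostic sketch using the question's sample data; the variable names (`missing_price_skus`, `missing_discount_pairs`, `unused_discount_pairs`) are made up for illustration.

```python
import pandas as pd

df_sales_volume = pd.DataFrame({
    "Customer": ["John's Fruit Shop"] * 2 + ["Adam's Grocery's"] * 3 + ["Lucy's Fresh Food"] * 2,
    "SKU": ["Bannanas", "Apples", "Avocados", "Purple Grapes",
            "Dragon Fruit", "Dragon Fruit", "Mangos"],
    "Units Purchased": [4, 2, 13, 5, 70, 34, 8],
})

df_sku_prices = pd.DataFrame({
    "SKU": ["Avocados", "Dragon Fruit", "Grapes", "Bannanas", "Apples"],
    "List Price": [103.21, 4.55, 42.01, 7.00, 3.35],
    "Markedup Price": [109.34, 7.20, 59.00, 13.78, 4.10],
}).set_index("SKU")

df_customer_discounts = pd.DataFrame({
    "Customer": ["John's Fruit Shop"] * 4 + ["Adam's Grocery's"] * 3 + ["Lucy's Fresh Food"] * 3,
    "SKU": ["Apples", "Bannanas", "Purple Grapes", "Mandarins", "Avocados",
            "Purple Grapes", "Dragon Fruit", "Avocados", "Dragon Fruit", "Mangos"],
    "Discount": [0.05, 0.35, 0.22, 0.15, 0.50, 0.40, 0.10, 0.75, 0.01, 0.24],
}).set_index(["SKU", "Customer"])

# SKUs that were sold but have no row in the price table.
missing_price_skus = df_sales_volume.loc[
    ~df_sales_volume["SKU"].isin(df_sku_prices.index), "SKU"
].unique().tolist()

# (SKU, Customer) pairs on each side, compared as plain Python sets.
sold_pairs = set(zip(df_sales_volume["SKU"], df_sales_volume["Customer"]))
discount_pairs = set(df_customer_discounts.index)

missing_discount_pairs = sorted(sold_pairs - discount_pairs)  # sold, no discount
unused_discount_pairs = sorted(discount_pairs - sold_pairs)   # discount, never sold

print("Sold SKUs without a price:", missing_price_skus)
print("Sold pairs without a discount:", missing_discount_pairs)
print("Discounts never used:", unused_discount_pairs)
```

On the sample data, every sold Customer/SKU pair has a discount, while "Purple Grapes" and "Mangos" have no entry in df_sku_prices (the price table lists "Grapes", not "Purple Grapes"). That suggests the reported KeyError: 'Purple Grapes' comes from the price lookup. Unused discounts alone do not raise an error, because the lookups only iterate over the sales rows.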

To begin with, I would avoid a MultiIndex on the customer_discounts DataFrame:

df_customer_discounts_cols = pd.DataFrame({
    "Customer": ["John's Fruit Shop"]*4 + ["Adam's Grocery's"]*3 + ["Lucy's Fresh Food"]*3,
    "SKU": ["Apples"] + ["Bannanas"] + ["Purple Grapes"] + ["Mandarins"] + ["Avocados"] + ["Purple Grapes"] + ["Dragon Fruit"] + ["Avocados"] + ["Dragon Fruit"] + ["Mangos"],
    "Discount": [0.05] + [0.35] + [0.22] + [0.15] + [0.50] + [0.40] + [0.10] + [0.75] + [0.01] + [0.24],
})

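We can then use the DataFrame.query() method to find cases where a combination exists in one DataFrame but not the other. A query should be faster than a row-wise apply. We could probably condense this into fewer lines, but to be explicit: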

fruit = "Purple Grapes"
customer = "John's Fruit Shop"

cd_query = df_customer_discounts_cols.query("SKU == @fruit and Customer == @customer")

sv_query = df_sales_volume.query("SKU == @fruit and Customer == @customer")

result = sv_query
# Return None when the discount exists but the customer never bought the SKU
if not cd_query.empty and sv_query.empty:
    result = None


print(type(result))
print(result)
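The check above handles one Customer/SKU pair at a time. To list every discount that has no matching sale in one pass, a left merge with `indicator=True` can act as an anti-join. This is a sketch under the same flat-column layout; the names `flagged` and `unused` are only illustrative.

```python
import pandas as pd

df_sales_volume = pd.DataFrame({
    "Customer": ["John's Fruit Shop"] * 2 + ["Adam's Grocery's"] * 3 + ["Lucy's Fresh Food"] * 2,
    "SKU": ["Bannanas", "Apples", "Avocados", "Purple Grapes",
            "Dragon Fruit", "Dragon Fruit", "Mangos"],
    "Units Purchased": [4, 2, 13, 5, 70, 34, 8],
})

df_customer_discounts_cols = pd.DataFrame({
    "Customer": ["John's Fruit Shop"] * 4 + ["Adam's Grocery's"] * 3 + ["Lucy's Fresh Food"] * 3,
    "SKU": ["Apples", "Bannanas", "Purple Grapes", "Mandarins", "Avocados",
            "Purple Grapes", "Dragon Fruit", "Avocados", "Dragon Fruit", "Mangos"],
    "Discount": [0.05, 0.35, 0.22, 0.15, 0.50, 0.40, 0.10, 0.75, 0.01, 0.24],
})

# indicator=True adds a "_merge" column whose value is "both" when the
# discount has a matching sale and "left_only" when it does not.
flagged = df_customer_discounts_cols.merge(
    df_sales_volume[["Customer", "SKU"]].drop_duplicates(),
    on=["Customer", "SKU"],
    how="left",
    indicator=True,
)

# Keep only the discounts that never matched a sale.
unused = flagged.loc[
    flagged["_merge"] == "left_only", ["Customer", "SKU", "Discount"]
]
print(unused)
```

On the sample data this leaves three discounts without a sale: John's Fruit Shop for "Purple Grapes" and "Mandarins", and Lucy's Fresh Food for "Avocados".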
