
GroupBy a custom lambda function (using strings) in pandas

I have the following DataFrame:

  Sku  Availability
0   1  out of stock
1   1      in stock
2   1      in stock
3   2  out of stock

How can I use a custom aggregate function to create the following DataFrame?

  Sku  Availability
0   1      in stock
2   2  out of stock

(Basically, if a SKU is in stock at any store, its out-of-stock rows should be dropped. I have duplicate SKUs because each row refers to a different store.)

MCVE:

import pandas as pd

d = {'Sku': ['1', '1', '1', '2'], 'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)
# df = df.groupby('Sku').apply(lambda x: ...)
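The rule in the parenthetical above can also be written directly as a boolean filter, without any custom aggregate. This is a minimal sketch that is not part of the original post; it assumes "in stock" is the only status that counts as available. groupby(...).transform('any') broadcasts "does any store have this SKU in stock?" back onto every row, so the filter keeps the in-stock rows plus all rows of SKUs that are out of stock everywhere.

```python
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'],
     'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)

# True for rows whose status is exactly 'in stock'
in_stock = df['Availability'].eq('in stock')

# For every row, whether *any* row with the same Sku is in stock
sku_has_stock = in_stock.groupby(df['Sku']).transform('any')

# Keep in-stock rows plus rows of SKUs that are out of stock everywhere,
# then collapse the remaining per-store duplicates to one row per Sku
out = df[in_stock | ~sku_has_stock].drop_duplicates('Sku', ignore_index=True)
print(out)
```

Because it filters rows instead of aggregating, any other per-store columns are kept for the surviving row.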

You can use sort_values to sort your data lexicographically by Sku and Availability ("in stock" comes before "out of stock" alphabetically), then drop_duplicates to keep the first row for each Sku:

out = df.sort_values(['Sku', 'Availability']) \
        .drop_duplicates('Sku', ignore_index=True)
print(out)

# Output:
  Sku  Availability
0   1      in stock
1   2  out of stock
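If you specifically want the groupby-plus-lambda form the question asks about, a sketch using agg (rather than apply) is shown below. It assumes "in stock" and "out of stock" are the only two statuses; the lambda reduces each SKU's statuses to a single string.

```python
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'],
     'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)

# Per Sku: 'in stock' if any store has it, otherwise 'out of stock'
out = (df.groupby('Sku')['Availability']
         .agg(lambda s: 'in stock' if s.eq('in stock').any() else 'out of stock')
         .reset_index())
print(out)
```

Unlike the sort_values/drop_duplicates approach, this returns only the Sku and Availability columns, so any other per-store columns would be dropped.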

A more robust way is to use CategoricalDtype, so the order of the statuses is stated explicitly instead of relying on alphabetical order:

# Explicit is better than implicit
cat = pd.CategoricalDtype(['in stock', 'out of stock'], ordered=True)
out = df.astype({'Availability': cat}).sort_values(['Sku', 'Availability']) \
        .drop_duplicates('Sku', ignore_index=True)
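A self-contained run of the CategoricalDtype version is sketched below. It also illustrates one caveat worth knowing: astype to a CategoricalDtype turns any value not listed in the categories into NaN. The misspelled "In Stock" value here is a hypothetical example, not from the original data.

```python
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'],
     'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)

# The category order, not the alphabet, decides which status sorts first
cat = pd.CategoricalDtype(['in stock', 'out of stock'], ordered=True)
out = (df.astype({'Availability': cat})
         .sort_values(['Sku', 'Availability'])
         .drop_duplicates('Sku', ignore_index=True))
print(out)

# Caveat: a status missing from the category list becomes NaN
# ('In Stock' is a hypothetical typo used only for illustration)
typo = pd.Series(['in stock', 'In Stock']).astype(cat)
print(typo.isna().tolist())
```

Since sort_values places NaN last by default, a SKU with an unrecognised status would still prefer any recognised "in stock" row, but it is worth checking for NaN after the conversion.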
