GroupBy a custom lambda function (using strings) in pandas
I have the following DataFrame:
Sku Availability
0 1 out of stock
1 1 in stock
2 1 in stock
3 2 out of stock
How can I use a custom aggregate function to create the following DataFrame:
Sku Availability
0 1 in stock
2 2 out of stock
(Basically, if a SKU is in stock, the out-of-stock rows for that SKU should be dropped. The same SKU appears several times because each row refers to a different store.)
MCVE:
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'], 'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)
# df = df.groupby('Sku').apply(lambda x: ...)
You can use sort_values to sort your data lexicographically by Availability, then drop_duplicates to keep the first row per Sku. This works because 'in stock' sorts before 'out of stock' alphabetically.
out = df.sort_values(['Sku', 'Availability']) \
.drop_duplicates('Sku', ignore_index=True)
print(out)
# Output:
Sku Availability
0 1 in stock
1 2 out of stock
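If you want the groupby form that the question asks about, here is a minimal sketch. It assumes the only two labels are 'in stock' and 'out of stock'; the helper name pick_availability is hypothetical and not part of pandas.

```python
import pandas as pd

d = {'Sku': ['1', '1', '1', '2'],
     'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)

def pick_availability(s):
    # Hypothetical helper: a SKU counts as in stock if any store has it in stock
    return 'in stock' if s.eq('in stock').any() else 'out of stock'

# Aggregate the Availability strings per SKU, then turn Sku back into a column
out = df.groupby('Sku')['Availability'].agg(pick_availability).reset_index()
print(out)
```

This makes the rule explicit instead of relying on sort order, at the cost of calling a Python function once per group.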
A more robust way is to use CategoricalDtype, which makes the ordering explicit instead of relying on alphabetical order:
# Explicit is better than implicit
cat = pd.CategoricalDtype(['in stock', 'out of stock'], ordered=True)
out = df.astype({'Availability': cat}).sort_values(['Sku', 'Availability']) \
.drop_duplicates('Sku', ignore_index=True)
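To see why the explicit ordering can matter, here is a small sketch with a third label. The 'backorder' label and the data are invented for illustration and do not come from the question. Alphabetically, 'backorder' would sort ahead of 'in stock', so a plain string sort would keep the wrong row for SKU 1.

```python
import pandas as pd

# Hypothetical data: 'backorder' is an invented third label for illustration
df = pd.DataFrame({
    'Sku': ['1', '1', '2', '2'],
    'Availability': ['in stock', 'backorder', 'out of stock', 'backorder'],
})

# Best status first; this order is an assumption about what "best" means
cat = pd.CategoricalDtype(['in stock', 'backorder', 'out of stock'], ordered=True)

out = (df.astype({'Availability': cat})
         .sort_values(['Sku', 'Availability'])
         .drop_duplicates('Sku', ignore_index=True))
print(out)
```

The Availability column in the result keeps the categorical dtype; use astype(str) on it if plain strings are needed downstream.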