
GroupBy a custom lambda function (using strings) in pandas

I have the following DataFrame:

  Sku  Availability
0   1  out of stock
1   1      in stock
2   1      in stock
3   2  out of stock

How can I use a custom aggregation function to create the following DataFrame:

  Sku  Availability
0   1      in stock
2   2  out of stock

(Basically, if a SKU is in stock anywhere, its out-of-stock rows should be dropped. The same SKU appears multiple times because each row refers to a different store...)

MVCE:

import pandas as pd

d = {'Sku': ['1', '1', '1', '2'], 'Availability': ['out of stock', 'in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data=d)
# df = df.groupby('Sku').apply(lambda x: ...) 

You can use sort_values to sort your data lexicographically by Availability, then drop_duplicates to keep the first row per Sku. This works here because 'in stock' happens to sort before 'out of stock' alphabetically:

out = df.sort_values(['Sku', 'Availability']) \
        .drop_duplicates('Sku', ignore_index=True)
print(out)

# Output:
  Sku  Availability
0   1      in stock
1   2  out of stock
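
Since the question explicitly asks for a groupby with a custom lambda, a minimal sketch of a direct equivalent (using the same column names as the MVCE above) could look like this:

# For each Sku, report 'in stock' if any store has it in stock,
# otherwise fall back to 'out of stock'.
out = (df.groupby('Sku')['Availability']
         .agg(lambda s: 'in stock' if (s == 'in stock').any() else 'out of stock')
         .reset_index())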

A more robust way, which does not rely on the lexicographic ordering of the labels, is to make the order explicit with a CategoricalDtype:

# Explicit is better than implicit
cat = pd.CategoricalDtype(['in stock', 'out of stock'], ordered=True)
out = df.astype({'Availability': cat}).sort_values(['Sku', 'Availability']) \
        .drop_duplicates('Sku', ignore_index=True)
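
With the ordered categorical in place, the same result can also be obtained with a plain min aggregation instead of sort/drop_duplicates; a minimal sketch, assuming the cat dtype defined above:

# Under the ordered categorical, 'in stock' < 'out of stock', so min()
# returns 'in stock' whenever at least one store has the SKU in stock.
out = (df.astype({'Availability': cat})
         .groupby('Sku')['Availability']
         .min()
         .reset_index())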
