How to show only columns with values in a pandas GroupBy
Hello data scientists and pandas experts,
I need some help because I can't get my data organized correctly. Here is my dataframe:
import pandas as pd
from pandas import Timestamp

df_dict = [
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp1', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp3', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp1', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp5', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp7', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp1', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp3', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp1', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp5', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp7', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp1', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp3', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp1', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp5', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp7', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'deli'},
]

# Build the DataFrame (the answers below refer to it as df)
df = pd.DataFrame(df_dict)
I want my output organized like this:
            store1          store2          store3
Week        emp1 emp2 emp3  emp1 emp4 emp5  emp2 emp6 emp7
2013-12-30     2    4    2     2    4    2     2    4    2
2014-01-06     1    1    1     1    1    1     2    1    1
So I tried the following groupby expression:
df_group = df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Store', 'employee'])\
['duties'].count().unstack(level=1).unstack(level=1).reset_index()
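However, it shows every employee under every store, instead of only the employees who actually work at that particular store. For example: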
            store1
Week        emp1 emp2 emp3 emp4 emp5 emp6 emp7
2013-12-30     2    4    2  NaN  NaN  NaN  NaN
2014-01-06     1    1    1  NaN  NaN  NaN  NaN
那么我怎樣才能得到我想要的結果。 基本上我想過濾掉不在該商店工作的員工。
為了這個需要使用 Groupby 更好還是我應該考慮其他方法?
預先感謝您的幫助和考慮。
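Why the NaN columns appear: unstack(level=1) first moves Store into the columns, leaving (Date, employee) as the index, so every employee who worked anywhere that week gets a row, with NaN under the stores they never work at. The second unstack then pivots employee under every store, giving all 3 x 7 store/employee combinations. A minimal sketch of keeping the original chained approach and simply dropping the columns that are NaN in every week; the shifts and dates lists are only a hypothetical shorthand that rebuilds the same 36 rows as df_dict, and the trailing reset_index() is left out so the columns are easier to inspect.

```python
import pandas as pd

# Hypothetical shorthand for the question's data: the same 12 shifts on each date.
shifts = [
    ("store1", "emp1", "opening"), ("store1", "emp2", "deli"),
    ("store1", "emp3", "cashier"), ("store1", "emp2", "closing"),
    ("store2", "emp1", "closing"), ("store2", "emp4", "opening"),
    ("store2", "emp4", "cashier"), ("store2", "emp5", "deli"),
    ("store3", "emp2", "closing"), ("store3", "emp6", "opening"),
    ("store3", "emp7", "cashier"), ("store3", "emp6", "deli"),
]
dates = ["2014-01-03", "2014-01-04", "2014-01-10"]
df = pd.DataFrame(
    [(pd.Timestamp(d), store, emp, duty) for d in dates for store, emp, duty in shifts],
    columns=["Date", "Store", "employee", "duties"],
)

counts = df.groupby([pd.Grouper(key="Date", freq="W-MON"), "Store", "employee"])["duties"].count()

# First unstack: Store -> columns; the index becomes (Date, employee),
# and employees get NaN under the stores they never work at.
step1 = counts.unstack(level=1)
# Second unstack: employee -> columns, crossed with every Store (3 x 7 = 21 columns).
chained = step1.unstack(level=1)

# Drop (Store, employee) columns that are NaN in every week, i.e. pairs that never
# occur in the data. Any NaN left means "no shifts that week", so fill with 0;
# the NaNs had forced a float dtype, so cast back to int.
fixed = chained.dropna(axis=1, how="all").fillna(0).astype(int)
print(fixed)
```

This works, but it first builds all 21 columns and then throws most of them away; unstacking both levels in a single call avoids creating them in the first place.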
Try unstacking multiple levels, [1, 2], at once:
df_out = (df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Store', 'employee'])['duties']
.count()
.unstack(level=[1, 2])
)
print(df_out)
Prints:
Store      store1           store2           store3
employee     emp1 emp2 emp3   emp1 emp4 emp5   emp2 emp6 emp7
Date
2014-01-06      2    4    2      2    4    2      2    4    2
2014-01-13      1    2    1      1    2    1      1    2    1
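With freq='W-MON' and the default settings, each week is labelled by the Monday that closes it (2014-01-06, 2014-01-13), while the desired output in the question labels weeks by their starting Monday (2013-12-30, 2014-01-06). A hedged sketch, assuming Monday-to-Sunday weeks are what the question intends: passing closed='left' and label='left' to pd.Grouper makes each bin run from one Monday up to (but not including) the next and names it after its first day. The shifts and dates lists are again just a hypothetical shorthand for df_dict.

```python
import pandas as pd

# Hypothetical shorthand for the question's data: the same 12 shifts on each date.
shifts = [
    ("store1", "emp1", "opening"), ("store1", "emp2", "deli"),
    ("store1", "emp3", "cashier"), ("store1", "emp2", "closing"),
    ("store2", "emp1", "closing"), ("store2", "emp4", "opening"),
    ("store2", "emp4", "cashier"), ("store2", "emp5", "deli"),
    ("store3", "emp2", "closing"), ("store3", "emp6", "opening"),
    ("store3", "emp7", "cashier"), ("store3", "emp6", "deli"),
]
dates = ["2014-01-03", "2014-01-04", "2014-01-10"]
df = pd.DataFrame(
    [(pd.Timestamp(d), store, emp, duty) for d in dates for store, emp, duty in shifts],
    columns=["Date", "Store", "employee", "duties"],
)

# closed="left": each weekly bin is [Monday, next Monday).
# label="left": each bin is named after the Monday it starts on.
week = pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")

weekly = (
    df.groupby([week, "Store", "employee"])["duties"]
    .count()
    .unstack(level=[1, 2])  # unstack Store and employee together: observed pairs only
)
print(weekly)
```

The counts are the same as in the answer above; only the week labels shift back by seven days.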
You can unstack both levels at the same time:
(df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Store','employee'])
.size().unstack(['Store','employee'])
)
Output:
Store      store1           store2           store3
employee     emp1 emp2 emp3   emp1 emp4 emp5   emp2 emp6 emp7
Date
2014-01-06      2    4    2      2    4    2      2    4    2
2014-01-13      1    2    1      1    2    1      1    2    1
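On the question of whether groupby is the right tool: DataFrame.pivot_table is one alternative. A sketch, assuming the same data (rebuilt here with the hypothetical shifts/dates shorthand). pivot_table accepts a pd.Grouper as its index, and like unstacking both levels together it only creates (Store, employee) columns for pairs that occur in the data; its default dropna=True also removes any column that is entirely NaN.

```python
import pandas as pd

# Hypothetical shorthand for the question's data: the same 12 shifts on each date.
shifts = [
    ("store1", "emp1", "opening"), ("store1", "emp2", "deli"),
    ("store1", "emp3", "cashier"), ("store1", "emp2", "closing"),
    ("store2", "emp1", "closing"), ("store2", "emp4", "opening"),
    ("store2", "emp4", "cashier"), ("store2", "emp5", "deli"),
    ("store3", "emp2", "closing"), ("store3", "emp6", "opening"),
    ("store3", "emp7", "cashier"), ("store3", "emp6", "deli"),
]
dates = ["2014-01-03", "2014-01-04", "2014-01-10"]
df = pd.DataFrame(
    [(pd.Timestamp(d), store, emp, duty) for d in dates for store, emp, duty in shifts],
    columns=["Date", "Store", "employee", "duties"],
)

# Rows: weeks ending on Monday; columns: observed (Store, employee) pairs;
# cells: number of non-null 'duties' entries in that week.
table = df.pivot_table(
    index=pd.Grouper(key="Date", freq="W-MON"),
    columns=["Store", "employee"],
    values="duties",
    aggfunc="count",
)
print(table)
```

Under the hood pivot_table does essentially the same groupby-then-unstack, so this is mostly a matter of readability. Note also that the second answer uses size(), which counts rows per group, whereas count() counts non-null duties values; the two only differ if duties contains missing values.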
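Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.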