Sorting in a Pandas pivot_table
I have been looking everywhere to figure out how to sort my pivot table correctly, but I have not had any luck.
client unit task hours month
0 A DVADA Account Management 6.50 January
1 A DVADA Buying 1.25 January
2 A DVADA Meeting / Call 0.50 January
3 A DVADA Account Management 3.00 January
4 A DVADA Billing 2.50 February
5 A DVADA Account Management 6.50 February
6 A DVADA Buying 1.25 February
7 A DVADA Meeting / Call 0.50 February
8 A DVADA Account Management 3.00 February
9 A DVADA Billing 2.50 February
10 A DVADA Billing 2.50 December
11 A DVADA Account Management 6.50 December
12 A DVADA Buying 1.25 December
13 A DVADA Meeting / Call 0.50 December
14 A DVADA Account Management 3.00 December
15 A DVADA Billing 2.50 December
16 A DVADA Account Management 6.50 August
17 A DVADA Buying 1.25 August
18 A DVADA Meeting / Call 0.50 August
19 A DVADA Account Management 3.00 August
20 A DVADA Account Management 6.50 April
21 A DVADA Buying 1.25 April
22 A DVADA Meeting / Call 0.50 April
23 A DVADA Account Management 3.00 April
24 B DVADA Account Management 6.50 January
25 B DVADA Buying 1.25 January
26 B DVADA Meeting / Call 0.50 January
27 B DVADA Account Management 3.00 January
28 B DVADA Billing 2.50 February
29 B DVADA Account Management 6.50 February
30 B DVADA Buying 1.25 February
31 B DVADA Meeting / Call 0.50 February
32 B DVADA Account Management 3.00 February
33 B DVADA Billing 2.50 February
34 B DVADA Billing 2.50 December
35 B DVADA Account Management 6.50 December
36 B DVADA Buying 1.25 December
37 B DVADA Meeting / Call 0.50 December
38 B DVADA Account Management 3.00 December
39 B DVADA Billing 2.50 December
40 B DVADA Account Management 6.50 August
41 B DVADA Buying 1.25 August
42 B DVADA Meeting / Call 0.50 August
43 B DVADA Account Management 3.00 August
44 B DVADA Account Management 6.50 April
45 B DVADA Buying 1.25 April
46 B DVADA Meeting / Call 0.50 April
47 C DVADA Account Management 3.00 April
48 C DVADA Account Management 6.50 January
49 C DVADA Buying 1.25 January
50 C DVADA Meeting / Call 0.50 January
51 C DVADA Account Management 3.00 January
52 C DVADA Billing 2.50 February
53 C DVADA Account Management 6.50 February
54 C DVADA Buying 1.25 February
55 C DVADA Meeting / Call 0.50 February
56 C DVADA Account Management 3.00 February
57 C DVADA Billing 2.50 February
58 C DVADA Billing 2.50 December
59 C DVADA Account Management 6.50 December
60 C DVADA Buying 1.25 December
61 C DVADA Meeting / Call 0.50 December
62 C DVADA Account Management 3.00 December
63 C DVADA Billing 2.50 December
64 C DVADA Account Management 6.50 August
65 C DVADA Buying 1.25 August
66 C DVADA Meeting / Call 0.50 August
67 C DVADA Account Management 3.00 August
68 C DVADA Account Management 6.50 April
69 C DVADA Buying 1.25 April
70 C DVADA Meeting / Call 0.50 April
71 C DVADA Account Management 3.00 April
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)
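It returns a pivot table with three columns (client, month, hours). Each client has 12 months (January through December), and each of those months has an hours total.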
                  hours
client month
A      April     203.50
       August    227.75
       December  159.75
       February  203.25
       January   199.25
B      April     203.50
       August    227.75
       December  159.75
       February  203.25
       January   199.25
C      April     203.50
       August    227.75
       December  159.75
       February  203.25
       January   199.25
I would like to sort the pivot table by month in calendar order, while keeping the client column.
                  hours
client month
A      January   203.50
       February  227.75
       March     159.75
       April     203.25
       May       199.90
B      January   203.50
       February  227.75
       March     159.75
       April     203.25
       May       199.90
C      January   203.50
       February  227.75
       March     159.75
       April     203.25
       May       199.90
The sorting problem was solved by Scott's answer below. Now I would like to add a row for each client with the total hours.
                   hours
client month
A      January    203.50
       February   227.75
       March      159.75
       April      203.25
       May        199.90
       Total     1000.34
B      January    203.50
       February   227.75
       March      159.75
       April      203.25
       May        199.90
       Total     1000.34
C      January    203.50
       February   227.75
       March      159.75
       April      203.25
       May        199.90
       Total     1000.34
Any help would be greatly appreciated.
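vp_clients['month'] = pd.Categorical(vp_clients['month'],
                                     ordered=True,
                                     categories=['January', 'February', 'March',
                                                 'April', 'May', 'June', 'July',
                                                 'August', 'September', 'October',
                                                 'November', 'December', 'Total'])
# observed=True keeps only the (client, month) pairs present in the data.
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'],
                    aggfunc=sum, observed=True)
df = df.dropna()
# DataFrame.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
totals = df.groupby(level=0).sum().assign(month='Total').set_index('month', append=True)
pd.concat([df, totals]).sort_index()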
Output:
                 hours
client month
A      January   11.25
       February  16.25
       April     11.25
       August    11.25
       December  16.25
       Total     66.25
B      January   11.25
       February  16.25
       April      8.25
       August    11.25
       December  16.25
       Total     63.25
C      January   11.25
       February  16.25
       April     14.25
       August    11.25
       December  16.25
       Total     69.25
Let's use pd.Categorical:
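vp_clients['month'] = pd.Categorical(vp_clients['month'],
                                     ordered=True,
                                     categories=['January', 'February', 'March',
                                                 'April', 'May', 'June', 'July',
                                                 'August', 'September', 'October',
                                                 'November', 'December'])
# observed=True keeps only the (client, month) pairs present in the data;
# without it, recent pandas versions may list unused months with 0 hours.
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'],
                    aggfunc=sum, observed=True)
df.dropna()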
Output:
                 hours
client month
A      January   11.25
       February  16.25
       April     11.25
       August    11.25
       December  16.25
B      January   11.25
       February  16.25
       April      8.25
       August    11.25
       December  16.25
C      January   11.25
       February  16.25
       April     14.25
       August    11.25
       December  16.25
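To try the pd.Categorical approach without the full data set, a small runnable sketch could look like this. The sample rows and the months list are assumptions made for illustration. Passing observed=True to pivot_table is a choice made here so that only months present in the data appear in the result.

```python
import pandas as pd

# Hypothetical sample data standing in for vp_clients (deliberately out of order).
vp_clients = pd.DataFrame({
    'client': ['A', 'A', 'A', 'B', 'B'],
    'month':  ['December', 'April', 'January', 'August', 'February'],
    'hours':  [3.0, 1.25, 6.5, 0.5, 2.5],
})

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

# Without this step 'month' is plain text and sorts alphabetically.
vp_clients['month'] = pd.Categorical(vp_clients['month'],
                                     categories=months, ordered=True)

# observed=True keeps only the (client, month) pairs that occur in the data.
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'],
                    aggfunc='sum', observed=True)
print(df)
```

The pivot table groups on the categorical column, so the month level comes out in category order (January, April, December for client A) instead of April, December, January.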
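Alternatively, as mentioned, since you are not pivoting values into new columns in a wide format, consider simply using groupby(). Then use reindex() to customize the January-to-December order, specifying the level and interfacing with Python's built-in calendar module.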
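import calendar
...
# vp_clients is the raw data frame from the question.
# calendar.month_name[0] is an empty string, hence the [1:] slice.
grp_df = vp_clients.groupby(['client', 'month']).agg({'hours': 'sum'})\
                   .reindex(level=1, labels=calendar.month_name[1:])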
#                  hours
# client month
# A      January   11.25
#        February  16.25
#        April     11.25
#        August    11.25
#        December  16.25
# B      January   11.25
#        February  16.25
#        April      8.25
#        August    11.25
#        December  16.25
# C      January   11.25
#        February  16.25
#        April     14.25
#        August    11.25
#        December  16.25