Finding the mean of groups for weekdays in a pandas dataframe
My dataset looks like this:
tripduration starttime User Type
0 732 7/1/2015 00:00:03 Subscriber
1 322 7/1/2015 00:00:06 Subscriber
2 790 7/1/2015 00:00:17 Subscriber
3 1228 7/1/2015 00:00:23 Subscriber
4 1383 7/1/2015 00:00:44 Subscriber
5 603 7/1/2015 00:01:00 Subscriber
6 520 7/1/2015 00:01:03 Subscriber
7 289 7/1/2015 00:01:06 Subscriber
8 1771 7/1/2015 00:01:25 Customer
9 813 7/1/2015 00:01:41 Subscriber
10 1735 7/1/2015 00:01:50 Customer
11 832 7/1/2015 00:01:58 Subscriber
12 1210 7/1/2015 00:02:06 Subscriber
13 746 7/1/2015 00:02:07 Subscriber
14 749 7/1/2015 00:02:26 Subscriber
15 463 7/1/2015 00:02:26 Subscriber
16 331 7/1/2015 00:02:35 Subscriber
17 951 7/1/2015 00:02:43 Customer
18 1352 7/1/2015 00:02:47 Customer
19 275 7/1/2015 00:02:47 Subscriber
20 199 7/1/2015 00:03:05 Subscriber
21 383 7/1/2015 00:03:16 Customer
22 4210 7/1/2015 00:03:27 Subscriber
23 584 7/1/2015 00:03:34 Subscriber
24 735 7/1/2015 00:03:48 Subscriber
25 827 7/1/2015 00:03:56 Subscriber
26 677 7/1/2015 00:03:57 Subscriber
27 2371 7/1/2015 00:03:58 Customer
28 666 7/1/2015 00:04:03 Subscriber
29 999 7/1/2015 00:04:17 Subscriber
... ... ... ...
1085646 243 7/31/2015 23:57:25 Subscriber
1085647 1378 7/31/2015 23:57:29 Customer
1085648 230 7/31/2015 23:57:32 Subscriber
1085649 1669 7/31/2015 23:57:33 Subscriber
1085650 493 7/31/2015 23:57:44 Subscriber
1085651 822 7/31/2015 23:57:54 Subscriber
1085652 617 7/31/2015 23:58:03 Subscriber
1085653 349 7/31/2015 23:58:08 Subscriber
1085654 818 7/31/2015 23:58:12 Customer
1085655 2062 7/31/2015 23:58:15 Subscriber
1085656 945 7/31/2015 23:58:18 Customer
1085657 346 7/31/2015 23:58:24 Subscriber
1085658 399 7/31/2015 23:58:27 Subscriber
1085659 641 7/31/2015 23:58:42 Subscriber
1085660 1872 7/31/2015 23:58:43 Subscriber
1085661 12065 7/31/2015 23:58:51 Customer
1085662 265 7/31/2015 23:58:53 Subscriber
1085663 936 7/31/2015 23:58:58 Subscriber
1085664 395 7/31/2015 23:59:04 Subscriber
1085665 238 7/31/2015 23:59:10 Subscriber
1085666 551 7/31/2015 23:59:24 Subscriber
1085667 423 7/31/2015 23:59:23 Customer
1085668 1623 7/31/2015 23:59:24 Subscriber
1085669 1632 7/31/2015 23:59:24 Subscriber
1085670 305 7/31/2015 23:59:38 Subscriber
1085671 275 7/31/2015 23:59:40 Subscriber
1085672 530 7/31/2015 23:59:41 Subscriber
1085673 273 7/31/2015 23:59:42 Customer
1085674 1273 7/31/2015 23:59:56 Subscriber
1085675 1667 7/31/2015 23:59:59 Subscriber
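What is the average trip duration of subscribers on any weekday (Monday to Friday)? The function a4() should return this mean as a float rounded to two decimal places: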
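import pandas as pd

def a4(rides):
    # .copy() makes df1 independent of rides, so the assignment below does not trigger SettingWithCopyWarning
    df1 = rides[rides['User Type'] == 'Subscriber'].copy()
    df1['starttime'] = df1['starttime'].apply(pd.to_datetime)  # convert object into datetime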
I'm stuck on getting the weekdays (Monday to Friday) out of starttime so that I can calculate the mean tripduration. I tried to parse starttime using parser.parse(df1['starttime']), but got an error:
TypeError: Parser must be a string or character stream, not Series
What is the correct way to get the weekday mean?
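The error comes from the fact that dateutil's parser.parse() accepts a single string, while pd.to_datetime() converts a whole Series at once. The minimal sketch below uses two start times from the question plus one hypothetical Saturday ride, and assumes every value follows the month/day/year format shown above. It also shows that .dt.dayofweek numbers days Monday=0 through Sunday=6, which is why the answers filter with < 5.

```python
import pandas as pd
from dateutil import parser

# Two start times from the question plus one hypothetical Saturday (7/4/2015).
starttime = pd.Series(["7/1/2015 00:00:03", "7/31/2015 23:59:59", "7/4/2015 12:00:00"])

# dateutil's parser.parse() works on one string at a time...
single = parser.parse(starttime.iloc[0])

# ...whereas pd.to_datetime() converts the whole Series in one call.
# The explicit format is an assumption that all rows look like "M/D/YYYY HH:MM:SS".
parsed = pd.to_datetime(starttime, format="%m/%d/%Y %H:%M:%S")

# .dt.dayofweek: Monday=0 ... Sunday=6, so Monday-Friday is dayofweek < 5.
day = parsed.dt.dayofweek
is_weekday = day < 5
print(day.tolist())         # [2, 4, 5]
print(is_weekday.tolist())  # [True, True, False]
```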
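I think you first need to convert the starttime column with to_datetime, then filter with boolean indexing. If you need a single scalar value for all weekdays, use loc to select the tripduration column and take its mean: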
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    # dayofweek is Monday=0 ... Sunday=6, so < 5 keeps Monday-Friday
    m = (rides['starttime'].dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[m, 'tripduration'].mean(), 2)
print (a4(rides))
825.33
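A variation on the same boolean-indexing idea, sketched on a small hypothetical DataFrame (not the question's data): it parses starttime into a local Series instead of overwriting rides['starttime'], so the caller's DataFrame is left unchanged. The explicit format string is an assumption that every row matches the sample's "M/D/YYYY HH:MM:SS" layout; passing it lets pandas skip format inference.

```python
import pandas as pd

def a4(rides):
    """Mean Subscriber tripduration over Monday-Friday, rounded to 2 decimals."""
    # Parse into a local Series so the caller's DataFrame is not modified.
    start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")
    mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")
    return round(float(rides.loc[mask, "tripduration"].mean()), 2)

# Hypothetical toy data: Wednesday and Friday Subscriber rides count;
# the Saturday ride and the Customer ride are filtered out.
rides = pd.DataFrame({
    "tripduration": [100, 200, 1000, 5000],
    "starttime": ["7/1/2015 00:00:03", "7/3/2015 08:15:00",
                  "7/4/2015 12:00:00", "7/1/2015 09:30:00"],
    "User Type": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
})
print(a4(rides))  # 150.0
```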
If you need the mean for each weekday separately, keep the same dayofweek condition and then aggregate with mean in a groupby:
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.dayofweek)['tripduration'].mean().round(2)
print (a4(rides))
starttime
2 840.96
4 809.71
Name: tripduration, dtype: float64
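The groupby only returns days that actually have matching rides, which is why the output above lists just 2 and 4. If you want one row per weekday regardless, one option (sketched here on hypothetical toy data) is to reindex the result with range(5), which fills weekdays that have no rides with NaN.

```python
import pandas as pd

# Hypothetical toy data: two Wednesday rides, one Friday ride, one Saturday ride.
rides = pd.DataFrame({
    "tripduration": [100, 300, 250, 1000],
    "starttime": pd.to_datetime(["2015-07-01 00:00:03", "2015-07-01 10:00:00",
                                 "2015-07-03 08:15:00", "2015-07-04 12:00:00"]),
    "User Type": ["Subscriber"] * 4,
})

df1 = rides[(rides["User Type"] == "Subscriber") & (rides["starttime"].dt.dayofweek < 5)]
per_day = (
    df1.groupby(df1["starttime"].dt.dayofweek)["tripduration"]
    .mean()
    .round(2)
    .reindex(range(5))  # one row per weekday, 0=Monday ... 4=Friday; missing days -> NaN
)
print(per_day)
```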
If you want day names instead of numbers, group by .dt.day_name() (older pandas releases also offered .dt.weekday_name, which was removed in pandas 1.0):
def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.day_name())['tripduration'].mean().round(2)
print (a4(rides))
starttime
Friday 809.71
Wednesday 840.96
Name: tripduration, dtype: float64
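Grouping by name sorts the index alphabetically, so Friday appears before Wednesday. If calendar order matters, a minimal sketch is to reindex with an explicit Monday-to-Friday list and drop the days that had no rides. The function name weekday_means and the WEEKDAYS list are illustrative, and the sketch assumes .dt.day_name() returns English names (its default when no locale is passed).

```python
import pandas as pd

# Monday-Friday in calendar order (hypothetical helper constant).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def weekday_means(rides):
    """Mean Subscriber tripduration per weekday name, in calendar order."""
    start = pd.to_datetime(rides["starttime"], format="%m/%d/%Y %H:%M:%S")
    mask = (rides["User Type"] == "Subscriber") & (start.dt.dayofweek < 5)
    means = rides.loc[mask, "tripduration"].groupby(start[mask].dt.day_name()).mean()
    # groupby sorts names alphabetically; reindex restores calendar order,
    # and dropna() removes weekdays that had no matching rides.
    return means.reindex(WEEKDAYS).dropna().round(2)

# Hypothetical toy data: two Wednesday rides, one Friday ride, one Saturday ride.
rides = pd.DataFrame({
    "tripduration": [100, 300, 250, 1000],
    "starttime": ["7/1/2015 00:00:03", "7/1/2015 10:00:00",
                  "7/3/2015 08:15:00", "7/4/2015 12:00:00"],
    "User Type": ["Subscriber"] * 4,
})
result = weekday_means(rides)
print(result)
```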
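Another option is to let read_csv parse starttime while loading the file (parse_dates takes a list of column names, not a bare string):
import pandas as pd

df = pd.read_csv(...., parse_dates=['starttime'])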
Then filter with boolean indexing and groupby dayofweek to compute the mean:
df = df[(df.starttime.dt.dayofweek < 5) & df['User Type'].eq('Subscriber')]
g = df.groupby(df.starttime.dt.dayofweek).tripduration.mean().round(2)
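An end-to-end sketch of the read_csv approach, using an in-memory CSV via io.StringIO in place of the real file. Three rows are borrowed from the question; the Saturday row (7/4/2015) is hypothetical. It relies on pandas inferring the month-first "M/D/YYYY HH:MM:SS" layout for the starttime column.

```python
import io
import pandas as pd

# Small stand-in for the real CSV, with the question's column names.
csv_text = """tripduration,starttime,User Type
732,7/1/2015 00:00:03,Subscriber
1771,7/1/2015 00:01:25,Customer
243,7/31/2015 23:57:25,Subscriber
500,7/4/2015 10:00:00,Subscriber
"""

# parse_dates=["starttime"] makes the column datetime64 on load.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["starttime"])

# Keep Subscriber rides on Monday-Friday, then average per dayofweek.
df = df[(df.starttime.dt.dayofweek < 5) & df["User Type"].eq("Subscriber")]
g = df.groupby(df.starttime.dt.dayofweek).tripduration.mean().round(2)
print(g)
```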