
Finding the mean of groups for weekdays in a pandas dataframe

My dataset looks like this:

         tripduration           starttime   User Type
0                 732   7/1/2015 00:00:03  Subscriber
1                 322   7/1/2015 00:00:06  Subscriber
2                 790   7/1/2015 00:00:17  Subscriber
3                1228   7/1/2015 00:00:23  Subscriber
4                1383   7/1/2015 00:00:44  Subscriber
5                 603   7/1/2015 00:01:00  Subscriber
6                 520   7/1/2015 00:01:03  Subscriber
7                 289   7/1/2015 00:01:06  Subscriber
8                1771   7/1/2015 00:01:25    Customer
9                 813   7/1/2015 00:01:41  Subscriber
10               1735   7/1/2015 00:01:50    Customer
11                832   7/1/2015 00:01:58  Subscriber
12               1210   7/1/2015 00:02:06  Subscriber
13                746   7/1/2015 00:02:07  Subscriber
14                749   7/1/2015 00:02:26  Subscriber
15                463   7/1/2015 00:02:26  Subscriber
16                331   7/1/2015 00:02:35  Subscriber
17                951   7/1/2015 00:02:43    Customer
18               1352   7/1/2015 00:02:47    Customer
19                275   7/1/2015 00:02:47  Subscriber
20                199   7/1/2015 00:03:05  Subscriber
21                383   7/1/2015 00:03:16    Customer
22               4210   7/1/2015 00:03:27  Subscriber
23                584   7/1/2015 00:03:34  Subscriber
24                735   7/1/2015 00:03:48  Subscriber
25                827   7/1/2015 00:03:56  Subscriber
26                677   7/1/2015 00:03:57  Subscriber
27               2371   7/1/2015 00:03:58    Customer
28                666   7/1/2015 00:04:03  Subscriber
29                999   7/1/2015 00:04:17  Subscriber
...               ...                 ...         ...
1085646           243  7/31/2015 23:57:25  Subscriber
1085647          1378  7/31/2015 23:57:29    Customer
1085648           230  7/31/2015 23:57:32  Subscriber
1085649          1669  7/31/2015 23:57:33  Subscriber
1085650           493  7/31/2015 23:57:44  Subscriber
1085651           822  7/31/2015 23:57:54  Subscriber
1085652           617  7/31/2015 23:58:03  Subscriber
1085653           349  7/31/2015 23:58:08  Subscriber
1085654           818  7/31/2015 23:58:12    Customer
1085655          2062  7/31/2015 23:58:15  Subscriber
1085656           945  7/31/2015 23:58:18    Customer
1085657           346  7/31/2015 23:58:24  Subscriber
1085658           399  7/31/2015 23:58:27  Subscriber
1085659           641  7/31/2015 23:58:42  Subscriber
1085660          1872  7/31/2015 23:58:43  Subscriber
1085661         12065  7/31/2015 23:58:51    Customer
1085662           265  7/31/2015 23:58:53  Subscriber
1085663           936  7/31/2015 23:58:58  Subscriber
1085664           395  7/31/2015 23:59:04  Subscriber
1085665           238  7/31/2015 23:59:10  Subscriber
1085666           551  7/31/2015 23:59:24  Subscriber
1085667           423  7/31/2015 23:59:23    Customer
1085668          1623  7/31/2015 23:59:24  Subscriber
1085669          1632  7/31/2015 23:59:24  Subscriber
1085670           305  7/31/2015 23:59:38  Subscriber
1085671           275  7/31/2015 23:59:40  Subscriber
1085672           530  7/31/2015 23:59:41  Subscriber
1085673           273  7/31/2015 23:59:42    Customer
1085674          1273  7/31/2015 23:59:56  Subscriber
1085675          1667  7/31/2015 23:59:59  Subscriber

My question

What is the average trip duration for subscribers on any weekday (Monday to Friday)?

My code

The function a4() should return the mean as a float rounded to two decimal places:

import pandas as pd

def a4(rides):
    df1 = rides[rides['User Type'] == 'Subscriber'].copy()  # copy avoids SettingWithCopyWarning
    df1['starttime'] = pd.to_datetime(df1['starttime'])  # convert the object (string) column to datetime

I'm stuck on selecting the weekdays (Monday to Friday) so I can compute the mean tripduration. I tried to parse starttime with parser.parse(df1['starttime']) but got an error:

TypeError: Parser must be a string or character stream, not Series

What is the correct way to get the mean for weekdays?
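The TypeError comes from dateutil: parser.parse accepts a single string (or stream), not a whole Series. A minimal sketch of the two ways around it, using two starttime values copied from the sample above: apply the parser element by element, or let pd.to_datetime convert the whole column in one vectorized call (usually much faster on a million rows).

```python
import pandas as pd
from dateutil import parser

# Two 'starttime' values copied from the sample data in the question.
s = pd.Series(["7/1/2015 00:00:03", "7/31/2015 23:57:25"])

# parser.parse works on ONE string; passing the whole Series raises TypeError.
try:
    parser.parse(s)
except TypeError as exc:
    print("parser.parse(Series) fails:", exc)

# Option 1: apply the parser to each element (works, but row by row).
parsed_elementwise = s.apply(parser.parse)

# Option 2: convert the whole Series in one vectorized call.
parsed_vectorized = pd.to_datetime(s)

# dayofweek numbers days Monday=0 ... Sunday=6.
print(parsed_vectorized.dt.dayofweek.tolist())  # [2, 4]
```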

I think you first need to convert starttime with to_datetime.

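Then filter with boolean indexing: dayofweek numbers Monday as 0 and Sunday as 6, so dayofweek < 5 keeps Monday to Friday, combined with the User Type condition.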
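A small sketch of the dayofweek numbering behind the < 5 condition, over one week starting on Wednesday 1 July 2015 (the first date in the sample; the 7-day range is chosen here only for illustration):

```python
import pandas as pd

# One full week starting on the first date that appears in the data.
week = pd.Series(pd.date_range("2015-07-01", periods=7, freq="D"))

dow = week.dt.dayofweek   # Monday=0 ... Sunday=6
is_weekday = dow < 5      # True for Monday-Friday only

print(pd.DataFrame({"date": week, "day": week.dt.day_name(),
                    "dayofweek": dow, "weekday": is_weekday}))
```

Saturday 4 July (5) and Sunday 5 July (6) are the only rows the mask drops.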

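If you need a single scalar value across all weekdays, use loc to select the tripduration column for the filtered rows and take its mean: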

import pandas as pd

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    # dayofweek: Monday=0 ... Sunday=6, so < 5 keeps Monday-Friday
    m = (rides['starttime'].dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[m, 'tripduration'].mean(), 2)

print(a4(rides))
825.33
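A self-contained sketch of the scalar version, run on six rows copied from the sample data (not the full dataset, so the result differs from the figure above). One deliberate change, an assumption about what callers want: it parses starttime into a local variable instead of overwriting the caller's column.

```python
import pandas as pd

def a4(rides: pd.DataFrame) -> float:
    """Mean Subscriber trip duration on Monday-Friday, rounded to 2 decimals."""
    # Parse into a local Series so the caller's DataFrame is left unchanged.
    start = pd.to_datetime(rides["starttime"])
    # dayofweek: Monday=0 ... Sunday=6, so < 5 keeps Monday-Friday.
    mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")
    return round(float(rides.loc[mask, "tripduration"].mean()), 2)

# Six rows copied from the sample data in the question.
sample = pd.DataFrame({
    "tripduration": [732, 322, 1771, 243, 1378, 230],
    "starttime": ["7/1/2015 00:00:03", "7/1/2015 00:00:06", "7/1/2015 00:01:25",
                  "7/31/2015 23:57:25", "7/31/2015 23:57:29", "7/31/2015 23:57:32"],
    "User Type": ["Subscriber", "Subscriber", "Customer",
                  "Subscriber", "Customer", "Subscriber"],
})

print(a4(sample))  # 381.75
```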

If you need a separate mean for each day, keep the same filter, then group by dayofweek and aggregate with mean:

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    # one mean per weekday number (0=Monday ... 4=Friday)
    return df1.groupby(df1['starttime'].dt.dayofweek)['tripduration'].mean().round(2)

print(a4(rides))
starttime
2    840.96
4    809.71
Name: tripduration, dtype: float64
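The rows shown in the question only cover Wednesday 1 July and Friday 31 July, which is likely why only days 2 and 4 appear above. A variation sketched on six of those rows: aggregating count alongside mean shows how many trips each daily average is based on.

```python
import pandas as pd

# Six rows copied from the sample data in the question.
rides = pd.DataFrame({
    "tripduration": [732, 322, 1771, 243, 1378, 230],
    "starttime": ["7/1/2015 00:00:03", "7/1/2015 00:00:06", "7/1/2015 00:01:25",
                  "7/31/2015 23:57:25", "7/31/2015 23:57:29", "7/31/2015 23:57:32"],
    "User Type": ["Subscriber", "Subscriber", "Customer",
                  "Subscriber", "Customer", "Subscriber"],
})

start = pd.to_datetime(rides["starttime"])
mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")

# One row per weekday number (0=Monday ... 4=Friday) with mean and trip count.
per_day = (
    rides.loc[mask]
    .groupby(start[mask].dt.dayofweek)["tripduration"]
    .agg(["mean", "count"])
    .round(2)
)
print(per_day)
```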

If you want day names instead of numbers, use dt.day_name() (older pandas versions had a dt.weekday_name attribute, which was removed in pandas 1.0):

def a4(rides):
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    df1 = rides[(rides['User Type'] == 'Subscriber') & (rides['starttime'].dt.dayofweek < 5)]
    return df1.groupby(df1['starttime'].dt.day_name())['tripduration'].mean().round(2)

print(a4(rides))
starttime
Friday       809.71
Wednesday    840.96
Name: tripduration, dtype: float64
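Grouping by day names sorts them alphabetically, as the Friday/Wednesday order above shows. A small sketch, on six rows copied from the sample, that restores Monday-to-Friday order by reindexing with an explicit list of names (the list is an assumption: English names, as day_name() returns with its default locale).

```python
import pandas as pd

# Six rows copied from the sample data in the question.
rides = pd.DataFrame({
    "tripduration": [732, 322, 1771, 243, 1378, 230],
    "starttime": ["7/1/2015 00:00:03", "7/1/2015 00:00:06", "7/1/2015 00:01:25",
                  "7/31/2015 23:57:25", "7/31/2015 23:57:29", "7/31/2015 23:57:32"],
    "User Type": ["Subscriber", "Subscriber", "Customer",
                  "Subscriber", "Customer", "Subscriber"],
})

start = pd.to_datetime(rides["starttime"])
mask = (start.dt.dayofweek < 5) & (rides["User Type"] == "Subscriber")

# Calendar order instead of the alphabetical order groupby produces.
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
by_name = (
    rides.loc[mask, "tripduration"]
    .groupby(start[mask].dt.day_name())
    .mean()
    .round(2)
    .reindex(weekdays)  # weekdays absent from the data become NaN
)
print(by_name)
```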
Alternatively, parse starttime as a datetime while reading the CSV:

df = pd.read_csv(...., parse_dates=['starttime'])

Use boolean indexing to filter, then group by dayofweek to compute the mean:

import numpy as np

df = df[(df.starttime.dt.dayofweek < 5) & df['User Type'].eq('Subscriber')]
g = np.round(df.groupby(df.starttime.dt.dayofweek).tripduration.mean(), 2)
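A runnable sketch of this second approach. The real CSV path is not given, so a few rows copied from the sample are fed through io.StringIO instead; parse_dates is passed a list of column names.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real CSV file: six rows copied from the sample data.
csv_text = """tripduration,starttime,User Type
732,7/1/2015 00:00:03,Subscriber
322,7/1/2015 00:00:06,Subscriber
1771,7/1/2015 00:01:25,Customer
243,7/31/2015 23:57:25,Subscriber
1378,7/31/2015 23:57:29,Customer
230,7/31/2015 23:57:32,Subscriber
"""

# parse_dates expects a list of column names (or True / a dict), not a bare string.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["starttime"])

df = df[(df.starttime.dt.dayofweek < 5) & df["User Type"].eq("Subscriber")]
g = np.round(df.groupby(df.starttime.dt.dayofweek).tripduration.mean(), 2)
print(g)
```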


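Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.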

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM