
Pandas DataFrame Advanced Indexing

I am looking for some help with pandas DataFrame sorting. I have a DataFrame of 8 columns that go like:

 ['Date' , 'S ID', 'Se ID', 'S #', 'File Size (Mb)', 'HD name', 'Start Time', 'End time'] 

I've then done a:

 DataFile.groupby(['HD Name','Date','Se ID','S ID'])['File Size (Mb)'].agg({'Sequence #':'count','File Size (Mb)':'sum'}).reset_index().rename(columns={'Sequence #':'# of Files'}) 

which takes my data, groups it by the matching groupby() parameters, and sums the file sizes. I would like to add two columns to this which hold the first 'Start Time' and last 'End Time'. How would I go about doing that?
I'm thinking my only option may be to loop over the data or to create a duplicate DataFrame to get the start and end times of the grouped data.
Any ideas would be appreciated!

Example DataFrame:

'Hard Drive Name' : [H5 , H5 , H5 , H5 , H5]
'S ID' : [LA , LA , LA , SD , SD]
'Se ID' : [1200, 1200, 1200, 30, 30]
'Date' : ['10/01/2018' , '10/01/2018' , '10/01/2018' , '09/03/2018' , '09/03/2018']
'#' : [1 , 2 , 3 , 1 , 2]
'Start Time' : [[08:09:54] , [08:58:31] , [09:39:38] , [05:04:13] , [05:41:13] ]
'End Time' : [[08:28:54] , [09:17:31] , [09:58:38] , [05:23:12] , [06:00:12]]
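
For reference, a minimal sketch of how the example rows above could be built as a DataFrame (this assumes the IDs are strings and each time is stored as a one-element list of strings, matching the dict dump further down):

import pandas as pd

# hypothetical reconstruction of the example rows listed above
example = pd.DataFrame({
    'Hard Drive Name': ['H5', 'H5', 'H5', 'H5', 'H5'],
    'S ID': ['LA', 'LA', 'LA', 'SD', 'SD'],
    'Se ID': [1200, 1200, 1200, 30, 30],
    'Date': ['10/01/2018', '10/01/2018', '10/01/2018', '09/03/2018', '09/03/2018'],
    '#': [1, 2, 3, 1, 2],
    'Start Time': [['08:09:54'], ['08:58:31'], ['09:39:38'], ['05:04:13'], ['05:41:13']],
    'End Time': [['08:28:54'], ['09:17:31'], ['09:58:38'], ['05:23:12'], ['06:00:12']],
})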

 {'Date': {34: '10/01/2018',
35: '10/01/2018',
36: '10/01/2018',
37: '10/01/2018',
38: '10/01/2018',
39: '10/01/2018',
40: '10/01/2018',
41: '10/01/2018',
42: '10/01/2018',
661: '09/03/2018'},  

'End Time': {34: ['08:28:54'],
35: ['09:17:31'],
36: ['09:58:38'],
37: ['10:37:41'],
38: ['11:21:32'],
39: ['12:04:42'],
40: ['12:45:31'],
41: ['13:25:23'],
42: ['14:04:03'],
661: ['05:53:36']},  

'File Size (Mb)': {34: 1074.256284,
35: 1074.842244,
36: 1074.759444,
37: 1074.836956,
38: 1074.516156,
39: 1074.547044,
40: 1074.8363,
41: 1074.891492,
42: 1074.792068,
661: 1074.428204},  

'Hard Drive Name': {34: 'H5',
35: 'H5',
36: 'H5',
37: 'H5',
38: 'H5',
39: 'H5',
40: 'H5',
41: 'H5',
42: 'H5',
661: 'H5'},  

'Sensor ID': {34: '1207',
35: '1207',
36: '1207',
37: '1207',
38: '1207',
39: '1207',
40: '1207',
41: '1207',
42: '1207',
661: '1207'},  

'Sequence #': {34: 's005',
35: 's006',
36: 's007',
37: 's008',
38: 's009',
39: 's010',
40: 's011',
41: 's012',
42: 's013',
661: 's000'},  

'Site ID': {34: 'SD',
35: 'SD',
36: 'SD',
37: 'SD',
38: 'SD',
39: 'SD',
40: 'SD',
41: 'SD',
42: 'SD',
661: 'SDO'},  

'Start Time': {34: ['08:09:54'],
35: ['08:58:31'],
36: ['09:39:38'],
37: ['10:18:41'],
38: ['11:02:32'],
39: ['11:45:42'],
40: ['12:26:31'],
41: ['13:06:23'],
42: ['13:45:03'],
661: ['05:34:37']}}

Okay, you need to use pd.to_timedelta with the .str accessor:

Where d equals your df.head(10).to_dict() output:

import pandas as pd

df = pd.DataFrame(d)

# convert the one-element time lists to timedeltas so min/max order chronologically
df['Start Time'] = pd.to_timedelta(df['Start Time'].str[0])
df['End Time'] = pd.to_timedelta(df['End Time'].str[0])

# group, then take the count, total size, earliest start and latest end per group
df_out = df.groupby(['Hard Drive Name','Date','Sensor ID','Site ID'])[['Sequence #',
                                                                       'File Size (Mb)',
                                                                       'Start Time',
                                                                       'End Time']]\
           .agg({'Sequence #':'count',
                 'File Size (Mb)':'sum',
                 'Start Time':'min',
                 'End Time':'max'})\
           .reset_index()\
           .rename(columns={'Sequence #':'# of Files'})

Output:

  Hard Drive Name        Date Sensor ID Site ID  # of Files  File Size (Mb) Start Time End Time
0              H5  09/03/2018      1207     SDO           1     1074.428204   05:34:37 05:53:36
1              H5  10/01/2018      1207      SD           9     9672.277988   08:09:54 14:04:03
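
As a side note, on pandas 0.25+ the same result can also be written with named aggregation, which produces the renamed count column directly instead of renaming afterwards. A minimal sketch, assuming the same df as above with the time columns already converted to timedelta:

# named aggregation sketch: output column names map to (source column, aggregation)
df_named = (df.groupby(['Hard Drive Name', 'Date', 'Sensor ID', 'Site ID'])
              .agg(**{'# of Files': ('Sequence #', 'count'),
                      'File Size (Mb)': ('File Size (Mb)', 'sum'),
                      'Start Time': ('Start Time', 'min'),
                      'End Time': ('End Time', 'max')})
              .reset_index())

The **{...} unpacking is only needed because the output column names contain spaces and '#', which are not valid Python keyword names.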
