Finding the minimum daily value with pandas GroupBy or pivot_table

Question

I have a DataFrame obtained from a CSV file (after some filtering) that looks like this:

df3.head(n=10)

                  DateTime   Det_ID  Speed
16956  2014-01-01 07:00:00  1201085   65.0
16962  2014-01-01 07:00:00  1201110   69.5
19377  2014-01-01 08:00:00  1201085   65.0
19383  2014-01-01 08:00:00  1201110   65.0
21798  2014-01-01 09:00:00  1201085   65.0
21804  2014-01-01 09:00:00  1201110   65.4
75060  2014-01-02 07:00:00  1201085   64.9
75066  2014-01-02 07:00:00  1201110   66.1
77481  2014-01-02 08:00:00  1201085   65.0
77487  2014-01-02 08:00:00  1201110   62.5

This represents the speeds measured by different detectors (two for now) at various times of day. I have converted the DateTime column to a datetime object.
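The question does not show how the conversion was done. A minimal sketch, assuming the standard pd.to_datetime was used; the CSV contents below are a hypothetical stand-in read through io.StringIO, since the real file is not given.

```python
import io

import pandas as pd

# Hypothetical stand-in for the filtered CSV; the real file is not shown in the question.
csv_text = """DateTime,Det_ID,Speed
2014-01-01 07:00:00,1201085,65.0
2014-01-01 07:00:00,1201110,69.5
2014-01-02 07:00:00,1201085,64.9
"""

# read_csv leaves DateTime as plain strings unless told otherwise.
df3 = pd.read_csv(io.StringIO(csv_text))

# Convert the strings to datetime64 values so the .dt accessor becomes available.
df3["DateTime"] = pd.to_datetime(df3["DateTime"])

print(df3["DateTime"].dt.hour.tolist())  # [7, 7, 7]
```

Alternatively, passing parse_dates=["DateTime"] to pd.read_csv parses the column while the file is being read.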

I need to know, for each detector, the minimum speed on each day.

Basically, I want something like this, which I can then use to build a heat map:

df4 = df3.pivot_table(index='DateTime',columns='Det_ID',aggfunc=min)
df4.head()

                      Speed
Det_ID              1201085 1201110
DateTime
2014-01-01 07:00:00    65.0    69.5
2014-01-01 08:00:00    65.0    65.0
2014-01-01 09:00:00    65.0    65.4
2014-01-02 07:00:00    64.9    66.1
2014-01-02 08:00:00    65.0    62.5

Clearly, the way I've used the pivot table is incorrect, as I'm getting multiple speed values per day, not just one. I suspect this is because the minimum is being calculated over each unique DateTime value rather than over the date part alone.
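The suspicion is right, and it can be checked directly. A minimal sketch rebuilding the ten sample rows shown above: every (DateTime, Det_ID) pair occurs exactly once, so min() sees a single value per cell and the pivot only reshapes the data.

```python
import pandas as pd

# The ten sample rows from the question, keyed by their original index labels.
df3 = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(
            ["2014-01-01 07:00:00"] * 2 + ["2014-01-01 08:00:00"] * 2
            + ["2014-01-01 09:00:00"] * 2 + ["2014-01-02 07:00:00"] * 2
            + ["2014-01-02 08:00:00"] * 2
        ),
        "Det_ID": [1201085, 1201110] * 5,
        "Speed": [65.0, 69.5, 65.0, 65.0, 65.0, 65.4, 64.9, 66.1, 65.0, 62.5],
    },
    index=[16956, 16962, 19377, 19383, 21798, 21804, 75060, 75066, 77481, 77487],
)

# Largest group size over all (DateTime, Det_ID) pairs: 1 means nothing is aggregated.
print(df3.groupby(["DateTime", "Det_ID"]).size().max())  # 1

by_timestamp = df3.pivot_table(
    index="DateTime", columns="Det_ID", values="Speed", aggfunc="min"
)
print(len(by_timestamp))  # 5 rows: one per timestamp, not one per day
```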

I also tried groupby:

list(df3.groupby(['DateTime'], sort = False)['Speed'].min())

But it just gives a list of numbers, without any other columns:

65.0,
 65.0,
 65.0,
 64.900000000000006,
 62.5,
 64.200000000000003,
 54.700000000000003,
 62.600000000000001,
 64.799999999999997,
 59.5,

etc.
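Two things explain that output. groupby(...).min() returns a Series whose index holds the group keys, and wrapping it in list() keeps only the values. Grouping by DateTime alone also folds both detectors into one minimum per timestamp (for example, 2014-01-02 08:00:00 gives min(65.0, 62.5) = 62.5, matching the list above). A minimal sketch on a subset of the sample rows:

```python
import pandas as pd

# A subset of the question's rows: two timestamps, both detectors.
df3 = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(["2014-01-02 07:00:00"] * 2 + ["2014-01-02 08:00:00"] * 2),
        "Det_ID": [1201085, 1201110, 1201085, 1201110],
        "Speed": [64.9, 66.1, 65.0, 62.5],
    }
)

mins = df3.groupby("DateTime", sort=False)["Speed"].min()

# The result is a Series indexed by DateTime; list() discards that index.
print(type(mins).__name__)  # Series
print(list(mins))           # [64.9, 62.5]

# reset_index() turns the group keys back into a column instead.
print(mins.reset_index())
```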

How do I isolate just the date part of the DateTime field? Am I even going in the right direction? Thanks.
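The date part can also be isolated without turning it into text. A minimal sketch with the pandas .dt accessor: dt.normalize(), dt.floor("D") and dt.date are standard pandas methods, but preferring them to strftime is an alternative suggested here, not part of the original answers. Any of the resulting Series can be passed to groupby together with Det_ID.

```python
import pandas as pd

ts = pd.Series(
    pd.to_datetime(["2014-01-01 07:00:00", "2014-01-01 09:00:00", "2014-01-02 08:00:00"])
)

# Three ways to drop the time of day:
midnight = ts.dt.normalize()  # datetime64 values with the time set to 00:00:00
floored = ts.dt.floor("D")    # identical to normalize() for timezone-naive data
dates = ts.dt.date            # Python datetime.date objects (object dtype)

print(midnight.nunique())  # 2 distinct days
print(dates.tolist())
```

normalize() keeps the datetime64 dtype, so a result indexed by it still sorts chronologically and works with date-aware plotting.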

Answer 1

Call .dt.strftime to reformat the DateTime column as a date-only string:

df.DateTime = df.DateTime.dt.strftime('%m/%d/%Y')
df

        DateTime   Det_ID  Speed
16956  01/01/2014  1201085   65.0
16962  01/01/2014  1201110   69.5
19377  01/01/2014  1201085   65.0
19383  01/01/2014  1201110   65.0
21798  01/01/2014  1201085   65.0
21804  01/01/2014  1201110   65.4
75060  01/02/2014  1201085   64.9
75066  01/02/2014  1201110   66.1
77481  01/02/2014  1201085   65.0
77487  01/02/2014  1201110   62.5

Now call pivot_table:

df = df.pivot_table(index='DateTime', columns='Det_ID', values='Speed', aggfunc='min')  # 'min' avoids needing numpy's np.min
df
Det_ID      1201085  1201110
DateTime                    
01/01/2014     65.0     65.0
01/02/2014     64.9     62.5
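For reference, a self-contained version of this answer run on the question's ten sample rows. It passes the string "min" as aggfunc, which pandas accepts, so numpy does not need to be imported.

```python
import pandas as pd

# The question's ten sample rows.
df = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(
            ["2014-01-01 07:00:00"] * 2 + ["2014-01-01 08:00:00"] * 2
            + ["2014-01-01 09:00:00"] * 2 + ["2014-01-02 07:00:00"] * 2
            + ["2014-01-02 08:00:00"] * 2
        ),
        "Det_ID": [1201085, 1201110] * 5,
        "Speed": [65.0, 69.5, 65.0, 65.0, 65.0, 65.4, 64.9, 66.1, 65.0, 62.5],
    }
)

# Step 1: replace each timestamp with a date-only string so all readings
# from the same day share one key.
df["DateTime"] = df["DateTime"].dt.strftime("%m/%d/%Y")

# Step 2: one row per day, one column per detector, minimum speed in each cell.
daily_min = df.pivot_table(index="DateTime", columns="Det_ID", values="Speed", aggfunc="min")
print(daily_min)
```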

Answer 2

Or use groupby with unstack:

df.DateTime = df.DateTime.dt.strftime('%m/%d/%Y')
df.groupby(['DateTime','Det_ID']).Speed.min().unstack()
Det_ID      1201085  1201110
DateTime                    
01/01/2014     65.0     65.0
01/02/2014     64.9     62.5
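Both answers key on "%m/%d/%Y" strings. Strings sort character by character, so once the data spans more than one year the days come out in month-first order rather than chronologically, which would scramble a heat map's date axis. A minimal sketch with hypothetical readings (not from the thread) shows the effect, then repeats the same groupby/unstack keyed on dt.normalize(), a substitution made here. A "%Y-%m-%d" format string would also sort correctly.

```python
import pandas as pd

# Hypothetical readings spanning a year boundary (not from the question).
df = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(
            ["2013-12-31 07:00:00"] * 2 + ["2014-01-01 07:00:00"] * 2
            + ["2014-01-01 08:00:00"] * 2
        ),
        "Det_ID": [1201085, 1201110] * 3,
        "Speed": [60.0, 61.0, 65.0, 64.0, 63.5, 66.0],
    }
)

# Answer 2's approach: string day key, then groupby + unstack.
by_string = (
    df.assign(Day=df["DateTime"].dt.strftime("%m/%d/%Y"))
    .groupby(["Day", "Det_ID"])["Speed"].min()
    .unstack()
)
print(by_string.index.tolist())  # ['01/01/2014', '12/31/2013']: lexical, not chronological

# Same aggregation keyed on a normalized datetime, which sorts chronologically.
by_day = (
    df.assign(Day=df["DateTime"].dt.normalize())
    .groupby(["Day", "Det_ID"])["Speed"].min()
    .unstack()
)
print(by_day)
```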

2 answers

Answer 1: 2 votes, accepted, 2017-09-10 01:22:36

Answer 2: 1 vote, 2017-09-10 03:18:48
