pandas: get a daily describe of a DataFrame

Question

I have a dataframe that looks like this:

        provider    timestamp                   vehicle_id
id          
103107  a           2019-09-11 20:05:47+02:00   x
1192195 b           2019-09-11 00:02:46+02:00   y
434508  c           2019-09-11 00:32:39+02:00   z
530388  c           2019-09-11 08:12:56+02:00   z
1773721 b           2019-09-11 20:02:55+02:00   w
...

I would like to get some statistics on the different vehicle_ids per day. I have this, which allows me to do a describe manually:

df.groupby(['provider', df['timestamp'].dt.strftime('%Y-%m-%d')])[['vehicle_id']].nunique()

                        vehicle_id
provider    timestamp   
a           2019-09-11  1224
            2019-09-12  1054
b           2019-09-11  2859
            2019-09-12  2761
            2019-09-17  700

How do I wrangle the data so I can get a daily min / max / average for each day? I'm kind of lost; any help is much appreciated.

Answer 1

Try this:

# grouped_df is the result of the groupby(...).nunique() call in the question
aggregations = ['mean', 'min', 'max', 'std']
result = grouped_df.groupby('timestamp')['vehicle_id'].agg(aggregations)

Note: If grouped_df has MultiIndex columns, you might need to flatten them first:

grouped_df.columns = [col[1] if col[1] != '' else col[0] for col in grouped_df.columns]
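As a rough check of the steps above, here is a minimal sketch that rebuilds the five sample rows from the question and runs both groupbys. The name grouped_df follows this answer; the sample frame is only the visible excerpt, not the asker's full data. In this particular case the columns are single-level, so the flattening step is not needed.

```python
import pandas as pd

# The five sample rows shown in the question (only an excerpt of the real data).
df = pd.DataFrame(
    {
        "id": [103107, 1192195, 434508, 530388, 1773721],
        "provider": ["a", "b", "c", "c", "b"],
        "timestamp": pd.to_datetime(
            [
                "2019-09-11 20:05:47+02:00",
                "2019-09-11 00:02:46+02:00",
                "2019-09-11 00:32:39+02:00",
                "2019-09-11 08:12:56+02:00",
                "2019-09-11 20:02:55+02:00",
            ]
        ),
        "vehicle_id": ["x", "y", "z", "z", "w"],
    }
).set_index("id")

# Step 1: distinct vehicle_ids per provider and day (the question's own code).
grouped_df = df.groupby(
    ["provider", df["timestamp"].dt.strftime("%Y-%m-%d")]
)[["vehicle_id"]].nunique()

# Step 2: summarise those per-provider counts for each day.
aggregations = ["mean", "min", "max", "std"]
result = grouped_df.groupby("timestamp")["vehicle_id"].agg(aggregations)
print(result)
```

Here 'timestamp' refers to the index level created by the first groupby, not to the original datetime column.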

Answer 2

Try groupby().agg():

# new_df is the result of the groupby in the question; a list keeps the column order
new_df.groupby('timestamp').vehicle_id.agg(['min', 'max', 'mean'])

Note: Since you only care about one column in your original data, you can select a Series in the first groupby instead of a DataFrame, i.e.,

# note the number of [] around 'vehicle_id'
new_df = (df.groupby(['provider', 
                     df['timestamp'].dt.strftime('%Y-%m-%d')])
          ['vehicle_id'].nunique()
         )

Then new_df is a Series named vehicle_id, and the next command is just

# note the difference before .agg
new_df.groupby('timestamp').agg(['min', 'max', 'mean'])
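To make the Series variant concrete, here is a small sketch on made-up data covering two days. The rows are hypothetical and chosen only so that the per-day numbers are easy to verify by hand; new_df follows this answer's naming, and daily is a name introduced for illustration.

```python
import pandas as pd

# Hypothetical sample spanning two days (not the asker's real data).
df = pd.DataFrame(
    {
        "provider": ["a", "a", "b", "b", "b", "a", "b"],
        "timestamp": pd.to_datetime(
            [
                "2019-09-11 08:00:00+02:00",
                "2019-09-11 09:00:00+02:00",
                "2019-09-11 10:00:00+02:00",
                "2019-09-11 11:00:00+02:00",
                "2019-09-11 12:00:00+02:00",
                "2019-09-12 08:00:00+02:00",
                "2019-09-12 09:00:00+02:00",
            ]
        ),
        "vehicle_id": ["x", "y", "u", "v", "w", "x", "u"],
    }
)

day = df["timestamp"].dt.strftime("%Y-%m-%d")

# Single brackets select a Series, so new_df is a Series named 'vehicle_id'
# indexed by (provider, timestamp).
new_df = df.groupby(["provider", day])["vehicle_id"].nunique()

# Group on the 'timestamp' index level and aggregate the per-provider counts.
daily = new_df.groupby("timestamp").agg(["min", "max", "mean"])
print(daily)
```

On 2019-09-11, provider a has 2 distinct vehicles and provider b has 3, so that day's row is min 2, max 3, mean 2.5.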

Answer 3

If I understand your problem correctly, all you need to do is this:

df.groupby(['provider', df['timestamp'].dt.strftime('%Y-%m-%d')])[['vehicle_id']].nunique()\
  .groupby('timestamp')['vehicle_id'].describe()

The first groupby gives you a DataFrame with the number of unique vehicle_id values per provider and day. For the provided data sample it is:

                     vehicle_id
provider timestamp             
a        2019-09-11           1
b        2019-09-11           2
c        2019-09-11           1

The second groupby then computes the statistics per day, so the result will be

            count      mean      std  min  25%  50%  75%  max
timestamp                                                    
2019-09-11    3.0  1.333333  0.57735  1.0  1.0  1.0  1.5  2.0
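If only some of these statistics are wanted, the describe output can be narrowed by selecting columns. The sketch below wraps the two groupbys in a helper; daily_describe is a hypothetical name introduced here, and the frame is just the five sample rows from the question.

```python
import pandas as pd


def daily_describe(df: pd.DataFrame) -> pd.DataFrame:
    """Per-day summary of the number of distinct vehicle_ids per provider."""
    day = df["timestamp"].dt.strftime("%Y-%m-%d")
    per_provider_day = df.groupby(["provider", day])["vehicle_id"].nunique()
    return per_provider_day.groupby("timestamp").describe()


# The five sample rows shown in the question.
df = pd.DataFrame(
    {
        "provider": ["a", "b", "c", "c", "b"],
        "timestamp": pd.to_datetime(
            [
                "2019-09-11 20:05:47+02:00",
                "2019-09-11 00:02:46+02:00",
                "2019-09-11 00:32:39+02:00",
                "2019-09-11 08:12:56+02:00",
                "2019-09-11 20:02:55+02:00",
            ]
        ),
        "vehicle_id": ["x", "y", "z", "z", "w"],
    }
)

stats = daily_describe(df)

# Keep only the columns the question asked for.
print(stats[["min", "max", "mean"]])
```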


3 answers

Answer 1
1 2019-09-19 13:24:19

Answer 2
1 accepted 2019-09-19 13:25:37

Answer 3
0 2019-09-19 13:48:19
