Pandas TimeGrouper and Pivot?

Question

This is what my dataframe looks like:

  Timestamp               CAT
0 2016-12-02 23:35:28     200
1 2016-12-02 23:37:43     200
2 2016-12-02 23:40:49     300
3 2016-12-02 23:58:53     400
4 2016-12-02 23:59:02     300
...

This is what I'm trying to do in Pandas (notice the timestamps are grouped):

Timestamp BINS         200   300   400   500
2016-12-02 23:30         2     0     0     0
2016-12-02 23:40         0     1     0     0
2016-12-02 23:50         0     1     1     0
...

I'm trying to create bins of 10-minute intervals so I can make a bar graph, with the CAT values as columns, so that I get a count of how many times each CAT occurs within each time bin.

What I have so far can create the time bins:

def create_hist(df, timestamp, freq, fontsize, outfile):
    """ Create a histogram of the number of CATs per time period."""

    df.set_index(timestamp,drop=False,inplace=True)
    to_plot = df[timestamp].groupby(pandas.TimeGrouper(freq=freq)).count()
    ...

But my issue is that I cannot for the life of me figure out how to group by both the CATs and the time bins. My latest try was to call df.pivot(columns="CAT") before doing the groupby, but it just gives me errors:

def create_hist(df, timestamp, freq, fontsize, outfile):
    """ Create a histogram of the number of CATs per time period."""

    df.pivot(columns="CAT")
    df.set_index(timestamp,drop=False,inplace=True)
    to_plot = df[timestamp].groupby(pandas.TimeGrouper(freq=freq)).count()
    ...

Which gives me: ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Answer 1

Using pd.TimeGrouper (deprecated in later pandas releases in favor of pd.Grouper(freq=...)):

df.set_index('Timestamp') \
  .groupby([pd.TimeGrouper('10min'), 'CAT']) \
  .size().unstack(fill_value=0)

CAT                  200  300  400
Timestamp                         
2016-12-02 23:30:00    2    0    0
2016-12-02 23:40:00    0    1    0
2016-12-02 23:50:00    0    1    1
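
For readers on a pandas release where pd.TimeGrouper is no longer available, the same grouping can probably be written with pd.Grouper. This is a minimal sketch, not the answerer's code: the sample frame is rebuilt from the five rows shown in the question, and the variable name counts is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

# pd.Grouper(freq=...) bins the DatetimeIndex into 10-minute intervals;
# grouping by it together with "CAT" counts rows per (bin, category) pair.
counts = (
    df.set_index("Timestamp")
      .groupby([pd.Grouper(freq="10min"), "CAT"])
      .size()
      .unstack(fill_value=0)  # categories become columns, gaps become 0
)
print(counts)
```

The unstack(fill_value=0) step is what turns the (bin, CAT) counts into one column per category.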

Answer 2

You can also use get_dummies and resample:

In [11]: df1 = df.set_index("Timestamp")

In [12]: pd.get_dummies(df1["CAT"])
Out[12]:
                     200  300  400
Timestamp
2016-12-02 23:35:28    1    0    0
2016-12-02 23:37:43    1    0    0
2016-12-02 23:40:49    0    1    0
2016-12-02 23:58:53    0    0    1
2016-12-02 23:59:02    0    1    0

In [13]: pd.get_dummies(df1["CAT"]).resample("10min").sum()
Out[13]:
                     200  300  400
Timestamp
2016-12-02 23:30:00    2    0    0
2016-12-02 23:40:00    0    1    0
2016-12-02 23:50:00    0    1    1
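
The session above can be condensed into a standalone script. This is a sketch under assumptions: the frame is rebuilt from the question's sample rows, and dtype=int is passed explicitly because newer pandas releases return boolean indicator columns from get_dummies by default; the names dummies and counts are illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

# One indicator column per CAT value, indexed by timestamp.
# dtype=int keeps the indicators as 0/1 integers rather than booleans.
dummies = pd.get_dummies(df.set_index("Timestamp")["CAT"], dtype=int)

# Summing the indicators inside each 10-minute bin yields the counts.
counts = dummies.resample("10min").sum()
print(counts)
```

Because resample works on the whole time range, a 10-minute bin with no rows would still appear, filled with zeros.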

Answer 3

IIUC:

In [246]: df.pivot_table(index='Timestamp', columns='CAT', aggfunc='size', fill_value=0) \
            .resample('10T').sum()
Out[246]:
CAT                  200  300  400
Timestamp
2016-12-02 23:30:00    2    0    0
2016-12-02 23:40:00    0    1    0
2016-12-02 23:50:00    0    1    1
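
None of the outputs above contain the 500 column from the question's desired table, since 500 never occurs in the sample rows. A minimal sketch of one way to add it, assuming the full list of categories is known in advance (all_cats is a hypothetical name). It also swaps in a different technique from the answers, pd.crosstab on timestamps floored to 10 minutes, so unlike resample it will not emit rows for empty bins.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-12-02 23:35:28",
        "2016-12-02 23:37:43",
        "2016-12-02 23:40:49",
        "2016-12-02 23:58:53",
        "2016-12-02 23:59:02",
    ]),
    "CAT": [200, 200, 300, 400, 300],
})

# Assumption: the complete set of categories is known up front.
all_cats = [200, 300, 400, 500]

# Floor each timestamp to the start of its 10-minute bin, then
# cross-tabulate bin against category to count occurrences.
bins = df["Timestamp"].dt.floor("10min")
counts = pd.crosstab(bins, df["CAT"])

# Add columns for categories that never occurred, filled with 0.
counts = counts.reindex(columns=all_cats, fill_value=0)
print(counts)
```

The same reindex(columns=..., fill_value=0) call can be appended to any of the results above.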
