Pandas: group by month and year

Question

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month, i.e., group by Jan 2013, Feb 2013, Mar 2013, etc.

I will be using the newly grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum, but I just can't seem to get anything to work. How can I do it?

Answer 1

You can use either resample or Grouper (which resamples under the hood).

First make sure that the datetime column actually holds datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M").sum()  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80
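For readers who want to reproduce the two calls above from the question's raw data, here is a minimal sketch. It makes two assumptions that are not in the answer: it uses the month-start alias "MS" (so bins are labelled 2013-06-01 rather than 2013-06-30), because recent pandas releases warn about "M" and prefer "ME" for month-end; and it passes min_count=1 so that empty months come out as NaN, as in the output above, instead of the 0 that current pandas returns by default.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the strings into real datetimes and move them into the index
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
df1 = df.set_index("Date")

# "MS" = month start (assumption: chosen because it is accepted across versions).
# min_count=1 makes months with no rows sum to NaN instead of 0.
by_grouper = df1.groupby(pd.Grouper(freq="MS")).sum(min_count=1)
by_resample = df1.resample("MS").sum(min_count=1)

print(by_grouper)
```

Both objects should hold one row per calendar month between the first and last date, including the months with no data.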

Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure). I'm including this for interest's sake.

If it's a column (it has to be a datetime64 column! As I say, hit it with to_datetime), you can use a PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex...
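The answer does not show the reindex step. One way it might look, as a sketch rather than the author's own code: build the full range of monthly periods with pd.period_range and reindex the grouped result against it. The names monthly, all_months and filled are made up for illustration.

```python
import pandas as pd

# Data from the question, with Date kept as a column
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Group by monthly period; only months that occur in the data appear
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Every month between the first and last period, including the empty ones
all_months = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")

# Reindexing inserts the missing months as NaN rows
filled = monthly.reindex(all_months)
print(filled)
```

Passing fill_value=0 to reindex would give zeros instead of NaN for the empty months, if that suits the plot better.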

Answer 2

Why not keep it simple?!

GB=DF.groupby([(DF.index.year),(DF.index.month)]).sum()

giving you:

print(GB)
        abc  xyz
2013 6   80  250
     8   40   -5
2014 1   25   15
     2   60   80

and then you can plot as asked using:

GB.plot('abc','xyz',kind='scatter')
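The (year, month) MultiIndex can be awkward as an axis label. A small sketch, assuming DF has a DatetimeIndex as in this answer, that flattens the two levels into "YYYY-MM" strings before plotting; the label format is my own choice, not part of the answer.

```python
import pandas as pd

# Data from the question, indexed by date (DF is the name used in this answer)
DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
        format="%d-%b-%y",
    ),
)

GB = DF.groupby([DF.index.year, DF.index.month]).sum()

# Flatten (year, month) tuples into zero-padded labels such as "2013-06"
GB.index = [f"{int(y)}-{int(m):02d}" for y, m in GB.index]
print(GB)
```

With the flat index, a call such as GB.plot(kind='bar') would put one labelled tick per month on the x axis.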

Answer 3

There are different ways to do that.

I created the data frame below to showcase different techniques for filtering your data.

df = pd.DataFrame({'Date':['01-Jun-13','03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],
                   'abc':[100,-20,40,25,60],'xyz':[200,50,-5,15,80] })

I split out the day, month and year, and also a combined month-year, as you described.

def getMonth(s):
    return s.split("-")[1]

def getDay(s):
    return s.split("-")[0]

def getYear(s):
    return s.split("-")[2]

def getYearMonth(s):
    return s.split("-")[1] + "-" + s.split("-")[2]

I created new columns: year, month, day and YearMonth. In your case, you only need one of the two approaches: group by the two columns 'year' and 'month', or by the single column 'YearMonth'.

df['year'] = df['Date'].apply(getYear)
df['month'] = df['Date'].apply(getMonth)
df['day'] = df['Date'].apply(getDay)
df['YearMonth'] = df['Date'].apply(getYearMonth)

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
3  20-Jan-14   25   15   14   Jan  20    Jan-14
4  21-Feb-14   60   80   14   Feb  21    Feb-14
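The same four columns can also be built without per-row helper functions. A sketch using the vectorised str.split accessor, assuming every Date string has the dd-Mon-yy shape shown in the question; the intermediate name parts is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Split "dd-Mon-yy" into three string columns in one pass
parts = df["Date"].str.split("-", expand=True)
parts.columns = ["day", "month", "year"]

# Attach them in the same order as the table above, then build the combined key
df = df.join(parts[["year", "month", "day"]])
df["YearMonth"] = df["month"] + "-" + df["year"]
print(df)
```

As with the helper functions, the new columns are plain strings, so grouping on them sorts alphabetically rather than chronologically.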

You can iterate over the groups that groupby(..) produces.

In this case, we are grouping by two columns:

# sort=False keeps the groups in order of first appearance, as in the output below
for key, g in df.groupby(['year', 'month'], sort=False):
    print(key, g)

Output:

('13', 'Jun')         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
('13', 'Aug')         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
('14', 'Jan')         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
('14', 'Feb')         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14

In this case, we are grouping by one column:

# sort=False keeps the groups in order of first appearance, as in the output below
for key, g in df.groupby('YearMonth', sort=False):
    print(key, g)

Output:

Jun-13         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
Aug-13         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
Jan-14         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
Feb-14         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14

If you want to access a specific group, you can use get_group:

print(df.groupby('YearMonth').get_group('Jun-13'))

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

A boolean mask works much like get_group: it filters the rows and gives the same result.

print(df[df['YearMonth'] == 'Jun-13'])

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

You can select the abc or xyz values for Jun-13:

print(df[df['YearMonth'] == 'Jun-13'].abc.values)
print(df[df['YearMonth'] == 'Jun-13'].xyz.values)

Output:

[100 -20]  #abc values
[200  50]  #xyz values

You can use this to loop over the dates you have classified as "year-month" and apply criteria to get the related data.

for x in set(df.YearMonth):
    print(df[df['YearMonth'] == x].abc.values)
    print(df[df['YearMonth'] == x].xyz.values)
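The loops above print the raw values per month; the question asks for grouped totals. A sketch of that last step, assuming a YearMonth string column like the one built in this answer (derived here with str.split to keep the example self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# "01-Jun-13" -> "Jun-13": split once on the first dash and keep the remainder
df["YearMonth"] = df["Date"].str.split("-", n=1).str[1]

# sort=False keeps months in order of first appearance instead of alphabetical order
totals = df.groupby("YearMonth", sort=False)[["abc", "xyz"]].sum()
print(totals)
```

Because the keys are strings, sort=False only gives chronological order when the rows are already in date order, as they are here.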

I also recommend checking this answer.

Answer 4

You can also do it by creating a string column with the year and month as follows:

df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')

However, this doesn't preserve the order when you loop over the groups, e.g.

for name, group in grouped:
    print(name)

Will give:

So then, if you want to preserve the order, you must do as suggested by @Q-man above:

grouped = df.groupby([df.index.year, df.index.month])

This will preserve the order in the above loop:

(2007, 11)
(2007, 12)
(2008, 1)
(2008, 2)
(2008, 3)
(2008, 4)
(2008, 5)
(2008, 6)
(2008, 7)
(2008, 8)
(2008, 9)
(2008, 10)
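The ordering problem comes from sorting strings: "2008 10" sorts before "2008 2" because the comparison is character by character. A sketch on made-up monthly data (Nov 2007 to Oct 2008, matching the range in the output above) that shows the effect and one string-based workaround, zero-padding the month with strftime; the column names unpadded and padded are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per month from Nov 2007 to Oct 2008
idx = pd.date_range("2007-11-01", periods=12, freq="MS")
df = pd.DataFrame({"value": range(12)}, index=idx)

# Key as built in this answer: "2008 1", "2008 10", ...
df["unpadded"] = [f"{d.year} {d.month}" for d in df.index]

# Zero-padded key: "2008-01", "2008-10", ...
df["padded"] = df.index.strftime("%Y-%m")

# groupby sorts its keys, so iteration order follows string order
unpadded_order = [name for name, _ in df.groupby("unpadded")]
padded_order = [name for name, _ in df.groupby("padded")]

print(unpadded_order)
print(padded_order)
```

With zero-padding, string order and chronological order coincide, so a single string key can be kept if that is more convenient than a (year, month) tuple.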

Answer 5

Some of the answers use Date as an index instead of a column (and there's nothing wrong with doing that).

However, for anyone who has the dates stored as a column (instead of an index), remember to access the column's dt attribute. That is:

# First make sure `Date` is a datetime column
df['Date'] = pd.to_datetime(
  arg=df['Date'],
  format='%d-%b-%y' # Assuming dd-Mon-yy format
)

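# Group by year and month, summing only the numeric columns
# (recent pandas versions may raise if asked to sum the datetime column)
df.groupby(
  [
    df['Date'].dt.year,
    df['Date'].dt.month
  ]
)[['abc', 'xyz']].sum()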
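Both group keys above inherit the name Date, which can collide when the result is turned back into ordinary columns. A sketch that renames the keys to year and month (names chosen here for illustration) so that reset_index yields a flat table ready for plotting:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Give each key its own name so the index levels are distinguishable
year = df["Date"].dt.year.rename("year")
month = df["Date"].dt.month.rename("month")

# Sum the numeric columns per (year, month), then flatten the index into columns
out = df.groupby([year, month])[["abc", "xyz"]].sum().reset_index()
print(out)
```

The result has one row per (year, month) pair present in the data, in chronological order.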

Answer 1
130 (accepted) 2014-10-30 09:24:40

Answer 2
77 2016-11-23 17:09:48

Answer 3
9

Answer 4
5 2017-11-23 10:35:25

Answer 5
0 2022-09-23 15:51:14