
How can I Group By Month from a Date field with Python/Pandas

I have a DataFrame df as follows:

date               value_1    value_2
2018.07.06           10          0
2018.07.14           20          1
2018.07.27           20          2
2018.08.06           30          1
2018.08.09           40          3
2018.08.13           20          2
2018.09.10           30          1
2018.09.22           50          2
2018.10.09           20          3
2018.10.27           20          1

I need to group the above data by month to get the following output:

date              value_1    value_2
2018.07.01           50          3
2018.08.01           90          6
2018.09.01           80          3
2018.10.01           40          4

How can I do this efficiently in pandas?

Try groupby with pd.Grouper and freq='MS' (month start). The date column must already have a datetime dtype:

df.groupby(pd.Grouper(freq='MS', key='date')).sum().reset_index()

Output:

        date  value_1  value_2
0 2018-07-01       50        3
1 2018-08-01       90        6
2 2018-09-01       80        3
3 2018-10-01       40        4
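
For reference, here is a self-contained sketch of the approach above. It assumes the date column starts out as dot-separated strings, as displayed in the question. The DataFrame construction and the explicit format string are assumptions reconstructed from the sample data, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "date": ["2018.07.06", "2018.07.14", "2018.07.27", "2018.08.06", "2018.08.09",
             "2018.08.13", "2018.09.10", "2018.09.22", "2018.10.09", "2018.10.27"],
    "value_1": [10, 20, 20, 30, 40, 20, 30, 50, 20, 20],
    "value_2": [0, 1, 2, 1, 3, 2, 1, 2, 3, 1],
})

# pd.Grouper(freq=...) needs a datetime column, so parse the strings first
df["date"] = pd.to_datetime(df["date"], format="%Y.%m.%d")

# 'MS' labels each group with the first day of its month
monthly = df.groupby(pd.Grouper(key="date", freq="MS")).sum().reset_index()
print(monthly)
```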

And if you want the dot-separated date format back, you can use this:

df_out = df.groupby(pd.Grouper(freq='MS', key='date')).sum().reset_index()

df_out['date'] = df_out['date'].dt.strftime('%Y.%m.%d')

df_out

Output:

         date  value_1  value_2
0  2018.07.01       50        3
1  2018.08.01       90        6
2  2018.09.01       80        3
3  2018.10.01       40        4

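Convert the column to datetime, then group by each date rolled back to the start of its month. Passing numeric_only=True keeps the sum restricted to the value columns, since recent pandas versions no longer silently drop the datetime column:

df.date = pd.to_datetime(df.date, format='%Y.%m.%d')
df.groupby(df.date + pd.offsets.MonthBegin(-1)).sum(numeric_only=True)

Output: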
            value_1  value_2
date                        
2018-07-01       50        3
2018-08-01       90        6
2018-09-01       80        3
2018-10-01       40        4
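
One caveat with the offset arithmetic: a date that already falls on the first of a month is moved back a full month by MonthBegin(-1). None of the question's dates do, so the result above is unaffected. The sketch below uses hypothetical data to show the effect, along with one possible alternative (converting to a monthly period and back) that keeps such dates in their own month.

```python
import pandas as pd

# Hypothetical data: the first row falls exactly on the first of a month
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-08-09", "2018-09-10"]),
    "value_1": [30, 40, 30],
})

# Adding MonthBegin(-1) moves a date that is already a month start back a full month
shifted = df["date"] + pd.offsets.MonthBegin(-1)

# Converting to a monthly period and back always yields the start of the same month
month_start = df["date"].dt.to_period("M").dt.to_timestamp()

by_month = df.groupby(month_start)["value_1"].sum()
print(by_month)
```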

If you have date as the index (as a DatetimeIndex), it's as simple as resampling.

df.resample('MS').sum()

If you don't have it as the index already, you can call set_index first.

df.set_index('date').resample('MS').sum()

Both give you:

            value_1  value_2
date                        
2018-07-01       50        3
2018-08-01       90        6
2018-09-01       80        3
2018-10-01       40        4
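
One behavior worth knowing: resample emits a row for every month between the first and last date, including months with no rows, and the sum for such a month is 0. The question's data has no gaps, so this does not show up above. The sketch below uses hypothetical data with a missing month. Filtering on the group size is just one way to drop empty months.

```python
import pandas as pd

# Hypothetical data with no rows in August
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-06", "2018-07-14", "2018-09-10"]),
    "value_1": [10, 20, 30],
})

# August appears in the result with a sum of 0
monthly = df.set_index("date").resample("MS").sum()
print(monthly)

# Drop the months that had no rows at all, if they are unwanted
counts = df.set_index("date").resample("MS").size()
non_empty = monthly[counts > 0]
print(non_empty)
```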

Use the dt accessor to get the month from the date column:

import pandas as pd

df = pd.read_csv(r'C:\Users\Tim\Desktop\data.txt')
df['date'] = pd.to_datetime(df['date'])
df.groupby(df['date'].dt.month).sum(numeric_only=True)

This will create the following output:

      value_1  value_2
date                  
7          50        3
8          90        6
9          80        3
10         40        4
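
Note that grouping on dt.month alone pools the same month from different years. This is harmless for the question's single-year data, but it matters for longer ranges. A minimal sketch with hypothetical two-year data follows. The names year and month for the group levels are illustrative choices.

```python
import pandas as pd

# Hypothetical data spanning two years
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-06", "2018-07-14", "2019-07-03"]),
    "value_1": [10, 20, 5],
})

# Month number alone pools July 2018 with July 2019
by_month_only = df.groupby(df["date"].dt.month)["value_1"].sum()

# Grouping on year and month keeps them apart
by_year_month = df.groupby(
    [df["date"].dt.year.rename("year"), df["date"].dt.month.rename("month")]
)["value_1"].sum()
print(by_year_month)
```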
