Find maximum of column for each business quarter in pandas
Assume that I have the following data set:
import datetime

import numpy as np
import pandas as pd

start, end = datetime.datetime(2015, 1, 1), datetime.datetime(2015, 12, 31)
date_list = pd.date_range(start, end, freq='B')  # business days only
numdays = len(date_list)
value = np.random.normal(loc=1e3, scale=50, size=numdays)
ids = np.repeat([1], numdays)
test_df = pd.DataFrame({'Id': ids,
                        'Date': date_list,
                        'Value': value})
I would now like to calculate the maximum of Value within each business quarter for test_df. One possibility is to use resample with rule='BQ', how='max'. However, I'd like to keep the structure of the array and just generate another column holding the maximum for each BQ. Do you have any suggestions on how to do this?
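For comparison, the resample route mentioned in the question is an aggregation: it returns one row per business quarter, which is why it loses the row structure. A minimal sketch, assuming a recent pandas where `how=` is no longer accepted and the aggregation is called as a method instead. It passes the `pd.offsets.BQuarterEnd()` offset object (the offset behind the 'BQ' alias) rather than the string, which sidesteps the alias being renamed to 'BQE' in newer releases. The seeded generator and the values it produces are illustrative, not the question's exact data.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the question's test_df (seeded so runs repeat).
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", "2015-12-31", freq="B")
test_df = pd.DataFrame({"Id": 1,
                        "Date": dates,
                        "Value": rng.normal(loc=1e3, scale=50, size=len(dates))})

# Aggregate per business quarter: the result has one row per quarter,
# labelled by the quarter's last business day, not one row per date.
bq_max = test_df.resample(pd.offsets.BQuarterEnd(), on="Date")["Value"].max()
print(bq_max)
```

Turning this into a per-row column would need a second step that maps each date back to its quarter label, which is the work `transform` does in a single call.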
I think the following should work for you. It groups on the quarter, calls transform on the 'Value' column, and returns the maximum value as a Series whose index is aligned to the original df:
In [26]:
test_df['max'] = test_df.groupby(test_df['Date'].dt.quarter)['Value'].transform('max')
test_df
Out[26]:
           Date  Id        Value          max
0    2015-01-01   1  1005.498555  1100.197059
1    2015-01-02   1  1032.235987  1100.197059
2    2015-01-05   1   986.906171  1100.197059
3    2015-01-06   1   984.473338  1100.197059
...
256  2015-12-25   1   997.965285  1145.215837
257  2015-12-28   1   929.652812  1145.215837
258  2015-12-29   1  1086.128017  1145.215837
259  2015-12-30   1   921.663949  1145.215837
260  2015-12-31   1   938.189566  1145.215837
[261 rows x 4 columns]
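One caveat worth checking: `dt.quarter` is just the quarter number 1 to 4, so on data spanning more than one year it pools, say, Q1 2015 with Q1 2016. Within a single year, calendar-quarter groups line up with the BQ periods for business-day data, because only weekend dates can fall between a quarter's last business day and its calendar end. A minimal sketch of a year-aware key, assuming a hypothetical two-year frame and using `dt.to_period('Q')` (a quarterly Period such as 2015Q1) in place of `dt.quarter`; this is a swapped-in grouping key, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical two-year frame; with dt.quarter alone, Q1 2015 and Q1 2016
# would land in the same group.
rng = np.random.default_rng(1)
dates = pd.date_range("2015-01-01", "2016-12-30", freq="B")
df = pd.DataFrame({"Id": 1,
                   "Date": dates,
                   "Value": rng.normal(loc=1e3, scale=50, size=len(dates))})

# Quarter number only (as in the answer): 4 groups across both years.
by_quarter = df.groupby(df["Date"].dt.quarter)["Value"].transform("max")

# Year-aware quarterly Period key (2015Q1, ..., 2016Q4): 8 groups.
by_period = df.groupby(df["Date"].dt.to_period("Q"))["Value"].transform("max")

# transform keeps the original row index, so the result drops straight in.
df["max"] = by_period
print(df.head())
```

On the question's single-year data both keys produce the same groups, so the answer's output is unaffected.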