In a Pandas DataFrame, how do I calculate the median value for each decile within each month?
I have a DataFrame with 50 data points per month. I'd like to calculate the median value for each decile within each month. In my groupby call I group by the date first, then by qcut. But qcut calculates the bins over the whole dataset, not by month. Here's what I have so far:
import numpy as np
import pandas as pd

# 13 month-end dates, repeated 50 times (50 data points per month)
datecol = pd.date_range('12/31/2018', '12/31/2019', freq='M')
for ii in range(49):
    datecol = datecol.append(pd.date_range('12/31/2018', '12/31/2019', freq='M'))
datecol = datecol.sort_values()
df = pd.DataFrame(np.random.randn(len(datecol), 1), index=datecol, columns=['Data'])

# Problem: pd.qcut here bins the whole 'Data' column at once, not each month separately
dfg = df.groupby([df.index, pd.qcut(df['Data'], 10)])['Data'].median()
I've tried to run qcut on the monthly grouping, but that hasn't worked.
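One way to confirm the problem described above is to count how many of each month's points fall into each global decile. This is a sketch that rebuilds data of the same shape with a fixed seed; the seed, the `pd.offsets.MonthEnd()` frequency and the names `global_bins` and `counts` are choices made here, not taken from the question. If the deciles were computed per month, every cell of `counts` would be exactly 5.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data: 13 month-end dates x 50 points (seed fixed for repeatability)
rng = np.random.default_rng(0)
dates = pd.date_range('2018-12-31', periods=13, freq=pd.offsets.MonthEnd()).repeat(50)
df = pd.DataFrame({'Data': rng.standard_normal(len(dates))}, index=dates)

# Global deciles: the bin edges are computed from all 650 values at once
global_bins = pd.qcut(df['Data'], 10, labels=False)

# Rows = months, columns = global decile 0-9, cells = number of points
counts = pd.crosstab(df.index, global_bins.to_numpy())
print(counts)
```

Each row still sums to 50 and each column to 65, but the individual cells vary from month to month, which is why the medians in `dfg` are not per-month decile medians.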
First, groupby month to create the decile labels within each month. Then groupby month and decile to find the median.
df['q'] = df.groupby(df.index)['Data'].transform(lambda x: pd.qcut(x, 10, labels=False))
df.groupby([df.index, 'q']).median()
                  Data
           q          
2018-12-31 0 -1.592383
           1 -0.959931
           2 -0.662911
           3 -0.421994
           4 -0.098636
           5  0.394583
           6  0.578562
...                ...
2019-12-31 5  0.022384
           6  0.398127
           7  0.562900
           8  0.765605
           9  1.355345

[130 rows x 1 columns]
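Putting the two steps together, a self-contained sketch might look like the following. It assumes a fixed random seed and uses `pd.offsets.MonthEnd()` in place of the `'M'` alias, which pandas 2.2 deprecates in favour of `'ME'`; the names `medians` and `check` are illustrative only. Because each month has 50 points, every decile holds exactly 5 values, so the result can be cross-checked without qcut by sorting each month and taking the middle value of every block of 5.

```python
import numpy as np
import pandas as pd

# Same shape as the question: 13 month-ends x 50 points; seed fixed so the run is repeatable
rng = np.random.default_rng(0)
dates = pd.date_range('2018-12-31', periods=13, freq=pd.offsets.MonthEnd()).repeat(50)
df = pd.DataFrame({'Data': rng.standard_normal(len(dates))}, index=dates)

# Step 1: decile label 0-9, computed separately inside each month
df['q'] = df.groupby(level=0)['Data'].transform(lambda s: pd.qcut(s, 10, labels=False))

# Step 2: median of each (month, decile) cell
medians = df.groupby([df.index, 'q'])['Data'].median()
print(medians)

# Cross-check: rows are already in month order, so reshape to (month, point),
# sort each month, split into 10 blocks of 5 and take each block's middle value
check = np.sort(df['Data'].to_numpy().reshape(13, 50), axis=1).reshape(13, 10, 5)[:, :, 2]
```

Using `transform` rather than `apply` returns a result with the same row index as `df`, so it can be assigned straight back as the `q` column regardless of how the pandas version handles `group_keys`.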