Exclude a column from calculated value
I'm new to the library and am trying to figure out how to add columns to a pivot table that hold the mean and standard deviation of each row over the last three months of transaction data.
Here's the code that sets up the pivot table:
import numpy as np

# df and the three prev_month*_for_analysis values are assumed to be defined earlier
previousThreeMonths = [prev_month_for_analysis, prev_month2_for_analysis, prev_month3_for_analysis]
dfPreviousThreeMonths = df[df['Month'].isin(previousThreeMonths)]
# Called as a method, so the frame itself is not passed again as an argument
ptHistoricalConsumption = dfPreviousThreeMonths.pivot_table(
    index=['Customer Part #'],
    columns=['Month'],
    aggfunc={'Qty Shp': np.sum},
)
ptHistoricalConsumption['Mean'] = ptHistoricalConsumption.mean(numeric_only=True, axis=1)
ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.std(numeric_only=True, axis=1)
ptHistoricalConsumption
The resulting pivot table looks like this:
The problem is that the standard deviation column includes the Mean in its calculation, whereas I just want it to use the raw data for the previous three months. For example, the Std Dev of part number 2225 should be 11.269, not 9.2.
I'm sure there's a better way to do this and I'm just missing something.
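The 9.2 figure is consistent with the Mean column being counted as a fourth observation. Adding a row's own mean to the sample leaves the mean unchanged and adds no squared deviation, so only the divisor of the sample standard deviation changes (n - 1 goes from 2 to 3). The result shrinks by a factor of sqrt(2/3), and 11.269 × sqrt(2/3) ≈ 9.20. A minimal sketch of that effect, using made-up quantities rather than the data from the question:

```python
import math

import pandas as pd

# Hypothetical monthly quantities for one part (not the asker's data).
months = pd.Series([10.0, 20.0, 36.0])

mean = months.mean()
std_correct = months.std()  # sample std (ddof=1) over the three months only

# Treat the mean as if it were a fourth month, as the question's code does.
with_mean = pd.concat([months, pd.Series([mean])], ignore_index=True)
std_wrong = with_mean.std()

# Only the divisor changed, so the two results differ by sqrt(2/3).
shrink = math.sqrt(2 / 3)
print(std_correct, std_wrong, std_correct * shrink)
```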
One way would be to remove the Mean column temporarily before calling .std():
ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.drop('Mean', axis=1).std(numeric_only=True, axis=1)
That wouldn't remove it from the pivot table permanently; it would just remove it from the copy fed to .std().
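An alternative that avoids the ordering problem altogether is to record the month columns before adding any derived columns, and compute every statistic from that fixed list. The sketch below is only an illustration: the data and month labels are invented, `month_cols` and `std_via_drop` are hypothetical names, and the pivot call passes `values='Qty Shp'` with `aggfunc='sum'`, which is a small departure from the call in the question.

```python
import pandas as pd

# Hypothetical transaction data; column names follow the question,
# the values and month labels are invented for illustration.
df = pd.DataFrame({
    "Customer Part #": ["2225", "2225", "2225", "3100", "3100", "3100"],
    "Month": ["2023-01", "2023-02", "2023-03"] * 2,
    "Qty Shp": [10, 20, 36, 5, 5, 8],
})

pt = df.pivot_table(index="Customer Part #", columns="Month",
                    values="Qty Shp", aggfunc="sum")

# Capture the raw monthly columns before any derived column exists,
# then compute each statistic from those columns only.
month_cols = list(pt.columns)
pt["Mean"] = pt[month_cols].mean(axis=1)
pt["Std Dev"] = pt[month_cols].std(axis=1)

# The drop-based approach gives the same numbers, as long as every
# derived column is dropped from the copy passed to .std().
std_via_drop = pt.drop(columns=["Mean", "Std Dev"]).std(axis=1)
print(pt)
```

With a fixed column list, the order in which Mean and Std Dev are added no longer matters, and further derived columns can be added later without affecting either statistic.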