Exclude a column from calculated value

Question

I'm new to the library and am trying to figure out how to add columns to a pivot table with the mean and standard deviation of the row data for the last three months of transaction data.

Here's the code that sets up the pivot table:

previousThreeMonths = [prev_month_for_analysis, prev_month2_for_analysis, prev_month3_for_analysis]
dfPreviousThreeMonths = df[df['Month'].isin(previousThreeMonths)]

ptHistoricalConsumption = dfPreviousThreeMonths.pivot_table(dfPreviousThreeMonths,
                                                            index=['Customer Part #'],
                                                            columns=['Month'],
                                                            aggfunc={'Qty Shp':np.sum}
                                                            )

ptHistoricalConsumption['Mean'] = ptHistoricalConsumption.mean(numeric_only=True, axis=1)
ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.std(numeric_only=True, axis=1)
ptHistoricalConsumption

The resulting pivot table looks like this:

The problem is that the standard deviation column is including the Mean in its calculations , whereas I just want it to use the raw data for the previous three months. For example, the Std Dev of part number 2225 should be 11.269, not 9.2.

I'm sure there's a better way to do this and I'm just missing something.

Answer 1

One way would be to remove the Mean column temporarily before call .std() :

ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.drop('Mean', axis=1).std(numeric_only=True, axis=1)

That wouldn't remove it from the permanently, it would just remove it from the copy fed to .std() .

Exclude a column from calculated value

Question

1 answers

solution1
0 2021-11-20 15:30:32

Exclude a column from calculated value

Question

1 answers

solution1 0 2021-11-20 15:30:32

solution1
0 2021-11-20 15:30:32