简体   繁体   中英

Exclude a column from calculated value

I'm new to the library and am trying to figure out how to add columns to a pivot table with the mean and standard deviation of the row data for the last three months of transaction data.

Here's the code that sets up the pivot table:

previousThreeMonths = [prev_month_for_analysis, prev_month2_for_analysis, prev_month3_for_analysis]
dfPreviousThreeMonths = df[df['Month'].isin(previousThreeMonths)]

ptHistoricalConsumption = dfPreviousThreeMonths.pivot_table(dfPreviousThreeMonths,
                                                            index=['Customer Part #'],
                                                            columns=['Month'],
                                                            aggfunc={'Qty Shp':np.sum}
                                                            )

ptHistoricalConsumption['Mean'] = ptHistoricalConsumption.mean(numeric_only=True, axis=1)
ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.std(numeric_only=True, axis=1)
ptHistoricalConsumption

The resulting pivot table looks like this: 数据透视表

The problem is that the standard deviation column is including the Mean in its calculations , whereas I just want it to use the raw data for the previous three months. For example, the Std Dev of part number 2225 should be 11.269, not 9.2.

I'm sure there's a better way to do this and I'm just missing something.

One way would be to remove the Mean column temporarily before call .std() :

ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.drop('Mean', axis=1).std(numeric_only=True, axis=1)

That wouldn't remove it from the permanently, it would just remove it from the copy fed to .std() .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM