
Exclude a column from calculated value

I'm new to the library and am trying to figure out how to add columns to a pivot table holding the mean and standard deviation of each row's data for the last three months of transactions.

Here's the code that sets up the pivot table:

# df and the three prev_month*_for_analysis values are defined earlier
previousThreeMonths = [prev_month_for_analysis, prev_month2_for_analysis, prev_month3_for_analysis]
dfPreviousThreeMonths = df[df['Month'].isin(previousThreeMonths)]

# Sum the shipped quantity per part number, with one column per month
ptHistoricalConsumption = dfPreviousThreeMonths.pivot_table(index=['Customer Part #'],
                                                            columns=['Month'],
                                                            aggfunc={'Qty Shp': 'sum'}
                                                            )

ptHistoricalConsumption['Mean'] = ptHistoricalConsumption.mean(numeric_only=True, axis=1)
# The table already contains 'Mean' here, so .std() includes it (the problem described below)
ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.std(numeric_only=True, axis=1)
ptHistoricalConsumption

The resulting pivot table looks like this: [image: pivot table]

The problem is that the standard deviation column includes the Mean column in its calculation, whereas I want it to use only the raw data for the previous three months. For example, the Std Dev of part number 2225 should be 11.269, not 9.2.

I'm sure there's a better way to do this and I'm just missing something.

One way would be to remove the Mean column temporarily before calling .std():

ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.drop('Mean', axis=1).std(numeric_only=True, axis=1)

That wouldn't remove it from the pivot table permanently; it would only remove it from the copy passed to .std().
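As a rough illustration of the same idea, here is a minimal sketch that records the month columns before any derived column is added and computes both statistics from those columns only. The data, part numbers, month labels, and the names `pt`, `month_cols`, and `std_with_mean` are all made up for the example; only the column names 'Customer Part #', 'Month', and 'Qty Shp' come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: part numbers, month labels, and quantities are invented.
df = pd.DataFrame({
    "Customer Part #": ["A", "A", "A", "B", "B", "B"],
    "Month": ["2023-01", "2023-02", "2023-03"] * 2,
    "Qty Shp": [10, 20, 30, 5, 5, 8],
})

# One row per part number, one column per month, summed quantities.
pt = df.pivot_table(index="Customer Part #", columns="Month",
                    values="Qty Shp", aggfunc="sum")

# Remember the month columns before adding any derived columns.
month_cols = list(pt.columns)

# Compute both statistics from the month columns only.
pt["Mean"] = pt[month_cols].mean(axis=1)
pt["Std Dev"] = pt[month_cols].std(axis=1)

# For comparison only: the value obtained when 'Mean' is mistakenly included.
std_with_mean = pt[month_cols + ["Mean"]].std(axis=1)

print(pt)
print(std_with_mean)
```

With three months of data and the default sample standard deviation (ddof=1), treating the mean as a fourth value leaves the sum of squared deviations unchanged but raises the divisor from 2 to 3, so the result shrinks by a factor of sqrt(2/3), about 0.816. That is consistent with the figures in the question: 11.269 × 0.816 is roughly 9.2.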

