Storing all values when creating a pandas pivot table

Question

Basically, I'm aggregating prices over three indices to determine the mean, the standard deviation, and an upper and lower limit. So far so good. However, I now also want to find the lowest observed price that is still >= the computed lower limit.

My first idea was to use np.min to find the lowest price, but that ignores the lower limit, so it is not useful. Now I'm trying to store all the values that went into each pivot-table group, so that I can find the lowest price that is still >= the lower limit. Any ideas?

pivot = pd.pivot_table(temp, index=['A', 'B', 'C'], values=['price'],
                       aggfunc=[np.mean, np.std], fill_value=0)

pivot['lower_limit'] = pivot['mean'] - 2 * pivot['std']
pivot['upper_limit'] = pivot['mean'] + 2 * pivot['std']

Answer 1

First, merge pivoted['lower_limit'] back into temp. (Here pivoted is the pivot table, called pivot in the question, and ABC is the list of key columns ['A', 'B', 'C'].) That way, each price in temp also has its group's lower_limit value.

temp = pd.merge(temp, pivoted['lower_limit'].reset_index(), on=ABC)
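To make the effect of this merge concrete, here is a minimal sketch on a tiny hand-made frame. The names prices and limits and all the numbers are made up for illustration; limits simply stands in for pivoted['lower_limit'].reset_index().

```python
import pandas as pd

ABC = ['A', 'B', 'C']

# Hypothetical raw data: two rows in group (0, 0, 0), one row in group (1, 1, 0).
prices = pd.DataFrame({
    'A': [0, 0, 1],
    'B': [0, 0, 1],
    'C': [0, 0, 0],
    'price': [1.5, 2.5, 9.0],
})

# Hypothetical per-group limits, standing in for
# pivoted['lower_limit'].reset_index(): one row per (A, B, C) group.
limits = pd.DataFrame({
    'A': [0, 1],
    'B': [0, 1],
    'C': [0, 0],
    'lower_limit': [2.0, 8.0],
})

# Every row of prices picks up the lower_limit of its own group.
merged = pd.merge(prices, limits, on=ABC)
print(merged)
```

The row count is unchanged; the merge only adds a lower_limit column, so row-wise comparisons between price and lower_limit become possible.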

Then you can restrict your attention to those rows in temp for which the price is >= lower_limit:

temp.loc[temp['price'] >= temp['lower_limit']]

The desired result can then be found by computing a groupby/min:

result = temp.loc[temp['price'] >= temp['lower_limit']].groupby(ABC)['price'].min()

For example,

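import numpy as np
import pandas as pd

# Build reproducible sample data: three binary key columns and a random price.
np.random.seed(2017)
N = 1000
ABC = list('ABC')
temp = pd.DataFrame(np.random.randint(2, size=(N, 3)), columns=ABC)
temp['price'] = np.random.random(N)

# Per-group mean and standard deviation, then limits at +/- 2 standard deviations.
pivoted = pd.pivot_table(temp, index=['A', 'B', 'C'], values=['price'],
                         aggfunc=[np.mean, np.std], fill_value=0)
pivoted['lower_limit'] = pivoted['mean'] - 2 * pivoted['std']
pivoted['upper_limit'] = pivoted['mean'] + 2 * pivoted['std']

# Attach each group's lower_limit to its rows, then take the smallest
# price per group among the rows that are not below that limit.
temp = pd.merge(temp, pivoted['lower_limit'].reset_index(), on=ABC)
result = temp.loc[temp['price'] >= temp['lower_limit']].groupby(ABC)['price'].min()
print(result)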

yields

A  B  C
0  0  0    0.003628
      1    0.000132
   1  0    0.005833
      1    0.000159
1  0  0    0.006203
      1    0.000536
   1  0    0.001745
      1    0.025713
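As a possible alternative, not part of the answer above, the per-group limit can also be computed directly on the rows with groupby/transform, which avoids the merge step. The sketch below uses made-up data small enough to check by hand; the names keys, grouped and plain_min are illustrative, and fillna(0) is an assumption meant to mirror fill_value=0 in the pivot table for groups with a single row.

```python
import pandas as pd

keys = ['A', 'B', 'C']

# Made-up data: group (0, 0, 0) has nine prices of 10.0 and one outlier of 1.0;
# group (1, 0, 0) has the prices 4.0, 5.0 and 6.0.
temp = pd.DataFrame({
    'A': [0] * 10 + [1] * 3,
    'B': [0] * 13,
    'C': [0] * 13,
    'price': [10.0] * 9 + [1.0] + [4.0, 5.0, 6.0],
})

grouped = temp.groupby(keys)['price']

# transform returns one value per original row, aligned with temp,
# so no merge is needed. The std of a single-row group is NaN, hence fillna(0).
lower_limit = grouped.transform('mean') - 2 * grouped.transform('std').fillna(0)

# Lowest price per group among the rows at or above the group's lower limit.
result = temp.loc[temp['price'] >= lower_limit].groupby(keys)['price'].min()

# For comparison: the plain minimum, which ignores the limit.
plain_min = grouped.min()

print(result)
print(plain_min)
```

In group (0, 0, 0) the mean is 9.1 and the sample standard deviation is about 2.85, so the lower limit is about 3.41. The outlier 1.0 falls below it and is skipped, and the result for that group is 10.0, whereas the plain minimum is 1.0. In group (1, 0, 0) the lower limit is 3.0 and both approaches give 4.0.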

Answer 1: accepted, 2017-06-18 14:14:00