Storing all values when creating a Pandas Pivot Table
Basically, I'm aggregating prices over three indices to determine the mean, the standard deviation, and an upper and lower limit. So far so good. However, now I also want to find the lowest observed price that is still >= the computed lower limit.
My first idea was to use np.min to find the lowest price, but this obviously disregards the lower limit and is not useful. Now I'm trying to store all the values the pivot table aggregated so that I can find the lowest price that is still >= the lower limit. Any ideas?
pivot = pd.pivot_table(temp, index=['A','B','C'],values=['price'], aggfunc=[np.mean,np.std],fill_value=0)
pivot['lower_limit'] = pivot['mean'] - 2 * pivot['std']
pivot['upper_limit'] = pivot['mean'] + 2 * pivot['std']
First, merge pivoted['lower_limit'] back into temp. Thus, for each price in temp there is also a lower_limit value.
temp = pd.merge(temp, pivoted['lower_limit'].reset_index(), on=ABC)
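To see what this merge produces, here is a minimal sketch on toy data. The `limits` frame and its values are hypothetical stand-ins for `pivoted['lower_limit'].reset_index()`; only `pd.merge` with `on=ABC` comes from the answer itself.

```python
import pandas as pd

ABC = ['A', 'B', 'C']

# Hypothetical toy data: two groups keyed by A, B, C.
temp = pd.DataFrame({
    'A': [0, 0, 0, 1, 1],
    'B': [0, 0, 0, 1, 1],
    'C': [0, 0, 0, 0, 0],
    'price': [1.0, 2.0, 3.0, 10.0, 20.0],
})

# Stand-in for pivoted['lower_limit'].reset_index(): one limit per group.
# The limit values here are made up for illustration.
limits = pd.DataFrame({
    'A': [0, 1],
    'B': [0, 1],
    'C': [0, 0],
    'lower_limit': [1.5, 5.0],
})

# Every row of temp picks up the lower_limit of its own (A, B, C) group.
merged = pd.merge(temp, limits, on=ABC)
print(merged)
```

Each group's single limit is repeated on every row of that group, which is what makes the row-wise comparison in the next step possible.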
Then you can restrict your attention to those rows in temp for which the price is >= lower_limit:
temp.loc[temp['price'] >= temp['lower_limit']]
The desired result can be found by computing a groupby/min:
result = temp.loc[temp['price'] >= temp['lower_limit']].groupby(ABC)['price'].min()
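As a possible alternative (not part of the original answer), the per-row limit can also be computed with `groupby(...).transform`, which skips the pivot and merge steps. This is only a sketch on made-up data; the `fillna(0)` is an assumption meant to mirror `fill_value=0` for single-row groups, and `alt_result` is a hypothetical name.

```python
import pandas as pd

ABC = ['A', 'B', 'C']

# Hypothetical toy data: group (0, 0, 0) contains one low outlier (0.0).
temp = pd.DataFrame({
    'A': [0] * 10 + [1] * 3,
    'B': [0] * 13,
    'C': [0] * 13,
    'price': [0.0] + [10.0] * 9 + [5.0, 6.0, 7.0],
})

grouped = temp.groupby(ABC)['price']

# transform broadcasts each group's statistic back onto its rows,
# so `lower` is aligned with temp row by row.
std = grouped.transform('std').fillna(0)  # assumption: mimic fill_value=0
lower = grouped.transform('mean') - 2 * std

# Keep only prices at or above their group's lower limit, then take the min.
alt_result = temp.loc[temp['price'] >= lower].groupby(ABC)['price'].min()
print(alt_result)
```

In this toy data the plain per-group minimum of group (0, 0, 0) would be the outlier 0.0, whereas the filtered minimum is 10.0.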
For example,
import numpy as np
import pandas as pd
np.random.seed(2017)
N = 1000
ABC = list('ABC')
temp = pd.DataFrame(np.random.randint(2, size=(N,3)), columns=ABC)
temp['price'] = np.random.random(N)
pivoted = pd.pivot_table(temp, index=['A','B','C'],values=['price'],
                         aggfunc=[np.mean,np.std],fill_value=0)
pivoted['lower_limit'] = pivoted['mean'] - 2 * pivoted['std']
pivoted['upper_limit'] = pivoted['mean'] + 2 * pivoted['std']
temp = pd.merge(temp, pivoted['lower_limit'].reset_index(), on=ABC)
result = temp.loc[temp['price'] >= temp['lower_limit']].groupby(ABC)['price'].min()
print(result)
yields
A  B  C
0  0  0    0.003628
      1    0.000132
   1  0    0.005833
      1    0.000159
1  0  0    0.006203
      1    0.000536
   1  0    0.001745
      1    0.025713
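If the goal is to keep this minimum next to the mean, std, and limits in one table, one option is sketched below. It swaps in `groupby(...).agg` for `pivot_table` so that the column names stay flat; that substitution, the toy data, and the names `summary` and `min_in_range` are assumptions for illustration, not something the original answer uses.

```python
import pandas as pd

ABC = ['A', 'B', 'C']

# Hypothetical toy data: group (0, 0, 0) contains one low outlier (0.0).
temp = pd.DataFrame({
    'A': [0] * 10 + [1] * 3,
    'B': [0] * 13,
    'C': [0] * 13,
    'price': [0.0] + [10.0] * 9 + [5.0, 6.0, 7.0],
})

# groupby/agg gives flat columns 'mean' and 'std', indexed by (A, B, C).
# fillna(0) is an assumption standing in for pivot_table's fill_value=0.
summary = temp.groupby(ABC)['price'].agg(['mean', 'std']).fillna(0)
summary['lower_limit'] = summary['mean'] - 2 * summary['std']
summary['upper_limit'] = summary['mean'] + 2 * summary['std']

# Same merge / filter / groupby-min steps as in the answer.
with_limit = temp.merge(summary['lower_limit'].reset_index(), on=ABC)
in_range = with_limit.loc[with_limit['price'] >= with_limit['lower_limit']]

# The result is indexed by (A, B, C), so it aligns with summary on assignment.
summary['min_in_range'] = in_range.groupby(ABC)['price'].min()
print(summary)
```

Because the assignment aligns on the (A, B, C) index, a group with no qualifying price would simply get NaN in `min_in_range`.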