Time series conditional rolling mean in 1 pandas dataframe
I am currently trying to solve a conditional rolling average. I have created a simplified data set to demonstrate: in this data set, we have 3 stores and 2 products, and their sold quantities over 4 days.
(Picture of the dataset; link to download the dataset.)
Since the real data set includes thousands of stores and hundreds of products, I am trying to calculate a rolling mean for each store/product combination within the same dataframe.
Using the code below, I can calculate the rolling average per line, the same way other data scientists calculate a 10-day or 20-day moving average for a share price:
import pandas as pd
df = pd.read_csv(r'path\ConditionalRollingMean.csv')
df['Rolling_Mean'] = df.Quantity.rolling(2).mean()
or even
df['Rolling_Mean'] = df.Quantity.rolling(window=2).mean()
The issue with this approach is that the calculation is done line by line, regardless of the store/product combination. What I am looking for is a conditional rolling mean that keeps track of the store/product combinations while going through the dataframe, and populates a df['Rolling_Mean'] column line by line (something like this).
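To make the problem concrete, here is a minimal sketch on hypothetical toy data (the column names StoreNb, Product, Day and Quantity are assumptions). A plain rolling(2) window just pairs each row with the row above it, so at the boundary between two stores the mean mixes their quantities:

```python
import pandas as pd

# Hypothetical toy data in long format: one row per store/product/day.
# Column names (StoreNb, Product, Day, Quantity) are assumptions.
df = pd.DataFrame({
    'StoreNb':  [1, 1, 2, 2],
    'Product':  ['A', 'A', 'A', 'A'],
    'Day':      [1, 2, 1, 2],
    'Quantity': [10, 20, 100, 200],
})

# Plain rolling mean: each window is the current row plus the row above it,
# whatever store/product those rows belong to.
df['Rolling_Mean'] = df['Quantity'].rolling(2).mean()
print(df)
```

Here the third row (store 2, day 1) gets (20 + 100) / 2 = 60.0, which averages store 1's day 2 with store 2's day 1.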
This rolling average will then be used for a rolling standard deviation calculation. So far I have only figured out how to compute it across the whole dataframe, without the rolling aspect:
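# Per store/product mean and standard deviation over all days (not rolling).
df['mean'] = df.groupby(['StoreNb', 'Product'])['Quantity'].transform('mean')
df['std'] = df.groupby(['StoreNb', 'Product'])['Quantity'].transform('std')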
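A minimal sketch of the rolling version of the two lines above, assuming the same hypothetical column names (StoreNb, Product, Day, Quantity): wrapping rolling(2) inside groupby(...).transform() restarts the window for every store/product combination and returns a result aligned with the original rows. It also assumes one row per day with no gaps, since rolling(2) counts rows, not days.

```python
import pandas as pd

# Hypothetical toy data; stores are interleaved to show that grouping matters.
df = pd.DataFrame({
    'StoreNb':  [1, 2, 1, 2, 1, 2],
    'Product':  ['A'] * 6,
    'Day':      [1, 1, 2, 2, 3, 3],
    'Quantity': [10, 100, 20, 300, 40, 200],
})

# Make sure each group's rows are in chronological order before rolling.
df = df.sort_values(['StoreNb', 'Product', 'Day'])

grouped = df.groupby(['StoreNb', 'Product'])['Quantity']
# transform() returns a result indexed like df, so it can be assigned
# straight back as a new column.
df['Rolling_Mean'] = grouped.transform(lambda s: s.rolling(2).mean())
df['Rolling_Std'] = grouped.transform(lambda s: s.rolling(2).std())
print(df.sort_index())
```

The first row of each group is NaN because a 2-row window is not yet full; passing min_periods=1 to rolling() would change that if a partial window is acceptable.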
It would be simpler to split the stores/products into separate dataframes and then run df.Quantity.rolling(2).mean() on each, but in my case that would mean creating more than 150,000 dataframes. That is why I am trying to solve this inside a single dataframe.
Thank you in advance for your help.
I'm not 100% sure this is what you wanted, but I just iterated over the dataframe's rows and used if conditionals to channel the rolling mean.
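import pandas as pd

data = pd.read_csv('ConditionalRollingMean.csv')
data['rolling_mean'] = 0.0  # float column, since the means are fractional

# Running count and running total for the store 1 / product A combination.
nstore = 0
nquant = 0
for i in range(len(data)):
    q = data['Quantity'][i]
    p = data['Product'][i]
    s = data['StoreNb'][i]
    if s == 1.0 and p == 'A':
        nstore += 1
        nquant += q
        data.loc[i, 'rolling_mean'] = nquant / nstore
    elif nstore > 0:
        # Other rows carry forward the latest mean for store 1 / product A;
        # rows before the first match keep 0.0 (avoids dividing by zero).
        data.loc[i, 'rolling_mean'] = nquant / nstore
print(data)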
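Note that this loop does not use a two-row window: for store 1 / product A it divides the total of all matching rows so far by their count, i.e. a cumulative (expanding) mean, carried forward on non-matching rows. A minimal vectorized sketch of the same calculation, on hypothetical toy data with the StoreNb/Product/Quantity columns the loop reads, with rows before the first match set to 0.0:

```python
import pandas as pd

# Hypothetical toy data using the column names the loop reads.
data = pd.DataFrame({
    'StoreNb':  [2, 1, 2, 1, 1],
    'Product':  ['A', 'A', 'B', 'A', 'B'],
    'Quantity': [5, 10, 7, 20, 3],
})

# Rows belonging to the single combination the loop tracks.
mask = (data['StoreNb'] == 1) & (data['Product'] == 'A')

# Running total and running count of matching rows, evaluated at every row.
running_sum = data['Quantity'].where(mask, 0).cumsum()
running_count = mask.cumsum()

# Cumulative mean; 0 / 0 gives NaN before the first match, replaced by 0.0.
data['rolling_mean'] = (running_sum / running_count).fillna(0.0)
print(data)
```

Because cumsum() runs over the whole column at once, this avoids the per-row data.loc assignments, which get slow on large frames.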
EDIT: I wrote a version that finds all store/product combinations in the dataframe and creates a dedicated rolling mean column for each combination. I hope that's what you really want, because the Cartesian product of thousands of stores and hundreds of products is pretty big:
import itertools as it

import pandas as pd

data = pd.read_csv('ConditionalRollingMean.csv')

# Obtain all unique stores and products and find their Cartesian product.
stores = set(data['StoreNb'].dropna())
products = set(data['Product'].dropna())
combs = it.product(stores, products)

# Iterate over every store/product combination and calculate its rolling mean.
for store, product in combs:
    # Set a new, empty column for this combination.
    name = 'rm' + str(store) + str(product)
    data[name] = 0.0
    # Starting values for the rolling mean.
    nstore = 0
    nquant = 0
    # Iterate over the rows and use conditional checks to funnel results
    # into the appropriate rolling mean column.
    for i in range(len(data)):
        q = data['Quantity'][i]
        p = data['Product'][i]
        s = data['StoreNb'][i]
        if s == store and p == product:
            nstore += 1
            nquant += q
            data.loc[i, name] = nquant / nstore
        elif nstore > 0:
            data.loc[i, name] = nquant / nstore
        # Rows before the combination's first occurrence keep 0.0.

# Write the dataframe to a new file.
data.to_csv('res.csv')
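The nested loop above visits every row once per combination, which becomes expensive with thousands of stores and hundreds of products. A minimal sketch of one vectorized alternative (a swapped-in technique, not the answer's code), on hypothetical toy data: pivot Quantity into one column per store/product, then divide cumulative sums by cumulative counts. Unlike the loop, it only creates columns for combinations that actually occur in the data.

```python
import pandas as pd

# Hypothetical toy data using the column names the loop reads.
data = pd.DataFrame({
    'StoreNb':  [1, 2, 1, 2, 1],
    'Product':  ['A', 'A', 'B', 'A', 'A'],
    'Quantity': [10, 100, 7, 300, 20],
})

# Column name per row, mirroring the 'rm' + store + product naming above.
key = 'rm' + data['StoreNb'].astype(str) + data['Product'].astype(str)

# Spread Quantity into one column per combination (index = original rows);
# cells for rows of other combinations are NaN.
wide = data.assign(key=key).pivot(columns='key', values='Quantity')

# Cumulative sum and cumulative count per column, ignoring the NaN cells.
running_sum = wide.fillna(0).cumsum()
running_count = wide.notna().cumsum()

# Cumulative mean per combination; 0.0 before a combination first appears.
rolling = (running_sum / running_count).fillna(0.0)
data = data.join(rolling)
print(data)
```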
Hope this helps.
The solution I'll be using is as follows:
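# groupby().rolling() puts Store and Product in front of the row index;
# dropping those two levels lets the result align with df's own index.
df["Mean"] = df.groupby(['Store', 'Product'])['Quantity'].rolling(2).mean().reset_index(level=[0, 1], drop=True)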
It gives me the output I wanted. Thank you for your input.
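A minimal sketch of why the result of groupby(...).rolling(2).mean() needs realigning before it is stored as a column, on hypothetical toy data using the Store/Product/Quantity names from the line above. The rolled Series is indexed by (Store, Product, original row label), so the two group-key levels are dropped to match the dataframe's index. It also assumes rows are already in chronological order within each group, because the window counts rows.

```python
import pandas as pd

# Hypothetical toy data; rows are in chronological order within each store.
df = pd.DataFrame({
    'Store':    [1, 2, 1, 2, 1, 2],
    'Product':  ['A'] * 6,
    'Quantity': [10, 100, 20, 300, 40, 200],
})

# Result is indexed by (Store, Product, original row label).
rolled = df.groupby(['Store', 'Product'])['Quantity'].rolling(2).mean()
print(rolled.index.nlevels)  # 3 levels

# Drop the two group-key levels so the labels match df's index again.
df['Mean'] = rolled.reset_index(level=[0, 1], drop=True)
print(df)
```

Replacing .mean() with .std() in the same expression gives the rolling standard deviation per store/product.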