I am trying to compute a conditional rolling average. To demonstrate, I have created a simplified data set with 3 stores and 2 products, and the quantities sold over 4 days.
(Picture of the dataset; link to download the dataset.)
Since the real data set includes thousands of stores and hundreds of products, I want to calculate a rolling mean for each store/product combination within the same dataframe.
With the code below, I can calculate the rolling average row by row, the same way other data scientists calculate a 10-day or 20-day moving average for a share price:
import pandas as pd
df = pd.read_csv(r'path\ConditionalRollingMean.csv')
df['Rolling_Mean'] = df.Quantity.rolling(2).mean()
or even
df['Rolling_Mean'] = df.Quantity.rolling(window=2).mean()
The issue with this approach is that the calculation runs row by row, regardless of the store/product combination. What I am looking for is a conditional rolling mean that keeps track of the store/product combinations while going through the dataframe and populates a df['Rolling_Mean'] column row by row.
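To make the problem concrete, here is a minimal sketch on made-up data. The column names (StoreNb, Product, Quantity) and the values are assumptions for illustration, not the real file:

```python
import pandas as pd

# Hypothetical toy data: two stores selling the same product, two days each.
df = pd.DataFrame({
    "StoreNb": [1, 1, 2, 2],
    "Product": ["A", "A", "A", "A"],
    "Quantity": [10, 20, 100, 200],
})

# Plain rolling mean: the window slides over consecutive rows and ignores the store.
df["Rolling_Mean"] = df["Quantity"].rolling(2).mean()

# Row 2 is the first row of store 2, but its window also includes the last row of store 1.
leaked = df.loc[2, "Rolling_Mean"]
print(df)
```

Here the value on the first row of store 2 mixes quantities from both stores, which is exactly what I want to avoid.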
This rolling average will then be used for a rolling standard deviation calculation. So far I have only figured out how to compute the standard deviation across the whole dataframe, without the rolling aspect:
df['mean']=df.groupby(['Quantity']).Qty.transform('mean')
df['std']=df.groupby(['Quantity']).Qty.transform('std')
It would be simpler to split the stores/products into separate dataframes and then run df.Quantity.rolling(2).mean() on each, but in the case I'm working on that would mean creating more than 150 000 dataframes, which is why I am trying to solve this inside a single dataframe.
Thank you in advance for your help.
I'm not 100% sure this is what you wanted, but I just iterated over the dataframe's rows and used an if check to decide which rows feed into the rolling mean.
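import pandas as pd
data = pd.read_csv('ConditionalRollingMean.csv')
data['rolling_mean'] = 0.0  # float column, so means can be stored without a dtype change
nstore = 0  # number of store 1 / product A rows seen so far
nquant = 0  # total quantity over those rows
for i in range(len(data)):
    q = data['Quantity'][i]
    p = data['Product'][i]
    s = data['StoreNb'][i]
    if s == 1.0 and p == 'A':
        nstore += 1
        nquant += q
    # Running mean of store 1 / product A so far; the guard avoids dividing by zero
    # when no matching row has been seen yet (the value then stays 0.0).
    if nstore > 0:
        data.loc[i, 'rolling_mean'] = nquant / nstore
print(data)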
EDIT: I wrote a version that finds all store/product combinations in the dataframe and creates a dedicated rolling mean column for each combination. I hope that's what you really want, because the Cartesian product of thousands of stores and hundreds of products is pretty big:
import pandas as pd
import itertools as it
data = pd.read_csv('ConditionalRollingMean.csv')
# Obtain all unique stores and products and find their Cartesian product.
stores = set(pd.Series(data['StoreNb']).dropna())
products = set(data['Product'].dropna())
combs = it.product(stores, products)
# Iterate over every combination of store/product and calculate the rolling mean.
for comb in combs:
    store, product = comb
    # Set a new float column for this combination.
    name = 'rm' + str(store) + product
    data[name] = 0.0
    # Set starting values for the rolling mean.
    nstore = 0
    nquant = 0
    # Iterate over rows and do conditional checks to funnel results into
    # the appropriate rolling mean column.
    for i in range(len(data)):
        q = data['Quantity'][i]
        p = data['Product'][i]
        s = data['StoreNb'][i]
        if s == store and p == product:
            nstore += 1
            nquant += q
            data.loc[i, name] = nquant / nstore
        else:
            if nstore == 0:
                data.loc[i, name] = 0
            else:
                data.loc[i, name] = nquant / nstore
# Write the dataframe to a new file.
data.to_csv('res.csv')
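Note that the loops above accumulate nquant/nstore from the first row onward, so what they compute is a running (cumulative) mean per combination rather than a fixed-size window. If a single column is enough, a rough loop-free sketch of that running mean could look like the following. The toy values and the running_mean column name are made up for illustration, and the result lives in one column instead of one column per combination:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the code above.
data = pd.DataFrame({
    "StoreNb": [1, 2, 1, 2],
    "Product": ["A", "A", "A", "A"],
    "Quantity": [10, 100, 20, 200],
})

# Expanding mean within each store/product group; transform keeps the
# result aligned with the original row order.
data["running_mean"] = (
    data.groupby(["StoreNb", "Product"])["Quantity"]
        .transform(lambda s: s.expanding().mean())
)
print(data)
```

This avoids both the Python-level loop over rows and the extra column per combination, which may matter with thousands of stores.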
Hope this helps.
The solution I'll be using is as follows:
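# groupby().rolling() returns a Series indexed by (Store, Product, original row);
# dropping the two group levels lets it align with df on assignment.
df["Mean"] = df.groupby(['Store','Product'])['Quantity'].rolling(2).mean().reset_index(level=['Store','Product'], drop=True)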
It gives me the output I wanted. Thank you for your input.
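For anyone landing here later, a small self-contained sketch of the same idea on a toy frame, also covering the rolling standard deviation mentioned in the question. The data values and the Std column name are made up; it uses transform, which is one way (not the only one) to keep the result on the frame's original index:

```python
import pandas as pd

# Hypothetical toy data: two stores, one product, three days each.
df = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Product": ["A", "A", "A", "A", "A", "A"],
    "Quantity": [10, 20, 40, 100, 300, 500],
})

grouped = df.groupby(["Store", "Product"])["Quantity"]

# Window of 2 rows, restarted for every store/product combination.
df["Mean"] = grouped.transform(lambda s: s.rolling(2).mean())
df["Std"] = grouped.transform(lambda s: s.rolling(2).std())
print(df)
```

The first row of each combination is NaN because the window of 2 is not yet full, so values never leak from one store/product pair into the next.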