
Time series conditional rolling mean in one pandas DataFrame

I am trying to compute a conditional rolling average. To demonstrate, I have created a simplified data set containing 3 stores, 2 products, and the quantities sold over 4 days.

Picture of the dataset, Link to download the dataset


Since the real data set includes thousands of stores and hundreds of products, I am trying to compute a rolling mean for each store/product combination within the same dataframe.

Using the code below, I can calculate the rolling average row by row, the same way data scientists calculate a 10-day or 20-day moving average for a share price:

import pandas as pd
df = pd.read_csv(r'path\ConditionalRollingMean.csv')
df['Rolling_Mean'] = df.Quantity.rolling(2).mean()

or even

df['Rolling_Mean'] = df.Quantity.rolling(window=2).mean()
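As a quick illustration of what the window does (made-up values on a plain Series, no grouping), rolling(2) averages each row with the row directly above it; the first row has no predecessor, so it comes out as NaN:

```python
import pandas as pd

# Made-up quantities in a plain Series: no store/product information at all.
s = pd.Series([4, 6, 10, 2])

# Each value is averaged with the one just before it.
rolling = s.rolling(window=2).mean()
print(rolling.tolist())  # [nan, 5.0, 8.0, 6.0]
```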

The issue with this approach is that the window moves row by row, regardless of the store/product combination. What I am looking for is a conditional rolling mean that keeps track of each store/product combination while going through the dataframe and populates a df['Rolling_Mean'] column row by row (something like this).
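A minimal sketch of the difference, assuming a hypothetical toy frame with columns Day, Store, Product and Quantity (the values are invented, not taken from the linked dataset). Grouping by store and product before applying rolling(2) restarts the window for each pair, and transform returns a result aligned with the original rows:

```python
import pandas as pd

# Hypothetical toy data: rows are interleaved by day, as in a typical sales log.
df = pd.DataFrame({
    "Day":      [1, 1, 1, 1, 2, 2, 2, 2],
    "Store":    [1, 1, 2, 2, 1, 1, 2, 2],
    "Product":  ["A", "B", "A", "B", "A", "B", "A", "B"],
    "Quantity": [10, 20, 30, 40, 12, 22, 32, 42],
})

# Naive: the window slides over consecutive rows, mixing different store/product pairs.
df["Naive"] = df["Quantity"].rolling(2).mean()

# Grouped: each store/product pair gets its own window; transform keeps df's row order.
df["Grouped"] = (
    df.groupby(["Store", "Product"])["Quantity"]
      .transform(lambda q: q.rolling(2).mean())
)
print(df)
```

Here the naive column averages, for example, store 2/product B on day 1 with store 1/product A on day 2, while the grouped column only averages rows belonging to the same pair. The sketch assumes rows are already in day order within each pair.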

This rolling average will then feed a rolling standard deviation calculation, which so far I have only managed to compute across the whole dataframe, without the rolling window:

df['mean'] = df.groupby(['Store', 'Product'])['Quantity'].transform('mean')
df['std'] = df.groupby(['Store', 'Product'])['Quantity'].transform('std')
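If the goal is a rolling standard deviation per store/product, the same grouped pattern should work with .std() in place of .mean(). A sketch on invented data, assuming the rows are already in time order within each pair; note that pandas' rolling std uses the sample definition (ddof=1) by default:

```python
import pandas as pd

# Hypothetical toy data, assumed already sorted by day within each store/product pair.
df = pd.DataFrame({
    "Store":    [1, 1, 1, 2, 2, 2],
    "Product":  ["A", "A", "A", "A", "A", "A"],
    "Quantity": [10, 14, 6, 5, 5, 9],
})

grouped = df.groupby(["Store", "Product"])["Quantity"]

# Rolling mean and rolling sample standard deviation over a 2-row window per pair.
df["Rolling_Mean"] = grouped.transform(lambda q: q.rolling(2).mean())
df["Rolling_Std"] = grouped.transform(lambda q: q.rolling(2).std())
print(df)
```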

It would be simpler to split the store/product combinations into separate dataframes and run df.Quantity.rolling(2).mean() on each, but in my case that would mean creating more than 150,000 dataframes. That is why I am trying to solve this within one dataframe.

Thank you in advance for your help.

I'm not 100% sure this is what you wanted, but I simply iterated over the dataframe's rows and used if conditionals to route each row into the rolling mean.

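import pandas as pd

data = pd.read_csv('ConditionalRollingMean.csv')
data['rolling_mean'] = 0.0  # float column, since the means are fractional

nstore = 0  # number of store 1 / product A rows seen so far
nquant = 0  # running total of their quantities

for i in range(len(data)):
    q = data.loc[i, 'Quantity']
    p = data.loc[i, 'Product']
    s = data.loc[i, 'StoreNb']

    if s == 1.0 and p == 'A':
        nstore += 1
        nquant += q
    # Carry the latest mean forward; 0 until the first matching row (avoids dividing by zero).
    data.loc[i, 'rolling_mean'] = nquant / nstore if nstore else 0.0

print(data)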
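Because the loop keeps a running total and count, it produces an expanding (cumulative) mean over all earlier rows of store 1/product A rather than a fixed 2-row window, and it copies the latest value onto rows of other pairs. A vectorized sketch of the per-pair expanding mean, on invented data that reuses the answer's column names (StoreNb, Product, Quantity); unlike the loop, it fills each row only with its own pair's mean:

```python
import pandas as pd

# Hypothetical toy data using the answer's column names.
data = pd.DataFrame({
    "StoreNb":  [1, 2, 1, 2, 1],
    "Product":  ["A", "A", "A", "B", "A"],
    "Quantity": [4, 7, 8, 1, 3],
})

# Expanding mean: the average of all rows so far within the same store/product pair.
data["expanding_mean"] = (
    data.groupby(["StoreNb", "Product"])["Quantity"]
        .transform(lambda q: q.expanding().mean())
)
print(data)
```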

EDIT: I wrote a version that finds all store/product combinations in the dataframe and creates a dedicated rolling mean column for each one. I hope that's what you really want, because the Cartesian product of thousands of stores and hundreds of products is pretty big:

import itertools as it
import pandas as pd

data = pd.read_csv('ConditionalRollingMean.csv')

# Obtain all unique stores and products and find their Cartesian product.
stores = set(data['StoreNb'].dropna())
products = set(data['Product'].dropna())
combs = it.product(stores, products)

# Iterate over every store/product combination and calculate its rolling mean.
for store, product in combs:

    # Set a new, empty float column for this combination.
    name = 'rm' + str(store) + product
    data[name] = 0.0

    # Starting values for the running mean.
    nstore = 0
    nquant = 0

    # Iterate over the rows and use conditional checks to funnel results into
    # the appropriate rolling mean column.
    for i in range(len(data)):
        q = data.loc[i, 'Quantity']
        p = data.loc[i, 'Product']
        s = data.loc[i, 'StoreNb']

        if s == store and p == product:
            nstore += 1
            nquant += q
        # Carry the latest mean forward; 0 until this pair's first row.
        data.loc[i, name] = nquant / nstore if nstore else 0.0

# Write the dataframe to a new file.
data.to_csv('res.csv')
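The Cartesian product can include store/product pairs that never appear in the data. A sketch, on invented data, of listing only the pairs that actually occur with drop_duplicates, which could take the place of it.product(stores, products) in the loop above:

```python
import itertools as it
import pandas as pd

# Hypothetical toy data: store 2 never sells product B.
data = pd.DataFrame({
    "StoreNb":  [1, 1, 2, 1],
    "Product":  ["A", "B", "A", "A"],
    "Quantity": [5, 3, 2, 7],
})

# Full Cartesian product, as in the loop above: includes pairs that never occur.
all_pairs = set(it.product(data["StoreNb"].unique(), data["Product"].unique()))

# Only the pairs that actually appear in the data, as plain tuples.
seen = data[["StoreNb", "Product"]].drop_duplicates()
seen_pairs = set(seen.itertuples(index=False, name=None))

print(len(all_pairs), len(seen_pairs))  # 4 3
```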

Hope this helps.

The solution I'll be using is as follows:

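# Drop the Store/Product index levels added by groupby().rolling() so the result aligns with df.
df["Mean"] = df.groupby(['Store', 'Product'])['Quantity'].rolling(2).mean().reset_index(level=[0, 1], drop=True)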

It gives me the output I wanted. Thank you for your input.
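A fuller sketch of this approach, on invented data with an assumed Day column for the date. It sorts by day within each pair first, since rolling(2) simply uses the previous row, and drops the two group levels that groupby().rolling() adds to the index so the results align with df when assigned:

```python
import pandas as pd

# Hypothetical toy data; "Day" is an assumed name for the date column.
df = pd.DataFrame({
    "Day":      [2, 1, 1, 2, 3, 3],
    "Store":    [1, 1, 2, 2, 1, 2],
    "Product":  ["A", "A", "A", "A", "A", "A"],
    "Quantity": [6, 2, 5, 9, 10, 1],
})

# A rolling window is only meaningful if rows are in time order within each pair.
df = df.sort_values(["Store", "Product", "Day"]).reset_index(drop=True)

rolled = df.groupby(["Store", "Product"])["Quantity"].rolling(2)

# groupby().rolling() returns a Series indexed by (Store, Product, row);
# dropping the two group levels lets it align with df on assignment.
df["Mean"] = rolled.mean().reset_index(level=[0, 1], drop=True)
df["Std"] = rolled.std().reset_index(level=[0, 1], drop=True)
print(df)
```

Each pair's first row is NaN because its window is incomplete; passing min_periods=1 to rolling() would give that row a mean of its own value instead.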
