Time series conditional rolling mean in 1 pandas dataframe

I am currently looking into solving a conditional rolling average. I have created a simplified data set to demonstrate: it contains 3 stores and 2 products, and their sold quantities over 4 days.

Picture of the dataset, Link to download the dataset


Considering the real data set includes thousands of stores and hundreds of products, I am trying to achieve a rolling mean calculation for each combination of store/product within the same dataframe.

By using the code below, I'm able to calculate the rolling average line by line, in the same way other data scientists calculate a 10-day or 20-day moving average for a share price:

import pandas as pd
df = pd.read_csv (r'path\ConditionalRollingMean.csv')
df['Rolling_Mean'] = df.Quantity.rolling(2).mean()

or even

df['Rolling_Mean'] = df.Quantity.rolling(window=2).mean()

The issue with this approach is that the calculation is done line by line, regardless of the store/product combination. What I am looking for is a conditional rolling mean that keeps track of the store/product combinations while going through the dataframe and populates a df['Rolling_Mean'] column line by line (something like this).
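To make the problem concrete, here is a minimal sketch on made-up data (the values below are hypothetical and not taken from the linked data set; the column names follow the ones used elsewhere in this thread). It shows the window of a plain rolling mean reaching across the boundary between two stores:

```python
import pandas as pd

# Hypothetical toy data: two store/product combinations stacked one after the other.
df = pd.DataFrame({
    "StoreNb":  [1, 1, 1, 2, 2, 2],
    "Product":  ["A", "A", "A", "A", "A", "A"],
    "Quantity": [10, 20, 30, 100, 200, 300],
})

# Plain rolling mean over the whole column, ignoring the store/product columns.
df["Rolling_Mean"] = df["Quantity"].rolling(2).mean()

# Row 3 is the first row of store 2, yet its window still contains
# the last row of store 1: (30 + 100) / 2.
leaked = df.loc[3, "Rolling_Mean"]
print(df)
```

Here the first row of store 2 gets a value that mixes in store 1's last quantity, which is exactly what a per-combination calculation should avoid.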

This rolling average will then be used for a rolling standard deviation calculation, which I have only figured out how to do across the whole dataframe, without the rolling aspect.

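# Per store/product mean and standard deviation over all rows (no rolling window)
df['mean'] = df.groupby(['StoreNb', 'Product'])['Quantity'].transform('mean')
df['std'] = df.groupby(['StoreNb', 'Product'])['Quantity'].transform('std')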
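One possible way to add the rolling aspect is to apply the rolling window inside each group via transform. The sketch below assumes the column names StoreNb, Product and Quantity and uses hypothetical values; the window size of 2 is only for illustration:

```python
import pandas as pd

# Hypothetical toy data with the two stores interleaved.
df = pd.DataFrame({
    "StoreNb":  [1, 2, 1, 2, 1, 2],
    "Product":  ["A", "A", "A", "A", "A", "A"],
    "Quantity": [10, 100, 20, 200, 40, 400],
})

grouped = df.groupby(["StoreNb", "Product"])["Quantity"]

# transform returns a result aligned with df's index, so it can be assigned directly.
df["rolling_mean"] = grouped.transform(lambda s: s.rolling(2).mean())
df["rolling_std"] = grouped.transform(lambda s: s.rolling(2).std())

print(df)
```

The first row of each combination is NaN because a window of 2 is not yet full; passing min_periods=1 to rolling would change that for the mean.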

It would be simpler to separate the stores/products into different dataframes and then run df.Quantity.rolling(2).mean() on each, but in the case I'm working on, it would mean creating more than 150 000 dataframes. That is why I am trying to solve this inside 1 dataframe.

Thank you in advance for your help.

I'm not 100% sure this is what you wanted, but I iterated over the dataframe's rows and used an if conditional to channel the rolling mean.

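import pandas as pd

data = pd.read_csv('ConditionalRollingMean.csv')
data['rolling_mean'] = 0.0  # float column, so the means are stored without a dtype change

# Running totals for the store 1 / product A combination.
nstore = 0  # number of matching rows seen so far
nquant = 0  # sum of their quantities

for i in range(len(data)):
    q = data.loc[i, 'Quantity']
    p = data.loc[i, 'Product']
    s = data.loc[i, 'StoreNb']

    if s == 1.0 and p == 'A':
        nstore += 1
        nquant += q

    # Carry the latest mean forward; stay at 0 until the first matching row
    # (this also avoids dividing by zero).
    if nstore > 0:
        data.loc[i, 'rolling_mean'] = nquant / nstore

print(data)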
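Note that the loop keeps a running (cumulative) mean for one combination rather than a fixed-size window. As a rough sketch, the same kind of running mean can be computed for every combination at once without an explicit loop; the data below is hypothetical, and unlike the loop this version does not carry a value forward into rows of other combinations:

```python
import pandas as pd

# Hypothetical toy data with the two stores interleaved.
data = pd.DataFrame({
    "StoreNb":  [1, 2, 1, 2, 1, 2],
    "Product":  ["A", "A", "A", "A", "A", "A"],
    "Quantity": [10, 100, 20, 200, 40, 400],
})

grouped = data.groupby(["StoreNb", "Product"])["Quantity"]

# Running sum within each group divided by the number of rows seen so far
# in that group (cumcount starts at 0, hence the + 1).
data["running_mean"] = grouped.cumsum() / (grouped.cumcount() + 1)

print(data)
```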

EDIT: I wrote a version that finds all combinations of store/product in the dataframe and creates a dedicated rolling mean column for each combination. I hope that's what you really want, because the cartesian product of thousands of stores and hundreds of products is pretty big:

import pandas as pd
import itertools as it

data = pd.read_csv('ConditionalRollingMean.csv')

# Obtain all unique stores and products and find their cartesian product.
stores = set(pd.Series(data['StoreNb']).dropna())
products = set(data['Product'].dropna())
combs = it.product(stores,products)

# iterate over every combination of store/product and calculate rolling mean.
for comb in combs:

    store, product = comb

    # Set new, empty column for combination (float, since it will hold means)
    name = 'rm'+str(store)+product
    data[name] = 0.0

    # set starting values for rolling mean.
    nstore = 0
    nquant = 0

    # iterate over lines and do conditional checks to funnel results into
    # appropriate rolling mean column
    for i in range(len(data)):
        q = data['Quantity'][i]
        p = data['Product'][i]
        s = data['StoreNb'][i]

        if s == store and p == product:
            nstore += 1
            nquant += q
            data.loc[i,name] = nquant/nstore
        else:
            if nstore == 0:
                data.loc[i,name] = 0
            else:
                data.loc[i,name] = nquant/nstore


# write dataframe to new file.
data.to_csv('res.csv')
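If not every store sells every product, the full cartesian product contains combinations that never occur. A small sketch, on hypothetical data, of collecting only the combinations that actually appear in the dataframe:

```python
import itertools as it
import pandas as pd

# Hypothetical toy data: 3 stores and 2 products, but only 4 combinations occur.
data = pd.DataFrame({
    "StoreNb":  [1, 1, 2, 3],
    "Product":  ["A", "B", "A", "B"],
    "Quantity": [5, 7, 9, 11],
})

# Every theoretical pairing of store and product.
cartesian = list(it.product(data["StoreNb"].unique(), data["Product"].unique()))

# Only the pairings that are present in the data, as plain tuples.
observed = list(
    data[["StoreNb", "Product"]].drop_duplicates().itertuples(index=False, name=None)
)

print(len(cartesian), len(observed))
```

Iterating over the observed pairs instead of the cartesian product would create fewer columns, though the number can still be large on the real data.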

Hope this helps.

The solution I'll be using is as follows:

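# reset_index drops the Store/Product levels so the result aligns with df's index
df["Mean"] = df.groupby(['Store','Product'])['Quantity'].rolling(2).mean().reset_index(level=[0, 1], drop=True)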

It gives me the output I wanted. Thank you for your input.
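For a time series, the rows of each combination also need to be in date order before the window is applied. The following sketch assumes a Date column (hypothetical, as are the values) and shows that the grouped rolling result is indexed by the group keys plus the original row label, so the group levels are dropped before assigning it back:

```python
import pandas as pd

# Hypothetical toy data with rows deliberately out of date order.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-02", "2020-01-01", "2020-01-01",
        "2020-01-03", "2020-01-02", "2020-01-03",
    ]),
    "Store":    [1, 1, 2, 1, 2, 2],
    "Product":  ["A", "A", "A", "A", "A", "A"],
    "Quantity": [20, 10, 100, 40, 200, 400],
})

# Sort so that each window covers consecutive days within a combination.
df = df.sort_values(["Store", "Product", "Date"]).reset_index(drop=True)

# The result carries a (Store, Product, row label) MultiIndex.
rolled = df.groupby(["Store", "Product"])["Quantity"].rolling(2).mean()

# Drop the group levels so the values line up with df's own index.
df["Mean"] = rolled.reset_index(level=["Store", "Product"], drop=True)

print(df)
```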
