Calculating simple moving average pandas for loop

I'm currently trying to calculate the simple moving average on a dataset of several stocks. For simplicity, I'm trying the code on just two companies (with a 4-day window) to get it working, but there seems to be a problem with the output. Below is my code.

for index, row in df3.iloc[4:].iterrows():
    if df3.loc[index,'CompanyId'] == df3.loc[index-4,'CompanyId']:
        df3['SMA4'] = df3.iloc[:,1].rolling(window=4).mean()
    else:
        df3['SMA4'] = 0

And here is the output: [image: Output]

The dataframe is sorted by date and company id. What needs to happen is that when the company ids are not equal, as checked in the code, the output should be zero, since I can't calculate a moving average across two different companies. Instead, it outputs a moving average over both companies, as in rows 7, 8 and 9.

Use groupby.rolling

# Drop only the group level so the result aligns with df's original index
df['SMA4'] = (df.groupby('CompanyId', sort=False)['Price']
                .rolling(window=4).mean().reset_index(level=0, drop=True))
print(df)

    CompanyId  Price   SMA4
0           1     75    NaN
1           1     74    NaN
2           1     77    NaN
3           1     78  76.00
4           1     80  77.25
5           1     79  78.50
6           1     80  79.25
7           0     10    NaN
8           0      9    NaN
9           0     12    NaN
10          0     11  10.50
11          0     11  10.75
12          0      8  10.50
13          0      9   9.75
14          0      8   9.00
15          0      8   8.25
16          0     11   9.00
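
If you want to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the table above, and it uses `transform` instead of `reset_index`, which is a swapped-in variant of the same idea rather than the exact call shown earlier: `transform` returns values already aligned with the original index.

```python
import pandas as pd

# Sample data copied from the table above (column names as printed there)
df = pd.DataFrame({
    'CompanyId': [1] * 7 + [0] * 10,
    'Price': [75, 74, 77, 78, 80, 79, 80, 10, 9, 12, 11, 11, 8, 9, 8, 8, 11],
})

# Compute the 4-row rolling mean separately inside each company.
# transform keeps df's original index, so no reset_index step is needed.
df['SMA4'] = (
    df.groupby('CompanyId', sort=False)['Price']
      .transform(lambda s: s.rolling(window=4).mean())
)
print(df)
```

The first three rows of each company stay NaN because their window is not yet full, so the average never mixes prices from two companies.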

While ansev is right that you should use the specialized function because manual loops are much slower, I want to show why your code didn't work. In both the if branch and the else branch, the entire SMA4 column gets assigned ( df3['SMA4'] ), not just the current row. Because the if condition is true on the last run through the loop, that final assignment overwrites whatever the else branch did earlier, so SMA4 is never 0. To fix that, you could first create the column populated with rolling averages (note that this is not in a for loop):

df3['SMA4'] = df3.iloc[:,1].rolling(window=4).mean()

And then you run the loop to set invalid rows to 0 (though NaN would be better; I kept the other bugs in, assuming that the numbers in ansev's answer are correct):

for index, row in df3.iloc[4:].iterrows(): 
    if df3.loc[index,'CompanyId'] != df3.loc[index-4,'CompanyId']: 
        df3.loc[index,'SMA4'] = 0 

Output (probably still buggy):

    CompanyId  Price   SMA4
0           1     75    NaN
1           1     74    NaN
2           1     77    NaN
3           1     78  76.00
4           1     80  77.25
5           1     79  78.50
6           1     80  79.25
7           2     10   0.00
8           2      9   0.00
9           2     12   0.00
10          2     11   0.00
11          2     11  10.75
12          2      8  10.50
13          2      9   9.75
14          2      8   9.00
15          2      8   8.25
16          2     11   9.00
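
For comparison, here is a loop-free sketch of the same "compute first, then invalidate" idea. It is only an illustration under some assumptions: the column names `CompanyId` and `Price` and the sample values are taken from the tables in this thread, rows of each company are assumed to be contiguous, and it compares against the row 3 positions back (the first row of a 4-row window) rather than 4, writing NaN instead of 0.

```python
import pandas as pd

# Sample data rebuilt from the output table above
df3 = pd.DataFrame({
    'CompanyId': [1] * 7 + [2] * 10,
    'Price': [75, 74, 77, 78, 80, 79, 80, 10, 9, 12, 11, 11, 8, 9, 8, 8, 11],
})

window = 4

# Rolling mean over the whole column, ignoring company boundaries for now
sma = df3['Price'].rolling(window=window).mean()

# A window ending at row i starts at row i - (window - 1). With contiguous
# companies, the window is valid only if both ends share the same CompanyId.
same_company = df3['CompanyId'].eq(df3['CompanyId'].shift(window - 1))

# Keep the mean where the window is valid, NaN everywhere else
df3['SMA4'] = sma.where(same_company)
print(df3)
```

With this check, row 10 keeps its value, since rows 7 to 10 all belong to the same company, and only rows 7, 8 and 9 are blanked out.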
