
How to calculate cumulative moving average in Python/SQLAlchemy/Flask

I'll give some context so it makes sense. I'm capturing Customer Ratings for Products in a table (Rating) and want to be able to return a Cumulative Moving Average of the ratings based on time.

A basic example follows, taking one rating per day:

02 FEB - Rating: 5 - Cum Avg: 5
03 FEB - Rating: 4 - Cum Avg: (5+4)/2 = 4.5
04 FEB - Rating: 1 - Cum Avg: (5+4+1)/3 = 3.3
05 FEB - Rating: 5 - Cum Avg: (5+4+1+5)/4 = 3.75
Etc...
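
For reference, the arithmetic in the example above can be reproduced with a few lines of Python. This is only a sketch that recomputes every average from a full list of ratings; the variable names are made up for illustration, and it says nothing about how the values should be stored.

```python
from itertools import accumulate

# Ratings from the example above, in date order (02 FEB to 05 FEB).
ratings = [5, 4, 1, 5]

# Running totals of the ratings: 5, 9, 10, 15.
running_totals = accumulate(ratings)

# Divide each running total by the number of ratings seen so far.
cumulative_averages = [
    total / count for count, total in enumerate(running_totals, start=1)
]

for rating, average in zip(ratings, cumulative_averages):
    print(f"Rating: {rating} - Cum Avg: {average:.2f}")
```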

I'm trying to think of an approach that won't scale horribly.

My current idea is to have a function that is tripped when a row is inserted into the Rating table and that works out the Cum Avg based on the previous row for that product.

So the fields would be something like:

TABLE: Rating
| RatingId | DateTime | ProdId | RatingVal | RatingCnt | CumAvg |

But this seems like a fairly dodgy way to store the data.

What would be the (or any) way to accomplish this? If I were to use a 'trigger' of sorts, how would I go about doing that in SQLAlchemy?

Any and all advice appreciated!

I don't know about SQLAlchemy, but I might use an approach like this:

  • Store the cumulative average and rating count separately from individual ratings.
  • Every time you get a new rating, update the cumulative average and rating count:
    • new_count = old_count + 1
    • new_average = ((old_average * old_count) + new_rating) / new_count
  • Optionally, store a row for each new rating.
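
As a rough illustration, the update rule in the list above could be written as a small Python function. The function name `update_average` and the starting state of (0.0, 0) for a product with no ratings are assumptions made for this sketch, not something the answer specifies.

```python
def update_average(old_average, old_count, new_rating):
    """Return (new_average, new_count) after folding in one new rating."""
    new_count = old_count + 1
    new_average = (old_average * old_count + new_rating) / new_count
    return new_average, new_count


# Assumed starting state: no ratings yet, so count is 0 and the
# average is a placeholder that gets multiplied by 0 on the first update.
average, count = 0.0, 0

# Feed in the ratings from the question one at a time.
for rating in [5, 4, 1, 5]:
    average, count = update_average(average, count, rating)
    print(f"Rating: {rating} - Cum Avg: {average:.2f} (count {count})")
```

Each update needs only the stored average and count, so the cost of adding a rating does not grow with the number of ratings already recorded.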

Updating the average and rating count could be done with a single SQL statement.
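
Here is one way that single statement might look, sketched with Python's built-in sqlite3 module rather than SQLAlchemy. The summary table name `ProductRatingSummary`, its columns, and the helper `add_rating` are hypothetical, chosen to mirror the field names in the question. The sketch also assumes a summary row already exists for the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical summary table: one row per product, kept apart from
# the individual ratings.
conn.execute(
    "CREATE TABLE ProductRatingSummary ("
    " ProdId INTEGER PRIMARY KEY,"
    " RatingCnt INTEGER NOT NULL,"
    " CumAvg REAL NOT NULL)"
)
# Assumption: the row is created with a count of 0 when the product is added.
conn.execute("INSERT INTO ProductRatingSummary VALUES (1, 0, 0.0)")


def add_rating(conn, prod_id, rating_val):
    """Fold one new rating into the stored average with a single UPDATE."""
    # CumAvg is assigned first so that it is computed from the old
    # RatingCnt even on databases that apply assignments left to right.
    conn.execute(
        "UPDATE ProductRatingSummary"
        " SET CumAvg = (CumAvg * RatingCnt + ?) / (RatingCnt + 1),"
        "     RatingCnt = RatingCnt + 1"
        " WHERE ProdId = ?",
        (rating_val, prod_id),
    )


for rating_val in [5, 4, 1, 5]:
    add_rating(conn, 1, rating_val)

count, average = conn.execute(
    "SELECT RatingCnt, CumAvg FROM ProductRatingSummary WHERE ProdId = 1"
).fetchone()
print(count, average)
```

Because the read and the write happen in one statement, the application never has to fetch the old average, compute the new one, and write it back in separate steps.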

I think you should store the MA in a 2-element list (the average and the number of ratings); it would be much simpler:

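# a = [cumulative average, number of ratings so far]
# The first rating, 5, counts as one rating.
a = [5, 1]

# Fold each later rating into the running average.
ratings = [4, 1, 5]
for r in ratings:
    a = [(a[0] * a[1] + r) / (a[1] + 1), a[1] + 1]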

Bye


 