How to calculate time weighted average using Python?
So I have data with irregular intervals in a day.
Event Time | Value |
---|---|
17-5-2021 03:00 | 84.9 |
17-5-2021 11:00 | 84.9 |
17-5-2021 15:00 | 84.7 |
17-5-2021 23:00 | 84.7 |
18-5-2021 03:00 | 84.5 |
18-5-2021 11:00 | 84.5 |
18-5-2021 15:00 | 84.9 |
18-5-2021 23:00 | 84.9 |
I want to calculate a time-weighted average of the above data using Python, because the value was only 83.7 for 37.5% of the day (9 hours out of 24), whereas a plain average would give it a weight of 50% for 17-5-2021.
Assumption: if we don't have a value for a particular interval, the last available value is used, e.g. the value at 17-5-2021 04:00 is 84.9 because that was the last available value.
Any input would be helpful, as I have not been able to figure out the right way to approach this.
Expected output:
Please see the image for the calculation.
Final result:
Event Time | Weighted Average |
---|---|
17-5-2021 | 84.79166 |
18-5-2021 | 84.71666 |
Once you've parsed the data appropriately, you can use datetime to convert the date/time strings, for example:
from datetime import datetime
datetime.strptime('17-5-2021 03:00','%d-%m-%Y %H:%M')
This creates the datetime object datetime.datetime(2021, 5, 17, 3, 0).
A timedelta object can then be computed between two consecutive (valid) values simply by subtracting the two datetime objects. To get a weight for the value, you can use the .total_seconds() method of the resulting timedelta object.
For example, these two entries
17-5-2021 11:00 84.9
17-5-2021 15:00 84.7
may be used to compute the weight for the second as
w=(datetime.strptime(t2,'%d-%m-%Y %H:%M')-datetime.strptime(t1,'%d-%m-%Y %H:%M')).total_seconds()
where, of course,
t1='17-5-2021 11:00'
t2='17-5-2021 15:00'
and the result is w=14400.0 (seconds, i.e. 4 hours).
Assuming you have your data in a list of tuples, built for example as
b="""17-5-2021 03:00 84.9
17-5-2021 11:00 84.9
17-5-2021 15:00 84.7
17-5-2021 23:00 84.7
18-5-2021 03:00 84.5
18-5-2021 11:00 84.5
18-5-2021 15:00 84.9
18-5-2021 23:00 84.9""".split()
items=[(' '.join(b[i:i+2]),float(b[i+2])) for i in range(0,len(b),3)]
which yields for items
[('17-5-2021 03:00', 84.9), ('17-5-2021 11:00', 84.9), ('17-5-2021 15:00', 84.7), ('17-5-2021 23:00', 84.7), ('18-5-2021 03:00', 84.5), ('18-5-2021 11:00', 84.5), ('18-5-2021 15:00', 84.9), ('18-5-2021 23:00', 84.9)]
then you can sum up each individual (w * val) and divide by the total duration at the end, as
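t1,val1=items[0]
dt1=datetime.strptime(t1,'%d-%m-%Y %H:%M')
dt0=dt1  # remember the first timestamp for the total duration
result=0.
for item in items[1:]:
    t2,val2=item
    if val2 is None: val2=val1  # if value doesn't exist, use previous
    dt2=datetime.strptime(t2,'%d-%m-%Y %H:%M')
    # weight the current value by the seconds elapsed since the previous sample
    result+=val2*(dt2-dt1).total_seconds()
    dt1=dt2
    val1=val2
result/=(dt1-dt0).total_seconds()  # divide by the total duration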
If a value is not available, I've assumed it is given as None. This won't work if the first value doesn't exist, of course.
I'll just mention that, for the table you've provided, the result is 84.73636363636363.
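To get one value per day rather than a single overall figure, the same timedelta idea can be applied with each interval split at midnight. The following is only a minimal sketch, not the method above: the function name daily_time_weighted_average is made up for illustration, and it assumes the convention stated in the question (a value holds until the next sample arrives). It also assumes that only time actually covered by samples is counted, so nothing before the first sample or after the last one contributes, and the numbers need not match the expected output in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = '%d-%m-%Y %H:%M'

def daily_time_weighted_average(items, fmt=FMT):
    """Per-day time-weighted mean; each value is held until the next sample.

    Hypothetical helper: `items` is a list of (time_string, value) tuples,
    where value may be None to mean "reuse the last available value".
    """
    parsed = sorted(((datetime.strptime(t, fmt), v) for t, v in items),
                    key=lambda pair: pair[0])
    weighted = defaultdict(float)  # date -> sum of value * seconds
    seconds = defaultdict(float)   # date -> seconds covered by samples
    held = None
    for (start, val), (end, _) in zip(parsed, parsed[1:]):
        if val is not None:
            held = val
        if held is None:
            continue  # no value to carry forward yet
        # Split the interval [start, end) at every midnight it crosses.
        while start < end:
            midnight = datetime.combine(start.date() + timedelta(days=1),
                                        datetime.min.time())
            stop = min(end, midnight)
            w = (stop - start).total_seconds()
            weighted[start.date()] += held * w
            seconds[start.date()] += w
            start = stop
    return {day: weighted[day] / seconds[day] for day in weighted}

items = [('17-5-2021 03:00', 84.9), ('17-5-2021 11:00', 84.9),
         ('17-5-2021 15:00', 84.7), ('17-5-2021 23:00', 84.7),
         ('18-5-2021 03:00', 84.5), ('18-5-2021 11:00', 84.5),
         ('18-5-2021 15:00', 84.9), ('18-5-2021 23:00', 84.9)]

res = daily_time_weighted_average(items)
for day, avg in sorted(res.items()):
    print(day, round(avg, 4))
# 2021-05-17 84.8143
# 2021-05-18 84.6652
```

Here 17-5-2021 is covered for 21 hours (03:00 to midnight) and 18-5-2021 for 23 hours (midnight to 23:00). If the partial days should be padded to a full 24 hours, the first and last samples would need to be extended explicitly before calling the function.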
I think you can use the pandas diff, groupby and rolling functions to achieve this, with the following steps:
import pandas as pd
df = pd.read_csv('test.csv')
# the dates are day-first (17-5-2021), so say so explicitly
df['Event Time'] = pd.to_datetime(df['Event Time'], dayfirst=True)
# hours elapsed since the previous row
df['Time Diff'] = df['Event Time'].diff(periods=1).dt.total_seconds()/3600
# the first row has no previous timestamp; 4 hours is filled in for this small sample
df['Time Diff'] = df['Time Diff'].fillna(4)
# You don't need the fillna step for large data; dropping the first row would be better there
df['Weighted Value'] = df['Value']*df['Time Diff']
# weighted average over the last 2 periods (each pair of intervals here spans 12 hours)
df['Weighted Average'] = df['Weighted Value'].rolling(2).sum()/12
# weighted average for each calendar day; the day starts at 12 AM
df['Daily Weighted Fixed Window'] = df.groupby(df['Event Time'].dt.date)['Weighted Value'].transform('sum')/24
# weighted average over the last day (from current time minus 24 hours)
df.set_index('Event Time', inplace=True)
df['Daily Weighted Rolling'] = df['Weighted Value'].rolling('1D').sum()/24
Event Time | Value | Time Diff | Weighted Value | Weighted Average | Daily Weighted Fixed Window | Daily Weighted Rolling |
---|---|---|---|---|---|---|
2021-05-17 03:00:00 | 84.9 | 4 | 339.6 | nan | 84.8 | 14.15 |
2021-05-17 11:00:00 | 84.9 | 8 | 679.2 | 84.9 | 84.8 | 42.45 |
2021-05-17 15:00:00 | 84.7 | 4 | 338.8 | 84.8333 | 84.8 | 56.5667 |
2021-05-17 23:00:00 | 84.7 | 8 | 677.6 | 84.7 | 84.8 | 84.8 |
2021-05-18 03:00:00 | 84.5 | 4 | 338 | 84.6333 | 84.7 | 84.7333 |
2021-05-18 11:00:00 | 84.5 | 8 | 676 | 84.5 | 84.7 | 84.6 |
2021-05-18 15:00:00 | 84.9 | 4 | 339.6 | 84.6333 | 84.7 | 84.6333 |
2021-05-18 23:00:00 | 84.9 | 8 | 679.2 | 84.9 | 84.7 | 84.7 |
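If the intent is the "last available value" rule from the question, another option in pandas is to forward-fill the series onto a regular time grid and then take an ordinary daily mean, which is then time-weighted by construction. This is a rough sketch under assumptions, not the approach shown in the table above: it builds the DataFrame inline instead of reading test.csv, and it assumes a 60-minute grid is fine enough because every timestamp falls on the hour.

```python
import pandas as pd

df = pd.DataFrame({
    'Event Time': ['17-5-2021 03:00', '17-5-2021 11:00', '17-5-2021 15:00',
                   '17-5-2021 23:00', '18-5-2021 03:00', '18-5-2021 11:00',
                   '18-5-2021 15:00', '18-5-2021 23:00'],
    'Value': [84.9, 84.9, 84.7, 84.7, 84.5, 84.5, 84.9, 84.9],
})
df['Event Time'] = pd.to_datetime(df['Event Time'], format='%d-%m-%Y %H:%M')
s = df.set_index('Event Time')['Value'].sort_index()

# Forward-fill onto a regular 60-minute grid: every grid point carries
# the last available value, and each point stands for one hour.
hourly = s.resample('60min').ffill()

# On an evenly spaced grid, the plain mean per day is the time-weighted mean.
daily = hourly.resample('1D').mean()
print(daily.round(4))
```

With this data, 17-5-2021 has 21 grid points (03:00 to 23:00) and 18-5-2021 has 24, giving roughly 84.8143 and 84.675. The grid only spans the first to the last sample, so the hours before 03:00 on the first day are not counted; a finer grid (for example '1min') would be needed if timestamps did not fall on the hour.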
I have updated the answer. If you need anything more, let me know.