How to calculate a time-weighted average using Python?

I have data recorded at irregular intervals during the day.

Event Time         Value
17-5-2021 03:00    84.9
17-5-2021 11:00    84.9
17-5-2021 15:00    84.7
17-5-2021 23:00    84.7
18-5-2021 03:00    84.5
18-5-2021 11:00    84.5
18-5-2021 15:00    84.9
18-5-2021 23:00    84.9

I want to calculate a time-weighted average of the above data using Python. On 17-5-2021, for example, the value was 84.7 for only 37.5% of the day (9 hours out of 24), whereas a plain average would give it a weight of 50%.

Assumption: if there is no value for a particular interval, the last available value is used. For example, the value at 17-5-2021 04:00 is 84.9, because that was the last available value.

Any input would be helpful, as I have not been able to figure out the right way to approach this.

Expected output:

Please see the image for the calculation.

Final result最后结果

Event Time    Weighted Average
17-5-2021     84.79166
18-5-2021     84.71666

Once you've parsed the data appropriately, you can use datetime to convert the date/time strings, for example:

from datetime import datetime
datetime.strptime('17-5-2021 03:00','%d-%m-%Y %H:%M')

This creates the datetime object datetime.datetime(2021, 5, 17, 3, 0).

The timedelta between two consecutive (valid) readings can then be computed simply by subtracting the two datetime objects. To get a weight for the value, you can use the .total_seconds() method of the resulting timedelta object.

For example, the two entries 17-5-2021 11:00 84.9 and 17-5-2021 15:00 84.7 can be used to compute the weight for the second one as

w=(datetime.strptime(t2,'%d-%m-%Y %H:%M')-datetime.strptime(t1,'%d-%m-%Y %H:%M')).total_seconds()

where, of course,

t1='17-5-2021 11:00'
t2='17-5-2021 15:00'

and the result is w=14400.0 (seconds, i.e. 4 hours).
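The same subtraction can be applied to a whole sequence of timestamps at once. This is a minimal sketch, assuming the timestamps are already sorted and use the day-month-year format from the question; the names FMT, stamps and weights are illustrative and not part of the original answer.

```python
from datetime import datetime

FMT = '%d-%m-%Y %H:%M'  # day-month-year, as in the question's data

# Illustrative input: the four timestamps recorded on 17-5-2021.
stamps = ['17-5-2021 03:00', '17-5-2021 11:00',
          '17-5-2021 15:00', '17-5-2021 23:00']

parsed = [datetime.strptime(s, FMT) for s in stamps]

# Pair each timestamp with its successor; the gap in seconds is the weight.
weights = [(later - earlier).total_seconds()
           for earlier, later in zip(parsed, parsed[1:])]

print(weights)  # [28800.0, 14400.0, 28800.0]
```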

Assuming you have your data in a list of tuples, built for example as

b="""17-5-2021 03:00        84.9
17-5-2021 11:00     84.9
17-5-2021 15:00     84.7
17-5-2021 23:00     84.7
18-5-2021 03:00     84.5
18-5-2021 11:00     84.5
18-5-2021 15:00     84.9
18-5-2021 23:00     84.9""".split()
items=[(' '.join(b[i:i+2]),float(b[i+2])) for i in range(0,len(b),3)]

which yields the following for items:

[('17-5-2021 03:00', 84.9), ('17-5-2021 11:00', 84.9), ('17-5-2021 15:00', 84.7), ('17-5-2021 23:00', 84.7), ('18-5-2021 03:00', 84.5), ('18-5-2021 11:00', 84.5), ('18-5-2021 15:00', 84.9), ('18-5-2021 23:00', 84.9)]

then you can sum the individual products (w * val) and divide by the total duration at the end:

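t1, val1 = items[0]
dt1 = datetime.strptime(t1, '%d-%m-%Y %H:%M')
dt0 = dt1  # remember the first timestamp for the total duration
result = 0.
for t2, val2 in items[1:]:
    if val2 is None:
        val2 = val1  # if the value doesn't exist, use the previous one
    dt2 = datetime.strptime(t2, '%d-%m-%Y %H:%M')
    # weight each value by the length of the interval that precedes it
    result += val2 * (dt2 - dt1).total_seconds()
    dt1 = dt2
    val1 = val2

result /= (dt1 - dt0).total_seconds()  # divide by the total duration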

If a value is not available, I've assumed it is given as None. Of course, this won't work if the first value is missing.

I'll just mention that, for the table you've provided, the result is 84.73636363636363.
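The loop above produces one average over the whole range, whereas the question asks for one figure per calendar day. Below is a possible sketch of that, under assumptions that are mine and not the original answer's: each value is held until the next reading (the assumption stated in the question), intervals are split at midnight, and only the time between the first and last reading is counted. Since the calculation in the question's image is not available, this is not guaranteed to reproduce the expected figures. The function name daily_time_weighted_average is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = '%d-%m-%Y %H:%M'

def daily_time_weighted_average(items):
    """Return {date: average}, holding each value until the next reading."""
    readings = [(datetime.strptime(t, FMT), v) for t, v in items]
    weighted = defaultdict(float)  # sum of value * seconds, per day
    seconds = defaultdict(float)   # seconds covered, per day
    for (start, value), (end, _) in zip(readings, readings[1:]):
        while start < end:
            # Cut the interval at the next midnight if it crosses one.
            midnight = (datetime.combine(start.date(), datetime.min.time())
                        + timedelta(days=1))
            stop = min(end, midnight)
            span = (stop - start).total_seconds()
            weighted[start.date()] += value * span
            seconds[start.date()] += span
            start = stop
    return {day: weighted[day] / seconds[day] for day in weighted}

items = [('17-5-2021 03:00', 84.9), ('17-5-2021 11:00', 84.9),
         ('17-5-2021 15:00', 84.7), ('17-5-2021 23:00', 84.7),
         ('18-5-2021 03:00', 84.5), ('18-5-2021 11:00', 84.5),
         ('18-5-2021 15:00', 84.9), ('18-5-2021 23:00', 84.9)]

for day, avg in daily_time_weighted_average(items).items():
    print(day, round(avg, 4))
```

With the table from the question, this covers 21 hours of 17-5-2021 (03:00 to midnight) and 23 hours of 18-5-2021 (midnight to 23:00), giving roughly 84.8143 and 84.6652.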

I think you can use the pandas diff, groupby and rolling functions to achieve this, with the following steps:

  1. Convert Event Time to datetime.
  2. Calculate the time difference between consecutive rows using diff, get it in seconds using total_seconds, and divide by 3600 to convert it to hours.
  3. Compute the weighted values as the product of Value and the time difference.
  4. Compute the weighted average using rolling with a window length of 2, dividing the rolling sum by the total hours in the window (here 12 hours).
  5. Compute the daily average of the weighted values using groupby and transform. The day starts at 12 AM.
  6. Compute the rolling daily average by setting a DatetimeIndex and passing '1D' as the window.

import pandas as pd

df = pd.read_csv('test.csv')
# the dates are day-first (as in the question), so state the format explicitly
df['Event Time'] = pd.to_datetime(df['Event Time'], format='%d-%m-%Y %H:%M')
# hours elapsed since the previous row
df['Time Diff'] = df['Event Time'].diff(periods=1).dt.total_seconds()/3600
# the first row has no predecessor, so a 4-hour gap is assumed for it;
# with large data, dropping that row would be better than filling it
df['Time Diff'] = df['Time Diff'].fillna(4)
df['Weighted Value'] = df['Value']*df['Time Diff']
# weighted average over two consecutive rows (8 + 4 = 12 hours)
df['Weighted Average'] = df['Weighted Value'].rolling(2).sum()/12
# average for each calendar day; the day starts at 12 AM
df['Daily Weighted Fixed Window'] = df.groupby(df['Event Time'].dt.date)['Weighted Value'].transform('sum')/24
# weighted average over the last day (from the current time minus 24 hours)
df.set_index('Event Time', inplace=True)
df['Daily Weighted Rolling'] = df['Weighted Value'].rolling('1D').sum()/24
Event Time            Value   Time Diff   Weighted Value   Weighted Average   Daily Weighted Fixed Window   Daily Weighted Rolling
2021-05-17 03:00:00   84.9    4           339.6            nan                84.8                          14.15
2021-05-17 11:00:00   84.9    8           679.2            84.9               84.8                          42.45
2021-05-17 15:00:00   84.7    4           338.8            84.8333            84.8                          56.5667
2021-05-17 23:00:00   84.7    8           677.6            84.7               84.8                          84.8
2021-05-18 03:00:00   84.5    4           338              84.6333            84.7                          84.7333
2021-05-18 11:00:00   84.5    8           676              84.5               84.7                          84.6
2021-05-18 15:00:00   84.9    4           339.6            84.6333            84.7                          84.6333
2021-05-18 23:00:00   84.9    8           679.2            84.9               84.7                          84.7
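
For comparison, the forward-fill assumption stated in the question can also be applied directly: resample onto a regular grid, carry each value forward until the next reading, and average per day. This is a sketch of an alternative, not the method used in the answer above. The 60-minute grid is an assumption that happens to fit this data (all readings fall on the hour), and the data are built inline here instead of being read from test.csv.

```python
import pandas as pd

df = pd.DataFrame({
    'Event Time': ['17-5-2021 03:00', '17-5-2021 11:00', '17-5-2021 15:00',
                   '17-5-2021 23:00', '18-5-2021 03:00', '18-5-2021 11:00',
                   '18-5-2021 15:00', '18-5-2021 23:00'],
    'Value': [84.9, 84.9, 84.7, 84.7, 84.5, 84.5, 84.9, 84.9],
})
df['Event Time'] = pd.to_datetime(df['Event Time'], format='%d-%m-%Y %H:%M')
series = df.set_index('Event Time')['Value']

# One row per hour between the first and last reading, each carrying the
# most recent observed value (forward fill).
hourly = series.resample('60min').ffill()

# With equally spaced samples, a plain mean per day is a time-weighted mean.
daily = hourly.groupby(hourly.index.date).mean()
print(daily)
```

Each hourly sample stands for the hour that follows it, so the final 23:00 reading is treated as lasting until midnight. For the table in the question this gives roughly 84.814 for 17-5-2021 and 84.675 for 18-5-2021, which differs from the expected output quoted in the question.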

I have updated the answer. If you need anything more, let me know.
