Calculate mean based on time elapsed in Pandas

I tried to ask this question previously, but it was too ambiguous, so here goes again. I am new to programming, so I am still learning how to ask questions in a useful way.

In summary, I have a pandas DataFrame that resembles "INPUT DATA" that I would like to convert to "DESIRED OUTPUT", as shown below.

Each row contains an ID, a DateTime, and a Value. For each unique ID, the first row corresponds to timepoint 'zero', and each subsequent row contains a value recorded 5 minutes after the previous row.

I would like to calculate the mean across all the IDs for every 'time elapsed' timepoint. For example, in "DESIRED OUTPUT", Time Elapsed=0.0 would have the value 128.3 ((100+105+180)/3); Time Elapsed=5.0 would have the value 150.0 ((150+110+190)/3); Time Elapsed=10.0 would have the value 133.3 ((125+90+185)/3), and so on for Time Elapsed=15, 20, 25 etc.

I'm not sure how to create a new column which holds the time elapsed for each ID (e.g. 0.0, 5.0, 10.0 etc.). I think that once I know how to do that, I can use the groupby function to calculate the mean for each elapsed time.

INPUT DATA

ID  DateTime            Value
1   2018-01-01 15:00:00 100
1   2018-01-01 15:05:00 150
1   2018-01-01 15:10:00 125
2   2018-02-02 13:15:00 105
2   2018-02-02 13:20:00 110
2   2018-02-02 13:25:00 90
3   2019-03-03 05:05:00 180
3   2019-03-03 05:10:00 190
3   2019-03-03 05:15:00 185
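
To reproduce the example, the table above could be built roughly as follows. This is only a sketch: the variable name df and the explicit pd.to_datetime conversion are assumptions, since the question does not show how the data are loaded.

```python
import pandas as pd

# Rebuild the INPUT DATA table; DateTime is parsed into datetime64 values
# so that subtraction and the .dt accessor work on it.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateTime": pd.to_datetime([
        "2018-01-01 15:00:00", "2018-01-01 15:05:00", "2018-01-01 15:10:00",
        "2018-02-02 13:15:00", "2018-02-02 13:20:00", "2018-02-02 13:25:00",
        "2019-03-03 05:05:00", "2019-03-03 05:10:00", "2019-03-03 05:15:00",
    ]),
    "Value": [100, 150, 125, 105, 110, 90, 180, 190, 185],
})

print(df.dtypes)
```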

DESIRED OUTPUT


Time Elapsed    Mean Value
0.0             128.3
5.0             150.0
10.0            133.3

Here is one way: use transform with groupby to get the first DateTime of each ID, subtract it from every row to build the group key 'Time Elapsed', then group by that key and take the mean.

df['Time Elapsed']=df.DateTime-df.groupby('ID').DateTime.transform('first')
df.groupby('Time Elapsed').Value.mean()
Out[998]: 
Time Elapsed
00:00:00    128.333333
00:05:00    150.000000
00:10:00    133.333333
Name: Value, dtype: float64
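
The result above is indexed by Timedelta values rather than the 0.0, 5.0, 10.0 shown in the desired output. One possible way to get minutes as floats is sketched below, assuming the rows of each ID are already in time order (otherwise transform('min') may be the safer choice). The DataFrame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateTime": pd.to_datetime([
        "2018-01-01 15:00:00", "2018-01-01 15:05:00", "2018-01-01 15:10:00",
        "2018-02-02 13:15:00", "2018-02-02 13:20:00", "2018-02-02 13:25:00",
        "2019-03-03 05:05:00", "2019-03-03 05:10:00", "2019-03-03 05:15:00",
    ]),
    "Value": [100, 150, 125, 105, 110, 90, 180, 190, 185],
})

# Subtract each ID's first timestamp from every row of that ID.
elapsed = df["DateTime"] - df.groupby("ID")["DateTime"].transform("first")

# Convert the Timedelta to minutes as a float (0.0, 5.0, 10.0, ...).
df["Time Elapsed"] = elapsed.dt.total_seconds() / 60

# Mean of Value across all IDs at each elapsed time, rounded for display.
result = df.groupby("Time Elapsed")["Value"].mean().round(1)
print(result)
```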

You can do this explicitly by taking advantage of the datetime attributes of the DateTime column in your DataFrame.

First, get the year, month and day for each DateTime, since all three vary in your data.

df['month'] = df['DateTime'].dt.month
df['day'] = df['DateTime'].dt.day
df['year'] = df['DateTime'].dt.year

print(df)
   ID            DateTime  Value  month  day  year
1   1 2018-01-01 15:00:00    100      1    1  2018
1   1 2018-01-01 15:05:00    150      1    1  2018
1   1 2018-01-01 15:10:00    125      1    1  2018
2   2 2018-02-02 13:15:00    105      2    2  2018
2   2 2018-02-02 13:20:00    110      2    2  2018
2   2 2018-02-02 13:25:00     90      2    2  2018
3   3 2019-03-03 05:05:00    180      3    3  2019
3   3 2019-03-03 05:10:00    190      3    3  2019
3   3 2019-03-03 05:15:00    185      3    3  2019

Then append a sequential DateTime counter column (per this SO post):

  • the counter is computed within (1) each year, then (2) each month, and then (3) each day
  • since the data are in multiples of 5 minutes, use this to scale the counter values (i.e. the counter will be in multiples of 5 minutes, rather than a sequence of increasing integers)
df['Time Elapsed'] = df.groupby(['year', 'month', 'day']).cumcount() + 1
df['Time Elapsed'] *= 5

print(df)
   ID            DateTime  Value  month  day  year       Time Elapsed
1   1 2018-01-01 15:00:00    100      1    1  2018                  5
1   1 2018-01-01 15:05:00    150      1    1  2018                 10
1   1 2018-01-01 15:10:00    125      1    1  2018                 15
2   2 2018-02-02 13:15:00    105      2    2  2018                  5
2   2 2018-02-02 13:20:00    110      2    2  2018                 10
2   2 2018-02-02 13:25:00     90      2    2  2018                 15
3   3 2019-03-03 05:05:00    180      3    3  2019                  5
3   3 2019-03-03 05:10:00    190      3    3  2019                 10
3   3 2019-03-03 05:15:00    185      3    3  2019                 15

Perform the groupby over the newly appended counter column:

dfg = df.groupby('Time Elapsed')['Value'].mean()

print(dfg)
Time Elapsed
5     128.333333
10    150.000000
15    133.333333
Name: Value, dtype: float64
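
Note that the counter above starts at 5 rather than 0, and that grouping by calendar date would merge two IDs recorded on the same day. A possible variant is sketched below; it counts within each ID and starts at zero. It assumes, as the question states, that the rows of each ID are exactly 5 minutes apart with no gaps. The DataFrame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateTime": pd.to_datetime([
        "2018-01-01 15:00:00", "2018-01-01 15:05:00", "2018-01-01 15:10:00",
        "2018-02-02 13:15:00", "2018-02-02 13:20:00", "2018-02-02 13:25:00",
        "2019-03-03 05:05:00", "2019-03-03 05:10:00", "2019-03-03 05:15:00",
    ]),
    "Value": [100, 150, 125, 105, 110, 90, 180, 190, 185],
})

# Make sure each ID's rows are in time order before counting them.
df = df.sort_values(["ID", "DateTime"])

# cumcount() numbers the rows of each ID as 0, 1, 2, ...;
# multiplying by the 5-minute spacing gives 0, 5, 10, ...
df["Time Elapsed"] = df.groupby("ID").cumcount() * 5

dfg = df.groupby("Time Elapsed")["Value"].mean()
print(dfg)
```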
