Calculate mean based on time elapsed in Pandas
I tried to ask this question previously, but it was too ambiguous, so here goes again. I am new to programming, so I am still learning how to ask questions in a useful way.
In summary, I have a pandas DataFrame that resembles "INPUT DATA" below, and I would like to convert it to "DESIRED OUTPUT".
Each row contains an ID, a DateTime, and a Value. For each unique ID, the first row corresponds to timepoint 'zero', and each subsequent row contains a value recorded 5 minutes after the previous row.
I would like to calculate the mean across all IDs at every 'time elapsed' timepoint. For example, in "DESIRED OUTPUT", Time Elapsed=0.0 would have the value 128.3 ((100+105+180)/3); Time Elapsed=5.0 would have the value 150.0 ((150+110+190)/3); Time Elapsed=10.0 would have the value 133.3 ((125+90+185)/3), and so on for Time Elapsed=15, 20, 25, etc.
I'm not sure how to create a new column that holds the time elapsed for each ID (e.g. 0.0, 5.0, 10.0). I think that once I know how to do that, I can use the groupby function to calculate the mean for each elapsed time.
INPUT DATA
ID DateTime Value
1 2018-01-01 15:00:00 100
1 2018-01-01 15:05:00 150
1 2018-01-01 15:10:00 125
2 2018-02-02 13:15:00 105
2 2018-02-02 13:20:00 110
2 2018-02-02 13:25:00 90
3 2019-03-03 05:05:00 180
3 2019-03-03 05:10:00 190
3 2019-03-03 05:15:00 185
DESIRED OUTPUT
Time Elapsed Mean Value
0.0 128.3
5.0 150.0
10.0 133.3
Here is one way: use transform with groupby to build the group key 'Time Elapsed' (each row's DateTime minus the first DateTime for its ID), then groupby that key and take the mean.
df['Time Elapsed']=df.DateTime-df.groupby('ID').DateTime.transform('first')
df.groupby('Time Elapsed').Value.mean()
Out[998]:
Time Elapsed
00:00:00 128.333333
00:05:00 150.000000
00:10:00 133.333333
Name: Value, dtype: float64
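For a reproducible version of the approach above, here is a minimal sketch that rebuilds the sample data from the question and expresses the elapsed time in minutes as floats, to match the 0.0 / 5.0 / 10.0 labels in the desired output. The pd.to_datetime call assumes the DateTime column may have been read in as strings, and the names first_seen and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateTime": [
        "2018-01-01 15:00:00", "2018-01-01 15:05:00", "2018-01-01 15:10:00",
        "2018-02-02 13:15:00", "2018-02-02 13:20:00", "2018-02-02 13:25:00",
        "2019-03-03 05:05:00", "2019-03-03 05:10:00", "2019-03-03 05:15:00",
    ],
    "Value": [100, 150, 125, 105, 110, 90, 180, 190, 185],
})

# Assumption: the column may be stored as strings, so convert it first.
df["DateTime"] = pd.to_datetime(df["DateTime"])

# First timestamp per ID, broadcast back to every row of that ID.
# "first" assumes rows are already sorted by DateTime within each ID.
first_seen = df.groupby("ID")["DateTime"].transform("first")

# Elapsed time as float minutes (0.0, 5.0, 10.0, ...).
df["Time Elapsed"] = (df["DateTime"] - first_seen).dt.total_seconds() / 60

# Mean Value across IDs at each elapsed time, rounded to one decimal.
result = df.groupby("Time Elapsed")["Value"].mean().round(1)
print(result)
```

If the rows are not guaranteed to be in time order within each ID, sorting by ID and DateTime beforehand, or using "min" instead of "first", should give the same zero point.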
You can do this explicitly by taking advantage of the datetime attributes of the DateTime column in your DataFrame.
First get the year, month and day for each DateTime, since they all change in your data.
df['month'] = df['DateTime'].dt.month
df['day'] = df['DateTime'].dt.day
df['year'] = df['DateTime'].dt.year
print(df)
ID DateTime Value month day year
1 1 2018-01-01 15:00:00 100 1 1 2018
1 1 2018-01-01 15:05:00 150 1 1 2018
1 1 2018-01-01 15:10:00 125 1 1 2018
2 2 2018-02-02 13:15:00 105 2 2 2018
2 2 2018-02-02 13:20:00 110 2 2 2018
2 2 2018-02-02 13:25:00 90 2 2 2018
3 3 2019-03-03 05:05:00 180 3 3 2019
3 3 2019-03-03 05:10:00 190 3 3 2019
3 3 2019-03-03 05:15:00 185 3 3 2019
Then append a sequential counter column within each date (per this SO post) and multiply it by the 5-minute interval.
df['Time Elapsed'] = df.groupby(['year', 'month', 'day']).cumcount() + 1
df['Time Elapsed'] *= 5
print(df)
ID DateTime Value month day year Time Elapsed
1 1 2018-01-01 15:00:00 100 1 1 2018 5
1 1 2018-01-01 15:05:00 150 1 1 2018 10
1 1 2018-01-01 15:10:00 125 1 1 2018 15
2 2 2018-02-02 13:15:00 105 2 2 2018 5
2 2 2018-02-02 13:20:00 110 2 2 2018 10
2 2 2018-02-02 13:25:00 90 2 2 2018 15
3 3 2019-03-03 05:05:00 180 3 3 2019 5
3 3 2019-03-03 05:10:00 190 3 3 2019 10
3 3 2019-03-03 05:15:00 185 3 3 2019 15
Perform the groupby over the newly appended counter column.
dfg = df.groupby('Time Elapsed')['Value'].mean()
print(dfg)
Time Elapsed
5 128.333333
10 150.000000
15 133.333333
Name: Value, dtype: float64
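Note that the counter above starts at 5 rather than 0, and that grouping by calendar date works here only because each ID falls on its own day. As a variant sketch, assuming the rows are sorted by DateTime within each ID and spaced exactly 5 minutes apart as the question states, the counter can be taken per ID and started at zero. The constant INTERVAL_MINUTES and the name means are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (DateTime is not needed for this variant).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Value": [100, 150, 125, 105, 110, 90, 180, 190, 185],
})

# Assumption: every ID is sampled at a fixed 5-minute interval with no gaps.
INTERVAL_MINUTES = 5

# cumcount() numbers the rows within each ID starting at 0,
# so the first row of every ID gets Time Elapsed = 0.
df["Time Elapsed"] = df.groupby("ID").cumcount() * INTERVAL_MINUTES

means = df.groupby("Time Elapsed")["Value"].mean()
print(means)
```

This variant ignores the actual timestamps, so it would mislabel rows if any reading were missing; subtracting each ID's first DateTime avoids that assumption.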