How to aggregate time-series data over specific ranges?
I have a pandas dataframe that looks like this, where each row represents data collected on a different day (days 1 to 5) for each participant (long format):
ID Heart_Rate
1 89
1 98
1 99
1 73
1 54
...
24 88
24 90
24 79
24 92
24 97
How can I aggregate the data over the first 3 days for each participant, creating a new dataframe with one row per participant that holds the mean heart rate over those 72 hours?
We can set the index of the dataframe to ID, then group the dataframe on level=0 and use head(3) to keep the first three rows for each user ID. Grouping on level=0 again and taking the mean then gives the average heart rate over the first 72 hours. This assumes each participant's rows are already in day order:
# Keep the first 3 rows (days) for each ID, then average them per ID
out = df.set_index('ID').groupby(level=0).head(3).groupby(level=0).mean()
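A self-contained sketch of the same approach, built from only the rows shown in the question (the rows elided by "..." are left out). It finishes with .groupby(level=0).mean() rather than .mean(level=0), because the level argument of DataFrame.mean was deprecated in pandas 1.3 and removed in pandas 2.0.

```python
import pandas as pd

# Only the rows shown in the question; the elided rows are not included.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 24, 24, 24, 24, 24],
    "Heart_Rate": [89, 98, 99, 73, 54, 88, 90, 79, 92, 97],
})

# Assumes rows within each ID are already in day order (day 1 first).
first_three = df.set_index("ID").groupby(level=0).head(3)

# Average the kept rows per ID (works on pandas 1.x and 2.x).
out = first_three.groupby(level=0).mean()
print(out)
```

The result has one row per ID and matches the output shown further down (95.333333 for ID 1, 85.666667 for ID 24).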
An alternative approach, which is more efficient but only applicable when every user ID has the same number of rows and the dataframe is sorted on the ID column:
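import numpy as np

n_days = 5          # Number of rows present for each user ID
n_days_to_avg = 3   # First n rows/days to average
# Position of each row within its ID block (0..n_days-1); keep positions below n_days_to_avg
m = np.isin(np.r_[:len(df)] % n_days, np.r_[:n_days_to_avg])
out = df[m].groupby('ID').mean()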
>>> out
Heart_Rate
ID
1 95.333333
24 85.666667
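A variant that drops the equal-row-count and sorted-ID requirements (a swapped-in technique, not one taken from the answer above) numbers the rows within each ID with GroupBy.cumcount() and keeps those numbered below 3. It still assumes each participant's rows appear in day order. The participant with ID 7 and only two rows is hypothetical, added purely to show an uneven, interleaved group.

```python
import pandas as pd

# Rows from the question, plus a hypothetical ID 7 with only two days recorded,
# interleaved so the frame is neither sorted on ID nor evenly sized per ID.
df = pd.DataFrame({
    "ID": [1, 1, 7, 1, 1, 1, 24, 7, 24, 24, 24, 24],
    "Heart_Rate": [89, 98, 80, 99, 73, 54, 88, 70, 90, 79, 92, 97],
})

n_days_to_avg = 3  # First n rows/days to average per ID

# cumcount() gives each row's 0-based position within its ID, in order of appearance
day_index = df.groupby("ID").cumcount()

# Keep only the first n_days_to_avg rows of each ID, then average per ID
out = df[day_index < n_days_to_avg].groupby("ID").mean()
print(out)
```

For the hypothetical ID 7 the mean covers only the two days that exist, while IDs 1 and 24 give the same values as the answer above.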