
How to aggregate time-series data over specific ranges?

I have a pandas DataFrame that looks like this, where each row holds the data collected on a different day (days 1 to 5) for each participant (long form).

ID    Heart_Rate
1         89
1         98
1         99 
1         73 
1         54
...
24        88
24        90
24        79
24        92
24        97

How can I aggregate the data over the first 3 days for each participant, so that I get a new DataFrame with one row per participant containing the mean heart rate over the first 72 hours?

We can set the index of the DataFrame to ID, group on level=0, and use head to select the first three rows for each user ID. Grouping on level=0 again and taking the mean then gives the average heart rate over the first 72 hours. Note that DataFrame.mean(level=0) was removed in pandas 2.0, so the final step uses a second groupby:

out = df.set_index('ID').groupby(level=0).head(3).groupby(level=0).mean()
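To try this end to end, here is a minimal sketch built on the ten sample rows shown in the question (participants 1 and 24 only; the elided rows are not reconstructed). It groups on the ID column directly instead of setting it as the index, and it assumes the rows are already in day order within each ID.

```python
import pandas as pd

# Sample rows copied from the question: five days each for IDs 1 and 24.
df = pd.DataFrame({
    "ID": [1] * 5 + [24] * 5,
    "Heart_Rate": [89, 98, 99, 73, 54, 88, 90, 79, 92, 97],
})

# Keep the first three rows per participant (assumes day order within each ID),
# then average those rows per participant.
first_three = df.groupby("ID").head(3)
out = first_three.groupby("ID").mean()

print(out)
```

The result has one row per ID with a single Heart_Rate column holding the mean of that participant's first three rows.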

An alternative approach that is more efficient, but applicable only if every user ID has the same number of rows and the DataFrame is sorted on the ID column:

import numpy as np

n_days = 5 # Number of rows present for each user ID
n_days_to_avg = 3 # First n rows/days to average

# Row position modulo n_days is the day index within each ID's block
m = np.isin(np.r_[:len(df)] % n_days, np.r_[:n_days_to_avg])
out = df[m].groupby('ID').mean()

>>> out

    Heart_Rate
ID            
1    95.333333
24   85.666667
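If the participants do not all have the same number of rows, or if a range other than the first three days is needed, one possible variation is to number the rows within each ID with groupby(...).cumcount() and filter on that position. This is a sketch rather than part of the answer above; the names start, stop, and day_pos are illustrative, and it still assumes rows appear in day order within each ID.

```python
import pandas as pd

# Sample rows copied from the question: five days each for IDs 1 and 24.
df = pd.DataFrame({
    "ID": [1] * 5 + [24] * 5,
    "Heart_Rate": [89, 98, 99, 73, 54, 88, 90, 79, 92, 97],
})

# Hypothetical range: zero-based day positions in [start, stop), i.e. days 1 to 3.
start, stop = 0, 3

# cumcount() numbers rows 0, 1, 2, ... within each ID in their current order.
day_pos = df.groupby("ID").cumcount()
mask = (day_pos >= start) & (day_pos < stop)

# Mean heart rate per ID over the selected day range.
out_range = df[mask].groupby("ID")["Heart_Rate"].mean()

print(out_range)
```

Changing start and stop to, say, 1 and 4 would average days 2 to 4 instead, without relying on a fixed number of rows per participant.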


 