
Selecting a time window in a dataframe

I have a dataframe, df, which looks like this:

                     HeartRate_smooth
2018-01-01 00:07:00  58.000000
2018-01-01 00:13:00  59.333333
2018-01-01 00:14:00  57.333333
2018-01-01 00:20:00  59.333333
2018-01-01 00:21:00  59.333333
2018-01-01 00:22:00  57.333333
2018-01-01 00:34:00  59.666667
2018-01-01 00:36:00  58.666667
2018-01-01 00:37:00  57.666667
2018-01-01 00:38:00  55.000000
2018-01-01 00:39:00  58.333333
2018-01-01 01:03:00  57.666667
2018-01-01 01:08:00  59.666667
2018-01-01 01:09:00  56.333333
2018-01-01 01:10:00  54.666667
2018-01-01 01:32:00  59.666667
2018-01-01 01:33:00  57.000000
2018-01-01 01:34:00  54.333333
2018-01-01 01:56:00  56.000000
2018-01-01 01:57:00  58.000000
2018-01-01 01:58:00  59.000000
2018-01-01 02:03:00  59.666667
2018-01-01 02:07:00  58.666667
2018-01-01 03:00:00  59.666667
2018-01-01 03:09:00  59.333333
2018-01-01 03:10:00  58.333333
2018-01-01 03:31:00  58.666667
2018-01-01 10:46:00  59.666667
2018-01-01 12:40:00  58.333333
2018-01-01 14:42:00  59.000000

This dataframe is a collection of the timepoints at which the patient's heart rate is below a threshold. I am assuming that these points are either when the patient is at rest or asleep. I am trying to find a way to identify the period during which the patient is asleep. I assume the patient is asleep when data is present for more than an hour with a gap of less than 30 minutes between consecutive rows.

In the given dataframe, I can assume that the patient is asleep from 00:07 to 02:07. This is because there is less than 30 minutes of missing data between each row from 00:07 to 02:07. The row that comes after 02:07 has a time difference of more than 30 minutes, so I assume that the patient has woken.

Please note that I will be looping through data for multiple patients, and the period during which each patient is asleep will differ. It may not always begin from the first entry in the dataframe.

My questions are:
1. How would I identify the period during which the patient is asleep and split the current dataframe into two, where one dataframe stores the data for when the patient is asleep and the other for when the patient is awake?
2. This is not necessary, but if possible, how can I print out the start and end times and the total length of time that the patient is asleep?

Sample output based on the sample dataframe provided:
Asleep_df:

                     HeartRate_smooth
2018-01-01 00:07:00  58.000000
2018-01-01 00:13:00  59.333333
2018-01-01 00:14:00  57.333333
2018-01-01 00:20:00  59.333333
2018-01-01 00:21:00  59.333333
2018-01-01 00:22:00  57.333333
2018-01-01 00:34:00  59.666667
2018-01-01 00:36:00  58.666667
2018-01-01 00:37:00  57.666667
2018-01-01 00:38:00  55.000000
2018-01-01 00:39:00  58.333333
2018-01-01 01:03:00  57.666667
2018-01-01 01:08:00  59.666667
2018-01-01 01:09:00  56.333333
2018-01-01 01:10:00  54.666667
2018-01-01 01:32:00  59.666667
2018-01-01 01:33:00  57.000000
2018-01-01 01:34:00  54.333333
2018-01-01 01:56:00  56.000000
2018-01-01 01:57:00  58.000000
2018-01-01 01:58:00  59.000000
2018-01-01 02:03:00  59.666667
2018-01-01 02:07:00  58.666667

Awake_df:

                     HeartRate_smooth
2018-01-01 03:00:00  59.666667
2018-01-01 03:09:00  59.333333
2018-01-01 03:10:00  58.333333
2018-01-01 03:31:00  58.666667
2018-01-01 10:46:00  59.666667
2018-01-01 12:40:00  58.333333
2018-01-01 14:42:00  59.000000

"Patient was asleep from 00:07 to 03:31 for 3 hours and 24 minutes"

I find it easier to handle timestamps when they are a regular column rather than the index:

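import pandas as pd

df.reset_index(inplace=True)

# df now has a timestamp column named 'index' (assuming the index was unnamed)

# flag rows whose gap to the previous row is 30 minutes or more;
# comparing against a Timedelta also handles gaps longer than a day,
# which .dt.seconds would miss because it drops whole days
# cumsum turns the flags into labels for consecutive blocks:
df['block'] = df['index'].diff().ge(pd.Timedelta(minutes=30)).cumsum()

# all sleep chunks: blocks with more than one row
# (the variable is called awake_df, but it holds the candidate sleep blocks)
awake_df = (df.set_index('index')
              .groupby('block')[['HeartRate_smooth']]
              .apply(lambda x: x if len(x) > 1 else None)
           )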

Output awake_df:

+--------+----------------------+-------------------+
|        |                      | HeartRate_smooth  |
+--------+----------------------+-------------------+
| block  | index                |                   |
+--------+----------------------+-------------------+    
| 0      | 2018-01-01 00:07:00  | 58.000000         |
|        | 2018-01-01 00:13:00  | 59.333333         |
|        | 2018-01-01 00:14:00  | 57.333333         |
|        | 2018-01-01 00:20:00  | 59.333333         |
|        | 2018-01-01 00:21:00  | 59.333333         |
|        | 2018-01-01 00:22:00  | 57.333333         |
|        | 2018-01-01 00:34:00  | 59.666667         |
|        | 2018-01-01 00:36:00  | 58.666667         |
|        | 2018-01-01 00:37:00  | 57.666667         |
|        | 2018-01-01 00:38:00  | 55.000000         |
|        | 2018-01-01 00:39:00  | 58.333333         |
|        | 2018-01-01 01:03:00  | 57.666667         |
|        | 2018-01-01 01:08:00  | 59.666667         |
|        | 2018-01-01 01:09:00  | 56.333333         |
|        | 2018-01-01 01:10:00  | 54.666667         |
|        | 2018-01-01 01:32:00  | 59.666667         |
|        | 2018-01-01 01:33:00  | 57.000000         |
|        | 2018-01-01 01:34:00  | 54.333333         |
|        | 2018-01-01 01:56:00  | 56.000000         |
|        | 2018-01-01 01:57:00  | 58.000000         |
|        | 2018-01-01 01:58:00  | 59.000000         |
|        | 2018-01-01 02:03:00  | 59.666667         |
|        | 2018-01-01 02:07:00  | 58.666667         |
| 1      | 2018-01-01 03:00:00  | 59.666667         |
|        | 2018-01-01 03:09:00  | 59.333333         |
|        | 2018-01-01 03:10:00  | 58.333333         |
|        | 2018-01-01 03:31:00  | 58.666667         |
+--------+----------------------+-------------------+  
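
The block labels come from the diff-then-cumsum step. As a small illustration on hypothetical toy timestamps (not the patient's data), the sketch below shows how each gap of 30 minutes or more starts a new label. It also shows why comparing against a `pd.Timedelta` may be safer than `.dt.seconds`, which only returns the seconds component and so ignores whole days:

```python
import pandas as pd

# Hypothetical toy timestamps, chosen only to illustrate the labelling.
times = pd.Series(pd.to_datetime([
    "2018-01-01 00:00",
    "2018-01-01 00:10",   # 10 min gap        -> same block
    "2018-01-01 00:50",   # 40 min gap        -> new block
    "2018-01-01 01:00",   # 10 min gap        -> same block
    "2018-01-02 01:05",   # 1 day 5 min gap   -> new block
]))

gaps = times.diff()                              # first entry is NaT

# True wherever a new block starts; NaT compares as False.
new_block = gaps >= pd.Timedelta(minutes=30)
block = new_block.cumsum()
print(block.tolist())

# .dt.seconds drops the day part, so the last gap looks like 5 minutes
# and the final row is wrongly kept in the previous block.
block_seconds = gaps.dt.seconds.ge(30 * 60).cumsum()
print(block_seconds.tolist())
```

In the sample data every gap is shorter than a day, so both variants give the same labels there.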

Note that there are two sleeping chunks, since your data actually has a 53-minute gap between 02:07 and 03:00. And to get the sleeping time:

(awake_df.reset_index(level=1)
         .groupby('block')['index']
         .apply(lambda x: x.max()-x.min())
)

gives:

block
0     02:00:00
1     00:31:00
Name: index, dtype: timedelta64[ns]
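
The snippets above keep every block with more than one row, so they do not yet apply the "more than an hour" rule from the question or produce the two dataframes. A minimal sketch of one way to do that follows. It assumes the dataframe has a DatetimeIndex, and the function name `split_sleep` and its parameters `max_gap` and `min_duration` are made up for this illustration:

```python
import pandas as pd

def split_sleep(df, max_gap="30min", min_duration="1h"):
    """Return (asleep_df, awake_df, summary) for a DatetimeIndex-ed frame."""
    df = df.sort_index()
    ts = df.index.to_series()
    # A new block starts wherever the gap to the previous row is >= max_gap.
    block = (ts.diff() >= pd.Timedelta(max_gap)).cumsum()
    # Start, end and duration of every block.
    summary = ts.groupby(block).agg(start="min", end="max")
    summary["duration"] = summary["end"] - summary["start"]
    # Only blocks lasting longer than min_duration count as sleep.
    summary = summary[summary["duration"] > pd.Timedelta(min_duration)]
    is_asleep = block.isin(summary.index).to_numpy()
    return df[is_asleep], df[~is_asleep], summary

# The sample data from the question.
times = ["00:07", "00:13", "00:14", "00:20", "00:21", "00:22", "00:34",
         "00:36", "00:37", "00:38", "00:39", "01:03", "01:08", "01:09",
         "01:10", "01:32", "01:33", "01:34", "01:56", "01:57", "01:58",
         "02:03", "02:07", "03:00", "03:09", "03:10", "03:31", "10:46",
         "12:40", "14:42"]
rates = [58.000000, 59.333333, 57.333333, 59.333333, 59.333333, 57.333333,
         59.666667, 58.666667, 57.666667, 55.000000, 58.333333, 57.666667,
         59.666667, 56.333333, 54.666667, 59.666667, 57.000000, 54.333333,
         56.000000, 58.000000, 59.000000, 59.666667, 58.666667, 59.666667,
         59.333333, 58.333333, 58.666667, 59.666667, 58.333333, 59.000000]
df = pd.DataFrame({"HeartRate_smooth": rates},
                  index=pd.to_datetime(["2018-01-01 " + t for t in times]))

asleep_df, awake_df, summary = split_sleep(df)

# One message per sleep block, with the duration in hours and minutes.
messages = []
for row in summary.itertuples():
    hours, minutes = divmod(int(row.duration.total_seconds() // 60), 60)
    messages.append(f"Patient was asleep from {row.start:%H:%M} to "
                    f"{row.end:%H:%M} for {hours} hours and {minutes} minutes")
print("\n".join(messages))
```

On the sample data this puts the 23 rows from 00:07 to 02:07 in `asleep_df` and the remaining 7 rows in `awake_df`, because the 03:00 to 03:31 block lasts only 31 minutes. When looping over several patients, the same function could be called once per patient dataframe.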
