How to adjust my data.frame to value/second?

I have to do some analysis on the heart-rate (HR) measurement values of a device. However, this device gives a very odd output of HR per second. There is a column called 'duration' that says how many seconds a certain HR was measured for; in the same row there is a value for HR in the column 'heart_rate', and there is another column with a date and time stamp called 'startdate'. However, the duration given in, for example, row 2 (let's say it is 3) is the time elapsed since the startdate time stamp of row 1 (so startdate in row 1 would be, for example, 06.09.21 07:24:23 and in row 2 06.09.21 07:24:26). The duration in row 2 therefore tells for how many seconds the 'heart_rate' value in row 1 was measured. It looks like this:

   duration heart_rate startdate          
      <dbl>      <dbl> <dttm>             
 1        1         74 2021-09-06 07:25:33
 2        1         74 2021-09-06 07:25:34
 3        2         71 2021-09-06 07:25:36
 4        4         71 2021-09-06 07:25:40
 5        2         72 2021-09-06 07:25:42
 6        6         72 2021-09-06 07:25:48
 7        2         74 2021-09-06 07:25:50
 8        5         76 2021-09-06 07:25:55
 9        4         75 2021-09-06 07:25:59
10        2         75 2021-09-06 07:26:01

I adjusted the 10 rows above to the desired format manually. What I want it to look like is this:

   duration heart_rate startdate
      <dbl>      <dbl> <dttm>             
 1        1         74 2021-09-06 07:25:33
 2        1         74 2021-09-06 07:25:34
 3        1         74 2021-09-06 07:25:35
 4        1         71 2021-09-06 07:25:36
 5        1         71 2021-09-06 07:25:37
 6        1         71 2021-09-06 07:25:38
 7        1         71 2021-09-06 07:25:39
 8        1         71 2021-09-06 07:25:40
 9        1         71 2021-09-06 07:25:41
10        1         72 2021-09-06 07:25:42
11        1         72 2021-09-06 07:25:43
12        1         72 2021-09-06 07:25:44
13        1         72 2021-09-06 07:25:45
14        1         72 2021-09-06 07:25:46
15        1         72 2021-09-06 07:25:47
16        1         72 2021-09-06 07:25:48
17        1         72 2021-09-06 07:25:49
18        1         74 2021-09-06 07:25:50
19        1         74 2021-09-06 07:25:51
20        1         74 2021-09-06 07:25:52
21        1         74 2021-09-06 07:25:53
22        1         74 2021-09-06 07:25:54
23        1         76 2021-09-06 07:25:55
24        1         76 2021-09-06 07:25:56
25        1         76 2021-09-06 07:25:57
26        1         76 2021-09-06 07:25:58
27        1         75 2021-09-06 07:25:59
28        1         75 2021-09-06 07:26:00

Additionally, it is crucial to get the time stamp for every second within the whole data.frame, because the device produces a lot of NA values, so I'd like to see for which time periods (when and for how many seconds) the data is missing. I am new to R and this is a kind of challenge I have not had to handle so far, so I am a bit lost right now and have no idea how to tackle this properly. Thank you everyone for your help!

Sounds like a job for a rolling join (using data.table).

library(data.table)
# sample data
DT <- fread("   duration heart_rate startdate          
        1         74 2021-09-06T07:25:33
        1         74 2021-09-06T07:25:34
        2         71 2021-09-06T07:25:36
        4         71 2021-09-06T07:25:40
        2         72 2021-09-06T07:25:42
        6         72 2021-09-06T07:25:48
        2         74 2021-09-06T07:25:50
        5         76 2021-09-06T07:25:55
        4         75 2021-09-06T07:25:59
        2         75 2021-09-06T07:26:01")
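# parse the timestamps; format must be passed by name, because the second
# positional argument of as.POSIXct() is tz, not format
# (tz = "UTC" is an assumption, use the device's actual time zone)
DT[, startdate := as.POSIXct(startdate, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")]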

# create new data.table by second
DT2 <- data.table( timestamp = seq(min(DT$startdate), max(DT$startdate), by = 1))
# join in data using a rolling join
DT2[, heart_rate := DT[DT2, heart_rate, on = .(startdate = timestamp), roll = Inf]]
#               timestamp heart_rate
#  1: 2021-09-06 07:25:33         74
#  2: 2021-09-06 07:25:34         74
#  3: 2021-09-06 07:25:35         74
#  4: 2021-09-06 07:25:36         71
#  5: 2021-09-06 07:25:37         71
#  6: 2021-09-06 07:25:38         71
#  7: 2021-09-06 07:25:39         71
#  8: 2021-09-06 07:25:40         71
#  9: 2021-09-06 07:25:41         71
# 10: 2021-09-06 07:25:42         72
# 11: 2021-09-06 07:25:43         72
# 12: 2021-09-06 07:25:44         72
# 13: 2021-09-06 07:25:45         72
# 14: 2021-09-06 07:25:46         72
# 15: 2021-09-06 07:25:47         72
# 16: 2021-09-06 07:25:48         72
# 17: 2021-09-06 07:25:49         72
# 18: 2021-09-06 07:25:50         74
# 19: 2021-09-06 07:25:51         74
# 20: 2021-09-06 07:25:52         74
# 21: 2021-09-06 07:25:53         74
# 22: 2021-09-06 07:25:54         74
# 23: 2021-09-06 07:25:55         76
# 24: 2021-09-06 07:25:56         76
# 25: 2021-09-06 07:25:57         76
# 26: 2021-09-06 07:25:58         76
# 27: 2021-09-06 07:25:59         75
# 28: 2021-09-06 07:26:00         75
# 29: 2021-09-06 07:26:01         75
#               timestamp heart_rate
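
The question also asks when, and for how many seconds, data is missing. Once there is one row per second, one possible way to get that is to label consecutive runs of NA with data.table's rleid() and summarise each run. This is only a sketch: the table per_sec and its values are made up for illustration, and it assumes missing readings show up as NA in heart_rate in the per-second table.

```r
library(data.table)

# Hypothetical per-second table, shaped like DT2 above, with some NA readings
per_sec <- data.table(
  timestamp  = as.POSIXct("2021-09-06 07:25:33", tz = "UTC") + 0:9,
  heart_rate = c(74, 74, NA, NA, NA, 71, 71, NA, 72, 72)
)

# Give every consecutive run of missing / non-missing values its own id
per_sec[, run := rleid(is.na(heart_rate))]

# Keep only the missing rows and summarise each run:
# first and last missing second, and the number of missing seconds
gaps <- per_sec[is.na(heart_rate),
                .(gap_start = min(timestamp),
                  gap_end   = max(timestamp),
                  seconds   = .N),
                by = run]
print(gaps)
```

With the made-up values above, gaps has one row per missing stretch (here a 3-second and a 1-second gap). The same two steps should work on DT2 directly, using its timestamp and heart_rate columns.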
