Finding first iteration of a string in a datatable in R

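I am pretty new to R, so I am trying to figure out how to do this better. I have a data.table with two columns (Day and SleepStatus). How can I find the first iteration of Sleeping and Awake based on the column Day, and mutate another column to indicate when the person starts sleeping (the first Sleeping row) and stops sleeping (the first Awake row)? For the rest of the sleep duration, the column should show NA.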

Day SleepStatus
1   Sleeping
1   Sleeping
1   Sleeping
1   Awake
2   Sleeping
2   Sleeping
2   Sleeping
2   Awake

Desired output

Day SleepStatus Final Status
1   Sleeping    Start Sleep
1   Sleeping    NA
1   Sleeping    Stop Sleep
1   Awake       NA
2   Sleeping    Start Sleep
2   Sleeping    NA
2   Sleeping    Stop Sleep
2   Awake       NA

Here is a potential solution:

library(data.table)

dt <- data.table::data.table(
          Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping","Sleeping","Sleeping",
                  "Awake","Sleeping","Sleeping","Sleeping","Awake")
)

dt[, `Final Status` := {ifelse(
  cumsum(SleepStatus != "Sleeping") != shift(cumsum(SleepStatus != "Sleeping"), fill = 0, type = "lag"),
  "Stop Sleep", "Start Sleep")}]
dt[, `Final Status` := {ifelse(
  `Final Status` == shift(`Final Status`, fill = "NA", type = "lag"),
  NA, `Final Status`)}]
dt
#>    Day SleepStatus Final Status
#> 1:   1    Sleeping  Start Sleep
#> 2:   1    Sleeping         <NA>
#> 3:   1    Sleeping         <NA>
#> 4:   1       Awake   Stop Sleep
#> 5:   2    Sleeping  Start Sleep
#> 6:   2    Sleeping         <NA>
#> 7:   2    Sleeping         <NA>
#> 8:   2       Awake   Stop Sleep
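
The code above works on the whole column and never looks at Day; it gives the right answer here because every day happens to end with an Awake row. If the labels really should restart within each day, one possible sketch is to mark the first row of every run of identical values per Day. This assumes that "first iteration" means "first row of a run within a Day"; the column names `first_in_run` and `final_status` are made up for illustration.

```r
library(data.table)

dt <- data.table(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping", "Awake",
                  "Sleeping", "Sleeping", "Sleeping", "Awake")
)

# rleid() numbers each run of identical values; rowid() counts rows within
# each run, so a value of 1 marks the first row of a run (per Day).
dt[, first_in_run := rowid(rleid(SleepStatus)) == 1L, by = Day]

# Label only the first row of each run; every other row becomes NA.
dt[, final_status := fifelse(
  first_in_run,
  fifelse(SleepStatus == "Sleeping", "Start Sleep", "Stop Sleep"),
  NA_character_
)]

# Drop the helper column (hypothetical name, only used for this sketch).
dt[, first_in_run := NULL]
dt
```

Note that with `by = Day`, a sleep run that continues across midnight would be labelled "Start Sleep" again on the new day, which may or may not be what you want.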

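The `cumsum()`/`shift()` code makes more sense if you break it into smaller chunks. I have done this below using tidyverse functions because I find them easier to understand, but I can change it to data.table syntax if you prefer.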

library(data.table)

dt <- data.table::data.table(
          Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping","Sleeping","Sleeping",
                  "Awake","Sleeping","Sleeping","Sleeping","Awake")
)

library(tidyverse)
df <- as.data.frame(dt)

# When SleepStatus is not "Sleeping", increment the counter by one
df2 <- df %>%
  mutate(Sleeping = cumsum(SleepStatus != "Sleeping"))
df2
#>   Day SleepStatus Sleeping
#> 1   1    Sleeping        0
#> 2   1    Sleeping        0
#> 3   1    Sleeping        0
#> 4   1       Awake        1
#> 5   2    Sleeping        1
#> 6   2    Sleeping        1
#> 7   2    Sleeping        1
#> 8   2       Awake        2

# If the previous value in "Sleeping" is different to the current value,
# add the "stop sleeping" flag (i.e. show when "Sleeping" changes)
df3 <- df2 %>%
  mutate(Sleep_label = ifelse(Sleeping != lag(Sleeping, default = 0), "Stop sleeping", "Start sleeping"))
df3
#>   Day SleepStatus Sleeping    Sleep_label
#> 1   1    Sleeping        0 Start sleeping
#> 2   1    Sleeping        0 Start sleeping
#> 3   1    Sleeping        0 Start sleeping
#> 4   1       Awake        1  Stop sleeping
#> 5   2    Sleeping        1 Start sleeping
#> 6   2    Sleeping        1 Start sleeping
#> 7   2    Sleeping        1 Start sleeping
#> 8   2       Awake        2  Stop sleeping

# Then, if the value in Sleep_label is equal to the previous label,
# change it to NA
df4 <- df3 %>%
  mutate(Final_status = ifelse(Sleep_label == lag(Sleep_label, default = "NA"), NA, Sleep_label))
df4
#>   Day SleepStatus Sleeping    Sleep_label   Final_status
#> 1   1    Sleeping        0 Start sleeping Start sleeping
#> 2   1    Sleeping        0 Start sleeping           <NA>
#> 3   1    Sleeping        0 Start sleeping           <NA>
#> 4   1       Awake        1  Stop sleeping  Stop sleeping
#> 5   2    Sleeping        1 Start sleeping Start sleeping
#> 6   2    Sleeping        1 Start sleeping           <NA>
#> 7   2    Sleeping        1 Start sleeping           <NA>
#> 8   2       Awake        2  Stop sleeping  Stop sleeping

Created on 2022-05-20 by the reprex package (v2.0.1)

Does that make sense? Or did I just make things more confusing?
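
For reference, the same three steps could be written in data.table syntax roughly as follows. This is only a sketch of the translation: it keeps the intermediate columns `Sleeping` and `Sleep_label` from the tidyverse version so each step can be inspected, and it swaps in `fifelse()` for `ifelse()`.

```r
library(data.table)

dt <- data.table(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping", "Awake",
                  "Sleeping", "Sleeping", "Sleeping", "Awake")
)

# Step 1: running count of rows that are not "Sleeping"
dt[, Sleeping := cumsum(SleepStatus != "Sleeping")]

# Step 2: the counter changes exactly on the non-"Sleeping" rows,
# so a change relative to the previous row means "Stop sleeping"
dt[, Sleep_label := fifelse(Sleeping != shift(Sleeping, fill = 0L),
                            "Stop sleeping", "Start sleeping")]

# Step 3: a label equal to the previous row's label is a repeat, so set it to NA
dt[, Final_status := fifelse(Sleep_label == shift(Sleep_label, fill = "NA"),
                             NA_character_, Sleep_label)]
dt
```

`shift()` defaults to `type = "lag"`, so it plays the role of `dplyr::lag()` here.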

In base R, you could do the following:

x <- dt$SleepStatus
is.na(x) <- -cumsum(c(1,head(rle(x)$lengths,-1)))
dt$final_status <- c(Sleeping = 'Start Sleep', Awake = 'Stop Sleep')[x]
dt

  Day SleepStatus final_status
1   1    Sleeping  Start Sleep
2   1    Sleeping         <NA>
3   1    Sleeping         <NA>
4   1       Awake   Stop Sleep
5   2    Sleeping  Start Sleep
6   2    Sleeping         <NA>
7   2    Sleeping         <NA>
8   2       Awake   Stop Sleep
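
The one-liner with `rle()` is dense, so here is a sketch that spells out the intermediate values. It uses a plain character vector instead of `dt$SleepStatus`, and the names `runs`, `run_starts` and `labels` are introduced only for illustration.

```r
x <- c("Sleeping", "Sleeping", "Sleeping", "Awake",
       "Sleeping", "Sleeping", "Sleeping", "Awake")

# Run-length encoding: consecutive identical values collapse into runs
runs <- rle(x)
runs$lengths   # 3 1 3 1

# The first run starts at row 1; each later run starts where the
# previous ones end, hence the cumulative sum of the earlier lengths
run_starts <- cumsum(c(1, head(runs$lengths, -1)))
run_starts     # 1 4 5 8

# Negative indices select everything except the run starts,
# so every row that is not the first of its run becomes NA
is.na(x) <- -run_starts

# Look up the label by name; NA entries stay NA
labels <- c(Sleeping = "Start Sleep", Awake = "Stop Sleep")
final_status <- unname(labels[x])
final_status
```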
