Subsetting sequences in R based on certain criteria

I would like to know if there is a way to subset a huge R data frame [df] so that only certain sequences remain for each group [device].

I have a data frame [df] like this:

id   device   date                pressure    
1    B3       2020-04-15 08:00    112         
2    B3       2020-04-15 09:00    100         
3    B3       2020-04-15 10:00    89          
4    B3       2020-04-15 11:00    90          
5    B3       2020-04-15 12:00    60          
6    B3       2020-04-15 13:00    28          
7    B3       2020-04-16 09:00    120         
8    B3       2020-04-16 10:00    80          
9    B3       2020-04-16 11:00    73          
10   B3       2020-04-16 12:00    61          
11   B3       2020-04-16 13:00    30   
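
For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. Reading `pressure` as integer and parsing `date` with a UTC time zone are assumptions; the original column types are not stated.

```r
# Sample data from the table above, rebuilt as a data frame
df <- data.frame(
  id = 1:11,
  device = "B3",
  date = c("2020-04-15 08:00", "2020-04-15 09:00", "2020-04-15 10:00",
           "2020-04-15 11:00", "2020-04-15 12:00", "2020-04-15 13:00",
           "2020-04-16 09:00", "2020-04-16 10:00", "2020-04-16 11:00",
           "2020-04-16 12:00", "2020-04-16 13:00"),
  pressure = c(112L, 100L, 89L, 90L, 60L, 28L, 120L, 80L, 73L, 61L, 30L),
  stringsAsFactors = FALSE
)

# Optional: parse the timestamps (the time zone is an assumption)
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M", tz = "UTC")
```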

I would like to get only the rows where the pressure drops from 120 down to 60 [or to the first value lower than 60].

The expected result would be as follows:

id   device   date                pressure    group
1    B3       2020-04-15 08:00    112         1
2    B3       2020-04-15 09:00    100         1
3    B3       2020-04-15 10:00    89          1
4    B3       2020-04-15 11:00    90          1
5    B3       2020-04-15 12:00    60          1
7    B3       2020-04-16 09:00    120         2
8    B3       2020-04-16 10:00    80          2
9    B3       2020-04-16 11:00    73          2
10   B3       2020-04-16 12:00    61          2
11   B3       2020-04-16 13:00    30          2

Would this be possible? Thank you for any suggestions.

You can start a new group whenever the current value is greater than 60 and the previous value was less than 60, then keep only the rows in each group up to and including the first row with pressure less than or equal to 60.

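library(dplyr)

df %>%
  # New group at each row where pressure is above 60 after a value below 60
  group_by(device,
           group = cumsum(pressure > 60 & lag(pressure, default = 0) < 60)) %>%
  # Keep rows up to and including the first one with pressure <= 60
  slice(seq_len(which.max(pressure <= 60)))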

#      id device date            pressure group
#   <int> <chr>  <chr>              <int> <int>
# 1     1 B3     2020-04-1508:00      112     1
# 2     2 B3     2020-04-1509:00      100     1
# 3     3 B3     2020-04-1510:00       89     1
# 4     4 B3     2020-04-1511:00       90     1
# 5     5 B3     2020-04-1512:00       60     1
# 6     7 B3     2020-04-1609:00      120     2
# 7     8 B3     2020-04-1610:00       80     2
# 8     9 B3     2020-04-1611:00       73     2
# 9    10 B3     2020-04-1612:00       61     2
#10    11 B3     2020-04-1613:00       30     2
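
To see why the grouping key works, here is a base R sketch of the same two steps on the sample pressures alone. `prev` is a hand-rolled stand-in for `lag(pressure, default = 0)`, and `keep_until_low` is a hypothetical helper name, not part of the answer above. One thing to be aware of: `which.max()` returns 1 when no value in a group is at or below 60, so the `slice()` call would keep only the first row of such a group. The helper keeps the whole group in that case, which is an assumption about what is wanted.

```r
pressure <- c(112, 100, 89, 90, 60, 28, 120, 80, 73, 61, 30)

# Previous value, with 0 standing in before the first row
prev <- c(0, head(pressure, -1))

# TRUE where a new run starts: above 60 now, below 60 just before
starts <- pressure > 60 & prev < 60

# Running count of starts gives the group id for every row
group <- cumsum(starts)

# Hypothetical helper: flag rows up to and including the first value <= 60;
# if the group never reaches 60, flag every row
keep_until_low <- function(p) {
  hit <- which(p <= 60)
  n <- if (length(hit) > 0) hit[1] else length(p)
  seq_along(p) <= n
}

# ave() applies the helper per group and returns 0/1, so convert back
keep <- as.logical(ave(pressure, group, FUN = keep_until_low))
which(keep)  # rows 1-5 and 7-11, matching the ids in the output above
```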

If you want to do it without dplyr and pipes, you can loop through the pressures to annotate the groups:

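# d is the data frame from the question (called df there)
d$group <- NA
d$group[1] <- 1
for (i in seq_len(nrow(d))[-1]) {
  if (d$pressure[i] > 60 && d$pressure[i - 1] < 60) {
    # pressure is back above 60 after a value below 60: new group
    d$group[i] <- d$group[i - 1] + 1
  } else if (d$pressure[i] > d$pressure[i - 1] && d$pressure[i] < 60) {
    # pressure rises again while still below 60: also a new group
    d$group[i] <- d$group[i - 1] + 1
  } else {
    # otherwise stay in the current group
    d$group[i] <- d$group[i - 1]
  }
}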

In such an if / else if block, you can add as many different conditions as you want (e.g. a change of device, a change of date, ...).
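
Note that the loop only labels the groups; it does not yet drop the rows that follow the first value at or below 60. One possible way to finish in base R is sketched below. The small `d` here is a stand-in for the data after the loop has run, and `low` and `seen_before` are illustrative names.

```r
# Stand-in for the data after the loop has filled in `group`
d <- data.frame(
  id = 1:11,
  device = "B3",
  pressure = c(112, 100, 89, 90, 60, 28, 120, 80, 73, 61, 30),
  group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# 1 where the pressure is at or below the threshold, 0 otherwise
low <- as.integer(d$pressure <= 60)

# Within each device/group, count how many EARLIER rows were already <= 60
seen_before <- ave(low, d$device, d$group, FUN = function(x) cumsum(x) - x)

# Keep a row while no earlier row in its group has reached the threshold
result <- d[seen_before == 0, ]
result$id  # 1 2 3 4 5 7 8 9 10 11
```

Because the count only looks at earlier rows, the first row at or below 60 is itself kept, and a group that never reaches 60 is kept in full.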
