简体   繁体   中英

R dplyr - select values from one column based on position of a specific value in another column

I am working with gait-cycle data. I have 8 events marked for each id and gait trial. The values "LFCH" and "RFCH" occurs twice in each trial, as these represent the beginning and the end of the gait cycles from left and right leg.

Sample Data Frame:

df <- data.frame(ID = rep(1:5, each = 16),
                 Gait_nr = rep(1:2, each = 8, times=5),
                 Frame = rep(c(1,5,7,9,10,15,22,25), times = 10),
                 Marks = rep(c("LFCH", "LHL", "RFCH", "LTO", "RHL", "LFCH", "RTO", "RFCH"), times =10) 

head(df,8)
  ID Gait_nr Frame Marks
1  1       1     1  LFCH
2  1       1     5   LHL
3  1       1     7  RFCH
4  1       1     9   LTO
5  1       1    10   RHL
6  1       1    15  LFCH
7  1       1    22   RTO
8  1       1    25  RFCH

I wold like to create something like

Total_gait_left = Frame[The last time Marks == "LFCH"] - Frame[The first time Marks == "LFCH"]

My current code solves the problem, but depends on the position of the Frame values rather than actual values in Marks. Any individual not following the normal gait pattern will have wrong values produced by the code.

library(tidyverse)
l <- df %>% group_by(ID, Gait_nr) %>% filter(grepl("L.+", Marks)) %>%
  summarize(Total_gait = Frame[4] - Frame[1],
            Side = "left")

r <- df %>% group_by(ID, Gait_nr) %>% filter(grepl("R.+", Marks)) %>%
  summarize(Total_gait = Frame[4] - Frame[1],
            Side = "right")

val <- union(l,r, by=c("ID", "Gait_nr", "Side")) %>% arrange(ID, Gait_nr, Side)

Can you help me make my code more stable by helping me change eg Frame[4] to something like Frame[Marks=="LFCH" the last time ]?

If both LFCH and RFCH happen exactly twice, you can filter and then use diff in summarize :

df %>% 
    group_by(ID, Gait_nr) %>% 
    summarise(
        left = diff(Frame[Marks == 'LFCH']), 
        right = diff(Frame[Marks == 'RFCH'])
    )

# A tibble: 10 x 4
# Groups:   ID [?]
#      ID Gait_nr  left right
#   <int>   <int> <dbl> <dbl>
# 1     1       1    14    18
# 2     1       2    14    18
# 3     2       1    14    18
# 4     2       2    14    18
# 5     3       1    14    18
# 6     3       2    14    18
# 7     4       1    14    18
# 8     4       2    14    18
# 9     5       1    14    18
#10     5       2    14    18

We can use first and last from the dplyr package.

library(dplyr)

df2 <- df %>%
  filter(Marks %in% "LFCH") %>%
  group_by(ID, Gait_nr) %>%
  summarise(Total_gait = last(Frame) - first(Frame)) %>%
  ungroup()
df2
# # A tibble: 10 x 3
#       ID Gait_nr Total_gait
#    <int>   <int>      <dbl>
#  1     1       1         14
#  2     1       2         14
#  3     2       1         14
#  4     2       2         14
#  5     3       1         14
#  6     3       2         14
#  7     4       1         14
#  8     4       2         14
#  9     5       1         14
# 10     5       2         14

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM