
Wrangling data using R

I need your kind help tidying data using R.

My original data looks like this:

   > dput(mydata)
structure(list(subject = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), .Label = c("N1", "E1"), class = "factor"), item_number = c(1, 
2, 1, 7, 1, 2, 2, 10), block = c(1, 1, 3, 3, 1, 1, 3, 3), condition = c("L", 
"L", "EI", "I", "L", "L", "EI", "I")), row.names = c(NA, 8L), class = "data.frame")

 > mydata
  subject item_number block condition
1      N1           1     1         L
2      N1           2     1         L
3      N1           1     3        EI
4      N1           7     3         I
5      E1           1     1         L
6      E1           2     1         L
7      E1           2     3        EI
8      E1          10     3         I

Because of a programming error, the conditions in block 1 were not labelled correctly. I am trying to fix this by relabelling condition in block 1, separately for each subject and item number. Any item_number in block 1 whose condition is L should take the condition label given to the same item_number in block 3 for the same subject. For example, for subject N1, item_number 1 exists in block 3 with the label EI, so the condition for item_number 1 in block 1 should also be 'EI'. If an item_number does not exist in block 3 for that subject (for example, item_number 2 for subject N1), its condition in block 1 should be 'E'.

The desired output should look like this:

> dput(mydata_cleaned)
structure(list(subject = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), .Label = c("N1", "E1"), class = "factor"), item_number = c(1, 
2, 1, 7, 1, 2, 2, 10), block = c(1, 1, 3, 3, 1, 1, 3, 3), condition = c("EI", 
"E", "EI", "I", "E", "EI", "EI", "I")), row.names = c(NA, 8L), class = "data.frame")
> mydata_cleaned
  subject item_number block condition
1      N1           1     1        EI
2      N1           2     1         E
3      N1           1     3        EI
4      N1           7     3         I
5      E1           1     1         E
6      E1           2     1        EI
7      E1           2     3        EI
8      E1          10     3         I

Any help is greatly appreciated.

One option is to reshape to 'wide' format, with one column per value of 'block', replace the values in column `1` based on the values in column `3`, and then reshape back to 'long' format:

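library(dplyr)
library(tidyr)

mydata %>%
  # one row per subject/item, with the block 1 and block 3 labels in columns `1` and `3`
  pivot_wider(names_from = block, values_from = condition) %>%
  mutate(`1` = case_when(
    `3` %in% "EI" & `1` %in% "L" ~ `3`,  # item also in block 3 with label "EI": copy it
    is.na(`3`) ~ 'E',                    # item absent from block 3: label it "E"
    TRUE ~ `1`)) %>%                     # otherwise keep the block 1 label
  # back to long format; drop the NA cells created by pivot_wider()
  pivot_longer(cols = c(`1`, `3`), names_to = 'block',
               values_to = 'condition', values_drop_na = TRUE)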

Output:

# A tibble: 8 x 4
#  subject item_number block condition
#  <fct>         <dbl> <chr> <chr>    
#1 N1                1 1     EI       
#2 N1                1 3     EI       
#3 N1                2 1     E        
#4 N1                7 3     I        
#5 E1                1 1     E        
#6 E1                2 1     EI       
#7 E1                2 3     EI       
#8 E1               10 3     I       
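
The case_when() above copies a block 3 label only when that label is "EI", and the reshape changes the row order and turns 'block' into a character column. If any block 3 label should be copied, as the question describes, a join-based variant may be simpler. The following is only a sketch, assuming each subject/item_number pair appears at most once in block 3; the names `block3` and `condition_b3` are made up for illustration.

```r
library(dplyr)

# Example data rebuilt from the dput() in the question
mydata <- data.frame(
  subject = factor(c("N1", "N1", "N1", "N1", "E1", "E1", "E1", "E1"),
                   levels = c("N1", "E1")),
  item_number = c(1, 2, 1, 7, 1, 2, 2, 10),
  block = c(1, 1, 3, 3, 1, 1, 3, 3),
  condition = c("L", "L", "EI", "I", "L", "L", "EI", "I")
)

# Lookup table (hypothetical name): the label each subject/item got in block 3
block3 <- mydata %>%
  filter(block == 3) %>%
  select(subject, item_number, condition_b3 = condition)

mydata_cleaned <- mydata %>%
  # attach the block 3 label to every row; NA when the item is not in block 3
  left_join(block3, by = c("subject", "item_number")) %>%
  mutate(condition = case_when(
    # block 1 rows labelled "L" take the block 3 label when one exists
    block == 1 & condition == "L" & !is.na(condition_b3) ~ condition_b3,
    # block 1 rows labelled "L" with no block 3 match become "E"
    block == 1 & condition == "L" ~ "E",
    # everything else is left unchanged
    TRUE ~ condition
  )) %>%
  select(-condition_b3)

mydata_cleaned
```

Because left_join() keeps the rows of the left table in their original order, the result should line up row by row with the desired mydata_cleaned, and 'block' stays numeric.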
