
R: Best practice to add information to data in wide format and transform it to long format

I am trying to transform some time course data from wide format into long format, which also requires adding some more information to the original data. The original data follows this structure:

ID      Exp_01 Exp_02 Exp_03 Exp_04 Exp_05 ...
"AA"    0.01   0.02   0.03   0.05   0.01
"BB"    0.01   0.02   0.03   0.05   0.01
"CC"    0.01   0.02   0.03   0.05   0.01
"DD"    0.01   0.02   0.03   0.05   0.01
"EE"    0.01   0.02   0.03   0.05   0.01
...

The "Exp_XY" columns form a time series: 7 time points, each measured three times (replicates), under three different experimental conditions. That gives 7 x 3 x 3 = 63 columns, so they continue until "Exp_63".

To illustrate, Exp_01, Exp_02, Exp_03 belong to time point t1 for condition cond1. Exp_04, Exp_05, Exp_06 belong to time point t2 for condition cond1, and so on.
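The layout described above amounts to integer arithmetic on the experiment number. A minimal sketch in base R, assuming the columns really follow this order (replicates vary fastest, then time points, then conditions); the names n_rep, n_time, and design are illustrative and not part of the original data:

```r
# Experiment numbers 1..63, assuming 3 replicates per time point and
# 7 time points per condition, in the order described above.
exp_num <- 1:63
n_rep   <- 3
n_time  <- 7

# Integer division groups replicates; the modulo wraps time back to 1
# at the start of each condition.
time_idx <- ((exp_num - 1) %/% n_rep) %% n_time + 1
cond_idx <- (exp_num - 1) %/% (n_rep * n_time) + 1

# One row per experiment column, with its time point and condition.
design <- data.frame(
  experimentID = sprintf("Exp_%02d", exp_num),
  time         = paste0("t", time_idx),
  condition    = paste0("cond", cond_idx)
)

head(design, 6)
```

Because the mapping depends only on the number in the column name, it does not matter in which order the columns appear in the data.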

I managed to transform the data into long format with pivot_longer with this command:

raw %>% pivot_longer(!ID, names_to="experimentID", values_to="count")

I want to add the information of the time points as well as the experimental condition to the original data, transform it into long format, and use it for downstream analysis. However, I am stuck at this point, and hope that somebody could help to solve the following questions:

  1. What is the best practice for adding information about the time points and experimental conditions to the original data? Is it better to add it before or after the transformation into long format?

  2. Depending on the answer for 1), how is the information actually added?

The desired output would look something like this:

ID   experimentID time condition count
"AA" "Exp_01"     "t1" "cond1"   0.01
"AA" "Exp_02"     "t1" "cond1"   0.02
"AA" "Exp_03"     "t1" "cond1"   0.03

"AA" "Exp_04"     "t2" "cond1"   0.05
"AA" "Exp_05"     "t2" "cond1"   0.01
"AA" "Exp_06"     "t2" "cond1"   0.03
...
"AA" "Exp_61"     "t7" "cond3"   0.05
"AA" "Exp_62"     "t7" "cond3"   0.05
"AA" "Exp_63"     "t7" "cond3"   0.05
"BB" "Exp_01"     "t1" "cond1"   0.01
"BB" "Exp_02"     "t1" "cond1"   0.02
...

I'd appreciate any help!

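It is easier to add the information after pivoting to long format, where each experiment is a row. You can use rep() and paste0() to create the time and condition columns. Note that this approach is positional: it assumes the columns are ordered Exp_01 to Exp_63 and that every ID has all 63 measurements.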

library(dplyr)
library(tidyr)

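result <- raw %>% 
  pivot_longer(!ID, names_to = "experimentID", values_to = "count") %>%
  group_by(ID) %>%
  # Within each ID the 63 rows follow the column order Exp_01 ... Exp_63:
  # 3 replicates per time point, 7 time points per condition, 3 conditions.
  mutate(time = rep(paste0("t", 1:7), each = 3, times = 3),
         condition = rep(paste0("cond", 1:3), each = 21)) %>%
  ungroup()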

result
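An alternative that does not depend on row position is to keep the experimental design in its own small table and join it on after pivoting. The sketch below is only an illustration under stated assumptions: the toy raw data frame (two IDs, constant made-up values) stands in for the real data, and the design table name and layout are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `raw`: two IDs and 63 measurement columns (values are made up).
exp_ids <- sprintf("Exp_%02d", 1:63)
raw <- data.frame(
  ID = c("AA", "BB"),
  matrix(0.01, nrow = 2, ncol = 63, dimnames = list(NULL, exp_ids))
)

# One row per experiment: the design metadata lives in its own table.
# Assumes 3 replicates per time point, 7 time points per condition.
design <- tibble(
  experimentID = exp_ids,
  time         = rep(rep(paste0("t", 1:7), each = 3), times = 3),
  condition    = rep(paste0("cond", 1:3), each = 21)
)

# Pivot first, then attach time and condition by matching on experimentID.
result <- raw %>%
  pivot_longer(!ID, names_to = "experimentID", values_to = "count") %>%
  left_join(design, by = "experimentID") %>%
  select(ID, experimentID, time, condition, count)

result
```

Since the join matches on experimentID, it still gives the right labels if columns are reordered or if some IDs lack certain experiments. The design table can also be inspected or corrected on its own.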
