In psychology and related disciplines, we often have a ton of paired variables whose names carry a suffix such as "_T1" or "_T3" to indicate the time point.
I would like to subtract the columns with the suffix "_T1" from the ones with the suffix "_T3", creating a new column (i.e. a difference score) for each variable pair, computed for every row (i.e. participant).
Would prefer a dplyr solution, but anything goes.
Apologies for egregiously breaking any codes of conduct on this first post of mine.
A solution using dplyr and tidyr.
First, let's create an example data frame. It contains T1 and T3 data for two variables, A and B, from ten participants identified by ID.
# Set the seed for reproducibility
set.seed(123)
# Create an example data frame
dt <- data.frame(ID = 1:10,
                 A_T1 = runif(10),
                 A_T3 = runif(10),
                 B_T1 = runif(10),
                 B_T3 = runif(10))
dt
# ID A_T1 A_T3 B_T1 B_T3
# 1 1 0.2875775 0.95683335 0.8895393 0.96302423
# 2 2 0.7883051 0.45333416 0.6928034 0.90229905
# 3 3 0.4089769 0.67757064 0.6405068 0.69070528
# 4 4 0.8830174 0.57263340 0.9942698 0.79546742
# 5 5 0.9404673 0.10292468 0.6557058 0.02461368
# 6 6 0.0455565 0.89982497 0.7085305 0.47779597
# 7 7 0.5281055 0.24608773 0.5440660 0.75845954
# 8 8 0.8924190 0.04205953 0.5941420 0.21640794
# 9 9 0.5514350 0.32792072 0.2891597 0.31818101
# 10 10 0.4566147 0.95450365 0.1471136 0.23162579
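We can use dplyr and tidyr to convert the data frame from wide format to long format and perform the operation. Diff is T1 minus T3; swap the operands (T3 - T1) to get the change from T1 to T3 that the question asks for. The column named Participant holds the variable stem (A or B); the participants themselves are identified by ID.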
# Load packages
library(dplyr)
library(tidyr)
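dt2 <- dt %>%
  gather(Column, Value, -ID) %>%                          # wide to long: one row per ID and column
  separate(Column, into = c("Participant", "Group")) %>%  # split "A_T1" into "A" and "T1"
  spread(Group, Value) %>%                                # one column per time point
  mutate(Diff = T1 - T3)                                  # difference score per ID and variable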
dt2
# ID Participant T1 T3 Diff
# 1 1 A 0.2875775 0.95683335 -0.66925583
# 2 1 B 0.8895393 0.96302423 -0.07348492
# 3 2 A 0.7883051 0.45333416 0.33497098
# 4 2 B 0.6928034 0.90229905 -0.20949564
# 5 3 A 0.4089769 0.67757064 -0.26859371
# 6 3 B 0.6405068 0.69070528 -0.05019846
# 7 4 A 0.8830174 0.57263340 0.31038400
# 8 4 B 0.9942698 0.79546742 0.19880236
# 9 5 A 0.9404673 0.10292468 0.83754260
# 10 5 B 0.6557058 0.02461368 0.63109211
# 11 6 A 0.0455565 0.89982497 -0.85426847
# 12 6 B 0.7085305 0.47779597 0.23073450
# 13 7 A 0.5281055 0.24608773 0.28201775
# 14 7 B 0.5440660 0.75845954 -0.21439351
# 15 8 A 0.8924190 0.04205953 0.85035951
# 16 8 B 0.5941420 0.21640794 0.37773408
# 17 9 A 0.5514350 0.32792072 0.22351430
# 18 9 B 0.2891597 0.31818101 -0.02902127
# 19 10 A 0.4566147 0.95450365 -0.49788891
# 20 10 B 0.1471136 0.23162579 -0.08451214
If the original wide format is preferred, we can spread the data frame back, leaving one difference column per variable.
dt3 <- dt2 %>%
  select(-starts_with("T")) %>%
  spread(Participant, Diff)
dt3
# ID A B
# 1 1 -0.6692558 -0.07348492
# 2 2 0.3349710 -0.20949564
# 3 3 -0.2685937 -0.05019846
# 4 4 0.3103840 0.19880236
# 5 5 0.8375426 0.63109211
# 6 6 -0.8542685 0.23073450
# 7 7 0.2820178 -0.21439351
# 8 8 0.8503595 0.37773408
# 9 9 0.2235143 -0.02902127
# 10 10 -0.4978889 -0.08451214
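In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshaping could be sketched with those functions as follows. This is only one possible variant: it computes T3 minus T1 (the direction the question asks for), and the names Variable, Time, and dt_diff are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as above
set.seed(123)
dt <- data.frame(ID = 1:10,
                 A_T1 = runif(10),
                 A_T3 = runif(10),
                 B_T1 = runif(10),
                 B_T3 = runif(10))

dt_diff <- dt %>%
  # Split each column name at "_" into a variable stem and a time point
  pivot_longer(-ID, names_to = c("Variable", "Time"), names_sep = "_") %>%
  # One column per time point (T1, T3)
  pivot_wider(names_from = Time, values_from = value) %>%
  # Difference score: T3 minus T1
  mutate(Diff = T3 - T1) %>%
  select(ID, Variable, Diff) %>%
  # Back to wide format, one "<stem>_diff" column per variable
  pivot_wider(names_from = Variable, values_from = Diff,
              names_glue = "{Variable}_diff")

dt_diff
```

The result has one row per ID and the columns A_diff and B_diff, which can be joined back to dt by ID if the original columns are also needed.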
Assuming all data are in a data frame d, the following stores the difference scores (T3 minus T1) in new columns ending in _diff. It relies on the _T1 and _T3 columns appearing in the same order:
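library(stringr)
# Anchor the patterns with "$" so only names that end in the suffix match
t1_vars <- grep("_T1$", colnames(d), value = TRUE)
t3_vars <- grep("_T3$", colnames(d), value = TRUE)
# Drop the trailing "_T1" from each name and append "_diff"
d[, paste0(str_sub(t1_vars, end = -4), "_diff")] <- d[, t3_vars] - d[, t1_vars]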
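If the _T1 and _T3 columns might not appear in the same order, a base R variant can build the _T3 names from the _T1 stems so that each pair is matched by name. The following is a minimal sketch on a small made-up data frame; the data values and the name stems are illustrative only.

```r
# Small illustrative data frame (values are made up)
d <- data.frame(ID = 1:3,
                A_T1 = c(1, 2, 3),
                B_T3 = c(15, 18, 30),   # deliberately out of order
                A_T3 = c(4, 6, 8),
                B_T1 = c(10, 20, 30))

# Find the T1 columns and derive the variable stems ("A", "B")
t1_vars <- grep("_T1$", colnames(d), value = TRUE)
stems   <- sub("_T1$", "", t1_vars)

# Build the matching T3 names from the stems, so pairs line up by name
t3_vars <- paste0(stems, "_T3")
stopifnot(all(t3_vars %in% colnames(d)))  # every T1 column needs a T3 partner

# T3 minus T1, stored as <stem>_diff
d[paste0(stems, "_diff")] <- d[t3_vars] - d[t1_vars]

d
```

Here A_diff comes out as 3, 4, 5 and B_diff as 5, -2, 0, even though B_T3 precedes A_T3 in the data frame.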