Create new columns by subtracting column pairs from each other in R

In psychology and related disciplines, we often have a ton of variable pairs whose names carry a suffix such as "_T1" or "_T3" to signify the time point.

I would like to subtract each column with the suffix "_T1" from its partner with the suffix "_T3", row by row (i.e. per participant), creating one new column (i.e. a difference score) for each variable pair.

I would prefer a dplyr solution, but anything goes.

Apologies for egregiously breaking any codes of conduct on this first post of mine.

A solution using dplyr and tidyr.

First, let's create an example data frame. It contains T1 and T3 measurements of two variables, A and B, for ten participants (one row per ID).

# Set the seed for reproducibility
set.seed(123)

# Create an example data frame
dt <- data.frame(ID = 1:10,
                 A_T1 = runif(10),
                 A_T3 = runif(10),
                 B_T1 = runif(10),
                 B_T3 = runif(10))
dt
#     ID      A_T1       A_T3      B_T1       B_T3
#  1   1 0.2875775 0.95683335 0.8895393 0.96302423
#  2   2 0.7883051 0.45333416 0.6928034 0.90229905
#  3   3 0.4089769 0.67757064 0.6405068 0.69070528
#  4   4 0.8830174 0.57263340 0.9942698 0.79546742
#  5   5 0.9404673 0.10292468 0.6557058 0.02461368
#  6   6 0.0455565 0.89982497 0.7085305 0.47779597
#  7   7 0.5281055 0.24608773 0.5440660 0.75845954
#  8   8 0.8924190 0.04205953 0.5941420 0.21640794
#  9   9 0.5514350 0.32792072 0.2891597 0.31818101
# 10  10 0.4566147 0.95450365 0.1471136 0.23162579

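We can use dplyr and tidyr to convert the data frame from wide format to long format and perform the operation. In the code below, the column called Participant holds the variable prefix (A or B) and Group holds the time point. Diff is computed as T1 - T3; for the T3 - T1 difference score asked for in the question, swap the operands in mutate().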

# Load packages
library(dplyr)
library(tidyr)

dt2 <- dt %>%
  gather(Column, Value, -ID) %>%
  separate(Column, into = c("Participant", "Group")) %>%
  spread(Group, Value) %>%
  mutate(Diff = T1 - T3)

dt2
#    ID Participant        T1         T3        Diff
# 1   1           A 0.2875775 0.95683335 -0.66925583
# 2   1           B 0.8895393 0.96302423 -0.07348492
# 3   2           A 0.7883051 0.45333416  0.33497098
# 4   2           B 0.6928034 0.90229905 -0.20949564
# 5   3           A 0.4089769 0.67757064 -0.26859371
# 6   3           B 0.6405068 0.69070528 -0.05019846
# 7   4           A 0.8830174 0.57263340  0.31038400
# 8   4           B 0.9942698 0.79546742  0.19880236
# 9   5           A 0.9404673 0.10292468  0.83754260
# 10  5           B 0.6557058 0.02461368  0.63109211
# 11  6           A 0.0455565 0.89982497 -0.85426847
# 12  6           B 0.7085305 0.47779597  0.23073450
# 13  7           A 0.5281055 0.24608773  0.28201775
# 14  7           B 0.5440660 0.75845954 -0.21439351
# 15  8           A 0.8924190 0.04205953  0.85035951
# 16  8           B 0.5941420 0.21640794  0.37773408
# 17  9           A 0.5514350 0.32792072  0.22351430
# 18  9           B 0.2891597 0.31818101 -0.02902127
# 19 10           A 0.4566147 0.95450365 -0.49788891
# 20 10           B 0.1471136 0.23162579 -0.08451214
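
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Here is a minimal sketch of the same reshaping with the newer verbs, this time computing T3 - T1 as the question asks. The names Variable, Time and dt_long are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as above
set.seed(123)
dt <- data.frame(ID = 1:10,
                 A_T1 = runif(10),
                 A_T3 = runif(10),
                 B_T1 = runif(10),
                 B_T3 = runif(10))

dt_long <- dt %>%
  # Split names such as "A_T1" into a variable part and a time part
  pivot_longer(-ID, names_to = c("Variable", "Time"), names_sep = "_") %>%
  # One column per time point
  pivot_wider(names_from = Time, values_from = value) %>%
  # Difference score: later time point minus earlier one
  mutate(Diff = T3 - T1)

dt_long
```

This sketch assumes each column name contains exactly one underscore. If the variable names themselves contain underscores, replacing names_sep with something like names_pattern = "(.*)_(T\\d+)" should split on the last one instead.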

If the original wide format is preferred, we can spread the difference scores back out, giving one row per ID and one difference column per variable.

dt3 <- dt2 %>%
  select(-starts_with("T")) %>%
  spread(Participant, Diff)

dt3
#    ID          A           B
# 1   1 -0.6692558 -0.07348492
# 2   2  0.3349710 -0.20949564
# 3   3 -0.2685937 -0.05019846
# 4   4  0.3103840  0.19880236
# 5   5  0.8375426  0.63109211
# 6   6 -0.8542685  0.23073450
# 7   7  0.2820178 -0.21439351
# 8   8  0.8503595  0.37773408
# 9   9  0.2235143 -0.02902127
# 10 10 -0.4978889 -0.08451214
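
If the goal is only to add difference columns to the wide data, the reshape can be skipped altogether. Here is a sketch using across() and cur_column(), assuming dplyr 1.0.0 or later, again computing T3 - T1. The _diff suffix and the name dt_wide are illustrative choices.

```r
library(dplyr)

# Same example data as above
set.seed(123)
dt <- data.frame(ID = 1:10,
                 A_T1 = runif(10),
                 A_T3 = runif(10),
                 B_T1 = runif(10),
                 B_T3 = runif(10))

dt_wide <- dt %>%
  mutate(across(ends_with("_T3"),
                # Look up the matching _T1 column by name and subtract it
                ~ .x - get(sub("_T3$", "_T1", cur_column())),
                .names = "{.col}_diff")) %>%
  # Turn names such as "A_T3_diff" into "A_diff"
  rename_with(~ sub("_T3_diff$", "_diff", .x), ends_with("_T3_diff"))

dt_wide
```

The original columns are kept, so each difference score sits next to the two measurements it was computed from.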

Assuming all data are in a data frame d, the following stores the difference scores (T3 minus T1) in new columns ending in _diff:

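library(stringr)
# Anchor the patterns so that only names ending in _T1 / _T3 match
t1_vars <- grep("_T1$", colnames(d), value = TRUE)
t3_vars <- grep("_T3$", colnames(d), value = TRUE)
# Assumes t1_vars and t3_vars list the same variables in the same order
d[, paste0(str_sub(t1_vars, end = -4), "_diff")] <- d[, t3_vars] - d[, t1_vars]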
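
The subtraction above pairs columns by position, so it only gives the right answer when the _T1 and _T3 columns appear in the same order. Here is a base R sketch that pairs them by name instead, deriving each _T3 column from its _T1 partner. The toy data frame and the names anx, dep and stems are made up for illustration.

```r
# Toy data with the column pairs deliberately out of order
d <- data.frame(ID = 1:3,
                anx_T1 = c(10, 12, 9),
                dep_T3 = c(4, 6, 5),
                dep_T1 = c(5, 5, 7),
                anx_T3 = c(8, 15, 9))

# Variable stems that have a _T1 column, e.g. "anx" and "dep"
stems <- sub("_T1$", "", grep("_T1$", colnames(d), value = TRUE))

# Keep only stems whose _T3 partner also exists
stems <- stems[paste0(stems, "_T3") %in% colnames(d)]

# T3 minus T1 for every pair, stored as <stem>_diff
d[paste0(stems, "_diff")] <- d[paste0(stems, "_T3")] - d[paste0(stems, "_T1")]

d
```

Variables measured at only one time point are skipped rather than raising an error, which may or may not be the behaviour you want.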
