Use the same mapply function to create several new variables

I have a data frame ("dat") in which each row represents one participant of a study. For each participant ("code") I have a column that gives their sex ("sex") and age ("age"), and several columns with test results ("v.1" etc.). The data frame looks something like this:

> dat
  code sex age v.1 v.2
1   A1   m   8   4   9
2   B2   f  12   7   2

For each column of test results, I need to look up a value in a corresponding vector outside the data frame (e.g. "v.1.m.8" for 8-year-old male participants or "v.1.f.12" for 12-year-old female participants), using the test result as the index, and insert that value into a new column in the data frame ("v.1.t"). There are different vectors for male and female participants and for different age groups. The vectors look something like this:

v.1.m.8 <- c(4, 5, 2, 8, 2, ...)
v.2.m.8 <- c(3, 2, 2, 1, 8, ...)
v.1.m.12 <- c(...)
v.2.m.12 <- c(...)
v.1.f.8 <- c(...)
v.2.f.8 <- c(...)
v.1.f.12 <- c(...)
v.2.f.12 <- c(...)

For me, the most logically straightforward way to look up values in the vectors is a for-loop with nested if-statements. Sort of like this:

# Create the new columns first so single elements can be assigned in the loop
dat$v.1.t <- NA
dat$v.2.t <- NA

for (i in seq_len(nrow(dat))) {
    if (dat$age[i] < 8 || dat$age[i] > 18) {
        # outside the covered age range
        dat$v.1.t[i] <- NA
        dat$v.2.t[i] <- NA
    } else if (dat$age[i] < 12) {
        # ages 8 to 11 use the ".8" vectors
        if (dat$sex[i] == "m") {
            dat$v.1.t[i] <- v.1.m.8[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.m.8[dat$v.2[i]]
        } else {
            dat$v.1.t[i] <- v.1.f.8[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.f.8[dat$v.2[i]]
        }
    } else {
        # ages 12 to 18 use the ".12" vectors
        if (dat$sex[i] == "m") {
            dat$v.1.t[i] <- v.1.m.12[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.m.12[dat$v.2[i]]
        } else {
            dat$v.1.t[i] <- v.1.f.12[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.f.12[dat$v.2[i]]
        }
    }
}

To avoid a loop, I could use mapply(), something like this:

dat$v.1.t <- mapply(
    function(a, b, c) {
        if (a < 8 || a > 18) {
            NA
        } else if (a < 12) {
            if (b == "m") {
                v.1.m.8[c]
            } else {
                v.1.f.8[c]
            }
        } else {
            if (b == "m") {
                v.1.m.12[c]
            } else {
                v.1.f.12[c]
            }
        }
    },
    dat$age,
    dat$sex,
    dat$v.1
)

dat$v.2.t <- mapply(
    function(a, b, c) {
        if (a < 8 || a > 18) {
            NA
        } else if (a < 12) {
            if (b == "m") {
                v.2.m.8[c]
            } else {
                v.2.f.8[c]
            }
        } else {
            if (b == "m") {
                v.2.m.12[c]
            } else {
                v.2.f.12[c]
            }
        }
    },
    dat$age,
    dat$sex,
    dat$v.2
)

The problem with this second solution is that I would have to repeat the whole code for each variable I want to assign.

Is there a better solution?

In my real code I have to look up eleven columns in 44 vectors to create eleven new columns.

I would prefer a solution with base R.

Let's say your data looks like this:

dat <- data.frame(code = paste0(LETTERS[1:24], 1:24),
                  sex = c("m", "f"),
                  age = c(8, 12, 12, 8),
                  v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

Split on the combination of sex and age and extract the v.1 values for each group:

lapply(split(dat, list(dat$sex, dat$age)), '[[', "v.1")

$f.12
[1]  1  9  2  3  3 10

$f.8
[1] 8 3 7 7 3 8

$m.12
[1] 10  3  2  2  4  1

$m.8
[1]  8 10  1  9  5  7

Split on the combination of sex and age and extract the v.2 values for each group:

lapply(split(dat, list(dat$sex, dat$age)), '[[', "v.2")

$f.12
[1] 10  3  5  8  9  2

$f.8
[1] 2 3 4 8 2 5

$m.12
[1] 9 7 1 1 1 2

$m.8
[1]  5  2  1  5  9 10

Edit: Thanks @Sotos for pointing out splitting by two variables
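
To go from the split groups to the looked-up values, one possible continuation is sketched below. It assumes the lookup vectors for v.1 are stored in a list whose names match the group names produced by split() ("m.8", "f.12", ...). The list lookup.v1 and its contents are made up for illustration, and set.seed() is only there to make the run reproducible.

```r
# Example data in the same shape as above
set.seed(1)
dat <- data.frame(code = paste0(LETTERS[1:24], 1:24),
                  sex = c("m", "f"),
                  age = c(8, 12, 12, 8),
                  v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

# Hypothetical lookup vectors for v.1, named "<sex>.<age>" like the split groups
lookup.v1 <- list(m.8 = 21:30, f.8 = 31:40, m.12 = 41:50, f.12 = 51:60)

groups <- split(dat, list(dat$sex, dat$age))

# For each group, index the matching lookup vector with that group's v.1 values
groups <- Map(function(g, key) {
  g$v.1.t <- lookup.v1[[key]][g$v.1]
  g
}, groups, names(groups))

# Reassemble the groups into one data frame in the original row order
result <- unsplit(groups, list(dat$sex, dat$age))
```

The same pattern could be repeated for v.2 with a second lookup list.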

This should be simple with ifelse().

The following example is for just one new variable:

Data example (thanks @Adam Quek):

dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex=c("m", "f"), 
                  age=c(8,12, 12, 8), v.1 = sample(1:10, 24, replace=T),
                  v.2 = sample(1:10, 24, replace=T))

Vector examples:

v.1.m.8 <- c(21:30)
v.1.f.8 <- c(31:40)
v.1.m.12 <- c(41:50)
v.1.f.12 <- c(51:60)

Code for new variable v.1.t:

dat$v.1.t <- with(dat, ifelse(!(age %in% c(8,12)), NA, 
                          ifelse(age == 8 & sex == "m", v.1.m.8[v.1], 
                                 ifelse(age == 8 & sex == "f", v.1.f.8[v.1],
                                        ifelse(age == 12 & sex == "m", v.1.m.12[v.1],
                                               v.1.f.12[v.1])))))

The age restriction can easily be edited to include more categories and to branch out the possible vectors.
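
As a rough sketch of such an extension, assuming the banding from the question (ages 8 to 11 use the "8" vectors, ages 12 to 18 use the "12" vectors, everything else gives NA), cut() can assign each age to a group before the ifelse() chain. The name age.group and the small age, sex and v.1 vectors here are made up for illustration.

```r
# Lookup vectors as in the example above
v.1.m.8  <- 21:30
v.1.f.8  <- 31:40
v.1.m.12 <- 41:50
v.1.f.12 <- 51:60

# Hypothetical participants covering the edge cases
age <- c(7, 8, 11, 12, 18, 19)
sex <- c("m", "f", "m", "f", "m", "f")
v.1 <- c(1, 2, 3, 4, 5, 6)

# [8, 12) -> "8", [12, 18] -> "12"; ages outside 8..18 become NA
age.group <- cut(age, breaks = c(8, 12, 18), labels = c("8", "12"),
                 right = FALSE, include.lowest = TRUE)

# Same ifelse() chain as before, but testing the age group instead of the age
v.1.t <- ifelse(is.na(age.group), NA,
           ifelse(age.group == "8" & sex == "m", v.1.m.8[v.1],
             ifelse(age.group == "8" & sex == "f", v.1.f.8[v.1],
               ifelse(age.group == "12" & sex == "m", v.1.m.12[v.1],
                 v.1.f.12[v.1]))))
```

More bands would only need more entries in breaks and labels, plus the matching branches.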

Output:

   code sex age v.1 v.2 v.1.t
1    A1   m   8  10   1    30
2    B2   f  12   6   5    56
3    C3   m  12  10   3    50
4    D4   f   8   7  10    37
5    E5   m   8   5   4    25
6    F6   f  12   6   9    56
7    G7   m  12   2   9    42
8    H8   f   8   2   3    32
9    I9   m   8   4   1    24
10  J10   f  12   7   4    57
11  K11   m  12   7   4    47
12  L12   f   8   9  10    39
13  M13   m   8   9   2    29
14  N14   f  12   5   8    55
15  O15   m  12   1  10    41
16  P16   f   8   8   4    38
17  Q17   m   8   6   7    26
18  R18   f  12   4  10    54
19  S19   m  12  10   1    50
20  T20   f   8   9   6    39
21  U21   m   8   9   8    29
22  V22   f  12  10   2    60
23  W23   m  12   6   6    46
24  X24   f   8   6   7    36

If you don't want to write the ifelse() for each of your 11 variables, put the vectors into a list with two layers (a list of 11 lists with 4 vectors each) and mapply() over your variables and the list of vector lists.
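
A minimal sketch of that idea, shown for two variables only. It assumes the sublists are named by "<sex>.<age>" so that the right vector can be picked by name. The names vecs, key, lookup_one and new.cols are made up for illustration, and set.seed() is only there for reproducibility.

```r
set.seed(1)
dat <- data.frame(code = paste0(LETTERS[1:24], 1:24),
                  sex = c("m", "f"),
                  age = c(8, 12, 12, 8),
                  v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

# Two-layer list: one sublist per test variable, four named vectors each
vecs <- list(
  v.1 = list(m.8 = 21:30, f.8 = 31:40, m.12 = 41:50, f.12 = 51:60),
  v.2 = list(m.8 = 61:70, f.8 = 71:80, m.12 = 81:90, f.12 = 91:100)
)

# One key per row that says which vector of a sublist applies
key <- paste(dat$sex, dat$age, sep = ".")

# Row-wise lookup for one variable; rows without a matching vector give NA
lookup_one <- function(scores, tables) {
  mapply(function(k, s) {
    if (k %in% names(tables)) tables[[k]][s] else NA
  }, key, scores, USE.NAMES = FALSE)
}

# Outer mapply() pairs each score column with its sublist of vectors
new.cols <- mapply(lookup_one, dat[names(vecs)], vecs, SIMPLIFY = FALSE)
dat[paste0(names(new.cols), ".t")] <- new.cols
```

With eleven variables, only vecs grows; the lookup code stays the same.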

Edit:

I thought about an implementation with mapply() and I think a simple for() loop is easier.

The following should do it (example with two variables and four vectors each: m8, f8, m12, f12):

Vectors:

v.1.m.8 <- c(21:30)
v.1.f.8 <- c(31:40)
v.1.m.12 <- c(41:50)
v.1.f.12 <- c(51:60)
v.2.m.8 <- c(61:70)
v.2.f.8 <- c(71:80)
v.2.m.12 <- c(81:90)
v.2.f.12 <- c(91:100)

List of vectors:

myvectors <- list("v.1" = list(v.1.m.8, v.1.f.8, v.1.m.12, v.1.f.12), 
                  "v.2" = list(v.2.m.8, v.2.f.8, v.2.m.12, v.2.f.12))

for() loop (looping only through the names of the list, so i is "v.1" and then "v.2"):

for (i in names(myvectors)) {
  score <- dat[[i]]  # values of the current test column, used as indices
  dat[[paste(i, "t", sep = ".")]] <- with(dat, ifelse(!(age %in% c(8, 12)), NA,
              ifelse(age == 8 & sex == "m", myvectors[[i]][[1]][score],
                ifelse(age == 8 & sex == "f", myvectors[[i]][[2]][score],
                  ifelse(age == 12 & sex == "m", myvectors[[i]][[3]][score],
                    myvectors[[i]][[4]][score])))))
}

Output:

   code sex age v.1 v.2 v.1.t v.2.t
1    A1   m   8   3   2    23    62
2    B2   f  12   7  10    57   100
3    C3   m  12   2   3    42    83
4    D4   f   8   7   6    37    76
5    E5   m   8   2  10    22    70
6    F6   f  12   1   9    51    99
7    G7   m  12  10   6    50    86
8    H8   f   8   4   6    34    76
9    I9   m   8   3   1    23    61
10  J10   f  12   5   4    55    94
11  K11   m  12   5   5    45    85
12  L12   f   8   3   8    33    78
13  M13   m   8  10   9    30    69
14  N14   f  12   3   4    53    94
15  O15   m  12   6   2    46    82
16  P16   f   8   8   3    38    73
17  Q17   m   8   9   5    29    65
18  R18   f  12   5   6    55    96
19  S19   m  12   6   4    46    84
20  T20   f   8   2   9    32    79
21  U21   m   8   5   1    25    61
22  V22   f  12   2   1    52    91
23  W23   m  12   3  10    43    90
24  X24   f   8   2   9    32    79

With this, the only thing you need to prepare is the list of lists of vectors, with correctly named sublists on the first level (so "v.1" to "v.11", as shown above with "v.1" and "v.2"). Make sure that the order of the 4 vectors in the sublists is always the same! In my example the order is m8, f8, m12, f12. Hope it helps!
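
With 44 vectors, typing the list by hand is error-prone. One way to build it, sketched here under the assumption that all vectors already exist in the workspace and follow the "<variable>.<sex>.<age>" naming scheme, is to generate the names with paste() and collect the objects with mget(). The names vars and groups are made up for illustration.

```r
# Lookup vectors as defined above
v.1.m.8  <- 21:30
v.1.f.8  <- 31:40
v.1.m.12 <- 41:50
v.1.f.12 <- 51:60
v.2.m.8  <- 61:70
v.2.f.8  <- 71:80
v.2.m.12 <- 81:90
v.2.f.12 <- 91:100

vars   <- c("v.1", "v.2")
groups <- c("m.8", "f.8", "m.12", "f.12")  # fixed order: m8, f8, m12, f12

myvectors <- list()
for (v in vars) {
  # Fetch e.g. v.1.m.8, v.1.f.8, v.1.m.12, v.1.f.12 by their generated names
  found <- mget(paste(v, groups, sep = "."), envir = environment(),
                inherits = TRUE)
  # Name each vector by its group so it can also be picked by name
  myvectors[[v]] <- setNames(found, groups)
}
```

Because the order comes from groups, every sublist has the same layout, so positional access such as myvectors[["v.1"]][[1]] still works, and myvectors[["v.1"]][["m.8"]] gives the same vector by name.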
