Use the same mapply function to create several new variables

I have a data frame ("dat") in which each row represents one participant of a study. For each participant ("code") there is a column giving their sex ("sex"), a column giving their age ("age"), and several columns with test results ("v.1", etc.). The data frame looks something like this:

> dat
   code sex age v.1 v.2
1  A1   m   8   4   9
2  B2   f   12  7   2

For each column of test results, I need to look up the value in a corresponding vector outside the data frame, using the test result as the position in that vector (e.g. "v.1.m.8" for 8-year-old male participants or "v.1.f.12" for 12-year-old female participants), and insert the value from that vector into a new column in the data frame ("v.1.t"). There are different vectors for male and female participants and for different age groups. The vectors look something like this:

v.1.m.8 <- c(4, 5, 2, 8, 2, ...)
v.2.m.8 <- c(3, 2, 2, 1, 8, ...)
v.1.m.12 <- c(...)
v.2.m.12 <- c(...)
v.1.f.8 <- c(...)
v.2.f.8 <- c(...)
v.1.f.12 <- c(...)
v.2.f.12 <- c(...)

For me, the most logically straightforward way to look up values in the vectors is a for-loop with nested if-statements. Something like this:

# Create the result columns before filling them row by row
dat$v.1.t <- NA
dat$v.2.t <- NA
for (i in seq_len(nrow(dat))) {  # every row, not only the last one
    if (dat$age[i] < 8 || dat$age[i] > 18) {
        dat$v.1.t[i] <- NA
        dat$v.2.t[i] <- NA
    } else if (dat$age[i] < 12) {
        if (dat$sex[i] == "m") {
            dat$v.1.t[i] <- v.1.m.8[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.m.8[dat$v.2[i]]
        } else {
            dat$v.1.t[i] <- v.1.f.8[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.f.8[dat$v.2[i]]
        }
    } else {
        if (dat$sex[i] == "m") {
            dat$v.1.t[i] <- v.1.m.12[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.m.12[dat$v.2[i]]
        } else {
            dat$v.1.t[i] <- v.1.f.12[dat$v.1[i]]
            dat$v.2.t[i] <- v.2.f.12[dat$v.2[i]]
        }
    }
}

To avoid a loop, I could use mapply() like this:

dat$v.1.t <- mapply(
    function(a, b, c) {
        if (a < 8 | a > 18) {
            NA
        } else if (a < 12) {
            if (b == "m") {
                v.1.m.8[c]
            } else {
                v.1.f.8[c]
            }
        } else {
            if (b == "m") {
                v.1.m.12[c]
            } else {
                v.1.f.12[c]
            }
        }
    },
    dat$age,
    dat$sex,
    dat$v.1
)

dat$v.2.t <- mapply(
    function(a, b, c) {
        if (a < 8 | a > 18) {
            NA
        } else if (a < 12) {
            if (b == "m") {
                v.2.m.8[c]
            } else {
                v.2.f.8[c]
            }
        } else {
            if (b == "m") {
                v.2.m.12[c]
            } else {
                v.2.f.12[c]
            }
        }
    },
    dat$age,
    dat$sex,
    dat$v.2
)

The problem with this second solution is that I would have to repeat the whole block for every variable I want to create.

Is there a better solution?

In my real code I have to look up eleven columns in 44 vectors to create eleven new columns.

I would prefer a solution with base R.
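Since the vector names follow a fixed pattern (test, sex, age group), one base R way to avoid repeating the mapply() block is to build the vector name per row with paste() and fetch it with get(). The sketch below is an illustration under assumptions: the data, the lookup values and the helper name lookup_test are all made up, and the age groups copy the rules from the loop above (under 8 or over 18 gives NA, 8 to 11 uses the ".8" vectors, 12 to 18 the ".12" vectors).

```r
# Hypothetical example data; the lookup vectors follow the naming scheme
# v.<test>.<sex>.<age group> from the question, but their values are made up
dat <- data.frame(code = c("A1", "B2", "C3"),
                  sex  = c("m", "f", "f"),
                  age  = c(8, 12, 20),
                  v.1  = c(4, 2, 1),
                  v.2  = c(1, 3, 2))
v.1.m.8  <- c(4, 5, 2, 8, 2)
v.1.f.8  <- c(1, 1, 2, 3, 5)
v.1.m.12 <- c(9, 8, 7, 6, 5)
v.1.f.12 <- c(7, 1, 3, 9, 6)
v.2.m.8  <- c(3, 2, 2, 1, 8)
v.2.f.8  <- c(2, 4, 6, 8, 10)
v.2.m.12 <- c(1, 3, 5, 7, 9)
v.2.f.12 <- c(5, 4, 3, 2, 1)

# Age groups as in the loop: under 8 or over 18 -> NA, 8-11 -> 8, 12-18 -> 12
age.group <- ifelse(dat$age < 8 | dat$age > 18, NA,
                    ifelse(dat$age < 12, 8, 12))

# Hypothetical helper: one lookup for any test column. For each row it builds
# the vector name, fetches the vector with get() and indexes it by the score.
lookup_test <- function(test) {
  mapply(function(sex, grp, score) {
    if (is.na(grp)) return(NA)
    get(paste(test, sex, grp, sep = "."))[score]
  }, dat$sex, age.group, dat[[test]], USE.NAMES = FALSE)
}

# Create v.1.t, v.2.t, ... without repeating the lookup code
for (test in c("v.1", "v.2")) {
  dat[[paste0(test, ".t")]] <- lookup_test(test)
}
dat
```

With the real data, the vector c("v.1", "v.2") would become paste0("v.", 1:11). Relying on get() keeps the 44 vectors where they are, at the cost of depending on their names being spelled exactly to the pattern.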

Let's say your data looks like this:

dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex = c("m", "f"),
                  age = c(8, 12, 12, 8), v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

Split based on the combination of sex and age and extract the v.1 values for each split:

lapply(split(dat, list(dat$sex, dat$age)), '[[', "v.1")

$f.12
[1]  1  9  2  3  3 10

$f.8
[1] 8 3 7 7 3 8

$m.12
[1] 10  3  2  2  4  1

$m.8
[1]  8 10  1  9  5  7

Split based on the combination of sex and age and extract the v.2 values for each split:

lapply(split(dat, list(dat$sex, dat$age)), '[[', "v.2")

$f.12
[1] 10  3  5  8  9  2

$f.8
[1] 2 3 4 8 2 5

$m.12
[1] 9 7 1 1 1 2

$m.8
[1]  5  2  1  5  9 10

Edit: Thanks @Sotos for pointing out splitting by two variables.
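The split above only groups the scores; it does not yet perform the lookup. A minimal sketch of one way to finish it with split() and unsplit(), assuming the four lookup vectors for a test are stored in a list whose names match the group labels ("m.8", "f.12", ...). The list name v.1.lookup and its values are hypothetical, and set.seed() is only there to make sample() reproducible.

```r
set.seed(1)  # only so that sample() is reproducible
dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex = c("m", "f"),
                  age = c(8, 12, 12, 8),
                  v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

# Hypothetical lookup vectors for v.1, named like the split groups
v.1.lookup <- list(m.8 = 21:30, f.8 = 31:40, m.12 = 41:50, f.12 = 51:60)

# Same grouping as split(dat, list(dat$sex, dat$age)): labels like "f.12"
grp <- interaction(dat$sex, dat$age, drop = TRUE)
pieces <- split(dat$v.1, grp)

# Within each group, index that group's lookup vector by the group's scores
looked.up <- Map(function(scores, g) v.1.lookup[[g]][scores],
                 pieces, names(pieces))

# unsplit() puts the looked-up values back into the original row order
dat$v.1.t <- unsplit(looked.up, grp)
head(dat)
```

Repeating this for each test column only requires one named list of four vectors per column.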

This should be simple with ifelse().

The following example is for just one new variable:

Data example (thanks @Adam Quek):

dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex = c("m", "f"),
                  age = c(8, 12, 12, 8), v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

Vector examples:

v.1.m.8 <- c(21:30)
v.1.f.8 <- c(31:40)
v.1.m.12 <- c(41:50)
v.1.f.12 <- c(51:60)

Code for the new variable v.1.t:

dat$v.1.t <- with(dat, ifelse(!(age %in% c(8,12)), NA, 
                          ifelse(age == 8 & sex == "m", v.1.m.8[v.1], 
                                 ifelse(age == 8 & sex == "f", v.1.f.8[v.1],
                                        ifelse(age == 12 & sex == "m", v.1.m.12[v.1],
                                               v.1.f.12[v.1])))))

The age condition can easily be edited to include more age categories and to branch out to the corresponding vectors.

Output:

   code sex age v.1 v.2 v.1.t
1    A1   m   8  10   1    30
2    B2   f  12   6   5    56
3    C3   m  12  10   3    50
4    D4   f   8   7  10    37
5    E5   m   8   5   4    25
6    F6   f  12   6   9    56
7    G7   m  12   2   9    42
8    H8   f   8   2   3    32
9    I9   m   8   4   1    24
10  J10   f  12   7   4    57
11  K11   m  12   7   4    47
12  L12   f   8   9  10    39
13  M13   m   8   9   2    29
14  N14   f  12   5   8    55
15  O15   m  12   1  10    41
16  P16   f   8   8   4    38
17  Q17   m   8   6   7    26
18  R18   f  12   4  10    54
19  S19   m  12  10   1    50
20  T20   f   8   9   6    39
21  U21   m   8   9   8    29
22  V22   f  12  10   2    60
23  W23   m  12   6   6    46
24  X24   f   8   6   7    36
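The example data only contains the exact ages 8 and 12, whereas the question uses age ranges. A minimal sketch, swapping in cut() instead of nested ifelse() for the age condition: ages are binned into half-open bands (the bands [8, 12) and [12, 19) are assumed from the question's loop), combined with sex into a key, and the key selects a vector from a named list. The data and the list v.1.lookup are hypothetical.

```r
# Small hypothetical example; v.1.lookup stands in for the vectors
# v.1.m.8, v.1.f.8, v.1.m.12 and v.1.f.12, keyed by "sex.agegroup"
dat <- data.frame(sex = c("m", "f", "m", "f"),
                  age = c(8, 12, 15, 20),
                  v.1 = c(10, 6, 1, 3))
v.1.lookup <- list(m.8 = 21:30, f.8 = 31:40, m.12 = 41:50, f.12 = 51:60)

# Bands [8, 12) -> "8" and [12, 19) -> "12"; other ages become NA.
# Another category only needs one more break and one more label.
age.group <- cut(dat$age, breaks = c(8, 12, 19), labels = c("8", "12"),
                 right = FALSE)
key <- ifelse(is.na(age.group), NA, paste(dat$sex, age.group, sep = "."))

# Look up each score in the vector selected by its key
dat$v.1.t <- mapply(function(k, score) {
  if (is.na(k)) NA else v.1.lookup[[k]][score]
}, key, dat$v.1, USE.NAMES = FALSE)
dat
```

Here the 15-year-old is looked up in the "12" vector and the 20-year-old gets NA, without any additional ifelse() level.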

If you don't want to write the ifelse() for each of your 11 variables, put the vectors into a list with two levels (a list of 11 lists with 4 vectors each) and use mapply() over your variables and the list of vector lists.
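A minimal sketch of that mapply() idea, assuming the same two-level list layout as in the loop version (sublists named "v.1", "v.2", each holding the vectors in the order m8, f8, m12, f12). The vector values are illustrative, and set.seed() only makes sample() reproducible.

```r
set.seed(1)  # only so that sample() is reproducible
dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex = c("m", "f"),
                  age = c(8, 12, 12, 8),
                  v.1 = sample(1:10, 24, replace = TRUE),
                  v.2 = sample(1:10, 24, replace = TRUE))

# One sublist per test column, vectors always in the order m8, f8, m12, f12
myvectors <- list(v.1 = list(21:30, 31:40, 41:50, 51:60),
                  v.2 = list(61:70, 71:80, 81:90, 91:100))

# The ifelse() body is written once; mapply() pairs each score column
# with its sublist of lookup vectors
new.cols <- mapply(function(score, vecs) {
  with(dat, ifelse(!(age %in% c(8, 12)), NA,
         ifelse(age == 8 & sex == "m", vecs[[1]][score],
           ifelse(age == 8 & sex == "f", vecs[[2]][score],
             ifelse(age == 12 & sex == "m", vecs[[3]][score],
               vecs[[4]][score])))))
}, dat[names(myvectors)], myvectors, SIMPLIFY = FALSE)

# Add the results as v.1.t, v.2.t, ...
dat[paste(names(myvectors), "t", sep = ".")] <- new.cols
head(dat)
```

SIMPLIFY = FALSE keeps one result vector per test column, which can then be assigned as new columns in a single step.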

Edit:

I thought about an implementation with mapply(), and I think a simple for()-loop is easier.

The following should do it (example with two variables and 4 vectors each: m8, f8, m12, f12):

Vectors:

v.1.m.8 <- c(21:30)
v.1.f.8 <- c(31:40)
v.1.m.12 <- c(41:50)
v.1.f.12 <- c(51:60)
v.2.m.8 <- c(61:70)
v.2.f.8 <- c(71:80)
v.2.m.12 <- c(81:90)
v.2.f.12 <- c(91:100)

List of vectors:

myvectors <- list("v.1" = list(v.1.m.8, v.1.f.8, v.1.m.12, v.1.f.12), 
                  "v.2" = list(v.2.m.8, v.2.f.8, v.2.m.12, v.2.f.12))

for()-loop (looping only over the names of the list, so i takes the values "v.1" and "v.2"):

for (i in names(myvectors)) {
  idx <- dat[[i]]  # test scores of column i, used as positions in the vectors
  dat[[paste(i, "t", sep = ".")]] <- with(dat, ifelse(!(age %in% c(8, 12)), NA,
              ifelse(age == 8 & sex == "m", myvectors[[i]][[1]][idx],
                ifelse(age == 8 & sex == "f", myvectors[[i]][[2]][idx],
                  ifelse(age == 12 & sex == "m", myvectors[[i]][[3]][idx],
                    myvectors[[i]][[4]][idx])))))
}

Output:

   code sex age v.1 v.2 v.1.t v.2.t
1    A1   m   8   3   2    23    62
2    B2   f  12   7  10    57   100
3    C3   m  12   2   3    42    83
4    D4   f   8   7   6    37    76
5    E5   m   8   2  10    22    70
6    F6   f  12   1   9    51    99
7    G7   m  12  10   6    50    86
8    H8   f   8   4   6    34    76
9    I9   m   8   3   1    23    61
10  J10   f  12   5   4    55    94
11  K11   m  12   5   5    45    85
12  L12   f   8   3   8    33    78
13  M13   m   8  10   9    30    69
14  N14   f  12   3   4    53    94
15  O15   m  12   6   2    46    82
16  P16   f   8   8   3    38    73
17  Q17   m   8   9   5    29    65
18  R18   f  12   5   6    55    96
19  S19   m  12   6   4    46    84
20  T20   f   8   2   9    32    79
21  U21   m   8   5   1    25    61
22  V22   f  12   2   1    52    91
23  W23   m  12   3  10    43    90
24  X24   f   8   2   9    32    79

With this, the only thing you need to prepare is the list of lists of vectors, with correctly named sublists on the first level ("v.1" to "v.11", as shown above with "v.1" and "v.2"). Make sure that the order of the 4 vectors in the sublists is always the same! In my example the order is m8, f8, m12, f12. Hope it helps!
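Because the loop picks vectors by position, a sublist in the wrong order silently produces wrong values. A minimal sketch of a variant (not part of the original answer) that names the vectors inside each sublist, assuming the names "m.8", "f.8", "m.12", "f.12", and selects them by a sex.age key, so their order no longer matters. Like the answer's example it assumes ages are exactly 8 or 12; a key without a matching vector yields NA. The data are the first four rows of the output above.

```r
dat <- data.frame(sex = c("m", "f", "m", "f"), age = c(8, 12, 12, 8),
                  v.1 = c(3, 7, 2, 7), v.2 = c(2, 10, 3, 6))

# Named sublists: deliberately shuffled to show that order is irrelevant
myvectors <- list(
  v.1 = list(f.12 = 51:60, m.8 = 21:30, m.12 = 41:50, f.8 = 31:40),
  v.2 = list(m.12 = 81:90, f.8 = 71:80, f.12 = 91:100, m.8 = 61:70)
)

key <- paste(dat$sex, dat$age, sep = ".")  # e.g. "m.8", "f.12"
for (i in names(myvectors)) {
  vecs <- myvectors[[i]]
  dat[[paste(i, "t", sep = ".")]] <- mapply(function(k, score) {
    v <- vecs[[k]]  # NULL when there is no vector for this key
    if (is.null(v)) NA else v[score]
  }, key, dat[[i]], USE.NAMES = FALSE)
}
dat
```

The resulting v.1.t and v.2.t values match the first four rows of the output table above.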
