Use the same mapply function to create several new variables
I have a data frame ("dat") in which each row represents one participant of a study. For each participant ("code") I have columns that give their sex ("sex") and age ("age"), and several columns with test results ("v.1" etc.). The data frame looks something like this:
> dat
  code sex age v.1 v.2
1   A1   m   8   4   9
2   B2   f  12   7   2
For each column of test results, I need to look up the value in a corresponding vector outside the data frame (e.g. "v.1.m.8" for 8-year-old male participants or "v.1.f.12" for 12-year-old female participants) and insert the value from that vector into a new column in the data frame ("v.1.t"). There are different vectors for male and female participants and for different age groups. The vectors look something like this:
v.1.m.8 <- c(4, 5, 2, 8, 2, ...)
v.2.m.8 <- c(3, 2, 2, 1, 8, ...)
v.1.m.12 <- c(...)
v.2.m.12 <- c(...)
v.1.f.8 <- c(...)
v.2.f.8 <- c(...)
v.1.f.12 <- c(...)
v.2.f.12 <- c(...)
For me, the most logically straightforward way to look up values in the vectors is a for-loop with nested if-statements. Sort of like this:
# Create the target columns first so they can be filled row by row
dat$v.1.t <- NA
dat$v.2.t <- NA
for (i in seq_len(nrow(dat))) {  # seq_len() visits every row, not just the last
  if (dat$age[i] < 8 || dat$age[i] > 18) {
    dat$v.1.t[i] <- NA
    dat$v.2.t[i] <- NA
  } else if (dat$age[i] < 12) {
    if (dat$sex[i] == "m") {
      dat$v.1.t[i] <- v.1.m.8[dat$v.1[i]]
      dat$v.2.t[i] <- v.2.m.8[dat$v.2[i]]
    } else {
      dat$v.1.t[i] <- v.1.f.8[dat$v.1[i]]
      dat$v.2.t[i] <- v.2.f.8[dat$v.2[i]]
    }
  } else {
    if (dat$sex[i] == "m") {
      dat$v.1.t[i] <- v.1.m.12[dat$v.1[i]]
      dat$v.2.t[i] <- v.2.m.12[dat$v.2[i]]
    } else {
      dat$v.1.t[i] <- v.1.f.12[dat$v.1[i]]
      dat$v.2.t[i] <- v.2.f.12[dat$v.2[i]]
    }
  }
}
To avoid a loop, I might use mapply() in something like this way:
# a = age, b = sex, c = raw test score used as an index into the vector
dat$v.1.t <- mapply(
  function(a, b, c) {
    if (a < 8 || a > 18) {
      NA
    } else if (a < 12) {
      if (b == "m") {
        v.1.m.8[c]
      } else {
        v.1.f.8[c]
      }
    } else {
      if (b == "m") {
        v.1.m.12[c]
      } else {
        v.1.f.12[c]
      }
    }
  },
  dat$age,
  dat$sex,
  dat$v.1
)
dat$v.2.t <- mapply(
  function(a, b, c) {
    if (a < 8 || a > 18) {
      NA
    } else if (a < 12) {
      if (b == "m") {
        v.2.m.8[c]
      } else {
        v.2.f.8[c]
      }
    } else {
      if (b == "m") {
        v.2.m.12[c]
      } else {
        v.2.f.12[c]
      }
    }
  },
  dat$age,
  dat$sex,
  dat$v.2
)
The problem with this second solution is that I would have to repeat the whole code for each variable I want to assign.
Is there a better solution?
In my real code I have to look up eleven columns in 44 vectors to create eleven new columns.
I would prefer a solution with base R.
Let's say your data looks like this:
dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex=c("m", "f"), age=c(8,12, 12, 8), v.1 = sample(1:10, 24, replace=T), v.2 = sample(1:10, 24, replace=T))
Split based on the combination of sex and age and pull out the v.1 values for each split:
lapply(split(dat, list(dat$sex, dat$age)), '[[', "v.1")
$f.12
[1] 1 9 2 3 3 10
$f.8
[1] 8 3 7 7 3 8
$m.12
[1] 10 3 2 2 4 1
$m.8
[1] 8 10 1 9 5 7
Split based on the combination of sex and age and pull out the v.2 values for each split:
lapply(split(dat, list(dat$sex, dat$age)), '[[', "v.2")
$f.12
[1] 10 3 5 8 9 2
$f.8
[1] 2 3 4 8 2 5
$m.12
[1] 9 7 1 1 1 2
$m.8
[1] 5 2 1 5 9 10
Edit: Thanks @Sotos for pointing out splitting by two variables.
This should be simple with ifelse().
The following example is for just one new variable:
Data example (thanks @Adam Quek):
dat <- data.frame(code = paste0(LETTERS[1:24], 1:24), sex=c("m", "f"),
age=c(8,12, 12, 8), v.1 = sample(1:10, 24, replace=T),
v.2 = sample(1:10, 24, replace=T))
Vector examples:
v.1.m.8 <- c(21:30)
v.1.f.8 <- c(31:40)
v.1.m.12 <- c(41:50)
v.1.f.12 <- c(51:60)
Code for new variable v.1.t:
dat$v.1.t <- with(dat, ifelse(!(age %in% c(8, 12)), NA,
                       ifelse(age == 8  & sex == "m", v.1.m.8[v.1],
                       ifelse(age == 8  & sex == "f", v.1.f.8[v.1],
                       ifelse(age == 12 & sex == "m", v.1.m.12[v.1],
                                                      v.1.f.12[v.1])))))
The age restriction can easily be edited to include more categories and to branch out to the corresponding vectors.
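As a rough sketch of how the age restriction might be widened, the ages can first be mapped to the age band used in the vector names. The bands below (8 to 11 uses the "8" vectors, 12 to 18 uses the "12" vectors, anything else gives NA) are an assumption taken from the loop in the question; `age_band` and `demo` are hypothetical names used only for illustration.

```r
# Hypothetical helper: map an age to the band used in the vector names.
# Assumed bands (from the question's loop): [8, 12) -> "8", [12, 18] -> "12".
age_band <- function(age) {
  as.character(cut(age, breaks = c(8, 12, 18), labels = c("8", "12"),
                   right = FALSE, include.lowest = TRUE))
}

# Small made-up data set and lookup vectors, only for this illustration
demo <- data.frame(sex = c("m", "f", "m", "f"),
                   age = c(7, 9, 12, 18),
                   v.1 = c(1, 2, 3, 4))
v.1.m.8  <- 21:30
v.1.f.8  <- 31:40
v.1.m.12 <- 41:50
v.1.f.12 <- 51:60

band <- age_band(demo$age)

# Same nested ifelse() as above, but keyed on the band instead of the exact age
demo$v.1.t <- ifelse(is.na(band), NA,
              ifelse(band == "8"  & demo$sex == "m", v.1.m.8[demo$v.1],
              ifelse(band == "8"  & demo$sex == "f", v.1.f.8[demo$v.1],
              ifelse(band == "12" & demo$sex == "m", v.1.m.12[demo$v.1],
                                                     v.1.f.12[demo$v.1]))))
```

Participants outside the assumed range (here the 7-year-old) end up with NA in the new column.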
Output:
code sex age v.1 v.2 v.1.t
1 A1 m 8 10 1 30
2 B2 f 12 6 5 56
3 C3 m 12 10 3 50
4 D4 f 8 7 10 37
5 E5 m 8 5 4 25
6 F6 f 12 6 9 56
7 G7 m 12 2 9 42
8 H8 f 8 2 3 32
9 I9 m 8 4 1 24
10 J10 f 12 7 4 57
11 K11 m 12 7 4 47
12 L12 f 8 9 10 39
13 M13 m 8 9 2 29
14 N14 f 12 5 8 55
15 O15 m 12 1 10 41
16 P16 f 8 8 4 38
17 Q17 m 8 6 7 26
18 R18 f 12 4 10 54
19 S19 m 12 10 1 50
20 T20 f 8 9 6 39
21 U21 m 8 9 8 29
22 V22 f 12 10 2 60
23 W23 m 12 6 6 46
24 X24 f 8 6 7 36
If you don't want to write the ifelse() for each of your 11 variables, put the vectors into a list with two layers (a list of 11 lists with 4 vectors each) and mapply() over your variables and the list of vector lists.
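A minimal sketch of that mapply() idea, under the assumption that each sublist holds its four vectors in the fixed order m8, f8, m12, f12. The names `demo`, `lookup` and `new_cols` are hypothetical, and the data are made up for the illustration.

```r
# Made-up example data (two test columns)
demo <- data.frame(sex = c("m", "f", "m", "f"),
                   age = c(8, 12, 12, 8),
                   v.1 = c(1, 2, 3, 4),
                   v.2 = c(10, 9, 8, 7))

# Two-layer list; order inside each sublist is assumed to be m8, f8, m12, f12
myvectors <- list(
  "v.1" = list(21:30, 31:40, 41:50, 51:60),
  "v.2" = list(61:70, 71:80, 81:90, 91:100)
)

# Look up one score column in its four vectors
lookup <- function(scores, vecs, sex, age) {
  ifelse(!(age %in% c(8, 12)), NA,
  ifelse(age == 8  & sex == "m", vecs[[1]][scores],
  ifelse(age == 8  & sex == "f", vecs[[2]][scores],
  ifelse(age == 12 & sex == "m", vecs[[3]][scores],
                                 vecs[[4]][scores]))))
}

# mapply() pairs each score column with its sublist of vectors;
# sex and age are the same for every column, so they go into MoreArgs
new_cols <- mapply(lookup, demo[names(myvectors)], myvectors,
                   MoreArgs = list(sex = demo$sex, age = demo$age),
                   SIMPLIFY = FALSE)

# Attach the results as v.1.t, v.2.t, ...
demo[paste0(names(myvectors), ".t")] <- new_cols
```

SIMPLIFY = FALSE keeps the result as a list with one element per variable, which can be assigned to several new columns in one step.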
Edit:
I thought about an implementation with mapply(), and I think a simple for() loop is easier.
The following should do it (an example with two variables and 4 vectors each: m8, f8, m12, f12):
Vectors:
v.1.m.8 <- c(21:30)
v.1.f.8 <- c(31:40)
v.1.m.12 <- c(41:50)
v.1.f.12 <- c(51:60)
v.2.m.8 <- c(61:70)
v.2.f.8 <- c(71:80)
v.2.m.12 <- c(81:90)
v.2.f.12 <- c(91:100)
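With 44 vectors, typing every name into a list by hand is error-prone. As a sketch, assuming the vectors already exist in the workspace and follow the naming scheme "variable.sex.age", they could be collected by name with mget(). The helper names `vars` and `groups` are hypothetical; the vectors are repeated here only to keep the example self-contained.

```r
# Example vectors, same values as above
v.1.m.8  <- 21:30
v.1.f.8  <- 31:40
v.1.m.12 <- 41:50
v.1.f.12 <- 51:60
v.2.m.8  <- 61:70
v.2.f.8  <- 71:80
v.2.m.12 <- 81:90
v.2.f.12 <- 91:100

vars   <- c("v.1", "v.2")                   # extend to "v.11" for the real data
groups <- c("m.8", "f.8", "m.12", "f.12")   # fixed order: m8, f8, m12, f12

myvectors <- list()
for (v in vars) {
  # mget() fetches several objects by name and returns them as a list,
  # in the order the names are given
  myvectors[[v]] <- mget(paste(v, groups, sep = "."), envir = environment())
}
```

The result has the same two-layer shape as a hand-written list of lists, and the order inside every sublist is guaranteed to follow `groups`.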
List of vectors:
myvectors <- list("v.1" = list(v.1.m.8, v.1.f.8, v.1.m.12, v.1.f.12),
"v.2" = list(v.2.m.8, v.2.f.8, v.2.m.12, v.2.f.12))
for() loop (looping only through the names of the list, so i takes the values "v.1" and "v.2"):
for (i in names(myvectors)) {
  dat[, paste(i, "t", sep = ".")] <- with(dat, ifelse(!(age %in% c(8, 12)), NA,
    ifelse(age == 8  & sex == "m", myvectors[[i]][[1]][eval(parse(text = i))],
    ifelse(age == 8  & sex == "f", myvectors[[i]][[2]][eval(parse(text = i))],
    ifelse(age == 12 & sex == "m", myvectors[[i]][[3]][eval(parse(text = i))],
                                   myvectors[[i]][[4]][eval(parse(text = i))])))))
}
Output:
code sex age v.1 v.2 v.1.t v.2.t
1 A1 m 8 3 2 23 62
2 B2 f 12 7 10 57 100
3 C3 m 12 2 3 42 83
4 D4 f 8 7 6 37 76
5 E5 m 8 2 10 22 70
6 F6 f 12 1 9 51 99
7 G7 m 12 10 6 50 86
8 H8 f 8 4 6 34 76
9 I9 m 8 3 1 23 61
10 J10 f 12 5 4 55 94
11 K11 m 12 5 5 45 85
12 L12 f 8 3 8 33 78
13 M13 m 8 10 9 30 69
14 N14 f 12 3 4 53 94
15 O15 m 12 6 2 46 82
16 P16 f 8 8 3 38 73
17 Q17 m 8 9 5 29 65
18 R18 f 12 5 6 55 96
19 S19 m 12 6 4 46 84
20 T20 f 8 2 9 32 79
21 U21 m 8 5 1 25 61
22 V22 f 12 2 1 52 91
23 W23 m 12 3 10 43 90
24 X24 f 8 2 9 32 79
With this, the only thing you need to prepare is the list of lists of vectors, with correctly named sublists on the first level ("v.1" to "v.11", as shown above with "v.1" and "v.2"). Make sure that the order of the 4 vectors in the sublists is always the same! In my example the order is m8, f8, m12, f12. Hope it helps!
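If keeping the order straight across 11 sublists feels fragile, one possible variation (a sketch, not part of the approach above) is to name the vectors inside each sublist by "sex.age" and look them up by that key. The names `demo`, `lookup_named` and the "m.8"-style keys are assumptions made for this illustration.

```r
# Made-up example data; the last row has an age with no matching vector
demo <- data.frame(sex = c("m", "f", "m", "f", "m"),
                   age = c(8, 12, 12, 8, 20),
                   v.1 = c(1, 2, 3, 4, 5),
                   v.2 = c(10, 9, 8, 7, 6))

# Sublists named by "sex.age", so their order no longer matters
myvectors <- list(
  "v.1" = list(f.12 = 51:60, m.8 = 21:30, m.12 = 41:50, f.8 = 31:40),
  "v.2" = list(m.8 = 61:70, f.8 = 71:80, m.12 = 81:90, f.12 = 91:100)
)

# Hypothetical helper: pick the vector by its "sex.age" key, row group by row group
lookup_named <- function(scores, vecs, sex, age) {
  key <- paste(sex, age, sep = ".")
  out <- rep(NA_real_, length(scores))   # rows without a matching vector stay NA
  for (k in intersect(unique(key), names(vecs))) {
    rows <- key == k
    out[rows] <- vecs[[k]][scores[rows]]
  }
  out
}

for (i in names(myvectors)) {
  demo[[paste(i, "t", sep = ".")]] <-
    lookup_named(demo[[i]], myvectors[[i]], demo$sex, demo$age)
}
```

Any sex and age combination without a vector of that name (here the 20-year-old) is left as NA rather than falling through to the last branch.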