For each group assign different values from vector
I am trying to generate a fake dataset for testing.
It was easy enough to generate the columns that exist in all combinations:
subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject, visit, isotype)
names(testdata) <- c("subject", "visit", "isotype")
Now I need to create two more columns: "positivity", with a particular value for each group in "visit", and "response", a random integer whose range depends on the group in "visit".
For "positivity", I could do it this way:
testdata[testdata$visit == "D0", c("positivity")] <- NA
testdata[testdata$visit == "D100", c("positivity")] <- 1
testdata[testdata$visit == "D500", c("positivity")] <- 0
and for "response", I could do it this way:
testdata[testdata$visit == "D0", c("response")] <- sample(1:100, 1)
testdata[testdata$visit == "D100", c("response")] <- sample(20000:30000, 1)
testdata[testdata$visit == "D500", c("response")] <- sample(1:100, 1)
but in reality I have many more unique observations in "visit" than this, and that would take forever. I was hoping I could use dplyr and group_by to loop through each group and assign "positivity" from a vector, since the length of that vector should equal the number of groups in "visit", and assign "response" using a vector of ranges for the sample function.
positivityvalues <- c(NA, 1, 0)
responseranges <- c(1:100, 1:500, 1:100)
testdata <- testdata %>%
group_by(visit) %>%
mutate(#i can't figure out what to put here
#positivity[1] = positivityvalues[1] etc...
#response[1] = sample(responseranges[1], 1) etc...
)
to get something like this (for the sake of clarity, only the first two subjects and isotypes are listed):
subject visit isotype positivity response
1 D0 IgG NA 58
1 D100 IgG 1 27093
1 D500 IgG 0 2
1 D0 IgA NA 42
1 D100 IgA 1 28921
1 D500 IgA 0 85
2 D0 IgG NA 86
2 D100 IgG 1 26039
2 D500 IgG 0 54
2 D0 IgA NA 99
2 D100 IgA 1 29021
2 D500 IgA 0 23
Thanks
Edit* finished updates
Edit2* Solution:
ranges <- list(D0=c(1:100), D100=c(25000:32000), D500=c(1:100))
positives <- c(D0=NA, D100=1, D500=0)
testdata$positivity <- positives[testdata$visit]
testdata$responsetemp <- ranges[testdata$visit]
testdata$response <- sapply(testdata$responsetemp, function(x) sample(x, 1)) #sapply returns a plain vector rather than a list
You can do this with a named vector...
testdata <- expand.grid(subject=subject, visit=visit, isotype=isotype)
#this way to get column names
positivityvalues <- c(D0=NA, D100=1, D500=0) #add names
testdata$positivity <- positivityvalues[as.character(testdata$visit)] #adds value by name (visit is a factor, so convert to character first)
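A caveat on lookups like this: expand.grid turns character inputs into factors by default, and R indexes with a factor's integer codes rather than its labels. The two agree only when the factor levels happen to be in the same order as the names of the lookup vector. A small sketch of how they can differ, using a deliberately reordered, hypothetical factor (the names lookup, by_code and by_name are made up for illustration):

```r
# Named lookup vector, as in the answer above
lookup <- c(D0 = NA, D100 = 1, D500 = 0)

# Hypothetical factor whose levels are NOT in the same order as names(lookup)
visit <- factor(c("D0", "D100", "D500"), levels = c("D500", "D100", "D0"))

# Indexing with the factor uses its integer codes (3, 2, 1 here)
by_code <- unname(lookup[visit])

# Indexing with the character labels uses the names
by_name <- unname(lookup[as.character(visit)])
```

Here by_code picks elements by position while by_name picks them by label, so wrapping the column in as.character() is the safer habit when the level order is not guaranteed.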
You could do something similar with the parameters for the sample function in the response column.
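As a rough sketch of that idea in base R, the ranges can be kept in a named list and looked up per row. The ranges below are the ones from the question; the names responseranges and visit_chr are chosen here for illustration, and this is one possible approach rather than the only one:

```r
# Rebuild the example data with the same columns as in the question
testdata <- expand.grid(subject = 1:5,
                        visit = c("D0", "D100", "D500"),
                        isotype = c("IgG", "IgA", "IgM", "IgD"))

# One candidate range per visit, keyed by visit name (assumed values)
responseranges <- list(D0 = 1:100, D100 = 20000:30000, D500 = 1:100)

# Convert to character so the list is indexed by name, not by factor code
visit_chr <- as.character(testdata$visit)

# Draw one value per row from the range that belongs to that row's visit
testdata$response <- vapply(responseranges[visit_chr],
                            function(r) sample(r, 1L),
                            integer(1), USE.NAMES = FALSE)
```

Because sample() is called once per row, every row gets its own draw instead of one shared value per visit.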
Here is an option using tidyverse. Create a named vector with the unique values of 'visit' (it is not clear how the values will be changed when there are more unique elements in 'visit'). Use that to match the 'visit' elements and replace them with the NA, 1, 0 of the matched vector, then split the data by 'visit' and use map2 to sample from the range of the corresponding vector.
library(tidyverse)
v1 <- setNames(c(NA, 1, 0), as.character(unique(testdata$visit)))
testdata %>%
mutate(positivity = v1[visit]) %>%
split(.$visit) %>%
map2_df(., list(1:100, 20000:30000, 1:100), ~
.x %>%
mutate(response = sample(.y, n())))
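For comparison, the group_by/mutate form the question asked about might look roughly like the sketch below. It assumes dplyr is installed; the lookup objects reuse the values from the question, visit_chr is a temporary helper column introduced here, and replace = TRUE is an added choice so that a group may be larger than its range:

```r
library(dplyr)

# Rebuild the example data with the same columns as in the question
testdata <- expand.grid(subject = 1:5,
                        visit = c("D0", "D100", "D500"),
                        isotype = c("IgG", "IgA", "IgM", "IgD"))

# Lookups keyed by visit name (values taken from the question)
positivityvalues <- c(D0 = NA, D100 = 1, D500 = 0)
responseranges <- list(D0 = 1:100, D100 = 20000:30000, D500 = 1:100)

testdata <- testdata %>%
  # character copy of visit so lookups go by name, not by factor code
  mutate(visit_chr = as.character(visit),
         positivity = unname(positivityvalues[visit_chr])) %>%
  group_by(visit_chr) %>%
  # within a group every visit_chr is identical, so the first one is the key
  mutate(response = sample(responseranges[[visit_chr[1]]], n(),
                           replace = TRUE)) %>%
  ungroup() %>%
  select(-visit_chr)
```

Adding more visits then only means adding entries to positivityvalues and responseranges; the pipeline itself stays the same.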