For each group assign different values from vector

I am trying to generate a fake dataset for testing.

It was easy enough to generate the columns that exist in all combinations:

subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")

testdata <- expand.grid(subject, visit, isotype)

names(testdata) <- c("subject", "visit", "isotype")
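Note that expand.grid() turns character vectors into factors by default (stringsAsFactors = TRUE), so visit is a factor whose levels follow the order in which they were supplied. This matters for any name-based lookup: indexing a vector with a factor uses the factor's integer codes, not its labels. The lookup vector below is a made-up example used only to show the difference:

```r
subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

str(testdata)            # visit and isotype are factors
levels(testdata$visit)   # levels in the order supplied: D0, D100, D500

# hypothetical lookup vector, deliberately not in level order
lookup <- c(D500 = 0, D100 = 1, D0 = NA)

# row 1 has visit "D0", whose integer code is 1
lookup[testdata$visit[1]]                 # uses code 1, so returns D500's value
lookup[as.character(testdata$visit[1])]   # uses the label "D0", so returns NA
```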

Now I need to create two more columns: "positivity", with a particular value for each group in "visit", and "response", a random integer drawn from a range that depends on the group in "visit".

For "positivity", I could do it this way:

testdata[testdata$visit == "D0", c("positivity")] <- NA
testdata[testdata$visit == "D100", c("positivity")] <- 1
testdata[testdata$visit == "D500", c("positivity")] <- 0

and for "response", I could do it this way:

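# sum(...) draws, so each row gets its own value instead of one value per group
testdata[testdata$visit == "D0", c("response")] <- sample(1:100, sum(testdata$visit == "D0"), replace = TRUE)
testdata[testdata$visit == "D100", c("response")] <- sample(20000:30000, sum(testdata$visit == "D100"), replace = TRUE)
testdata[testdata$visit == "D500", c("response")] <- sample(1:100, sum(testdata$visit == "D500"), replace = TRUE)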

But in reality I have many more unique values in "visit" than this, and writing one line per value would take forever. I was hoping I could use dplyr and group_by to loop through each group, assign "positivity" from a vector (whose length should equal the number of groups in "visit"), and assign "response" by passing a vector of ranges to sample.

positivityvalues <- c(NA, 1, 0)
# a list keeps each range separate; c() would join them into one long vector
responseranges <- list(1:100, 20000:30000, 1:100)

library(dplyr)
testdata <- testdata %>%
            group_by(visit) %>%
            mutate(# I can't figure out what to put here
            # positivity[1] = positivityvalues[1], etc.
            # response[1] = sample(responseranges[1], 1), etc.
            )

to get something like this (for clarity, only the first two subjects and isotypes are listed):

subject    visit    isotype    positivity    response
  1         D0       IgG          NA           58
  1         D100     IgG          1            27093
  1         D500     IgG          0            2   
  1         D0       IgA          NA           42
  1         D100     IgA          1            28921
  1         D500     IgA          0            85      
  2         D0       IgG          NA           86
  2         D100     IgG          1            26039
  2         D500     IgG          0            54   
  2         D0       IgA          NA           99
  2         D100     IgA          1            29021
  2         D500     IgA          0            23  

Thanks
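As an aside, the repeated per-visit assignments shown earlier can be generalized without dplyr. This is a minimal sketch, assuming the per-visit values live in lookup tables: match() finds each row's position for "positivity", and a loop over a named list of ranges draws one value per row for "response". The names visit_levels, positivity_values and response_ranges are hypothetical; the ranges follow the 1:100 and 20000:30000 used in the question.

```r
subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

# hypothetical lookup tables, one entry per visit
visit_levels      <- c("D0", "D100", "D500")
positivity_values <- c(NA, 1, 0)
response_ranges   <- list(D0 = 1:100, D100 = 20000:30000, D500 = 1:100)

# positivity: match() gives each row's position in visit_levels; it compares
# labels, so it works whether visit is a factor or a character vector
testdata$positivity <- positivity_values[match(testdata$visit, visit_levels)]

# response: one pass per visit, drawing one value for every row of that visit
testdata$response <- NA_integer_
for (v in names(response_ranges)) {
  rows <- testdata$visit == v
  testdata$response[rows] <- sample(response_ranges[[v]], sum(rows), replace = TRUE)
}
```

Adding a visit then only means adding one entry to each lookup table, not another line of assignments.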

Edit: finished updates.

Edit 2: Solution:

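ranges <- list(D0 = 1:100, D100 = 25000:32000, D500 = 1:100)
positives <- c(D0 = NA, D100 = 1, D500 = 0)

# index by label: visit is a factor, and indexing with a factor uses its integer codes
visit_chr <- as.character(testdata$visit)
testdata$positivity <- positives[visit_chr]
# one draw per row from the range for that row's visit; vapply() returns a plain
# integer vector rather than the list column that lapply() produces
testdata$response <- vapply(ranges[visit_chr], function(x) sample(x, 1), integer(1))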
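For comparison, the group_by() sketch from the question can be filled in with dplyr's cur_group_id(), which numbers the groups 1, 2, 3, ... in the sorted order of the grouping variable (for a factor, the level order D0, D100, D500). This is a sketch that assumes dplyr 1.0.0 or later and that positivityvalues and responseranges are listed in the same order as the levels of visit. It also assumes the ranges are stored in a list, because c() would merge them into one vector.

```r
library(dplyr)

subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

positivityvalues <- c(NA, 1, 0)
# a list keeps each range separate (c() would concatenate them)
responseranges <- list(1:100, 20000:30000, 1:100)

testdata <- testdata %>%
  group_by(visit) %>%
  mutate(
    # cur_group_id() is 1 for D0, 2 for D100, 3 for D500 (level order)
    positivity = positivityvalues[cur_group_id()],
    # n() draws per group, so each row gets its own value
    response = sample(responseranges[[cur_group_id()]], n(), replace = TRUE)
  ) %>%
  ungroup()
```

Matching by position is fragile if the vectors and the levels ever get out of step; the name-based lookups are safer for a long list of visits.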

You can do this with a named vector:

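# naming the arguments sets the column names directly; stringsAsFactors = FALSE
# keeps visit as character, so the lookup below really goes by name
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype,
                        stringsAsFactors = FALSE)

positivityvalues <- c(D0 = NA, D100 = 1, D500 = 0)  # add names

testdata$positivity <- positivityvalues[testdata$visit]  # look up each value by name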

You could do something similar with the parameters for the sample function in the response column.
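For instance, the lower and upper sampling bounds for each visit can be stored as named vectors and looked up the same way, with one draw per row. This is a sketch only: the names lo and hi are illustrative, and as.character() is applied so the lookup goes by label even though expand.grid() made visit a factor.

```r
subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

# hypothetical per-visit sampling bounds, named like the positivity vector
lo <- c(D0 = 1L,   D100 = 20000L, D500 = 1L)
hi <- c(D0 = 100L, D100 = 30000L, D500 = 100L)

v <- as.character(testdata$visit)   # look up by label, not by factor code

# one draw per row from lo:hi; sample.int() avoids the sample(x) gotcha
# where a length-one x is treated as 1:x
testdata$response <- mapply(function(a, b) a + sample.int(b - a + 1L, 1L) - 1L,
                            lo[v], hi[v], USE.NAMES = FALSE)
```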

Here is an option using tidyverse. Create a named vector keyed by the unique values of 'visit' (it is not clear how the values should change when 'visit' has more unique elements). Use it to match the 'visit' elements and replace them with the corresponding NA, 1, or 0; then split the data by 'visit' and use map2 to sample from the corresponding range.

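library(tidyverse)
v1 <- setNames(c(NA, 1, 0), as.character(unique(testdata$visit)))
testdata %>% 
     # look up by label; indexing with the factor itself would use its integer codes
     mutate(positivity = v1[as.character(visit)]) %>% 
     split(.$visit) %>%
     # ranges are paired with the split pieces by position (level order D0, D100, D500)
     map2_df(., list(1:100, 20000:30000, 1:100), ~ 
           .x %>% 
           mutate(response = sample(.y, n())))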
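The same split-apply-combine can be done in base R. This sketch pairs the ranges with the pieces by name rather than by position (map2_df() pairs them by position), so the order of the list no longer has to match the order of the groups. The list name ranges is hypothetical.

```r
subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

v1     <- c(D0 = NA, D100 = 1, D500 = 0)
ranges <- list(D100 = 20000:30000, D0 = 1:100, D500 = 1:100)  # order does not matter

testdata$positivity <- unname(v1[as.character(testdata$visit)])

pieces <- split(testdata, testdata$visit)              # one data frame per visit
pieces <- Map(function(df, r) {
  df$response <- sample(r, nrow(df), replace = TRUE)   # one draw per row
  df
}, pieces, ranges[names(pieces)])                      # pair pieces with ranges by name

result <- do.call(rbind, pieces)
rownames(result) <- NULL
```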

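Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM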