
Simulate continuous variable that is correlated to existing binary variable

I'm looking to simulate an age variable (constrained to the range 18-35) that has a correlation of 0.1 with an existing binary variable called use. Most of the examples I've come across show how to simulate both variables simultaneously, rather than how to generate one variable to match another that already exists.

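# setup: n participants, split evenly into control (trt = 0) and treatment (trt = 1)
set.seed(493)
n <- 134
dat <- data.frame(partID = seq(1, n, 1),
                  trt = c(rep(0, n/2),
                          rep(1, n/2)))

# probability that use == 1 in each arm
a <- .8   # treatment arm
b <- .2   # control arm
dat$use <- c(rbinom(n/2, 1, b),   # control rows
             rbinom(n/2, 1, a))   # treatment rows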

Not sure if this is the best way to approach this, but you can hit the target correlation exactly (although the 18-35 range is not strictly enforced) by adapting the answer here: https://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable
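Why the linked construction works, in brief: for centered vectors, Pearson's correlation equals the cosine of the angle between them. In the notation below (ours, not the linked answer's), y1 is the centered x1 scaled to unit length and y2 is the centered, unit-length part of the random x2 that is orthogonal to y1; these are the two columns of Y in the code, theta = acos(rho), and 1/tan(theta) is cot(theta).

```latex
\begin{aligned}
&\mathbf{y}_1 \perp \mathbf{y}_2, \qquad \lVert\mathbf{y}_1\rVert = \lVert\mathbf{y}_2\rVert = 1, \qquad \mathbf{x} = \mathbf{y}_2 + \cot(\theta)\,\mathbf{y}_1 \\
&\mathbf{x}^\top\mathbf{y}_1 = \cot\theta, \qquad \lVert\mathbf{x}\rVert = \sqrt{1 + \cot^2\theta} = \frac{1}{\sin\theta} \quad (0 < \theta < \pi) \\
&\operatorname{cor}(\mathbf{x}, \mathbf{x}_1) = \frac{\mathbf{x}^\top\mathbf{y}_1}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}_1\rVert} = \cot\theta\,\sin\theta = \cos\theta = \rho
\end{aligned}
```

Because y1 and y2 are both centered, x is centered too, and any later transformation a + b * x with b > 0 (such as (1 + x) * 25) keeps the correlation at rho.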

For example (using the code from the link):

x1    <- dat$use               # fixed given data

rho   <- 0.1                   # desired correlation = cos(angle)
theta <- acos(rho)             # corresponding angle
x2    <- rnorm(n, 2, 0.5)      # new random data
X     <- cbind(x1, x2)         # matrix
Xctr  <- scale(X, center=TRUE, scale=FALSE)   # centered columns (mean 0)

Id   <- diag(n)                               # identity matrix
Q    <- qr.Q(qr(Xctr[ , 1, drop=FALSE]))      # QR-decomposition, just matrix Q
P    <- tcrossprod(Q)          # = Q Q'       # projection onto space defined by x1
x2o  <- (Id-P) %*% Xctr[ , 2]                 # x2ctr made orthogonal to x1ctr
Xc2  <- cbind(Xctr[ , 1], x2o)                # bind to matrix
Y    <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2)))  # scale columns to length 1

x <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1]     # final new vector


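# positive linear rescaling (mean 25) leaves the correlation unchanged,
# but it does not strictly enforce the 18-35 range
dat$age <- (1 + x) * 25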

cor(dat$use, dat$age)
# 0.1

summary(dat$age)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 20.17   23.53   25.00   25.00   26.59   30.50 
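A more compact variant of the same idea, offered as a sketch rather than the linked answer's own method: instead of building the n x n projection matrix, take the residuals of lm(x2 ~ x1) (which are centered and orthogonal to x1), standardize both pieces, and combine them with weights rho and sqrt(1 - rho^2). It also swaps (1 + x) * 25 for a min-max rescaling; since that is still a positive linear transformation, the correlation stays at exactly 0.1 while the values are guaranteed to lie in 18-35. The helper name sim_correlated is hypothetical.

```r
# Hypothetical helper: simulate a variable whose sample correlation with the
# existing vector x1 is exactly rho, then rescale it onto [lower, upper].
sim_correlated <- function(x1, rho, lower, upper) {
  stopifnot(abs(rho) < 1, upper > lower)
  n  <- length(x1)
  x2 <- rnorm(n)                          # arbitrary random data
  e  <- resid(lm(x2 ~ x1))                # part of x2 orthogonal to x1 (mean 0)
  z1 <- as.vector(scale(x1))              # standardized x1
  z2 <- as.vector(scale(e))               # standardized residuals, cor(z1, z2) = 0
  x  <- rho * z1 + sqrt(1 - rho^2) * z2   # variance 1, cor(x, x1) = rho
  # positive linear map onto [lower, upper]; leaves the correlation unchanged
  lower + (x - min(x)) / (max(x) - min(x)) * (upper - lower)
}

set.seed(493)
n   <- 134
use <- c(rbinom(n/2, 1, .2), rbinom(n/2, 1, .8))
age <- sim_correlated(use, rho = 0.1, lower = 18, upper = 35)

cor(use, age)   # 0.1
range(age)      # 18 35
```

Note that this pins the smallest and largest simulated ages to exactly 18 and 35; a slightly narrower target interval avoids values sitting on the bounds. Rounding the ages to whole years with round() would move the correlation slightly away from 0.1.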
