I'm trying to create a dataset of randomly generated values that have some specific properties:
I've managed to write something that generates data close to what I want, but it is very slow. I suspect the while loops are the cause.
simSession <- function(sessionid = 1) {
  # One row per user in the session
  s <- data.frame(sessionid = sessionid, userid = 1:12)
  total <- sample(48:72, 1)
  mu <- total / 4
  sigma <- 3
  s$x <- as.integer(rnorm(mean = mu, sd = sigma, n = nrow(s)))
  # Decrement one randomly chosen x at a time until sum(x) <= total
  while (sum(s$x) > total) {
    # i <- sample(nrow(s), 1)
    # Pick a user with probability proportional to its current x
    i <- sample(rep(s$userid, s$x), 1)
    if (s$x[i] > 1) {
      s$x[i] <- s$x[i] - 1
    } else {
      s$x[i] <- 1
    }
  }
  s$y <- as.integer(rnorm(mean = mu, sd = sigma, n = nrow(s)))
  # Same procedure for y, until sum(y) <= sum(x)
  while (sum(s$y) > sum(s$x)) {
    # i <- sample(nrow(s), 1)
    i <- sample(rep(s$userid, s$y), 1)
    if (s$y[i] > 1) {
      s$y[i] <- s$y[i] - 1
    } else {
      s$y[i] <- 1
    }
  }
  s$xyr <- s$x / s$y
  return(s)
}
Is there something obvious I'm missing that would make this problem easier or an alternative function that would be faster?
Also, bonus points for being able to specify a parameter that skews the mode left or right.
If you don't mind that expected value and variance are equal, you could use the Poisson distribution:
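randgen <- function(n, mu) {
  x <- rpois(n, mu)
  y <- rpois(n, mu)
  # Difference between the two column sums
  d <- sum(y) - sum(x)
  if (d < 0) {
    # sum(y) is smaller: add 1 to -d randomly chosen elements of y
    ind <- sample(seq_along(y), -d)
    y[ind] <- y[ind] + 1
  } else {
    # sum(x) is smaller (or equal): add 1 to d randomly chosen elements of x
    ind <- sample(seq_along(x), d)
    x[ind] <- x[ind] + 1
  }
  cbind(x = as.integer(x), y = as.integer(y))
}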
set.seed(42)
rand <- randgen(1000,15)
layout(c(1,2))
qqnorm(rand[,1]); qqline(rand[,1])
qqnorm(rand[,2]); qqline(rand[,2])
is.integer(rand)
#[1] TRUE
sum(rand<0)
#[1] 0
colSums(rand)
#x y
#15084 15084
mean(rand[,1])
#[1] 15.084
mean(rand[,2])
#[1] 15.084
sd(rand[,1])
#[1] 4.086275
sd(rand[,2])
#[1] 3.741249
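Two caveats with `randgen` as written: `sample(seq_along(x), d)` draws without replacement, so it stops with an error when the absolute difference between the sums exceeds `n` (likely for small `n`, such as the 12 users in the question), and `rpois` can return 0, while the question's code keeps every value at 1 or more. A minimal sketch of a variant that handles both might look like this. The name `randgen_safe` and the `min_value` argument are hypothetical, and clamping at `min_value` raises the mean slightly when `mu` is small:

```r
# Hypothetical variant of randgen(); randgen_safe and min_value are
# illustrative names, not part of the original answer.
randgen_safe <- function(n, mu, min_value = 1L) {
  # Draw Poisson counts and clamp them from below at min_value
  # (assumption: values must be at least 1, as in the question's code).
  x <- pmax(rpois(n, mu), min_value)
  y <- pmax(rpois(n, mu), min_value)
  d <- sum(y) - sum(x)
  # Distribute the |d| missing units over random positions, sampling WITH
  # replacement so that |d| may be larger than n.
  bump <- tabulate(sample.int(n, abs(d), replace = TRUE), nbins = n)
  if (d < 0) {
    y <- y + bump
  } else {
    x <- x + bump
  }
  cbind(x = as.integer(x), y = as.integer(y))
}

set.seed(42)
small <- randgen_safe(12, 15)
colSums(small)  # the two column sums are equal by construction
```

Because the increments are sampled with replacement, a single element can receive more than one unit, which matters only when `n` is small relative to the difference between the sums.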