I want to create a random dummy (0/1) variable in R or Stata. How can I make, for example, 70% of the observations 1 and the rest 0? Thanks
Here's an approach with sample from base R:
sample(c(1,0), size = 2000, prob = c(0.7,0.3), replace = TRUE)
# [1] 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1
#[58] 1 1 1 1 0 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1
As @Ben Bolker points out in the comments, it would be unusual for exactly 1400 of the 2000 values to be 1 with the call above, because each value is drawn independently.
The following approach results in exactly 1400 1s:
sample(rep(c(1,0),c(1400,600)), 2000)
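To see the difference between the two calls, here is a small sketch that counts the 1s each one produces. The seed value and the variable names (x_prob, x_exact, p_exact) are my own choices for illustration, not part of the answer above.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Independent draws: each element is 1 with probability 0.7
x_prob <- sample(c(1, 0), size = 2000, prob = c(0.7, 0.3), replace = TRUE)

# Shuffled fixed pool: 1400 ones and 600 zeros in random order
x_exact <- sample(rep(c(1, 0), c(1400, 600)), 2000)

sum(x_prob)   # close to 1400, but it changes with the seed
sum(x_exact)  # always 1400

# Probability that the independent draws give exactly 1400 ones
p_exact <- dbinom(1400, size = 2000, prob = 0.7)
p_exact       # roughly 0.02
```

So with the prob argument you would hit exactly 1400 only about one time in fifty.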
If you wanted exactly 70% of 1s (or any other percentage), but a random ordering of the elements, you can use this function.
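random_binary <- function(n, p) {
  # p is the proportion of 1s; round the count because rep() truncates
  # non-integer 'times' toward zero (e.g. 100 * 0.29 is just below 29)
  k <- round(n * p)
  x <- rep(c(1, 0), times = c(k, n - k))
  x[sample(length(x))]  # or sample(x)
}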
random_binary(10, 0.7)
#[1] 1 0 1 1 0 0 1 1 1 1
The times argument of rep can be non-integer, as mentioned in the documentation (?rep):
times: A double vector is accepted, other inputs being coerced to an integer or double vector.
Non-integer values of times are truncated toward zero, so it is safer to round the count first. Either way, when n * p is not a whole number you cannot get exactly the desired percentage, only one close to it.
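A quick sketch of why the truncation matters. The values n = 100 and p = 0.29 are just an example I picked where floating-point arithmetic puts n * p slightly below the whole number you would expect.

```r
n <- 100
p <- 0.29

n * p == 29                     # FALSE: the product is slightly below 29
length(rep(1, times = n * p))   # 28, because times is truncated toward zero

# Rounding the count first avoids the off-by-one
k <- round(n * p)
x <- sample(rep(c(1, 0), times = c(k, n - k)))
length(x)   # 100
sum(x)      # 29
```

Without the rounding, the two rep calls together would give a vector of length 99 rather than 100.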
An alternative is to use rbinom, since we're effectively sampling from the binomial distribution (with size = 1, each draw is a single 0/1 trial).
rbinom(10, size = 1, prob = 0.7)
# [1] 0 0 0 0 1 1 1 0 1 0
This is similar to sample with the prob argument and, as shown above, does not guarantee exactly 70% of 1s.
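If you want to see how much the share of 1s moves around with rbinom, something like the following sketch works. The seed, the 1000 repetitions and the names draws and counts are arbitrary choices of mine.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# One large sample: the share of 1s is near 0.7 but seldom exactly 0.7
draws <- rbinom(2000, size = 1, prob = 0.7)
mean(draws)

# Number of 1s in each of 1000 repeated samples of size 10
counts <- replicate(1000, sum(rbinom(10, size = 1, prob = 0.7)))
table(counts)

# Theoretical chance of getting exactly 7 ones out of 10
dbinom(7, size = 10, prob = 0.7)   # about 0.27
```

So even with only 10 observations, an exact 70/30 split turns up in only about a quarter of the samples.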
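In Stata, for exactly 70% 1s and 30% 0s:
set obs 2000
set seed 1606
* the first 1400 observations (70% of 2000) get 1, the rest 0
gen wanted = cond(_n <= 1400, 1, 0)
* shuffle the observations so the 1s end up in random order
gen random = runiform()
sort random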
For approximately 70% 1s and 30% 0s
gen better = runiform() < 0.7