I'm working on a Gender column that has factors as values, namely 'Male', 'Female' and 'Total'. The 'Total' level is unneeded, so I've decided to replace half of the 'Total' values with males and assign the rest to females. The column is simple, and I've converted all factors to numerals with the basic as.numeric(factor()) line:
Gender NewGender
Male 1
Female 2
Total 3
Total 3
.
.
Female 2
Now the next step is to replace all the 3s with 1s and 2s, but in a random order.
There are a total of 55,399 observations, of which 22,057 correspond to threes in the NewGender column. I have tried several sets of commands, of which the closest one, I think, is:
# Experiment with 50 rows
for (row in data$NewGender[sample(which(data$NewGender, 50), ]) {
if (row == 3) {row <- 1; row <- row + 1}
}
This generates warnings though and doesn't seem to be replacing the threes. I could well use this:
data$NewGender[data$NewGender == 3] <- 1
But I'm unable to combine it with sample(). What I want is NewGender containing only ones and twos, with half of all the threes replaced by ones and the other half by twos, fully randomised. Any good suggestions? Thanks in advance.
I would say the easiest way is to use sample and ifelse. You should also probably sample based on the observed distribution of males and females.
# Some data
gender <- sample(c("male", "female", "other"), 100, prob = c(0.4, 0.3, 0.3), replace = TRUE)
# Calculating proportion of females vs males
male_prop <- sum(gender=="male")/(sum(gender=="male")+sum(gender=="female"))
female_prop <- sum(gender=="female")/(sum(gender=="male")+sum(gender=="female"))
# Replacing other at random
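# Draw one value per element; with size 1, ifelse() would recycle a single draw to every "other"
gender <- ifelse(gender=="other", sample(c("male", "female"), length(gender), prob = c(male_prop, female_prop), replace = TRUE), gender)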
Note: As in markus's answer, it is a good idea to set a seed to ensure reproducibility.
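Since ifelse() recycles its yes argument to the length of the test, the number of values that sample() draws matters: a single draw would be reused for every replaced element. As a minimal sketch of the same proportional idea with logical indexing instead of ifelse(), something like the following should work. The names is_other and filled are only illustrative, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
gender <- sample(c("male", "female", "other"), 100,
                 prob = c(0.4, 0.3, 0.3), replace = TRUE)

# Positions to fill, and the male share among the known values
is_other  <- gender == "other"
male_prop <- sum(gender == "male") / sum(!is_other)

# Draw one independent value per "other" entry
filled <- gender
filled[is_other] <- sample(c("male", "female"), sum(is_other),
                           prob = c(male_prop, 1 - male_prop), replace = TRUE)

table(filled)
```

Only the "other" positions are touched here, so the known values are left exactly as they were.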
You can use replace and sample.
Given a vector containing numbers from 1 to 3:
set.seed(1)
NewGender <- sample(1:3, 20, TRUE)
table(NewGender)
#NewGender
#1 2 3
#5 7 8
We create a logical vector that is TRUE where NewGender equals 3.
idx <- NewGender == 3
Now we replace the 3's with a sample of 1's and 2's:
out <- replace(NewGender, idx, sample(1:2, sum(idx), TRUE))
Check the distribution
table(out)
#out
# 1 2
#11 9
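Note that sample(1:2, sum(idx), TRUE) draws each replacement independently, so the split between 1's and 2's is only roughly even. If exactly half of the 3's should become 1's and the other half 2's, as the question asks, one possible sketch is to build a balanced vector first and then shuffle it. The names n and out_exact are illustrative and not part of the original answer.

```r
set.seed(1)
NewGender <- sample(1:3, 20, TRUE)

idx <- NewGender == 3
n   <- sum(idx)

# rep_len(1:2, n) is 1, 2, 1, 2, ... of length n; sample() shuffles it,
# so the counts of 1s and 2s differ by at most one.
out_exact <- replace(NewGender, idx, sample(rep_len(1:2, n)))

table(out_exact[idx])
```

Applied to the question's data, the same idea would be used on data$NewGender. With an odd number of 3's, such as 22,057, one extra value goes to 1 (11,029 ones and 11,028 twos).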