How to replace values of a column based on a condition and random sampling?

I'm working on a Gender column, a factor with the levels 'Male', 'Female' and 'Total'. The 'Total' level is unneeded, so I've decided to replace half of the 'Total' values with 'Male' and assign 'Female' to the rest. The column is simple, and I've converted all the factor values to numbers with a basic as.numeric(factor()) call:

Gender     NewGender
Male       1
Female     2
Total      3
Total      3
.
.
Female     2

Now the next step is to replace all the 3s with 1s and 2s, but in a random order.

There are 55,399 observations in total, of which 22,057 are 3s in the NewGender column. I have tried several sets of commands, of which the closest, I think, is:

# Experiment with 50 rows

for (row in data$NewGender[sample(which(data$NewGender, 50), ]) {
        if (row == 3) {row <- 1; row <- row + 1}
}

This generates warnings, though, and doesn't seem to replace the 3s. I could just as well use this:

data$NewGender[data$NewGender == 3] <- 1

But I'm unable to combine it with sample(). What I want is NewGender containing only 1s and 2s, with half of the 3s replaced by 1s and the other half by 2s, fully randomised. Any good suggestions? Thanks in advance.

I would say the easiest way is to use sample and ifelse. You should also probably sample based on the existing distribution of males and females.

# Some data
gender <- sample(c("male", "female", "other"), 100, prob = c(0.4, 0.3, 0.3), replace = TRUE)

# Calculating proportion of females vs males
male_prop <- sum(gender=="male")/(sum(gender=="male")+sum(gender=="female"))
female_prop <- sum(gender=="female")/(sum(gender=="male")+sum(gender=="female"))

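# Replacing other at random: draw one value per element, because a single
# draw would be recycled by ifelse() to every "other"
gender <- ifelse(gender == "other", sample(c("male", "female"), length(gender), prob = c(male_prop, female_prop), replace = TRUE), gender)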

Note: As in markus's answer, it is a good idea to set a seed to ensure reproducibility.
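
The same proportional idea can also be applied directly to the numeric NewGender column from the question. What follows is only a minimal sketch: the 1/2/3 coding comes from the question, while the toy data frame, its size, and the seed are made up for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Toy stand-in for the real data: 1 = Male, 2 = Female, 3 = Total
data <- data.frame(NewGender = sample(1:3, 1000, replace = TRUE))

# Positions of the rows that need a new value
idx <- which(data$NewGender == 3)

# Share of males among the rows that are already coded 1 or 2
male_prop <- mean(data$NewGender[data$NewGender != 3] == 1)

# Draw one value per affected row, weighted by the observed shares
data$NewGender[idx] <- sample(1:2, length(idx), replace = TRUE,
                              prob = c(male_prop, 1 - male_prop))

table(data$NewGender)
```

Indexing with idx and drawing length(idx) values gives each replaced row its own random draw.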

You can use replace and sample.

Given a vector containing numbers from 1 to 3:

set.seed(1)
NewGender <- sample(1:3, 20, TRUE)
table(NewGender)
#NewGender
#1 2 3 
#5 7 8 

We create a logical vector that is TRUE where NewGender equals 3.

idx <- NewGender == 3

Now we replace the 3's with a sample of 1's and 2's.

out <- replace(NewGender, idx, sample(1:2, sum(idx), TRUE))

Check the distribution:

table(out)
#out
# 1  2 
#11  9 
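
Independent draws like this give roughly, but not necessarily exactly, half 1's and half 2's among the replaced values. If the split has to be exact, as the question asks, one option is to build a balanced vector and shuffle it. This is a minimal sketch on the same example vector; the names n3, balanced and out_exact are made up here.

```r
set.seed(1)
NewGender <- sample(1:3, 20, TRUE)

idx <- NewGender == 3
n3  <- sum(idx)  # number of 3's to replace

# Balanced vector of 1's and 2's; with an odd count the extra value is a 1
balanced <- rep(1:2, length.out = n3)

# sample.int(n3) is a random permutation of 1..n3, so this shuffles `balanced`
out_exact <- replace(NewGender, idx, balanced[sample.int(n3)])

# Counts among the replaced positions differ by at most one
table(out_exact[idx])
```

Shuffling a fixed vector pins the counts and leaves only the positions to chance.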
