
How to recode variables in R

I am trying to recode variables in an R data frame. For example, variable X in my dataset contains 1s and 0s. I want to create another variable Y that recodes the 1s and 0s from X as Yes and No respectively.

I tried this to create the recoded Y variable:

w <- as.character()

for (i in seq_along(x))  {
    if (x[i] == 1)  {
        recode <- "Yes"
    } else if (x[i] == 0)  {
        recode <- "No"       
    }
    w <- cbind(w, recode)
}

Then I did this to line up X and Y together:

y <- c(x, w)

What I got back was this:

 y
 # [1] "1"   "1"   "0"   "1"   "0"   "0"   "1"   "1"   "0"   "1"   "0"   "0"   "Yes" "Yes" "No"  "Yes" "No"  "No" 

I was expecting a data frame with X and Y columns.

Questions:

  1. How do I get X and Y into a data frame?
  2. Is there a better way to recode variables in a data frame?

Recoding is generally about applying new labels to the levels of a factor (categorical variable).

In R, you do that like this:

w <- factor(x, levels = c(1,0), labels = c('yes', 'no'))
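As a rough sketch of how this answers the first question as well, assuming a small made-up vector x (not the asker's real data), the factor can be placed next to the original values with data.frame():

```r
# Hypothetical example data; the real x would come from your dataset
x <- c(1, 1, 0, 1, 0, 0)

# levels lists the values found in x; labels gives the new name
# for each level, in the same order
w <- factor(x, levels = c(1, 0), labels = c("yes", "no"))

# data.frame() keeps x numeric and w as a factor, one column each
df <- data.frame(x = x, y = w)
str(df)
```

Any value of x that is not listed in levels would become NA in w, which can be a convenient way to spot unexpected codes.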

Using the following data:

x  <- c(rep.int(0, 10), rep.int(1, 10))
df <- as.data.frame(x)
df
#    x
# 1  0
# 2  0
# 3  0
# ...

I'd create the new variable and recode it directly with logical indexing:

df$y[df$x == 1] <- "yes"
df$y[df$x == 0] <- "no"
df
#    x   y
# 1  0  no
# 2  0  no
# 3  0  no
# ...
# 11 1 yes
# 12 1 yes
# 13 1 yes
# ...

Note that for loops are rarely the best choice in R, but your loop is basically correct. You need to replace w <- cbind(w, recode) with w <- rbind(w, recode) in the loop itself and, in the final step, you can cbind x and w:

w <- as.character()
for (i in seq_along(x))  {
  if (x[i] == 1)  {
    recode <- "Yes"
  } else if (x[i] == 0)  {
    recode <- "No"       
  }
  w <- rbind(w, recode)
}
y <- cbind(x, w)
y

rbind() appends rows and cbind() appends columns, whereas c() concatenates its arguments into a single vector, coercing them to a common type. That is why you got one long character vector instead of two columns.
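The difference between the three can be seen in a small sketch. The vectors x and w below are made-up stand-ins for the asker's data, and data.frame() is shown as one possible way to get the two-column result the question asks for:

```r
# Hypothetical example data
x <- c(1, 0, 1)
w <- c("Yes", "No", "Yes")

# c() concatenates into a single vector; the numbers are coerced to character
joined <- c(x, w)
length(joined)   # 6 elements rather than 3 rows of 2

# cbind() gives a 3 x 2 matrix, but a matrix holds a single type,
# so the values of x are stored as character strings
m <- cbind(x, w)
dim(m)

# data.frame() gives one column per variable and lets x stay numeric
df <- data.frame(x = x, y = w)
df
```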

This is one of the many cases where you really shouldn't use a loop in R.

Instead, use vectorisation, i.e. ifelse or indexing.

result = data.frame(x = x, y = ifelse(x == 1, 'yes', 'no'))

(This assumes that there are only 1s and 0s in the input; if that isn't the case, you need a nested ifelse or a list containing the translations.)
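Both alternatives can be sketched roughly as follows. The input vector with an extra code 2 is invented for illustration, and the lookup uses a named character vector (called translations here, a hypothetical name) rather than a list:

```r
# Hypothetical input with a third code, 2, that has no translation
x <- c(1, 0, 2, 1)

# Nested ifelse: anything that is neither 1 nor 0 becomes NA
y_nested <- ifelse(x == 1, "yes", ifelse(x == 0, "no", NA_character_))

# Lookup table: a named character vector indexed by the codes as strings;
# codes without an entry come back as NA
translations <- c("1" = "yes", "0" = "no")
y_lookup <- unname(translations[as.character(x)])

result <- data.frame(x = x, y = y_lookup)
result
```

The lookup version tends to scale better when there are many codes, since adding a new translation only means adding one entry to the vector.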


 