How to recode variables in R

Question

I am trying to recode variables in an R data frame. For example, variable X in my dataset contains 1s and 0s. I want to create another variable, Y, which recodes the 1s and 0s from X as Yes and No respectively.

I tried this to create the recoded Y variable:

w <- as.character()

for (i in seq_along(x))  {
    if (x[i] == 1)  {
        recode <- "Yes"
    } else if (x[i] == 0)  {
        recode <- "No"       
    }
    w <- cbind(w, recode)
}

Then I did this to line up X and Y together:

y <- c(x, w)

What I got back was this:

 y
 # [1] "1"   "1"   "0"   "1"   "0"   "0"   "1"   "1"   "0"   "1"   "0"   "0"   "Yes" "Yes" "No"  "Yes" "No"  "No"

I was expecting a data frame with X and Y columns.

Questions:

How do I get X and Y into a data frame?
Is there a better way to recode variables in a data frame?

Answer 1

Recoding is generally about applying new labels to the levels of a factor (categorical variable).

In R, you do that like this:

w <- factor(x, levels = c(1,0), labels = c('yes', 'no'))
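To tie this back to the question, here is a minimal sketch that stores the recoded factor as a second column of a data frame. The sample values in `x` and the names `df` and `y` are assumptions made for illustration, and the labels are capitalised to match the question rather than the line above.

```r
# Hypothetical sample data standing in for the asker's variable X
x <- c(1, 1, 0, 1, 0, 0)

# Put x in a data frame, then add the recoded factor as column y:
# level 1 is labelled "Yes" and level 0 is labelled "No"
df <- data.frame(x = x)
df$y <- factor(df$x, levels = c(1, 0), labels = c("Yes", "No"))

df
table(df$x, df$y)  # cross-tabulate the codes against the labels to check the mapping
```

Any value of `x` that is not listed in `levels` becomes `NA` in the factor, which makes unexpected codes easy to spot.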

Answer 2

Using the following data:

x  <- c(rep.int(0, 10), rep.int(1, 10))
df <- as.data.frame(x)
df
#    x
# 1  0
# 2  0
# 3  0
# ...

I'd create a new variable and recode in one step:

df$y[df$x == 1] <- "yes"
df$y[df$x == 0] <- "no"
df
#    x   y
# 1  0  no
# 2  0  no
# 3  0  no
# ...
# 11 1 yes
# 12 1 yes
# 13 1 yes
# ...

Note that for loops are rarely the best tool in R, but your loop is basically correct. You need to replace w <- cbind(w, recode) with w <- rbind(w, recode) in the loop itself, so that each result is appended as a new row, and, in the final step, you can cbind x and w:

w <- as.character()
for (i in seq_along(x))  {
  if (x[i] == 1)  {
    recode <- "Yes"
  } else if (x[i] == 0)  {
    recode <- "No"       
  }
  w <- rbind(w, recode)
}
y <- cbind(x, w)  # bind x and w side by side as two columns
y

rbind() appends rows, cbind() appends columns, and c() concatenates its arguments into a single vector, which is why you were getting the two vectors joined into one. Note that cbind() on a numeric and a character vector returns a character matrix; use data.frame() if you want an actual data frame.
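The difference between these functions is easiest to see side by side. The following is a small sketch using made-up vectors; `x`, `w` and the result names are assumptions for illustration only.

```r
x <- c(1, 0, 1)             # hypothetical codes
w <- c("Yes", "No", "Yes")  # their recoded labels

v <- c(x, w)           # one flat character vector of length 6
m <- cbind(x, w)       # character matrix with 3 rows and 2 columns
r <- rbind(x, w)       # character matrix with 2 rows and 3 columns
d <- data.frame(x, w)  # data frame with 3 rows; x stays numeric

length(v)
dim(m)
dim(r)
str(d)
```

Only data.frame() keeps each column's own type, which is usually what you want when pairing a numeric code with its label.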

Answer 3

This is one of the many cases where you really shouldn't use a loop in R.

Instead, use vectorisation, i.e. ifelse or indexing.

result = data.frame(x = x, y = ifelse(x == 1, 'yes', 'no'))

(This assumes that there are only 1s and 0s in the input; if that isn't the case, you need a nested ifelse or a list containing the translations.)
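As a rough sketch of those two alternatives, suppose the input also contains a third code. The value 2, its label "maybe", and all variable names here are hypothetical and not part of the original question; a named character vector stands in for the "list containing the translations".

```r
# Hypothetical input with a third code, 2, that the question does not mention
x <- c(1, 0, 2, 1, 0)

# Option 1: nested ifelse; anything that is neither 1 nor 0 becomes NA
y_nested <- ifelse(x == 1, "yes", ifelse(x == 0, "no", NA_character_))

# Option 2: a named lookup vector, indexed by the codes converted to strings
lookup <- c("1" = "yes", "0" = "no", "2" = "maybe")
y_lookup <- unname(lookup[as.character(x)])

result <- data.frame(x = x, y_nested = y_nested, y_lookup = y_lookup)
result
```

The lookup vector tends to scale better than nesting once there are more than two or three codes, since adding a code only means adding one entry.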

Solution 1
3 2015-12-07 12:42:50

Solution 2
1 2015-12-07 12:29:38

Solution 3
1 accepted 2015-12-07 12:33:54
