What is the shortest and cleanest way to recode multiple variables in a data frame using R?

Question

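I work in the social sciences, and something I constantly have to do is manipulate multiple variables to change their values. Usually this means reversing a scale. I used SPSS for a long time, and there the syntax is very simple. To change the values of multiple variables you write: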

RECODE var1 var2 var3 (1=5) (2=4) (4=2) (5=1) (ELSE=COPY).

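To write the new codes into new variables instead, add INTO newvar1 newvar2 newvar3 at the end (before the final period). Inside the parentheses you can use HI, LO, 1 THRU 4, etc.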
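
For comparison, the core of RECODE ... (ELSE=COPY) for one-to-one value mappings can be sketched in base R with match(): each value found in from is replaced by the value at the same position in to, and everything else (including NA) is copied unchanged. recode_copy() is a hypothetical helper written for this illustration, not part of base R or any package, and it does not handle ranges such as THRU, LO or HI.

```r
# Hypothetical helper (not part of base R or any package): mimics SPSS
# RECODE ... (ELSE=COPY) for one-to-one value mappings only.
recode_copy <- function(x, from, to) {
  idx <- match(x, from)      # position of each value of x in `from`, NA if absent
  out <- x
  hit <- !is.na(idx)
  out[hit] <- to[idx[hit]]   # replace matched values; unmatched values are copied
  out
}

# Equivalent of: RECODE v (1=5) (2=4) (4=2) (5=1) (ELSE=COPY).
recode_copy(c(1, 2, 3, 4, 5, 3), from = c(1, 2, 4, 5), to = c(5, 4, 2, 1))
# [1] 5 4 3 2 1 3
```

Because all replacements are looked up from the original x in one step, the order of the pairs does not matter, just as in SPSS.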

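Now I'm trying to get into R, and I'm struggling to find the best way to do a similar workflow. I found the following solutions, but none of them is short and neat: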

## Packages -----
library(dplyr)
library(car)

## Data -----
tib <- tibble(v1 = 1:4, 
              v2 = 1:4,
              v3 = sample(1:5, 4, replace = FALSE))

vars <- c("v1", "v2", "v3")

The base R way:

tib$v2_rec <- NA
tib$v2_rec[tib$v2 == 1] <- 5 #1
tib$v2_rec[tib$v2 == 2] <- 4 #2
tib$v2_rec[tib$v2 == 3] <- 3 #3
tib$v2_rec[tib$v2 == 4] <- 2 #4
tib$v2_rec[tib$v2 == 5] <- 1 #5
# I'm forced to create a new variable here; otherwise #4 and #5 would undo the results of #2 and #1.
# So I won't even bother trying to loop through multiple variables.
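
Because the codes here are the positive integers 1 to 5, one way around the overwriting problem in base R is to index a lookup vector by the values themselves: every new value is read from the untouched original in a single step, so no assignment can undo an earlier one. This is a sketch using a plain data.frame (no packages needed); the lookup name is illustrative, and the trick only works for codes 1, 2, 3, ... (values above 5 or NA give NA, and zero or negative codes would break the indexing).

```r
# Plain data.frame so the snippet needs no packages; v2 holds codes 1-5.
d <- data.frame(v2 = c(1L, 2L, 3L, 4L, 5L, 2L))

# Element k of the lookup vector is the new code for old value k:
# 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1.
lookup <- c(5, 4, 3, 2, 1)

# lookup[d$v2] reads every new value from the original column in one step,
# so the column can safely be overwritten in place.
d$v2 <- lookup[d$v2]
d$v2
# [1] 5 4 3 2 1 4
```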

recode() from the car package:

tib$v1 <- recode(tib$v1, "1=5; 2=4; 4=2; 5=1")
# This is nice, understandable and short
# To handle multiple variables, the following attempts don't work, because recode() apparently cannot iterate over a list of columns:

tib[vars] <- recode(tib[vars], "1=5; 2=4; 4=2; 5=1")
tib[1:3] <- recode(tib[1:3], "1=5; 2=4; 4=2; 5=1")

# I'd be forced to loop:

for (i in vars) {
  tib[[i]] <- recode(tib[[i]], "1=5; 2=4; 4=2; 5=1")
}

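I'm fine with this, but I wonder whether there is a function that does the looping for me. I'm really struggling with the dplyr functions at the moment, and it frustrates me that I can't figure things out intuitively...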

I tried mutate:

# I get it for a single variable, and for multiple variables I arrived at a solution in combination with the recode() function:

tib <- tib %>%
  mutate_at(vars(v1:v3), 
            function(x) recode(x, "1=5; 2=4; 4=2; 5=1"))

Is this the best way? To be clear, I have seen other solutions using case_when(), replace() or mapvalues(), but I prefer the one above because I can see at a glance which value is recoded into which.

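I've looked into the apply() function a little and couldn't even recode a single variable with it. I'm sure I'll get the hang of that soon too, but right now I'm a bit frustrated by how long it is taking me to rebuild my SPSS workflows. If you know a solution using the apply() functions that is shorter and cleaner than the ones above, I'd be very grateful!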
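
On the apply() question: apply() is designed for matrices and converts a data frame to a matrix first, so for the columns of a data frame its sibling lapply() is usually the better fit. It does the looping for you, and assigning its result back with [<- replaces the for loop shown earlier. This is a sketch under stated assumptions: it uses a plain data.frame with fixed values, the 6 - x reversal assumes a 1 to 5 scale, and rev_code() is only an illustrative name.

```r
# Example data with fixed values (instead of sample()) so the result is reproducible.
d <- data.frame(v1 = 1:4,
                v2 = 1:4,
                v3 = c(2L, 5L, 4L, 1L))
vars <- c("v1", "v2", "v3")

# Recoding for a single vector: reverse a 1-5 scale (1->5, 2->4, ..., 5->1).
# Any one-argument function works here, e.g.
# function(x) car::recode(x, "1=5; 2=4; 4=2; 5=1") if car is installed.
rev_code <- function(x) 6 - x

# lapply() applies rev_code to each selected column and returns a list;
# assigning that list back to d[vars] overwrites the columns in place.
d_inplace <- d
d_inplace[vars] <- lapply(d_inplace[vars], rev_code)

# SPSS "INTO" style: keep the originals and add new *_rec columns instead.
d_into <- d
d_into[paste0(vars, "_rec")] <- lapply(d_into[vars], rev_code)
```

The second form corresponds to RECODE ... INTO: the original columns stay untouched and the recoded copies get new names.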

I'm happy with R and its possibilities, but right now I need a hint in the right direction to keep me going! Thanks in advance!

Answer 1

I think dplyr has the "cleanest" syntax for this case, if used correctly:

library(dplyr)
tib <- tibble(v1 = 1:4, 
              v2 = 1:4,
              v3 = sample(1:5, 4, replace = FALSE))

tib %>% 
  mutate_at(vars(v1:v3), recode, `1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1)
#> # A tibble: 4 x 3
#>      v1    v2    v3
#>   <dbl> <dbl> <dbl>
#> 1     5     5     2
#> 2     4     4     5
#> 3     3     3     4
#> 4     2     2     1

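Note that I had to add `3` = 3: the columns are integers while the replacement values are doubles, so dplyr's recode() would otherwise set every value that is not listed to NA (with a warning). You either list every value or supply .default.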

When a function is new to me, I often find it easier to write things out more explicitly, so maybe this helps:

tib %>% 
  mutate_at(.vars = vars(v1:v3), 
            .funs = function(x) recode(x, 
                                       `1` = 5, 
                                       `2` = 4, 
                                       `3` = 3, 
                                       `4` = 2, 
                                       `5` = 1))

If you prefer the recode function from car, you shouldn't load car but call it with the package prefix instead:

tib %>% 
  mutate_at(vars(v1:v3), car::recode, "1=5; 2=4; 4=2; 5=1")

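This way you won't run into trouble mixing dplyr and car (as long as you don't need car for anything else). Both packages export a function called recode(), and whichever package is loaded last masks the other.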

Answer 2

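Here is a simple approach using only base functions. It assumes these are 5-point Likert items originally coded 1 to 5. If you have, for example, 7-point Likert items, or items coded 0 to 4 or -2 to 2, you'll need to adjust it.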
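
The 6 - x trick is a special case of a general rule: on a scale whose endpoints are lo and hi, the reversed code is (lo + hi) - x. So 1 to 5 gives 6 - x, 1 to 7 gives 8 - x, 0 to 4 gives 4 - x, and -2 to 2 gives -x. reverse_scale() is a hypothetical helper for illustration. It is a sketch that assumes you pass the scale's endpoints explicitly; taking them from the observed data would go wrong whenever an endpoint happens not to occur in the sample.

```r
# Hypothetical helper: reverse-code a scale with endpoints lo and hi.
# Works on a vector or, through R's vectorized arithmetic, on a whole data frame.
reverse_scale <- function(x, lo, hi) (lo + hi) - x

reverse_scale(1:5, lo = 1, hi = 5)    # 5-point, coded 1-5:  5, 4, 3, 2, 1
reverse_scale(1:7, lo = 1, hi = 7)    # 7-point, coded 1-7:  7, 6, 5, 4, 3, 2, 1
reverse_scale(0:4, lo = 0, hi = 4)    # coded 0-4:           4, 3, 2, 1, 0
reverse_scale(-2:2, lo = -2, hi = 2)  # coded -2 to 2:       2, 1, 0, -1, -2
```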

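Some coding notes: your dataset has a pseudo-random element (the call to sample()); to make the dataset exactly reproducible, use set.seed() (see ?set.seed). When you assign with the arrow operator (var <- value), wrapping the whole assignment in parentheses, as in (var <- value), also prints the assigned variable or dataset. R is vectorized, so you don't need a loop (although a loop is really fine here: with so few variables it won't cause a noticeable slowdown).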
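
The three notes above can be checked in a few lines of base R. The data values are made up for illustration; the seed 4636 is the one used below, but the numbers sample() actually draws are not shown because they can differ between R versions that use different sampling methods.

```r
# set.seed() fixes the random number generator, so sample() repeats exactly.
set.seed(4636); a <- sample(1:5, 4)
set.seed(4636); b <- sample(1:5, 4)
identical(a, b)
# [1] TRUE

# Wrapping an assignment in parentheses assigns *and* prints.
(x <- c(1, 2, 5))
# [1] 1 2 5

# Vectorization: one arithmetic expression recodes every cell of the data
# frame and gives the same values as an explicit loop over the columns.
d <- data.frame(v1 = 1:4, v2 = c(2, 5, 4, 1))
d_vec <- 6 - d
d_loop <- d
for (v in names(d_loop)) d_loop[[v]] <- 6 - d_loop[[v]]
all(d_vec == d_loop)
# [1] TRUE
```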

set.seed(4636)  # this makes the example exactly reproducible
(d <- data.frame(v1 = 1:4, 
                 v2 = 1:4,
                 v3 = sample(1:5, 4, replace = FALSE)))  # adding outer ()'s prints
#   v1 v2 v3
# 1  1  1  1
# 2  2  2  2
# 3  3  3  5
# 4  4  4  4

d.orig <- d  # keep a copy of the original dataset so it isn't overwritten
(d <- 6-d)  # adding outer ()'s prints
#   v1 v2 v3
# 1  5  5  5
# 2  4  4  4
# 3  3  3  1
# 4  2  2  2

rec.vars <- c("v2")
d.some   <- d.orig
(d.some[,rec.vars] <- 6-d.some[,rec.vars])
# [1] 5 4 3 2
d.some
#   v1 v2 v3
# 1  1  5  1
# 2  2  4  2
# 3  3  3  5
# 4  4  2  4

##### to do more than 1 variable
(rec.vars <- paste0("v", c(2,3)))
# [1] "v2" "v3"
d.some   <- d.orig
(d.some[,rec.vars] <- 6-d.some[,rec.vars])
#   v2 v3
# 1  5  5
# 2  4  4
# 3  3  1
# 4  2  2
d.some
#   v1 v2 v3
# 1  1  5  5
# 2  2  4  4
# 3  3  3  1
# 4  4  2  2

2 answers

Answer 1
Score 2, accepted, 2019-10-09 15:19:35

Answer 2
Score 1, 2019-10-09 15:17:18
