Using lapply to label the values of specific variables

Question

I would like to use lapply to label the values of specific variables. I have found an example that gets me close (here), but I can't get it to work for only certain variables in the data set.

Working example:

library(tibble)  # provides tribble()

df1 <- tribble(
 ~var1, ~var2, ~var3, ~var4,
 "1",   "1",   "1", "a",
 "2",   "2",   "2", "b",
 "3",   "3",   "3", "c"
)

Here is the code that seems like it should work, but doesn't:

df1["var1", "var2"] <- lapply(df1["var1", "var2"], factor,
                          levels=c(1, 
                                   2, 
                                   3), 
                          labels = c("Agree", 
                                     "Neither Agree/Disagree", 
                                     "Disagree"))

The code runs, but gives the following output:

# A tibble: 4 x 4
  var1  var2  var3  var4
* <chr> <chr> <chr> <chr>
1     1     1     1     a
2     2     2     2     b
3     3     3     3     c
4  <NA>  <NA>  <NA>  <NA>

If I try with just one variable, it works:

df1["var1"] <- lapply(df1["var1"], factor,
                          levels=c(1, 
                                2, 
                                3), 
                          labels = c("Agree", 
                                  "Neither Agree/Disagree", 
                                  "Disagree"))

It gives the following output (which is correct):

# A tibble: 3 x 4
                    var1  var2  var3  var4
                  <fctr> <chr> <chr> <chr>
1                  Agree     1     1     a
2 Neither Agree/Disagree     2     2     b
3               Disagree     3     3     c

I have tried a lot of different ways to change the code to get it to work, but I just can't figure it out.

Answer 1

You were close. To select several columns, pass their names as a single character vector: df1[c("var1", "var2")].

df1[c("var1", "var2")] <- lapply(df1[c("var1", "var2")], factor,
                              levels=c("1", 
                                       "2", 
                                       "3"), 
                              labels = c("Agree", 
                                         "Neither Agree/Disagree", 
                                         "Disagree"))
df1
# # A tibble: 3 x 4
#                     var1                   var2  var3  var4
#                   <fctr>                 <fctr> <chr> <chr>
# 1                  Agree                  Agree     1     a
# 2 Neither Agree/Disagree Neither Agree/Disagree     2     b
# 3               Disagree               Disagree     3     c
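To check that only the intended columns were converted, something like the following can help. It is a minimal sketch that assumes a base data.frame with the same columns and values as the question's tibble; the names cols and agree_labels are introduced here purely for illustration.

```r
# Base-R stand-in for the question's tibble (assumption: same columns and values).
df1 <- data.frame(
  var1 = c("1", "2", "3"),
  var2 = c("1", "2", "3"),
  var3 = c("1", "2", "3"),
  var4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper names, used only in this sketch.
cols <- c("var1", "var2")
agree_labels <- c("Agree", "Neither Agree/Disagree", "Disagree")

# Same call as in the answer, with the column names kept in one vector.
df1[cols] <- lapply(df1[cols], factor,
                    levels = c("1", "2", "3"),
                    labels = agree_labels)

# Only var1 and var2 should now be factors; var3 and var4 stay character.
sapply(df1, is.factor)

# The labels become the factor levels, in the order they were given.
levels(df1$var1)
```

Keeping the column names in one vector means the selection on both sides of the assignment cannot drift apart.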

Answer 2

Your problem arises because you're subsetting your data.frame incorrectly.

In a data.frame or tbl, extracting with [ works in a couple of ways.

Since the data is in a matrix-like rectangular form, you can use a [row, column] approach to get specific values. For example, to get a single value, you can do something like df1[2, 1].
Since a tbl / data.frame is a special type of list, if you don't supply a comma, it assumes you want entire list elements, that is, whole columns.

Thus, when you did ["var1", "var2"], it went into matrix subsetting mode and looked for a row named "var1" in the column "var2". It couldn't find that row, so the assignment inserted a row of NA values in your dataset.
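Applied to the question's column names, the two indexing modes can be compared directly. This is a small sketch that assumes a base data.frame named d (a name made up for this illustration) rather than the original tibble, and it only reads values instead of assigning to them.

```r
# Base-R stand-in with two of the question's columns (assumption).
d <- data.frame(
  var1 = c("1", "2", "3"),
  var2 = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Two arguments: matrix-style [row, column].
# There is no row named "var1", so the lookup yields NA.
missing_cell <- d["var1", "var2"]
is.na(missing_cell)
# [1] TRUE

# One argument: list-style, selecting whole columns by name.
two_cols <- d[c("var1", "var2")]
names(two_cols)
# [1] "var1" "var2"
```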

Here's a small set of examples for you to experiment with.

Get rows 1:4 and columns 1:4

df <- mtcars[1:4, 1:4]
df
#                 mpg cyl disp  hp
# Mazda RX4      21.0   6  160 110
# Mazda RX4 Wag  21.0   6  160 110
# Datsun 710     22.8   4  108  93
# Hornet 4 Drive 21.4   6  258 110

Extract a single value using a [row, column] approach
```
df["Mazda RX4", "mpg"] # [row, column]
# [1] 21
```
Check whether a data.frame is a list
```
is.list(df)
# [1] TRUE
```

Convert a data.frame to a list and try to extract using [row, column].

L <- unclass(df)
L["Mazda RX4", "mpg"] # A list doesn't have `dim`s.
# Error in L["Mazda RX4", "mpg"] : incorrect number of dimensions

Providing just one value to extract from a data.frame or a list

df["mpg"] # Treats it as asking for a single value from a list
#                 mpg
# Mazda RX4      21.0
# Mazda RX4 Wag  21.0
# Datsun 710     22.8
# Hornet 4 Drive 21.4

L["mpg"]
# $mpg
# [1] 21.0 21.0 22.8 21.4

Providing a vector of values to extract

df[c("mpg", "hp")]
#                 mpg  hp
# Mazda RX4      21.0 110
# Mazda RX4 Wag  21.0 110
# Datsun 710     22.8  93
# Hornet 4 Drive 21.4 110

L[c("mpg", "hp")]
# $mpg
# [1] 21.0 21.0 22.8 21.4
#
# $hp
# [1] 110 110  93 110

Since a data.frame is a special type of list with dims, using an empty row index, [, vals], also works
```
df[, c("mpg", "hp")]
#                 mpg  hp
# Mazda RX4      21.0 110
# Mazda RX4 Wag  21.0 110
# Datsun 710     22.8  93
# Hornet 4 Drive 21.4 110
```
Looking for a row that is not there returns NAs
```
df["not here", ]
#    mpg cyl disp hp
# NA  NA  NA   NA NA
```

Keeping those details in mind, your best approach is to just use the following (as suggested in @www's answer):

df1[c("var1", "var2")]

Answer 1
2 2017-12-22 02:39:56

Answer 2
2 Accepted 2017-12-22 04:29:53