Recode variables based on length

Question

I have a large dataframe with a structure like this:

id v1 v2 v3 v4 v5
1  1  1 98  1  1
2  1  1  1  1  1
3  4  1  0 22  1
4  5  1  1  1  1
5  1  1 90  1  1

I would like to go through v2 to v5 and, wherever a value is more than one character long, recode it to 9. The resulting df would be:

id v1 v2 v3 v4 v5
1  1  1  9  1  1
2  1  1  1  1  1
3  4  1  0  9  1
4  5  1  1  1  1
5  1  1  9  1  1

Note: All variables are stored as strings, which is why I want the answer to be based on string length.
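
For anyone who wants to reproduce this, here is a minimal sketch of the sample data with v1 through v5 stored as character, as the note describes. Keeping id as an integer is an assumption made for illustration. nchar() counts the characters in each string, which is the length test I have in mind.

```r
# Sample data from the tables above; v1..v5 are character columns.
# Keeping id as an integer is an assumption for this sketch.
df <- data.frame(
  id = 1:5,
  v1 = c("1", "1", "4", "5", "1"),
  v2 = c("1", "1", "1", "1", "1"),
  v3 = c("98", "1", "0", "1", "90"),
  v4 = c("1", "1", "22", "1", "1"),
  v5 = c("1", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# nchar() returns the number of characters in each string.
nchar(df$v3)
#> [1] 2 1 1 1 2
```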

Answer 1

If this is a large dataframe, you could use the data.table library:

Reprex

Code

library(data.table)

cols <- paste0("v", 2:5)
# fifelse() requires yes and no to be of the same type, so the replacement is the string "9"
setDT(df)[, (cols) := lapply(.SD, function(x) fifelse(nchar(x) > 1, "9", x)), .SDcols = cols][]

Output

#>    id v1 v2 v3 v4 v5
#> 1:  1  1  1  9  1  1
#> 2:  2  1  1  1  1  1
#> 3:  3  4  1  0  9  1
#> 4:  4  5  1  1  1  1
#> 5:  5  1  1  9  1  1

Created on 2022-03-14 by the reprex package (v2.0.1)

EDIT:

`dplyr` solution

Code

library(dplyr)

df %>% mutate(across(v2:v5, ~ ifelse(nchar(.x) > 1, 9, .x)))

Output

#>   id v1 v2 v3 v4 v5
#> 1  1  1  1  9  1  1
#> 2  2  1  1  1  1  1
#> 3  3  4  1  0  9  1
#> 4  4  5  1  1  1  1
#> 5  5  1  1  9  1  1

Base R solution

Code

cols <- paste0("v", 2:5)
df[, cols] <- apply(df[, cols], c(1,2), function(x) ifelse(nchar(x) > 1, 9, x))

Output

df
#>   id v1 v2 v3 v4 v5
#> 1  1  1  1  9  1  1
#> 2  2  1  1  1  1  1
#> 3  3  4  1  0  9  1
#> 4  4  5  1  1  1  1
#> 5  5  1  1  9  1  1

Created on 2022-03-14 by the reprex package (v2.0.1)
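
As a possible alternative to the element-wise apply() call, the same recode can be sketched column by column with lapply() and replace(), so that each column stays a character vector. This is only a sketch under assumptions: the data frame `dat` and the helper `recode_long` are hypothetical names used for illustration, and the sample values mirror the question.

```r
# Hypothetical sample data mirroring the question; v1..v5 are character.
dat <- data.frame(
  id = 1:5,
  v1 = c("1", "1", "4", "5", "1"),
  v2 = c("1", "1", "1", "1", "1"),
  v3 = c("98", "1", "0", "1", "90"),
  v4 = c("1", "1", "22", "1", "1"),
  v5 = c("1", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

cols <- paste0("v", 2:5)

# Replace every value longer than one character with "9".
# which() drops NA test results, so missing strings are left untouched.
recode_long <- function(x) replace(x, which(nchar(x) > 1), "9")

# Apply the helper to each selected column and assign the results back.
dat[cols] <- lapply(dat[cols], recode_long)
dat
```

Because nchar() is vectorised, each column is handled in a single call rather than one call per cell, which may matter on a large data frame.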

Answer 2

A dplyr solution:

library(dplyr)

df1 %>% mutate(across(v2:v5, ~ ifelse(nchar(.x) > 1, 9, .x)))

#> # A tibble: 5 x 6
#>      id    v1    v2    v3    v4    v5
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     1     1     9     1     1
#> 2     2     1     1     1     1     1
#> 3     3     4     1     0     9     1
#> 4     4     5     1     1     1     1
#> 5     5     1     1     9     1     1

Created on 2022-03-13 by the reprex package (v2.0.1)

Data

df1 <- structure(list(id = c(1, 2, 3, 4, 5), v1 = c(1, 1, 4, 5, 1),
                      v2 = c("1", "1", "1", "1", "1"),
                      v3 = c("98", "1", "0", "1", "90"),
                      v4 = c("1", "1", "22", "1", "1"),
                      v5 = c("1", "1", "1", "1", "1")),
                 class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))

Answer 3

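# Assumes the vectors id, v1, ..., v5 already exist in the workspace.
df <- data.frame(id, v1, v2, v3, v4, v5)
cols <- paste0("v", 2:5)  # recode only v2 through v5

for (j in cols) {
  for (i in seq_len(nrow(df))) {
    # Replace any value longer than one character with "9".
    if (nchar(df[i, j]) > 1) df[i, j] <- "9"
  }
}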
