Recode multiple columns to numbers increasingly in R

I have 50 columns of names, but for convenience I show only 4 of them here.

Name1       Name2         Name3      Name4
Rose,Ali    Van,Hall      Ghol,Dam   Murr,kate
Camp,Laura  Ka,Klo        Dan,Dan    Ali,Hoss
Rose,Ali    Van,Hall      Ghol,Dam   Kol,Kan
Murr,Kate   Ismal, Ismal  Sian,Rozi  Nas,Ami
Ghol,Dam    Ka,Klo        Rose,Ali   Nor,Ko
Murr,Kate   Ismal, Ismal  Dan,Dan    Nas,Ami

I want to assign a number to each person, column by column, as one running sequence.

For example, the names in Name1 get the numbers 1-4. Repeated names get the same number.

In Name2, the numbering should start from 5, and so on. This would give me the following table:

Assign1 Assign2 Assign3 Assign4
      1       5       8      12
      2       6       9      13
      1       5       8      14
      3       7      10      15
      4       6      11      17
      3       7       9      15

I would like to do this without an explicit loop, e.g. with sapply, along the lines of sapply(dat, function(x) match(x, unique(x))).

Using dplyr or the tidyverse would be great.

A tidyverse solution with purrr::accumulate():

library(tidyverse)

df %>%
  mutate(as_tibble(
    accumulate(across(Name1:Name4, ~ match(.x, unique(.x))), ~ .y + max(.x))
  ))

#   Name1 Name2 Name3 Name4
# 1     1     5     8    12
# 2     2     6     9    13
# 3     1     5     8    14
# 4     3     7    10    15
# 5     4     6    11    16
# 6     3     7     9    15
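To make the accumulate() step easier to follow, here is a rough base R sketch of the same two-stage idea, using Reduce(accumulate = TRUE) in place of purrr::accumulate(). It is not the code from the answer: the two-column df and the names local_ids, shifted and result are assumptions made for illustration.

```r
# Two columns from the question's data (a reduced example, assumed for brevity).
df <- data.frame(
  Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", "Murr,Kate", "Ghol,Dam", "Murr,Kate"),
  Name2 = c("Van,Hall", "Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal")
)

# Stage 1: number the distinct values within each column, starting from 1.
local_ids <- lapply(df, function(x) match(x, unique(x)))

# Stage 2: walk the columns left to right, shifting each one by the
# largest id already handed out in the (shifted) previous column.
shifted <- Reduce(function(prev, cur) cur + max(prev), local_ids, accumulate = TRUE)
names(shifted) <- names(df)

result <- as.data.frame(shifted)
result
```

The first column is left as it is, and every later column is offset by the running maximum, which is the role `~ .y + max(.x)` plays in the accumulate() call.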

Because the values in each column depend on the values in the previous column, the calculations have to be done sequentially. This is probably most succinctly achieved with a loop. Remember that lapply and sapply are simply loops in disguise, and won't be quicker than an explicit loop.

Note that your expected output has a mistake in it (there is a 17 that should be 16).

output <- setNames(df, paste0('Assign', seq_along(df)))
                   
for(i in seq_along(output)) {
  output[[i]] <- match(output[[i]], unique(output[[i]]))
  if(i > 1) output[[i]] <- output[[i]] + max(output[[i - 1]])
}

output
#>    Assign1  Assign2  Assign3  Assign4
#> 1        1        5        8       12
#> 2        2        6        9       13
#> 3        1        5        8       14
#> 4        3        7       10       15
#> 5        4        6       11       16
#> 6        3        7        9       15

Edit

If you really want it without an explicit loop, you can do:

res <- sapply(seq_along(df), \(i) match(df[[i]], unique(df[[i]])))
# Parenthesise the sum so the pipe applies to the whole result, not only to t(...)
(res + t(replicate(nrow(df), head(c(0, cumsum(apply(res, 2, max))), -1)))) |>
  as.data.frame() |>
  setNames(paste0('Assign', seq_along(df)))
#>   Assign1 Assign2 Assign3 Assign4
#> 1       1       5       8      12
#> 2       2       6       9      13
#> 3       1       5       8      14
#> 4       3       7      10      15
#> 5       4       6      11      16
#> 6       3       7       9      15
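The one-liner above packs several steps together. As a sketch of the same arithmetic, the per-column offsets can be computed separately and added with sweep() instead of building a replicated matrix. The intermediate names n_unique, offsets and out are assumptions for illustration, not part of the original answer.

```r
df <- data.frame(
  Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", "Murr,Kate", "Ghol,Dam", "Murr,Kate"),
  Name2 = c("Van,Hall", "Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"),
  Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", "Rose,Ali", "Dan,Dan"),
  Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", "Nas,Ami", "Nor,Ko", "Nas,Ami")
)

# Per-column ids, each column starting again from 1 (a 6 x 4 integer matrix).
res <- sapply(df, function(x) match(x, unique(x)))

# Number of distinct names in each column: 4, 3, 4, 5.
n_unique <- apply(res, 2, max)

# Offset for a column = total distinct names in all columns to its left: 0, 4, 7, 11.
offsets <- head(c(0, cumsum(n_unique)), -1)

# Add each offset to its column.
out <- sweep(res, 2, offsets, "+")
out
```

Because the offsets depend only on the per-column maxima, every column's ids can be computed independently first, which is why this version needs no sequential pass over the columns.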

Created on 2023-01-13 with reprex v2.0.2


Data taken from the question, in reproducible format:

df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", 
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall", 
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", 
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", 
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L), 
class = "data.frame")

Here is a tidyverse approach:

First paste the column name onto each string in every column, so that identical names in different columns stay distinct and can be sorted by column later. Then pivot the data into a two-column data frame so that IDs can be assigned with match. Finally pivot back to wide format and unnest the list columns.

library(tidyverse)

df %>% 
  mutate(across(everything(), ~ paste0(.x, "_", cur_column()))) %>% 
  pivot_longer(everything(), names_to = "ab", values_to = "a") %>% 
  arrange(ab) %>% 
  mutate(b = match(a, unique(a)), .keep = "unused") %>% 
  pivot_wider(names_from = "ab", values_from = "b") %>% 
  unnest(everything())

# A tibble: 6 × 4
  Name1 Name2 Name3 Name4
  <int> <int> <int> <int>
1     1     5     8    12
2     2     6     9    13
3     1     5     8    14
4     3     7    10    15
5     4     6    11    16
6     3     7     9    15
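The tagging idea can also be sketched in base R, without the pivot round trip. This is an illustrative variant rather than the code above, and the names tagged, ids and out are assumptions.

```r
df <- data.frame(
  Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", "Murr,Kate", "Ghol,Dam", "Murr,Kate"),
  Name2 = c("Van,Hall", "Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"),
  Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", "Rose,Ali", "Dan,Dan"),
  Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", "Nas,Ami", "Nor,Ko", "Nas,Ami")
)

# Append the column name to every value, then stack the columns one after another.
# "Ghol,Dam_Name1" and "Ghol,Dam_Name3" are now different strings.
tagged <- unlist(
  Map(function(x, nm) paste0(x, "_", nm), df, names(df)),
  use.names = FALSE
)

# One match() over the stacked vector numbers the values in order of first appearance.
ids <- match(tagged, unique(tagged))

# Reshape back to the original layout (matrix() fills column by column).
out <- as.data.frame(matrix(ids, nrow = nrow(df), dimnames = list(NULL, names(df))))
out
```

Stacking the columns in order is what makes the numbering continue from one column to the next, and the tag is what stops a name that recurs in a later column from reusing its earlier ID.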

Data

Taken from @Allan Cameron.

df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", 
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall", 
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", 
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", 
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L), 
class = "data.frame")

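Update: The approach below is not ideal because the IDs are not unique per column: a name that appears in more than one column gets the same ID everywhere (in the output, "Ghol,Dam" is 4 in both Name1 and Name3). Sorry.

Using a lookup table with the tidyverse: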

library(dplyr)
library(tidyr)
library(tibble)  # provides deframe()

lookup <-
  df |> 
  pivot_longer(everything()) |>
  distinct() |>
  arrange(name) |>
  transmute(name = value, value = row_number()) |>
  deframe()

df |>
  mutate(across(everything(), ~ recode(., !!!lookup)))

Output:

  Name1 Name2 Name3 Name4
1     1     5     4    12
2     2     6     9    13
3     1     5     4    14
4     3     7    10    15
5     4     6     1    16
6     3     7     9    15
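To see where the repeated IDs come from, here is a base R sketch that rebuilds an equivalent lookup table and lists the names that occur in it more than once. It assumes the lookup resolves a name to its first entry, which is consistent with the output shown. The names long, dup and first_match are hypothetical.

```r
df <- data.frame(
  Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali", "Murr,Kate", "Ghol,Dam", "Murr,Kate"),
  Name2 = c("Van,Hall", "Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"),
  Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi", "Rose,Ali", "Dan,Dan"),
  Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan", "Nas,Ami", "Nor,Ko", "Nas,Ami")
)

# Long table of distinct (column, value) pairs, ordered by column,
# with a running id: the same information the lookup vector holds.
long <- data.frame(
  name  = rep(names(df), each = nrow(df)),
  value = unlist(df, use.names = FALSE)
)
long <- unique(long)
long$id <- seq_len(nrow(long))

# Names that occur in more than one column have two entries, hence two candidate ids.
dup <- long[long$value %in% long$value[duplicated(long$value)], ]
dup

# Looking values up by name alone picks the first entry, so Name3 reuses Name1's ids.
first_match <- long$id[match(df$Name3, long$value)]
first_match
```

Keying the lookup on the name alone is what loses the column information. Keying on column and name together, as the tagging approach does, avoids the collision.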

Data from @Allan Cameron, thanks.

A shorter way could be:

colnames(df) <- map(seq(ncol(df)), function(n) paste0('assign', n))
