Recode multiple columns to numbers increasingly in R
I have 50 columns of names, but for convenience I show only 4 of them here.
Name1 Name2 Name3 Name4
Rose,Ali Van,Hall Ghol,Dam Murr,kate
Camp,Laura Ka,Klo Dan,Dan Ali,Hoss
Rose,Ali Van,Hall Ghol,Dam Kol,Kan
Murr,Kate Ismal, Ismal Sian,Rozi Nas,Ami
Ghol,Dam Ka,Klo Rose,Ali Nor,Ko
Murr,Kate Ismal, Ismal Dan,Dan Nas,Ami
I want to assign a running sequence of numbers to each person, column by column.
For example, in Name1 we get the numbers 1-4. Repeated names get the same number.
In Name2, the numbering should start from 5, and so on. This will give me the following table:
Assign1 Assign2 Assign3 Assign4
1 5 8 12
2 6 9 13
1 5 8 14
3 7 10 15
4 6 11 17
3 7 9 15
I would like to do this without a loop, i.e., with something like sapply(dat, function(x) match(x, unique(x))).
Using dplyr or tidyverse would be great.
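To make the gap concrete, here is a minimal sketch on a made-up two-column data frame (the name dat and its values are assumptions for illustration only). It shows that the sapply call alone numbers each column from 1 again, so some offset per column is still needed:

```r
# Toy data (hypothetical): two columns of names.
dat <- data.frame(
  Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali"),
  Name2 = c("Van,Hall", "Ka,Klo", "Van,Hall")
)

# Within each column, repeated names share a number...
ids <- sapply(dat, function(x) match(x, unique(x)))
ids
# ...but every column restarts at 1, which is not the desired result.
```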
A tidyverse solution with purrr::accumulate():
library(tidyverse)

df %>%
  mutate(as_tibble(
    # number each column from 1, then shift it by the maximum of the previous (shifted) column
    accumulate(across(Name1:Name4, ~ match(.x, unique(.x))), ~ .y + max(.x))
  ))
# Name1 Name2 Name3 Name4
# 1 1 5 8 12
# 2 2 6 9 13
# 3 1 5 8 14
# 4 3 7 10 15
# 5 4 6 11 16
# 6 3 7 9 15
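As a rough illustration of what the accumulate() step does, the sketch below runs it on a plain list of made-up per-column IDs (the list ids and its values are assumptions, not part of the original data). In the formula, .x is the previous column after shifting and .y is the current column:

```r
library(purrr)

# Per-column IDs that each restart at 1 (made-up values for illustration).
ids <- list(a = c(1L, 2L, 1L), b = c(1L, 2L, 2L), c = c(1L, 1L, 2L))

# Shift each column by the maximum of the previously shifted column.
shifted <- accumulate(ids, ~ .y + max(.x))
shifted
```

Because each step uses the already shifted previous column, the offsets add up across columns without any explicit bookkeeping.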
Because the values in each column depend on the values in the previous column, the calculations have to be done sequentially. This is probably most succinctly achieved by a loop. Remember that lapply and sapply are simply loops in disguise, and won't be quicker than an explicit loop.
Note that your expected output has a mistake in it (there is a number 17 which should be 16).
output <- setNames(df, paste0('Assign', seq_along(df)))

for(i in seq_along(output)) {
  output[[i]] <- match(output[[i]], unique(output[[i]]))
  if(i > 1) output[[i]] <- output[[i]] + max(output[[i - 1]])
}

output
#> Assign1 Assign2 Assign3 Assign4
#> 1 1 5 8 12
#> 2 2 6 9 13
#> 3 1 5 8 14
#> 4 3 7 10 15
#> 5 4 6 11 16
#> 6 3 7 9 15
Edit
If you really want it without an explicit loop, you can do:
res <- sapply(seq_along(df), \(i) match(df[[i]], unique(df[[i]])))

# add to each column the cumulative maximum of the columns before it
(res + t(replicate(nrow(df), head(c(0, cumsum(apply(res, 2, max))), -1)))) |>
  as.data.frame() |>
  setNames(paste0('Assign', seq_along(df)))
#> Assign1 Assign2 Assign3 Assign4
#> 1 1 5 8 12
#> 2 2 6 9 13
#> 3 1 5 8 14
#> 4 3 7 10 15
#> 5 4 6 11 16
#> 6 3 7 9 15
Created on 2023-01-13 with reprex v2.0.2
Data taken from the question, in reproducible format:
df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali",
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall",
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi",
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan",
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L),
class = "data.frame")
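The offset used in the loop-free version above can be broken into small steps. The sketch below does this on a made-up matrix of per-column IDs (res, col_max and offsets are illustrative names and values). It uses sweep() in place of t(replicate(...)), which is a substitution for readability and not the answer's original code:

```r
# Per-column IDs, each column restarting at 1 (made-up values).
res <- cbind(c(1, 2, 1, 3), c(1, 2, 1, 2), c(1, 1, 2, 3))

col_max <- apply(res, 2, max)               # largest ID in each column
offsets <- head(c(0, cumsum(col_max)), -1)  # total of the maxima to the left

# sweep() adds offsets[j] to every entry of column j.
shifted <- sweep(res, 2, offsets, `+`)
shifted
```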
Here is a tidyverse approach:
First, paste the column name after each of the strings in all your columns, for sorting purposes later; this also keeps a name that appears in several columns (such as "Rose,Ali" in Name1 and Name3) distinct per column. Then pivot the data into a two-column data frame so that we can assign IDs with match. Finally, pivot it back to a wide format and unnest the list columns.
library(tidyverse)

df %>%
  mutate(across(everything(), ~ paste0(.x, "_", cur_column()))) %>%
  pivot_longer(everything(), names_to = "ab", values_to = "a") %>%
  arrange(ab) %>%
  mutate(b = match(a, unique(a)), .keep = "unused") %>%
  pivot_wider(names_from = "ab", values_from = "b") %>%
  unnest(everything())
# A tibble: 6 × 4
Name1 Name2 Name3 Name4
<int> <int> <int> <int>
1 1 5 8 12
2 2 6 9 13
3 1 5 8 14
4 3 7 10 15
5 4 6 11 16
6 3 7 9 15
Data taken from @Allan Cameron:
df <- structure(list(Name1 = c("Rose,Ali", "Camp,Laura", "Rose,Ali",
"Murr,Kate", "Ghol,Dam", "Murr,Kate"), Name2 = c("Van,Hall",
"Ka,Klo", "Van,Hall", "Ismal, Ismal", "Ka,Klo", "Ismal, Ismal"
), Name3 = c("Ghol,Dam", "Dan,Dan", "Ghol,Dam", "Sian,Rozi",
"Rose,Ali", "Dan,Dan"), Name4 = c("Murr,kate", "Ali,Hoss", "Kol,Kan",
"Nas,Ami", "Nor,Ko", "Nas,Ami")), row.names = c(NA, -6L),
class = "data.frame")
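To see why the column name is pasted onto each string in the pivot approach above, consider this small sketch with made-up vectors (x, col and tagged are illustrative names). Without the suffix, a name that occurs in two columns would be matched to the same ID:

```r
# Names stacked from two columns, with the column each one came from.
x   <- c("Rose,Ali", "Ghol,Dam", "Ghol,Dam", "Rose,Ali")
col <- c("Name1",    "Name1",    "Name3",    "Name3")

plain_ids <- match(x, unique(x))            # 1 2 2 1: IDs are shared across columns

tagged     <- paste0(x, "_", col)
tagged_ids <- match(tagged, unique(tagged)) # 1 2 3 4: one ID per name per column
```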
Update: The approach below is not ideal because the IDs are not unique: a name that occurs in more than one column (e.g. "Ghol,Dam" in Name1 and Name3) is recoded to the ID of its first occurrence. Sorry.
Using a lookup table with tidyverse:
library(dplyr)
library(tidyr)
library(tibble)  # for deframe()

# one ID per distinct (column, name) pair, numbered in column order
lookup <-
  df |>
  pivot_longer(everything()) |>
  distinct() |>
  arrange(name) |>
  transmute(name = value, value = row_number()) |>
  deframe()

df |>
  mutate(across(everything(), ~ recode(., !!!lookup)))
Output:
Name1 Name2 Name3 Name4
1 1 5 4 12
2 2 6 9 13
3 1 5 4 14
4 3 7 10 15
5 4 6 1 16
6 3 7 9 15
Data from @Allan Cameron, thanks.
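The collision mentioned in the update can be reproduced in base R with a small named vector. This is only a sketch with made-up IDs (the vector lookup here is a hypothetical stand-in for the one built above), and it illustrates name-based indexing in base R, not the internals of recode():

```r
# A lookup in which two names occur under more than one ID (made-up values).
lookup <- c("Rose,Ali" = 1L, "Ghol,Dam" = 4L, "Ghol,Dam" = 8L, "Rose,Ali" = 11L)

# Names that are not unique, and so cannot be recoded unambiguously.
dup <- unique(names(lookup)[duplicated(names(lookup))])
dup

# Indexing by name returns only the first match.
first_id <- lookup[["Ghol,Dam"]]
first_id
```

A check such as anyDuplicated(names(lookup)) before recoding would flag this situation early.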
A shorter way could be:
colnames(df) <- map(seq(ncol(df)), function(n) paste0('assign', n))