
Identify repeated characters across columns in R

In RStudio, I have a df of character strings, with a different group in each column. There are about 600 strings in each column, and I am not sure whether certain strings are repeated across all the columns/groups or in just 2 or 3 of them. Is there a way to make a new df containing only the repeated strings, together with the columns/groups in which they repeat?

For example, my df looks like this:

Group1 Group2 Group3 Group4 Group5
AB      FG    SA     KD      CD
CD      ZX    AB     ER      ZX 
ED      QW    OI     SA      AB
GD      AS    ZX     QW      KD 

I'm not sure what the final df would look like, but I want to be able to identify which strings are repeated in which groups, and then make a figure to display that information. I hope that makes sense. Alternatively, how can I pick out which strings are repeated in two columns, then three, then four, or in all 5 columns? Thank you.

library(tidyverse)

data <- tribble(
  ~Group1, ~Group2, ~Group3, ~Group4, ~Group5,
  "AB", "FG", "SA", "KD", "CD",
  "CD", "ZX", "AB", "ER", "ZX",
  "ED", "QW", "OI", "SA", "AB",
  "GD", "AS", "ZX", "QW", "KD"
)


repeated_values <-
  data %>%
  pivot_longer(everything()) %>%
  group_by(value) %>%
  count() %>%
  filter(n >= 2) %>%
  pull(value)
repeated_values
#> [1] "AB" "CD" "KD" "QW" "SA" "ZX"

# in which rows are which repeated characters?
repeated_data <-
  data %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id) %>%
  filter(value %in% repeated_values)
repeated_data
#> # A tibble: 14 x 3
#>    row_id name   value
#>     <int> <chr>  <chr>
#>  1      1 Group1 AB   
#>  2      1 Group3 SA   
#>  3      1 Group4 KD   
#>  4      1 Group5 CD   
#>  5      2 Group1 CD   
#>  6      2 Group2 ZX   
#>  7      2 Group3 AB   
#>  8      2 Group5 ZX   
#>  9      3 Group2 QW   
#> 10      3 Group4 SA   
#> 11      3 Group5 AB   
#> 12      4 Group3 ZX   
#> 13      4 Group4 QW   
#> 14      4 Group5 KD

# in how many rows are the repeated characters?
repeated_data %>%
  distinct(row_id, value) %>%
  count(value)
#> # A tibble: 6 x 2
#>   value     n
#>   <chr> <int>
#> 1 AB        3
#> 2 CD        2
#> 3 KD        2
#> 4 QW        2
#> 5 SA        2
#> 6 ZX        2

Created on 2021-11-11 by the reprex package (v2.0.1)
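The counts above are per row, while the question asks in how many groups (columns) each string appears. A minimal sketch of that variant, assuming dplyr (1.0.0 or later) and tidyr are installed; the names `group_membership`, `n_groups` and `groups` are made up for illustration and are not part of the answer above:

```r
library(dplyr)
library(tidyr)

# Same example data as above, built with base R
data <- data.frame(
  Group1 = c("AB", "CD", "ED", "GD"),
  Group2 = c("FG", "ZX", "QW", "AS"),
  Group3 = c("SA", "AB", "OI", "ZX"),
  Group4 = c("KD", "ER", "SA", "QW"),
  Group5 = c("CD", "ZX", "AB", "KD"),
  stringsAsFactors = FALSE
)

group_membership <- data %>%
  pivot_longer(everything(), names_to = "group", values_to = "value") %>%
  # keep one row per (group, value) so duplicates inside a column count once
  distinct(group, value) %>%
  group_by(value) %>%
  summarise(
    n_groups = n(),                               # number of distinct groups
    groups   = paste(sort(group), collapse = ", "),
    .groups  = "drop"
  ) %>%
  filter(n_groups >= 2) %>%
  arrange(desc(n_groups), value)

group_membership
```

On the example data, AB and ZX come out with three groups each, and CD, KD, QW and SA with two. Changing the filter to `n_groups == 3` (or 4, or 5) picks out the strings shared by exactly that many groups.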

Here is an example of how to print out the groups:

Data:

dat <- structure(list(Group1 = c("AB", "CD", "ED", "GD"), Group2 = c("FG", 
"ZX", "QW", "AS"), Group3 = c("SA", "AB", "OI", "ZX"), Group4 = c("KD", 
"ER", "SA", "QW"), Group5 = c("CD", "ZX", "AB", "KD")), class = "data.frame", row.names = c(NA, 
-4L))

dat
  Group1 Group2 Group3 Group4 Group5
1     AB     FG     SA     KD     CD
2     CD     ZX     AB     ER     ZX
3     ED     QW     OI     SA     AB
4     GD     AS     ZX     QW     KD
  • Get the number of repeats:
ta <- table(as.matrix(dat))

# all character strings
ta
AB AS CD ED ER FG GD KD OI QW SA ZX 
 3  1  2  1  1  1  1  2  1  2  2  3 

# only repeated
ta[ta > 1]
AB CD KD QW SA ZX 
 3  2  2  2  2  3 
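One caveat: `table(as.matrix(dat))` counts cells, so a string that occurs twice inside a single column would also get a count of 2. With about 600 strings per column that may well happen. A small sketch, assuming the goal is to count distinct columns per string, de-duplicates each column first. The extra row in `dat2` is invented purely to show the difference:

```r
dat <- data.frame(
  Group1 = c("AB", "CD", "ED", "GD"),
  Group2 = c("FG", "ZX", "QW", "AS"),
  Group3 = c("SA", "AB", "OI", "ZX"),
  Group4 = c("KD", "ER", "SA", "QW"),
  Group5 = c("CD", "ZX", "AB", "KD"),
  stringsAsFactors = FALSE
)

# Hypothetical extra row: "GD" now appears twice, but only in Group1
dat2 <- rbind(dat, data.frame(
  Group1 = "GD", Group2 = "XX", Group3 = "YY", Group4 = "WW", Group5 = "VV",
  stringsAsFactors = FALSE
))

cells <- table(as.matrix(dat2))               # counts every cell
cols  <- table(unlist(lapply(dat2, unique)))  # counts each column at most once

as.integer(cells["GD"])  # 2: looks repeated
as.integer(cols["GD"])   # 1: but it lives in a single column

# strings present in more than one column
cols[cols > 1]
```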
  • Populate a list of character vectors to get the groups:
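# Reuse `ta` from above and match whole strings; grep() would also hit
# substrings (e.g. "AB" inside "ABC") and treats x as a regular expression
sapply( names(ta[ta > 1]),
  function(x) colnames(dat)[colSums(dat == x, na.rm = TRUE) > 0], simplify = FALSE )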
$AB
[1] "Group1" "Group3" "Group5"
$CD
[1] "Group1" "Group5"
$KD
[1] "Group4" "Group5"
$QW
[1] "Group2" "Group4"
$SA
[1] "Group3" "Group4"
$ZX
[1] "Group2" "Group3" "Group5"
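To pick out the strings shared by exactly two, three, four or all five groups, one option is a presence/absence matrix. This is only a sketch in base R; `presence`, `n_groups` and `by_n` are illustrative names, not something from the answer above:

```r
dat <- data.frame(
  Group1 = c("AB", "CD", "ED", "GD"),
  Group2 = c("FG", "ZX", "QW", "AS"),
  Group3 = c("SA", "AB", "OI", "ZX"),
  Group4 = c("KD", "ER", "SA", "QW"),
  Group5 = c("CD", "ZX", "AB", "KD"),
  stringsAsFactors = FALSE
)

# every distinct string in the data
values <- sort(unique(unlist(dat)))

# logical matrix: one row per string, one column per group
presence <- sapply(dat, function(col) values %in% col)
rownames(presence) <- values

# number of groups each string appears in
n_groups <- rowSums(presence)

# bucket the strings by that number and drop the non-repeated ones
by_n <- split(names(n_groups), n_groups)
by_n[names(by_n) != "1"]
```

On the example data the result has two buckets: `2` holds CD, KD, QW and SA, and `3` holds AB and ZX. A bucket for 4 or 5 appears only if some string is shared by that many groups.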
  • Also add the columns that match if you wish:
sapply( names(ta[ta > 1]),
  function(x) dat[colSums(dat == x, na.rm = TRUE) > 0], simplify = FALSE )
$AB
  Group1 Group3 Group5
1     AB     SA     CD
2     CD     AB     ZX
3     ED     OI     AB
4     GD     ZX     KD
... etc
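For the figure the question asks about, one possibility is a simple presence/absence grid drawn with base graphics. This is a rough sketch rather than a finished plot; the colours, title and the `repeated` matrix are arbitrary choices for illustration:

```r
dat <- data.frame(
  Group1 = c("AB", "CD", "ED", "GD"),
  Group2 = c("FG", "ZX", "QW", "AS"),
  Group3 = c("SA", "AB", "OI", "ZX"),
  Group4 = c("KD", "ER", "SA", "QW"),
  Group5 = c("CD", "ZX", "AB", "KD"),
  stringsAsFactors = FALSE
)

# presence/absence of every distinct string in every group
values <- sort(unique(unlist(dat)))
presence <- sapply(dat, function(col) values %in% col)
rownames(presence) <- values

# keep only strings found in more than one group
repeated <- presence[rowSums(presence) > 1, , drop = FALSE]

# grid: groups on the x axis, strings on the y axis, filled cell = present
image(
  x = seq_len(ncol(repeated)), y = seq_len(nrow(repeated)),
  z = t(repeated * 1),               # image() expects z as length(x) by length(y)
  col = c("white", "steelblue"),
  axes = FALSE, xlab = "", ylab = "",
  main = "Strings shared between groups"
)
axis(1, at = seq_len(ncol(repeated)), labels = colnames(repeated))
axis(2, at = seq_len(nrow(repeated)), labels = rownames(repeated), las = 1)
box()
```

With about 600 strings per column the y axis will get crowded, so it may help to plot only the strings shared by at least three groups, i.e. `rowSums(presence) >= 3`.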

