Is there a way to identify rows that match a condition several times across several columns in R?
Identify repeated characters across columns in R
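In RStudio, I have a df with different strings in different groups, one group per column. There are about 600 strings in each column, and I'm not sure whether some of them are repeated in all columns/groups or only in 2 or 3 columns. I was wondering if there is a way to create a new df containing only the repeated strings, together with the columns/groups in which they are repeated.
For example, my df looks like this: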
Group1 Group2 Group3 Group4 Group5
AB FG SA KD CD
CD ZX AB ER ZX
ED QW OI SA AB
GD AS ZX QW KD
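I'm not sure what the final df would look like, but I'd like to be able to identify which strings are repeated in which groups, and then make a graph showing that information. I hope this makes sense. Alternatively, how can I find out which strings are repeated in two columns, which in three or four columns, and which in all 5 columns? Thank you.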
library(tidyverse)
data <- tribble(
~Group1, ~Group2, ~Group3, ~Group4, ~Group5,
"AB", "FG", "SA", "KD", "CD",
"CD", "ZX", "AB", "ER", "ZX",
"ED", "QW", "OI", "SA", "AB",
"GD", "AS", "ZX", "QW", "KD"
)
repeated_values <-
data %>%
pivot_longer(everything()) %>%
group_by(value) %>%
count() %>%
filter(n >= 2) %>%
pull(value)
repeated_values
#> [1] "AB" "CD" "KD" "QW" "SA" "ZX"
# in which rows are which repeated characters?
repeated_data <-
data %>%
mutate(row_id = row_number()) %>%
pivot_longer(-row_id) %>%
filter(value %in% repeated_values)
repeated_data
#> # A tibble: 14 x 3
#> row_id name value
#> <int> <chr> <chr>
#> 1 1 Group1 AB
#> 2 1 Group3 SA
#> 3 1 Group4 KD
#> 4 1 Group5 CD
#> 5 2 Group1 CD
#> 6 2 Group2 ZX
#> 7 2 Group3 AB
#> 8 2 Group5 ZX
#> 9 3 Group2 QW
#> 10 3 Group4 SA
#> 11 3 Group5 AB
#> 12 4 Group3 ZX
#> 13 4 Group4 QW
#> 14 4 Group5 KD
# in how many rows are the repeated characters?
repeated_data %>%
distinct(row_id, value) %>%
count(value)
#> # A tibble: 6 x 2
#> value n
#> <chr> <int>
#> 1 AB 3
#> 2 CD 2
#> 3 KD 2
#> 4 QW 2
#> 5 SA 2
#> 6 ZX 2
Created on 2021-11-11 by the reprex package (v2.0.1)
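The counts above are per row, while the question asks about columns/groups. A minimal sketch of the same idea counted per column follows, assuming the same `data` tibble; the names `group_membership`, `n_groups` and `groups` are made up for this illustration.

```r
library(tidyverse)

data <- tribble(
  ~Group1, ~Group2, ~Group3, ~Group4, ~Group5,
  "AB", "FG", "SA", "KD", "CD",
  "CD", "ZX", "AB", "ER", "ZX",
  "ED", "QW", "OI", "SA", "AB",
  "GD", "AS", "ZX", "QW", "KD"
)

group_membership <-
  data %>%
  # long format: one row per (group, value) cell
  pivot_longer(everything(), names_to = "group") %>%
  # count a value only once per group, even if it occurs there several times
  distinct(group, value) %>%
  group_by(value) %>%
  summarise(
    n_groups = n(),                                  # number of groups containing the value
    groups   = paste(sort(group), collapse = ", ")   # which groups those are
  ) %>%
  # keep values that occur in at least two groups
  filter(n_groups >= 2)

group_membership
```

Filtering on `n_groups == 2`, `n_groups == 3` and so on should then answer the "repeated in exactly k columns" part of the question.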
Here is an example of how to print the groups:
Data:
dat <- structure(list(Group1 = c("AB", "CD", "ED", "GD"), Group2 = c("FG",
"ZX", "QW", "AS"), Group3 = c("SA", "AB", "OI", "ZX"), Group4 = c("KD",
"ER", "SA", "QW"), Group5 = c("CD", "ZX", "AB", "KD")), class = "data.frame", row.names = c(NA,
-4L))
dat
Group1 Group2 Group3 Group4 Group5
1 AB FG SA KD CD
2 CD ZX AB ER ZX
3 ED QW OI SA AB
4 GD AS ZX QW KD
ta <- table(as.matrix(dat))
# all character strings
ta
AB AS CD ED ER FG GD KD OI QW SA ZX
3 1 2 1 1 1 1 2 1 2 2 3
# only repeated
ta[ta > 1]
AB CD KD QW SA ZX
3 2 2 2 2 3
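# for each repeated string, list the columns that contain it
# (exact match with %in%, so "AB" cannot accidentally match a longer string such as "ABC")
sapply( names(ta[ta > 1]),
function(x) names(dat)[vapply(dat, function(col) x %in% col, logical(1))],
simplify = FALSE )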
$AB
[1] "Group1" "Group3" "Group5"
$CD
[1] "Group1" "Group5"
$KD
[1] "Group4" "Group5"
$QW
[1] "Group2" "Group4"
$SA
[1] "Group3" "Group4"
$ZX
[1] "Group2" "Group3" "Group5"
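To get at the "repeated in two columns, three columns, or all five" part of the question, one option is a presence/absence matrix. This is only a sketch in base R, assuming the same example data; `strings`, `presence`, `n_groups` and `shared` are names invented here.

```r
dat <- data.frame(
  Group1 = c("AB", "CD", "ED", "GD"),
  Group2 = c("FG", "ZX", "QW", "AS"),
  Group3 = c("SA", "AB", "OI", "ZX"),
  Group4 = c("KD", "ER", "SA", "QW"),
  Group5 = c("CD", "ZX", "AB", "KD"),
  stringsAsFactors = FALSE
)

# all distinct strings in the data
strings <- sort(unique(unlist(dat, use.names = FALSE)))

# logical matrix: one row per string, one column per group,
# TRUE where the string occurs in that group
presence <- sapply(dat, function(col) strings %in% col)
rownames(presence) <- strings

# number of distinct groups each string occurs in
n_groups <- rowSums(presence)

# keep only the strings found in at least two groups
shared <- presence[n_groups >= 2, , drop = FALSE]
shared

# the shared strings, split by the number of groups they occur in
split(rownames(shared), n_groups[n_groups >= 2])
```

Note that this counts groups, not occurrences: a string that appears twice within a single column would still count as one group.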
# for each repeated string, show the full columns that contain it (exact match)
sapply( names(ta[ta > 1]),
function(x) dat[vapply(dat, function(col) x %in% col, logical(1))],
simplify = FALSE )
$AB
Group1 Group3 Group5
1 AB SA CD
2 CD AB ZX
3 ED OI AB
4 GD ZX KD
... etc
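The question also asks for a graph. One possible way to draw it with base graphics is a simple tile plot of strings against groups. This is a sketch rather than a finished figure: the colours and the names `long`, `tab`, `presence` and `shared` are arbitrary choices made here.

```r
dat <- data.frame(
  Group1 = c("AB", "CD", "ED", "GD"),
  Group2 = c("FG", "ZX", "QW", "AS"),
  Group3 = c("SA", "AB", "OI", "ZX"),
  Group4 = c("KD", "ER", "SA", "QW"),
  Group5 = c("CD", "ZX", "AB", "KD"),
  stringsAsFactors = FALSE
)

# long format: column "values" holds the strings, "ind" the group name
long <- stack(dat)

# cross-tabulate strings against groups, then reduce counts to TRUE/FALSE
tab <- table(long$values, long$ind)
presence <- unclass(tab) > 0

# keep strings that occur in at least two groups
shared <- presence[rowSums(presence) >= 2, , drop = FALSE]

# tile plot: groups on the x axis, shared strings on the y axis,
# a filled tile wherever the string occurs in the group
image(
  x = seq_len(ncol(shared)),
  y = seq_len(nrow(shared)),
  z = t(shared * 1),            # image() expects nrow(z) == length(x)
  col = c("white", "steelblue"),
  axes = FALSE, xlab = "Group", ylab = "String"
)
axis(1, at = seq_len(ncol(shared)), labels = colnames(shared))
axis(2, at = seq_len(nrow(shared)), labels = rownames(shared), las = 1)
box()
```

With about 600 strings per column the y axis would get crowded, so it may help to plot only strings above some threshold of `rowSums(presence)`.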