I have the following R list called codes
:
>codes
$`1`
[1] "000" "000" "111" "000" "100" "000" "100" "000" "100" "100"
$`2`
[1] "000" "001" "110" "000" "000" "000" "000" "000" "000" "000"
$`3`
[1] "000" "010" "100" "001" "001" "000" "000" "000" "001" "001"
$`4`
[1] "000" "100" "000" "011" "011" "000" "000" "000" "011" "011"
What I am trying to do is implement a way to check, across the entire list, if each position of the strings in that list element is variable.
For example, for the first vector [1], it will check only the first number/character for each string item and identify if they all start with 0 or 1 (so non variable) or least one is different. Then it will look at the second character in each string and do the same. And so on for the third position.
My intuition is to use some combination of lapply(codes, strsplit, split = "")
to get individual characters split apart along with length(unique(x))) != 1)
to get it to spit out a TRUE/FALSE if any of the positions are variable or not, but I'm not sure of an efficient way to implement this. Ultimately, the dataset will be much larger with codes up 10 characters in length and I want to ensure that every position for each three letter code in the vector is variable and apply that to each vector element in the list.
I'd appreciate any help or suggestions.
Here is one approach based on string splitting:
l <- list(rep.int(strrep("0", 6L), 8L),
rep.int(strrep(c("10", "01"), 3L), 4L),
rep.int(strrep(c("10", "100"), c(3L, 2L)), c(4L, 4L)))
l
## [[1]]
## [1] "000000" "000000" "000000" "000000"
## [5] "000000" "000000" "000000" "000000"
##
## [[2]]
## [1] "101010" "010101" "101010" "010101"
## [5] "101010" "010101" "101010" "010101"
##
## [[3]]
## [1] "101010" "101010" "101010" "101010"
## [5] "100100" "100100" "100100" "100100"
f <- function(l) {
m <- nchar(l[[1L]][1L])
n <- length(l)
f0 <- function(x) {
matrix(unlist(strsplit(x, ""), FALSE, FALSE), m)
}
X <- do.call(rbind, lapply(l, f0))
matrix(matrixStats::rowAnys(X != X[, 1L]), n, byrow = TRUE)
}
f(l)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE TRUE TRUE TRUE TRUE TRUE
## [3,] FALSE FALSE TRUE TRUE TRUE FALSE
If your codes can be read as decimal numbers less than or equal to .Machine$integer.max
, then you can optimize by replacing string splitting with integer arithmetic:
g <- function(l) {
m <- length(l)
n <- length(l[[1L]])
N <- nchar(l[[1L]][1L])
X <- matrix(as.integer(unlist(l, FALSE, FALSE)), m, n, byrow = TRUE)
g0 <- function(pow) {
Y <- X %/% pow
X <<- X - pow * Y
matrixStats::rowAnys(Y != Y[, 1L])
}
pow <- as.integer(10^((N - 1L):0))
matrix(unlist(lapply(pow, g0), FALSE, FALSE), m, N)
}
g(l)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE TRUE TRUE TRUE TRUE TRUE
## [3,] FALSE FALSE TRUE TRUE TRUE FALSE
If your codes are actually binary, then you can optimize slightly more and dispense with matrixStats
:
h <- function(l) {
m <- length(l[[1L]])
n <- length(l)
N <- nchar(l[[1L]][1L])
X <- matrix(as.integer(unlist(l, FALSE, FALSE)), m, n)
h0 <- function(p) {
Y <- X %/% p
X <<- X - p * Y
.colSums(Y, m, n) %% m > 0L
}
pow <- as.integer(10^((N - 1L):0))
matrix(unlist(lapply(pow, h0), FALSE, FALSE), n, N)
}
h(l)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE TRUE TRUE TRUE TRUE TRUE
## [3,] FALSE FALSE TRUE TRUE TRUE FALSE
Here is a benchmark on a length-10000 list of length-8 character vectors of 6-digit binary codes.
ll <- rep_len(l, 1e+04L)
microbenchmark::microbenchmark(f(ll), g(ll), h(ll))
## Unit: milliseconds
## expr min lq mean median uq max neval
## f(ll) 41.583143 55.960510 66.201555 64.211679 73.542807 127.47810 100
## g(ll) 8.612173 8.856123 9.725214 8.946077 9.116391 46.66698 100
## h(ll) 7.622679 7.824789 8.717184 7.887519 7.987128 46.32225 100
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.