I want to apply strsplit in such a way that if an identical pair of values exists both with & (e.g. (NINA & SAM)) and with | (e.g. (NINA | SAM)), only the one with & is kept.
Below are two possible cases; the length of these strings (vec1, vec2) may vary between actual cases.
Case 1
> vec1
[1] "((PAUL & SAM) | (PAUL | SAM) | (NINA & SAM) | (NINA | SAM) | (NINA & PAUL) | (NINA | PAUL))"
> vec2
[1] "((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA & SAM))"
Case 2
> vec1
[1] "((PAUL | SAM) | (PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
> vec2
[1] "((PAUL | SAM) & (PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"
This should be the outputs:
Case 1
> vec1
[1] "((PAUL & SAM) | (NINA & SAM) | (NINA & PAUL))"
> vec2
[1] "((PAUL & SAM) & (NINA & PAUL) & (NINA & SAM))"
Case 2
> vec1
[1] "((PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
> vec2
[1] "((PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"
What I have tried so far:
My idea was to first remove the (( and )) from the start and end of the vector, then split vec1 on ") | (" and vec2 on ") & (". Then further split the elements on space*space and check whether sub-elements 1 and 2 match those of any other element; if so, keep the one that has &. Then put everything back together. I have limited knowledge of R and was unable to implement what I have in mind. Any help will be much appreciated!
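The plan above can be sketched in base R roughly as follows. This is only an illustration under assumptions that are not stated in the question: the function name dropOrDuplicates is made up, every inner group is assumed to be a flat (A op B) or (A) with no deeper nesting, all top-level groups are assumed to be joined by the same operator, and the order of names inside a pair is ignored when comparing.

```r
# Hypothetical helper: drop "(A | B)" when "(A & B)" is also present.
dropOrDuplicates <- function(x) {
  # Innermost groups such as "(PAUL & SAM)"; assumes no deeper nesting.
  groups <- regmatches(x, gregexpr("\\([^()]+\\)", x))[[1]]

  # Outer operator: 3rd character of the first ") | (" or ") & (" found.
  # Assumes the same operator joins all top-level groups.
  m <- regmatches(x, regexpr("\\) [|&] \\(", x))
  op <- if (length(m) == 1) substr(m, 3, 3) else "|"

  # Order-insensitive key per group: "(PAUL | SAM)" -> "PAUL SAM".
  keyOf <- function(g) {
    parts <- strsplit(gsub("[()]", "", g), " [|&] ")[[1]]
    paste(sort(parts), collapse = " ")
  }
  keys <- vapply(groups, keyOf, character(1), USE.NAMES = FALSE)

  hasAnd <- grepl("&", groups, fixed = TRUE)
  hasOr  <- grepl("|", groups, fixed = TRUE)

  # Drop an "|" group only if an "&" group with the same key exists.
  drop <- hasOr & keys %in% keys[hasAnd]
  paste0("(", paste(groups[!drop], collapse = paste0(" ", op, " ")), ")")
}

vec2 <- "((PAUL | SAM) & (PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"
dropOrDuplicates(vec2)
# [1] "((PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"
```

Unlike a plain "keep everything with &" filter, this version leaves a | group in place when no matching & group exists, which is how the requirement is worded in the question.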
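I believe the following does what you want.
It is not very pretty, but it keeps the right groups. Note that it always rejoins them with " | ", so for vec2 the separator in the output is | rather than the & shown in the question.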
keepAmpersand <- function(x) {
  y <- sub("\\(\\(", "(", x)  # get rid of the double
  y <- sub("\\)\\)", ")", y)  # parentheses
  # this regex is meant to replace either a '|' or a '&'
  # with the same character between '#' (one '#' on each side)
  y <- gsub("(\\) \\| \\(|\\) & \\()", ")#\\1#(", y)
  # now use that special pattern, '# five chars #', to split
  y <- unlist(strsplit(y, "#.{5}#"))
  # keep the ones with the ampersand or with just one name
  y <- grep("&|\\([[:alpha:]]+\\)", y, value = TRUE)
  y <- paste0("(", paste(y, collapse = " | "), ")")  # reassemble
  y
}
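To make the '#' marker trick easier to follow, here is a step-by-step trace of the same calls on a shortened input. The input string is made up for illustration; the functions and patterns are the ones used in keepAmpersand.

```r
x <- "((PAUL & SAM) | (PAUL | SAM))"

# 1. Collapse the outer double parentheses.
y <- sub("\\(\\(", "(", x)
y <- sub("\\)\\)", ")", y)
y
# [1] "(PAUL & SAM) | (PAUL | SAM)"

# 2. Wrap each group separator, ") | (" or ") & (", in '#' markers.
y <- gsub("(\\) \\| \\(|\\) & \\()", ")#\\1#(", y)
y
# [1] "(PAUL & SAM)#) | (#(PAUL | SAM)"

# 3. Split on '#', any five characters, '#'.
parts <- unlist(strsplit(y, "#.{5}#"))
parts
# [1] "(PAUL & SAM)" "(PAUL | SAM)"

# 4. Keep groups that contain '&' or consist of a single name.
kept <- grep("&|\\([[:alpha:]]+\\)", parts, value = TRUE)
kept
# [1] "(PAUL & SAM)"
```

The markers are needed because the operator inside a group, such as the | in (PAUL | SAM), must not be treated as a split point; only the five-character separator between a closing and an opening parenthesis is.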
Now apply the function to each of the cases.
Case 1
vec1 <-
"((PAUL & SAM) | (PAUL | SAM) | (NINA & SAM) | (NINA | SAM) | (NINA & PAUL) | (NINA | PAUL))"
vec2 <-
"((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA & SAM))"
keepAmpersand(vec1)
#[1] "((PAUL & SAM) | (NINA & SAM) | (NINA & PAUL))"
keepAmpersand(vec2)
#[1] "((PAUL & SAM) | (NINA & PAUL) | (NINA & SAM))"
Case 2
vec1 <-
"((PAUL | SAM) | (PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
vec2 <-
"((PAUL | SAM) & (PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"
keepAmpersand(vec1)
#[1] "((PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
keepAmpersand(vec2)
#[1] "((PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
Case 3: there is just one name between a pair of parentheses.
vec3 <-
"((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA))"
keepAmpersand(vec3)
#[1] "((PAUL & SAM) | (NINA & PAUL) | (NINA))"
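Because keepAmpersand always rejoins with " | ", the vec2 and vec3 results above use | where the inputs use & between groups. One possible tweak, given here as a sketch and not as the answer's own code, detects the operator between the top-level groups and reuses it when reassembling. The name keepAmpersandSep is hypothetical, and the sketch assumes that all top-level groups are joined by the same operator and that groups are not nested any deeper.

```r
# Hypothetical variant of keepAmpersand that preserves the outer operator.
keepAmpersandSep <- function(x) {
  # Operator between top-level groups: 3rd character of ") | (" or ") & (".
  m <- regmatches(x, regexpr("\\) [|&] \\(", x))
  op <- if (length(m) == 1) substr(m, 3, 3) else "|"

  # Pull out the inner groups, e.g. "(PAUL & SAM)" or "(NINA)".
  groups <- regmatches(x, gregexpr("\\([^()]+\\)", x))[[1]]

  # Same filter as keepAmpersand: groups with '&' or with a single name.
  kept <- grep("&|\\([[:alpha:]]+\\)", groups, value = TRUE)
  paste0("(", paste(kept, collapse = paste0(" ", op, " ")), ")")
}

vec3 <-
"((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA))"
keepAmpersandSep(vec3)
# [1] "((PAUL & SAM) & (NINA & PAUL) & (NINA))"
```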