R strsplit and filtering based on sub-indices matching in the logical vectors

I want to apply strsplit so that if the same pair of values appears both with & (e.g. (NINA & SAM)) and with | (e.g. (NINA | SAM)), only the pair with & is kept.

Below are two possible cases; the number of pairs in these vectors (vec1, vec2) may vary in actual use.

Case 1

> vec1
[1] "((PAUL & SAM) | (PAUL | SAM) | (NINA & SAM) | (NINA | SAM) | (NINA & PAUL) | (NINA | PAUL))"
> vec2
[1] "((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA & SAM))"

Case 2

> vec1
[1] "((PAUL | SAM) | (PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
> vec2
[1] "((PAUL | SAM) & (PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"

These should be the outputs:

Case 1

> vec1
[1] "((PAUL & SAM) | (NINA & SAM) | (NINA & PAUL))"
> vec2
[1] "((PAUL & SAM) & (NINA & PAUL) & (NINA & SAM))"

Case 2

> vec1
[1] "((PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
> vec2
[1] "((PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"

What I have tried so far:

My idea was to first remove the (( and )) from the start and end of the string, then split vec1 on ") | (" and vec2 on ") & (". Next, split each piece on the operator and its surrounding spaces and check whether its two names match those of any other piece; if they do, keep the piece that has &. Finally, put everything back together. I have limited knowledge of R and was unable to implement this. Any help will be much appreciated!

I believe the following does what you want.
It is not very pretty, but it keeps the right groups. Note that it always rejoins them with " | ", so for vec2 the outer & separators come out as |.

keepAmpersand <- function(x) {
    # reduce the leading "((" and the trailing "))" to single parentheses
    y <- sub("\\(\\(", "(", x)
    y <- sub("\\)\\)", ")", y)

    # wrap every ") | (" or ") & (" separator between groups
    # in '#' markers (one '#' on each side)
    y <- gsub("(\\) \\| \\(|\\) & \\()", ")#\\1#(", y)

    # now split on that marker pattern: '#', five characters, '#'
    y <- unlist(strsplit(y, "#.{5}#"))

    # keep the groups with an ampersand or with just one name
    y <- grep("&|\\([[:alpha:]]+\\)", y, value = TRUE)

    # reassemble; the groups are always joined with " | "
    y <- paste0("(", paste(y, collapse = " | "), ")")
    y
}
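
To make the marker trick easier to follow, here are the intermediate values for a shortened input. The names `x`, `marked` and `parts` are only for this illustration and are not used by the function itself.

```r
x <- "((PAUL & SAM) | (PAUL | SAM))"

# Step 1: reduce the outer "((" and "))" to single parentheses.
y <- sub("\\)\\)", ")", sub("\\(\\(", "(", x))
y
# "(PAUL & SAM) | (PAUL | SAM)"

# Step 2: wrap the separator between groups in '#' markers.
# The '|' inside "(PAUL | SAM)" is untouched because it is not
# surrounded by ") " and " (".
marked <- gsub("(\\) \\| \\(|\\) & \\()", ")#\\1#(", y)
marked
# "(PAUL & SAM)#) | (#(PAUL | SAM)"

# Step 3: split on '#', any five characters, '#'.
parts <- unlist(strsplit(marked, "#.{5}#"))
parts
# "(PAUL & SAM)" "(PAUL | SAM)"
```

The split pattern relies on the separator being exactly five characters long, i.e. ") | (" or ") & (" with single spaces.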

Now apply the function to each of the cases.

Case 1

vec1 <-
"((PAUL & SAM) | (PAUL | SAM) | (NINA & SAM) | (NINA | SAM) | (NINA & PAUL) | (NINA | PAUL))"
vec2 <-
"((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA & SAM))"


keepAmpersand(vec1)
#[1] "((PAUL & SAM) | (NINA & SAM) | (NINA & PAUL))"

keepAmpersand(vec2)
#[1] "((PAUL & SAM) | (NINA & PAUL) | (NINA & SAM))"

Case 2

vec1 <-
"((PAUL | SAM) | (PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
vec2 <- 
"((PAUL | SAM) & (PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"


keepAmpersand(vec1)
#[1] "((PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"

keepAmpersand(vec2)
#[1] "((PAUL & SAM) | (!NINA & SAM) | (!NINA & PAUL))"
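
keepAmpersand() pastes the groups back together with " | " regardless of the input, which is why the vec2 results show | where the requested output has &. If the outer operator should be preserved, a minimal sketch could look like the following. The name `keepAmpersandOp` is made up for this example, and the sketch assumes that the input uses a single outer operator throughout and that a lone name may carry a leading `!`.

```r
# Hypothetical variant: same filtering idea, but the groups are rejoined
# with the separator found in the input.
keepAmpersandOp <- function(x) {
    # innermost groups such as "(PAUL & SAM)"
    parts <- regmatches(x, gregexpr("\\([^()]*\\)", x))[[1]]

    # outer operator: the character between ") " and " (" at the first
    # separator; fall back to "|" when there is only one group
    sep <- regmatches(x, regexpr("\\) [|&] \\(", x))
    op <- if (length(sep) == 1) substr(sep, 3, 3) else "|"

    # keep groups containing '&' or consisting of a single name
    keep <- grep("&|^\\(!?[[:alpha:]]+\\)$", parts, value = TRUE)
    paste0("(", paste(keep, collapse = paste0(" ", op, " ")), ")")
}

keepAmpersandOp("((PAUL | SAM) & (PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))")
# "((PAUL & SAM) & (!NINA & SAM) & (!NINA & PAUL))"
```

Extracting the groups with gregexpr() avoids the '#' markers altogether, at the cost of assuming the groups are never nested more than one level deep.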

Case 3: just one name between parentheses.

vec3 <-
"((PAUL | SAM) & (PAUL & SAM) & (NINA | SAM) & (NINA | PAUL) & (NINA & PAUL) & (NINA))"

keepAmpersand(vec3)
#[1] "((PAUL & SAM) | (NINA & PAUL) | (NINA))"
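
One more caveat: keepAmpersand() drops every | group, whether or not a matching & group exists. The question asks to drop a | group only when the same pair also appears with &. A sketch of that stricter rule follows; `keepAmpersandStrict` is a made-up name, and the sketch assumes that a negated name such as !NINA counts as different from NINA and that the order of names within a group does not matter.

```r
# Hypothetical stricter variant: a '|' group is removed only if an '&'
# group with the same names is also present.
keepAmpersandStrict <- function(x) {
    # innermost groups such as "(PAUL & SAM)"
    parts <- regmatches(x, gregexpr("\\([^()]*\\)", x))[[1]]

    # outer operator, taken from the first separator between groups
    sep <- regmatches(x, regexpr("\\) [|&] \\(", x))
    op <- if (length(sep) == 1) substr(sep, 3, 3) else "|"

    # key for each group: its names, sorted, ignoring the inner operator
    key <- vapply(parts, function(p) {
        nms <- strsplit(gsub("[()]", "", p), " [|&] ")[[1]]
        paste(sort(nms), collapse = ",")
    }, character(1), USE.NAMES = FALSE)

    has_and <- grepl("&", parts, fixed = TRUE)

    # drop a '|' group only when an '&' group shares its key
    drop <- !has_and & key %in% key[has_and]
    paste0("(", paste(parts[!drop], collapse = paste0(" ", op, " ")), ")")
}

keepAmpersandStrict("((PAUL & SAM) | (PAUL | SAM) | (NINA | SAM))")
# "((PAUL & SAM) | (NINA | SAM))"
```

Here (NINA | SAM) survives because no (NINA & SAM) group is present, whereas keepAmpersand() would have removed it.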
