
Subsetting a string based on multiple conditions

I have a vector where each element is a string. From each string I only want to keep the part immediately before '==', whether that part is at the beginning of the string, after an & symbol, or after a | symbol. Here is my data:

data <- c("name=='John'",
          "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'",
          "job=='engineer'&name=='Andrew'",
          "city=='Manchester'",
          "age=='40'&city=='London'")

My ideal format would be something like this:

[1] "name"
[2] "name" "age" "job" "city"
[3] "job" "name"
[4] "city" 
[5] "age" "city"

The closest I have got is genXtract() from the qdap package, which returns the data in the format above, but I only know how to use it with one condition, i.e.

qdap::genXtract(data, "&", "==")

But I don't only want the part of the string between & and ==; I also want the part between | and ==, and the part between the beginning of the string and ==.
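The three positions listed above (start of string, after &, after |) can be written as a single regex alternation, (?:^|[&|]). The sketch below is a different technique from qdap::genXtract(): it uses base R regmatches()/gregexpr() and assumes PCRE mode (perl = TRUE), where \K discards the delimiter from the reported match and (?===) is a lookahead for "==".

```r
data <- c("name=='John'",
          "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'",
          "job=='engineer'&name=='Andrew'",
          "city=='Manchester'",
          "age=='40'&city=='London'")

# (?:^|[&|])  start of string, or a literal & or | ('|' is literal inside [ ])
# \K          drop what was matched so far, so the delimiter is not returned
# \w+         the field name
# (?===)      lookahead: must be followed by '==', which is not consumed
keys <- regmatches(data, gregexpr("(?:^|[&|])\\K\\w+(?===)", data, perl = TRUE))
keys
```

Because each name must be anchored at one of the three positions, a word that happens to precede == inside a quoted value would not be picked up unless it also follows &, | or the start of the string.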

This regex captures every run of letters and digits ([0-9a-zA-Z]+) that is immediately followed by ==. The (?=(==)) part is a lookahead, so the == itself is not included in the match.

stringr::str_extract_all(data, "[0-9a-zA-Z]+(?=(==))")

[[1]]
[1] "name"
[[2]]
[1] "name" "age"  "job"  "city"
[[3]]
[1] "job"  "name"
[[4]]
[1] "city"
[[5]]
[1] "age"  "city"
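One detail of the character class [0-9a-zA-Z] is that it excludes the underscore, so a field name such as first_name would be cut down to name. The sketch below illustrates this with a hypothetical input (not from the question), using base R regmatches()/gregexpr() so it runs without stringr; \w also matches '_' and keeps the full name.

```r
# Hypothetical input with an underscore in a field name
x <- "first_name=='Jo'&age=='50'"

# [0-9a-zA-Z] excludes '_', so only the part after the underscore survives
alnum_only <- regmatches(x, gregexpr("[0-9a-zA-Z]+(?===)", x, perl = TRUE))[[1]]

# \w includes '_', so the whole field name is kept
with_underscore <- regmatches(x, gregexpr("\\w+(?===)", x, perl = TRUE))[[1]]

alnum_only       # "name" "age"
with_underscore  # "first_name" "age"
```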

If you want one space-separated string per element instead of a list, use

L <- stringr::str_extract_all(data, "[0-9a-zA-Z]+(?=(==))")
unlist(lapply(L, paste, collapse = " "))

which returns

[1] "name"             
[2] "name age job city"
[3] "job name"         
[4] "city"             
[5] "age city"  
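As a variation on unlist(lapply(...)), vapply() with FUN.VALUE = character(1) returns a character vector directly and stops with an error if any element does not collapse to exactly one string. The sketch below builds the list with base R regmatches()/gregexpr() as a stand-in for str_extract_all(), so it runs without stringr.

```r
data <- c("name=='John'",
          "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'",
          "job=='engineer'&name=='Andrew'",
          "city=='Manchester'",
          "age=='40'&city=='London'")

# Base R stand-in for stringr::str_extract_all(data, "[0-9a-zA-Z]+(?=(==))")
L <- regmatches(data, gregexpr("[0-9a-zA-Z]+(?===)", data, perl = TRUE))

# collapse = " " is forwarded to paste(); each result must be one string
flat <- vapply(L, paste, character(1), collapse = " ")
flat
```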

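In base R, this can be done with regmatches() and gregexpr(). perl = TRUE is needed because the default regex engine does not support the (?=...) lookahead; note that \w matches letters, digits and the underscore.

lst1 <- regmatches(data, gregexpr("\\w+(?=\\={2})", data, perl = TRUE))
sapply(lst1, paste, collapse = " ")
#[1] "name"
#[2] "name age job city"
#[3] "job name"
#[4] "city"
#[5] "age city"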
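A different approach that avoids lookarounds entirely is to split each string at every & or | and then strip == and everything after it from each piece. This minimal base-R sketch assumes that the quoted values themselves never contain &, | or ==; if they can, one of the regex approaches above is safer.

```r
data <- c("name=='John'",
          "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'",
          "job=='engineer'&name=='Andrew'",
          "city=='Manchester'",
          "age=='40'&city=='London'")

# Split every string at each '&' or '|' ('|' is literal inside [ ])
pieces <- strsplit(data, "[&|]")

# In each piece, remove '==' and everything after it, leaving the field name
keys <- lapply(pieces, function(p) sub("==.*", "", p))
keys
```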
