I have a vector where each element is a string. I only want to keep the part of each string right before the '==', regardless of whether it is at the beginning of the string, after the & symbol, or after the | symbol. Here is my data:
data <- c("name=='John'", "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'",
"job=='engineer'&name=='Andrew'",
"city=='Manchester'", "age=='40'&city=='London'"
)
My ideal format would be something like this:
[1] "name"
[2] "name" "age" "job" "city"
[3] "job" "name"
[4] "city"
[5] "age" "city"
The closest I have got is using genXtract from the qdap library, which puts the data in the format above, but I only know how to use it with one condition, i.e.
qdap::genXtract(data, "&", "==")
However, I don't just want the part of the string between & and ==, but also the part between | and ==, and the part between the beginning of the string and ==.
This regex captures every run of letters and digits (a-zA-Z0-9) that is immediately followed by ==. The (?=(==)) part is a lookahead, so the == itself is not included in the match.
stringr::str_extract_all( data, "[0-9a-zA-Z]+(?=(==))")
[[1]]
[1] "name"
[[2]]
[1] "name" "age" "job" "city"
[[3]]
[1] "job" "name"
[[4]]
[1] "city"
[[5]]
[1] "age" "city"
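One thing to keep in mind with this pattern is that the character class only covers letters and digits, so a key containing an underscore or a dot would be cut short. The sketch below uses a made-up input (the key first_name is hypothetical and not part of the question's data) to show the difference, assuming such keys could occur:

```r
# Hypothetical input: a key with an underscore (not in the original data)
x <- "first_name=='John'&age=='50'"

# Original class: letters and digits only, so the match starts after the "_"
narrow <- stringr::str_extract_all(x, "[0-9a-zA-Z]+(?=(==))")

# Widened class: also allow "_" and "." inside a key
wide <- stringr::str_extract_all(x, "[A-Za-z0-9_.]+(?===)")

narrow[[1]]  # "name" "age"
wide[[1]]    # "first_name" "age"
```

If the keys are always plain alphanumeric names, as in the question, the original pattern is sufficient.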
If you want the output as a character vector with one string per element, use
L <- stringr::str_extract_all( data, "[0-9a-zA-Z]+(?=(==))" )
unlist( lapply( L, paste, collapse = " " ) )
results in
[1] "name"
[2] "name age job city"
[3] "job name"
[4] "city"
[5] "age city"
In base R, this can be done with regmatches/gregexpr:
lst1 <- regmatches(data, gregexpr("\\w+(?=\\={2})", data, perl = TRUE))
sapply(lst1, paste, collapse = " ")
#[1] "name"
#[2] "name age job city"
#[3] "job name"
#[4] "city"
#[5] "age city"
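As an alternative that follows the wording of the question more literally (start of string, after &, after |), here is a minimal base R sketch that splits each string on & or | and then drops everything from == onward. It assumes the quoted values never contain & or | themselves; if they can, the lookahead approach above is the safer choice.

```r
data <- c("name=='John'", "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'",
          "job=='engineer'&name=='Andrew'",
          "city=='Manchester'", "age=='40'&city=='London'")

# Split each element into its individual conditions at "&" or "|"
parts <- strsplit(data, "[&|]")

# Within each condition, remove "==" and everything after it, keeping the key
keys <- lapply(parts, function(p) sub("==.*$", "", p))

keys[[2]]
# "name" "age" "job" "city"

# Collapse to one string per element, as in the answers above
sapply(keys, paste, collapse = " ")
```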