
Parse comma-delimited string into vectors based on leading character

Given a string:

vals <- "-AB, CV, CL, -TS"

I would like to efficiently parse vals into two vectors (let's call them negative and positive): one containing the values prefixed by -, and the other containing the rest. One catch is that I would also like to remove the - indicator.

Desired result:

> negative
[1] "AB" "TS"
> positive
[1] "CV" "CL"

Bonus points for a compact answer.

You can try:

s <- trimws(strsplit(vals, ",")[[1]])
negative <- sub("^-", "", s[grepl("^-", s)])  # drop the leading "-"
positive <- s[!grepl("^-", s)]
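Checking the intermediate step shows why trimws() is needed. Splitting on "," alone leaves the space that follows each comma, so " -TS" does not match "^-" and would end up in the wrong vector. Splitting on ",\\s*" is an alternative that absorbs the whitespace in the split itself:

```r
vals <- "-AB, CV, CL, -TS"

# Splitting on "," alone keeps the space that follows each comma
parts <- strsplit(vals, ",")[[1]]
print(parts)
# [1] "-AB"  " CV"  " CL"  " -TS"

# " -TS" does not start with "-", so it would be misclassified
print(grepl("^-", parts))
# [1]  TRUE FALSE FALSE FALSE

# After trimws() both negative values are detected
print(grepl("^-", trimws(parts)))
# [1]  TRUE FALSE FALSE  TRUE

# Alternatively, let the split pattern consume the whitespace
print(strsplit(vals, ",\\s*")[[1]])
# [1] "-AB" "CV"  "CL"  "-TS"
```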

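Alternatively, you can use a pure regex approach. The lookbehind (?<=-) requires a preceding - without including it in the match, so the dash is dropped from the negative values:

library(stringr)
negative <- as.vector(str_match_all(vals, "(?<=-)\\w+")[[1]])
positive <- as.vector(str_match_all(vals, "(?<!-)(?<=^|,| )\\w+")[[1]])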
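If you would rather avoid the stringr dependency, the same lookbehind idea should work in base R via regmatches() and gregexpr() with perl = TRUE. This sketch swaps in a slightly simpler positive pattern, (?<![-\w])\w+, which matches a word only when it is not preceded by - or by another word character (i.e. at the start of a token). It assumes every token consists solely of word characters ([A-Za-z0-9_]):

```r
vals <- "-AB, CV, CL, -TS"

# Words preceded by "-": the lookbehind checks for the dash without consuming it
negative <- regmatches(vals, gregexpr("(?<=-)\\w+", vals, perl = TRUE))[[1]]

# Words not preceded by "-" or another word character, i.e. unprefixed tokens
positive <- regmatches(vals, gregexpr("(?<![-\\w])\\w+", vals, perl = TRUE))[[1]]

print(negative)  # [1] "AB" "TS"
print(positive)  # [1] "CV" "CL"
```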

Try:

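# Split on commas and strip surrounding whitespace
v <- trimws(strsplit(vals, ",")[[1]])

# Elements without a leading "-" are positive
positive <- v[!startsWith(v, '-')]
# Elements with a leading "-" are negative; substring(x, 2) drops the first character
negative <- substring(v[startsWith(v, '-')], 2)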

Which outputs:

> negative
[1] "AB" "TS"
> positive
[1] "CV" "CL"
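For the "compact" bonus, one possible variation on the startsWith() approach groups the tokens with base R's split() and then copies both groups into variables with list2env(). This is a sketch, not the only way to do it. One caveat: if the input has no token in one of the groups, split() produces no element for it, so the corresponding variable is simply not created.

```r
vals <- "-AB, CV, CL, -TS"

# Split on a comma plus any following whitespace, so trimws() is not needed
v <- strsplit(vals, ",\\s*")[[1]]

# Drop a leading "-" and group each token by whether it had one;
# split() keeps the original order within each group
res <- split(sub("^-", "", v), ifelse(startsWith(v, "-"), "negative", "positive"))

# Create `negative` and `positive` in the current environment
invisible(list2env(res, envir = environment()))

print(negative)  # [1] "AB" "TS"
print(positive)  # [1] "CV" "CL"
```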

You may try grep() with the value = TRUE option. Since your data has leading spaces after splitting on ",", you can remove them with trimws(). The zeallot library is used here only to assign both vectors in one step, and sub() strips the leading - from the negative values.

library(zeallot)
s <- trimws(strsplit(vals, ",")[[1]])
c(negative, positive) %<-% list(sub("^-", "", grep("^-", s, value = TRUE)), grep("^[^-]", s, value = TRUE))

Output:

#> negative
#[1] "AB" "TS"
#> positive
#[1] "CV" "CL"
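The positive selection could also be written with grep()'s invert = TRUE argument, which returns the elements that do not match "^-". The two forms differ only on empty tokens: "^[^-]" needs at least one character, so it silently drops an empty string, while invert = TRUE keeps it. The input below, with an empty field between two commas, is a made-up example chosen to show that difference:

```r
# Hypothetical input with an empty field
s <- trimws(strsplit("-AB, CV, , -TS", ",")[[1]])
print(s)
# [1] "-AB" "CV"  ""    "-TS"

# "^[^-]" requires a first character that is not "-", so "" is dropped
print(grep("^[^-]", s, value = TRUE))
# [1] "CV"

# invert = TRUE returns everything that does not start with "-", including ""
print(grep("^-", s, value = TRUE, invert = TRUE))
```

Which behavior you want depends on whether empty fields should be kept or discarded.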
