
Splitting text with strsplit regular expression

I want to extract "b", "d" and "f" from "abcdef". I am doing this:

strsplit("abcdef", "[ace]")

but it returns an extra "", like this:

"" "b" "d" "f"

What should I change in this expression? Please also explain how your solution works. I have tried str_extract, and it worked, but I want to know why this isn't working with strsplit.


When you split a string, the result contains all the parts of the string that lie between matches. That includes an empty string when a match is at the very start of the string or when two matches are adjacent; R's strsplit does not add an empty string after a match at the very end. Here is your string, with - marking each position where an empty string could occur:

-a-b-c-d-e-f-
1| 2 | 3 | 4

Since the last match (e) comes before the f, the result ends with "f". If you add f to the character class, the result ends with an empty element instead, namely the empty string between the adjacent matches e and f:

strsplit("abcdef", "[acef]")
## => [1] ""  "b" "d" "" 

Moreover, if several matches are adjacent, you get an empty element for each of them:

strsplit("abcdef", "[abc]")
## => [1] ""    ""    ""    "def"

So, whenever you split a string with a regex, expect empty strings wherever a match is at the start of the string or two matches are adjacent.
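The three cases can be checked one at a time. This is a minimal sketch in base R; the input strings and variable names are made up for illustration and are not from the question.

```r
# Match at the very start: a leading "" is produced.
leading <- strsplit("abc", "a")[[1]]

# Match at the very end: strsplit does not add a trailing "".
trailing <- strsplit("abc", "c")[[1]]

# Two adjacent matches: a "" is produced between them.
adjacent <- strsplit("abcd", "[bc]")[[1]]

print(leading)   # "" "bc"
print(trailing)  # "ab"
print(adjacent)  # "a" "" "d"
```

The second case is why the original call ends cleanly with "f" only by accident of the input: a match at the end is simply dropped, while a match at the start is not.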

You may actually match your strings with an "inverted" pattern:

x <- "abcdef"
# regmatches returns a list with one element per input string; [[1]] takes the vector for x
regmatches(x, gregexpr("[^ace]+", x))[[1]]
## => [1] "b" "d" "f"

See the R demo. Alternatively, you may remove the empty items after splitting (see Rui Barradas's answer).

A non-regex solution would be to split the string into individual characters and keep those that are not "a", "c" or "e" using setdiff.

setdiff(strsplit("abcdef", "")[[1]], c("a", "c", "e"))
#[1] "b" "d" "f"
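One caveat worth checking: setdiff treats its arguments as sets, so repeated characters are returned only once. The sketch below uses a made-up input, "abcabd", to show the difference, together with a %in% filter as one possible alternative that keeps duplicates; the variable names are illustrative.

```r
chars <- strsplit("abcabd", "")[[1]]
drop  <- c("a", "c", "e")

# setdiff de-duplicates its result.
as_set <- setdiff(chars, drop)

# Logical indexing with %in% keeps every occurrence, in order.
kept <- chars[!chars %in% drop]

print(as_set)  # "b" "d"
print(kept)    # "b" "b" "d"
```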

Another possibility is to remove the empty strings after the split.
Assign the result of strsplit to a variable, then subset it with a logical vector.

res <- strsplit("abcdef", "[ace]")[[1]]
res[sapply(res, `!=`, "")]
#[1] "b" "d" "f"

Or even simpler, thanks to @snoram,

res[sapply(res, nzchar)]
#[1] "b" "d" "f"
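Since nzchar is already vectorized, the sapply call is not strictly needed. The following sketch shows the shorter form and, as an assumed extension not present in the answers above, how the same filter might be applied to several strings at once; the inputs vector is hypothetical.

```r
res <- strsplit("abcdef", "[ace]")[[1]]

# nzchar works on the whole character vector at once.
cleaned <- res[nzchar(res)]

# strsplit returns one list element per input string,
# so filter each element separately with lapply.
inputs <- c("abcdef", "ace", "xaybc")
parts  <- lapply(strsplit(inputs, "[ace]"), function(p) p[nzchar(p)])

print(cleaned)     # "b" "d" "f"
print(parts[[3]])  # "x" "yb"
```

A string made up entirely of matches, such as "ace" here, ends up as an empty character vector after filtering.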
