I'm trying to extract the nth word from strings, and I found several links suggesting a regex that doesn't seem to work in R.
library(stringr)
myString <- "HANS CHRISTIAN ANDERSON III"
str_extract(myString,'(?:\\S+ ){1}(\\S+)')
# [1] "HANS CHRISTIAN"
str_extract(myString,'(?:\\S+ ){2}(\\S+)')
# [1] "HANS CHRISTIAN ANDERSON"
As you can see, my commands return the text matched by both the non-capturing and the capturing group. How do I get only the nth word?
The regex is right. The problem is that str_extract() returns the whole match rather than the value of capture group 1. Use str_match() or str_match_all() instead and take the column that holds group 1:
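library(stringr)
r <- "(?:\\S+ ){1}(\\S+)"
s <- "HANS CHRISTIAN ANDERSON III"
str_match_all(s, r)
# [[1]]
#      [,1]             [,2]
# [1,] "HANS CHRISTIAN" "CHRISTIAN"
# column 1 is the whole match, column 2 is capture group 1
str_match(s, r)[, 2]
# [1] "CHRISTIAN"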
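If you'd rather avoid the stringr dependency, the same capture-group idea can be sketched in base R with regexec() and regmatches(). The helper name nth_word_regexec and the leading ^ anchor are my own additions, not part of the answer above: the anchor keeps the match starting at the first word, and n is turned into the repetition count.

```r
# Hypothetical helper (not from the answer above): return the nth
# space-delimited word of each string via capture group 1.
nth_word_regexec <- function(x, n) {
  # Skip (n - 1) "word + space" units from the start, then capture one word.
  pattern <- sprintf("^(?:\\S+ ){%d}(\\S+)", n - 1)
  # perl = TRUE selects PCRE, which supports the non-capturing group (?:...)
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element is c(whole_match, group_1), or character(0) when nothing matched
  vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_,
         character(1))
}

myString <- "HANS CHRISTIAN ANDERSON III"
nth_word_regexec(myString, 2)
# [1] "CHRISTIAN"
nth_word_regexec(myString, 5)
# [1] NA
```

Asking for more words than the string has gives NA instead of a partial match.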
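Inside a character class, a leading "^" negates it, so [^ ]+ matches a run of non-space characters. This pattern captures the first word plus the following space in group 1, the second word in group 2, and the rest of the string in group 3; replacing the whole string with "\\2" leaves only the second word.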
# second space-delimited word
gsub( '^([^ ]+[ ])([^ ]+)([ ]+.+$)', "\\2", myString)
# [1] "CHRISTIAN"
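One caveat with the pattern above: its third group requires at least one more word after the target, so for a two-word string such as "HANS CHRISTIAN" nothing matches and gsub() hands back the input unchanged. Here is a sketch of a variant that takes n as a parameter and makes the tail optional (the helper name nth_word_sub is hypothetical):

```r
# Hypothetical helper: nth space-delimited word via sub() and a backreference.
nth_word_sub <- function(x, n) {
  # [^ ]+ is a run of non-space characters; skip n - 1 of them (each followed
  # by a space), capture the next one, and let .* swallow whatever follows,
  # possibly nothing.
  pattern <- sprintf("^(?:[^ ]+ ){%d}([^ ]+).*$", n - 1)
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() returns non-matching strings unchanged, so mark those as NA
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

nth_word_sub("HANS CHRISTIAN", 2)
# [1] "CHRISTIAN"
```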
Another strategy, arguably less failure-prone:
# easy to use a numeric index to pick from what scan() reads:
scan(text=myString, what="")[2]
# Read 4 items
# [1] "CHRISTIAN"
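Two scan() arguments are worth knowing here: quiet = TRUE suppresses the "Read 4 items" message, and quote = "" turns off quote handling so characters like ' and " are kept as ordinary text. A small wrapper for a single string (the name nth_word_scan is hypothetical):

```r
# Hypothetical wrapper around scan() for one string.
nth_word_scan <- function(x, n) {
  # quiet = TRUE: no "Read N items" message
  # quote = "": treat quote characters as ordinary text
  words <- scan(text = x, what = "", quiet = TRUE, quote = "")
  # Indexing past the end of a character vector gives NA_character_
  words[n]
}

nth_word_scan("HANS CHRISTIAN ANDERSON III", 2)
# [1] "CHRISTIAN"
nth_word_scan("PATRICK O'BRIAN", 2)
# [1] "O'BRIAN"
```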
I'm partial to strsplit:
strsplit(myString, ' ')[[1]][2]
# [1] "CHRISTIAN"
paste(strsplit(myString, ' ')[[1]][1:2], collapse = ' ')
# [1] "HANS CHRISTIAN"
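strsplit() returns a list with one element per input string, so the [[1]] above only looks at the first string. Here is a sketch of a vectorised version (the helper name nth_word_split is hypothetical) that also trims the ends and splits on runs of whitespace, so double spaces don't produce empty "words":

```r
# Hypothetical helper: nth word of every string in x.
nth_word_split <- function(x, n) {
  # trimws() avoids an empty first element when a string starts with a space;
  # "\\s+" treats any run of whitespace as a single separator
  parts <- strsplit(trimws(x), "\\s+")
  # w[n] is NA_character_ when a string has fewer than n words
  vapply(parts, function(w) w[n], character(1))
}

names_vec <- c("HANS CHRISTIAN ANDERSON III", "JANE  DOE")
nth_word_split(names_vec, 2)
# [1] "CHRISTIAN" "DOE"
nth_word_split(names_vec, 3)
# [1] "ANDERSON" NA
```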