
Find whole word that starts at character position in R

Str <- "I love chocolate pudding"
pos <- 8

I need to return the word that starts with the letter c at pos 8, which is chocolate. How can I do that?

You can use substring to get everything from the 8th character onward, then remove the first space and everything after it using gsub:

gsub(" .*", "", substring(Str, pos))
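To see what each step contributes, the two calls can be run separately. This is only a sketch; `rest` is an illustrative name that does not appear in the answer above.

```r
Str <- "I love chocolate pudding"
pos <- 8

# Step 1: keep everything from character 8 onward
rest <- substring(Str, pos)
rest
# [1] "chocolate pudding"

# Step 2: drop the first space and everything after it
gsub(" .*", "", rest)
# [1] "chocolate"
```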

In case you also need to check that the character at pos is "c":

Str <- "I love dogs"

ifelse(
  substr(Str, pos, pos) == "c",
  gsub(" .*", "", substring(Str, pos)),
  ""
)
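Since ifelse, substr and substring are all vectorised, the same check can be wrapped in a small function and applied to several strings at once. A minimal sketch; `word_at_pos` and its `first` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: return the word starting at `pos` if it begins
# with `first`, otherwise an empty string.
word_at_pos <- function(x, pos, first = "c") {
  ifelse(
    substr(x, pos, pos) == first,
    gsub(" .*", "", substring(x, pos)),
    ""
  )
}

# "chocolate" for the first string, "" for the second
word_at_pos(c("I love chocolate pudding", "I love dogs"), 8)
```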

library(stringr)
Str <- "I love chocolate pudding"
str_extract(Str, "(?<=[\\w\\s]{7})\\bc\\w+\\b")
[1] "chocolate"

This solution uses str_extract with a positive lookbehind, (?<=[\\w\\s]{7}) , which can be glossed along these lines: "if there are seven characters to the left, each of them either a word character ( \\w : a letter, digit or underscore) or whitespace ( \\s ), match the word that immediately follows, delimited by word boundaries ( \\b ) on either side and beginning with the letter c".
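One caveat worth checking for your own data: the lookbehind is not anchored to the start of the string, so the pattern matches the first c-word with at least seven word or whitespace characters before it, which need not be the one at position 8. The sketch below compares it with a stricter variant; the anchored pattern and the names `x`, `loose` and `strict` are my own assumptions, not part of the answer above.

```r
library(stringr)

x <- c("I love chocolate pudding", "I love dogs and cats")

# Original pattern: "cats" starts at position 17 but still matches,
# because seven word/space characters precede it.
loose <- str_extract(x, "(?<=[\\w\\s]{7})\\bc\\w+\\b")
loose
# "chocolate" "cats"

# Assumed stricter variant: anchor with ^ and capture the word, so only a
# word beginning exactly at character 8 is returned (NA otherwise).
strict <- str_match(x, "^.{7}\\b(c\\w*)")[, 2]
strict
# "chocolate" NA
```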

Alternatively, use sub with a backreference:

sub(".{7}(\\bc\\w+\\b).*", "\\1", Str)
[1] "chocolate"
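Note that sub returns its input unchanged when the pattern does not match, so a string without a suitable word comes back whole rather than as an empty result. A possible guard is sketched below; `sub_or_na` is a hypothetical helper name.

```r
pattern <- ".{7}(\\bc\\w+\\b).*"

sub(pattern, "\\1", "I love chocolate pudding")
# [1] "chocolate"
sub(pattern, "\\1", "I love dogs")
# [1] "I love dogs"

# Hypothetical helper: return NA instead of the untouched input
# when the pattern does not match.
sub_or_na <- function(x) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

# "chocolate" for the first string, NA for the second
sub_or_na(c("I love chocolate pudding", "I love dogs"))
```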

Using stringr and ignoring the "starts with c" condition:

library(stringr)  # provides str_sub(), word() and re-exports %>%

Str %>%
  str_sub(pos) %>%
  word(1)
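Written without the pipe, the two steps look like this. A self-contained sketch using the question's example values:

```r
library(stringr)

Str <- "I love chocolate pudding"
pos <- 8

# str_sub() with only a start position keeps everything up to the end
tail_part <- str_sub(Str, pos)
tail_part
# [1] "chocolate pudding"

# word(x, 1) picks the first space-separated word
word(tail_part, 1)
# [1] "chocolate"
```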
