How to find the match position of a text string for each element in a vector in R?

I have a character vector, say month.name:

> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"      "July"     
 [8] "August"    "September" "October"   "November"  "December" 

What R function should I use to find the position of "ber" in each element, so that it returns a numeric vector of the form c(-1,-1,-1,-1,-1,-1,-1,-1,7,5,6,6), i.e., -1 for no match and 5 when the match starts at the fifth character?

You could use stringr::str_locate. It returns a matrix with the start and end position of the first match in each element:

library(stringr)
str_locate(month.name, "ber")

      start end
 [1,]    NA  NA
 [2,]    NA  NA
 [3,]    NA  NA
 [4,]    NA  NA
 [5,]    NA  NA
 [6,]    NA  NA
 [7,]    NA  NA
 [8,]    NA  NA
 [9,]     7   9
[10,]     5   7
[11,]     6   8
[12,]     6   8

So str_locate(month.name, "ber")[, 'start'] returns a vector:

 [1] NA NA NA NA NA NA NA NA  7  5  6  6

Personally I think NA is a better choice for "no match" than -1. You can always substitute -1 afterwards if you really want to. For example:

pos <- str_locate(month.name, "ber")[, 'start']
ifelse(is.na(pos), -1, pos)

 [1] -1 -1 -1 -1 -1 -1 -1 -1  7  5  6  6
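If the result needs to stay an integer vector, one possible variation is base R's replace(). The sketch below hard-codes the start positions printed above so that it runs without stringr; the name pos_int is only illustrative and not part of the original answer.

```r
# Start positions as printed above (NA where "ber" does not occur).
# Hard-coded here for illustration; in practice this would come from
# str_locate(month.name, "ber")[, "start"].
pos <- c(rep(NA_integer_, 8), 7L, 5L, 6L, 6L)

# replace() swaps the NA entries for -1L and keeps the integer type,
# whereas ifelse(is.na(pos), -1, pos) returns a double vector.
pos_int <- replace(pos, is.na(pos), -1L)

pos_int
#  [1] -1 -1 -1 -1 -1 -1 -1 -1  7  5  6  6
typeof(pos_int)
# [1] "integer"
```

Either form prints the same values; the difference only matters if later code checks the type.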

This is exactly what regexpr returns (see ?regexpr), along with some other helpful attributes:

regexpr("ber", month.name)
# [1] -1 -1 -1 -1 -1 -1 -1 -1  7  5  6  6
#attr(,"match.length")
# [1] -1 -1 -1 -1 -1 -1 -1 -1  3  3  3  3
#attr(,"index.type")
#[1] "chars"
#attr(,"useBytes")
#[1] TRUE
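If only the plain vector of positions is wanted, the attributes can be dropped. A minimal sketch, assuming the search string should be matched literally rather than as a regular expression (hence fixed = TRUE); the name ber_pos is only illustrative:

```r
# regexpr() gives the start of the first match in each element, or -1
# when there is no match. fixed = TRUE treats "ber" as a literal string.
m <- regexpr("ber", month.name, fixed = TRUE)

# as.vector() removes the match.length, index.type and useBytes
# attributes, leaving a bare integer vector.
ber_pos <- as.vector(m)

ber_pos
#  [1] -1 -1 -1 -1 -1 -1 -1 -1  7  5  6  6
```

This needs no extra packages and already uses -1 for "no match", which is the form asked for in the question.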
